Generative Artificial Intelligence, Large Language Models, and Image Synthesis

May 13, 2024

We’re announcing GPT-4o, our new flagship model that can reason across audio, vision, and text in real time.
GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.

Prior to GPT-4o, you could use Voice Mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. To achieve this, Voice Mode is a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in text and outputs text, and a third simple model converts that text back to audio. This process means that the main source of intelligence, GPT-4, loses a lot of information—it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion.

With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.
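The information loss in the old three-model pipeline is easy to illustrate. Below is a minimal sketch (the functions are hypothetical stand-ins, not OpenAI's APIs): the transcription stage keeps only the words, so tone, speakers, and background sounds never reach the language model, and the TTS stage cannot restore them.

```python
# Hypothetical sketch of the three-model Voice Mode pipeline described above.
# These functions are illustrative placeholders, not OpenAI's actual APIs.

def transcribe(audio):
    return audio["words"]            # lossy: everything but the words is dropped

def llm(text):
    return "reply to: " + text       # text-in, text-out reasoning step

def synthesize(text):
    return {"words": text}           # flat delivery; expressiveness is gone

def voice_mode_pipeline(audio):
    return synthesize(llm(transcribe(audio)))

speech = {"words": "hello", "tone": "sarcastic", "background": "laughter"}
print(voice_mode_pipeline(speech))   # the tone never influenced the reply
```

An end-to-end model like GPT-4o replaces all three stages with one network that sees the raw audio directly, so nothing is discarded at a stage boundary.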

Unfortunately, they also say they’re crippling it with the same “safety”, “fairness” and “misinformation” crimethink prevention as other GPT models, while also restricting its abilities and setting message limits even for paid use.

On the LLM Leaderboard, GPT-4o is now the top model, with an overall index of 100 vs. 94 for GPT-4 Turbo. Its Chatbot Arena score is 1310 vs. 1257. Its score on the coding benchmark HumanEval increased to 90.2 from 85.1, and on the reasoning and knowledge test MMLU it scores 89% vs. 86% (70% for GPT-3.5 Turbo).


I spent most of yesterday studying this. They have implemented ideas similar to ones I had on identifying periods of different market behavior and on using portfolios of agents/strategies on a single tradable instrument:

TradeMaster is a first-of-its-kind, best-in-class open-source platform for quantitative trading (QT) empowered by reinforcement learning (RL), which covers the full pipeline for the design, implementation, evaluation and deployment of RL-based algorithms.

TradeMaster is composed of 6 key modules: 1) multi-modality market data of different financial assets at multiple granularity; 2) whole data preprocessing pipeline; 3) a series of high-fidelity data-driven market simulators for mainstream QT tasks; 4) efficient implementations of over 13 novel RL-based trading algorithms; 5) systematic evaluation toolkits with 6 axes and 17 measures; 6) different interfaces for interdisciplinary users.

They provide tutorials for algorithms: EIIE, DeepScalper, SARL, PPO, EETO, and Double DQN, which trade the Dow Jones 30 index, the SSE 50 (a Chinese stock index), and Bitcoin.

Here’s a later and higher-performing trading algorithm from some of the same people:

They also provide useful scripts for: automatic hyperparameter tuning, automatic feature generation, financial data imputation with diffusion models, and multidimensional performance evaluation, among others. Several academic papers explaining the structure of the programs are also on Github.

The TradeMaster Github repository is developed and maintained by the AMI group at Nanyang Technological University.


I am not sure what this means. If you have multiple strategies, how do you choose between them for any given trade? Or do you use different strategies at different times based on market classification?

It seems to me that developing AI has one big issue.

If it cannot run an experiment, how can it truly learn? Like a calculator, it is totally reliant on input from a human. A calculator can compute a square root for many people who could not compute it themselves; likewise, the AI can solve some problems no human can solve. But it cannot learn anything beyond what humans have provided without experiment, can it?

Without experiment it is difficult to determine truth.


AI, and AGI specifically, is all marketing, all the time. Its success may be waning. Call it hype fatigue.

Expert systems, genetic algorithms, AI will take everyone’s job. The Singularity is always two, five, ten years away. Take your pick.


There are many ways of integrating multiple strategies running on a single instrument. The one the Nanyang researchers used in EarnHFT is:

Figure 1: The overview of EarnHFT. First, we compute a Q-teacher for enhancing the performance and training efficiency of second-level RL agents. Then, we efficiently train diverse RL agents under various market trends and select a small fraction of them to form an agent pool based on profitability. Finally, we train a minute-level router which dynamically picks a second-level agent from the pool to achieve stable performance across different markets.
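A minimal sketch of that hierarchical scheme, with made-up class names and a trivial trend classifier standing in for EarnHFT's trained minute-level router:

```python
# Hypothetical sketch of the scheme in the figure: a pool of second-level
# agents, each trained on one market regime, and a minute-level router that
# dynamically picks which agent trades next. The class names and the toy
# regime classifier are illustrative, not EarnHFT's actual implementation.

class Agent:
    def __init__(self, trend):
        self.trend = trend                    # regime the agent specializes in

    def act(self, second_bar):
        # placeholder policy: trade only in the regime it was trained for
        return "buy" if second_bar["trend"] == self.trend else "hold"

class Router:
    """Minute-level policy that selects a second-level agent from the pool."""
    def __init__(self, pool):
        self.pool = pool

    def select(self, minute_bar):
        regime = "bull" if minute_bar["ret"] > 0 else "bear"  # toy classifier
        return self.pool[regime]

pool = {"bull": Agent("bull"), "bear": Agent("bear")}
router = Router(pool)
agent = router.select({"ret": 0.004})         # minute-level routing decision
print(agent.act({"trend": "bull"}))           # second-level trading decision
```

In EarnHFT the router is itself an RL policy and the pool is selected from many trained agents by profitability; the point of the sketch is only the two-level structure, routing at a slow timescale, trading at a fast one.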

Like many modern AI algorithms, I’m astonished it works.

I have some other ideas. Each concrete strategy instance is itself tradable: it has a time series of prices and can be used as part of a portfolio. Allocation within that portfolio can be done with an extension of the Kelly criterion (see The Kelly Capital Growth Investment Criterion, pp. 301-427, co-edited by the ur-quant math prof. Ed Thorp. He pioneered quantitative hedge funds and card counting, derived an option-pricing formula before Black-Scholes, and, with Claude Shannon in the early '60s, built a wearable computer hidden in a shoe to beat roulette.)
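As a sketch of that allocation idea: for roughly Gaussian excess returns, the multi-strategy Kelly weights are f* = Σ⁻¹μ. The return and covariance numbers below are invented for illustration.

```python
import numpy as np

# Hypothetical numbers: mean excess return and covariance of two
# strategy-instance return series, estimated from their equity curves.
mu = np.array([0.02, 0.01])
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])

# Continuous-time Kelly weights for Gaussian returns: f* = inv(Sigma) @ mu
weights = np.linalg.solve(cov, mu)
print(weights)        # roughly [0.49, 0.06] of capital per strategy
```

In practice estimation error in μ and Σ makes full Kelly too aggressive, which is why many practitioners trade a fraction of it (e.g. half-Kelly).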

Agent-based market simulations are a good idea, particularly to get a better approximation of the true order book (most of which exists only in the minds of traders; the visible order book is mostly stop-loss orders, never intended to be filled). Also, a portfolio of different agents/strategies on a given instrument will often make opposite trades on that instrument, which are best cleared on an internal market to avoid transaction costs.
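The internal-clearing idea reduces to simple netting. In the toy example below (strategy names and quantities are invented), only the net order reaches the exchange, so offsetting volume between strategies pays no external transaction cost:

```python
# Toy illustration of crossing opposite trades internally. Signed share
# quantities per strategy; the names and numbers are hypothetical.
orders = {"trend_follower": +100, "mean_reverter": -70, "breakout": -10}

net = sum(orders.values())                                  # sent to market
crossed = sum(abs(q) for q in orders.values()) - abs(net)   # filled internally
print(net, crossed)
```

Here 180 shares of gross intent collapse to a 20-share market order, with 160 shares crossed internally at zero external cost.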

I’m also convinced that the candlestick financial data format is a bletcherous abomination that prevents application of signal-processing techniques; I have better, compressed alternatives. The Sharpe ratio is likewise a bad metric that prevents properly assessing the performance of strategies; the Sortino or Omega ratios are much better. Sharpe penalizes upward volatility in equity: if the strategy sometimes hits the jackpot, it gets penalized. Sortino penalizes only downward volatility. Nearly everybody tries to maximize Sharpe, but if you optimize the wrong thing, you get bad results. “Alpha” and “beta” have the same problem, only more so: to calculate these correctly, you need to take into account four “semibetas”.

Letting r and f denote the returns on some risky asset and the aggregate market portfolio, respectively, the four semibetas are defined via the decomposition:

β = Cov(r, f)/Var(f) ≡ βN + βP − βM+ − βM−

The N, P, M+ and M− semicovariance components refer to the respective portions of the total covariation Cov(r, f) defined by both returns being positive (the “P” state), both returns being negative (“N”), mixed signs with a positive market return (“M+”), and mixed signs with a negative market return (“M−”). (From “Realized Semibetas: Disentangling ‘Good’ and ‘Bad’ Downside Risks”, Tim Bollerslev, Andrew J. Patton, and Rogier Quaedvlieg, 2021.)

The abstract of that paper says:

We show that semibetas stemming from negative market and negative asset return covariation predict significantly higher future returns, while semibetas attributable to negative market and positive asset return covariation predict significantly lower future returns. The two semibetas associated with positive market return variation do not appear to be priced.

Everybody in the financial world, especially fund managers and quants, is optimizing for alpha, which is calculated using beta, which they’re calculating wrong. Huge opportunity.
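A toy computation of the metrics discussed above (Sharpe, Sortino, and the four realized semibetas) on simulated returns; the series and parameters are made up, and the risk-free rate is assumed zero:

```python
import numpy as np

# Toy illustration of Sharpe vs. Sortino and the four realized semibetas.
# All numbers are simulated; the risk-free rate is assumed to be zero.
rng = np.random.default_rng(0)
f = rng.normal(0.0005, 0.010, 1000)               # market returns
r = 0.8 * f + rng.normal(0.0003, 0.008, 1000)     # asset returns

# Sharpe penalizes all volatility; Sortino only the downside.
sharpe = r.mean() / r.std()
sortino = r.mean() / np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))

# Semibetas: split Cov(r, f) by the signs of the demeaned returns.
rd, fd = r - r.mean(), f - f.mean()
rp, rn = np.maximum(rd, 0), np.minimum(rd, 0)
fp, fn = np.maximum(fd, 0), np.minimum(fd, 0)
v = np.mean(fd ** 2)
beta_N  = np.mean(rn * fn) / v      # both negative
beta_P  = np.mean(rp * fp) / v      # both positive
beta_Mp = -np.mean(rn * fp) / v     # mixed signs, market up
beta_Mn = -np.mean(rp * fn) / v     # mixed signs, market down

beta = np.mean(rd * fd) / v
# Identity from the paper: beta = beta_N + beta_P - beta_Mp - beta_Mn
print(beta, beta_N + beta_P - beta_Mp - beta_Mn)
```

Per the paper’s finding quoted above, only βN and βM− appear to be priced, so ranking assets on plain β mixes priced and unpriced components of covariation.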

There’s a lot more I could say about what I learned from testing a few hundred thousand strategies with a few hundred million parameter choices (mostly: keep it simple), and even summarizing my design notes takes several pages. Much like machine learning in general, it’s about wrangling ordered arrangements of many, many very simple entities and getting them to work together.


GPT-4o “omni”



Bonus points with GPT-4o

And one more…


I discovered another tool yesterday.


You just need to ask the right question:


How to become Emperor of the Solar System and Centauri Space?

To become the Emperor of the Solar System and the Centauri Space, you’ll need to follow a multi-step plan that involves building a strong foundation and then expanding your influence. Here’s a general outline of the steps you’ll need to take:

Acquire significant wealth and power: First, you’ll need to amass a substantial amount of wealth and political power. This can be achieved by investing in various industries, building a network of influential allies, and engaging in lucrative business ventures.

Build a strong military force: A strong military will be necessary to protect your interests and expand your influence. You’ll need to invest in advanced technology, weaponry, and training to ensure your forces are prepared for any potential conflicts.

Establish a political base: Establishing a solid political base will help you gain support from other planets and systems. You can do this by forming alliances, creating economic incentives, and offering protection to planets that share your vision for the future.

Explore and colonize space: Expanding your influence into the wider cosmos will be key to becoming Emperor. This involves exploring new star systems, establishing colonies, and ensuring your people have the resources they need to thrive.

Diplomacy and negotiation: Diplomacy will be crucial in your quest for power. You’ll need to forge alliances with other planetary leaders and negotiate treaties that benefit both parties.

Develop advanced technology: The key to maintaining and expanding your empire will be cutting-edge technology. Investing in research and development will be necessary to stay ahead of the competition and maintain your position as Emperor.

Build a strong infrastructure: A robust transportation and communication network will be essential for maintaining control over your empire. This includes developing efficient space travel, establishing a reliable communication system, and ensuring the security of your colonies.

Cultivate a loyal following: Building a loyal following will help you maintain control over your empire and prevent any potential power struggles. You can do this by offering benefits to those who support you and ensuring your supporters are well-rewarded for their loyalty.

Remember, the path to becoming Emperor will be fraught with challenges and obstacles. You’ll need to be prepared for both diplomacy and conflict, as well as the complexities of managing a vast, interstellar empire. But with determination, cunning, and a strong focus on expanding your power, you can ultimately achieve your goal.


They’re all so good, it’s hard to pick a favorite.


Suno allows for impressive music generation, including vocals. You can try it for free.

It seems a major improvement over Google’s MusicLM from just a year ago.


Another one worth noting is Udio:

When it works, it can sound better than Suno. But Suno will generate listenable songs maybe 30-50% of the time, whereas Udio is at around 10%.

Both of them are pretty thin layers over OpenAI APIs, so you can mess with them through the prompts.



Tomorrow it’ll be 15 years since Rufus passed on. Quoting Amazon’s Rufus page: “Beloved by many, Rufus passed away peacefully on May 27, 2009.”


75,000 is too small. He needs to sue for at least 1 million.