Cerebras Systems announces “Cerebras Systems Releases Seven New GPT Models Trained on CS-2 Wafer-Scale Systems”:
SUNNYVALE, Calif.--(BUSINESS WIRE)--Cerebras Systems, the pioneer in artificial intelligence (AI) compute for generative AI, today announced it has trained and is releasing a series of seven GPT-based large language models (LLMs) for open use by the research community. This is the first time a company has used non-GPU-based AI systems to train LLMs of up to 13 billion parameters and is sharing the models, weights, and training recipe under the industry-standard Apache 2.0 license. All seven models were trained on the 16 CS-2 systems in the Cerebras Andromeda AI supercomputer.
The rapid growth of LLMs, spearheaded by OpenAI’s ChatGPT, has spurred a race to create more powerful, specialized AI chips. While many companies have promised alternatives to Nvidia® GPUs, none have demonstrated both the ability to train large-scale models and the willingness to open source the results under permissive licenses. In fact, competitive pressures have systematically reduced companies' willingness to release their LLMs to the public, even under restrictive licenses (see GPT-4 for the most recent example). This concentrates ownership, limits ecosystem growth, and creates safety risks.
Cerebras’ release today directly addresses these issues. In a first among AI hardware companies, Cerebras researchers trained, on the Andromeda supercomputer, a series of seven GPT models with 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B parameters. Typically a multi-month undertaking, this work was completed in a few weeks thanks to the incredible speed of the Cerebras CS-2 systems that make up Andromeda, and the ability of Cerebras’ weight streaming architecture to eliminate the pain of distributed compute. These results demonstrate that Cerebras’ systems can train the largest and most complex AI workloads today.
This is the first time a suite of GPT models trained using state-of-the-art training efficiency techniques has been made public. These models are trained to the highest accuracy for a given compute budget (i.e., trained compute-efficiently using the Chinchilla recipe), so they have lower training time, lower training cost, and lower energy use than any existing public models.
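To make "highest accuracy for a given compute budget" concrete, the sketch below estimates a Chinchilla-style training budget for each of the seven model sizes. It assumes the widely cited Chinchilla rule of thumb of roughly 20 training tokens per parameter and the common approximation that training compute is C ≈ 6·N·D FLOPs (N parameters, D tokens). Neither number is stated in the announcement, and the helper name `chinchilla_budget` is hypothetical, so treat the output as an illustration rather than Cerebras' actual training figures.

```python
# Assumption: Chinchilla-style heuristic of ~20 training tokens per parameter.
TOKENS_PER_PARAM = 20

# Parameter counts of the seven Cerebras-GPT models, as listed in the announcement.
MODEL_SIZES = {
    "111M": 111e6,
    "256M": 256e6,
    "590M": 590e6,
    "1.3B": 1.3e9,
    "2.7B": 2.7e9,
    "6.7B": 6.7e9,
    "13B": 13e9,
}


def chinchilla_budget(n_params: float, tokens_per_param: float = TOKENS_PER_PARAM):
    """Hypothetical helper: return (training tokens, approx. training FLOPs).

    Uses the common approximation C ~= 6 * N * D for dense transformer training.
    """
    tokens = tokens_per_param * n_params
    flops = 6.0 * n_params * tokens
    return tokens, flops


if __name__ == "__main__":
    for name, n in MODEL_SIZES.items():
        tokens, flops = chinchilla_budget(n)
        print(f"{name:>5}: {tokens / 1e9:7.1f}B tokens, {flops:.2e} FLOPs")
```

Because compute grows with the square of model size under a fixed tokens-per-parameter ratio, the 13B model needs roughly 13,700 times the compute of the 111M model in this estimate.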
⋮
All seven Cerebras-GPT models are immediately available on Hugging Face and in the Cerebras Model Zoo on GitHub. The Andromeda AI supercomputer used to train these models is available on demand at https://www.cerebras.net/andromeda/.
For those interested in the technical details, Cerebras has published a technical blog post describing the seven models and the scaling laws they produce. A research paper will be released shortly.
The Cerebras Andromeda wafer-scale AI training system was discussed here on 2022-11-26, “Cerebras Andromeda: 13.5 Million Core AI Training Supercomputer”.