Cerebras and G42 Building 36 Exaflop Artificial Intelligence Training Supercomputer Network

Some hours after I posted this video, the creator, TechTechPotato (Dr Ian Cutress), marked it private. This sometimes happens when there’s a mistake in a video or inadvertent mention of information under embargo by a manufacturer. If and when a corrected or replacement video is posted, I will include it in this post and remove this note.

We discussed the Cerebras Andromeda supercomputer in a post here on 2022-11-26, “Cerebras Andromeda: 13.5 Million Core AI Training Supercomputer”. The Condor Galaxy, announced on 2023-07-20, is built from the company’s Wafer-Scale Engine units, wafer-scale integrated processors for artificial intelligence (AI) training with 850,000 computational cores each. It combines these engines into a supercomputer with 54 million cores providing 4 exaflops of AI training compute power. Simultaneously, Abu Dhabi-based G42 announced their order for nine Condor Galaxy installations with a total capacity of 36 exaflops, to be sited around the world and available through their cloud services operations. Each of the nine installations will have the same specifications as the first, CG-1.

Optimized for large language models and generative AI, CG-1 will deliver 4 exaFLOPs of 16 bit AI compute, with standard support for up to 600 billion parameter models and extendable configurations that support up to 100 trillion parameter models. With 54 million AI-optimized compute cores, 388 terabits per second of fabric bandwidth, and fed by 72,704 AMD EPYC processor cores, unlike any known GPU cluster, CG-1 delivers near-linear performance scaling from 1 to 64 CS-2 systems using simple data parallelism.

Deployment is described as follows:

CG-1 offers native support for training with long sequence lengths, up to 50,000 tokens out of the box, without any special software libraries. Programming CG-1 is done entirely without complex distributed programming languages, meaning even the largest models can be run without weeks or months spent distributing work over thousands of GPUs.

Located at Colovore, a high-performance colocation facility in Santa Clara, California, CG-1 is operated by Cerebras under U.S. laws, ensuring state of the art AI systems are not used by adversary states. Each Cerebras CS-2 system is designed, packaged, manufactured, tested, and integrated in the U.S.; Cerebras is the only AI hardware company to package processors and manufacture AI systems in the U.S.

CG-1 is the first of three 4 exaFLOPs AI supercomputers (CG-1, CG-2, and CG-3) to be built and located in the U.S. by Cerebras and G42 in partnership. These three AI supercomputers will be interconnected in a 12 exaFLOPs, 162 million core distributed AI supercomputer built from 192 Cerebras CS-2s and fed by more than 218,000 high performance AMD EPYC CPU cores. This will be the largest supercomputer for AI training in existence.

G42 and Cerebras then plan to add six additional Condor Galaxy supercomputers in 2024, bringing the total compute power to 36 exaFLOPs with 576 CS-2s.
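As a quick sanity check on the figures quoted above, here is a minimal Python sketch that scales the per-installation numbers (64 CS-2 systems, 850,000 cores per Wafer-Scale Engine, 4 exaFLOPs, 72,704 EPYC cores) to one, three, and nine Condor Galaxy installations. The constant and function names are my own, chosen for illustration, and the sketch assumes every installation is identical to CG-1 with one Wafer-Scale Engine per CS-2.

```python
# Per-installation figures taken from the announcement text quoted above.
CORES_PER_CS2 = 850_000          # cores per Wafer-Scale Engine (one per CS-2, assumed)
CS2_PER_GALAXY = 64              # CS-2 systems in one Condor Galaxy
EXAFLOPS_PER_GALAXY = 4          # 16-bit AI compute per installation
EPYC_CORES_PER_GALAXY = 72_704   # AMD EPYC cores feeding one installation


def galaxy_totals(n_galaxies):
    """Scale the CG-1 figures to n identical installations (an assumption)."""
    cs2_systems = n_galaxies * CS2_PER_GALAXY
    return {
        "cs2_systems": cs2_systems,
        "ai_cores": cs2_systems * CORES_PER_CS2,
        "exaflops": n_galaxies * EXAFLOPS_PER_GALAXY,
        "epyc_cores": n_galaxies * EPYC_CORES_PER_GALAXY,
    }


if __name__ == "__main__":
    for n in (1, 3, 9):
        print(n, galaxy_totals(n))
```

The totals agree with the announcement: 192 CS-2s and 218,112 EPYC cores for three installations, and 576 CS-2s and 36 exaFLOPs for nine. The exact core counts come out at 54.4 million for one installation and 163.2 million for three, so the quoted 162 million appears to be three times the rounded 54 million figure.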

Fearful of the threat from artificial intelligence, Eliezer Yudkowsky wrote in Time magazine, “Pausing AI Developments Isn’t Enough. We Need to Shut it All Down”:

Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.

Whether the G42/Cerebras data centres will be protected by anti-aircraft and anti-missile batteries was not discussed in the companies’ joint announcements.


The video link did not work.

After I posted the article here, the embedded video was marked private by the creator. See the note in the main post about why this sometimes happens. Meanwhile, the links to the press releases by Cerebras and G42 should still work.
