Changing the compute paradigm in the datacenter, or even extending or augmenting it in some fashion, is no easy task. A company, even one that has raised $720 million in seven rounds of funding in the past six years, has to be careful not to try to do too much too fast and lose focus, while at the same time adapting to conditions in the field to get its machines doing real work and proving their worth on tough tasks.
This is where machine learning upstart and wafer-scale computing pioneer Cerebras Systems finds itself today, and it does not have the benefit of the ubiquity that the Intel X86 architecture enjoyed, or the relative ubiquity that the Nvidia GPU architecture enjoyed, as they challenged the incumbents in datacenter compute in the 1990s and the 2010s, respectively.
If you wanted to write software to do distributed computing on those architectures, you could start with a laptop and then scale the code across progressively larger clusters of machines. But the AI engines created by Cerebras and its remaining rivals, SambaNova Systems and Graphcore and possibly Intel's Habana line, are large and expensive machines. Luckily, we live in a world that has become accustomed to cloud computing, and now it is perfectly acceptable to do timesharing on such machines to test ideas out.
This is precisely what Cerebras is doing as it stands up a 13.5 million core AI supercomputer nicknamed Andromeda in a colocation facility run by Colovore in Santa Clara, the very heart of Silicon Valley.
This machine, which would cost just under $30 million if you had to buy it, is being rented by dozens of customers who are paying to use it to train on a per-model basis with cloud-like pricing, Andrew Feldman, one of the company's co-founders and its chief executive officer, tells The Next Platform. There are a bunch of academics who have access to the cluster as well. The capacity on Andromeda is not as cheap and easy as running CUDA on an Nvidia GPU embedded in a laptop in 2008, but it is about as close as you can get with a wafer-scale processor that would not fit inside of a normal server chassis, much less a laptop.
This is similar to the approach that rival SambaNova Systems has taken, but as we explained when talking to the company's founders back in July, a lot of customers are going even further and are tapping SambaNova's expertise in training foundation models for specific use cases as well as renting capacity on its machines to do their training runs.
This approach, which we think all of the remaining important AI training hardware vendors will need to take (that would be Cerebras, SambaNova, Graphcore, and, if you want to be generous, Intel's Habana Labs division, assuming Intel doesn't shut it down as part of its looming cost cuts), is not so much a cloud or hosting consumption model as it is the approach IBM took in the 1960s at the dawn of the mainframe era with its System/360s. Back then, you bought a machine and you got white glove service and programming included with the very expensive services, because so few people understood how to program applications and create the databases underneath them.
Andromeda is, we think, a first step in this direction for Cerebras, whose customers are very large enterprises and HPC centers that already have plenty of AI expertise. But the next and larger round of customers, the ones that will constitute a real revenue stream and possibly profits for Cerebras and its AI hardware peers, are going to want access not just to flops, but to deep expertise so models can be created and trained for very specific workloads as quickly as possible.
Here are the basic feeds and speeds of the Andromeda system:
Each of the CS-2 nodes in the Andromeda cluster has four 64-core AMD Epyc processors in it that do housekeeping tasks for each of the WSE-2 wafers, which have 2.6 trillion transistors implementing 850,000 cores and their associated 40 GB of SRAM. That embedded SRAM memory on the die has 20 PB/sec of aggregate bandwidth, and the fabric implemented between the cores on the wafer has an aggregate bandwidth of 220 Pb/sec. Cerebras calls this mesh fabric linking the cores SwarmX, and a year ago this interconnect was extended over a dozen 100 Gb/sec Ethernet transports to allow the linking of up to 192 CS-2 systems into a single system image. Across those 16 CS-2 machines, the interconnect fabric has 96.8 Tb/sec of aggregate bandwidth.
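To put those per-wafer numbers in cluster terms, here is a quick back-of-the-envelope aggregation across the sixteen nodes, sketched in Python (the variable names are ours; note that the straight multiplication gives 13.6 million cores against the quoted 13.5 million, presumably down to per-wafer yield):

```python
# Aggregating the per-wafer figures cited above across Andromeda's 16 nodes.
nodes = 16
cores_per_wafer = 850_000
sram_per_wafer_gb = 40            # on-die SRAM per wafer
sram_bw_pb_s = 20                 # PB/sec of SRAM bandwidth per wafer
mesh_bw_pb_s = 220                # Pb/sec of on-wafer fabric bandwidth

print(f"cores: {nodes * cores_per_wafer:,}")                 # 13,600,000
print(f"SRAM:  {nodes * sram_per_wafer_gb} GB")              # 640 GB
print(f"SRAM bandwidth: {nodes * sram_bw_pb_s} PB/sec")      # 320 PB/sec
print(f"mesh bandwidth: {nodes * mesh_bw_pb_s} Pb/sec")      # 3,520 Pb/sec
```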
Just like you can plug FPGAs together with high speed interconnects and run a circuit simulation as a single logical unit because of the high speed SerDes that wrap around the FPGA pool of configurable logic gates, the Cerebras architecture uses the extended SwarmX interconnect to link the AI engines together so they can train very large models across up to 163 million cores. Feldman says that Cerebras has yet to build such a system and that this scale has been validated thus far only in its simulators.
That SwarmX fabric has also been extended out to what is essentially a memory area network, called MemoryX, that stores model parameter weights and broadcasts them to one or more CS-2 systems. The SwarmX fabric also reduces gradients from the CS-2 machines as they do their training runs. So the raw data from training sets and the model weights that drive the training are disaggregated. In prior GPU architectures, the training data and model weights have been in GPU memory, but with fast interconnects between CPUs and GPUs and the fatter memory of the CPU, data is being pushed out to the host nodes. Cerebras is just aggregating parameter weights in a special network-attached memory server. The SwarmX fabric has enough bandwidth and low enough latency to quickly stream weights into each CS-2 machine, mainly because it is actually not running the Ethernet protocol, but a very low latency proprietary protocol.
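For the flavor of it, here is a minimal sketch, in Python with NumPy, of that disaggregated scheme: an off-device weight store plays MemoryX's role, and a simple mean stands in for the fabric's gradient reduction. This is our own toy model under those assumptions, not Cerebras code, and the "gradient" is a stand-in for a real forward/backward pass:

```python
import numpy as np

# Off-device weight store (MemoryX's role) broadcasts parameters; each node
# computes a gradient on its own data shard; a mean (SwarmX's role) reduces
# the gradients and one update is applied to the master copy.

def node_gradient(weights, shard):
    return weights - shard.mean(axis=0)   # fake gradient toward shard mean

rng = np.random.default_rng(0)
weights = np.zeros(8, dtype=np.float32)                   # master copy
shards = np.array_split(rng.normal(size=(1600, 8)), 16)   # one shard per CS-2

for step in range(100):
    grads = [node_gradient(weights, s) for s in shards]   # broadcast + compute
    weights -= 0.1 * np.mean(grads, axis=0)               # reduce + update
```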
By contrast, the 1.69 exaflops Frontier supercomputer at Oak Ridge National Laboratory has 8.73 million CPU cores and GPU streaming multiprocessors (the GPU equivalent to a core), and 8.14 million of those are the GPU SMs that comprise 97.7 percent of the floating point capacity. At the same FP16 precision that is the high end for the Cerebras precision, Frontier would weigh in at 6.76 exaflops across those GPU cores. AMD does not yet have sparse matrix support for its GPUs, but we strongly suspect that will double the performance, as is the case with the Nvidia Ampere A100 and Hopper H100 GPU accelerators, when the Instinct MI300 GPU accelerators (which we will start codenaming Provolone as the companion to Genoa CPUs if AMD doesn't give us a nickname soon) ship next year.
In any event, as Frontier stands with its Instinct MI250X GPUs, you get 6.76 exaflops of aggregate peak FP16 for $600 million, which works out to $88.75 per teraflops for either dense or sparse matrices. (We are not including the electric bill for power and cooling, or the cost of storage; just the core system.)
That's a lot of flops compared to the 16-node Andromeda machine, which drives only 120 petaflops at FP16 precision with dense matrices but, far more importantly, delivers close to 1 exaflops with the kind of sparse matrix data that is common with the foundational large language models that are driving AI today.
Hold on. Why is Cerebras getting an 8X boost for its sparsity support when Nvidia is only getting a 2X boost? We don't know yet, but we just noticed that and are trying to find out.
The WSE-2 compute engine supports only half precision FP16 and single precision FP32 math, plus a proprietary format called CB16 floating point that has a 6-bit exponent; regular IEEE FP16 has a 5-bit exponent, and the BF16 format from Google's Brain division has an 8-bit exponent, which makes it easier to convert to the FP32 format. So there is no extra boost coming from further reduced precision down to, say, FP8, FP4, or FP2. As far as we know.
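To make the exponent-width point concrete, here is a small sketch computing the rough dynamic range each format's exponent allows (ignoring mantissa and subnormal details):

```python
# With e exponent bits, the IEEE-style bias is 2**(e-1) - 1, so the largest
# representable magnitude is roughly 2 raised to that biased maximum.
formats = {"FP16 (IEEE)": 5, "CB16 (Cerebras)": 6, "BF16 (Google Brain)": 8}
for name, e_bits in formats.items():
    max_exp = 2 ** (e_bits - 1) - 1
    print(f"{name}: {e_bits}-bit exponent, max ~2^{max_exp} "
          f"~= {2.0 ** max_exp:.2e}")
```

BF16's 8-bit exponent gives it the same dynamic range as FP32, which is exactly why conversion between the two is cheap.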
At $30 million, the 16-node Andromeda cluster costs a flat $250 per teraflops for dense matrices, but only $31.25 per teraflops with sparse matrices. It burns only 500 kilowatts, too, compared to the 21.9 megawatts of Frontier.
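The arithmetic behind those dollars-per-teraflops figures is straightforward; here is a sketch using the numbers quoted above:

```python
# Cost per teraflops at FP16, from the system prices and peak flops cited.
frontier_cost = 600e6            # dollars
frontier_tf = 6.76e6             # 6.76 exaflops, in teraflops
andromeda_cost = 30e6
andromeda_dense_tf = 120e3       # 120 petaflops dense
andromeda_sparse_tf = 960e3      # ~1 exaflops sparse (the 8X boost)

print(f"Frontier:         ${frontier_cost / frontier_tf:.2f}/TF")         # ~$88.76
print(f"Andromeda dense:  ${andromeda_cost / andromeda_dense_tf:.2f}/TF") # $250.00
print(f"Andromeda sparse: ${andromeda_cost / andromeda_sparse_tf:.2f}/TF")# $31.25
```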
But here is the real cost savings: A whole lot less grief. Because GPUs are relatively small devices, at least compared with an entire wafer bearing 850,000 cores, running large machine learning models on them means chopping up datasets and using a mix of model parallelism (running different layers of the model on different GPUs that have to communicate over the interconnect) and data parallelism (running different portions of the training sets on each device and doing all of the work of the model on each device individually). Because the WSE-2 chip has so many cores and so much memory, the training set can fit in the SRAM, and Cerebras only has to do data parallelism, calculating one set of gradients on that dataset rather than having to average them across tens of thousands of GPUs. This makes it much easier to train an AI model; because of the SwarmX interconnect, the model can scale nearly linearly with training data and parameter count, and because the weights are propagated using the dedicated MemoryX memory server, getting weights to all of the machines is not a problem either.
"Today, we can support 9 trillion parameter models on one CS-2," says Feldman. "It takes a long time, but the compiler can work through them and it can place work, and we can store it using MemoryX and SwarmX. We don't do model parallelism because our wafer is so big that we don't have to. We extract all of the parallelism by being strictly data parallel, and that is the beauty of this."
To be honest, one of us (Tim) did not fully appreciate the initial architecture choice Cerebras made and the changes announced to it last year, while the other of us (Nicole) did. That's why we are each other's co-processor. . . .
To be very clear, Cerebras is not doing model parallelism across those 16 CS-2 nodes in any fashion. You chop the dataset into the same number of pieces as the nodes you have. The SwarmX and MemoryX work together to accumulate the weights of the model for the 16 nodes, each with its piece of the training data, but the whole model runs entirely on that subset of data within one machine, and then the SwarmX network averages the gradients and stores the final weights on the MemoryX device.
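The distinction is easy to caricature in code. In the sketch below (our own illustration, with made-up layer and device names), model parallelism splits the layers across devices, while data parallelism splits only the batch:

```python
layers = ["embed", "block0", "block1", "head"]
devices = ["dev0", "dev1", "dev2", "dev3"]

# Model parallel: layer i lives on device i; activations hop across devices.
model_parallel = dict(zip(layers, devices))

# Data parallel: every device holds all four layers; the batch is what splits.
batch = list(range(64))
data_parallel = {d: batch[i::len(devices)] for i, d in enumerate(devices)}

print(model_parallel)
print({d: s[:4] for d, s in data_parallel.items()})   # first few indices each
```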
The scaling that the Andromeda machine (it was very hard not to say Strain there) is seeing is damned near linear across a wide variety of GPT models from OpenAI:
With each increase in scale, the time it takes to train a model is proportionately reduced, and this is important because training times on models with tens of billions of parameters are still on the order of days to months. If you can cut that by a factor of 16X, it might be worth it, particularly if you have a business that requires retraining often.
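Under near-linear scaling, the arithmetic is simple; here is a sketch with a hypothetical 30-day single-node run (the baseline is assumed, for illustration only):

```python
# Time-to-train under ideal linear scaling across CS-2 counts.
baseline_days = 30                 # hypothetical single-node run length
for nodes in (1, 2, 4, 8, 16):
    print(f"{nodes:2d} CS-2s: ~{baseline_days / nodes:.1f} days")
```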
Here is the thing: the sequence lengths, a gauge of the resolution of the data, keep getting longer and longer to provide more context for the machine learning training. AI inference might have a sequence length of 128 or 256 or 384, but rarely 1,024; training sequence lengths can be much higher. In the table above, the 1.3 billion GPT-3 and 25 billion GPT-J runs had 10,240 sequence lengths, and the current CS-2 architecture can support close to 50,000 sequence lengths. When Argonne National Laboratory pitted a cluster of a dozen CS-2s against the 2,000-GPU Polaris cluster, which is based on Nvidia Ampere A100 GPUs and AMD Epyc 7003 CPUs, Polaris could not even run the 2.5 billion and 25 billion GPT-3 models at the 10,240 sequence level. And on some tests, where a 600-GPU partition of Polaris was pitted against the dozen CS-2 machines, it took more than a week for the Polaris system to converge when using a large language model to predict the behavior of the coronavirus genome, but the Cerebras cluster's AI training converged in less than a day, according to Feldman.
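Part of the reason long sequences are so punishing is that self-attention's score matrix grows with the square of the sequence length. A rough count (illustrative only, since real memory use depends on the implementation):

```python
# Self-attention materializes a seq_len x seq_len score matrix per head per
# layer, so cost grows quadratically with sequence length.
for seq_len in (1_024, 10_240, 50_000):
    scores = seq_len ** 2
    print(f"seq {seq_len:>6,}: {scores:,} scores, "
          f"~{scores * 2 / 1e6:,.1f} MB at FP16 per head per layer")
```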
The grief of using Andromeda is also lower in another way: It costs less than using GPUs in the cloud.
Just because Andromeda costs around $30 million to buy doesn't mean that a timeslice of the machine is priced in proportion to its cost, any more than the price that Amazon Web Services pays for a server directly reflects the cost of an EC2 instance sold from the cloud. GPU capacity costs are all over the map on the clouds, on the order of $4 to $6 an hour per GPU on the big clouds, and for an equivalent amount of training for GPT-3 models, Feldman says that the Andromeda setup could cost half that of GPUs, and sometimes a lot less, depending on the situation.
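As a rough sketch of that rental math, using the midpoint of the quoted GPU rates and a fleet size and run length that are hypothetical, chosen by us purely for illustration:

```python
# What a GPU cloud training run costs at the quoted rates, versus Feldman's
# claim of roughly half on Andromeda. Fleet and runtime are assumptions.
gpu_rate = 5.0                   # $/GPU-hour, midpoint of the $4-$6 range
gpus, hours = 512, 240           # hypothetical fleet and run length
gpu_cost = gpu_rate * gpus * hours
print(f"GPU cloud run:        ${gpu_cost:,.0f}")
print(f"Andromeda (at ~half): ${gpu_cost / 2:,.0f}")
```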
At least for now, Cerebras is seeing a lot of action as an AI accelerator for established HPC systems, often machines accelerated by GPUs and doing a mix of simulation and modeling as well as AI training and inference. And Feldman thinks it is absolutely normal that organizations of all kinds and sizes will be using a mix of machinery (a workflow of machinery, in fact) instead of trying to do everything on one architecture.
"It is interesting to me that this sounds like a big idea," says Feldman with a laugh. "We build a bunch of different cars to do different jobs. You have a minivan to go to Grandma's house and soccer practice, but it is terrible for carrying 2x4s and 50 pound bags of concrete, and a truck isn't. And you want a different machine to have fun, or to haul logs, or whatever. But the idea that we can have one machine and drive its utilization up to near 100 percent is out the window. And what we will have are computational pipelines, a kind of factory view of big compute."
And that also means, by the way, that keeping a collection of machines busy all the time and getting the maximum value out of the investment is probably also out the window. "We will be lucky," says Feldman, "if this collection of machinery gets anywhere between 30 percent and 40 percent utilization." But this will be the only way to get all kinds of work done in a timely fashion.