Superlatives abound at Cerebras, the until-now stealthy next-generation silicon chip company that wants to make training a deep learning model as quick as buying toothpaste from Amazon. Launching out of nearly three years of quiet development, Cerebras introduced its new chip today – and it's a doozy. The "Wafer Scale Engine" packs 1.2 trillion transistors (the most ever), measures 46,225 square millimeters (the largest ever), includes 18 gigabytes of on-chip memory (the most of any chip on the market today), and has 400,000 processor cores (you can guess the superlative).
It made a big splash here at Stanford University at the Hot Chips conference, one of the silicon industry's major confabs for product introductions and roadmaps, drawing various levels of oohs and aahs among attendees. You can read more about the chip from Tiernan Ray's piece at Fortune and read the announcement from Cerebras itself.
Superlatives aside, the more interesting story here, I think, is the technical challenges Cerebras had to overcome to reach this milestone. I sat down with founder and CEO Andrew Feldman to discuss what his 173 engineers have been quietly building down the street over the past few years with $112 million in venture capital from Benchmark and others.
Going big means nothing but challenges
First, a quick background on how the chips that power your phones and computers are made. Fabs like TSMC take standard-size silicon wafers and use light to etch transistors into them, dividing each wafer into individual chips. Wafers are circles and chips are squares, so there is some basic geometry involved in dividing that circle into a discrete number of individual chips.
One major challenge in this lithography process is that errors can creep into manufacturing, requiring extensive testing to verify quality and forcing fabs to throw away poorly performing chips. The smaller and more compact the chip, the less likely an individual chip will contain a defect, and the higher the fab's yield. Higher yields equal higher profits.
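The relationship between die size and yield can be illustrated with the classic Poisson yield model, a standard simplification in the industry; the defect density below is an arbitrary illustrative number, not TSMC's actual figure:

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Poisson yield model: probability that a die contains zero defects."""
    return math.exp(-defects_per_mm2 * die_area_mm2)

# Hypothetical defect density for illustration only
d0 = 0.001  # defects per mm^2

small_die = poisson_yield(100, d0)  # a typical ~100 mm^2 mobile chip
large_die = poisson_yield(800, d0)  # a large ~800 mm^2 GPU-class die

print(f"100 mm^2 die yield: {small_die:.1%}")  # ~90.5%
print(f"800 mm^2 die yield: {large_die:.1%}")  # ~44.9%
```

Even with the same defect density, the larger die is far more likely to catch a defect somewhere on its surface – which is exactly why a wafer-sized chip was long considered impractical.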
Cerebras turns this idea on its head: rather than etching a bunch of individual chips onto a single wafer, it uses the entire wafer itself as one giant chip. That allows all of the individual cores to connect with one another directly – dramatically speeding up the critical feedback loops used in deep learning algorithms – but comes at the cost of enormous manufacturing and design challenges to create and manage these chips.
The first challenge the team ran into, according to Feldman, was handling communication across the "scribe lines." While Cerebras's chip comprises a full wafer, today's lithographic equipment still has to act as if it were etching individual chips into the silicon wafer. So the company had to invent new techniques to allow each of those individual chips to communicate with each other across the whole wafer. Working with TSMC, they not only invented new channels for communication, but also had to write new software to handle chips with a trillion-plus transistors.
The second challenge was yield. With a chip covering an entire silicon wafer, a single imperfection in the etching of that wafer could render the entire chip inoperable. This had been the blocker for decades for whole-wafer technology: by the laws of physics, it is essentially impossible to etch a trillion transistors with perfect accuracy repeatedly.
Cerebras approached the problem with redundancy, adding extra cores throughout the chip to be used as backups in the event that an error shows up in that core's neighborhood on the wafer. "You have to hold only 1%, 1.5% of these guys aside," Feldman explained to me. Leaving extra cores allows the chip to essentially self-heal, routing around the lithography error and making a whole-wafer silicon chip viable.
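Feldman's 1–1.5% figure implies a concrete number of spare cores. A back-of-the-envelope sketch (the arithmetic is mine, not a disclosed Cerebras design detail):

```python
def spare_cores(total: int, fraction: float) -> int:
    """Number of cores held in reserve for a given spare fraction."""
    return round(total * fraction)

total_cores = 400_000  # processor cores on the Wafer Scale Engine

for frac in (0.01, 0.015):
    spares = spare_cores(total_cores, frac)
    print(f"{frac:.1%} held aside -> {spares:,} spare cores, "
          f"{total_cores - spares:,} available for compute")
# 1.0% -> 4,000 spares; 1.5% -> 6,000 spares
```

In other words, a few thousand redundant cores are enough to route around defects, while leaving roughly 394,000–396,000 cores available for computation.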
Into Uncharted Territory in Chip Design
Those first two challenges – communicating across the scribe lines between chips and handling yield – have flummoxed chip designers studying whole-wafer chips for decades. But they were known problems, and Feldman said they were actually easier to solve than expected through the use of modern tools.
He likened the challenge to climbing Mount Everest. "It's like the first set of guys failed to climb Mount Everest, they said, 'Shit, that first part is really hard.' And then the next set came along and said, 'That shit was nothing. That last hundred yards, that's a problem.'"
And the hardest challenges for Cerebras, according to Feldman, were indeed the next three, since no other chip designer had gotten past the scribe-line communication and yield challenges to discover what came next.
The third challenge Cerebras confronted was handling thermal expansion. Chips get extremely hot in operation, and different materials expand at different rates. That means the connectors tethering a chip to its motherboard also need to thermally expand at precisely the same rate, lest cracks develop between the two.
As Feldman put it: "How do you get a connector that can withstand [that]? Nobody had ever done that before, [and so] we had to invent a material. So we have Ph.D.s in materials science, [and] we had to invent a material that could absorb some of that difference."
Once a chip is manufactured, it needs to be tested and packaged for shipment to original equipment manufacturers (OEMs), who add the chips into the products used by end customers (whether data centers or consumer laptops). There is a challenge, though: absolutely nothing on the market is designed to handle a whole-wafer chip.
"Well, the answer is you invent a lot of crap, that's the truth. Nobody had a printed circuit board this size. Nobody had connectors. Nobody had a cold plate. Nobody had tools. Nobody had tools to align them. Nobody had tools to handle them. Nobody had any software to test," Feldman explained. "And so we designed the whole manufacturing flow, because nobody had ever done it." Cerebras' technology is much more than just the chip it sells – it also includes all of the associated machinery required to actually manufacture and package those chips.

Finally, all that processing power in one chip requires immense power and cooling. Cerebras' chip uses 15 kilowatts of power to operate – a prodigious amount of power for an individual chip, although relatively comparable to a modern-sized AI cluster. All that power also needs to be cooled, and Cerebras had to design a new way to deliver both for such a large chip.
The problem was mostly approached by flipping the chip on its side, in what Feldman termed "using the Z-dimension." The idea is that rather than trying to move power and cooling horizontally across the chip as is traditional, power and cooling are delivered vertically at all points across the chip, ensuring even and uniform access to both.
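Those power numbers are less extreme per unit area than they sound. A rough comparison using the article's figures for the Wafer Scale Engine against a generic large GPU die (the GPU numbers are approximate ballpark values I've chosen for illustration, not from Cerebras):

```python
# Cerebras Wafer Scale Engine, figures from the article
wse_power_w = 15_000
wse_area_mm2 = 46_225

# A generic large accelerator die, approximate figures for comparison
gpu_power_w = 300
gpu_area_mm2 = 815

wse_density = wse_power_w / wse_area_mm2  # ~0.32 W/mm^2
gpu_density = gpu_power_w / gpu_area_mm2  # ~0.37 W/mm^2

print(f"WSE power density: {wse_density:.2f} W/mm^2")
print(f"GPU power density: {gpu_density:.2f} W/mm^2")
```

By this sketch, the power density per square millimeter is in the same ballpark as existing large chips; the hard part is delivering and removing that much total power across a dinner-plate-sized area, which is what the vertical "Z-dimension" approach addresses.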
It was these final three challenges – thermal expansion, packaging, and power/cooling – that the company has been working around the clock to deliver over the past few years.
From Theory to Reality
Cerebras has a demo chip (I saw one, and yes, it's about the size of a head), and it has begun delivering prototypes to customers, according to reports. As with all new chips, though, the big challenge will be scaling production to meet customer demand.
For Cerebras, the situation is a bit unusual. Since it places so much computing power on one wafer, customers don't necessarily need to buy dozens or hundreds of chips and stitch them together to create a compute cluster. Instead, they may need only a handful of Cerebras chips for their deep learning needs. The company's next major phase is to reach scale and ensure smooth delivery of its chips, which are packaged as a whole-system "appliance" that also includes the company's proprietary cooling technology.
Expect to hear more details about Cerebras' technology in the coming months, particularly as the battle over the future of deep learning workloads continues to heat up.