AI Chip Dark Horse, Can It "Crush" NVIDIA?

Recently, the "fastest large model in history" has exploded in popularity. An overseas AI chip startup company Groq utilized its self-developed LPU (Language Processing Unit) as an inference chip, enabling the large model to generate text at a rate close to 500 tokens per second (a token being the smallest unit of text), which crushes the speed of GPT-3.5 at 40 tokens per second. This means that the time it takes for a large model to process a request and obtain a response is significantly reduced, with some netizens exclaiming "it replies faster than I can blink"; some believe Groq's LPU could potentially replace NVIDIA's GPU chips; and some media claim that Groq's LPU has "crushed" NVIDIA. However, subsequent industry experts have questioned the cost-effectiveness and competitive strength of Groq's LPU, denying its potential to impact NVIDIA. Calculations have shown that the hardware cost of Groq LPU is about 40 times that of NVIDIA's H100 GPU, and its energy consumption cost is about 10 times higher. Groq has been committed to disrupting traditional architectures like GPUs and CPUs for many years. Groq's official website explains that the LPU represents a Language Processing Unit, a new type of end-to-end processing unit system, which provides the fastest inference for computationally intensive applications with sequential components (such as large language models LLM). Simplifying LPU Architecture As for why the LPU is much faster than GPUs when used for LLMs and generative AI, Groq's official website explains that the LPU is designed to overcome two bottlenecks of LLMs: computational density and memory bandwidth. For LLMs, the computing capacity of the LPU is greater than that of GPUs and CPUs, reducing the time required to compute each word, allowing for faster generation of text sequences. Additionally, by eliminating external memory bottlenecks, LPU inference engines can deliver performance several orders of magnitude higher than GPUs on LLMs.

Groq was founded in 2016, and as early as 2021 it was already being called the "strongest challenger to NVIDIA." That year, Groq raised $300 million in a round led by well-known investors including Tiger Global Management and D1 Capital, bringing its total funding to $367 million.

In August 2023, Groq introduced the Groq LPU, which can run a 70-billion-parameter enterprise-grade language model at a record speed of more than 100 tokens per second. Groq estimates it holds a 10x to 100x speed advantage over other systems.

Groq's founder and CEO Jonathan Ross once said: "Artificial intelligence is constrained by existing systems, many of which newcomers are simply following or incrementally improving. No matter how much money you throw at the problem, traditional architectures such as GPUs and CPUs struggle to keep up with the rapidly growing demands of artificial intelligence and machine learning... Our mission is more disruptive: Groq seeks to unlock the potential of artificial intelligence by driving the cost of compute toward zero."

Experts Question the Cost-Effectiveness and Competitiveness of Groq LPU

Associate Professor He Hu of the School of Integrated Circuits at Tsinghua University stated that LPU chips fall into the inference category and do not compete in the same arena as the currently in-demand GPUs, which are primarily used for training large models. Among inference chips, the LPU may achieve relatively high performance, but its operating costs are not low. High-performance, low-cost inference chips could lower inference costs and broaden the range of applications for large AI models; their market prospects depend largely on how the market's inference needs play out, rather than on technical competition alone.

As the names suggest, training chips are mainly used for training large models, while inference chips are mainly used for AI applications. The industry believes that as various sectors embrace vertical large models and AI large-model applications gradually take shape, computing power for inference will receive as much attention as computing power for training.

However, even for inference, some experts have calculated, based on the memory capacity of LPUs and GPUs and their throughput when running large models, that the LPU cannot compete with NVIDIA's GPUs on either cost-effectiveness or energy efficiency.

Jia Yangqing, a former AI scientist at Facebook and former Vice President of Technology at Alibaba, analyzed on an overseas social media platform that the Groq LPU has very little memory per card (230 MB). A simple calculation shows that running a 70-billion-parameter model would require 305 Groq cards, whereas the equivalent setup uses 8 NVIDIA H100 cards. At current prices, this means that for the same throughput, the hardware cost of the Groq LPU is about 40 times that of the H100, and the energy cost about 10 times as much.
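For readers who want to retrace the arithmetic, the following is a minimal sketch of the capacity-based card count. It assumes INT8 weights and the per-card memory sizes cited in the article; it is an illustration, not a reproduction of Jia Yangqing's exact workings.

```python
import math

# A minimal sketch of the memory-capacity arithmetic behind this comparison.
# All figures are rough assumptions for illustration, not vendor-verified specs.

PARAMS = 70e9                  # 70B-parameter model
BYTES_PER_PARAM = 1            # assume INT8 weights; FP16 would double this
weight_bytes = PARAMS * BYTES_PER_PARAM        # ~70 GB of weights

GROQ_SRAM_PER_CARD = 230e6     # 230 MB of on-chip SRAM per LPU card (per the article)
H100_HBM_PER_CARD = 80e9       # 80 GB of HBM per H100 card

groq_cards = math.ceil(weight_bytes / GROQ_SRAM_PER_CARD)             # -> 305
h100_cards_by_capacity = math.ceil(weight_bytes / H100_HBM_PER_CARD)  # -> 1

print(f"LPU cards needed just to hold the weights:  {groq_cards}")
print(f"H100 cards needed just to hold the weights: {h100_cards_by_capacity}")
# The article's comparison uses a standard 8x H100 server as the baseline for
# comparable throughput; plugging in per-card prices and power draw then
# yields the ~40x hardware-cost and ~10x energy-cost ratios quoted above.
```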

A leader at a top domestic AI chip company agrees with the above calculation. He believes that because the LPU uses SRAM (Static Random-Access Memory) rather than the HBM (High Bandwidth Memory) used by GPUs, a large number of cards is required to run a large model.

Tencent chip expert Yao Jinxin put it even more bluntly: "NVIDIA's overwhelming lead in this wave of AI has the whole world eager for challengers, so every eye-catching article tends to be believed at first. That is not the only reason, though: there are also 'tricks' in how the comparisons are made, deliberately ignoring other factors and comparing along a single dimension."