Start-up of the Week: Groq to challenge Nvidia’s AI dominance?

Aiming to challenge AI chip maker Nvidia’s dominance of the tech industry, semiconductor start-up Groq has raised a fresh USD 750 million in funding at a post-money valuation of USD 6.9 billion. According to PitchBook’s estimates, Groq has raised over USD 3 billion to date.

Groq’s chips are not GPUs, the processors that typically power AI systems. Instead, Groq calls them LPUs (language processing units) and brands its hardware as an inference engine: a specialised computer optimised for running AI models quickly and efficiently.

Groq’s LPU-based systems, built for real-time inference, have been in high demand. The company says they deliver the lowest cost per token without compromising speed or quality, making large-scale AI viable for governments, developers, and enterprises worldwide.

The start-up is building and operationalising data centres within weeks, bringing AI closer to users and giving partners more control over where and how inference runs. Groq’s vision is simple: local infrastructure means lower latency, stronger data governance, and faster response times at scale.

Infrastructure For Inference

There has been a massive shift in the AI domain: industry players are increasingly focused on deploying and running trained models, a stage known as inference, rather than on training them. Recognising this reality, Groq is investing aggressively in fast AI inference, delivered through the cloud and on-premise AI compute centres, to give developers and enterprises an instant intelligence experience.

Groq’s technology can be accessed by anyone via GroqCloud, while enterprises and partners can choose between cloud and on-premise AI compute centre deployments. The start-up aims to deploy millions of LPUs in the near future, broadening access to AI compute worldwide. The first-generation LPU is already commercially available, and Groq says more innovations are on the way.

Groq’s on-premise hardware is essentially a server rack outfitted with a stack of its integrated hardware/software nodes. Both the cloud and on-premise systems run open versions of popular models, such as those from Meta, DeepSeek, Qwen, Mistral, Google, and OpenAI. Groq claims its offerings maintain, and in some cases improve, AI performance at significantly lower cost than alternatives.

Groq’s founder, Jonathan Ross, previously worked at Google, where he helped develop the Tensor Processing Unit (TPU), a specialised processor designed for machine-learning tasks. The TPU was announced in 2016, the same year Groq emerged from stealth. TPUs still power Google Cloud’s AI services. Groq claims its technology powers the AI apps of more than two million developers.

“Our custom LPU is built for this phase—developed in the US with a resilient supply chain for consistent performance at scale. It powers GroqCloud, a full-stack platform for fast, affordable, production-ready inference. Groq provides the lowest cost per token, even as usage grows, without sacrificing speed, quality, or control,” the start-up remarked.

Groq claims sub-millisecond latency that remains consistent across traffic, regions, and workloads, providing speed at any scale. Its architecture is designed to preserve model quality at every size, from compact voice models to large-scale mixture-of-experts (MoE) models, consistently and at production scale.

Meet The Products

Groq’s custom LPU is built for inference and developed in the United States with a resilient supply chain for consistent performance at scale. The LPU powers both GroqCloud, a full-stack platform for fast, affordable, production-ready inference, and GroqRack Compute Clusters, aimed at enterprises that need on-premise deployments in their own cloud or AI compute centre.

GroqCloud is available as public, private, and co-cloud instances, and Groq says it redefines real-time performance and unlocks a new set of use cases by running business AI applications instantly. More than one million developers are already building on the platform, which is designed to meet businesses’ needs as they migrate from other AI providers such as OpenAI.

GroqCloud is an “Agentic Ready” full-stack platform that integrates tools, leverages real-time streaming, and connects to external sources to give agents enhanced intelligence. It can also turn natural language into actionable API calls and build dynamic, real-time workflows. Developers can build applications against the Groq API in the language of their choice, with support for curl, JavaScript, Python, and JSON, and can draw on industry-leading frameworks and integrations such as LangChain, LlamaIndex, CrewAI, and the Vercel AI SDK. They can also design context-aware apps and real-time streamed UIs for dynamic, responsive applications that adapt to user needs. A minimal example of what such an API call looks like is sketched below.
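To make the developer workflow concrete, here is a minimal sketch of a Groq API call from Python, assuming the official `groq` client package (`pip install groq`); the model ID used here is a placeholder, so check Groq’s documentation for current model names:

```python
# Minimal sketch of a chat-completion request against the Groq API.
# Assumes the `groq` Python package and a GROQ_API_KEY environment variable.
import os

from groq import Groq

# The client reads GROQ_API_KEY from the environment by default;
# it is passed explicitly here for clarity.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

# The API mirrors the widely used chat-completions interface,
# which eases migration from other AI providers.
completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # placeholder model ID, check Groq's docs
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise what an LPU is in one sentence."},
    ],
    stream=False,  # set True to stream tokens for real-time UIs
)

print(completion.choices[0].message.content)
```

Because the interface follows the familiar chat-completions pattern, moving an existing application onto GroqCloud is largely a matter of swapping the client and model name.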

GroqCloud also provides a “No-code Developer Playground” that lets app developers explore the Groq API and featured models on the GroqCloud Developer Console without writing a single line of code. Developers do not need to pay large upfront costs to start generating tokens: Groq’s on-demand tokens-as-a-service model is simple, and users pay as they go for the tokens they consume. A hypothetical billing calculation is sketched below.
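As a hypothetical illustration of how usage-based token billing works (the rates below are invented for the example and are not Groq’s actual prices), the bill is simply the tokens consumed multiplied by the per-token rate:

```python
# Hypothetical pay-as-you-go token billing; rates are made up for illustration.
input_tokens = 1_200_000   # tokens sent to the model over the billing period
output_tokens = 300_000    # tokens generated by the model

price_per_m_input = 0.50   # hypothetical USD per million input tokens
price_per_m_output = 0.80  # hypothetical USD per million output tokens

cost = (
    (input_tokens / 1_000_000) * price_per_m_input
    + (output_tokens / 1_000_000) * price_per_m_output
)
print(f"Bill for the period: ${cost:.2f}")  # -> Bill for the period: $0.84
```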

Next is the GroqRack Compute Cluster, which helps developers and businesses extend their own cloud or AI compute centre with on-premise deployments. The start-up also offers its Groq LPU AI inference technology in various interconnected rack configurations to match clients’ preferred model sizes.

With no exotic cooling or power requirements, deploying Groq systems requires no major overhaul of existing data centre infrastructure. Because routing is built into the LPU, both dollars and data centre space stay focused on the main purpose, compute, so business leaders need not worry about the high overhead costs of networking infrastructure. Groq says it currently has supply available and is ready to help tech businesses get up and running quickly with LPU AI inference technology designed and manufactured in North America.

Groq In Recent News

In the first half of 2025, Groq entered into a partnership with Bell Canada to power Bell AI Fabric, the country’s largest sovereign AI infrastructure project. Bell AI Fabric will establish a national AI network across six sites, targeting 500 MW of clean, hydro-powered compute. The project launched with a 7 MW Groq facility in Kamloops, British Columbia, which came online in June. In May, Groq also brought new data centres online in Houston and Dallas, pushing total global network capacity to over 20 million tokens per second.

Groq has also teamed up with tech giant Meta to deliver fast inference for the official Llama API, which the companies say gives developers the fastest, most cost-effective way to run the latest Llama models. The Groq-accelerated Llama 4 API runs on the Groq LPU, which Groq calls the world’s most efficient inference chip. Developers will be able to run Llama models with no trade-offs: low cost, fast responses, predictable low latency, and reliable scaling for production workloads.
