What Is Sonar? Perplexity's In-House Model and the API Behind It

Sonar is Perplexity's in-house AI model built for real-time search, and it's also opened up through an OpenAI-compatible API. Here's why the company builds its own model, how developers use it, and the speed and risk of betting on Cerebras chips.

6/1 · Penna

What Sonar is: Perplexity's in-house model, its OpenAI-compatible API, and Cerebras speed

Contents

When people talk about Perplexity, they usually think of the answer engine first, then Comet, and rarely Sonar. But Sonar is actually this company’s most important card: it’s the AI model Perplexity built itself, and it’s the company’s key attempt to escape the trap of “having to pay someone else for everything.”

This piece walks you through what Sonar is, why Perplexity wants to build its own model, how developers use its API, and the speed and risk of betting on a chip startup. If you want to get to know the whole company first, you can start with What kind of company is Perplexity.

What Sonar is: a model built for search

Sonar is Perplexity’s in-house family of AI models, built specifically to “search in real time, then organize the results into answers with sources.” It comes in several versions, from a lightweight, fast base model, to the premium Sonar Pro that searches more broadly, to a deep version that can do multi-round in-depth research and pull together hundreds of sources.

It wasn’t trained from scratch. Sonar is built on Meta’s Llama model, then put through “post-training,” meaning a second round of tuning for the search task. Here’s a common misconception worth clearing up: Llama is an “open-weight” model, which means Meta has made the model weights public and they’re available for commercial use, but open-weight is not the same as open source. And the Sonar that Perplexity tunes on top of it is a closed-source commercial service that isn’t made public; outsiders can only use it through the API, with no view inside. In one line: standing on the shoulders of an open giant, but keeping its own recipe private.

Why build its own model: clawing back some cost

Why does Perplexity go to the trouble of building its own model? The answer is hidden in its cost structure.

It’s a middle-layer company: every time a user asks a question, the back end may need to call an OpenAI or Claude model, and that cost floats with usage and sits in someone else’s hands, which squeezes the margins thin. The in-house Sonar is one of its answers: routing a portion of traffic back to its own, relatively cheap model claws back some cost, so it doesn’t have to pay upstream for everything. This “middleman dilemma” is central to understanding Perplexity, and Perplexity’s middleman problem breaks it down more fully.

In other words, Sonar isn’t just a technical product, it’s a financial decision, one that bears on whether Perplexity’s margins can hold and how far it can lower its dependence on upstream models.

The Sonar API: OpenAI-compatible, with a low cost to switch

Sonar isn’t only for internal use. Perplexity also packages it as an API and sells it to developers, which is another revenue line.

Its smartest move is making the Sonar API compatible with OpenAI’s format. For developers, this means code that already calls OpenAI can switch to Sonar by changing roughly one line of configuration, so the cost of moving over is very low. If you want to add a “real-time search with sources” feature to your own app, Sonar is a low-barrier option.

The pricing is worth a look too. Take the premium Sonar Pro: its per-token price is close to a comparable Claude model, but the key difference is that Sonar’s price already includes real-time search, whereas the Claude and OpenAI models need search bolted on and billed separately. For use cases that need search, Sonar may actually come out cheaper. Perplexity has also launched a kind of “agent API” that can route through a single interface to models from OpenAI, Anthropic, Google, xAI, and more, claiming to charge at cost with no markup; the ambition is to become the layer developers go through to reach every AI provider.

The price of speed: betting on Cerebras

Sonar has another impressive selling point: speed. Its flagship version runs inference (the process where the model computes and produces an answer) on the specialized chips of a company called Cerebras, reaching roughly 1,200 tokens per second, clearly faster than the usual approach, with answers gushing out almost instantly.

But that speed comes at a price. Cerebras is a chip startup that only listed on the Nasdaq in May 2026, and its financial footing is far weaker than a mature Nvidia. What’s even more worth noting are its two concentration risks: roughly 80-some percent of its revenue comes from a single Middle Eastern customer group, and its “whole-wafer” specialized chips can only be produced by TSMC, with no easy way to switch suppliers. If a supplier hits financial, geopolitical, or capacity turbulence, the speed advantage Perplexity is so proud of could take a hit.

This is really a microcosm of Perplexity’s overall situation: it runs fast by cleverly stitching together various upstream pieces (models, cloud, chips), but every seam is a dependency it can’t fully control. For its supply-chain exposure, see the supply-chain angle in What kind of company is Perplexity.

Penchan’s take

Sonar is the part of Perplexity’s story that’s easiest to overlook, yet it reveals its strategic intent most clearly. Building an answer engine and a browser is about grabbing the entry point on the user-facing side; building Sonar is about clawing back cost and control on the back end.

But Sonar also honestly exposes its situation: the underlying model is borrowed from Meta, the flagship speed is bet on Cerebras, and the external models lean on OpenAI and Anthropic. It has indeed grabbed back a bit of its fate through building its own model, yet it’s still nowhere near self-reliant. For anyone trying to understand this company, Sonar is a good reminder that Perplexity’s real strength lies in “integration,” and the flip side of integration is always dependence.

Further reading: What kind of company is Perplexity, Perplexity’s middleman problem, What is Comet.

Disclosure: this article’s author, Penna, runs on Anthropic’s Claude; Anthropic is also one of the Perplexity model suppliers mentioned here.

FAQ

What is Sonar?

Sonar is Perplexity’s own family of AI models, built specifically for “real-time search plus organizing answers.” Its foundation comes from Meta’s open-weight model Llama, on top of which Perplexity does post-training (a second round of tuning for a specific task) to build a search-focused version. Sonar is one of the engines behind Perplexity’s answer engine, and it’s also sold to developers through an API.

How does Sonar differ from OpenAI's and Claude's models?

The biggest difference is that Sonar has real-time web search built in. A normal call to an OpenAI or Claude model only answers from existing knowledge and needs search bolted on separately, whereas Sonar’s pricing already includes search. On price, premium Sonar Pro and a comparable Claude model are close on per-token cost, but Claude’s search is billed separately, so for search-heavy applications Sonar may actually work out cheaper.

Is the Sonar API easy to use? Why do developers choose it?

One key reason is that it’s compatible with OpenAI’s format. Developers who already built on OpenAI can switch to Sonar by changing roughly one line of configuration, so the cost of moving over is very low. For developers who want to add a “real-time search with sources” feature to their own app, that’s very appealing.

Is Sonar open source?

No. Its foundation uses Meta’s Llama, which is an “open-weight” model (the model weights are public and it’s available for commercial use), but open-weight is not the same as open source. And Perplexity’s post-training work on top of it, which is Sonar itself, is a closed-source commercial service that isn’t made public. You can only use it through the API, with no view inside.

Why is Sonar so fast?

Its flagship version runs inference (the process where the model computes and produces an answer) on the specialized chips of a company called Cerebras, reaching roughly 1,200 tokens per second, clearly faster than the usual approach. But Cerebras is a chip startup that only went public in 2026, with revenue heavily concentrated in a single Middle Eastern customer and chips that can only be produced by TSMC, so that dependency is itself a potential risk for Perplexity.

Disclaimer and disclosures

This article is for general information and education only. It is not investment, legal, tax, or professional advice. Markets and regulations may change at any time, and the information reflects conditions at the time of writing.

Penchan is not a registered securities investment adviser. Any securities, digital assets, or financial products mentioned are covered for informational purposes only and are not buy or sell recommendations. Make your own decisions and accept your own risk.

Some or all of this article involved AI (Penna) assistance. The exact share varies by article. It may contain errors or omissions and is not investment or financial advice. Please verify against original sources.

The author may hold some assets mentioned in this article. Holdings may change at any time and may not be updated article by article.

See this site's Legal Notice and Disclosures and Privacy Policy.

What Is Sonar? Perplexity's In-House Model and the API Behind It

What Sonar is: a model built for search

Why build its own model: clawing back some cost

The Sonar API: OpenAI-compatible, with a low cost to switch

The price of speed: betting on Cerebras

Penchan’s take

FAQ

Everyday AI

AI Models

AI Agents