Skip to content
Tier A — Frontier
Runs in:Multi-regionMade in:United States
OpenRouter

Nous Hermes 3 70B

Tier A — Frontier · 131K tokens · 70B

Tokonomix Editorial Team·Reviewed by Mes Kalkan··

Nous Hermes 3 70B is a large language model developed by Nous Research and made available through the OpenRouter platform. Built on the Llama 3 architecture with 70 billion parameters, this model represents the third major iteration of the Hermes series. It features an extensive context window of 131,000 tokens, enabling it to process and maintain coherence across lengthy documents and extended conversations. The model is designed as a general-purpose assistant with particular strengths in function calling, structured output generation, and creative applications. Unlike many commercial models, Nous Hermes 3 70B is trained with minimal content filtering, allowing it to engage with a broader range of topics and scenarios. This "uncensored" approach makes it suitable for applications requiring nuanced handling of sensitive subjects, creative writing without artificial constraints, and roleplay scenarios where strict content boundaries may be limiting. Nous Hermes 3 70B sits in the middle tier of OpenRouter's model offerings in terms of capability and resource requirements. It balances strong performance across diverse tasks with reasonable computational demands, positioning it between smaller, faster models and larger flagship systems. The model's tool-use capabilities allow it to interact with external functions and APIs, making it practical for agentic workflows and applications requiring structured data extraction or multi-step reasoning processes.

Nous Hermes 3 70B occupies a practical sweet spot: large enough to handle complex reasoning and extended contexts, yet efficient enough for production deployment without enterprise-scale infrastructure.

Tokonomix model positioning analysis
Section 01

Speed analysis

Latency measured across all benchmark runs. P50 (median) and P95 (95th percentile) give a realistic picture of response speed under normal and peak load.

P50 latency (median)P95 latency66 runs
150105519592864376805-2406-09ms
Section 02

Pricing history

Direct provider rates per million tokens, plus a typical-conversation cost estimate.

💰
API rates — Nous Hermes 3 70B
$0.7000 per 1M input tokens
$0.7000 per 1M output tokens
≈ $0.0006 per typical conversation (800 tokens)
Input vs output price (per 1M tokens)
per 1M input tokens$0.7000
per 1M output tokens$0.7000

Pricing over time

Input & output per 1M tokens · step-line = price changes

$0.7000

input / 1M

— stable

$0.7000

output / 1M

— stable

2026-05-312026-06-072026-06-07
Input
Output
Price change
⟳ synced weekly
Section 03

Tokens per second

Throughput in tokens per second, derived from measured P50 latency. Higher is better; fluctuations track provider-side load.

Throughput (tokens / s)1000 / avg 981
1318301

Estimated from P50 latency × 200 output tokens — the absolute number depends on this assumption; the trend is what matters.

Section 04

Strengths & weaknesses

Drawn from benchmark results and aggregated community feedback on real use-cases.

Strengths

131K token context windowRobust function calling and tool useMinimal content filtering for creative tasksBalanced performance-to-resource ratioStrong structured output generationMulti-step reasoning and agentic workflowsCreative writing without artificial constraintsBuilt on proven Llama 3 architecture

Weaknesses

Text-only, no vision capabilitiesKnowledge cutoff limits current eventsSlower than sub-10B parameter modelsNot specialized for domain-specific tasks
Section 05

Capabilities

toolsroleplayuncensored
Section 06

Frequently asked questions

The model has minimal built-in content filtering, placing responsibility on developers to implement application-layer guardrails appropriate to their use case. This offers flexibility but requires thoughtful safety design for user-facing applications.

For teams seeking a capable general-purpose model without content guardrails or the overhead of 400B+ parameter systems, Hermes 3 70B delivers strong performance across technical and creative workloads alike.

Tokonomix editorial assessment
Section 07

Tokonomix benchmark verdicts

2026-06-07

Nous Hermes 3 70B maintains baseline performance with stable capabilities

Nous Hermes 3 70B continues to operate at its established baseline performance level with no significant changes detected in this benchmark window. The model retains its support for tools, roleplay, and uncensored interactions that were introduced in the previous period. While the model provides consistent functionality across these capability areas, no measurable improvements in performance metrics or expanded feature set have emerged. Users can expect the same level of service that characterized the initial release, with tool use integration and roleplay scenarios remaining functional but showing no advancement in sophistication or accuracy. The uncensored nature of responses continues as before. This stability may benefit users who have integrated the model into existing workflows and prefer predictable behavior, though those seeking performance gains or enhanced capabilities will need to look elsewhere. The model occupies a steady position in the 70B parameter class without distinguishing improvements or concerning regressions during this evaluation period.

Quality

Latency p50

Test runs

0

Stable baseline performance maintained No capability improvements detected
Section 08

Full model profile

Nous Hermes 3 70B — illustration 1
Nous Hermes 3 70B: The Open-Weight Model Built for Unconstrained Reasoning

When a developer reaches for Nous Hermes 3 70B, they are typically solving one of two problems: they need a model that will follow complex instructions without second-guessing every edge case, or they have hit the constraints of mainstream commercial APIs and need something more accommodating. Built on Meta's Llama 3.1 base and fine-tuned by Nous Research with an emphasis on instruction-following and reduced refusal behaviour, Hermes 3 sits in that productive middle ground between raw base models and the heavily safety-layered offerings from the big three providers.

This is a 70-billion-parameter model with a 131,000-token context window, positioned deliberately as an alternative to Claude or GPT-4 class models when your use case doesn't fit their editorial guidelines. It runs on OpenRouter and other aggregator platforms, making it accessible without self-hosting infrastructure while maintaining the philosophical advantages of open-weight architecture. The model carries tool-use capabilities, handles extended roleplaying scenarios, and operates with minimal content filtering, making it a pragmatic choice for developers building agents, creative applications, or systems that need to reason about sensitive subject matter without constant guardrail interference.

Training Story and Technical Foundation

Hermes 3 70B starts with Meta's Llama 3.1 70B base, which gives it a strong multilingual foundation and the architectural improvements that came with the 3.1 series: better long-context performance, improved instruction adherence, and more stable reasoning chains. Nous Research then applies targeted fine-tuning with a dataset emphasising high-quality instruction pairs, multi-turn dialogue, and examples that reward nuanced thinking over pattern-matched refusals.

The "uncensored" designation doesn't mean the model is reckless. It means Nous deliberately reduced the aggressive safety filters that cause commercial models to refuse benign requests when they pattern-match on surface-level keywords. If you are building a medical education tool that needs to discuss symptoms frankly, a legal research assistant that must reason about criminal statutes, or a creative writing tool that handles mature themes, Hermes 3 will generally engage with the task rather than deliver a boilerplate refusal. The model still understands context and can decline genuinely problematic requests, but it doesn't trip over false positives the way heavily post-trained models often do.

The 131k context window is a practical differentiator. While not the largest available, it comfortably handles full codebases, long-form documents, or extended conversation histories without the truncation headaches that come with smaller windows. For agent workflows where you need to maintain state across dozens of turns, or document analysis pipelines processing research papers, this breathing room matters.

Where Hermes 3 70B Excels

The model shines in three core scenarios. First, structured agentic workflows where tool use and multi-step reasoning are the backbone. Hermes 3 supports function calling natively, and its instruction-following is strong enough that you can build agents that chain multiple tool invocations reliably. If you are constructing a research assistant that needs to query databases, synthesise findings, then format output according to a strict schema, Hermes 3 will follow that choreography without the drift or hallucination that plagues smaller models.

Second, extended creative and roleplaying applications. The combination of a large context window and reduced content filtering makes this a go-to model for interactive fiction, game NPC dialogue systems, or creative writing assistants. The model can maintain character consistency across long conversations and will engage with narrative premises that might trigger refusals elsewhere. If you are building a Dungeon Master bot or a collaborative storytelling platform, Hermes 3 handles the tonal range and narrative complexity without falling back on sanitised responses.

Third, any domain where you need straightforward engagement with complex or sensitive material. If you are building compliance software that needs to reason about regulatory edge cases, a mental health support tool that must discuss difficult topics candidly, or a harm-reduction application, Hermes 3 will work with the material rather than deflecting. The model understands nuance and doesn't mistake mention for endorsement, which makes it viable for educational and support contexts where overly cautious filtering actively harms the user experience.

The tool-use implementation is solid. You can define functions with JSON schemas, and the model will invoke them appropriately within conversational flow. It is not quite as polished as the function-calling in GPT-4 or Claude, but for most production use cases—particularly if you are building internal tools or vertical SaaS features—it clears the bar. The model understands when to call a tool versus when to synthesise from existing context, which reduces spurious API hits.

Where It Doesn't Fit

Hermes 3 70B is not the right choice if you need state-of-the-art performance on highly specialised tasks where the big providers have invested heavily in post-training. For instance, advanced mathematical reasoning, formal logic proofs, or the kind of deep code comprehension required for security audits—these are areas where Claude or GPT-4 variants will outperform. The base Llama architecture is capable, but the additional fine-tuning that Anthropic and OpenAI apply for these narrow domains adds up.

The model also doesn't match GPT-4 or Claude in conversational polish when you need consumer-facing interaction. If you are building a customer support bot where tone, empathy, and brand voice consistency are critical, the extra refinement in commercial models shows. Hermes 3 is direct and functional, which is excellent for developer-facing tools or internal workflows, but it doesn't have the same smooth conversational veneer for end-user chat applications.

Latency-sensitive applications may find the 70B parameter size a constraint. While OpenRouter and similar aggregators provide decent throughput, this is still a large model, and if you need sub-second response times for high-concurrency user-facing features, you might hit bottlenecks. Smaller models or distilled versions of commercial offerings will serve you better in those contexts.

Finally, if your use case requires the absolute highest level of factual accuracy and up-to-date knowledge, the model's training cutoff and the open-weight ecosystem's slower iteration cycles mean you will be behind the frontier. Commercial providers update their models more frequently and integrate retrieval-augmented generation features more tightly. If you are building a news summarisation tool or a product that must reflect current events, you will need to supplement with external knowledge pipelines.

Comparison to Peer Models

Within the open-weight 70B class, Hermes 3 competes primarily with other fine-tuned Llama derivatives. Compared to base Llama 3.1 70B, Hermes 3 offers meaningfully better instruction-following and reduced refusal rates without sacrificing general capability. If you tried Llama 3.1 directly and found it too cautious or inconsistent on edge cases, Hermes 3 is the next logical step.

Against other Nous models, Hermes 3 represents the current production-ready iteration. Earlier Hermes versions were built on Llama 2 and had narrower context windows. If you used those and found them useful but limiting, Hermes 3 is a straightforward upgrade with better reasoning and more headroom.

When compared to commercial models, the trade-offs become clearer. Claude Sonnet offers more polish, better long-context retrieval, and stronger safety guarantees if your compliance requirements demand auditable filtering. GPT-4 Turbo or GPT-4o brings faster iteration, tighter ecosystem integrations, and better performance on specialised reasoning tasks. But both come with editorial constraints that make certain applications difficult or impossible. If your feature set includes creative tools, harm-reduction content, legal or medical education, or agent workflows that need to reason about sensitive domains, Hermes 3 offers a path that simply doesn't exist with the big providers.

The cost positioning also matters. Hermes 3 sits in the low tier for 70B-class models, making it accessible for prototyping and for production use cases with moderate traffic. You are not going to build a high-volume consumer chatbot on this, but for internal tooling, vertical SaaS features, or developer-facing products, the economics work.

Cost and Availability

Hermes 3 70B is available through OpenRouter and other aggregator platforms, which handle the infrastructure and scaling so you don't need to spin up your own GPU clusters. This deployment model strikes a useful middle ground: you get the flexibility and policy advantages of an open-weight model without the operational burden of self-hosting a 70B parameter beast.

The pricing is positioned competitively within the aggregator ecosystem. It is meaningfully cheaper than running equivalent commercial models at this scale, though not as cheap as smaller distilled alternatives. For teams building features that need the reasoning depth of a large model but don't require the absolute frontier performance of GPT-4 or Claude, this price band makes sense.

One consideration is that aggregator availability can fluctuate based on provider capacity. OpenRouter pools multiple backend providers for each model, which generally keeps uptime high, but it is not the same as the SLA you would get from a direct commercial API. For mission-critical production systems where downtime is costly, you might want to run your own instance or maintain fallback routes to commercial models.

Self-hosting is an option if you have the infrastructure appetite. The model weights are open, so you can deploy on your own hardware or rent dedicated GPU capacity from cloud providers. This makes sense if you have particularly high throughput needs, strict data residency requirements, or want to further fine-tune the model for your domain. But for most teams, the aggregator route is the pragmatic choice—it gets you to production faster and lets you scale without managing infrastructure.

Our Verdict

Hermes 3 70B occupies a valuable niche in the production model landscape. It is not trying to beat GPT-4 at every benchmark or replace Claude in customer-facing chat. Instead, it offers a capable, large-context model with minimal editorial friction, available at a cost point that makes sense for a wide range of applications that don't fit cleanly into the big-three paradigm.

If you are building agent systems, creative tools, or applications in domains where content policies create friction, this model deserves evaluation. It brings enough reasoning capability for complex workflows, enough context for long-form tasks, and enough flexibility to engage with the material your application actually needs to handle. The tool-use support is solid, the instruction-following is reliable, and the deployment model through aggregators keeps operational complexity low.

The trade-offs are clear: you sacrifice some polish, some specialised performance, and the tight ecosystem integrations that come with commercial APIs. But in exchange, you gain control, cost efficiency, and the ability to build features that would be rejected or hobbled by mainstream providers. For many production teams—particularly those in creative, educational, legal, or health-adjacent domains—that is a trade worth making.

Hermes 3 70B is not a flagship model. It is a workhorse. It shows up, does the job, and doesn't get in your way. For a large segment of real-world development problems, that is exactly what you need.

Nous Hermes 3 70B — illustration 2Nous Hermes 3 70B — illustration 3
Last automated test
Jun 9, 2026 · 20:02 UTC · Speed benchmark
P50 latency
200 ms
P95 latency
216 ms
Errors
0 / 6 runs
Last reviewed by Tokonomix Team·May 24, 2026