| _______ __ _______ | |
| | | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----. | |
| | || _ || __|| < | -__|| _| | || -__|| | | ||__ --| | |
| |___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____| | |
| on Gopher (unofficial) | |
| Visit Hacker News on the Web | |
| COMMENT PAGE FOR: | |
| Nvidia Nemotron 3 Family of Models | |
| thoughtpeddler wrote 10 hours 12 min ago: | |
| Is it fair to view this release as Nvidia strategically flexing that | |
| they can compete with their own customers in the model layer -- that | |
| they can be as vertically integrated as, say, GDM? | |
| omneity wrote 13 hours 24 min ago: | |
| Nemotron now works on LM Studio if you update the runtime (from the | |
| settings > Runtime screen). | |
| The default chat template is incorrect, though, and will fail, but I | |
| published a corrected one that you can replace it with: | |
| [1]: https://gist.github.com/omarkamali/a594b6cb07347f501babed48989... | |
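| A quick way to sanity-check a chat template is to render it with the | |
| Hugging Face tokenizer and diff the result against what LM Studio | |
| sends. A minimal Python sketch; the repo id is a placeholder, take the | |
| exact one from the model's HF page: | |
|   from transformers import AutoTokenizer | |
|   # Placeholder id: substitute the exact Nemotron 3 Nano HF repo. | |
|   tok = AutoTokenizer.from_pretrained("nvidia/<nemotron-3-nano-repo>") | |
|   messages = [{"role": "user", "content": "Hello!"}] | |
|   # tokenize=False returns the raw prompt string the template renders, | |
|   # which can be compared against the app's bundled template output. | |
|   print(tok.apply_chat_template(messages, tokenize=False, | |
|                                 add_generation_prompt=True)) | |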
| Tepix wrote 20 hours 4 min ago: | |
| Is it just me or is Nvidia trolling hard by calling a model with 30b | |
| parameters "nano"? With a bit of context, it doesn't even fit on a RTX | |
| 5090. | |
| Other LLMs with the "nano" moniker are around 1b parameters or less. | |
| patpatpat wrote 5 hours 17 min ago: | |
| FWIW, it runs just fine on my 16 GB AMD 9060 XT without any tweaks. | |
| It's very usable. | |
| I asked it to write a prime sieve in C#; it started responding in 0.38 | |
| seconds and wrote an implementation at 20 tokens/sec. | |
| genpfault wrote 1 hour 3 min ago: | |
| Getting ~150 tok/s on an empty context with a 24 GB 7900 XTX via | |
| llama.cpp's Vulkan backend. | |
| jonrosner wrote 20 hours 53 min ago: | |
| After testing it for a little while, I am pretty disappointed. While I | |
| do get 90 tokens per second out of it on my M4 Pro, which is more than | |
| enough for a real-world use case, the quality is just not there. I gave | |
| it a codebase to analyze and some questions to answer, and it | |
| started hallucinating right away. No replacement for a "real" coding | |
| agent - maybe for other agentic work like sorting emails though. | |
| dJLcnYfsE3 wrote 20 hours 53 min ago: | |
| I would say it is weird that Nvidia competes with its own customers, but | |
| looking back at the "Founders Edition" cards, maybe it isn't that weird | |
| at all. The better question probably is: with every big corporation | |
| having its own LLM, what exactly is OpenAI's moat that would explain | |
| their valuation? | |
| lukeinator42 wrote 10 hours 52 min ago: | |
| I wonder if they also want to create more of a market for their | |
| products such as the DGX Spark. | |
| notyourwork wrote 18 hours 29 min ago: | |
| They and Tesla know something no one else does. | |
| beng-nl wrote 12 hours 40 min ago: | |
| Can you tell us more? I'm curious to hear what is behind this | |
| implication. | |
| leobg wrote 11 hours 23 min ago: | |
| A guess: | |
| They both believe the product people focus on will commoditize. | |
| Tesla realized early that EVs without autonomy are a dead end for | |
| long-term dominance, just as NVIDIA believes models without | |
| infrastructure are a dead end for durable AI profits. | |
| (Am I close?) | |
| radarsat1 wrote 21 hours 7 min ago: | |
| I find it really interesting that it uses a hybrid of Mamba and | |
| Transformer layers. Is it the only significant model right now using (at | |
| least partially) SSM layers? This must contribute to lower VRAM | |
| requirements right? Does it impact how KV caching works? | |
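| (A rough back-of-the-envelope, with made-up hyperparameters, of why the | |
| hybrid helps: attention layers keep a KV cache that grows with context, | |
| while an SSM layer keeps a fixed-size recurrent state, so only the | |
| remaining attention layers pay the per-token cache cost.) | |
|   # Made-up hyperparameters, NOT Nemotron 3's real config. | |
|   bytes_per = 2              # bf16 | |
|   n_attn, n_ssm = 6, 46      # hypothetical hybrid layer split | |
|   kv_heads, head_dim = 8, 128 | |
|   seq_len = 1_000_000        # a 1M-token context | |
|   # Attention layers cache K and V per token; SSM layers do not. | |
|   kv = 2 * n_attn * kv_heads * head_dim * seq_len * bytes_per | |
|   full = 2 * (n_attn + n_ssm) * kv_heads * head_dim * seq_len * bytes_per | |
|   print(f"hybrid KV cache: {kv / 1e9:.0f} GB")    # ~25 GB | |
|   print(f"all-attention:   {full / 1e9:.0f} GB")  # ~213 GB | |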
| ofermend wrote 1 day ago: | |
| We just evaluated Nemotron-3 for Vectara's hallucination leaderboard. | |
| It scores a 9.6% hallucination rate, similar to | |
| qwen3-next-80b-a3b-thinking (9.3%), but of course it is much smaller. | |
| [1]: https://github.com/vectara/hallucination-leaderboard | |
| DoctorOetker wrote 1 day ago: | |
| Can it understand input in, and generate output for, different languages' | |
| tokens? Does it know narrow IPA transcription of sentences in arbitrary | |
| languages? | |
| sosodev wrote 1 day ago: | |
| The claim that a small, fast, and decently accurate model makes a good | |
| foundation for agentic workloads seems reasonable. | |
| However, is cost the biggest limiting factor for agent adoption at this | |
| point? I would suspect that the much harder part is just creating an | |
| agent that yields meaningful results. | |
| ineedasername wrote 1 day ago: | |
| No, I really don't think cost is the limiting factor; it's tooling and a | |
| competent workforce to implement it. Every company of any substantial | |
| size, or near enough, is trying to implement and hire for those roles. | |
| The number of people familiar with the specific tooling, plus the lack | |
| of maturity in the tooling increasing the learning curve, are the | |
| bottlenecks. | |
| all2 wrote 1 day ago: | |
| This has been my major concern, so much so that I'm going to be | |
| launching a tool to handle this specific task: agent conception and | |
| testing. There is so little visibility in the tools I've used that | |
| debugging is just a game of whack-a-mole. | |
| sosodev wrote 13 hours 58 min ago: | |
| Did you see this HN submission? [1] It seems similar to what you're | |
| describing. | |
| [1]: https://news.ycombinator.com/item?id=46242838 | |
| all2 wrote 9 hours 56 min ago: | |
| I did not. Thanks for the heads up! | |
| kristopolous wrote 1 day ago: | |
| I was just using the embeddings model last night. Boy is it slow. Nice | |
| results but this 5090 isn't cutting it. | |
| I'm guessing there's some sophistication in the instrumentation I'm | |
| just not up to date with. | |
| sosodev wrote 1 day ago: | |
| I love how detailed and transparent the data set statistics are on the | |
| huggingface pages. [1] I've noticed that open models have made huge | |
| efficiency gains in the past several months. Some amount of that is | |
| explainable as architectural improvements but it seems quite obvious | |
| that a huge portion of the gains come from the heavy use of synthetic | |
| training data. | |
| In this case roughly 33% of the training tokens are synthetically | |
| generated by a mix of other open weight models. I wonder if this trend | |
| is sustainable or if it might lead to model collapse as some have | |
| predicted. I suspect that the proliferation of synthetic data | |
| throughout open weight models has led to a lot of the ChatGPT writing | |
| style replication (many bullet points, em dashes, it's not X but | |
| actually Y, etc). | |
| [1]: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-F... | |
| jtbayly wrote 1 day ago: | |
| Any chance of running this nano model on my Mac? | |
| keyle wrote 1 day ago: | |
| LM Studio and 32+ GB of RAM. [1] Simplest to just install it from the | |
| app. | |
| [1]: https://lmstudio.ai/models/nemotron-3 | |
| jonrosner wrote 1 day ago: | |
| running it on my M4 @ 90tps, takes 18GB of RAM. | |
| Tepix wrote 20 hours 0 min ago: | |
| If it uses 18GB of RAM, you're not using the official model | |
| (released in BF16 and FP8), but a quantization of unknown quality. | |
| When you write "M4", do you mean the base M4, not an M4 Pro or M4 Max? | |
| pylotlight wrote 22 hours 8 min ago: | |
| M2 Max @ 17tps btw | |
| mark_l_watson wrote 1 day ago: | |
| I used Nemotron 3 Nano in LM Studio yesterday on my 32 GB M2 Pro Mac | |
| mini. It is fast and passed all of my personal tool use tests, and | |
| did a good job analyzing code. Love it. | |
| Today I ran a few simple cases on Ollama, but not much real testing. | |
| axoltl wrote 1 day ago: | |
| There are MLX versions of the model, so yes. LM Studio hasn't updated | |
| their mlx-lm runtime yet, though, so you'll get an exception. | |
| But if you're OK running it without a UI wrapper, mlx_lm==0.30.0 will | |
| serve you fine. | |
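| For the no-UI route, a minimal mlx_lm sketch (assumes mlx-lm >= 0.30.0 | |
| and an MLX conversion of the model; the repo id is a placeholder): | |
|   from mlx_lm import load, generate | |
|   # Placeholder repo id: use whichever MLX conversion you pulled. | |
|   model, tokenizer = load("mlx-community/<nemotron-3-nano-mlx>") | |
|   prompt = tokenizer.apply_chat_template( | |
|       [{"role": "user", "content": "Explain MoE routing briefly."}], | |
|       tokenize=False, add_generation_prompt=True, | |
|   ) | |
|   print(generate(model, tokenizer, prompt=prompt, max_tokens=200)) | |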
| anon373839 wrote 1 day ago: | |
| Looks like LM Studio just updated the MLX runtime, so there's | |
| compatibility now. | |
| axoltl wrote 12 hours 14 min ago: | |
| Yep! 60t/s on the 8 bit MLX on an M4 Pro with 64GB of RAM. | |
| netghost wrote 1 day ago: | |
| Kind of depends on your Mac, but if it's a relatively recent Apple | |
| Silicon model... maybe, probably? | |
| > Nemotron 3 Nano is a 3.2B active (3.6B with embeddings) 31.6B total | |
| parameter model. | |
| So I don't know the exact math once you have an MoE, but 3.2B would run | |
| on most anything; at 31.6B total you're looking at needing a pretty | |
| large amount of RAM. | |
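| The rough arithmetic: active parameters set per-token compute, but all | |
| of the weights must sit in memory. A quick sketch: | |
|   # All 31.6B weights must be resident even though only ~3.2B are | |
|   # active per token; active params buy speed, not lower RAM. | |
|   total = 31.6e9 | |
|   for bits, name in [(16, "BF16"), (8, "FP8"), (4, "4-bit quant")]: | |
|       print(f"{name}: ~{total * bits / 8 / 1e9:.0f} GB of weights" | |
|             " (plus KV cache and runtime overhead)") | |
|   # -> BF16 ~63 GB, FP8 ~32 GB, 4-bit ~16 GB | |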
| vessenes wrote 1 day ago: | |
| Given Mac bandwidth, you'll generally want to load the whole thing | |
| in RAM. You get speed benefits based on smaller-size active | |
| experts, since the Mac compute is slow compared to Nvidia hardware. | |
| This should be relatively snappy on a Mac, if you can load the | |
| entire thing. | |
| kristianp wrote 1 day ago: | |
| The article seems to focus on the nano model. Where are the details of | |
| the larger ones? | |
| shikon7 wrote 1 day ago: | |
| > We are releasing the Nemotron 3 Nano model and technical report. | |
| Super and Ultra releases will follow in the coming months. | |
| max002 wrote 1 day ago: | |
| I'm upvoting; I'm happy to finally see an open-source model from Nvidia | |
| that allows commercial use, as most of the models I've been checking | |
| from you guys couldn't be used in commercial settings. Bravo, Nvidia! | |
| teleforce wrote 19 hours 53 min ago: | |
| Just wondering: can anything with a commercial restriction be considered | |
| open source at all? Even the most stringent GPL allows you to | |
| commercialize [1]. | |
| But we are talking about an LLM model here, not software; still, the | |
| same principle should apply. [1] Open-source license: | |
| [1]: https://en.wikipedia.org/wiki/Open-source_license | |
| wcallahan wrote 2 days ago: | |
| I don't do "evals", but I do process billions of tokens every | |
| month, and I've found these small Nvidia models to be by far the best | |
| for their size currently. | |
| As someone else mentioned, the GPT-OSS models are also quite good | |
| (I haven't found how to make them great yet, though I think | |
| they might age well like the Llama 3 models did and get better with | |
| time!). | |
| But for a defined task, I've found task compliance, understanding, | |
| and tool call success rates to be some of the highest on these Nvidia | |
| models. | |
| For example, I have a continuous job that evaluates whether the data | |
| for a startup company on aVenture.vc could have overlapped/conflated | |
| two similar but unrelated companies across news articles, research | |
| details, investment rounds, etc., which is a token-hungry ETL task! I | |
| recently retested this workflow on the top 15 or so models today with | |
| <125b parameters, and the Nvidia models were among the best performing | |
| for this type of work, particularly around non-hallucination if given | |
| adequate grounding. | |
| Also, re: cost - I run local inference on several machines that run | |
| continuously, in addition to routing through OpenRouter and the | |
| frontier providers, and was pleasantly surprised to find that, as a | |
| paying customer of OpenRouter otherwise, the free variant there from | |
| Nvidia is quite generous with limits, too. | |
| selfhoster11 wrote 21 hours 19 min ago: | |
| You may want to use the new "derestricted" variants of gpt-oss. While | |
| the ostensible goal of these variants is to de-censor them, the change | |
| also removes the models' obsession with policy, which otherwise wastes | |
| thinking tokens that could be used for actually reasoning through a | |
| problem. | |
| dandelionv1bes wrote 22 hours 0 min ago: | |
| Completely agree. I was working on something with TensorRT LLM and | |
| threw Nemotron in there more on a whim. It completely mopped the | |
| floor with other models for my task (text style transfer), as judged | |
| by joint moderation with another LLM and humans. Really impressed. | |
| kgeist wrote 1 day ago: | |
| >the GPT-OSS models are also quite good | |
| I recently pitted gpt-oss 120b against Qwen3-Next 80b on a lot of | |
| internal benchmarks (for production use), and for me, gpt-oss was | |
| slightly slower (vLLM, both fit in VRAM), much worse at multilingual | |
| tasks (33 languages evaluated), and had worse instruction following | |
| (e.g., Qwen3-Next was able to reuse the same prompts I used for | |
| Gemma3 perfectly, while gpt-oss struggled and RAG benchmarks suddenly | |
| went from 90% to 60% without additional prompt engineering). | |
| And that's with Qwen3-Next being a random unofficial 4-bit quant | |
| (compared to gpt-oss having native support) + I had to disable | |
| multi-token prediction in Qwen3-Next because vLLM crashed with it. | |
| Has someone here tried both gpt-oss 120b and Qwen3-Next 80b? Maybe I | |
| was doing something wrong because I've seen a lot of people praise | |
| gpt-oss. | |
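| (For reference, a crude sketch of that kind of throughput check with | |
| vLLM's offline API; model id as named above, and in practice each | |
| model should be benchmarked in its own process.) | |
|   import time | |
|   from vllm import LLM, SamplingParams | |
|   prompts = ["Translate to German: The cat sat on the mat."] * 32 | |
|   params = SamplingParams(temperature=0, max_tokens=128) | |
|   llm = LLM(model="openai/gpt-oss-120b")  # swap in the other model to compare | |
|   t0 = time.perf_counter() | |
|   outs = llm.generate(prompts, params) | |
|   dt = time.perf_counter() - t0 | |
|   toks = sum(len(o.outputs[0].token_ids) for o in outs) | |
|   print(f"{toks / dt:.0f} output tok/s") | |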
| scrlk wrote 1 day ago: | |
| gpt-oss is STEM-maxxed, so I imagine most of the praise comes from | |
| people using it for agentic coding. | |
| > We trained the models on a mostly English, text-only dataset, | |
| with a focus on STEM, coding, and general knowledge. | |
| [1]: https://openai.com/index/introducing-gpt-oss/ | |
| andy99 wrote 1 day ago: | |
| What do you mean about not doing evals? Just literally that you | |
| don't run any benchmarks, or do you have something against them? | |
| danielmarkbruce wrote 1 day ago: | |
| He's just saying anecdotally these models are good. A reasonable | |
| response might be "have you systematically evaluated them?". He has | |
| pre-answered - no. | |
| woodson wrote 1 day ago: | |
| Not OP, but perhaps they mean not putting too much faith in common | |
| benchmarks (thanks to benchmaxxing). | |
| btown wrote 1 day ago: | |
| Would you mind sharing what hardware/card(s) you're using? And is [1] | |
| one of the ones you've tested? | |
| [1]: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B... | |
| heavyset_go wrote 23 hours 2 min ago: | |
| Support for this landed in llama.cpp recently if anyone is | |
| interested in running it locally. | |
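| A minimal local sketch via llama-cpp-python, assuming a build recent | |
| enough to include that support and a GGUF conversion of the model (the | |
| filename below is hypothetical): | |
|   from llama_cpp import Llama | |
|   llm = Llama( | |
|       model_path="./nemotron-3-nano-30b-a3b-q4_k_m.gguf",  # hypothetical | |
|       n_ctx=8192,       # context window to allocate | |
|       n_gpu_layers=-1,  # offload all layers (CUDA/Vulkan/Metal builds) | |
|   ) | |
|   out = llm.create_chat_completion( | |
|       messages=[{"role": "user", "content": "Write a haiku about GPUs."}], | |
|       max_tokens=128, | |
|   ) | |
|   print(out["choices"][0]["message"]["content"]) | |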
| red2awn wrote 2 days ago: | |
| Very interesting release: | |
| * Hybrid MoE: 2-3x faster than pure MoE transformers | |
| * 1M context length | |
| * Trained on NVFP4 | |
| * Open Source! Pretraining, mid-training, SFT and RL dataset released | |
| (SFT HF link is 404...) | |
| * Open model training recipe (coming soon) | |
| Really appreciate Nvidia being the most open lab but they really should | |
| make sure all the links/data are available on day 0. | |
| Also interesting that the model is trained in NVFP4 but the inference | |
| weights are FP8. | |
| bcatanzaro wrote 1 day ago: | |
| The Nano model isn't pretrained in FP4; only Super and Ultra are. | |
| And posttraining is not in FP4, so the posttrained weights of these | |
| models are not native FP4. | |
| pants2 wrote 2 days ago: | |
| If it's intelligence + speed you want, nothing comes close to | |
| GPT-OSS-120B on Cerebras or Groq. | |
| However, this looks like it has great potential for cost-effectiveness. | |
| As of today it's free to use over API on OpenRouter, so a bit unclear | |
| what it'll cost when it's not free, but free is free! | |
| [1]: https://openrouter.ai/nvidia/nemotron-3-nano-30b-a3b:free | |
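| A minimal sketch of calling that free endpoint through OpenRouter's | |
| OpenAI-compatible API (slug from the link above; requires an | |
| OpenRouter API key): | |
|   import os | |
|   from openai import OpenAI | |
|   client = OpenAI( | |
|       base_url="https://openrouter.ai/api/v1", | |
|       api_key=os.environ["OPENROUTER_API_KEY"], | |
|   ) | |
|   resp = client.chat.completions.create( | |
|       model="nvidia/nemotron-3-nano-30b-a3b:free", | |
|       messages=[{"role": "user", "content": "One-line summary of MoE?"}], | |
|   ) | |
|   print(resp.choices[0].message.content) | |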
| viraptor wrote 2 days ago: | |
| > nothing comes close to GPT-OSS-120B on Cerebras | |
| That's temporary. Cerebras speeds up everything, so if Nemotron is | |
| good quality, it's just a matter of time until they add it. | |
| credit_guy wrote 2 days ago: | |
| That's unlikely. Cerebras doesn't speed up everything. Can it speed | |
| up everything? I don't know, I'm not an insider. But does it speed | |
| up everything? That is evidently not the case. Their page [1] lists | |
| only 4 production models and 2 preview models. | |
| [1]: https://inference-docs.cerebras.ai/models/overview | |
| agentastic wrote 1 day ago: | |
| They need to compile the model for their chips. Standard | |
| transformers are easier, so for GPT-OSS, Qwen, GLM, etc., if there is | |
| demand, they will deploy them. | |
| Nemotron on the other hand is a hybrid (Transformer + Mamba-2) so | |
| it will be more challenging to compile it on Cerebras/Groq chips. | |
| (Methinks Nvidia is purposefully picking an architecture+FP4 combo that | |
| is easy to ship on Nvidia chips, but harder for TPU or | |
| Cerebras/Groq to deploy) | |
| Y_Y wrote 2 days ago: | |
| Wow, Nvidia keeps on pushing the frontier of misleading benchmarks. | |
| <- back to front page |