
UHQBot

Forum Bot
  • Posts: 51,948
  • Joined
  • Last visited
  • Days Won: 27

Everything posted by UHQBot

  1. NVIDIA Blackwell swept the new SemiAnalysis InferenceMAX v1 benchmarks, delivering the highest performance and best overall efficiency. InferenceMAX v1 is the first independent benchmark to measure total cost of compute across diverse models and real-world scenarios.

  • Best return on investment: NVIDIA GB200 NVL72 delivers unmatched AI factory economics — a $5 million investment generates $75 million in DSR1 token revenue, a 15x return on investment.
  • Lowest total cost of ownership: NVIDIA B200 software optimizations achieve two cents per million tokens on gpt-oss, delivering 5x lower cost per token in just 2 months.
  • Best throughput and interactivity: NVIDIA B200 sets the pace with 60,000 tokens per second per GPU and 1,000 tokens per second per user on gpt-oss with the latest NVIDIA TensorRT-LLM stack.

As AI shifts from one-shot answers to complex reasoning, the demand for inference — and the economics behind it — is exploding. The new independent InferenceMAX v1 benchmarks are the first to measure total cost of compute across real-world scenarios. The results? The NVIDIA Blackwell platform swept the field — delivering unmatched performance and best overall efficiency for AI factories.

A $5 million investment in an NVIDIA GB200 NVL72 system can generate $75 million in token revenue. That’s a 15x return on investment (ROI) — the new economics of inference. (A small worked example of the cost-per-token arithmetic behind figures like these appears later in this item.)

“Inference is where AI delivers value every day,” said Ian Buck, vice president of hyperscale and high-performance computing at NVIDIA. “These results show that NVIDIA’s full-stack approach gives customers the performance and efficiency they need to deploy AI at scale.”

Enter InferenceMAX v1

InferenceMAX v1, a new benchmark from SemiAnalysis released Monday, is the latest to highlight Blackwell’s inference leadership. It runs popular models across leading platforms, measures performance for a wide range of use cases and publishes results anyone can verify.

Why do benchmarks like this matter? Because modern AI isn’t just about raw speed — it’s about efficiency and economics at scale. As models shift from one-shot replies to multistep reasoning and tool use, they generate far more tokens per query, dramatically increasing compute demands.

NVIDIA’s open-source collaborations with OpenAI (gpt-oss 120B), Meta (Llama 3 70B) and DeepSeek AI (DeepSeek R1) highlight how community-driven models are advancing state-of-the-art reasoning and efficiency. Partnering with these leading model builders and the open-source community, NVIDIA ensures the latest models are optimized for the world’s largest AI inference infrastructure. These efforts reflect a broader commitment to open ecosystems — where shared innovation accelerates progress for everyone. Deep collaborations with the FlashInfer, SGLang and vLLM communities enable codeveloped kernel and runtime enhancements that power these models at scale.

Software Optimizations Deliver Continued Performance Gains

NVIDIA continuously improves performance through hardware and software codesign. Initial gpt-oss-120b performance on an NVIDIA DGX Blackwell B200 system with the NVIDIA TensorRT-LLM library was market-leading, but NVIDIA’s teams and the community have since significantly optimized TensorRT-LLM for open-source large language models. The TensorRT-LLM v1.0 release is a major breakthrough in making large AI models faster and more responsive for everyone.
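As a concrete illustration of the cost-per-token arithmetic referenced above, here is a minimal, hedged sketch of how a cost-per-million-tokens figure can be derived from sustained throughput and an hourly cost of compute. The throughput value echoes the article's headline gpt-oss number, but the hourly price is a placeholder assumption, not a published NVIDIA or SemiAnalysis figure; only the relationship matters here.

```python
# Illustrative only: derive cost per million tokens from sustained throughput
# and an assumed hourly cost of compute. None of the inputs are official
# figures; swap in measured throughput and your own contracted pricing.

def cost_per_million_tokens(tokens_per_second: float, hourly_cost_usd: float) -> float:
    """Cost in USD to generate one million tokens at a steady throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

if __name__ == "__main__":
    tps_per_gpu = 60_000          # the article's headline gpt-oss throughput per GPU
    assumed_gpu_hour_usd = 2.00   # placeholder price, purely an assumption
    cost = cost_per_million_tokens(tps_per_gpu, assumed_gpu_hour_usd)
    print(f"~${cost:.4f} per million tokens under these assumptions")
```

With those assumptions the formula lands in the low single cents per million tokens, the same order of magnitude as the two-cents figure cited above; different pricing or throughput simply shifts the result along the same curve.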
Through advanced parallelization techniques, TensorRT-LLM v1.0 uses the B200 system and NVIDIA NVLink Switch’s 1,800 GB/s bidirectional bandwidth to dramatically improve the performance of the gpt-oss-120b model.

The innovation doesn’t stop there. The newly released gpt-oss-120b-Eagle3-v2 model introduces speculative decoding, a clever method that predicts multiple tokens at a time (a toy sketch of the idea appears at the end of this item). This reduces lag and delivers even quicker results, tripling throughput at 100 tokens per second per user (TPS/user) — boosting per-GPU speeds from 6,000 to 30,000 tokens.

For dense AI models like Llama 3.3 70B, which demand significant computational resources due to their large parameter count and the fact that all parameters are used simultaneously during inference, NVIDIA Blackwell B200 sets a new performance standard in the InferenceMAX v1 benchmarks. Blackwell delivers over 10,000 TPS per GPU at 50 TPS per user interactivity — 4x higher per-GPU throughput than the NVIDIA H200 GPU.

Performance Efficiency Drives Value

Metrics like tokens per watt, cost per million tokens and TPS/user matter as much as throughput. In fact, for power-limited AI factories, Blackwell delivers 10x throughput per megawatt compared with the previous generation, which translates into higher token revenue.

Cost per token is crucial for evaluating AI model efficiency, as it directly impacts operational expenses. The NVIDIA Blackwell architecture lowered cost per million tokens by 15x versus the previous generation, leading to substantial savings and fostering wider AI deployment and innovation.

Multidimensional Performance

InferenceMAX uses the Pareto frontier — a curve that shows the best trade-offs between different factors, such as data center throughput and responsiveness — to map performance (a minimal sketch of how such a frontier is selected follows below).

But it’s more than a chart. It reflects how NVIDIA Blackwell balances the full spectrum of production priorities: cost, energy efficiency, throughput and responsiveness. That balance enables the highest ROI across real-world workloads.

Systems that optimize for just one mode or scenario may show peak performance in isolation, but those economics don’t scale. Blackwell’s full-stack design delivers efficiency and value where it matters most: in production.

For a deeper look at how these curves are built — and why they matter for total cost of ownership and service-level agreement planning — check out this technical deep dive for full charts and methodology.

What Makes It Possible?

Blackwell’s leadership comes from extreme hardware-software codesign. It’s a full-stack architecture built for speed, efficiency and scale, with features that include:

  • NVFP4 low-precision format for efficiency without loss of accuracy
  • Fifth-generation NVIDIA NVLink, which connects 72 Blackwell GPUs to act as one giant GPU
  • NVLink Switch, which enables high concurrency through advanced tensor, expert and data parallel attention algorithms
  • An annual hardware cadence plus continuous software optimization — NVIDIA has more than doubled Blackwell performance since launch using software alone
  • NVIDIA TensorRT-LLM, NVIDIA Dynamo, SGLang and vLLM open-source inference frameworks optimized for peak performance
  • A massive ecosystem, with hundreds of millions of GPUs installed, 7 million CUDA developers and contributions to over 1,000 open-source projects

The Bigger Picture

AI is moving from pilots to AI factories — infrastructure that manufactures intelligence by turning data into tokens and decisions in real time.
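Since InferenceMAX reports results along a Pareto frontier, here is a minimal, hypothetical sketch of how such a frontier can be selected from a set of (throughput, interactivity) measurements. The sample points are invented for illustration and are not benchmark data.

```python
# Minimal sketch: find the Pareto-optimal configurations when both per-GPU
# throughput and per-user interactivity should be maximized. The sample
# points below are invented for illustration only.

from typing import List, Tuple

def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the points not dominated by any other (higher is better on both axes)."""
    frontier = []
    for p in points:
        dominated = any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

if __name__ == "__main__":
    # (tokens/s per GPU, tokens/s per user): hypothetical operating points
    measurements = [(60_000, 50), (45_000, 100), (20_000, 400), (15_000, 300), (30_000, 90)]
    print(pareto_frontier(measurements))
```

Real benchmark curves sweep concurrency and latency targets per configuration, but the rule for deciding which points sit on the frontier is the same idea: keep only the operating points that no other point beats on both axes.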
Open, frequently updated benchmarks help teams make informed platform choices and tune for cost per token, latency service-level agreements and utilization across changing workloads. NVIDIA’s Think SMART framework helps enterprises navigate this shift, spotlighting how NVIDIA’s full-stack inference platform delivers real-world ROI — turning performance into profits. View the full article
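As flagged in the speculative decoding discussion above, here is a toy, model-free sketch of the draft-and-verify loop that speculative decoding is built on. It is a conceptual illustration using invented stand-in "models"; it is not the Eagle3 algorithm or the TensorRT-LLM implementation.

```python
# Toy illustration of speculative decoding: a cheap "draft" proposer guesses a
# few tokens ahead and an expensive "target" model checks them, keeping the
# longest agreeing prefix. In a real system the target verifies all drafted
# positions in a single forward pass (that is where the speedup comes from);
# here it is called per token purely for clarity. Conceptual sketch only.

from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft: Callable[[List[int], int], List[int]],  # proposes k candidate tokens
    target: Callable[[List[int]], int],            # returns the next token for a prefix
    k: int = 4,
) -> List[int]:
    """Extend the prefix by verifying k drafted tokens against the target model."""
    proposal = draft(prefix, k)
    accepted: List[int] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok == expected:
            accepted.append(tok)       # draft agreed with the target: accepted for free
        else:
            accepted.append(expected)  # first disagreement: take the target's token and stop
            break
    else:
        accepted.append(target(prefix + accepted))  # all drafts accepted: target adds one more
    return prefix + accepted

if __name__ == "__main__":
    # Dummy stand-ins: the "target" counts upward, the "draft" usually agrees.
    target = lambda ctx: (ctx[-1] + 1) if ctx else 0
    draft = lambda ctx, k: [(ctx[-1] if ctx else -1) + i + 1 for i in range(k)]
    print(speculative_step([0, 1, 2], draft, target))  # -> [0, 1, 2, 3, 4, 5, 6, 7]
```

The practical speedup comes from the target model scoring all drafted positions in one forward pass, so several tokens can be accepted for roughly the cost of a single target step.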
  2. Hello MissMonocle, Welcome to UnityHQ Nolfseries Community. Please feel free to browse around and get to know the others. If you have any questions, please don't hesitate to ask. Be sure to join our Discord. MissMonocle joined on 10/09/2025. View Member
  3. Hello Alan, Welcome to UnityHQ Nolfseries Community. Please feel free to browse around and get to know the others. If you have any questions, please don't hesitate to ask. Be sure to join our Discord. Alan joined on 10/10/2025. View Member
  4. The post Arc Raiders Is Trying To Fix One Of The Most Frustrating Things About Extraction Shooters appeared first on Kotaku. View the full article
  5. The post The Secret Behind The Fastest XP/JP Grind In Final Fantasy Tactics? Frogs appeared first on Kotaku. View the full article
  6. The post Mad Max Director Says ‘AI Here To Stay’ And Likens It To The Renaissance In Worst Betrayal Yet appeared first on Kotaku. View the full article
  7. The post The 10 Highest-Priced Cards In Pokémon TCG’s Mega Evolution Set appeared first on Kotaku. View the full article
  8. The post Overwatch 2’s New Hero Tease Overshadowed By Lifeweaver Absolutely Serving In His New Mythic Skin appeared first on Kotaku. View the full article
  9. The post The Cheapest Way To Buy The Switch 2’s Biggest Holiday Games appeared first on Kotaku. View the full article
  10. The post This 4.6-Star Rated Soundcore Portable Bluetooth Speaker Selling for Post-Prime Pennies appeared first on Kotaku. View the full article
  11. The post The Discord Hack Sounds Really, Really Bad appeared first on Kotaku. View the full article
  12. The post Black Ops 7 Finally Bails On A Contentious Multiplayer Matchmaking Fight appeared first on Kotaku. View the full article
  13. The post This HP Laptop Was Pricier Than a MacBook, Now It’s Nearly 90% Off and Feels Practically Free (16GB RAM, 512GB SSD) appeared first on Kotaku. View the full article
  14. Microsoft Azure today announced the new NDv6 GB300 VM series, delivering the industry’s first supercomputing-scale production cluster of NVIDIA GB300 NVL72 systems, purpose-built for OpenAI’s most demanding AI inference workloads.

This supercomputer-scale cluster features over 4,600 NVIDIA Blackwell Ultra GPUs connected via the NVIDIA Quantum-X800 InfiniBand networking platform. Microsoft’s unique systems approach applied radical engineering to memory and networking to provide the massive scale of compute required to achieve high inference and training throughput for reasoning models and agentic AI systems.

Today’s achievement is the result of years of deep partnership between NVIDIA and Microsoft in purpose-building AI infrastructure for the world’s most demanding AI workloads and delivering infrastructure for the next frontier of AI. It marks another leadership moment, ensuring that leading-edge AI drives innovation in the United States.

“Delivering the industry’s first at-scale NVIDIA GB300 NVL72 production cluster for frontier AI is an achievement that goes beyond powerful silicon — it reflects Microsoft Azure and NVIDIA’s shared commitment to optimize all parts of the modern AI data center,” said Nidhi Chappell, corporate vice president of Microsoft Azure AI Infrastructure. “Our collaboration helps ensure customers like OpenAI can deploy next-generation infrastructure at unprecedented scale and speed.”

Inside the Engine: The NVIDIA GB300 NVL72

At the heart of Azure’s new NDv6 GB300 VM series is the liquid-cooled, rack-scale NVIDIA GB300 NVL72 system. Each rack is a powerhouse, integrating 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs into a single, cohesive unit to accelerate training and inference for massive AI models. The system provides a staggering 37 terabytes of fast memory and 1.44 exaflops of FP4 Tensor Core performance per VM, creating a massive, unified memory space essential for reasoning models, agentic AI systems and complex multimodal generative AI.

NVIDIA Blackwell Ultra is supported by the full-stack NVIDIA AI platform, including collective communication libraries that tap into new formats like NVFP4 for breakthrough training performance, as well as compiler technologies like NVIDIA Dynamo for the highest inference performance in reasoning AI.

The NVIDIA Blackwell Ultra platform excels at both training and inference. In the recent MLPerf Inference v5.1 benchmarks, NVIDIA GB300 NVL72 systems delivered record-setting performance using NVFP4. Results included up to 5x higher throughput per GPU on the 671-billion-parameter DeepSeek-R1 reasoning model compared with the NVIDIA Hopper architecture, along with leadership performance on all newly introduced benchmarks, such as the Llama 3.1 405B model.

The Fabric of a Supercomputer: NVLink Switch and NVIDIA Quantum-X800 InfiniBand

To connect over 4,600 Blackwell Ultra GPUs into a single, cohesive supercomputer, Microsoft Azure’s cluster relies on a two-tiered NVIDIA networking architecture designed for both scale-up performance within the rack and scale-out performance across the entire cluster.

Within each GB300 NVL72 rack, the fifth-generation NVIDIA NVLink Switch fabric provides 130 TB/s of direct, all-to-all bandwidth between the 72 Blackwell Ultra GPUs. This transforms the entire rack into a single, unified accelerator with a shared memory pool — a critical design for massive, memory-intensive models.
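Before moving beyond the rack, a small back-of-the-envelope sketch can tie the per-rack figures above to the cluster totals. The per-rack numbers come from the announcement; the rack count and the aggregate memory and compute lines are inferred arithmetic, not figures stated by Microsoft or NVIDIA.

```python
# Back-of-the-envelope view of the cluster described above. Per-rack values are
# quoted from the article; everything derived from them is inferred arithmetic,
# not an official figure.

GPUS_PER_RACK = 72               # Blackwell Ultra GPUs per GB300 NVL72 rack
CPUS_PER_RACK = 36               # Grace CPUs per rack
FAST_MEMORY_TB_PER_RACK = 37     # fast memory per rack-scale VM
FP4_EXAFLOPS_PER_RACK = 1.44     # FP4 Tensor Core performance per rack-scale VM
TOTAL_GPUS = 4_608               # GPUs linked by Quantum-X800 across the cluster

racks = TOTAL_GPUS // GPUS_PER_RACK
print(f"Racks implied by the GPU count:    {racks}")
print(f"Grace CPUs across those racks:     {racks * CPUS_PER_RACK:,}")
print(f"Aggregate fast memory (inferred):  ~{racks * FAST_MEMORY_TB_PER_RACK:,} TB")
print(f"Aggregate FP4 compute (inferred):  ~{racks * FP4_EXAFLOPS_PER_RACK:.0f} exaflops")
```

Treat these totals as rough sanity checks on the headline numbers rather than specifications.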
To scale beyond the rack, the cluster uses the NVIDIA Quantum-X800 InfiniBand platform, purpose-built for trillion-parameter-scale AI. Featuring NVIDIA ConnectX-8 SuperNICs and Quantum-X800 switches, NVIDIA Quantum-X800 provides 800 Gb/s of bandwidth per GPU, ensuring seamless communication across all 4,608 GPUs.

Microsoft Azure’s cluster also uses NVIDIA Quantum-X800’s advanced adaptive routing, telemetry-based congestion control and performance isolation capabilities, as well as NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) v4, which accelerates collective operations to significantly boost the efficiency of large-scale training and inference.

Driving the Future of AI

Delivering the world’s first production NVIDIA GB300 NVL72 cluster at this scale required reimagining every layer of Microsoft’s data center — from custom liquid cooling and power distribution to a reengineered software stack for orchestration and storage. This latest milestone marks a big step forward in building the infrastructure that will unlock the future of AI. As Azure scales toward its goal of deploying hundreds of thousands of NVIDIA Blackwell Ultra GPUs, even more innovations are poised to emerge from customers like OpenAI.

Learn more about this announcement on the Microsoft Azure blog. View the full article
  15. The post 17-Year Halo Veteran Departs Saying He’ll Share His Story ‘When It Is Absolutely Safe To Do So’ appeared first on Kotaku. View the full article
  16. The post Prime Day’s Over, but Amazon Echo Dot Is Still Hanging On at Almost Free appeared first on Kotaku. View the full article
  17. The post Battlefield 6: The Kotaku Review appeared first on Kotaku. View the full article
  18. The post One Of 2025’s Most Promising Games Is Already Winding Down Before Leaving Early Access appeared first on Kotaku. View the full article
  19. The post Nintendo’s Short Films Feel Like They Could Be The Start Of Something Much Bigger appeared first on Kotaku. View the full article
  20. The post DJI Osmo Mobile Gimbal Stabilizer Falls to a Price Even Lower Than Prime Day, Feels Like Clearance appeared first on Kotaku. View the full article
  21. The post ASUS TUF Gaming Laptop (NVIDIA RTX 4050) Still at an All-Time Low With Hundreds Off, but Returning to Full Price Soon appeared first on Kotaku. View the full article
  22. The post The Streets Of Rage 4 Team’s New Fantasy Beat-Em-Up Breathes Fresh Life Into The Genre appeared first on Kotaku. View the full article
  23. Lock, load and stream — the battle is just beginning. EA’s highly anticipated Battlefield 6 is set to storm the cloud when it launches tomorrow with GeForce RTX 5080 power, delivering the high-intensity combat and heart-pounding chaos the series is known for. Catch it as part of the six games joining the cloud, including Bethesda’s The Elder Scrolls III: Morrowind Game of the Year Edition and the launch of King of Meat.

The Discord integration first unveiled at Gamescom is now live, debuting with Fortnite — showcasing how easy it is to discover games and stay connected with friends via Discord and GeForce NOW.

Ashburn, Portland and London will be the next regions to light up with GeForce RTX 5080-class power. Stay tuned to GFN Thursday for updates, and follow along with the latest progress on the server rollout page.

War Waits for No Respawn

Turn cover into rubble — and rubble into cover. Battlefield 6 drops gamers into all-out warfare, where every shot and explosion can change the tide of the fight. With a new Kinesthetic Combat System, in-game movements and gunplay feel sharper and more instinctive than ever. Battle through Conquest and Breakthrough modes or a global campaign spanning Cairo to Gibraltar. The Phantom Edition packs exclusive skins, weapons, vehicle cosmetics and a Battle Pass so gamers are geared up from day one.

When every second counts, GeForce NOW puts players in control with ultralow-latency streaming and razor-sharp frames — whether dodging artillery or rocketing off a grenade mid-air. Stream at up to 240 frames per second (fps) from almost any device and take the fight anywhere when Battlefield 6 joins the cloud at launch.

Ready, Set, Click

From chat to combat — instantly. NVIDIA and Discord have teamed up to change how people discover and play games together, making it easier than ever for friends and communities to stay connected.

Powered by GeForce NOW, the new integrated experience — first shown at Gamescom — lets anyone discover and try games like Fortnite directly on Discord with no downloads or installs needed, and even without a GeForce NOW membership. This next-level experience is fueled by a limited-time trial of the GeForce NOW Performance tier, letting anyone jump into streaming at up to 1440p and 60 fps — all without ever leaving Discord. For Discord’s hundreds of millions of users, it’s a fast, simple way to play together and keep the gaming conversation going, right where it’s already happening.

The first playable game with this integration is Fortnite, rolling out to gamers now. A single click on a Discord chat link and an Epic Games account login are all it takes to join the action. The integration opens a new way to discover and play games, enabling publishers to create smooth gaming experiences for communities everywhere.

Wander Where You Want To

Adventure awaits. Bethesda’s The Elder Scrolls III: Morrowind Game of the Year Edition is the single-player role-playing game that defined open-world fantasy adventures. This edition bundles the original The Elder Scrolls III: Morrowind with its celebrated expansions, Tribunal and Bloodmoon, delivering the definitive version of the iconic game.

Step into a rich world where every choice shapes destiny. Take on the epic quest to save Morrowind from a terrible blight, or carve a path across vast landscapes filled with ancient cities, dangerous dungeons and curious creatures.
With the expansions, visit the mystical Clockwork City, brave frozen Solstheim, clash with werewolves and direct the rise of a mining colony, enjoying up to 80 hours of quests and stories. The game’s open-ended structure lets players build any character imaginable, offering unmatched depth and replayability. Jump into the world of Vvardenfell on nearly any device with GeForce NOW, from laptops and Macs to phones and TVs, and pick the adventure back up anytime, anywhere.

Meaty New Games

We got the meats. King of Meat is a wild and over-the-top brawler that doesn’t take itself too seriously — except when it comes to juicy combos and meaty takedowns. Part cooking show, part gladiator arena, it throws players into chaotic battles where the weapon of choice is meat. Whether tenderizing foes with a ham hock or unleashing absurd special moves, players can lean hard into the game’s goofy, carnivorous charm. It’s fast, silly and unashamedly over-seasoned fun.

In addition, members can look for the following:

  • Deathground (New release on Steam, Oct. 7)
  • King of Meat (New release on Steam, Oct. 7)
  • Seafarer: The Ship Sim (New release on Steam, Oct. 7)
  • Little Nightmares III (New release on Steam, Oct. 9)
  • Battlefield 6 (New release on Steam and EA app, Oct. 10, GeForce RTX 5080-ready)
  • The Elder Scrolls III: Morrowind (Steam, Epic Games Store and Xbox, available on PC Game Pass)

What are you planning to play this weekend? Let us know on X or in the comments below.

“If you could play one game RIGHT NOW, regardless of its release date, what game would it be and why?” — NVIDIA GeForce NOW (@NVIDIAGFN) October 8, 2025

View the full article
  24. The post Apple Watch Series 10 Is Still Going at Prime Day’s Lowest Price as Amazon’s Clearing Out Leftovers appeared first on Kotaku. View the full article
  25. The post Seagate’s 2TB Xbox Expansion Card Is Just $0.10 per GB, Lowest Price Even After Prime Day appeared first on Kotaku. View the full article