Nvidia releases its own brand of world models | TechCrunch

By admin


Nvidia is getting into world models: AI models that take inspiration from the mental models of the world that people develop naturally.

At CES 2025 in Las Vegas, the company announced that it is making publicly available a family of world models that can make predictions and produce “physics-aware” videos. Nvidia is calling this family Cosmos World Foundation Models, or Cosmos WFMs for short.

The models, which can be fine-tuned for specific applications, are available from Nvidia's API and NGC catalogs, GitHub, and the AI dev platform Hugging Face.

“Nvidia is making available the first wave of Cosmos WFMs for physics-based simulation and synthetic data generation,” the company wrote in a blog post provided to TechCrunch. “Researchers and developers, regardless of their company size, can use Cosmos models freely under Nvidia's permissive Open Model License that allows commercial use.”

Output from one of Nvidia's Cosmos models. Image credit: Nvidia

The Cosmos WFM family consists of several models, divided into three categories: Nano for low-latency and real-time applications, Super for “high performance baseline” models, and Ultra for the highest-quality, highest-fidelity output.

The models range in size from 4 billion to 14 billion parameters, with Nano being the smallest and Ultra the largest. Parameters roughly correspond to a model's problem-solving ability, and models with more parameters generally perform better than models with fewer.

As part of Cosmos WFM, Nvidia is also releasing an “upsampling model,” a video decoder optimized for augmented reality, and guardrail models to ensure responsible use, as well as models fine-tuned for applications such as generating sensor data for autonomous vehicle development. These, as well as the other Cosmos WFM models, were trained on 9,000 trillion tokens from 20 million hours of real-world human interaction, environmental, industrial, robotics, and driving data, Nvidia said. (In AI, “tokens” represent bits of raw data, in this case video footage.)

Nvidia won't say where this training data came from, but at least one report, and a lawsuit, alleges that the company trained on copyrighted YouTube videos without permission.

When reached for comment, an Nvidia spokesperson told TechCrunch that Cosmos was “not designed to copy or infringe any protected work.”

“Cosmos learns just as humans learn,” the spokesperson said. “To help Cosmos learn, we collect data from a variety of public and private sources and ensure that our use of data complies with the letter and spirit of the law. The information about how the world works, which is what Cosmos models learn, is not copyrightable or subject to the control of any individual author or company.”

Setting aside the fact that models like Cosmos don't really learn the way people learn, copyright experts say that claims like Nvidia's, which lean on the protection of the fair use legal doctrine, might not stand up to judicial scrutiny. Whether these companies prevail will largely depend on how courts decide whether fair use, which allows copyrighted works to be used to make a secondary creation as long as it's transformative, applies to AI training.

Nvidia claims that Cosmos WFM models, given text or video frames, can generate “controllable, high-quality” synthetic data to bootstrap the training of models for robotics, driverless cars, and more.

Cosmos can simulate realistic factory floors. Image credit: Nvidia

“Nvidia Cosmos' suite of open models means developers can customize WFMs with data sets, such as videos recording autonomous vehicles traveling or robots navigating a warehouse,” Nvidia wrote in a press release. “Cosmos WFMs are purpose-built for physical AI research and development, and can generate physics-based video from a combination of inputs such as text, images and video, as well as robot sensors or motion data.”

Nvidia says companies including Waabi, Wayve, Foretellix, and Uber have already committed to piloting Cosmos WFMs for a variety of use cases, from video search and curation to building AI models for self-driving vehicles.

“Generative AI will power the future of mobility, which requires both rich data and extremely powerful compute,” Uber CEO Dara Khosrowshahi said in a statement. “By working with Nvidia, we're confident we can help supercharge the timeline for safe and scalable autonomous driving solutions for the industry.”

It is important to note that Nvidia's world models are not “open source” in the strict sense. To abide by one widely accepted definition of “open source” AI, an AI model must provide enough information about its design that a person could “substantially” re-create it, and must disclose relevant details about its training data, including its provenance and how the data can be obtained or licensed.

Nvidia has not released details on Cosmos WFM training data, nor has it made available all the tools needed to re-create the models from scratch. That is probably why the tech giant refers to the models as “open” as opposed to open source.

“We really hope [Cosmos will] do for the world of robotics and industrial AI what Llama … has done for enterprise AI,” Nvidia CEO Jensen Huang said onstage during a press event Monday.


