Meta has announced a new open-source AI model that links together multiple streams of data, including text, audio, visual data, temperature, and movement readings.
The model is only a research project at this point, with no immediate consumer or practical applications, but it points to a future of generative AI systems that can create immersive, multisensory experiences and shows that Meta continues to share AI research at a time when rivals like OpenAI and Google have become increasingly secretive.
The core concept of the research is linking together multiple types of data into a single multidimensional index (or “embedding space,” to use AI parlance). This idea may seem a little abstract, but it’s this same concept that underpins the recent boom in generative AI.
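To make that concrete, here is a minimal, hypothetical sketch (not Meta's code) of what a shared embedding space means in practice: two different encoders map their inputs into vectors of the same size, so that related inputs from different modalities can be compared directly with cosine similarity. The stand-in linear encoders and feature sizes below are invented purely for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in encoders: in a real system these would be large
# neural networks (e.g., a vision transformer and a text transformer).
# Both map their modality into the SAME 128-dimensional embedding space.
EMBED_DIM = 128
image_encoder = torch.nn.Linear(2048, EMBED_DIM)  # pretend 2048-d image features
text_encoder = torch.nn.Linear(768, EMBED_DIM)    # pretend 768-d text features

image_features = torch.randn(1, 2048)  # dummy input standing in for an image
text_features = torch.randn(1, 768)    # dummy input standing in for a caption

# Project both modalities into the shared space and L2-normalize,
# so that cosine similarity reduces to a dot product.
image_emb = F.normalize(image_encoder(image_features), dim=-1)
text_emb = F.normalize(text_encoder(text_features), dim=-1)

# After training, inputs that "mean the same thing" should score close to 1;
# unrelated inputs should score near 0.
similarity = (image_emb * text_emb).sum(dim=-1)
print(similarity.item())
```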
For example, AI image generators like DALL-E, Stable Diffusion, and Midjourney all rely on systems that link together text and images during the training stage. They look for patterns in visual data while connecting that information to descriptions of the images. That’s what then enables these systems to generate pictures that follow users’ text inputs. Many AI tools that generate video or audio work in the same way.
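The linking described above is typically implemented with a contrastive objective, in the style of OpenAI's CLIP: given a batch of matched image-caption pairs, the model is trained to make matching pairs more similar in the embedding space than mismatched ones. A rough sketch of such a loss, under the assumption of pre-normalized embeddings (this is a standard technique, not Meta's exact recipe):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_embs: torch.Tensor, text_embs: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style symmetric contrastive loss for a batch of matched pairs.

    image_embs, text_embs: (batch, dim) L2-normalized embeddings, where
    row i of each tensor describes the same underlying item.
    """
    # Similarity of every image against every text in the batch.
    logits = image_embs @ text_embs.T / temperature  # (batch, batch)
    # The "correct" pairing lies on the diagonal: image i matches text i.
    targets = torch.arange(len(logits), device=logits.device)
    # Train in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.T, targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random (untrained) embeddings.
img = F.normalize(torch.randn(8, 128), dim=-1)
txt = F.normalize(torch.randn(8, 128), dim=-1)
print(contrastive_loss(img, txt).item())
```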
Meta says that its model, ImageBind, is the first to combine six types of data into a single embedding space. The six types of data included in the model are: visual (in the form of both image and video); thermal (infrared images); text; audio; depth information; and — most intriguing of all — movement readings generated by an inertial measurement unit, or IMU. (IMUs are found in phones and smartwatches, where they’re used for a range of tasks, from switching a phone from landscape to portrait to distinguishing between different types of physical activity.)
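ImageBind's actual code is available on GitHub, but as a hypothetical illustration of what "six types of data in one embedding space" buys you, here is how cross-modal retrieval might look once every modality has its own encoder into that space. The encoder names and input sizes below are invented for the sketch; the point is that any modality can query any other.

```python
import torch
import torch.nn.functional as F

EMBED_DIM = 128

# One stand-in encoder per modality, all projecting into the same space.
# (Invented input sizes; real encoders would be modality-specific networks.)
encoders = {
    "vision": torch.nn.Linear(2048, EMBED_DIM),
    "text": torch.nn.Linear(768, EMBED_DIM),
    "audio": torch.nn.Linear(512, EMBED_DIM),
    "depth": torch.nn.Linear(256, EMBED_DIM),
    "thermal": torch.nn.Linear(256, EMBED_DIM),
    "imu": torch.nn.Linear(64, EMBED_DIM),
}

def embed(modality: str, features: torch.Tensor) -> torch.Tensor:
    """Project raw features from any modality into the shared space."""
    return F.normalize(encoders[modality](features), dim=-1)

# Because everything lands in one space, any modality can query any other.
# Here: find which of three audio clips best matches a text query.
text_query = embed("text", torch.randn(1, 768))
audio_clips = embed("audio", torch.randn(3, 512))
scores = audio_clips @ text_query.T  # cosine similarities, shape (3, 1)
print("best-matching clip:", scores.argmax().item())
```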
The idea is that future AI systems will be able to cross-reference this data in the same way that current AI systems do for text inputs. Imagine, for example, a futuristic virtual reality device that generates not only audio and visual input but also your environment and movement on a physical stage. You might ask it to emulate a long sea voyage, and it would not only place you on a ship with the noise of the waves in the background but also simulate the rocking of the deck under your feet and the cool breeze of the ocean air.
In a blog post, Meta notes that other streams of sensory input could be added to future models, including “touch, speech, smell, and brain fMRI signals.” It also claims the research “brings machines one step closer to humans’ ability to learn simultaneously, holistically, and directly from many different forms of information.” (Which, sure, whatever. Depends how small these steps are.)
This is all very speculative, of course, and it’s likely that the immediate applications of research like this will be much more limited. For example, last year, Meta demonstrated an AI model that generates short, blurry videos from text descriptions. Work like ImageBind shows how future versions of the system could incorporate other streams of data, generating audio to match the video output, for example.
For industry watchers, though, the research is also interesting as Meta is open-sourcing the underlying model — an increasingly scrutinized practice in the world of AI.
Those opposed to open-sourcing, like OpenAI, say the practice is harmful to creators, because rivals can copy their work, and potentially dangerous, allowing malicious actors to take advantage of state-of-the-art AI models. Advocates respond that open-sourcing allows third parties to scrutinize the systems for faults and ameliorate some of their failings. They note it may even provide a commercial benefit, as it essentially allows companies to recruit third-party developers as unpaid workers to improve their work.
Meta has so far been firmly in the open-source camp, though not without difficulties. (Its latest language model, LLaMA, leaked online earlier this year, for example.) In many ways, its lack of commercial achievement in AI (the company has no chatbot to rival Bing, Bard, or ChatGPT) has enabled this approach. And for now, with ImageBind, it’s continuing this strategy.
By: James Vincent
Title: Meta open-sources multisensory AI model that combines six types of data
Sourced From: www.theverge.com/2023/5/9/23716558/meta-imagebind-open-source-multisensory-modal-ai-model-research
Published Date: Tue, 09 May 2023 16:00:00 +0000