Multimodal: AI’s new frontier

By Reggie Thompson, May 22, 2024

Multimodality is a relatively new term for something extremely old: how people have learned about the world since humanity appeared. Individuals receive information from myriad sources via their senses, including sight, sound, and touch. Human brains combine these different modes of data into a highly nuanced, holistic picture of reality.

“Communication between humans is multimodal,” says Jina AI CEO Han Xiao. “They use text, voice, emotions, expressions, and sometimes photos.” Those are just a few obvious means of sharing information. Given this, he adds, “it is very safe to assume that future communication between human and machine will also be multimodal.”

A technology that sees the world from different angles

We are not there yet. The furthest advances in this direction have occurred in the fledgling field of multimodal AI. The problem is not a lack of vision. While a technology able to translate between modalities would clearly be valuable, Mirella Lapata, a professor at the University of Edinburgh and director of its Laboratory for Integrated Artificial Intelligence, says “it’s a lot more complicated” to execute than unimodal AI.


In practice, generative AI tools use different strategies for different types of data when building large data models, the complex neural networks that organize vast amounts of information. For example, models that draw on textual sources split text into individual tokens, usually words or parts of words. Each token is assigned an “embedding” or “vector”: a list of numbers representing how and where the token is used compared to others. Collectively, these numbers form a mathematical representation of the token’s meaning. An image model, on the other hand, might use pixels as its tokens for embedding, and an audio model might use sound frequencies.
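The article does not describe any particular model's tokenizer or embedding scheme, so the following is only a toy sketch under our own assumptions: a naive whitespace word tokenizer, an embedding table filled with random numbers (standing in for vectors a real model would learn), and each pixel treated as one token whose RGB values become a three-number vector. All function names are hypothetical. Cosine similarity is included as one common way to compare how close two embeddings are.

```python
import math
import random

EMBED_DIM = 4  # hypothetical; real models use far larger vectors

def tokenize_text(text):
    """Naive word-level tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()

def build_embedding_table(vocab, dim=EMBED_DIM, seed=0):
    """Assign each token a vector of `dim` floats.

    Random values stand in for vectors a real model would learn in training.
    Sorting the vocabulary keeps the assignment reproducible across runs.
    """
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in sorted(vocab)}

def tokenize_image(pixels):
    """Treat each pixel (an RGB tuple) as one token, reading rows left to right."""
    return [px for row in pixels for px in row]

def embed_pixel(px):
    """Toy pixel embedding: scale each 0-255 channel into [0, 1]."""
    return [channel / 255.0 for channel in px]

def cosine_similarity(a, b):
    """Compare two vectors by angle: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Text: one vector per word token.
words = tokenize_text("The oak tree rustles")
table = build_embedding_table(set(words))
text_vectors = [table[w] for w in words]

# Image: a hypothetical 1x2 image (green, brown), one vector per pixel token.
image = [[(34, 139, 34), (139, 69, 19)]]
pixel_vectors = [embed_pixel(px) for px in tokenize_image(image)]
```

The pixel vectors here are hand-scaled color values rather than learned embeddings, and many modern image models embed small patches of pixels instead of single pixels; the point is only that each modality ends up as lists of numbers produced by its own tokenizing strategy.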

A multimodal AI model typically relies on several unimodal ones. As Henry Ajder, founder of AI consultancy Latent Space, puts it, this involves “almost stringing together” the various contributing models. Combining them requires techniques to align the elements of each unimodal model, a process called fusion. For example, the word “tree”, an image of an oak tree, and audio of rustling leaves might be fused in this way, allowing the model to build a multifaceted description of reality.
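Fusion can be done in many ways. One simple approach, assumed here purely for illustration and not attributed to any system mentioned in the article, is to project each modality's embedding into a shared space with a linear map and then combine the aligned vectors, either by averaging them or by concatenating them. The embedding values and projection matrices below are made up (real systems learn them during training), and names such as `project` and `fuse_mean` are hypothetical. The three inputs mirror the article's example: the word “tree”, an oak-tree image, and rustling-leaves audio.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

def project(vec: Vector, weights: Matrix) -> Vector:
    """Linear map: each output element is one weight row dotted with vec."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def fuse_mean(vectors: List[Vector]) -> Vector:
    """Fuse aligned vectors by element-wise averaging."""
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]

def fuse_concat(vectors: List[Vector]) -> Vector:
    """Fuse by concatenation: keep every modality's features side by side."""
    return [x for v in vectors for x in v]

# Hypothetical unimodal embeddings of different sizes for one concept.
unimodal: Dict[str, Vector] = {
    "text": [0.9, 0.1, 0.4],        # the word "tree"
    "image": [0.2, 0.8, 0.5, 0.3],  # a photo of an oak tree
    "audio": [0.6, 0.7],            # rustling leaves
}

# Hypothetical projection matrices into a shared 2-dimensional space.
projections: Dict[str, Matrix] = {
    "text": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "image": [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]],
    "audio": [[1.0, 0.0], [0.0, 1.0]],
}

# Align: every modality now has a vector of the same length.
aligned = {m: project(v, projections[m]) for m, v in unimodal.items()}

fused = fuse_mean(list(aligned.values()))          # one 2-number vector
stacked = fuse_concat(list(aligned.values()))      # one 6-number vector
```

Averaging keeps the fused vector the same size as each aligned input but blends the modalities' features together; concatenation preserves each modality's features separately at the cost of a longer vector.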

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

------------
By: MIT Technology Review Insights
Title: Multimodal: AI’s new frontier
Sourced From: www.technologyreview.com/2024/05/08/1092009/multimodal-ais-new-frontier/
Published Date: Wed, 08 May 2024 13:00:00 +0000
