Microsoft Unveils Multimodal AI Capabilities to the Masses With JARVIS

K.C. Sabreena Basheer Last Updated : 18 Apr, 2023

Microsoft has recently unveiled JARVIS, a multimodal AI-powered platform that connects multiple artificial intelligence models, such as ChatGPT and t5-base, and has them collaborate to deliver a final result. With a demo hosted on the popular AI platform Hugging Face, users can now explore and test JARVIS’s capabilities.

Microsoft introduced JARVIS, an AI-powered multimodal platform, on Hugging Face.

JARVIS: Microsoft’s Advanced AI System Taking AI Collaboration to the Next Level

Microsoft is currently working on an AI system called JARVIS that links multiple AI models and delivers a unified result. The project, hosted on GitHub, showcases Microsoft’s collaborative approach to developing AI solutions. ChatGPT acts as the task controller and directs the other models.

To see JARVIS’s capabilities firsthand, users can visit Hugging Face, where Microsoft hosts a demo of the system.


Multimodal AI Integration: The Future of AI Task Management

You can test Microsoft JARVIS on Hugging Face.

JARVIS extends the multimodal capabilities that OpenAI demonstrated with GPT-4 for text and image processing by incorporating various open-source models for images, video, audio, and more. It also connects to the internet, allowing access to files and data from various sources.

This approach lets users combine multiple tasks in a single query. For example, if a user asks JARVIS to create an image of an alien invasion and write poetry about it, ChatGPT analyzes the request, plans the tasks, selects an appropriate model (hosted on Hugging Face) for each one, and has that model execute it. Each chosen model completes its task and returns the results to ChatGPT.
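The flow described above (plan the tasks, select a model for each, execute, collect the results) can be sketched in a few lines of Python. This is only an illustrative sketch, not Microsoft's implementation: the names `Task`, `MODEL_REGISTRY`, `plan_tasks`, and `run_request` are hypothetical, the planner is simple keyword matching rather than ChatGPT, and the "models" are stubs that return placeholder strings instead of calling anything hosted on Hugging Face.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    kind: str    # e.g. "text-to-image" or "text-generation"
    prompt: str


# Hypothetical registry mapping a task kind to a stand-in for a hosted model.
# Real models would be called over the network; these stubs only return text.
MODEL_REGISTRY: Dict[str, Callable[[str], str]] = {
    "text-to-image": lambda prompt: f"<image of {prompt}>",
    "text-generation": lambda prompt: f"<poem about {prompt}>",
}


def plan_tasks(request: str) -> List[Task]:
    """Stand-in for the controller's planning step (keyword matching only)."""
    subject = "an alien invasion"  # hard-coded for this illustration
    lowered = request.lower()
    tasks = []
    if "image" in lowered:
        tasks.append(Task("text-to-image", subject))
    if "poe" in lowered:  # matches both "poem" and "poetry"
        tasks.append(Task("text-generation", subject))
    return tasks


def run_request(request: str) -> List[str]:
    """Plan the request, pick a model per task, run it, and collect results."""
    results = []
    for task in plan_tasks(request):
        model = MODEL_REGISTRY[task.kind]   # model selection
        results.append(model(task.prompt))  # task execution
    return results  # handed back to the controller for the final answer


if __name__ == "__main__":
    print(run_request("Create an image of an alien invasion and write poetry about it"))
```

The registry lookup is the part that mirrors the article's description most closely: the controller never does the image or text work itself, it only decides which model should.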


A Network of 20 Powerful Models Linked to JARVIS

JARVIS, also known as HuggingGPT, is connected to as many as 20 different models, including t5-base, stable-diffusion 1.5, bert, Facebook’s bart-large-cnn, and Intel’s dpt-large. Users interested in trying these multimodal capabilities can check out the Microsoft JARVIS demo.

While JARVIS has been tested multiple times and shown to perform exceptionally well, it requires a significant amount of resources, including at least 16GB of VRAM and around 300GB of storage space for various models. Consequently, JARVIS cannot be run locally on an average PC.
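To make those figures concrete, here is a minimal sketch of a pre-flight check against the two thresholds quoted above. The function names are hypothetical, the VRAM value is passed in by hand (detecting it would need a GPU library, which is left out here), and free disk space is read with the standard library's `shutil.disk_usage`.

```python
import shutil

MIN_VRAM_GB = 16       # figure quoted in the article
MIN_STORAGE_GB = 300   # approximate figure quoted in the article


def free_disk_gb(path: str = ".") -> float:
    """Free space on the drive holding `path`, in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def meets_requirements(vram_gb: float, storage_gb: float) -> bool:
    """True only if both the VRAM and the storage thresholds are met."""
    return vram_gb >= MIN_VRAM_GB and storage_gb >= MIN_STORAGE_GB


if __name__ == "__main__":
    # 8 GB of VRAM is an assumed value for a typical consumer GPU.
    print(meets_requirements(vram_gb=8, storage_gb=free_disk_gb()))
```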

JARVIS can simultaneously connect and collaborate with 20 different AI models, such as ChatGPT and t5-base.

Hugging Face Queue and Subscription Requirements

At present, users cannot clone JARVIS on Hugging Face under a free account to bypass the demo’s queue. Running the model on an Nvidia A10G large GPU, which costs $3.15 per hour, requires a paid Hugging Face subscription.
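As a rough illustration of what that hourly rate adds up to, the sketch below multiplies the $3.15 per hour figure by a number of hours. The function name and the eight-hour session are assumptions chosen for the example, and actual billing may be metered differently.

```python
HOURLY_RATE_USD = 3.15  # rate quoted in the article


def estimate_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Estimated cost in US dollars, rounded to the nearest cent."""
    return round(hours * rate, 2)


if __name__ == "__main__":
    print(estimate_cost(8))  # an assumed eight-hour session
```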

Despite these limitations, Microsoft’s JARVIS project marks a significant step in the advancement of AI systems, bringing the power of multimodal AI capabilities and collaboration to the masses. Its potential to revolutionize the way we interact with and utilize AI technology is undeniable, and its development will undoubtedly continue to push the boundaries of what is possible in the field of artificial intelligence.

Sabreena Basheer is an architect-turned-writer who's passionate about documenting anything that interests her. She's currently exploring the world of AI and Data Science as a Content Manager at Analytics Vidhya.
