Chitrarth-1: A Multilingual Vision Language Model by Krutrim AI Labs

Nitika Sharma Last Updated : 02 Mar, 2025


India’s artificial intelligence ecosystem is growing steadily, and Krutrim AI Labs, a part of the Ola Group, is one of the organizations contributing to that growth. Krutrim recently introduced Chitrarth-1, a Vision Language Model (VLM) developed specifically for India’s diverse linguistic and cultural landscape. The model supports 10 major Indian languages, including Hindi, Tamil, Bengali, and Telugu, along with English. This article explores Chitrarth-1 and India’s expanding capabilities in AI.

What is Chitrarth?
Chitrarth Architecture and Parameters
Training Data and Methodology
- Stage 1: Adapter Pre-Training (PT)
- Stage 2: Instruction Tuning (IT)
Performance and Evaluation
How to Access Chitrarth?
Chitrarth-1 Examples
End Note

What is Chitrarth?

Chitrarth (derived from Chitra: Image and Artha: Meaning) is a 7.5 billion-parameter VLM that combines cutting-edge language and vision capabilities. Developed to serve India’s linguistic diversity, it supports 10 prominent Indian languages – Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, and Assamese – alongside English.

This model is a testament to Krutrim’s mission: creating AI “for our country, of our country, and for our citizens.”

By training on a culturally rich, multilingual dataset, Chitrarth aims to reduce bias, improve accessibility, and perform robustly across Indic languages and English. It is a step toward more equitable AI, making the technology more inclusive and representative for users in India and beyond.

Research behind Chitrarth-1 has been featured in prominent academic papers like “Chitrarth: Bridging Vision and Language for a Billion People” (NeurIPS) and “Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation” (Ninth Conference on Machine Translation).


Chitrarth Architecture and Parameters

Chitrarth builds on the Krutrim-7B LLM as its backbone, augmented by a vision encoder based on the SigLIP (siglip-so400m-patch14-384) model. Its architecture includes:

- A pretrained SigLIP vision encoder that extracts image features.
- A trainable linear mapping layer that projects these features into the LLM’s token space.
- Fine-tuning on instruction-following image-text datasets for stronger multimodal performance.

Because the projected image features live in the same space as the LLM’s token embeddings, the model can process visual and textual inputs together, which is what allows Chitrarth to handle complex multimodal reasoning tasks.
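The projection step can be illustrated with a toy sketch. This is not Krutrim’s implementation: it uses plain Python lists, tiny made-up dimensions, and hypothetical names (`linear_project`, `build_multimodal_sequence`) only to show how a linear layer can map vision features to the LLM’s embedding width so that image and text tokens share one input sequence. Placing the image tokens before the text tokens is also an assumption made for simplicity.

```python
import random

# Toy sizes chosen for readability; real encoder and LLM widths are far larger.
VISION_DIM = 4  # width of each image patch feature (assumed)
LLM_DIM = 6     # width of each LLM token embedding (assumed)


def linear_project(features, weight, bias):
    """Apply y = x @ W + b to every row of `features`."""
    projected = []
    for row in features:
        projected.append([
            sum(x * weight[i][j] for i, x in enumerate(row)) + bias[j]
            for j in range(len(bias))
        ])
    return projected


def build_multimodal_sequence(image_features, text_embeddings, weight, bias):
    """Project image features, then concatenate them with the text embeddings.

    Putting image tokens first is an illustrative choice, not a documented one.
    """
    image_tokens = linear_project(image_features, weight, bias)
    return image_tokens + text_embeddings


# Random stand-ins for encoder output, projection weights, and text embeddings.
rng = random.Random(0)
weight = [[rng.uniform(-1, 1) for _ in range(LLM_DIM)] for _ in range(VISION_DIM)]
bias = [0.0] * LLM_DIM
image_features = [[rng.uniform(-1, 1) for _ in range(VISION_DIM)] for _ in range(3)]
text_embeddings = [[rng.uniform(-1, 1) for _ in range(LLM_DIM)] for _ in range(2)]

sequence = build_multimodal_sequence(image_features, text_embeddings, weight, bias)
print(len(sequence), "tokens, each of width", len(sequence[0]))
```

In this sketch only `weight` and `bias` would be trained during adapter pre-training, which mirrors the article’s description of the mapping layer as the trainable piece between a pretrained encoder and the LLM.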

Training Data and Methodology

Chitrarth’s training process unfolds in two stages, utilizing a diverse, multilingual dataset:

Stage 1: Adapter Pre-Training (PT)

- The adapter is pre-trained on a carefully selected dataset that is translated into multiple Indic languages using an open-source model.
- The data maintains a balanced split between English and Indic languages to support linguistic diversity and equitable performance.
- This balance helps prevent bias toward any single language while keeping training computationally efficient.
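One way to picture the balanced split is the sketch below. The article does not state the exact ratio or sampling scheme, so the 50/50 English-to-Indic split, the round-robin assignment, and the function name `assign_languages` are all assumptions for illustration; the translation step itself is omitted.

```python
from collections import Counter

# The ten Indic languages the article lists as supported.
INDIC_LANGUAGES = [
    "Hindi", "Bengali", "Telugu", "Tamil", "Marathi",
    "Gujarati", "Kannada", "Malayalam", "Odia", "Assamese",
]


def assign_languages(num_samples, english_fraction=0.5):
    """Return a target language for each of `num_samples` training samples.

    `english_fraction` is an assumed knob; the article only says the split
    between English and Indic languages is balanced.
    """
    num_english = round(num_samples * english_fraction)
    plan = ["English"] * num_english
    # Cycle through the Indic languages so each gets a near-equal share.
    for i in range(num_samples - num_english):
        plan.append(INDIC_LANGUAGES[i % len(INDIC_LANGUAGES)])
    return plan


plan = assign_languages(40)
print(Counter(plan))
```

Each sample assigned a non-English language would then be passed through the translation model before being used for adapter pre-training.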

Stage 2: Instruction Tuning (IT)

- The model is fine-tuned on a complex instruction dataset to boost multimodal reasoning.
- The data incorporates an English-based instruction-tuning dataset and its multilingual translations.
- It includes a vision-language dataset with academic tasks and culturally diverse Indian imagery, such as:
  - Prominent personalities
  - Monuments
  - Artwork
  - Culinary dishes
- It also features high-quality proprietary English text data, ensuring balanced representation across domains.

This two-step process equips Chitrarth to handle sophisticated multimodal tasks with cultural and linguistic nuance.


Performance and Evaluation

Chitrarth has been evaluated against state-of-the-art VLMs such as IDEFICS 2 (7B) and PALO 7B, outperforming them on various benchmarks while remaining competitive on tasks like TextVQA and VizWiz. It also surpasses LLaMA 3.2 11B Vision Instruct in key metrics.

BharatBench: A New Standard

Krutrim introduces BharatBench, a comprehensive evaluation suite for 10 under-resourced Indic languages across three tasks. Chitrarth’s performance on BharatBench sets a baseline for future research, showcasing its unique ability to handle all included languages. Below are sample results:

Language	POPE	LLaVA-Bench	MMVet
Telugu	79.9	54.8	43.76
Hindi	78.68	51.5	38.85
Bengali	83.24	53.7	33.24
Malayalam	85.29	55.5	25.36
Kannada	85.52	58.1	46.19
English	87.63	67.9	30.49
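To compare the sample results at a glance, the scores can be summarized per benchmark. The sketch below copies the rows from the table above and computes the mean score and the top-scoring language for each column; the helper names `load_scores` and `summarize` are illustrative, and the summary covers only the six languages shown here, not the full BharatBench suite.

```python
import csv
import io
from statistics import mean

# Scores copied from the sample table above (tab-separated).
TABLE = """Language\tPOPE\tLLaVA-Bench\tMMVet
Telugu\t79.9\t54.8\t43.76
Hindi\t78.68\t51.5\t38.85
Bengali\t83.24\t53.7\t33.24
Malayalam\t85.29\t55.5\t25.36
Kannada\t85.52\t58.1\t46.19
English\t87.63\t67.9\t30.49
"""


def load_scores(text):
    """Parse the tab-separated table into a list of dicts with float scores."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        parsed = {"Language": row["Language"]}
        for name, value in row.items():
            if name != "Language":
                parsed[name] = float(value)
        rows.append(parsed)
    return rows


def summarize(rows):
    """Return the mean score and best-scoring language for each benchmark."""
    benchmarks = [name for name in rows[0] if name != "Language"]
    summary = {}
    for name in benchmarks:
        best = max(rows, key=lambda r: r[name])
        summary[name] = {
            "mean": mean(r[name] for r in rows),
            "best": best["Language"],
        }
    return summary


for name, stats in summarize(load_scores(TABLE)).items():
    print(f"{name}: mean={stats['mean']:.2f}, best={stats['best']}")
```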


How to Access Chitrarth?

Hugging Face: Available for direct use or fine-tuning under the model ID krutrim-ai-labs/Chitrarth.
GitHub:

git clone https://github.com/ola-krutrim/Chitrarth.git  
conda create --name chitrarth python=3.10  
conda activate chitrarth  
cd Chitrarth  
pip install -e .  
python chitrarth/inference.py --model-path "krutrim-ai-labs/Chitrarth" --image-file "assets/govt_school.jpeg" --query "Explain the image."

Krutrim Cloud: Also available on the Krutrim Cloud platform.
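For scripted use, the GitHub inference command can be wrapped in a small Python helper. This is a minimal sketch that assumes the repository has been cloned and installed as shown in the GitHub steps, and that it is run with the repository as the working directory. The helper names `build_inference_command` and `run_inference` are hypothetical; only the script path and the `--model-path`, `--image-file`, and `--query` flags come from the command in this article.

```python
import subprocess
import sys


def build_inference_command(image_file, query,
                            model_path="krutrim-ai-labs/Chitrarth",
                            script="chitrarth/inference.py"):
    """Build the argument list for the repository's inference script."""
    return [
        sys.executable, script,
        "--model-path", model_path,
        "--image-file", str(image_file),
        "--query", query,
    ]


def run_inference(image_file, query, repo_dir="Chitrarth"):
    """Run the inference script inside the cloned repo and return its stdout.

    `repo_dir` is an assumed location for the clone; adjust it as needed.
    """
    cmd = build_inference_command(image_file, query)
    result = subprocess.run(cmd, cwd=repo_dir, capture_output=True,
                            text=True, check=True)
    return result.stdout


# Show the command without executing it.
print(" ".join(build_inference_command("assets/govt_school.jpeg",
                                       "Explain the image.")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a query contains spaces or non-Latin script, which is likely for Indic-language prompts.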

Chitrarth-1 Examples

1. Image Analysis

2. Image Caption Generation

3. UI/UX Screen Analysis


End Note

A part of the Ola Group, Krutrim is dedicated to creating the AI computing stack of tomorrow. Alongside Chitrarth, its offerings include GPU as a Service, AI Studio, Ola Maps, Krutrim Assistant, Language Labs, Krutrim Silicon, and Contact Center AI. With Chitrarth-1, Krutrim AI Labs sets a new standard for inclusive, culturally aware AI, paving the way for a more equitable technological future.
