India is making steady progress in artificial intelligence, and Krutrim AI Labs, a part of the Ola Group, is one of the organizations contributing to it. Krutrim recently introduced Chitrarth-1, a Vision Language Model (VLM) developed specifically for India’s diverse linguistic and cultural landscape. The model supports 10 major Indian languages, including Hindi, Tamil, Bengali, and Telugu, along with English. This article explores Chitrarth-1 and India’s expanding capabilities in AI.
Chitrarth (derived from Chitra: Image and Artha: Meaning) is a 7.5 billion-parameter VLM that combines cutting-edge language and vision capabilities. Developed to serve India’s linguistic diversity, it supports 10 prominent Indian languages – Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, and Assamese – alongside English.
This model is a testament to Krutrim’s mission: creating AI “for our country, of our country, and for our citizens.”
Trained on a culturally rich, multilingual dataset, Chitrarth aims to reduce bias, improve accessibility, and perform robustly across Indic languages and English. It is a step toward more equitable AI, making the technology more inclusive and representative for users in India and beyond.
Research behind Chitrarth-1 has been featured in prominent academic papers like “Chitrarth: Bridging Vision and Language for a Billion People” (NeurIPS) and “Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation” (Ninth Conference on Machine Translation).
Chitrarth builds on the Krutrim-7B LLM as its backbone, augmented by a vision encoder based on the SigLIP (siglip-so400m-patch14-384) model.
This design ensures seamless integration of visual and linguistic data, enabling Chitrarth to excel in complex reasoning tasks.
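The checkpoint name of the vision encoder already says something about how an image is presented to the language model. As a rough illustration, the sketch below reads the patch size (14) and input resolution (384) from the name `siglip-so400m-patch14-384` and counts the resulting image patches. It assumes plain non-overlapping square patches with no extra class token; whether Chitrarth forwards every patch to Krutrim-7B unchanged is not stated here, so treat the count as a property of the encoder, not of the full model. The helper names are hypothetical.

```python
import re


def parse_siglip_name(name: str) -> tuple[int, int]:
    """Extract (patch_size, image_size) from a name like 'siglip-so400m-patch14-384'."""
    match = re.search(r"patch(\d+)-(\d+)$", name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    return int(match.group(1)), int(match.group(2))


def num_patch_tokens(patch_size: int, image_size: int) -> int:
    """Count non-overlapping square patches; leftover border pixels are dropped."""
    per_side = image_size // patch_size
    return per_side * per_side


patch, size = parse_siglip_name("siglip-so400m-patch14-384")
print(patch, size, num_patch_tokens(patch, size))  # prints: 14 384 729
```

In other words, a 384×384 input is cut into a 27×27 grid, so the encoder produces on the order of several hundred visual tokens per image before they reach the language backbone.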
Chitrarth is trained in two stages on a diverse, multilingual dataset.
This two-step process equips Chitrarth to handle sophisticated multimodal tasks with cultural and linguistic nuance.
Chitrarth has been evaluated against state-of-the-art VLMs such as IDEFICS 2 (7B) and PALO 7B, consistently outperforming them on various benchmarks while remaining competitive on tasks such as TextVQA and VizWiz. It also surpasses Llama 3.2 11B Vision Instruct on key metrics.
Krutrim introduces BharatBench, a comprehensive evaluation suite for 10 under-resourced Indic languages across three tasks. Chitrarth’s performance on BharatBench sets a baseline for future research, showcasing its unique ability to handle all included languages. Below are sample results:
Language | POPE | LLaVA-Bench | MMVet |
---|---|---|---|
Telugu | 79.9 | 54.8 | 43.76 |
Hindi | 78.68 | 51.5 | 38.85 |
Bengali | 83.24 | 53.7 | 33.24 |
Malayalam | 85.29 | 55.5 | 25.36 |
Kannada | 85.52 | 58.1 | 46.19 |
English | 87.63 | 67.9 | 30.49 |
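To make the sample numbers easier to compare, the following sketch loads the six rows shown above into a dictionary and reports the best-scoring language and the unweighted mean for each benchmark. The values are copied from the table; the helper names are hypothetical, and the means cover only these sample rows, so they are not an official BharatBench aggregate.

```python
# Scores copied from the sample BharatBench table above.
SCORES = {
    "Telugu": {"POPE": 79.9, "LLaVA-Bench": 54.8, "MMVet": 43.76},
    "Hindi": {"POPE": 78.68, "LLaVA-Bench": 51.5, "MMVet": 38.85},
    "Bengali": {"POPE": 83.24, "LLaVA-Bench": 53.7, "MMVet": 33.24},
    "Malayalam": {"POPE": 85.29, "LLaVA-Bench": 55.5, "MMVet": 25.36},
    "Kannada": {"POPE": 85.52, "LLaVA-Bench": 58.1, "MMVet": 46.19},
    "English": {"POPE": 87.63, "LLaVA-Bench": 67.9, "MMVet": 30.49},
}


def best_language(benchmark: str) -> str:
    """Return the language with the highest score on the given benchmark."""
    return max(SCORES, key=lambda lang: SCORES[lang][benchmark])


def mean_score(benchmark: str) -> float:
    """Unweighted mean over the sample rows only (not an official aggregate)."""
    values = [row[benchmark] for row in SCORES.values()]
    return sum(values) / len(values)


for benchmark in ("POPE", "LLaVA-Bench", "MMVet"):
    print(f"{benchmark}: best={best_language(benchmark)}, mean={mean_score(benchmark):.2f}")
```

On these rows, English leads on POPE and LLaVA-Bench, while Kannada has the highest MMVet score.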
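To try Chitrarth locally, clone the repository, set up a conda environment, install the package, and run the inference script on a sample image:

```shell
# Clone the repository
git clone https://github.com/ola-krutrim/Chitrarth.git
# Create and activate a Python 3.10 environment
conda create --name chitrarth python=3.10
conda activate chitrarth
# Install the package in editable mode from the repository root
cd Chitrarth
pip install -e .
# Ask the model to describe a sample image
python chitrarth/inference.py --model-path "krutrim-ai-labs/Chitrarth" --image-file "assets/govt_school.jpeg" --query "Explain the image."
```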
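When trying several images or prompts, it can be convenient to assemble the inference command programmatically. The sketch below is one possible wrapper, not part of the Chitrarth repository: it uses only the script path and the three flags shown in the command above, and the function names are hypothetical. Actually running it assumes the current directory is the cloned repository with the conda environment active.

```python
import shlex
import subprocess
import sys

# Values taken from the command shown above.
MODEL_PATH = "krutrim-ai-labs/Chitrarth"
SCRIPT = "chitrarth/inference.py"


def build_inference_command(image_file: str, query: str,
                            model_path: str = MODEL_PATH) -> list[str]:
    """Build the argument list for one call to the inference script."""
    return [
        sys.executable, SCRIPT,
        "--model-path", model_path,
        "--image-file", image_file,
        "--query", query,
    ]


def run_inference(image_file: str, query: str) -> str:
    """Run the script and return its standard output (needs the repo set up)."""
    result = subprocess.run(
        build_inference_command(image_file, query),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so this works as a dry run.
    command = build_inference_command("assets/govt_school.jpeg", "Explain the image.")
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems, which matters once queries contain quotation marks or text in Indic scripts.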
1. Image Analysis
2. Image Caption Generation
3. UI/UX Screen Analysis
A part of the Ola Group, Krutrim is dedicated to creating the AI computing stack of tomorrow. Alongside Chitrarth, its offerings include GPU as a Service, AI Studio, Ola Maps, Krutrim Assistant, Language Labs, Krutrim Silicon, and Contact Center AI. With Chitrarth-1, Krutrim AI Labs sets a new standard for inclusive, culturally aware AI, paving the way for a more equitable technological future.