This year, large language models (LLMs) like OpenAI’s o1 have dominated the headlines, showcasing their remarkable capabilities in natural language understanding and generation. However, not every application requires the immense computational power or the hefty size of these behemoths. Enter small language models — compact, efficient, and tailored solutions for tasks that demand high performance on a budget of computational resources.
Small language models are designed to strike a balance between capability and efficiency. By optimizing model size and architecture, they offer lightweight solutions ideal for edge devices, resource-constrained environments, or applications requiring faster inference. From powering mobile applications to providing offline NLP functionalities, these models are reshaping the landscape of AI by making advanced language technologies more accessible.
In this blog, we’ll explore the top 13 small language models that deliver impressive results while staying compact. Whether you’re a developer looking for lightweight solutions or a researcher exploring efficient NLP, this list highlights models that prove that bigger isn’t always better. Let’s dive in and discover how small models are making a big impact!
A small language model is a type of AI system designed to understand and generate human-like text, but with limited size and complexity compared to larger models. These models have fewer parameters, which reduces their computational requirements, making them faster and more cost-effective to use.
While small language models may lack the nuanced reasoning or broader contextual understanding of larger models, they are highly efficient for focused tasks such as text classification, chatbots, or summarization. They are particularly useful in scenarios where memory, processing power, or energy consumption is a concern, such as mobile applications or embedded systems.
Their smaller size can also make them easier to fine-tune for specific tasks or integrate into constrained environments. However, their performance may degrade when tasked with understanding complex queries or generating highly detailed and coherent responses.
If you want to know about Small Language Models in more detail, here is a resource for you: What are Small Language Models (SLMs)?
Let us now look at the top 13 small language models.
Llama 3.2 is a compact yet powerful language model designed to cater to various natural language processing tasks while maintaining efficiency and adaptability. This model is part of the Llama series, which emphasizes high performance combined with resource efficiency, making it suitable for applications requiring lower computational overhead without sacrificing accuracy.
Llama 3.2 comes in several parameter configurations, allowing users to select the version that best meets their needs. These range from lightweight 1 billion and 3 billion parameter versions for mobile and edge deployments to larger vision-capable variants (11 billion and 90 billion parameters) for server-side applications. This scalability ensures the family can handle tasks of varying complexity while remaining efficient.
The Llama 3.2 architecture begins with token embeddings and employs Grouped Query Attention, incorporating Rotary Positional Embedding (RoPE) for enhanced context encoding. RMS normalization is applied before the attention and feed-forward operations, stabilizing learning. The feed-forward networks use SwiGLU activations for efficient non-linear transformations. The architecture stacks multiple such layers (repeated N times), concluding with a final RMS norm, a linear layer, and a softmax over output probabilities. This streamlined design balances computational efficiency with state-of-the-art performance, optimized for large-scale language modeling tasks.
Llama 3.2 is an open-source language model, making it accessible to a wide audience. It includes a free tier that allows users to experiment with its capabilities without incurring costs. Additionally, it offers extended features and enterprise-level support through paid licensing, catering to both individual developers and organizations.
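As a quick illustration of that open access, here is a minimal sketch of running the 1B instruct variant with the Hugging Face Transformers library. It assumes you have accepted Meta's license for the gated meta-llama/Llama-3.2-1B-Instruct checkpoint, have transformers and torch installed, and are using a recent transformers version that applies chat templates to message lists.

```python
# Minimal sketch: run Llama 3.2 1B Instruct through the text-generation pipeline.
# Assumes access to the gated meta-llama/Llama-3.2-1B-Instruct checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")

# Recent transformers versions apply the model's chat template to message lists.
messages = [{"role": "user", "content": "Explain small language models in two sentences."}]
result = generator(messages, max_new_tokens=80)
print(result[0]["generated_text"])
```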
Also Read: 3 Ways to Run Llama 3.2 on Your Device
Microsoft Phi 3.5 Mini is a compact version of the Phi language model series developed by Microsoft. Designed to balance efficiency and performance, it caters to scenarios requiring robust natural language understanding with limited computational resources. The model is part of Microsoft’s ongoing efforts to create versatile AI systems optimized for a wide range of applications, including chatbots, summarization, and code generation.
Phi 3.5 Mini has roughly 3.8 billion parameters, keeping it small enough for lightweight deployment while still offering strong accuracy and contextual depth. It sits at the compact end of the Phi 3.5 family, which also includes larger mixture-of-experts and vision variants for applications demanding more capability. This range makes the Phi 3.5 series a flexible choice for users with different resource constraints and performance requirements.
The model architecture builds upon the Transformer framework, incorporating innovations from the Phi series. It features advanced attention mechanisms optimized for computational efficiency and memory usage. Researchers have employed techniques like layer sparsification and dynamic token reduction to enhance processing speed while maintaining the model’s ability to generate coherent and contextually relevant outputs. These enhancements make Phi 3.5 Mini well-suited for real-time applications.
Microsoft has released Phi 3.5 Mini as an open model, with weights available on the Hugging Face Hub under a permissive license, and it is also integrated into Microsoft’s Azure AI services. This makes it easy for developers and researchers to explore its capabilities, while Azure subscription plans provide scalability and support for enterprise-grade deployments.
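As a quick usage sketch, the weights can be loaded with the Transformers library. The microsoft/Phi-3.5-mini-instruct repository name is an assumption based on the checkpoint published on the Hugging Face Hub, so verify it before use; a GPU is recommended for reasonable latency.

```python
# Minimal sketch: run Phi 3.5 Mini through the text-generation pipeline.
# The microsoft/Phi-3.5-mini-instruct repo id is an assumption; verify on the Hub.
from transformers import pipeline

chat = pipeline("text-generation", model="microsoft/Phi-3.5-mini-instruct")
messages = [{"role": "user", "content": "Write a one-line summary of knowledge distillation."}]
print(chat(messages, max_new_tokens=60)[0]["generated_text"])
```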
The T5 (Text-To-Text Transfer Transformer) model is a versatile language model introduced by Google Research. It is designed with a unified framework where all NLP tasks are framed as a text-to-text problem. This approach enables the model to handle a variety of tasks, such as translation, summarization, and question-answering, using a single architecture and training process.
T5 is available in various sizes, ranging from small to extra-large configurations. The smaller versions include models like T5-Small with 60 million parameters and T5-Base with 220 million parameters. Larger configurations, such as T5-Large and T5-3B, offer 770 million and 3 billion parameters, respectively, while T5-11B, the largest variant, boasts 11 billion parameters. This scalability allows T5 to cater to both resource-constrained environments and high-performance tasks.
The architecture of T5 is based on the Transformer model, utilizing both encoder and decoder components. Its design emphasizes flexibility: the input and output of any task are reframed as text sequences, allowing T5 to excel when fine-tuned for diverse NLP applications. The model is pre-trained on a large, diverse dataset using objectives such as a modified span-based corruption task, which enhances its understanding of language and context.
T5 is open-source and freely available to the research and developer community under the Apache 2.0 license. Its implementation and pre-trained weights can be accessed through platforms like TensorFlow and Hugging Face’s Transformers library. This open access has facilitated widespread experimentation and adoption in the NLP domain.
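To show the text-to-text framing in practice, here is a minimal sketch using the openly available t5-small checkpoint through the Transformers library; the task prefix in the input string is what selects the behavior.

```python
# Minimal sketch: T5 treats every task as text-to-text, so a task prefix
# selects the behavior. Uses the 60M-parameter t5-small checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is small.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```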
Qwen-2 is a small language model designed to provide efficient natural language processing capabilities with a focus on computational resource optimization. Developed by Alibaba Cloud’s Qwen team, it demonstrates strong capabilities across text generation, classification, summarization, and other NLP tasks, making it suitable for applications in diverse domains. Its modular architecture and lightweight design make it ideal for developers seeking performance on constrained hardware.
Qwen-2 is available in multiple parameter configurations to cater to varied use cases. The smallest versions, with roughly 0.5 billion and 1.5 billion parameters, are optimized for edge devices and environments with limited computational power. A mid-sized variant with 7 billion parameters offers a balance between performance and resource requirements for more demanding applications. At the upper end, much larger variants (up to 72 billion parameters) target applications requiring higher accuracy and complex task handling, competing with larger language models while maintaining efficiency.
The architecture of Qwen-2 is based on an advanced Transformer design, employing multi-head self-attention and feed-forward neural networks. It incorporates optimizations such as rotary positional embeddings and RMSNorm-based pre-normalization to enhance both inference speed and training stability. The architecture is highly modular, enabling scalability and compatibility with a range of pretraining and fine-tuning frameworks. These features ensure Qwen-2’s robustness and adaptability in real-world deployments.
Qwen-2 is open-source and freely available for use, with certain advanced features accessible through a subscription-based tier. This ensures that developers and organizations of all scales can access and integrate the model into their projects.
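As a quick usage sketch, the instruct-tuned checkpoints published under the Qwen organization on Hugging Face can be driven through the standard text-generation pipeline; the example below assumes the Qwen/Qwen2-1.5B-Instruct checkpoint and a recent transformers version that applies chat templates automatically.

```python
# Minimal sketch: chat with a small Qwen2 instruct checkpoint.
# Assumes Qwen/Qwen2-1.5B-Instruct is accessible on the Hugging Face Hub.
from transformers import pipeline

chat = pipeline("text-generation", model="Qwen/Qwen2-1.5B-Instruct")
messages = [{"role": "user", "content": "Give three use cases for small language models."}]
print(chat(messages, max_new_tokens=120)[0]["generated_text"])
```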
DistilBERT is a smaller, faster, and lighter version of the widely popular BERT (Bidirectional Encoder Representations from Transformers) model. Developed by Hugging Face, DistilBERT retains much of BERT’s performance while being more computationally efficient. It achieves this by leveraging a process called knowledge distillation, wherein a smaller “student” model learns to mimic the behavior of a larger “teacher” model. The result is a model that is significantly smaller yet delivers comparable results on various natural language processing tasks.
DistilBERT reduces the size of BERT by 40% while retaining 97% of its language understanding capabilities. The standard version of DistilBERT has approximately 66 million parameters compared to BERT-base’s 110 million. This reduction in size makes it highly suitable for applications requiring low-latency inference or deployment on resource-constrained devices. There are no additional variations with different sizes within DistilBERT itself, but it serves as a midpoint between compact and full-scale transformer models.
DistilBERT retains the Transformer architecture but simplifies it by halving the number of layers: it has six Transformer layers compared to the twelve in BERT-base, each consisting of a multi-head self-attention mechanism followed by a feed-forward network. The model uses positional embeddings to encode word order and layer normalization to stabilize training, and it benefits from techniques such as dynamic masking, which improves generalization during pretraining. Despite having fewer layers, it achieves competitive performance by being pretrained on the same corpus as BERT with a combination of language-modeling and distillation objectives.
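To make the distillation objective concrete, the sketch below shows a generic soft-target loss of the kind used to train student models like DistilBERT. It is an illustration of the technique, not the exact DistilBERT training code, and the temperature and weighting values are arbitrary placeholders.

```python
# Illustrative sketch of a knowledge-distillation loss (not the exact
# DistilBERT training code): the student matches the teacher's softened
# output distribution in addition to the ordinary hard-label loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Tiny usage example with random tensors (batch of 4, 10 classes).
s = torch.randn(4, 10)   # student logits
t = torch.randn(4, 10)   # teacher logits
y = torch.randint(0, 10, (4,))
print(distillation_loss(s, t, y).item())
```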
DistilBERT is open-source and freely available on platforms like Hugging Face’s Transformers library. It supports various tasks, such as text classification, question answering, and named entity recognition, without the need for extensive computational resources, making it accessible to developers and researchers alike.
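For example, a minimal masked-word prediction call with the distilbert-base-uncased checkpoint looks like this, assuming transformers is installed:

```python
# Minimal sketch: masked-word prediction with distilbert-base-uncased
# via the fill-mask pipeline.
from transformers import pipeline

fill = pipeline("fill-mask", model="distilbert-base-uncased")
for candidate in fill("Small language models are [MASK] to deploy on mobile devices."):
    print(candidate["token_str"], round(candidate["score"], 3))
```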
Gemma 2 is a family of small, open language models from Google designed for efficient natural language understanding and generation tasks. Tailored for applications requiring lower computational resources, Gemma 2 balances accuracy and speed, making it suitable for use cases such as chatbots, content summarization, and interactive tools. Despite its smaller size compared to large-scale models, it achieves competitive performance through optimized training and architecture.
Gemma 2 is available in multiple parameter sizes, catering to a range of computational and application needs. The smallest variant, with 2 billion parameters, is designed for lightweight tasks and on-device or edge deployments. A mid-range version with 9 billion parameters offers higher accuracy while still maintaining efficiency, and the largest configuration, at 27 billion parameters, provides more robust understanding and generation capability for moderately complex NLP tasks while remaining far more manageable than frontier-scale models in terms of hardware requirements.
The architecture of Gemma 2 is a transformer-based model, following the attention mechanism that has become a cornerstone of modern NLP. It employs a streamlined version of the transformer block to reduce computational overhead. Innovations such as dynamic attention heads and layer normalization enhancements improve both speed and model accuracy. The smaller parameter variants use fewer layers and reduced embedding dimensions, allowing for rapid inference on devices with limited resources. These adaptations make Gemma 2 an optimal choice for deploying high-performing models in resource-constrained environments.
Gemma 2 is open-source, with a permissive license that encourages community contributions and customization. Additionally, a free tier is offered for experimentation and integration into personal projects, making it accessible to developers and researchers. For enterprise use, premium options with extended support are available.
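A minimal usage sketch with the smallest instruct variant might look like the following; it assumes you have accepted Google's Gemma terms on Hugging Face, are logged in with an access token, have enough memory for the google/gemma-2-2b-it weights, and are on a recent transformers version.

```python
# Minimal sketch: generate text with the smallest Gemma 2 instruct variant.
# Assumes the gated google/gemma-2-2b-it checkpoint is accessible to your account.
from transformers import pipeline

gen = pipeline("text-generation", model="google/gemma-2-2b-it")
messages = [{"role": "user", "content": "Explain knowledge distillation in one paragraph."}]
print(gen(messages, max_new_tokens=150)[0]["generated_text"])
```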
TinyBERT is a distilled version of BERT (Bidirectional Encoder Representations from Transformers), designed to reduce the computational complexity and memory footprint of the original BERT model while retaining comparable performance. Developed with knowledge distillation techniques, TinyBERT compresses the knowledge of larger BERT models into a smaller form, making it suitable for resource-constrained environments like mobile devices and edge computing. The model is particularly useful for natural language understanding tasks, including sentiment analysis, question answering, and text classification.
TinyBERT is available in multiple configurations to balance model size and performance. The smallest version consists of 4 transformer layers, each with 312 hidden units, amounting to approximately 14 million parameters. This configuration is ideal for lightweight applications with stringent memory and computational limitations. A slightly larger variant, with 6 transformer layers and 768 hidden units, contains about 66 million parameters, offering improved accuracy while remaining significantly smaller than the original BERT, which has 110 million parameters.
The architecture of TinyBERT closely mirrors the transformer-based design of the original BERT, albeit with fewer layers and reduced dimensions for efficiency. Each transformer layer in TinyBERT consists of a multi-head self-attention mechanism, followed by a feed-forward neural network with layer normalization and residual connections. Knowledge distillation ensures that the smaller model inherits knowledge from the teacher model (typically BERT), focusing on mimicking the teacher’s predictions, intermediate representations, and attention distributions. This allows TinyBERT to achieve strong performance relative to its compact size.
TinyBERT is open-source and freely available under the Apache License 2.0. It can be accessed and integrated into workflows via platforms like Hugging Face Transformers, ensuring accessibility for developers and researchers without licensing constraints.
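As an illustration, the sketch below encodes a sentence with a 4-layer general-distillation checkpoint. The huawei-noah/TinyBERT_General_4L_312D repository name is an assumption based on the commonly distributed release, so verify it on the Hub before use.

```python
# Minimal sketch: encode a sentence with a 4-layer TinyBERT checkpoint.
# The repo id below is an assumption; check the Hugging Face Hub before running.
import torch
from transformers import AutoTokenizer, AutoModel

name = "huawei-noah/TinyBERT_General_4L_312D"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("TinyBERT keeps most of BERT's accuracy at a fraction of the size.",
                   return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, 312)
print(hidden.shape)
```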
MiniLM, developed by Microsoft, is a compact and efficient language model designed to deliver high performance while requiring fewer computational resources. It is part of a family of models that focus on optimizing knowledge distillation techniques, making it suitable for scenarios where computational efficiency and speed are critical. By compressing the knowledge of larger transformer models into a smaller architecture, MiniLM achieves a balance between size and performance, making it a popular choice for tasks like natural language understanding and text generation.
MiniLM is available in several sizes to accommodate different use cases and resource constraints. The smallest distilled models feature 6 layers and roughly 22 million parameters, providing a lightweight option for resource-constrained environments, while 12-layer configurations with about 33 million parameters are commonly used when a balance between speed and accuracy is needed. Both rely on a compact 384-dimensional hidden representation, delivering performance close to much larger transformer models while maintaining a far smaller memory footprint.
MiniLM is based on the transformer architecture, with specific adaptations to make it more compact. It utilizes a deep self-attention mechanism similar to models like BERT but incorporates innovations in knowledge distillation to transfer the performance of a larger teacher model to the smaller MiniLM. This process involves minimizing the difference between the teacher’s attention distributions and MiniLM’s, as well as aligning their hidden states, which ensures that the smaller model retains a significant portion of the larger model’s knowledge. The architecture supports multi-head attention and feed-forward layers but optimizes these components for faster inference and reduced computational costs.
MiniLM is open-source and freely available through platforms like Hugging Face Transformers and GitHub. Its accessibility allows developers and researchers to integrate it into diverse applications without licensing restrictions, fostering widespread adoption.
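One of the most common ways to use a MiniLM distillation is for sentence embeddings through the sentence-transformers package; the sketch below assumes the widely used sentence-transformers/all-MiniLM-L6-v2 checkpoint and that sentence-transformers is installed.

```python
# Minimal sketch: sentence embeddings with a 6-layer MiniLM distillation.
# Assumes `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode(["small language models", "compact NLP models"],
                          convert_to_tensor=True)
# Cosine similarity between the two sentence embeddings.
print(util.cos_sim(embeddings[0], embeddings[1]).item())
```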
MobileBERT is a lightweight and efficient adaptation of the popular BERT (Bidirectional Encoder Representations from Transformers) model, designed specifically to enable natural language processing tasks on resource-constrained devices such as mobile phones and edge devices. The model was introduced as a way to balance computational efficiency with accuracy, ensuring that smaller devices could perform complex language understanding tasks without compromising performance significantly.
The MobileBERT model is remarkably compact compared to the original BERT. It features a smaller number of parameters while retaining the ability to deliver high-quality results. The size of the parameters varies depending on the variant, but the standard MobileBERT configuration consists of approximately 25 million parameters, a significant reduction from the original BERT model’s 110 million parameters. This reduction is achieved through a careful process of knowledge distillation and architectural optimization.
MobileBERT employs a teacher-student training framework where the teacher model is a fine-tuned version of BERT and the student model is the compact MobileBERT. This process ensures that MobileBERT retains much of the knowledge and performance of its larger counterpart while significantly reducing the number of parameters and computational overhead.
The architecture of MobileBERT is tailored for efficiency while preserving the core principles of the transformer model. Unlike BERT, which relies on a multi-layer transformer encoder with large hidden sizes, MobileBERT uses a bottleneck structure to reduce complexity. It incorporates a smaller embedding size and employs inverted bottleneck layers, inspired by techniques in mobile neural networks like MobileNet.
MobileBERT also replaces each of BERT’s single feed-forward layers with a stack of feed-forward networks, adding depth so that sufficient representational capacity is retained despite the reduction in width. The model uses a 24-layer architecture in which each layer has far fewer parameters than the original BERT, while knowledge distillation keeps accuracy at a comparable level.
MobileBERT is open-source and freely available for use, making it accessible to developers and researchers alike. The model can be integrated into applications without licensing restrictions, ensuring widespread adoption across various platforms, including mobile devices.
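A minimal sketch of masked-word prediction with the google/mobilebert-uncased checkpoint is shown below; it assumes the published checkpoint includes the masked-language-modeling head used by the fill-mask pipeline.

```python
# Minimal sketch: masked-word prediction with google/mobilebert-uncased.
from transformers import pipeline

fill = pipeline("fill-mask", model="google/mobilebert-uncased")
print(fill("MobileBERT runs well on [MASK] devices.")[0]["token_str"])
```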
DistilGPT-2 is a smaller and more efficient version of OpenAI’s GPT-2 model, developed to offer a lighter alternative for applications requiring lower computational resources. By leveraging knowledge distillation techniques, DistilGPT-2 retains most of GPT-2’s capabilities while significantly reducing its size. This makes it a practical choice for tasks like text generation, summarization, and conversational agents where performance and resource efficiency are critical.
DistilGPT-2 is designed with roughly half the number of parameters of its parent model. While GPT-2 itself has variants ranging from about 117 million to 1.5 billion parameters, DistilGPT-2 has approximately 82 million parameters, striking a balance between performance and computational efficiency. This reduction is achieved without a substantial compromise in the model’s understanding or generation capabilities, owing to the knowledge distillation process.
DistilGPT-2 maintains a similar architecture to GPT-2, built upon the Transformer model. It uses multi-head self-attention layers and feed-forward neural networks to process and generate text. However, to reduce its size and computational requirements, DistilGPT-2 cuts down on the number of layers while keeping the key structural elements intact. The underlying methodology involves training the smaller model to mimic the output distributions of the larger GPT-2, enabling it to generalize effectively with fewer parameters.
DistilGPT-2 is open-source and freely available through the Hugging Face model repository. Its accessibility, combined with its reduced size, makes it a popular choice for developers and researchers working on resource-constrained systems.
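For instance, text generation with the distilgpt2 checkpoint takes only a few lines:

```python
# Minimal sketch: text generation with the distilgpt2 checkpoint.
from transformers import pipeline

generate = pipeline("text-generation", model="distilgpt2")
print(generate("Small language models are useful because",
               max_new_tokens=40)[0]["generated_text"])
```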
Mistral Nemo is a compact and efficient language model developed by Mistral AI in collaboration with NVIDIA. It focuses on delivering high-quality language understanding and generation capabilities while maintaining scalability and speed. Built to support diverse applications, it emphasizes efficient performance and ease of integration into various systems.
Mistral Nemo ships as a single 12-billion-parameter model with a large (128k-token) context window, letting users trade a modest hardware footprint for strong performance across scenarios ranging from lightweight assistants to applications requiring deeper linguistic nuance and long documents.
The architecture of Mistral Nemo is grounded in transformer-based design principles. Leveraging advancements in transformer models, Mistral Nemo incorporates innovations such as optimized attention mechanisms and enhanced token embeddings, ensuring efficient memory usage and computational throughput. The architecture is structured to maximize performance on both single-node and distributed setups, making it highly adaptable for diverse workloads.
Mistral Nemo is open-source, providing developers with free access to the model and its underlying codebase. This accessibility enables extensive customization and integration for various applications.
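A minimal chat sketch is shown below; the mistralai/Mistral-Nemo-Instruct-2407 repository name reflects the instruct checkpoint published on Hugging Face at the time of writing, and the 12B weights require a reasonably large GPU (or quantization) plus the accelerate package for automatic device placement.

```python
# Minimal sketch: chat with the Mistral Nemo instruct checkpoint.
# The 12B weights need a large GPU (or quantization); device_map="auto"
# requires `pip install accelerate`.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",
    device_map="auto",
)
messages = [{"role": "user", "content": "List two strengths of compact language models."}]
print(chat(messages, max_new_tokens=100)[0]["generated_text"])
```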
SmolLM is a family of lightweight language models from Hugging Face designed to provide efficient natural language processing capabilities while maintaining a reduced computational footprint. Its development focuses on striking a balance between model performance and accessibility, making it ideal for applications where resource constraints are a primary concern. SmolLM is particularly suitable for edge devices, quick prototyping, and tasks that require low-latency responses.
SmolLM is available in multiple configurations to accommodate different performance and resource needs. The smallest model contains roughly 135 million parameters, a mid-range version has 360 million parameters, and, for applications requiring higher capacity without sacrificing too much speed, a 1.7-billion-parameter variant is also offered. Each configuration is optimized for efficient inference, allowing deployment on resource-constrained devices such as mobile phones and edge servers.
The architecture of SmolLM is rooted in transformer-based designs, specifically tailored to reduce parameter redundancy without compromising performance. It employs advanced pruning and quantization techniques, alongside lightweight attention mechanisms, to achieve its compact form. Additionally, SmolLM integrates adaptive computation methods, enabling it to allocate resources dynamically based on task complexity. This design ensures that the model retains high accuracy and fluency in natural language tasks while maintaining efficiency.
SmolLM is open-source and available for download under a permissive license. A free tier for online use is also offered, with extended features accessible through a subscription plan.
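A minimal generation sketch is shown below; the HuggingFaceTB/SmolLM-360M-Instruct repository name is an assumption based on the SmolLM family published on the Hugging Face Hub, so confirm the exact id before running.

```python
# Minimal sketch: generation with a SmolLM instruct checkpoint.
# The repo id below is an assumption; verify it on the Hugging Face Hub.
from transformers import pipeline

chat = pipeline("text-generation", model="HuggingFaceTB/SmolLM-360M-Instruct")
messages = [{"role": "user", "content": "What is a small language model?"}]
print(chat(messages, max_new_tokens=80)[0]["generated_text"])
```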
Phi-4 is a 14-billion parameter language model developed by Microsoft Research. It is designed to excel in reasoning tasks while maintaining computational efficiency. This model builds on the Phi family and incorporates advanced techniques in data generation and refinement to deliver high performance on reasoning-focused tasks. Unlike many larger models, Phi-4 aims to strike a balance between capability and resource efficiency, making it a practical tool for real-world applications.
The Phi-4 model features 14 billion parameters, a deliberate choice that aligns with its focus on reasoning efficiency and reduced computational demands. This size is optimized to outperform larger models such as GPT-4 and Llama-3 on specific benchmarks, showcasing the potential of compact architectures when paired with innovative training methodologies.
Phi-4’s architecture is tailored to enhance reasoning and problem-solving. Key elements of its training process include the use of synthetic data generated through multi-agent prompting and instruction reversal, which helps create datasets rich in structured, real-world scenarios. Post-training refinements, such as rejection sampling and Direct Preference Optimization (DPO), further improve the model’s logical consistency and usability. Additionally, the context length of the model was extended from 4,000 to 16,000 tokens during midtraining, enabling it to handle complex, long-chain reasoning tasks effectively.
Phi-4 is currently not open-source and remains a proprietary model. Details on access, including any free or limited-tier usage options, have not been disclosed, suggesting it is primarily positioned for specific research and enterprise applications.
In summary, small language models are making significant strides in transforming NLP by offering a balance of performance, efficiency, and accessibility. Unlike their larger counterparts, these models are designed to operate in resource-constrained environments, making them ideal for mobile applications, edge devices, and scenarios requiring real-time responses. By leveraging advancements in model compression, knowledge distillation, and optimized architectures, small models prove that compactness does not necessarily mean a compromise in quality.
Moreover, the versatility of small language models is evident in their applications, which range from powering chatbots and summarization tools to enabling offline NLP capabilities. Open models like T5, Qwen-2, and Mistral Nemo drive innovation by making advanced technology accessible to more people, while proprietary models like Microsoft’s Phi-4 show how tailored solutions can meet specific enterprise needs.
As AI demand rises across sectors, small language models will remain crucial for scaling NLP technologies efficiently and inclusively. These models prove that smaller, optimized architectures can achieve impressive results, bringing AI to new domains and users.
Q. Can small language models run offline on devices?
A. Yes, due to their lightweight nature, small language models can be deployed offline on devices like smartphones or embedded systems, depending on the application.
Q. How are small language models fine-tuned for specific tasks?
A. Fine-tuning involves adjusting a pretrained model to improve its performance on a specific task using a smaller, task-specific dataset. This is done by continuing the training process with the new data.
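As a rough illustration of that workflow, here is a minimal sketch of fine-tuning DistilBERT for binary sentiment classification with the Hugging Face Trainer; the dataset, subset size, and hyperparameters are placeholder choices rather than recommendations.

```python
# Illustrative sketch: fine-tune DistilBERT for sentiment classification.
# Dataset choice, subset size, and hyperparameters are examples only.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="distilbert-imdb", num_train_epochs=1,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  # Small random subset to keep the example quick to run.
                  train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)))
trainer.train()
```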
Q. Are small language models more secure than large ones?
A. They can be more secure as they are often deployed locally, minimizing the need to send sensitive data over the internet. However, the level of security depends on the implementation.