How to Run LLMs Locally?

Diksha Kumari Last Updated : 08 Jan, 2025
2 min read

LLMs like GPT and Llama have completely transformed how we tackle language tasks, from creating intelligent chatbots to generating complex pieces of code. Cloud platforms like HuggingFace simplify using these models, but there are times when running an LLM locally on your own computer is the smarter choice. Why? Because it offers greater privacy, allows for customizations tailored to your specific needs, and can significantly reduce costs. Running LLMs locally gives you full control, letting you leverage their power on your own terms.

Let me show you how to run an LLM on your system in just a few simple steps using Ollama and HuggingFace!

Steps to Run LLMs Locally

Step 1: Download Ollama

First, search for “Ollama” in your browser, then download the installer from the official site (ollama.com) and install it on your system.
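If you’re on Linux, the install can also be done from the terminal with the official script; on macOS and Windows, just run the installer you downloaded. Either way, a quick version check confirms everything is in place:

    # Linux only: install Ollama with the official script (macOS/Windows use the downloaded installer)
    curl -fsSL https://ollama.com/install.sh | sh

    # Verify the installation
    ollama --version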

Step 2: Find the Best Open-Source LLMs

Next, search for “HuggingFace LLM leaderboard” to find a list of the top open-source language models.

Step 3: Filter the Models for Your Device

Once you see the list, apply filters to find models that work best for your setup. For example:

  • Select consumer devices for home use.
  • Choose official providers only to avoid unofficial or unverified models.
  • If your laptop has a lower-end GPU, select models designed for edge devices.

Step 4: Get the Model in GGUF Format

Click on a top-ranked model, such as Qwen/Qwen2.5-32B. On the top-right corner of the screen, click “Use this model.” However, you won’t find Ollama listed here as an option.

That’s because Ollama uses a specialized format called GGUF, which stores a smaller, faster, quantized version of the model.

(Note: Quantization slightly reduces quality but makes the model far more efficient to run locally. As a rough idea of scale, a 7B-parameter model that takes about 14 GB in 16-bit precision shrinks to roughly 5 GB at Q5_K_M.)

To get a model in the GGUF format:

  • Go to the Quantizations section – there are around 80 quantized versions available. Sort them by most downloads.
  • Look for repositories with “GGUF” in their name, such as those uploaded by bartowski – these are a good choice.
  • Select one of these and click “Use this model with Ollama.”
  • For quantization settings, choose a file size that’s 1-2 GB smaller than your GPU’s VRAM, or pick a recommended option like Q5_K_M (see the quick VRAM check below).
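If you’re not sure how much memory your GPU has, you can check it from the terminal before picking a file. The command below assumes an NVIDIA GPU with the driver installed; on other hardware, check your system’s GPU settings instead.

    # Print total GPU memory so you can pick a GGUF file 1-2 GB smaller than this value (NVIDIA GPUs only)
    nvidia-smi --query-gpu=memory.total --format=csv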

Step 5: Download and Start Using the Model

Copy the command provided for your selected model and paste it into your terminal. Hit “Enter” and wait for the download to complete.
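The command will look roughly like the sketch below – Ollama can pull GGUF files straight from HuggingFace using the hf.co/ prefix. The repository name and quantization tag here are just an illustration; use whatever the “Use this model with Ollama” dialog gave you.

    # Download and run a GGUF model directly from HuggingFace (illustrative repo and quant tag - paste your own)
    ollama run hf.co/bartowski/Qwen2.5-32B-Instruct-GGUF:Q5_K_M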

Once it’s downloaded, you can start chatting with the model just like you would with any other LLM. Simple and fun!
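If you’d rather call the model from a script than type into the interactive prompt, Ollama also exposes a local REST API on port 11434 while it’s running. A minimal sketch, assuming the illustrative model name from above:

    # Send a single prompt to the locally running model via Ollama's REST API
    curl http://localhost:11434/api/generate -d '{
      "model": "hf.co/bartowski/Qwen2.5-32B-Instruct-GGUF:Q5_K_M",
      "prompt": "Explain quantization in one sentence.",
      "stream": false
    }'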

And there you go! You’re now running a powerful LLM locally on your device. Let me know if these steps worked for you in the comment section below.

As an Instructional Designer at Analytics Vidhya, Diksha has experience creating dynamic educational content on the latest technologies and trends in data science. With a knack for crafting engaging, cutting-edge content, Diksha empowers learners to navigate and excel in the evolving tech landscape, ensuring educational excellence in this rapidly advancing field.
