Q1. Can I use HuggingFace model with vLLM?

Question

Accepted Answer

A. HuggingFace hub is the platform with most of the large language models are hosted. vLLM provides the compatibility to perform the inference on any HuggingFace open source large language models. Further vLLM also helps in the serving and deployment of the model on the endpoints.

Reading list

Introduction to Generative AI

Introduction to Generative AI applications

No-code Generative AI app development

Code-focused Generative AI App Development

Introduction to Responsible AI

LLMS

Prompt Engineering

Finetuning LLMs

Training LLMs from Scratch

Langchain

RAG

LlamaIndex

Stable Diffusion

Guide to vLLM Using Gemma-7b-it

Introduction

Learning Objectives

Table of contents

vLLM Architecture Overview

Understanding KV Cache

How KV Cache Works?

Math Representation

What is PagedAttention?

How it Works?

Math Representation

Gemma Model Inference Using vLLM

Step1: Installation of the Module

Step2: Define LLM

Step3: Sampling Parameters Guide in vLLM

Step4: Prompt Template for Gemma Model

Step5: vLLM inference

Step6: Speed benchmarking

Conclusion

Key Takeaways

Frequently Asked Questions

Reference

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us

WebKlipperAuth

LinkedIn (16)

ln_or

JSESSIONID

li_rm

AnalyticsSyncHistory

lms_analytics

liap

visit