Fine-Tuning a Model Using OpenAI Platform for Customer Query Support

Vipin Vashisth Last Updated : 22 Feb, 2025

Fine-tuning large language models (LLMs) is essential for optimizing their performance in specific tasks. OpenAI provides a robust framework for fine-tuning GPT models, allowing organizations to tailor AI behavior based on domain-specific requirements. This process plays a crucial role in LLM customization, enabling models to generate more accurate, relevant, and context-aware responses.
Fine-tuned LLMs can be applied in various scenarios such as financial analysis for risk assessment, customer support for personalized responses, and medical research for aiding diagnostics. They can also be used in software development for code generation and debugging, and legal assistance for contract review and case law analysis. In this guide, we’ll walk through the fine-tuning process using OpenAI’s platform and evaluate the fine-tuned model’s performance in real-world applications.

What is OpenAI Platform?
- Cost of Inference
Fine Tuning a Model on OpenAI Platform
GPT-4o vs Finetuned GPT-4o Performance Check
Frequently Asked Questions

What is OpenAI Platform?

The OpenAI platform provides a web-based tool that makes it easy to fine-tune models, letting users customize them for specific tasks. It provides step-by-step instructions for preparing data, training models, and evaluating results. Additionally, the platform supports seamless integration with APIs, enabling users to deploy fine-tuned models quickly and efficiently. It also offers automatic versioning and model monitoring to ensure that models are performing optimally over time, with the ability to update them as new data becomes available.

Cost of Inference

Here’s how much inference and training cost for fine-tuned models on the OpenAI Platform.

Model	Inference Pricing	Inference Pricing with Batch API	Training Pricing
gpt-4o-2024-08-06	$3.750 / 1M input tokens; $15.000 / 1M output tokens	$1.875 / 1M input tokens; $7.500 / 1M output tokens	$25.000 / 1M training tokens
gpt-4o-mini-2024-07-18	$0.300 / 1M input tokens; $1.200 / 1M output tokens	$0.150 / 1M input tokens; $0.600 / 1M output tokens	$3.000 / 1M training tokens
gpt-3.5-turbo	$3.000 / 1M input tokens; $6.000 / 1M output tokens	$1.500 / 1M input tokens; $3.000 / 1M output tokens	$8.000 / 1M training tokens

For more information, visit this page: https://openai.com/api/pricing/
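
To make the table concrete, here is a rough cost estimator. It is only a sketch: it assumes training is billed as the number of tokens in the training file multiplied by the number of epochs, and it hard-codes the prices listed above, which may have changed since. The function names and the token counts in the usage note are illustrative, not measurements.

```python
# Prices in USD per 1M tokens, copied from the table above (may be outdated).
PRICES = {
    "gpt-4o-2024-08-06": {"input": 3.75, "output": 15.00, "training": 25.00},
    "gpt-4o-mini-2024-07-18": {"input": 0.30, "output": 1.20, "training": 3.00},
    "gpt-3.5-turbo": {"input": 3.00, "output": 6.00, "training": 8.00},
}


def training_cost(model: str, dataset_tokens: int, n_epochs: int) -> float:
    """Estimated training cost in USD, assuming billing = file tokens x epochs."""
    return PRICES[model]["training"] * dataset_tokens * n_epochs / 1_000_000


def inference_cost(model: str, input_tokens: int, output_tokens: int,
                   batch: bool = False) -> float:
    """Estimated inference cost in USD for a fine-tuned model."""
    price = PRICES[model]
    cost = (price["input"] * input_tokens + price["output"] * output_tokens) / 1_000_000
    # In the table above, every Batch API price is half the regular price.
    return cost / 2 if batch else cost
```

For example, `training_cost("gpt-4o-2024-08-06", 2_000_000, 3)` estimates the cost of three epochs over a hypothetical 2M-token training file at the GPT-4o training price.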

Fine Tuning a Model on OpenAI Platform

Fine-tuning a model allows users to customize models for specific use cases, improving their accuracy, relevance, and adaptability. In this guide, we focus on more personalized, accurate, and context-aware responses to customer service interactions.

By fine-tuning a model on real customer queries and interactions, businesses can enhance response quality, reduce misunderstandings, and improve overall user satisfaction.

Now let’s see how we can train a model using the OpenAI Platform. We will do this in 4 steps:

Identifying the dataset
Downloading the dataset for fine-tuning
Importing and Preprocessing the Data
Fine-tuning on OpenAI Platform

Let’s begin!

Step 1: Identifying the Dataset

To fine-tune the model, we first need a high-quality dataset tailored to our use case. For this fine tuning process, I downloaded the dataset from Hugging Face, a popular platform for AI datasets and models. You can find a wide range of datasets suitable for fine-tuning by visiting Hugging Face Datasets. Simply search for a relevant dataset, download it, and preprocess it as needed to ensure it aligns with your specific requirements.

Step 2: Downloading the Dataset for Finetuning

The customer service data for the fine-tuning process is taken from Hugging Face datasets. The dataset used here is charles828/vertex-ai-customer-support-training-dataset.

LLMs need data to be in a specific format for fine-tuning. Here’s a sample format for GPT-4o, GPT-4o-mini, and GPT-3.5-turbo.

{"messages": [{"role": "system", "content": "This is an AI assistant for answering FAQs."}, {"role": "user", "content": "What are your customer support hours?"}, {"role": "assistant", "content": "Our customer support is available 24/7. How else may I assist you?"}]}
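
Before uploading a file, it can help to check that every line follows this shape. The validator below is a minimal sketch, not OpenAI's own validation: the function name is hypothetical, and it only accepts the three roles used in the sample above, which is an assumption made to keep the check simple.

```python
import json

# Roles that appear in the sample format above (an assumption for this sketch).
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_line(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i} has non-string content")

    # Supervised fine-tuning needs at least one assistant reply per example.
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message to learn from")
    return problems
```

Running `validate_chat_line` over each line of the training file and printing any non-empty result is usually enough to catch malformed rows before a paid training run starts.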

Now in the next step we will check what our data looks like and make the necessary adjustments if it is not in the required format.

Step 3: Importing and Preprocessing the Data

Now we will import the data and preprocess it into the required format.

To do this we will follow these steps:

1. Now we will load the data in the Jupyter Notebook and modify it to match the required format.

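import pandas as pd
# Parquet files for each split of the Hugging Face dataset
splits = {'train': 'data/train-00000-of-00001.parquet', 'test': 'data/test-00000-of-00001.parquet'}
df_train = pd.read_parquet("hf://datasets/charles828/vertex-ai-customer-support-training-dataset/" + splits["train"])
# Save the training split as CSV under the name that read_csv expects in the next step
df_train.to_csv("training_data", index=False)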

Here we have 6 different columns, but we only need two: “instruction” and “response”. These are the columns that contain the customer queries and the corresponding responses.

Now we can use the above csv file to create a jsonl file as needed for fine-tuning.

import json

# Load the training data stored as CSV
messages = pd.read_csv("training_data")

# Write one chat-formatted JSON object per line (jsonl)
with open("query_dataset.jsonl", "w", encoding="utf-8") as jsonl_file:
    for _, row in messages.iterrows():
        user_content = row["instruction"]
        assistant_content = row["response"]
        jsonl_entry = {
            "messages": [
                {"role": "system", "content": "You are an assistant who writes in a clear, informative, and engaging style."},
                {"role": "user", "content": user_content},
                {"role": "assistant", "content": assistant_content},
            ]
        }
        jsonl_file.write(json.dumps(jsonl_entry) + "\n")

As shown above, we can iterate through the data frame to create the jsonl file.

Here we are storing our data in a jsonl file format which is slightly different from json.

json stores data as a hierarchical structure (objects and arrays) in a single file, making it suitable for structured data with nesting. Below is an example of the json file format.

{
  "users": [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30}
  ]
}

jsonl consists of multiple json objects, each on its own line, with no enclosing array or top-level wrapper. Each line is a complete json value and may itself contain nested objects, as in the fine-tuning format shown earlier. This format is more efficient for streaming, processing large datasets, and handling data line by line. Below is an example of the jsonl file format.

{"name": "Alice", "age": 25}
{"name": "Bob", "age": 30}
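
The practical difference shows up when reading the data. The sketch below parses the two examples above from in-memory strings: the json version has to be parsed as a whole, while the jsonl version can be consumed one record at a time. The helper name `iter_jsonl` is made up for this illustration.

```python
import io
import json

json_text = '{"users": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]}'
jsonl_text = '{"name": "Alice", "age": 25}\n{"name": "Bob", "age": 30}\n'

# json: the whole document is parsed before any record is available.
users_from_json = json.loads(json_text)["users"]


def iter_jsonl(stream):
    """Yield one parsed record per non-empty line, without loading everything at once."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# jsonl: records are produced line by line; io.StringIO stands in for an open file.
users_from_jsonl = list(iter_jsonl(io.StringIO(jsonl_text)))
```

With a real file, passing the handle from `open("query_dataset.jsonl", encoding="utf-8")` to `iter_jsonl` works the same way.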

Step 4: Fine-tuning on OpenAI Platform

Now, we will use this ‘query_dataset’ to fine-tune the GPT-4o LLM. To do this, follow the below steps.

1. Go to this website and sign in if you haven’t signed in already. Once logged in, click on “Learn more” to learn more about the fine-tuning process.

2. Click on ‘Create’ and a small window will pop up.

Creating a fine-tuned Model on OpenAI Platform

Here is a breakdown of the hyperparameters in the above image:

Batch Size: This refers to the number of training examples (data points) used in one pass (or step) before updating the model’s weights. Instead of processing all data at once, the model processes small chunks (batches) at a time. A smaller batch size takes more time but may produce a better model, while a larger one tends to be more stable and much faster. You have to find the right balance between the two.

Learning Rate Multiplier: This is a factor that adjusts how much the model’s weights change after each update. If it’s set high, the model might learn faster but could overshoot the best solution. If it’s low, the model will learn more slowly but might be more precise.

Number of Epochs: An “epoch” is one complete pass through the entire training dataset. The number of epochs tells you how many times the model will learn from the entire dataset. More epochs typically allow the model to learn better, but too many can lead to overfitting.
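
Batch size and number of epochs together determine how many weight updates a run performs. The helper below is a back-of-the-envelope sketch of that arithmetic; it assumes the final partial batch of each epoch still counts as one step, and the values in the usage note are illustrative, not the platform defaults.

```python
import math


def training_steps(n_examples: int, batch_size: int, n_epochs: int) -> int:
    """Number of weight updates: batches per epoch times number of epochs."""
    if min(n_examples, batch_size, n_epochs) < 1:
        raise ValueError("all arguments must be positive")
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * n_epochs
```

For instance, `training_steps(24_000, 16, 3)` gives 4500 updates, and halving the batch size to 8 doubles that to 9000, which is one reason smaller batches take longer.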

3. Select the method as ‘Supervised’ and the ‘Base Model’ of your choice. I have selected GPT-4o.

4. Upload the jsonl file for the training data.

5. Add a ‘Suffix’ relevant to the task on which you want to fine-tune the model.

6. Choose the hyper-parameters or leave them to the default values.

7. Now click on ‘Create’ and the fine-tuning will start.

8. Once the fine-tuning is completed it will show as follows:

Fine-tuned Language Model on OpenAI Platform

9. Now we can compare the fine-tuned model with the pre-existing model by clicking on the ‘Playground’ in the bottom right corner.
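
The console steps above can also be scripted. The sketch below is an assumption-laden alternative to the web UI, not the method used in this article: it expects a client from the official `openai` Python package, uploads the training file, and creates a job with default hyperparameters. The function name and the suffix in the usage comment are hypothetical.

```python
def start_fine_tuning_job(client, jsonl_path: str, base_model: str, suffix: str) -> str:
    """Upload the training file and create a fine-tuning job.

    `client` is expected to be an `openai.OpenAI()` instance.
    Returns the ID of the created job.
    """
    # Upload the JSONL training file; the purpose must be "fine-tune".
    with open(jsonl_path, "rb") as training_file:
        uploaded = client.files.create(file=training_file, purpose="fine-tune")

    # Create the job; hyperparameters are left to the platform defaults.
    job = client.fine_tuning.jobs.create(
        training_file=uploaded.id,
        model=base_model,
        suffix=suffix,
    )
    return job.id


# Usage (requires `pip install openai` and the OPENAI_API_KEY environment variable):
#   from openai import OpenAI
#   job_id = start_fine_tuning_job(
#       OpenAI(), "query_dataset.jsonl", "gpt-4o-2024-08-06", "customer-support"
#   )
```

Passing the client in as an argument keeps the function easy to test with a stand-in object before any paid job is started.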

Important Note:

Fine-tuning duration and cost depend on the dataset size and model complexity. A smaller dataset, like 100 samples, costs significantly less but may not fine-tune the model sufficiently, while larger datasets require more resources in terms of both time and money. In my case, the dataset had approximately 24K samples, so fine-tuning took around 7 to 8 hours and cost approximately $700.

Caution

Given the high cost, it’s recommended to start with a smaller dataset for initial testing before scaling up. Ensuring the dataset is well-structured and relevant can help optimize both performance and cost efficiency.
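
One way to follow this advice is to draw a small random subset of the training file for a first trial run. The helper below is a minimal sketch using only the standard library; the function name, the output file name, and the sample size in the usage note are illustrative choices.

```python
import random


def sample_jsonl(src_path: str, dst_path: str, n_samples: int, seed: int = 0) -> int:
    """Write a reproducible random subset of the lines in src_path to dst_path.

    Returns the number of lines written.
    """
    with open(src_path, encoding="utf-8") as src:
        # Skip blank lines and make sure every kept line ends with a newline.
        lines = [line.rstrip("\n") + "\n" for line in src if line.strip()]

    rng = random.Random(seed)  # fixed seed so the subset can be regenerated
    subset = rng.sample(lines, min(n_samples, len(lines)))

    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(subset)
    return len(subset)
```

For example, `sample_jsonl("query_dataset.jsonl", "query_dataset_small.jsonl", 100)` would produce a 100-example file for a cheap first experiment.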

GPT-4o vs Finetuned GPT-4o Performance Check

Now that we have fine-tuned the model, we’ll compare its performance with the base GPT-4o and analyze responses from both models to see if there are improvements in accuracy, clarity, understanding, and relevance. This will help us determine if the fine-tuned model meets our specific needs and performs better in the intended tasks. For brevity, I am showing sample results for 3 prompts from both the fine-tuned and the standard GPT-4o model.
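
The comparison in this article was done in the Playground, but the same side-by-side check can be scripted. The following is a hedged sketch rather than the procedure used here: `compare_models` and `make_openai_ask` are hypothetical helper names, the client is assumed to come from the official `openai` Python package, and the fine-tuned model ID in the usage comment is a placeholder.

```python
from typing import Callable


def compare_models(ask: Callable[[str, str], str], models: list[str],
                   prompt: str) -> dict[str, str]:
    """Send the same prompt to each model and collect the replies by model name."""
    return {model: ask(model, prompt) for model in models}


def make_openai_ask(client, system_prompt: str) -> Callable[[str, str], str]:
    """Build an `ask(model, prompt)` function on top of an `openai.OpenAI()` client."""
    def ask(model: str, prompt: str) -> str:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt},
            ],
        )
        return response.choices[0].message.content
    return ask


# Usage (requires `pip install openai` and the OPENAI_API_KEY environment variable):
#   from openai import OpenAI
#   ask = make_openai_ask(OpenAI(), "You are an assistant who writes in a clear, "
#                                   "informative, and engaging style.")
#   replies = compare_models(ask, ["gpt-4o", "ft:<your-fine-tuned-model-id>"],
#                            "Help me submitting the new delivery address")
```

Using the same system prompt as in the training data keeps the comparison fair, since the fine-tuned model saw that prompt in every example.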

Query 1

Query: “Help me submitting the new delivery address”

Response by finetuned GPT-4o model:

Response by GPT-4o:

Comparative Analysis

The fine-tuned model delivers a more detailed and user-centric response compared to the standard GPT-4o. While GPT-4o provides a functional step-by-step guide, the fine-tuned model enhances clarity by explicitly differentiating between adding and editing an address. It is more engaging and reassuring to the user and offers proactive assistance. This demonstrates the fine-tuned model’s superior ability to align with customer service best practices. The fine-tuned model is therefore a stronger choice for tasks requiring user-friendly, structured, and supportive responses.

Query 2

Query: “I need assistance to change to the Account Category account”

Response by finetuned GPT-4o model:

Response by GPT-4o:

Comparative Analysis

The fine-tuned model significantly enhances user engagement and clarity compared to the base model. While GPT-4o provides a structured yet generic response, the fine-tuned version adopts a more conversational and supportive tone, making interactions feel more natural.

Query 3

Query: “i do not know how to update my personal info”

Response by finetuned GPT-4o model:

Response by GPT-4o:

Comparative Analysis

The fine-tuned model outperforms the standard GPT-4o by providing a more precise and structured response. While GPT-4o offers a functional answer, the fine-tuned model improves clarity by explicitly addressing key distinctions and presenting information in a more coherent manner. Additionally, it adapts better to the context, ensuring a more relevant and refined response.

Overall Comparative Analysis

Feature	Fine-Tuned GPT-4o	GPT-4o (Base Model)
Empathy & Engagement	High – offers reassurance, warmth, and a personalized touch	Low – neutral and formal tone, lacks emotional depth
User Support & Understanding	Strong – makes users feel supported and valued	Moderate – provides clear guidance but lacks emotional connection
Tone & Personalization	Warm and engaging	Professional and neutral
Efficiency in Information Delivery	Clear instructions with added emotional intelligence	Highly efficient but lacks warmth
Overall User Experience	More engaging, comfortable, and memorable	Functional but impersonal and transactional
Impact on Interaction Quality	Enhances both effectiveness and emotional resonance	Focuses on delivering information without emotional engagement

Conclusion

In this case, fine-tuning the model to respond better to customer queries improves its effectiveness. It makes interactions feel more personal, friendly, and supportive, which leads to stronger connections and higher user satisfaction. While base models provide clear and accurate information, they can feel robotic and less engaging. Fine-tuning models through OpenAI’s convenient web platform is a great way to build custom large language models for domain-specific tasks.

Frequently Asked Questions

Q1. What is fine-tuning in AI models?

A. Fine-tuning is the process of adapting a pre-trained AI model to perform a specific task or exhibit a particular behavior by training it further on a smaller, task-specific dataset. This allows the model to better understand the nuances of the task and produce more accurate or tailored results.

Q2. How does fine-tuning improve an AI model’s performance?

A. Fine-tuning enhances a model’s performance by teaching it to better handle the specific requirements of a task, like adding empathy in customer interactions. It helps the model provide more personalized, context-aware responses, making interactions feel more human-like and engaging.

Q3. Are fine-tuned models more expensive to use?

A. Fine-tuning models can require additional resources and training, which may increase the cost. However, the benefits of a more effective, user-friendly model often outweigh the initial investment, particularly for tasks that involve customer interaction or complex problem-solving.

Q4. Can I fine-tune a model on my own?

A. Yes, if you have the necessary data and technical expertise, you can fine-tune a model using machine learning frameworks like Hugging Face, OpenAI, or others. However, it typically requires a strong understanding of AI, data preparation, and training processes.

Q5. How long does it take to fine-tune a model?

A. The time required to fine-tune a model depends on the size of the dataset, the complexity of the task, and the computational resources available. It can take anywhere from a few hours to several days or more for larger models with vast datasets.

Vipin Vashisth

Hello! I'm Vipin, a passionate data science and machine learning enthusiast with a strong foundation in data analysis, machine learning algorithms, and programming. I have hands-on experience in building models, managing messy data, and solving real-world problems. My goal is to apply data-driven insights to create practical solutions that drive results. I'm eager to contribute my skills in a collaborative environment while continuing to learn and grow in the fields of Data Science, Machine Learning, and NLP.
