7 LLM Parameters to Enhance Model Performance (With Practical Implementation)

Pankaj Singh 03 Oct, 2024
14 min read

Introduction

Let’s say you’re interacting with an AI that not only answers your questions but understands the nuances of your intent. It crafts tailored, coherent responses that almost feel human. How does this happen? Most people don’t even realize the secret lies in LLM parameters.

If you’ve ever wondered how AI models like ChatGPT generate remarkably lifelike text, you’re in the right place. These models don’t just magically know what to say next. Instead, they rely on key parameters to determine everything from creativity to accuracy to coherence. Whether you’re a curious beginner or a seasoned developer, understanding these parameters can unlock new levels of AI potential for your projects.

This article will discuss the 7 essential generation parameters that shape how large language models (LLMs) like GPT-4o operate. From temperature settings to top-k sampling, these parameters act as the dials you can adjust to control the AI’s output. Mastering them is like gaining the steering wheel to navigate the vast world of AI text generation.

Overview

  • Learn how key parameters like temperature, max_tokens, and top-p shape AI-generated text.
  • Discover how adjusting LLM parameters can enhance creativity, accuracy, and coherence in AI outputs.
  • Master the 7 essential LLM parameters to customize text generation for any application.
  • Fine-tune AI responses by controlling output length, diversity, and factual accuracy with these parameters.
  • Avoid repetitive and incoherent AI outputs by tweaking frequency and presence penalties.
  • Unlock the full potential of AI text generation by understanding and optimizing these crucial LLM settings.

What are LLM Generation Parameters?

In the context of Large Language Models (LLMs) like GPT-4o, generation parameters are settings or configurations that influence how the model generates its responses. These parameters help determine various aspects of the output, such as creativity, coherence, accuracy, and even length.

Think of generation parameters as the “control knobs” of the model. By adjusting them, you can change how the AI behaves when producing text. These parameters guide the model in navigating the vast space of possible word combinations to select the most suitable response based on the user’s input.

Without these parameters, the AI would be less flexible and often unpredictable in its behaviour. By fine-tuning them, users can either make the model more focused and factual or allow it to explore more creative and diverse responses.

Key Aspects Influenced by LLM Generation Parameters:

  1. Creativity vs. Accuracy: Some parameters control how “creative” or “predictable” the model’s responses are. Do you want a safe, factual response, or are you looking for something more imaginative?
  2. Response Length: These settings can influence how much or how little the model generates in a single response.
  3. Diversity of Output: The model can either focus on the most likely next words or explore a broader range of possibilities.
  4. Risk of Hallucination: Overly creative settings may lead the model to generate “hallucinations” or plausible-sounding but factually incorrect responses. The parameters help balance that risk.

Each LLM generation parameter plays a unique role in shaping the final output, and by understanding them, you can customize the AI to meet your specific needs or goals better.

Practical Implementation of 7 LLM Parameters

Install Necessary Libraries

Before using the OpenAI API to control parameters like max_tokens, temperature, etc., you need to install the OpenAI Python client library. You can do this using pip:

!pip install openai

Once the library is installed, you can use the following code snippets for each parameter. Make sure to replace your_openai_api_key with your actual OpenAI API key.

Basic Setup for All Code Snippets

This setup will remain constant in all examples. You can reuse this section as your base setup for interacting with GPT Models.

import openai
# Initialize the OpenAI client with your API key
client = openai.OpenAI(api_key='your_openai_api_key')
# Define a simple prompt that we can reuse in examples
prompt = "Explain the concept of artificial intelligence in simple terms"

7 LLM Generation Parameters

1. Max Tokens

The max_tokens parameter controls the length of the output generated by the model. A “token” can be as short as one character or as long as one word, depending on the complexity of the text.

  • Low Value (e.g., 10): Produces shorter responses.
  • High Value (e.g., 1000): Generates longer, more detailed responses.
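
To get a feel for how text breaks into tokens, you can inspect a prompt locally with OpenAI's tiktoken library. This is a minimal sketch, assuming tiktoken is installed (pip install tiktoken); it is not required for the API calls below.

import tiktoken
# Load a tokenizer encoding used by recent OpenAI chat models
encoding = tiktoken.get_encoding("cl100k_base")
text = "Explain the concept of artificial intelligence in simple terms"
tokens = encoding.encode(text)
print(len(tokens))                 # how many tokens the prompt consumes
print(encoding.decode(tokens[:5])) # decode the first few tokens back into text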

Why is it Important?

By setting an appropriate max_tokens value, you can control whether the response is a quick snippet or an in-depth explanation. This is especially important for applications where brevity is key, like text summarization, or where detailed answers are needed, like in knowledge-intensive dialogues.

Note: max_tokens is now deprecated in favor of max_completion_tokens and is not compatible with the o1 series of models.
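
If you are on a recent version of the openai SDK (or targeting the o1 series), the same request can be made with max_completion_tokens instead. A hedged sketch, assuming your installed SDK version accepts this parameter:

import openai
client = openai.OpenAI(api_key='your_openai_api_key')
# max_completion_tokens replaces max_tokens in newer API versions (assumes your SDK supports it)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the capital of India? Give 7 places to Visit"}
    ],
    max_completion_tokens=50,
)
print(response.choices[0].message.content)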

Implementation

Here’s how you can control the length of the generated output by using the max_tokens parameter with the OpenAI model:

import openai

# Initialize the OpenAI client with your API key
client = openai.OpenAI(api_key='your_openai_api_key')

max_tokens = 10
temperature = 0.5

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the capital of India? Give 7 places to Visit"}
    ],
    max_tokens=max_tokens,
    temperature=temperature,
    n=1,
)
print(response.choices[0].message.content)


Output

max_tokens = 10

  • Output: ‘The capital of India is New Delhi. Here are’
  • The response is very brief and incomplete, cut off due to the token limit. It provides basic information but doesn’t elaborate. The sentence begins but doesn’t finish, cutting off just before listing places to visit.

max_tokens = 20

  • Output: ‘The capital of India is New Delhi. Here are seven places to visit in New Delhi:\n1.’
  • With a slightly higher token limit, the response begins to list places but only manages to start the first item before being cut off again. It’s still too short to provide useful detail or even finish a single place description.

max_tokens = 50

  • Output: ‘The capital of India is New Delhi. Here are seven places to visit in New Delhi:\n1. **India Gate**: This iconic monument is a war memorial located along the Rajpath in New Delhi. It is dedicated to the soldiers who died during World’
  • Here, the response is more detailed, offering a complete introduction and the beginning of a description for the first location, India Gate. However, it is cut off mid-sentence, which suggests the 50-token limit isn’t enough for a full list but can give more context and explanation for at least one or two items.

max_tokens = 500

  • Output: (Full detailed response with seven places)
  • With this larger token limit, the response is complete and provides a detailed list of seven places to visit in New Delhi. Each place includes a brief but informative description, offering context about its significance and historical importance. The response is fully articulated and allows for more complex and descriptive text.

2. Temperature

The temperature parameter influences how random or creative the model’s responses are. It’s essentially a measure of how deterministic the responses should be:

  • Low Temperature (e.g., 0.1): The model will produce more focused and predictable responses.
  • High Temperature (e.g., 0.9): The model will produce more creative, varied, or even “wild” responses.

Why is it Important?

This is perfect for controlling the tone. Use low temperatures for tasks like generating technical answers, where precision matters, and higher temperatures for creative writing tasks, such as storytelling or poetry.
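
Under the hood, temperature rescales the model's token scores (logits) before sampling. Here is a toy sketch with invented numbers, not taken from any real model, showing why low temperatures concentrate probability on the top choice:

import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.array(logits) / temperature
    exps = np.exp(scaled - np.max(scaled))  # subtract the max for numerical stability
    return exps / exps.sum()

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens
print(softmax_with_temperature(logits, 0.1))  # sharp distribution: nearly all probability on the top token
print(softmax_with_temperature(logits, 0.9))  # flatter distribution: other tokens have a real chance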

Implementation

The temperature parameter controls the randomness or creativity of the output. Here’s how to use it with the newer model:

import openai

# Initialize the OpenAI client with your API key
client = openai.OpenAI(api_key='your_openai_api_key')

max_tokens = 500
temperature = 0.1

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the capital of India? Give 7 places to Visit"}
    ],
    max_tokens=max_tokens,
    temperature=temperature,
    n=1,
    stop=None,
)
print(response.choices[0].message.content)

Output

temperature=0.1

The output is strictly factual and formal, providing concise, straightforward information with minimal variation or embellishment. It reads like an encyclopedia entry, prioritizing clarity and precision.


temperature=0.5

This output retains factual accuracy but introduces more variability in sentence structure. It adds a bit more description, offering a slightly more engaging and creative tone, yet still grounded in facts. There’s a little more room for slight rewording and additional detail compared to the 0.1 output.


temperature=0.9

The most creative version, with descriptive and vivid language. It adds subjective elements and colourful details, making it feel more like a travel narrative or guide, emphasizing atmosphere, cultural significance, and facts.


3. Top-p (Nucleus Sampling)

The top_p parameter, also known as nucleus sampling, helps control the diversity of responses. It sets a threshold for the cumulative probability distribution of token choices:

  • Low Value (e.g., 0.1): The model samples only from the smallest set of tokens whose cumulative probability reaches 10%, sharply limiting variation.
  • High Value (e.g., 0.9): The model samples from tokens covering 90% of the probability mass, allowing much more variability.

Why is it Important?

This parameter helps balance creativity and precision. When paired with temperature, it can produce diverse and coherent responses. It’s great for applications where you want creative flexibility but still need some level of control.
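
To see what the cumulative-probability cutoff actually does, here is a toy sketch of nucleus sampling over an invented five-token distribution. It illustrates the idea only; it is not how the API implements it internally.

import numpy as np

def nucleus_sample(probs, top_p, rng=np.random.default_rng(0)):
    order = np.argsort(probs)[::-1]                     # token indices sorted by probability, descending
    sorted_probs = np.array(probs)[order]
    cutoff = np.searchsorted(np.cumsum(sorted_probs), top_p) + 1  # smallest set whose mass reaches top_p
    kept = order[:cutoff]
    kept_probs = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()  # renormalize over the survivors
    return rng.choice(kept, p=kept_probs)

probs = [0.5, 0.25, 0.15, 0.07, 0.03]    # hypothetical probabilities for five candidate tokens
print(nucleus_sample(probs, top_p=0.5))  # only the single most likely token survives the cutoff
print(nucleus_sample(probs, top_p=0.95)) # the four most likely tokens remain candidates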

Implementation

The top_p parameter, also known as nucleus sampling, controls the diversity of the responses. Here’s how to use it:

import openai

# Initialize the OpenAI client with your API key
client = openai.OpenAI(api_key='your_openai_api_key')

max_tokens = 500
temperature = 0.1
top_p = 0.5

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the capital of India? Give 7 places to Visit"}
    ],
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    n=1,
    stop=None,
)
print(response.choices[0].message.content)

Output

temperature=0.1
top_p=0.25


Highly deterministic and fact-driven: At this low top_p value, the model selects words from a narrow pool of highly probable options, leading to concise and accurate responses with minimal variability. Each location is described with strict adherence to core facts, leaving little room for creativity or added details.

For instance, the mention of India Gate focuses purely on its role as a war memorial and its historical significance, without additional details like the design or atmosphere. The language remains straightforward and formal, ensuring clarity without distractions. This makes the output ideal for situations requiring precision and a lack of ambiguity.

temperature=0.1
top_p=0.5

Balanced between creativity and factual accuracy: With top_p = 0.5, the model opens up slightly to more varied phrasing while still maintaining a strong focus on factual content. This level introduces extra contextual information that provides a richer narrative without drifting too far from the main facts.

For example, in the description of Red Fort, this output includes the detail about the Prime Minister hoisting the flag on Independence Day—a point that gives more cultural significance but isn’t strictly necessary for the location’s historical description. The output is slightly more conversational and engaging, appealing to readers who want both facts and a bit of context.

  • More relaxed but still factual in nature, allowing for slight variability in phrasing but still quite structured.
  • The sentences are less rigid, and there is a wider range of facts included, such as mentioning the hoisting of the national flag at Red Fort on Independence Day and the design of India Gate by Sir Edwin Lutyens.
  • The wording is slightly more fluid compared to top_p = 0.1, though it remains quite factual and concise.

temperature = 0.5
top_p=1

Most diverse and creatively expansive output: At top_p = 1, the model allows for maximum variety, offering a more flexible and expansive description. This version includes richer language and additional, sometimes less expected, content.

For example, the inclusion of Raj Ghat in the list of notable places deviates from the standard historical or architectural landmarks and adds a human touch by highlighting its significance as a memorial to Mahatma Gandhi. Descriptions may also include sensory or emotional language, like how Lotus Temple has a serene environment that attracts visitors. This setting is ideal for producing content that is not only factually correct but also engaging and appealing to a broader audience.


4. Top-k (Token Sampling)

The top_k parameter limits the model to only considering the top k most probable next tokens when predicting (generating) the next word.

  • Low Value (e.g., 50): Limits the model to more predictable and constrained responses.
  • High Value (e.g., 500): Allows the model to consider a larger number of tokens, increasing the variety of responses.

Why is it Important?

While similar to top_p, top_k explicitly limits the number of tokens the model can choose from, making it useful for applications that require tight control over output variability. If you want to generate formal, structured responses, using a lower top_k can help.
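
Since the OpenAI Chat Completions API does not expose top_k directly (the implementation below uses top_p as a proxy), here is a toy sketch of the idea over an invented distribution. Libraries such as Hugging Face Transformers do expose a top_k argument in their generate() method.

import numpy as np

def top_k_sample(probs, k, rng=np.random.default_rng(0)):
    kept = np.argsort(probs)[::-1][:k]             # indices of the k most probable tokens
    kept_probs = np.array(probs)[kept]
    kept_probs = kept_probs / kept_probs.sum()     # renormalize over the surviving tokens
    return rng.choice(kept, p=kept_probs)

probs = [0.4, 0.3, 0.2, 0.07, 0.03]  # hypothetical probabilities for five candidate tokens
print(top_k_sample(probs, k=2))      # only the two most likely tokens can be drawn
print(top_k_sample(probs, k=5))      # every token stays in play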

Implementation

The top_k parameter isn’t directly exposed in the OpenAI API, but you can approximate its effect by using top_p to restrict the pool of candidate tokens.

import openai

# Initialize the OpenAI client with your API key
client = openai.OpenAI(api_key='your_openai_api_key')

max_tokens = 500
temperature = 0.1
top_p = 0.9

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the capital of India? Give 7 places to Visit"}
    ],
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    n=1,
    stop=None,
)
print("Top-k Example Output (Using top_p as proxy):")
print(response.choices[0].message.content)

Output

Top-k Example Output (Using top_p as proxy):

The capital of India is New Delhi. Here are seven notable places to visit in
New Delhi:

1. **India Gate** - This is a war memorial located astride the Rajpath, on
the eastern edge of the ceremonial axis of New Delhi, India, formerly called
Kingsway. It is a tribute to the soldiers who died during World War I and
the Third Anglo-Afghan War.

2. **Red Fort (Lal Qila)** - A historic fort in the city of Delhi in India,
which served as the main residence of the Mughal Emperors. Every year on
India's Independence Day (August 15), the Prime Minister hoists the national
flag at the main gate of the fort and delivers a nationally broadcast speech
from its ramparts.

3. **Qutub Minar** - A UNESCO World Heritage Site located in the Mehrauli
area of Delhi, Qutub Minar is a 73-meter-tall tapering tower of five
storeys, with a 14.3 meters base diameter, reducing to 2.7 meters at the top
of the peak. It was constructed in 1193 by Qutb-ud-din Aibak, founder of the
Delhi Sultanate after the defeat of Delhi's last Hindu kingdom.

4. **Lotus Temple** - Notable for its flowerlike shape, it has become a
prominent attraction in the city. Open to all, regardless of religion or any
other qualification, the Lotus Temple is an excellent place for meditation
and obtaining peace.

5. **Humayun's Tomb** - Another UNESCO World Heritage Site, this is the tomb
of the Mughal Emperor Humayun. It was commissioned by Humayun's first wife
and chief consort, Empress Bega Begum, in 1569-70, and designed by Mirak
Mirza Ghiyas and his son, Sayyid Muhammad.

6. **Akshardham Temple** - A Hindu temple, and a spiritual-cultural campus in
Delhi, India. Also referred to as Akshardham Mandir, it displays millennia
of traditional Hindu and Indian culture, spirituality, and architecture.

7. **Rashtrapati Bhavan** - The official residence of the President of India.
Located at the Western end of Rajpath in New Delhi, the Rashtrapati Bhavan
is a vast mansion and its architecture is breathtaking. It incorporates
various styles, including Mughal and European, and is a

5. Frequency Penalty

The frequency_penalty parameter discourages the model from repeating previously used words. It reduces the probability of tokens that have already appeared in the output.

  • Low Value (e.g., 0.0): The model won’t penalize for repetition.
  • High Value (e.g., 2.0): The model will heavily penalize repeated words, encouraging the generation of new content.

Why is it Important?

This is useful when you want the model to avoid repetitive outputs, like in creative writing, where redundancy might diminish quality. On the flip side, you might want lower penalties in technical writing, where repeated terminology could be beneficial for clarity.
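
Both penalties are described in OpenAI's documentation as additive adjustments to token logits before sampling: the frequency penalty scales with how many times a token has already appeared, while the presence penalty (covered next) is a flat, one-time cost. A simplified sketch, not the exact production formula:

def apply_penalties(logit, count_so_far, frequency_penalty, presence_penalty):
    # Frequency penalty grows with every repetition of the token
    adjusted = logit - frequency_penalty * count_so_far
    # Presence penalty is a flat cost applied once the token has appeared at all
    adjusted -= presence_penalty * (1 if count_so_far > 0 else 0)
    return adjusted

# A token already generated 3 times becomes far less likely than an unused one
print(apply_penalties(logit=2.0, count_so_far=3, frequency_penalty=1.0, presence_penalty=0.5))  # -1.5
print(apply_penalties(logit=2.0, count_so_far=0, frequency_penalty=1.0, presence_penalty=0.5))  # 2.0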

Implementation

The frequency_penalty parameter helps control repetitive word usage in the generated output. Here’s how to use it with gpt-4o:

import openai

# Initialize the OpenAI client with your API key
client = openai.OpenAI(api_key='your_openai_api_key')

max_tokens = 500
temperature = 0.1
top_p = 0.25
frequency_penalty = 1

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the capital of India? Give 7 places to Visit"}
    ],
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    frequency_penalty=frequency_penalty,
    n=1,
    stop=None,
)
print(response.choices[0].message.content)

Output

frequency_penalty=1

Balanced output with some repetition, maintaining natural flow. Ideal for contexts like creative writing where some repetition is acceptable. The descriptions are clear and cohesive, allowing for easy readability without excessive redundancy. Useful when both clarity and flow are required.


frequency_penalty=1.5

More varied phrasing with reduced repetition. Suitable for contexts where linguistic diversity enhances readability, such as reports or articles. The text maintains clarity while introducing more dynamic sentence structures. Helpful in technical writing to avoid excessive repetition without losing coherence.


frequency_penalty=2

Maximizes diversity but may sacrifice fluency and cohesion. The output becomes less uniform, introducing more variety but sometimes losing smoothness. Suitable for creative tasks that benefit from high variation, though it may reduce clarity in more formal or technical contexts due to inconsistency.


6. Presence Penalty

The presence_penalty parameter is similar to the frequency penalty, but instead of penalizing based on how often a word is used, it penalizes based on whether a word has appeared at all in the response so far.

  • Low Value (e.g., 0.0): The model won’t penalize for reusing words.
  • High Value (e.g., 2.0): The model will avoid using any word that has already appeared.

Why is it Important?

The presence penalty helps encourage more diverse content generation. It’s especially useful when you want the model to continually introduce new ideas, as in brainstorming sessions.

Implementation

The presence_penalty discourages the model from repeating ideas or words it has already introduced. Here’s how to apply it:

import openai
# Initialize the OpenAI client with your API key
client = openai.OpenAI(api_key='Your_api_key')
# Define parameters for the chat request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of India? Give 7 places to visit."
        }
    ],
    max_tokens=500,   # Max tokens for the response
    temperature=0.1,  # Controls randomness
    top_p=0.1,        # Controls diversity of responses
    presence_penalty=0.5,  # Encourages the introduction of new ideas
    n=1,              # Generate only 1 completion
    stop=None         # Stop sequence, none in this case
)
print(response.choices[0].message.content)

Output

presence_penalty=0.5

The output is informative but somewhat repetitive, as it provides well-known facts about each site, emphasizing details that may already be familiar to the reader. For instance, the descriptions of India Gate and Qutub Minar don’t diverge much from common knowledge, sticking closely to conventional summaries. This demonstrates how a lower presence penalty encourages the model to remain within familiar and already established content patterns.


presence_penalty=1

The output is more varied in how it presents details, with the model introducing more nuanced information and restating facts in a less formulaic way. For example, the description of Akshardham Temple adds an additional sentence about millennia of Hindu culture, signaling that the higher presence penalty pushes the model toward introducing slightly different phrasing and details to avoid redundancy, fostering diversity in content.


7. Stop Sequence

The stop parameter lets you define a sequence of characters or words that will signal the model to stop generating further content. This allows you to cleanly end the generation at a specific point.

  • Example Stop Sequences: Could be periods (.), newlines (\n), or specific phrases like “The end”.

Why is it Important?

This parameter is especially handy when working on applications where you want the model to stop once it has reached a logical conclusion or after providing a certain number of ideas, such as in Q&A or dialogue-based models.

Implementation

The stop parameter allows you to define a stopping point for the model when generating text. For example, you can stop it after generating a list of items.

import openai
# Initialize the OpenAI client with your API key
client = openai.OpenAI(api_key='Your_api_key')
max_tokens = 500
temperature = 0.1
top_p = 0.1
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the capital of India? Give 7 places to Visit"}
    ],
    max_tokens=max_tokens,
    temperature=temperature,
    n=1,
    top_p=top_p,
    stop=[".", "End of list"]  # Define stop sequences
)
print(response.choices[0].message.content)

Output

The capital of India is New Delhi

How do These LLM Parameters Work Together?

Now, the real magic happens when you start combining these parameters. For example:

  • Use temperature and top_p together to fine-tune creative tasks.
  • Pair max_tokens with stop to limit long-form responses effectively.
  • Leverage frequency_penalty and presence_penalty to avoid repetitive text, which is particularly useful for tasks like poetry generation or brainstorming sessions.
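
Here is a rough sketch of how several of these dials can be combined in a single request. The values and the brainstorming prompt are illustrative starting points, not recommendations:

import openai
client = openai.OpenAI(api_key='your_openai_api_key')
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Brainstorm five names for a travel blog about New Delhi"}
    ],
    max_tokens=200,          # cap the length of the reply
    temperature=0.8,         # lean creative for brainstorming
    top_p=0.9,               # keep a wide but not unlimited candidate pool
    frequency_penalty=0.6,   # discourage repeating the same words
    presence_penalty=0.4,    # nudge the model toward new ideas
    stop=["\n\n"],           # end cleanly at the first blank line
    n=1,
)
print(response.choices[0].message.content)

Tweaking one dial at a time from a baseline like this makes it easier to attribute changes in the output to a specific parameter.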

Conclusion

Understanding these LLM parameters can significantly improve how you interact with language models. Whether you’re developing an AI-based assistant, generating creative content, or performing technical tasks, knowing how to tweak these parameters helps you get the best output for your specific needs.

By adjusting LLM parameters like temperature, max_tokens, and top_p, you gain control over the model’s creativity, coherence, and length. On the other hand, penalties like frequency and presence ensure that outputs remain fresh and avoid repetitive patterns. Finally, the stop sequence ensures clean and well-defined completions.

Experimenting with these settings is key, as the optimal configuration depends on your application. Start by tweaking one parameter at a time and observe how the outputs shift—this will help you dial in the perfect setup for your use case!


Frequently Asked Questions

Q1. What are LLM generation parameters?

Ans. LLM generation parameters control how AI models like GPT-4 generate text, affecting creativity, accuracy, and length.

Q2. What is the role of the temperature parameter?

Ans. The temperature controls how creative or focused the model’s output is. Lower values make it more precise, while higher values increase creativity.

Q3. How does max_tokens affect the output?

Ans. Max_tokens limits the length of the generated response, with higher values producing longer and more detailed outputs.

Q4. What is top-p sampling?

Ans. Top-p (nucleus sampling) controls the diversity of responses by setting a threshold for the cumulative probability of token choices, balancing precision and creativity.

Q5. Why are frequency and presence penalties important?

Ans. These penalties reduce repetition and encourage the model to generate more diverse content, improving overall output quality.


Hi, I am Pankaj Singh Negi - Senior Content Editor | Passionate about storytelling and crafting compelling narratives that transform ideas into impactful content. I love reading about technology revolutionizing our lifestyle.