Chatbot Evolution: ChatGPT Vs. Rule-based

Suvojit · Last Updated: 12 May, 2023

11 min read

Introduction

Chatbots have become an integral part of the digital landscape, revolutionizing the way businesses interact with their customers. From customer service and sales to virtual and voice assistants, chatbots have become part of everyday life and of how companies communicate with their users. Their technological capabilities have improved over time, moving from rule-based bots to complex conversational agents driven by Artificial Intelligence and Machine Learning algorithms.

In this blog, we will explore the evolution of chatbots, starting from rule-based chatbots to the emergence of ChatGPT, which is powered by large language models like GPT-3.5 Turbo. We will delve deeper into the key concepts, functionalities, coding, and advancements that have shaped the field of chatbots today with the help of large language models.

Learning Objectives

Understand the evolution of chatbots from rule-based systems to large language models.
Explore the functionalities, architecture, and limitations of rule-based chatbots.
Learn about the emergence of large language models and their impact on chatbot development.
Gain insights into GPT-3.5 Turbo (ChatGPT) and GPT-4, and dive into coding and API usage.
Discover the features and applications of ChatGPT.
Discuss the potential future of chatbots and their implications.

This article was published as a part of the Data Science Blogathon.

Rule-based Chatbots

Rule-based chatbots, or scripted chatbots, are the earliest form of chatbots that were developed based on predefined rules or scripts. These chatbots follow a predefined set of rules to generate responses to user inputs. The responses are designed based on a predefined script that the chatbot developer creates, which outlines the possible interactions and responses the chatbot can provide.

Rule-based chatbots operate using a series of conditional statements that check for keywords or phrases in the user’s input and return corresponding responses. For example, if the user asks “What’s the name of the author of this blog about chatbots?”, the chatbot’s script would contain a conditional statement that checks for the keywords “name”, “author”, and “blog” (known as entities) and returns the predefined response “The author of this blog is Suvojit”. This works because the developer defines a set of entities and contexts in advance; the chatbot uses them to infer the user’s intent and replies with a predefined response.
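To make the keyword-matching idea concrete, here is a minimal sketch of such a scripted bot. The rule table, the fallback message, and the function names (`RULES`, `tokenize`, `respond`) are hypothetical and only illustrate the technique; real rule-based frameworks add far more machinery.

```python
import re

# Hypothetical rules: each maps a set of required keywords (entities) to a canned reply.
RULES = [
    ({"name", "author", "blog"}, "The author of this blog is Suvojit."),
    ({"hello"}, "Hi! How can I help you today?"),
]
FALLBACK = "Sorry, I didn't understand that."

def tokenize(text):
    """Lowercase the input and split it into a set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def respond(user_input):
    """Return the reply of the first rule whose keywords all appear in the input."""
    words = tokenize(user_input)
    for keywords, reply in RULES:
        if keywords <= words:  # every required keyword is present
            return reply
    return FALLBACK

print(respond("What's the name of the author of this blog about chatbots?"))
# → The author of this blog is Suvojit.
print(respond("Who wrote this article?"))
# → Sorry, I didn't understand that.
```

The second question asks for the same information in different words, yet it falls through to the fallback because none of the scripted keywords appear. This brittleness is exactly what the limitations below describe.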

Architecture of Rule-Based Chatbots

At a high level, the architecture of a rule-based chatbot usually consists of three parts: the user interface (UI), the Natural Language Processing (NLP) engine, and the rule engine.

User Interface: The UI is the platform or application through which the user interacts with the chatbot. It can be a website, a messaging app, or a platform that supports text-based communication.
Natural Language Processing (NLP) Engine: The NLP engine processes the user input and converts it into a machine-readable format. This involves breaking the input into words, identifying parts of speech, and extracting relevant information. The NLP engine can also perform synonym mapping, spell-checking, and language translation to help the chatbot understand and respond to user inputs.
Rule Engine: The rule engine is the brain of the chatbot. It interprets the processed input, determines the intent, and selects the appropriate response based on the predefined rules. The rule engine contains a set of decision trees, where each node represents a specific rule that the chatbot should follow. For example, if the user input contains a specific keyword, the chatbot returns a particular response or performs a specific action.
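The three components can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the NLP engine only does tokenization and synonym mapping (no part-of-speech tagging, spell-checking, or translation), the rule engine is a single small hand-written decision tree, and a plain function call stands in for the UI. All names (`SYNONYMS`, `Node`, `DECISION_TREE`, `chatbot`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# NLP engine: hypothetical synonym table mapping word variants to canonical entities.
SYNONYMS = {"writer": "author", "article": "blog", "post": "blog", "cost": "price"}

def nlp_engine(text: str) -> set:
    """Tokenize, lowercase, and map synonyms to canonical keywords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {SYNONYMS.get(t, t) for t in tokens}

@dataclass
class Node:
    """Decision-tree node: test one keyword, then follow the yes/no branch."""
    keyword: Optional[str] = None   # None marks a leaf
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    response: str = ""              # only used by leaves

# Rule engine: a small hand-written decision tree (hypothetical rules).
DECISION_TREE = Node(
    keyword="blog",
    yes=Node(keyword="author",
             yes=Node(response="The author of this blog is Suvojit."),
             no=Node(response="This blog compares rule-based chatbots with ChatGPT.")),
    no=Node(keyword="price",
            yes=Node(response="Please check our pricing page."),
            no=Node(response="Sorry, I didn't understand that.")),
)

def rule_engine(keywords: set, node: Node = DECISION_TREE) -> str:
    """Walk the tree until a leaf is reached and return its canned response."""
    while node.keyword is not None:
        node = node.yes if node.keyword in keywords else node.no
    return node.response

def chatbot(user_input: str) -> str:
    """UI entry point: raw text in, reply out."""
    return rule_engine(nlp_engine(user_input))

print(chatbot("Who is the writer of this article?"))
# → The author of this blog is Suvojit.
```

Synonym mapping lets “writer” and “article” reach the same leaf as “author” and “blog”, but every new phrasing still has to be anticipated by hand, either in the synonym table or as another branch of the tree.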

Limitations of Rule-Based Chatbots

While rule-based chatbots can be effective in certain scenarios, they have several limitations:

Limited ability to understand natural language: Rule-based chatbots rely on pre-programmed rules and patterns to understand and respond to user queries. They have a limited ability to understand natural language and may struggle to interpret complex queries that deviate from their pre-defined patterns.
Lack of context: Rule-based chatbots can’t understand the context of a conversation. They cannot interpret user intent beyond the specific set of rules they have been programmed with. Therefore, they cannot modify responses to reflect the user’s current context.
Difficulty handling ambiguity: Chatbots need to handle ambiguity when communicating with people. However, rule-based chatbots often struggle to respond effectively to ambiguous input, which can lead to frustrating user experiences.
Scalability: Rule-based chatbots need a large number of entities and contexts to handle many kinds of queries. This makes them difficult to scale or improve, since every new rule or pattern requires more programming and maintenance.
Inability to learn and adapt: Rule-based chatbots are incapable of learning or adapting. They can’t use machine learning algorithms to improve their responses over time. This means that they will continue to rely on their predefined rules, even if they are ineffective.

So how do we overcome these limitations? This is where Large Language Models (LLMs) come in. Trained on massive datasets containing billions of words, phrases, and sentences, these models can perform a wide range of language tasks with far greater accuracy and flexibility than rule-based systems.

LLMs combine deep learning, large neural networks, and natural language processing techniques to capture the intricacies of language and generate human-like responses to user queries. Thanks to their size and architecture, they can learn from very large datasets, and they can be improved further through additional training and fine-tuning. Let’s take a look at the most popular large language models in use today.

Popular Large Language Models

GPT3: GPT-3 (Generative Pre-trained Transformer 3) is a language processing AI model developed by OpenAI. It has 175 billion parameters and is capable of performing several natural language processing tasks, including language translation, summarization, and answering questions. GPT-3 has been lauded for its ability to generate high-quality text that is similar to text written by humans, making it a powerful tool for chatbots, content creation, and more.

GPT-3.5 Turbo: GPT-3.5 Turbo is an upgraded successor to GPT-3 developed by OpenAI, and it is the model behind ChatGPT. It is optimized for chat and is considerably cheaper to use through the API than earlier GPT-3 models. OpenAI has not disclosed its parameter count, so figures circulating online should be treated as speculation. It is capable of generating more sophisticated and complex natural language outputs than its predecessor, and it can be used in many domains, including academic research, content creation, and customer service.

GPT-4: GPT-4 is the next generation of OpenAI’s GPT series of language-processing AI models. OpenAI has not publicly released its parameter count; some experts estimate it could be around 1 trillion, but this is unconfirmed. According to OpenAI, GPT-4 has better problem-solving capabilities and higher accuracy, and it produces more factual responses than its predecessors. At the time of writing, access to the GPT-4 API is granted through a waitlist, and the model can also be used through a ChatGPT Plus subscription.

LLaMA: LLaMA is a family of large language models released by Meta AI (Facebook) to help researchers in this subfield of AI. It comes in several sizes, from 7 billion to 65 billion parameters. LLaMA is intended for research on large language models: exploring applications such as question answering and natural language understanding, studying the capabilities and limitations of current language models, developing techniques to improve them, and evaluating and mitigating biases. The LLaMA code is released under the GPL-3 license, while access to the model weights is granted for non-commercial research on application through a waitlist.

StableLM: StableLM is a recently released open-source large language model from Stability AI. The initial release includes models with 3 billion and 7 billion parameters, with larger models of up to 65 billion parameters planned. StableLM is trained on a new experimental dataset built on The Pile but three times larger, with 1.5 trillion tokens of content. According to Stability AI, the richness of this dataset gives StableLM surprisingly high performance in conversational and coding tasks, despite the small size of these 3 and 7 billion parameter models.
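To give a feel for what these parameter counts mean in practice, a back-of-the-envelope estimate helps: storing the weights alone takes roughly the number of parameters times the bytes per parameter. The sketch below assumes 16-bit (2-byte) weights by default and ignores activations, caches, and other runtime overhead, so real memory requirements are higher. The helper name `weight_memory_gb` is hypothetical.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Approximate memory (GB, where 1 GB = 1e9 bytes) needed for the weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Parameter counts mentioned in this article.
models = {"StableLM 3B": 3e9, "LLaMA 7B": 7e9, "LLaMA 65B": 65e9, "GPT-3 175B": 175e9}
for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB in fp16")
# → StableLM 3B: ~6 GB, LLaMA 7B: ~14 GB, LLaMA 65B: ~130 GB, GPT-3 175B: ~350 GB
```

This is why the smaller open models can run on a single consumer GPU, while a 175-billion-parameter model has to be split across several accelerators.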

OpenAI’s ChatGPT

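OpenAI’s ChatGPT is a chatbot built on the GPT-3.5 Turbo model (and, for ChatGPT Plus subscribers, GPT-4), designed to generate human-like responses in text-based conversations. The underlying model is first pretrained on a massive corpus of text with a self-supervised objective, which lets it learn the patterns of natural language, and is then fine-tuned with supervised examples and reinforcement learning from human feedback (RLHF) so that it follows instructions and holds a conversation.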

ChatGPT is built on a deep neural network (DNN) made of many stacked transformer layers, which process the input text and generate the output text. During pretraining, the model learns through language modeling: given a sequence of text, it is tasked with predicting the next word (more precisely, the next token).
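At the heart of each transformer layer is self-attention. The sketch below shows single-head causal scaled dot-product attention in plain Python, a minimal illustration under simplifying assumptions: the learned query, key, and value projections are omitted (queries, keys, and values are the input vectors themselves), and there are no multiple heads, feed-forward sublayers, or normalization, all of which real models use. The causal mask is what makes next-word prediction possible, since each position can only look at earlier positions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x):
    """Single-head scaled dot-product attention with a causal mask.

    x is a list of token vectors. For simplicity Q = K = V = x, i.e. the
    learned projections of a real transformer are omitted.
    """
    d = len(x[0])
    outputs = []
    for i, q in enumerate(x):
        # Causal mask: position i may only attend to positions 0..i.
        scores = [sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d) for j in range(i + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        outputs.append([sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-dimensional token vectors
for row in causal_self_attention(tokens):
    print([round(v, 3) for v in row])
```

Changing the last token never changes the outputs at earlier positions, which is the property that lets the model be trained to predict every next token in a sequence at once.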

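One of the key features of ChatGPT is its ability to generate long and coherent responses to text-based input. Pretraining uses maximum likelihood estimation (MLE): the model’s parameters are adjusted to maximize the probability it assigns to the actual next token in the training text, which teaches it to produce grammatically and semantically plausible continuations. Subsequent fine-tuning with human feedback then steers those continuations toward helpful answers.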
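The MLE objective for next-token prediction can be illustrated at toy scale with a bigram model, where the maximum likelihood estimate of P(next | current) is simply count(current, next) / count(current). The sketch below fits such a model on a made-up one-sentence corpus and computes the average negative log-likelihood, which is the quantity MLE training minimizes. A transformer replaces the count table with a neural network conditioned on the whole preceding context, but the objective has the same form. The corpus and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

corpus = "the bot answers the user and the user thanks the bot".split()

# Count how often each word follows each preceding word.
pair_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    pair_counts[current][nxt] += 1

def prob(current, nxt):
    """MLE estimate of P(nxt | current). Unseen contexts are not handled (no smoothing)."""
    counts = pair_counts[current]
    return counts[nxt] / sum(counts.values())

def avg_nll(tokens):
    """Average negative log-likelihood of each token given the previous one."""
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(math.log(prob(c, n)) for c, n in pairs) / len(pairs)

print(prob("the", "user"))       # "the" is followed by "bot" twice and "user" twice → 0.5
print(round(avg_nll(corpus), 3))  # → 0.416
```

Lower average negative log-likelihood means the model assigns higher probability to the text it actually sees; training a large model is the same optimization carried out over billions of tokens.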

In addition to its ability to generate natural language responses, ChatGPT can handle a multitude of conversational tasks. These include the ability to detect and respond to specific keywords or phrases, generate text-based summaries of long documents, and even perform simple arithmetic operations.

Let’s take a look at how we can use the OpenAI APIs for GPT-3.5 Turbo and GPT-4.

GPT-3.5 and GPT-4 API

Most of us are aware of ChatGPT and have spent quite some time experimenting with it. Let’s take a look at how we can have a conversation with it using the OpenAI API. First, create an account on OpenAI and navigate to the View API Keys section to generate a secret key.

Once you have the API key, head over to the billing section and add your credit card. The cost per thousand tokens can be found on the OpenAI pricing page.
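Because billing is per token, estimating the cost of a request is simple arithmetic. The sketch below takes the per-1,000-token prices as inputs instead of hard-coding them, since prices differ by model and change over time; the prices in the example are placeholders, not OpenAI’s actual rates. Prompt (input) and completion (output) tokens may be priced differently, so both are accepted. The token counts for a real request are reported in the `usage` field of the API response. The function name `estimate_cost` is hypothetical.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
    """Estimated charge for one request, given prices per 1,000 tokens."""
    return ((prompt_tokens / 1000) * prompt_price_per_1k
            + (completion_tokens / 1000) * completion_price_per_1k)

# Placeholder prices for illustration only; look up the real ones on the pricing page.
cost = estimate_cost(prompt_tokens=1200, completion_tokens=300,
                     prompt_price_per_1k=0.01, completion_price_per_1k=0.02)
print(f"${cost:.4f}")  # → $0.0180
```

Keep in mind that in a multi-turn conversation the whole message history is sent with every request, so prompt tokens (and cost) grow as the conversation gets longer.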

Now let’s see how we can invoke the API to use the gpt-3.5-turbo model:

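import os
import openai

# Read the key from an environment variable rather than hard-coding it in the script
openai.api_key = os.getenv("OPENAI_API_KEY")

def prompt_model(prompts, temperature=0.0, model="gpt-3.5-turbo"):
    """Send the prompts one after another as a single conversation and return the last reply."""
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    reply = None
    for prompt in prompts:
        messages.append({"role": "user", "content": prompt})
        response = openai.ChatCompletion.create(
            model=model, temperature=temperature, messages=messages
        )
        reply = response["choices"][0]["message"]["content"]
        # Keep the assistant's answer in the history so follow-up prompts have context
        messages.append({"role": "assistant", "content": reply})
    return reply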

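The function above sends the prompts to the chat completions endpoint, together with a system message that sets the assistant’s behavior, and returns the text of the model’s reply. The temperature parameter controls randomness: 0.0 makes the output as deterministic as possible, while higher values produce more varied responses. Note that this code uses the openai Python package as it was at the time of writing (versions before 1.0); from version 1.0 onward, the equivalent call is client.chat.completions.create(...) on an OpenAI() client object. Now let’s try to talk to the bot and see the output: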

prompts = []

prompts.append(
    '''Write about this amazing blog written by author Suvojit about 
    large language models''')

for model in ['gpt-3.5-turbo']:
    response = prompt_model(prompts, temperature=0.0, model=model)
    print(f'\n{model} Model response: \n\n{response}')

Let’s see the output:

gpt-3.5-turbo Model response: 

Suvojit's blog about large language models is an amazing read for anyone 
interested in the field of natural language processing (NLP). In his blog, 
Suvojit delves into the world of large language models, which are a type of
 machine learning model that can process and understand human language.

Suvojit starts by explaining what large language models are and how they work. 
He then goes on to discuss the different types of large language models, such 
as GPT-3 and BERT, and how they are trained using massive amounts of data.

One of the most interesting parts of Suvojit's blog is the 
potential applications of large language models. He explains how these models 
can be used for language translation, text summarization, and 
even generating human-like responses in chatbots.

Suvojit addresses some of the concerns and criticisms surrounding large 
language models, their potential to perpetuate biases and their high 
computational requirements.

Overall, Suvojit's blog is a well-written and informative piece that provides 
a great introduction to the world of large language models. Whether you're a 
seasoned NLP expert or just starting out, this blog is definitely worth a read.
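The temperature argument passed to prompt_model deserves a closer look. Conceptually, the model’s scores (logits) for candidate next tokens are divided by the temperature before the softmax: values below 1 sharpen the distribution toward the most likely token, values above 1 flatten it, and a temperature of 0 is commonly treated as greedy decoding. The sketch below illustrates this with made-up logits and candidate tokens; it is not OpenAI’s server-side implementation, which is not public, and the function names are hypothetical.

```python
import math
import random

def temperature_softmax(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature == 0:
        # Greedy limit: all probability mass on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(tokens, logits, temperature, rng=random):
    """Sample one token according to the temperature-adjusted distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["great", "good", "okay"]  # hypothetical candidate next tokens
logits = [2.0, 1.0, 0.1]            # made-up model scores
for t in (0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in temperature_softmax(logits, t)])
print(sample_next_token(tokens, logits, 0))  # → great
```

This is why the examples in this article use temperature=0.0: rerunning them should give the same, or very nearly the same, answer each time.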

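Keep in mind that gpt-3.5-turbo has not read this blog, so the description it produced is generated rather than retrieved; it mentions BERT, for example, which this blog does not cover. Now let’s switch to GPT-4 and see another example of its capabilities.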

# Reuse prompt_model() from the GPT-3.5 example above; only the model name changes.

prompts = []

prompts.append(
    '''Prove the Cauchy-Schwarz inequality''')

for model in ['gpt-4']:
    response = prompt_model(prompts, temperature=0.0, model=model)
    print(f'\n{model} Model response: \n\n{response}')

Below is the response from the model with the proof:

gpt-4 Model response: 

The Cauchy-Schwarz inequality is a fundamental inequality in mathematics 
that holds for any two vectors in an inner product space. It states that 
the absolute value of the inner product of two vectors is less than or 
equal to the product of the norms (lengths) of the two vectors. In its 
simplest form, for two real-valued vectors u and v, the inequality is given by:

|<u, v>| ≤ ||u|| ||v||

Here, <u, v> denotes the inner product of u and v, and ||u|| and ||v|| 
denote the norms of u and v, respectively.

Proof:

Let u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) be two n-dimensional 
real-valued vectors. We will prove the Cauchy-Schwarz inequality using 
the method of "completing the square."

Consider the following non-negative quadratic expression in a real variable t:

Q(t) = ||tu + v||^2

Expanding Q(t) using the definition of the inner product and the norm, we get:

Q(t) = <tu + v, tu + v> = (t^2)(<u, u>) + 2t(<u, v>) + <v, v>

Since Q(t) is non-negative for all t, its discriminant must be non-positive:

D = (2(<u, v>))^2 - 4(<u, u>)(<v, v>) ≤ 0

Simplifying and dividing by 4, we get:

(<u, v>)^2 ≤ (<u, u>)(<v, v>)

Taking the square root of both sides, we obtain the Cauchy-Schwarz inequality:

|<u, v>| ≤ sqrt(<u, u>) sqrt(<v, v>) = ||u|| ||v||

This completes the proof of the Cauchy-Schwarz inequality.
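Model output that looks rigorous is still worth checking. The proof above is the standard discriminant argument; it implicitly assumes u ≠ 0 so that Q(t) is a genuine quadratic (if u = 0, both sides equal 0 and the inequality holds trivially). A quick numerical sanity check on random vectors tests the statement rather than proving it, but it is a cheap way to catch a model confidently asserting something false. The function names and trial settings below are arbitrary choices for illustration.

```python
import math
import random

def inner(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm induced by the inner product."""
    return math.sqrt(inner(u, u))

def check_cauchy_schwarz(trials=10_000, dim=5, seed=0):
    """Return True if |<u, v>| <= ||u|| ||v|| holds for every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(-10, 10) for _ in range(dim)]
        v = [rng.uniform(-10, 10) for _ in range(dim)]
        # Small tolerance absorbs floating-point rounding when u and v are nearly parallel.
        if abs(inner(u, v)) > norm(u) * norm(v) + 1e-9:
            return False
    return True

print(check_cauchy_schwarz())  # → True
```

Equality holds when one vector is a multiple of the other, for example u = (1, 2, 3) and v = (2, 4, 6), where both sides equal 28.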

These examples demonstrate the capabilities of GPT-3.5 and GPT-4, from open-ended writing to step-by-step mathematical reasoning, and they point the way toward more sophisticated models in the future. With continued development and refinement, models like these are likely to play a growing role in AI and language technology. Let’s look at some of their applications.

Applications of ChatGPT

Let’s look at some of the possible applications of ChatGPT:

ChatGPT can be a conversational agent for customer support in e-commerce, finance, and healthcare. It can answer questions, provide product recommendations, and even assist in resolving complex issues.
ChatGPT can generate content such as blog posts, summaries, and translations. It can assist journalists, bloggers, and content creators by drafting content in a matter of seconds.
GPT-4 can be applied in the education sector to facilitate personalized learning experiences. It can generate interactive and engaging content, provide explanations, and even evaluate students’ responses.
ChatGPT can be integrated into virtual assistants to perform various tasks through voice commands. It can make appointments, set reminders, and even control smart home devices.
It can also be used in the field of mental health to offer support to patients alongside professional care. GPT-4 can assist in identifying symptoms, suggesting coping mechanisms, and pointing people to therapy resources.
ChatGPT can be used in the recruitment process, assisting with screening resumes, scheduling, and conducting interviews. This can save recruiters time and effort, although the bias concerns discussed below apply here as well.

Future Prospects and Concerns

GPT-4 and its successors have vast potential for future development, both in their capabilities and in their applications. As the technology evolves, these models are likely to become even more sophisticated in understanding and generating natural language, and may gain features such as emotion recognition and richer contextual understanding. The mathematical capabilities of ChatGPT are currently limited, but this may soon change; educators and students could then benefit from an AI assistant that guides them in their academic pursuits, increasing access to knowledge and reasoning.

However, there are some major concerns:

Ethical Concerns: ChatGPT has raised ethical concerns about its potential to spread disinformation, promote harmful content, and manipulate public opinion. Some experts worry that the model’s ability to generate human-like responses can deceive and mislead people.
Bias and Fairness: Some researchers have pointed out that ChatGPT, like other machine learning models, can reflect and amplify the biases present in its training data. This could lead to unfair treatment of certain groups who are underrepresented in the training data.
Privacy and Security: ChatGPT is trained on large amounts of data, which may include personal information. This has raised concerns about the privacy and security of the data used to train the model, as well as the privacy of users who interact with it. There are also concerns that malicious actors could use ChatGPT to help exploit vulnerabilities and gain unauthorized access to sensitive information.

Conclusion

Large language model-based chatbots like ChatGPT have revolutionized natural language processing and made significant advances in language understanding and generation. Compared to rule-based chatbots, these LLM-based chatbots have demonstrated remarkable abilities across a wide range of language tasks, including text completion, translation, and summarization. Their massive training data and sophisticated architectures enable them to produce coherent, human-like output, although that output is not always accurate. However, their size and energy consumption have raised concerns about their environmental impact. Despite these challenges, the potential benefits of large language models are clear, and they continue to drive innovation and research in artificial intelligence.

Key Takeaways:

Rule-based chatbots can hold basic conversations with end users, as long as those conversations are predefined through intents, entities, and contexts.
The rule-based bots are not great at understanding new contexts and cannot answer complex questions.
LLM-based chatbots, on the other hand, are capable of generating human-like text, answering complex questions, and even carrying on realistic conversations with users.
ChatGPT, the most popular LLM-based chatbot, has been designed specifically for conversational use and can generate text that is both coherent and relevant to the task at hand.
GPT-3.5 Turbo and GPT-4 are both capable of advanced natural language processing tasks with unprecedented accuracy and efficiency, such as language translation, text summarization, question answering, solving basic math, and many more.
There are ethical and privacy concerns about these LLMs, since user conversations may be used to improve them and these inputs can contain sensitive or private information. They can also produce unreliable or misleading output.
However, despite these challenges, LLM-based chatbots remain one of the most important and sophisticated technological advancements today and for years to come.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Suvojit

Suvojit is a Senior Data Scientist at DunnHumby. He enjoys exploring new and innovative ideas and techniques in the field of AI and tries to solve real-world machine learning problems by thinking out of the box. He writes about the latest advancements in Computer Vision and Natural Language Processing. You can follow him on LinkedIn.
