Chatbots have become an integral part of the digital landscape, revolutionizing the way businesses interact with their customers. From customer service to sales, and from virtual assistants to voice assistants, chatbots have worked their way into everyday life and into the way companies communicate with their users. Their technological capabilities have improved over time, moving from rule-based bots to complex conversational agents driven by Artificial Intelligence and Machine Learning algorithms.
In this blog, we will explore the evolution of chatbots, starting from rule-based chatbots to the emergence of ChatGPT, which is powered by large language models like GPT-3.5 Turbo. We will delve deeper into the key concepts, functionalities, coding, and advancements that have shaped the field of chatbots today with the help of large language models.
This article was published as a part of the Data Science Blogathon.
Rule-based chatbots, or scripted chatbots, are the earliest form of chatbots. They follow a predefined set of rules to generate responses to user inputs. The responses are designed around a script that the chatbot developer creates, which outlines the possible interactions and responses the chatbot can provide.
Rule-based chatbots operate using a series of conditional statements that check for keywords or phrases in the user’s input and provide corresponding responses based on these conditions. For example, if the user asks a question like “What’s the name of the author of this blog about chatbots?”, the chatbot’s script would have a conditional statement that checks for the keywords “name”, “author”, and “blog”, also known as entities, and responds with a predefined response: “The author of this blog is Suvojit”. A predefined set of entities and contexts is used to train the chatbot; based on these, it infers the user’s intent and responds with a predefined response format.
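The keyword-matching logic described above can be sketched in a few lines of Python. The rules and responses here are illustrative placeholders, not taken from any real chatbot framework:

```python
# A minimal sketch of a rule-based chatbot: each rule is a set of required
# keywords (entities) paired with a canned response from the script.
RULES = [
    ({"name", "author", "blog"}, "The author of this blog is Suvojit"),
    ({"hello"}, "Hello! How can I help you?"),
]

def rule_based_reply(user_input):
    # Normalize the input and split it into a bag of words
    words = set(user_input.lower().replace("?", "").replace("'s", "").split())
    for keywords, response in RULES:
        if keywords <= words:  # all required keywords appear in the input
            return response
    return "Sorry, I don't understand that."

print(rule_based_reply("What's the name of the author of this blog about chatbots?"))
```

Any input that matches no rule falls through to a generic fallback, which is exactly the brittleness that motivates the limitations discussed below.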
The architecture of rule-based chatbots usually consists of three parts at a high level: the UI, the Natural Language Processing (NLP) engine, and the rule engine.
While Rule-based chatbots can be effective in certain scenarios, they have several limitations. Here are some of the limitations of rule-based chatbots:
So how do we overcome these limitations? Introducing Large Language Models (LLMs) – trained on massive datasets that contain billions of words, phrases, and sentences, these models are capable of performing language tasks with unprecedented accuracy and efficiency.
LLMs use a combination of deep learning algorithms, neural networks, and natural language processing techniques to understand the intricacies of language and generate human-like responses to user queries. With their immense size and sophisticated architecture, LLMs have the ability to learn from big data and continuously improve their performance over time. Let’s take a look at the most popular large language models in use today.
GPT-3: GPT-3 (Generative Pre-trained Transformer 3) is a language processing AI model developed by OpenAI. It has 175 billion parameters and is capable of performing several natural language processing tasks, including language translation, summarization, and answering questions. GPT-3 has been lauded for its ability to generate high-quality text that is similar to text written by humans, making it a powerful tool for chatbots, content creation, and more.
GPT-3.5 Turbo: GPT-3.5 Turbo is an upgraded version of GPT-3 developed by OpenAI. OpenAI has not publicly disclosed its parameter count, but the model is optimized for chat-style conversations and is significantly faster and cheaper to run than the base GPT-3 models while producing more sophisticated natural language outputs. This model has the potential to be used in many domains, including academic research, content creation, and customer service.
GPT-4: GPT-4 is the next generation of OpenAI’s GPT series of language-processing AI models. Although the number of parameters has not been publicly released by OpenAI, many experts speculate that it could be around one trillion. GPT-4 has been trained on more data, has better problem-solving capabilities and higher accuracy, and produces more factual responses than its predecessors. Currently, the GPT-4 API is available through a waitlist, and it can also be used with the ChatGPT Plus subscription.
LLaMA: LLaMA is a large language model released by Meta (Facebook), designed to help researchers in this subfield of AI. It comes in a variety of model sizes, ranging from 7 billion to 65 billion parameters. LLaMA can be used to research large language models, including exploring potential applications like question answering and natural language understanding, studying the capabilities and limitations of current language models, developing techniques to improve them, and evaluating and mitigating biases. LLaMA’s inference code is available under the GPL-3 license, while the model weights are released under a non-commercial research license and can be accessed by applying through the request form.
StableLM: StableLM is a recently released family of large language models from Stability AI. It is fully free and open source, with an initial release of 3-billion and 7-billion-parameter models and larger models of up to 65 billion parameters planned. StableLM is trained on a new experimental dataset built on The Pile, but roughly three times larger, with 1.5 trillion tokens of content. The richness of this dataset gives StableLM surprisingly high performance in conversational and coding tasks, despite the small size of its 3-billion and 7-billion-parameter models.
OpenAI’s ChatGPT is a large language model based on the GPT-3.5 Turbo architecture, which is designed to generate human-like responses to text-based conversations. The model is trained on a massive corpus of text data using unsupervised learning techniques, which allows it to learn and generate natural language.
ChatGPT is built using a deep neural network (DNN) architecture with multiple layers of processing units called transformers. These transformers are responsible for processing the input text and generating the output text. The model is trained using unsupervised language modeling, where it is tasked with predicting the next word in a sequence of text.
One of the key features of ChatGPT is its ability to generate long and coherent responses to text-based input. This is achieved through the use of maximum likelihood estimation (MLE), which encourages the model to generate responses that are both grammatically correct and semantically meaningful.
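As a toy illustration of maximum likelihood estimation for next-word prediction, consider a bigram model fitted to a tiny made-up corpus. Real LLMs learn these probabilities with transformer networks over huge datasets rather than raw counts, but the underlying training objective is the same idea:

```python
from collections import Counter, defaultdict

# MLE for a bigram language model: estimate P(next | word) as
# count(word, next) / count(word), then predict the argmax.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(word):
    # The most frequent continuation is the maximum-likelihood prediction
    return bigram_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```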
In addition to its ability to generate natural language responses, ChatGPT can handle a multitude of conversational tasks. These include the ability to detect and respond to specific keywords or phrases, generate text-based summaries of long documents, and even perform simple arithmetic operations.
Let’s take a look at how we can use the OpenAI APIs for GPT-3.5 Turbo and GPT-4.
Most of us are aware of ChatGPT and have spent quite some time experimenting with it. Let’s take a look at how we can have a conversation with it using OpenAI APIs. First, we need to create an account on OpenAI and navigate to the View API Keys Section.
Once you have the API key, head over to the billing section and add your credit card. The cost per thousand tokens can be found on the OpenAI pricing page.
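Since billing is per token, it helps to estimate cost before making calls. Here is a minimal sketch, assuming an illustrative per-1K-token rate; the actual rates are on the pricing page and change over time, so treat the number below as a placeholder:

```python
# Rough cost estimate for a chat call. The rate below is an assumed
# example value in USD -- always check the current OpenAI pricing page.
PRICE_PER_1K_TOKENS = {
    "gpt-3.5-turbo": 0.002,
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    # Cost = total tokens / 1000 * price per thousand tokens
    rate = PRICE_PER_1K_TOKENS[model]
    return (prompt_tokens + completion_tokens) / 1000 * rate

print(f"${estimate_cost('gpt-3.5-turbo', 500, 300):.4f}")
```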
Now let’s see how we can invoke the APIs to use the gpt-3.5-turbo model:
import openai

# Note: this uses the pre-1.0 openai library's ChatCompletion interface
openai.api_key = 'asdadsa-Enter-Your-API-Key-Here'

def prompt_model(prompts, temperature=0.0, model="gpt-3.5-turbo"):
    # Build the chat history: a system message plus one user message per prompt
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for prompt in prompts:
        messages.append({"role": "user", "content": prompt})
    response = openai.ChatCompletion.create(
        model=model, temperature=temperature, messages=messages
    )
    return response["choices"][0]["message"]["content"]
The code above defines a helper that calls the GPT-3.5 Turbo model. The temperature setting and the user input determine the quality and style of the response. Now let’s try to talk to the bot and see the output:
prompts = []
prompts.append(
    '''Write about this amazing blog written by author Suvojit about
    large language models''')

for model in ['gpt-3.5-turbo']:
    response = prompt_model(prompts, temperature=0.0, model=model)
    print(f'\n{model} Model response: \n\n{response}')
Let’s see the output:
gpt-3.5-turbo Model response:
Suvojit's blog about large language models is an amazing read for anyone
interested in the field of natural language processing (NLP). In his blog,
Suvojit delves into the world of large language models, which are a type of
machine learning model that can process and understand human language.
Suvojit starts by explaining what large language models are and how they work.
He then goes on to discuss the different types of large language models, such
as GPT-3 and BERT, and how they are trained using massive amounts of data.
One of the most interesting parts of Suvojit's blog is the
potential applications of large language models. He explains how these models
can be used for language translation, text summarization, and
even generating human-like responses in chatbots.
Suvojit addresses some of the concerns and criticisms surrounding large
language models, their potential to perpetuate biases and their high
computational requirements.
Overall, Suvojit's blog is a well-written and informative piece that provides
a great introduction to the world of large language models. Whether you're a
seasoned NLP expert or just starting out, this blog is definitely worth a read.
Now let’s switch to GPT-4 and see another example of the capabilities of this model.
import openai

openai.api_key = 'asdadsa-Enter-Your-API-Key-Here'

def prompt_model(prompts, temperature=0.0, model="gpt-4"):
    # Same helper as before, now defaulting to the GPT-4 model
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for prompt in prompts:
        messages.append({"role": "user", "content": prompt})
    response = openai.ChatCompletion.create(
        model=model, temperature=temperature, messages=messages
    )
    return response["choices"][0]["message"]["content"]
prompts = []
prompts.append(
    '''Prove the Cauchy-Schwarz inequality''')

for model in ['gpt-4']:
    response = prompt_model(prompts, temperature=0.0, model=model)
    print(f'\n{model} Model response: \n\n{response}')
Below is the response from the model with the proof:
gpt-4 Model response:
The Cauchy-Schwarz inequality is a fundamental inequality in mathematics
that holds for any two vectors in an inner product space. It states that
the absolute value of the inner product of two vectors is less than or
equal to the product of the norms (lengths) of the two vectors. In its
simplest form, for two real-valued vectors u and v, the inequality is given by:
|<u, v>| ≤ ||u|| ||v||
Here, <u, v> denotes the inner product of u and v, and ||u|| and ||v||
denote the norms of u and v, respectively.
Proof:
Let u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) be two n-dimensional
real-valued vectors. We will prove the Cauchy-Schwarz inequality using
the method of "completing the square."
Consider the following non-negative quadratic expression in a real variable t:
Q(t) = ||tu + v||^2
Expanding Q(t) using the definition of the inner product and the norm, we get:
Q(t) = <tu + v, tu + v> = (t^2)(<u, u>) + 2t(<u, v>) + <v, v>
Since Q(t) is non-negative for all t, its discriminant must be non-positive:
D = (2(<u, v>))^2 - 4(<u, u>)(<v, v>) ≤ 0
Simplifying and dividing by 4, we get:
(<u, v>)^2 ≤ (<u, u>)(<v, v>)
Taking the square root of both sides, we obtain the Cauchy-Schwarz inequality:
|<u, v>| ≤ sqrt(<u, u>) sqrt(<v, v>) = ||u|| ||v||
This completes the proof of the Cauchy-Schwarz inequality.
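The inequality in the model’s proof is easy to sanity-check numerically. Here is a quick Python check of |&lt;u, v&gt;| ≤ ||u|| ||v|| on a pair of example vectors:

```python
import math

# Numerical sanity check of the Cauchy-Schwarz inequality:
# |<u, v>| <= ||u|| * ||v|| for real-valued vectors.
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

u, v = [1.0, 2.0, 3.0], [-4.0, 0.5, 2.0]
lhs = abs(inner(u, v))
rhs = norm(u) * norm(v)
print(lhs <= rhs)  # True
```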
This demonstrates the remarkable capabilities of GPT-3.5 and GPT-4 in pushing the boundaries of natural language processing and paving the way for more sophisticated models in the future. With continued development and refinement, GPT-3.5 and GPT-4 are poised to become game-changers in the field of AI and natural language, with unprecedented capabilities and advancements in language technology. Let’s look at some of these applications.
Let’s look at some of the possible applications of ChatGPT:
GPT-4 and its successors have vast potential for future development, both in terms of their capabilities and their applications. As technology continues to evolve, these models will become even more sophisticated in their ability to understand and generate natural language, and may even develop new features like emotion recognition and contextual understanding. While the mathematical capabilities of ChatGPT are currently limited, this might soon be a thing of the past, and educators and students can find it helpful to have an AI assistant guide them in their academic pursuits, increasing the availability of knowledge and reasoning.
However, there are some major concerns:
Large language model-based chatbots like ChatGPT have revolutionized natural language processing and made significant advancements in language understanding and generation. Compared to rule-based chatbots, these LLM-based chatbots have demonstrated remarkable abilities to perform a wide range of language tasks, including text completion, translation, summarization, and more. Their massive training data and sophisticated algorithms have enabled them to produce highly accurate and coherent output that mimics human-like language. However, their size and energy consumption have raised concerns about their environmental impact. Despite these challenges, the potential benefits of large language models are undeniable, and they continue to drive innovation and research in the field of artificial intelligence.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.