This guide introduces readers to Cohere, an enterprise AI platform for search, discovery, and advanced retrieval. By leveraging state-of-the-art machine learning techniques, Cohere enables organizations to extract valuable insights, automate tasks, and enhance customer experiences through advanced language understanding. It empowers businesses and individuals across industries to unlock the full potential of their textual data, driving efficiency and innovation.
First, go to the Cohere dashboard. If you are an existing user, log in directly; otherwise, sign up. After a successful login, go to the side panel and select API Keys.
Then, create a new trial key by giving it a unique name and clicking Generate Trial Key. This generates the API key that will be used for all further connections. Store the value in a safe place. Cohere provides a generous free plan, but review its limits so you stay within the credit usage allowance.
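A common way to keep the key out of your source code is an environment variable. The variable name COHERE_API_KEY used here is just a convention for this guide, not a requirement:
export COHERE_API_KEY="your-trial-key-here"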
Now that you have the API key, install the official Cohere Python SDK.
pip install cohere
After the installation succeeds, verify it by creating a Cohere client. Create a new Python file, file_name.py, and paste the following code:
import os
import cohere

COHERE_API_KEY = os.environ["COHERE_API_KEY"]  # the key stored earlier
co = cohere.Client(COHERE_API_KEY)
print("Done...")
Then run the file using the command:
python file_name.py
If the output is Done, you have successfully installed Cohere and can proceed further. To follow along with the guide, clone this GitHub repository locally, switch to the folder, and run the setup script:
git clone https://github.com/srini047/cohere-guide-blog
cd cohere-guide-blog
./run.sh
If you get a permission denied error, the scripts lack execute permission. Run the following command to set the correct execution permissions:
chmod +x ./run.sh ./exec.sh ./setup.sh
Then execute ./run.sh again; it in turn runs ./setup.sh and ./exec.sh.
One of the most commonly used endpoints is /chat. We can use it to generate content based on a prompt or user input. The better the prompt, the more personalized and realistic the generated output. The main parameters for this endpoint are the model, the prompt, and the temperature.
Model: Four models back this endpoint: command-light, command, command-nightly, and command-light-nightly. The first two are the standard versions, whereas the `nightly` ones are experimental. The presence of `light` in the name indicates a lightweight model; which to use depends on the use case. If you need a faster response, the tradeoff against fluency and coherence is up to you.
Prompt: This is the key to generating the response you need. The more precise and well-crafted the prompt, the more likely we are to receive the desired response. One doesn't need to be a prompt engineer for this; rather, one learns how a particular model responds to a given prompt and rephrases it to get better results next time. A practical way to test various prompts is the Cohere Playground, but a better approach through the Python SDK is shown below:
import cohere

# Define the chat endpoint wrapper
def generate_content(key, input, model, max_tokens, temp):
    co = cohere.Client(key)
    response = co.chat(
        model=model, message=input, temperature=temp, max_tokens=max_tokens
    )
    return response.text
# Define model to be used
model="command-light-nightly"
# Define the prompt
prompt = "What is the product of first 10 natural numbers?"
# Define the temperature value
temperature = 0.7
# Define max possible tokens
max_tokens=1000
# Display the temperature and the response
print("Temperature: " + str(temperature))
print(generate_content(COHERE_API_KEY, prompt, model, max_tokens, temperature))
This generates a response containing the product of the first 10 natural numbers. Since we use a nightly model, the response arrives quickly. As Cohere notes, these models are experimental, and there can occasionally be unexpected responses due to breaking changes.
Temperature: This value determines the randomness of the generation. It is a positive floating point value ranging from 0.0 to 5.0 and defaults to 0.75. The lower the temperature, the less random the generated output. Lower temperatures can also take the model more time to generate responses. A quick way to see the effect is to sweep a few values, as sketched below.
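A minimal sketch reusing the generate_content helper defined above; the three temperature values here are arbitrary illustrations, not recommendations:
# Sweep a few temperature values to compare how random the outputs get
for temp in [0.0, 0.75, 2.5]:
    print(f"--- temperature={temp} ---")
    print(generate_content(COHERE_API_KEY, prompt, model, max_tokens, temp))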
Another endpoint provided by Cohere is /classify. It is useful for classifying or predicting the class of a text based on a series of example texts and labels. Each example is a ClassifyExample, a simple structure holding a text and its corresponding label:
from cohere import ClassifyExample
example = ClassifyExample(text="I'm so proud of you", label="positive")
We pass the model, the example inputs, and the sample inputs to classify as parameters to the API. The available models are embed-english-v2.0 (default), embed-multilingual-v2.0, and embed-english-light-v2.0. The quality of the output depends on various factors.
So far, we have been discussing binary classification. However, the API also supports multiclass classification, where the model chooses among more than two classes. Let's see a classic example of text-based sentiment classification that predicts the sentiment of a text as positive, negative, or neutral:
import cohere
from cohere import ClassifyExample

def classify_content(key, inputs, model):
    co = cohere.Client(key)
    # A handful of labeled examples per class guides the classifier
    examples = [
        ClassifyExample(text="I'm so proud of you", label="positive"),
        ClassifyExample(text="What a great time to be alive", label="positive"),
        ClassifyExample(text="That's awesome work", label="positive"),
        ClassifyExample(text="The service was amazing", label="positive"),
        ClassifyExample(text="I love my family", label="positive"),
        ClassifyExample(text="They don't care about me", label="negative"),
        ClassifyExample(text="I hate this place", label="negative"),
        ClassifyExample(text="The most ridiculous thing I've ever heard", label="negative"),
        ClassifyExample(text="I am really frustrated", label="negative"),
        ClassifyExample(text="This is so unfair", label="negative"),
        ClassifyExample(text="This made me think", label="neutral"),
        ClassifyExample(text="The good old days", label="neutral"),
        ClassifyExample(text="What's the difference", label="neutral"),
        ClassifyExample(text="You can't ignore this", label="neutral"),
        ClassifyExample(text="That's how I see it", label="neutral"),
    ]
    classifications = co.classify(model=model, inputs=inputs, examples=examples)
    return (
        "Provided sentence is: "
        + classifications.classifications[0].prediction.capitalize()
    )

inputs = ["Replace your content(s) to classify"]
model = "embed-english-v2.0"
print(classify_content(COHERE_API_KEY, inputs, model))
This is a reference for how to leverage the classify endpoint. Note that the examples include multiple samples for each class, namely positive, negative, and neutral. This helps the chosen model produce more accurate results.
Calculating metrics like accuracy, F1-score, precision, and recall is important for understanding how a model performs. These are all derived from the confusion matrix at the heart of any classification problem. By computing these values, we can see how our model performs and identify the best model for the use case, and we can understand the tradeoffs well before moving to production. These evaluations must be performed manually or via a script that can run them repeatedly, as sketched below.
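As an illustration, here is a minimal sketch of such a script using scikit-learn (not part of the Cohere SDK); the labeled test sentences are hypothetical placeholders, and it parses the label back out of the helper's return string:
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical held-out sentences with known labels (placeholders)
test_texts = ["I really enjoyed this", "This is terrible", "It happened yesterday"]
true_labels = ["positive", "negative", "neutral"]

# Classify each sentence with the helper above and recover the predicted label
predicted = [
    classify_content(COHERE_API_KEY, [text], "embed-english-v2.0").split(": ")[1].lower()
    for text in test_texts
]

print("Accuracy:", accuracy_score(true_labels, predicted))
print(classification_report(true_labels, predicted))  # precision, recall, F1 per class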
With the rise in textual data, judging the quality and conciseness of an article or context can be difficult. We resort to skimming and scanning, but there is a high chance of skipping golden content if we miss the key terminology. Therefore, it is useful to condense text into a short, readable form without losing its value. That's where Cohere's /summarize endpoint comes to the rescue and does the job well.
import cohere

def summarize_content(key, input, model, extractiveness, format, temp):
    co = cohere.Client(key)
    response = co.summarize(
        text=input,
        model=model,
        extractiveness=extractiveness,
        format=format,
        temperature=temp,
    )
    return response.summary
# Get the input
text = input("Enter the input (at least 250 words for best results): ")
# Define the model
model = "command-nightly"
# Set the extractiveness (how much value to retain)
extract = "medium"
# Define the format (paragraph, bullets, auto)
format = "auto"
# Define the temperature value
temperature = 0.7
# Display the summarized content
print(summarize_content(COHERE_API_KEY, text, model, extract, format, temperature))
Let’s take a text transcript from here: A Practical Tutorial to Simple Linear Regression Using Python
Then we run the summarize function and get the summarized output.
With the rise of vector databases, it is necessary to store strings as vectors of floats. Embedding, in naive terms, means giving weights to each word in a sentence. The weights are assigned based on the importance of the word, thus capturing the meaning of the sentence. These are floating point values in the range -1.0 to +1.0. Converting text into floats within a specified range makes it easy to store these values in vector databases; this brings uniformity and also helps make search efficient. For this, Cohere provides the /embed endpoint.
Here, the inputs are a list of strings, the model, and the input_type. Most embeddings are floats, but Cohere supports multiple data types, including int8, uint8, and binary, depending on the vector database and the use case.
import cohere

def embed_content(key, input):
    co = cohere.Client(key)
    response = co.embed(
        texts=input.split(" "), model="embed-english-v3.0", input_type="classification"
    )
    return response

# Enter the sentence
message = input("Enter your message: ")
# Display the values
print(embed_content(COHERE_API_KEY, message))
We get embedding values that can be used for further processing like storage and retrieval. These are especially useful in RAG-based applications, for example to rank documents by similarity to a query, as sketched below.
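A minimal sketch of such a retrieval step (numpy is assumed, and the sentences are hypothetical placeholders): embed a query and a few documents, then rank the documents by cosine similarity:
import numpy as np
import cohere

co = cohere.Client(COHERE_API_KEY)
docs = ["Cats sleep a lot", "Stocks fell on Monday", "Dogs love to play"]
query = "How did markets perform?"

# Embed documents and query; input_type hints at the intended use
doc_emb = np.array(
    co.embed(texts=docs, model="embed-english-v3.0", input_type="search_document").embeddings
)
q_emb = np.array(
    co.embed(texts=[query], model="embed-english-v3.0", input_type="search_query").embeddings[0]
)

# Cosine similarity = dot product of the normalized vectors
scores = doc_emb @ q_emb / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(q_emb))
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")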
With the rise in chunked data, a retrieval might return tens or hundreds of closest chunks. Then a doubt arises: is the top chunk always the correct one, or does the accurate answer for a given prompt sit in the second, third, or nth position? This is where we need to reorder the retrieved embeddings/chunks based on multiple factors, not just similarity. This is called reranking: reordering embeddings based on the prompt, relevancy, and use case. It adds more value to each retrieved chunk and helps ensure the right generation for each prompt, which greatly improves user satisfaction and, from an organizational perspective, the business.
Cohere provides the /rerank endpoint, which takes the documents, a query, and a model as input. We then get the documents back reranked in descending order of relevance score.
import cohere

def rerank_documents(key, docs, model, query):
    co = cohere.Client(key)
    response = co.rerank(
        documents=docs.split(". "),  # split the raw text into sentence-level documents
        model=model,
        query=query,
        return_documents=True,
    )
    return response.results

# Enter the input
docs = input("Enter the sentence: ")
# Define the model
model = "rerank-english-v3.0"
# Enter your query
query = input("Enter your query: ")
# Display the reranked documents
print(rerank_documents(COHERE_API_KEY, docs, model, query))
I provided a text about myself as the input and then a query about my interests. The reranker lists the documents by relevance score, and we can manually verify how close each document is to the query.
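To inspect the scores directly, here is a small sketch assuming each result object carries a relevance_score and (with return_documents=True) the document text:
# Print each reranked document with its relevance score
for result in rerank_documents(COHERE_API_KEY, docs, model, query):
    print(f"{result.relevance_score:.3f}  {result.document.text}")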
If you have followed the tutorial until now, the running Streamlit app shows a Deploy button. If you follow the steps there, it's a piece of cake to take your application to production, like the Cohere Guide app here.
This article focused on one of the leading enterprise AI platforms, Cohere. We have seen the major use cases of Cohere through its featured endpoints, namely /chat, /classify, /summarize, /embed, and /rerank.
We saw how each endpoint works with the different hyperparameters that drive the model, using Streamlit as per the article's GitHub repository. This makes the guide more interactive, and by the end, you will have deployed an application to the cloud.
Q1. What are the nightly models in Cohere?
A. According to Cohere's nomenclature, nightly marks experimental model versions. These models are in active development, and results can sometimes be inaccurate. Rest assured, they serve their role well for development and testing purposes.
Q2. What does the temperature value mean?
A. Temperature refers to how greedily the model samples. With a lower temperature, the model is more likely to provide the same output for the same input: randomness decreases, and the output stays closer to what the user expects.
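As a rough illustration (reusing the generate_content helper from earlier; exact behavior depends on the model), two calls at temperature 0.0 should produce near-identical text, while calls at a high temperature usually diverge:
# Compare output stability at a low and a high temperature
prompt = "Name three uses of embeddings."
for temp in (0.0, 2.5):
    first = generate_content(COHERE_API_KEY, prompt, "command-light", 200, temp)
    second = generate_content(COHERE_API_KEY, prompt, "command-light", 200, temp)
    print(f"temperature={temp}: identical outputs? {first == second}")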
Q3. Why choose Cohere?
A. Primary reasons for choosing Cohere are:
a. Driven by cutting-edge ML research
b. Continuous development from the team
c. Backed by strong tech giants and investors
d. Generous free tier and pricing scheme
Q4. How do I take a Cohere application to production?
A. Till now, we have seen how to run Cohere and use its features locally. But let's see how to take it to production, where millions could access it. Cohere applications can be deployed using multiple platforms and tools like the following (see the sketch after this list):
a. Streamlit
b. FastAPI
c. Google Apps Script
d. Docker/K8s
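For instance, here is a minimal Streamlit sketch; the file name app.py and the model choice are assumptions for illustration, not requirements:
import os
import cohere
import streamlit as st

co = cohere.Client(os.environ["COHERE_API_KEY"])

st.title("Cohere Chat Demo")
prompt = st.text_input("Enter your prompt")
if st.button("Generate"):
    # Generate a response from the /chat endpoint and render it
    response = co.chat(model="command-light-nightly", message=prompt)
    st.write(response.text)
Run it with streamlit run app.py; the Deploy button mentioned earlier appears in the running app.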
Q5. Who is Cohere for?
A. We have seen Cohere as an enterprise AI platform, but you can use these APIs locally for personal projects with the free API key. If you then need to take that application to production, it's better to use the production plan and pay only for what you use; this suits individuals and small businesses. For larger use cases, there is also an Enterprise plan. On the whole, Cohere has something to offer users at all levels.