Imagine loving a podcast and wishing to remember the best bits, but it’s all sound, no text. What do you do? That’s where tools like LLMs and audio-to-text transcribers step in. They turn spoken words into written notes, letting you easily pick out the gems and create handy bullet points. So, your favorite podcast moments are just a transcription away! Since ChatGPT’s debut in November 2022, LLMs have been all the rage. LLMs can be used for various tasks, and text summarization is an essential application. Summarization can also be applied to modes other than text, such as audio and video. We can use an LLM to enhance podcast accessibility by generating bulleted highlights for ease of use or notes for future reference.
PaLM (Pathways Language Model) is an LLM introduced by Google AI in April 2022. Its successor, PaLM 2, was announced in May 2023 as an improved and updated version, shortly after the PaLM API launched in March 2023. It is intended to have superior multilingual, coding, and reasoning abilities. An advantage of the PaLM 2 API over other LLM APIs, such as OpenAI’s, is that it is freely available. Google also reports improved reasoning abilities compared with the original PaLM.
In this article, we will learn how to use these tools, namely the PaLM 2 API and Maker Suite, to create a simple Podcast Text Highlighter, and how to tune the settings of the LLM to generate better bulleted summaries. We will also look at the features of these tools and the different use cases where they can be applied. So let’s get started!
This article was published as a part of the Data Science Blogathon.
PaLM is a massive neural network model with 540 billion parameters, trained using the Pathways system. PaLM 540B outperforms the previous state of the art on a variety of multi-step reasoning tasks and outperforms average human performance on the BIG-bench benchmark. These figures refer to the original PaLM; Google has not published a parameter count for PaLM 2. The model learns the relationships between words and phrases and can use this knowledge for different tasks.
Pathways is a new way of thinking about AI architecture that addresses many of the weaknesses of existing systems. Machine learning models tend to overspecialize at single tasks when they could excel at many. Below are the underlying concepts of this architecture:
PaLM 2 has been trained on text in over 100 languages and can pass language proficiency exams at the expert level. At 540 billion parameters, the original PaLM is among the largest dense language models; GPT-4 is rumored to have around 1 trillion parameters, although OpenAI has not disclosed the figure. PaLM was trained efficiently on 6,144 TPU v4 chips across 2 pods (clusters). PaLM uses a standard Transformer model architecture in a decoder-only setup.
SwiGLU activations are used in the intermediate MLP layers, where they have been shown to give better quality than ReLU, GeLU, or Swish. SwiGLU uses a gating mechanism, which allows it to selectively pass information based on the input it receives. The SwiGLU activation is defined as follows:
SwiGLU(x) = Swish(xW) ⊗ (xV)
where x is the input to the layer, W and V are learned weight matrices, ⊗ is elementwise multiplication, and Swish(z) = z · sigmoid(z).
The Swish(xW) term acts as a smooth gate on the linear projection xV. Unlike ReLU, Swish is smooth and non-zero for negative inputs, which helps avoid “dead” neurons that do not contribute to the output of a neural network. The gate needs an extra weight matrix in the MLP, so the benefit is in model quality rather than in compute cost.
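To make the gating idea concrete, here is a minimal pure-Python sketch of a SwiGLU layer in its gated form, Swish(xW) multiplied elementwise by xV. The helper names (`swish`, `matvec`, `swiglu`) and the toy weights are assumptions made for illustration; this is not PaLM's actual implementation.

```python
import math


def swish(z: float) -> float:
    """Swish (also called SiLU): z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(x, w):
    """Multiply a row vector x (length d_in) by a matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def swiglu(x, w_gate, w_value):
    """SwiGLU(x) = Swish(x W) * (x V), multiplied elementwise."""
    gate = [swish(g) for g in matvec(x, w_gate)]
    value = matvec(x, w_value)
    return [g * v for g, v in zip(gate, value)]


# Toy input and weights, chosen only for illustration.
x = [1.0, -2.0]
W = [[0.5, -1.0, 0.0], [0.25, 0.5, 1.0]]  # gate projection
V = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # value projection

out = swiglu(x, W, V)
print(out)
```

The first output is exactly zero because its gate input is zero, while the other two are scaled by a small negative gate value: the gate decides how much of each projected value passes through.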
A parallel formulation is used in every Transformer block instead of the standard serialized one. In the serialized formulation, the MLP takes the output of the attention sublayer as its input: y = x + MLP(LayerNorm(x + Attention(LayerNorm(x)))). In the parallel formulation, both sublayers read the same normalized input and their outputs are summed: y = x + MLP(LayerNorm(x)) + Attention(LayerNorm(x)). Because the input matrix multiplications of the MLP and attention sublayers can then be fused, the parallel formulation enables roughly 15% faster training at larger scales.
This should not be confused with data parallelism, which is a separate technique for speeding up training. Imagine an LLM with a vocabulary of 10,000 words, where a vector of 100 dimensions represents each word, trained on a dataset of 1 million sentences. On a single accelerator, we need to iterate over the whole dataset and update the LLM’s parameters batch by batch, which is slow when the dataset is large. With data parallelism, we divide the dataset into batches, for example 1,000 of them, and process different batches on different accelerators at the same time, which significantly speeds up the training process.
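The difference between a serialized and a parallel Transformer block can be sketched in a few lines. The sublayers below (`toy_attention`, `toy_mlp`) are hypothetical stand-ins, not real attention or feed-forward layers; only the wiring of the two block functions reflects the two formulations.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def toy_attention(x):
    """Stand-in for self-attention: mixes each value with the vector mean."""
    m = sum(x) / len(x)
    return [0.5 * (v + m) for v in x]


def toy_mlp(x):
    """Stand-in for the feed-forward sublayer: an elementwise nonlinearity."""
    return [math.tanh(v) for v in x]


def add(*vectors):
    """Elementwise sum of any number of equal-length vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def serial_block(x):
    # y = x + MLP(LayerNorm(x + Attention(LayerNorm(x))))
    inner = add(x, toy_attention(layer_norm(x)))
    return add(x, toy_mlp(layer_norm(inner)))


def parallel_block(x):
    # y = x + MLP(LayerNorm(x)) + Attention(LayerNorm(x))
    normed = layer_norm(x)  # computed once, shared by both sublayers
    return add(x, toy_mlp(normed), toy_attention(normed))


x = [1.0, 2.0, 3.0]
serial_out = serial_block(x)
parallel_out = parallel_block(x)
print(serial_out)
print(parallel_out)
```

In the parallel block the MLP no longer waits for the attention output, which is what lets a real implementation fuse the two input projections.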
The key/value projections are shared across all heads instead of being separate for each head, which results in cost savings at autoregressive decoding time. In multi-head attention (MHA), the entire attention computation is replicated h times, whereas in multi-query attention (MQA), every query “head” of Q attends using the same K and V projections. The amount of computation performed by incremental MQA is similar to that of incremental MHA. The critical difference is the reduced amount of data read from and written to memory with MQA.
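The memory saving can be estimated with simple arithmetic on the key/value cache that is kept during autoregressive decoding. The function name and the model dimensions below are hypothetical round numbers chosen for illustration, not PaLM's real configuration.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Size of the key/value cache for one sequence.

    Each layer stores two tensors (K and V) with shape
    (seq_len, n_kv_heads, head_dim).
    """
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value


# Hypothetical model dimensions (assumptions for illustration only).
seq_len, n_layers, n_heads, head_dim = 2048, 32, 16, 128

mha_bytes = kv_cache_bytes(seq_len, n_layers, n_kv_heads=n_heads, head_dim=head_dim)
mqa_bytes = kv_cache_bytes(seq_len, n_layers, n_kv_heads=1, head_dim=head_dim)

print(mha_bytes // 2**20, "MiB with multi-head attention")   # 512 MiB
print(mqa_bytes // 2**20, "MiB with multi-query attention")  # 32 MiB
```

With these numbers the cache shrinks by a factor equal to the number of heads, because MQA keeps a single K/V head per layer.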
Rotary Positional Embedding (RoPE) is a type of positional embedding that unifies absolute and relative approaches and gives superior results. Transformers employ self-attention or cross-attention mechanisms that are agnostic to the order of tokens. This means the model perceives the input tokens as a set rather than a sequence and thereby loses crucial information about the relationships between tokens based on their positions in the sequence. To mitigate this, positional encodings embed information about the token positions directly into the model. RoPE encodes each token’s position as a rotation, so that the self-attention score between two tokens depends on their “relative” positions rather than their absolute positions.
This type of position embedding uses a rotation matrix to include explicit relative position dependency in the self-attention formulation. Rotary embeddings are useful for natural language processing because they allow models to better understand the context in which words are used. When a model has a better idea of the position of the input tokens, it can produce more accurate predictions. For example, a language model that uses RoPE might better understand that “I love pizza” and “Pizza is what I love” differ in emphasis because of word position. A model can make more nuanced predictions with a better understanding of relative positioning.
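The relative-position property can be checked numerically. The sketch below rotates consecutive pairs of dimensions by an angle proportional to the token position, following the usual RoPE frequency schedule; the function names and the toy query/key vectors are assumptions for illustration.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to an even-length vector.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by the angle
    position * base ** (-i / d), where d is the vector length.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (illustrative values).
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

score_near_start = dot(rotate(q, 3), rotate(k, 1))   # positions 3 and 1
score_shifted = dot(rotate(q, 10), rotate(k, 8))     # positions 10 and 8
score_other_gap = dot(rotate(q, 3), rotate(k, 0))    # a different offset

print(score_near_start, score_shifted, score_other_gap)
```

The first two scores agree up to floating-point error because both pairs are two positions apart, while the third differs because the offset changed.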
No biases were used in the dense layers or layer norms, which was found to increase training stability for large models. It also removes redundant parameters from the model.
PaLM 2 is available in several variants of different sizes. The variants are named after animals according to their size: Gecko, Otter, Bison, and Unicorn, from smallest to largest.
The model parameters help us to modify and generate different responses for our prompt. Let us try to understand them one by one:
Temperature influences the randomness of the model’s responses. A temperature closer to 1 results in more diverse and creative responses instead of a dry set of definitions. Suppose we want to understand the meaning of a particular word and its usage. In this case, we do not require a creative response but a dictionary meaning, so we can keep the temperature closer to 0 (near-deterministic responses). If we want to write an imaginative article or story, we can keep the temperature closer to 1.
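The effect of temperature can be illustrated with the standard temperature-scaled softmax. This is a sketch of the general technique, not of PaLM's internal sampler; the function name and the logits are made up for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative scores for three candidate next words.
logits = [2.0, 1.0, 0.1]

sharp = softmax_with_temperature(logits, 0.2)  # low temperature
flat = softmax_with_temperature(logits, 1.0)   # high temperature

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

At the low temperature almost all probability mass lands on the top-scoring word, so sampling is nearly deterministic; at temperature 1 the less likely words keep a meaningful share.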
A token is a chunk of text, and the token limit determines how much text a model can process at once. A larger token limit lets the model take in a broader scope of information at a time, and a smaller limit restricts the number of tokens it can handle. For example, PaLM 2 can now take about 8,000 tokens as input.
When generating text, the model considers many possible words to follow the current one. Top-k sampling restricts the next-word choices to the k most likely words. A lower k value makes the content more predictable, while a higher value makes it more diverse.
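A minimal sketch of top-k filtering, assuming the next-word probabilities are already available as a dictionary. The words, their probabilities, and the function name are invented for illustration.

```python
def top_k_filter(probs, k):
    """Keep the k most likely words and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {word: p / total for word, p in ranked}


# Hypothetical next-word distribution.
next_word_probs = {"pizza": 0.5, "pasta": 0.3, "salad": 0.15, "rocks": 0.05}

top2 = top_k_filter(next_word_probs, 2)
print(top2)
```

With k = 2 only "pizza" and "pasta" survive, at roughly 0.625 and 0.375 after renormalization; the model then samples from this reduced set.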
Top-p is the probability threshold for considering words and controls the diversity of the output. The model keeps adding candidates from the top-k choices, from most to least likely, until their total probability reaches the top-p value. This means that rather than focusing only on the few most likely words, the model may accept less likely words if they are needed to reach the top-p probability together, resulting in more diverse output. A higher top-p value results in more diverse output.
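Top-p (nucleus) filtering can be sketched in the same style: words are added from most to least likely until their cumulative probability reaches the threshold. The distribution and the function name are hypothetical.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely words whose total probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}


# Hypothetical next-word distribution.
next_word_probs = {"pizza": 0.5, "pasta": 0.3, "salad": 0.15, "rocks": 0.05}

narrow = top_p_filter(next_word_probs, 0.5)  # only the single most likely word
wide = top_p_filter(next_word_probs, 0.9)    # three words are needed to reach 0.9

print(list(narrow))
print(list(wide))
```

Raising the threshold from 0.5 to 0.9 grows the candidate set from one word to three, which is why a higher top-p gives more varied text.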
Max outputs (candidate_count in the API code) denotes the number of outputs generated for a particular input. That is, we can ask for more than one model response and then decide which one to take. In the image below, we can see an example where we get 2 responses for the same input when we set Max Output to 2.
We can download any podcast audio using this link by pasting our podcast URL. Here, we use the Indian Express podcast URL.
!pip install openai-whisper
import whisper
Initially, we used the “tiny” model variant, and then we used the “base” variant, which is larger and gives better results in terms of spelling and grammar. We transcribe two audio podcasts.
Note: After downloading the mp3 audio of the podcast from the link mentioned above, upload it to your Colab environment files and paste the path of the audio file into the transcribe function as shown.
# Load whisper model
whisper_model = whisper.load_model("base")
# Transcribe audio
def transcribe(file_path: str) -> str:
    # `fp16` defaults to `True` (half precision), which is meant for GPU inference.
    # We run on the CPU for local demonstration purposes, so we set it to `False`
    # and let the model use FP32 instead.
    transcription = whisper_model.transcribe(file_path, fp16=False)
    return transcription['text']
transcript = transcribe('/content/CATCH-UP-2023-10th-October-v1.mp3')
print(transcript)
Output
This is the catch-up on 3 things for the Indian Express, and I am Flora Swine.
It's the 10th of October, and here are the headlines. Four days after the Hamas attack, the
Israeli Army said today that they have regained control of the Gaza border.
It warned the population to flee to neighboring Egypt in a grim
reminder of the expected retaliation. The Israeli Army also
reported the discovery of the bodies of 1500 Hamas militants within Israeli territory
. The ongoing conflict has claimed approximately 1,600 lives, with 900 casualties in
Israel and nearly 700 in Gaza. Meanwhile, Prime Minister Narendra Modi took to
extradite and said that he spoke with Israeli Prime Minister Benjamin Netanyahu,
assuring him that India stands firmly with Israel and is difficult to guard. He also
said that India strongly and unequivocally condemned terrorism in all its forms and
manifestations. Chief Justice of India, D.Y. Chandrachud, said today that the
The Supreme Court's role is not to micromanage issues that arise across the country. He
stressed that local matters are best left to the jurisdiction of the respective High
Court. He was presiding over a three-judge bench. The CGI Maynthese remarks while
hearing a matter related to captive elephants and said, Court, we have to
have a broader functional understanding as a court. What is the role of the Supreme Court in the
nation? Not to deal with micromanagement of issues that arise all over the country.
Two militants linked to the terror outfit Lashkare Thaibarvak were killed in an encounter
with security forces in the Soviet district of Jaman Kashmir today. The encounter broke
out when the security forces launched an anti-militancy operation in the Al-Sipura area,
acting on intelligence regarding the presence of militants. The disease militants
have been identified as Morifat Magbul and Jazim Farok. Chintanubhadhai was sentenced
to life imprisonment today for his involvement in abetting and conspiring to murder
his estranged wife, Hema Obadhai, in 2015. The Sessions Court also imposed life
imprisonment sentences on three co-accused, namely Vijay Rajvahar, Pradeep Rajvahar,
and Shivkuma Rajvahar. On Saturday, the prosecution sought the death penalty for
all four individuals. The ICC Men's World Cup 2023 has two matches slated for today.
Pakistanis facing Shilankain Hagradwadwal Bangladesh is taking on England in Haramshalla.
In other World Cup news, New Zealand beats the Dutch to win their second game in a row at
the competition. The previously triumphed over defending Champions England in the
tournament opener, placing them at the top of the points table. This was a catch-up on
three things by the Indian Express.
Now, we use one podcast transcript as the training input and write its sample model response ourselves, and we use the other transcript as the test input. We go to this site and generate a bullet summary.
We adjust the model parameter settings to generate summaries.
Generate the code using the API key for the PaLM API. We have generated our own API key from the site.
"""
At the command line, only need to run once to install the package via pip:
$ pip install google-generativeai
"""
import google.generativeai as palm
palm.configure(api_key="API_KEY")
defaults = {
    'model': 'models/text-bison-001',
    'temperature': 1,
    'candidate_count': 1,
    'top_k': 40,
    'top_p': 0.95,
    'max_output_tokens': 1024,
    'stop_sequences': [],
    'safety_settings': [
        {"category": "HARM_CATEGORY_DEROGATORY", "threshold": 4},
        {"category": "HARM_CATEGORY_TOXICITY", "threshold": 4},
        {"category": "HARM_CATEGORY_VIOLENCE", "threshold": 4},
        {"category": "HARM_CATEGORY_SEXUAL", "threshold": 4},
        {"category": "HARM_CATEGORY_MEDICAL", "threshold": 4},
        {"category": "HARM_CATEGORY_DANGEROUS", "threshold": 4},
    ],
}
# The transcript is split into two adjacent string literals inside parentheses,
# because a plain double-quoted string cannot span multiple lines.
Sentence = (
    "This is the catch up on three things for the Indian Express and I am Flora Swain. It\'s the 10th of October and here are the headlines. Four days after the Hamas attacked the Israeli army said today that they have regained control of the Gaza border. It warned the population there to flee to neighboring Egypt while they can in a grim reminder of the retaliation that is expected to follow. The Israeli army also reported the discovery of the bodies of 1500 Hamas militants within Israeli territory. The ongoing conflict has claimed approximately 1600 lives with 900 casualties in Israel and nearly 700 in Gaza. Meanwhile, Prime Minister Narendra Modi took to X today and said that he spoke with Israeli Prime Minister Benjamin Netanyahu assuring him that India stands firmly with Israel and this difficult art. He also said that India strongly and unequivocally condemns terrorism in all its forms and manifestations. Chief Justice of India D.Y. Chandrachud said today that the Supreme Court's role is not to micromanage issues that arise across the country. He stressed that local matters are best left to the jurisdiction of the respective high courts. Prziding over a three-judge bench the CGI made these remarks while hearing a matter related to captive elephants and said, quote, we have to as a court have broader functional understanding. What is the role of the Supreme Court in the nation? Not to deal with micromanagement of issues which arise all over the country. Unquote. Two militants linked to the terror outfit Lashkaretayabah were killed in an encounter with security forces in the Soapian District of Jammun Kashmir today. The encounter broke out after security forces launched an anti-militancy operation in the Alsepura area acting on intelligence regarding the presence of militants. The deceased militants have been identified as Mureffat Maghbul and Jasm Farukh. "
    "Chintanubhadi Haya was sentenced to life imprisonment today for his involvement in a betting and conspiring to murder his estranged wife, Hema Upadhyay in 2015. The Sessions Court also imposed life imprisonment sentences on three co-accused, namely Vijay Rajpur, Pradeep Rajpur and Shivkumar Rajpur. On Saturday the prosecution have sought the death penalty for all four individuals. The ICC men's World Cup 2023 has two matches slated for today. Pakistan is facing Sri Lanka in Hyderabad while Bangladesh is taking on England in Haramshalla. In other World Cup news New Zealand beat the Dutch to win their second game in a row at the competition. They previously triumphed over defending champions England in the tournament opener, placing them at the top of the points table. This was the Catchup on Three Things by the Indian Express."
)
prompt = f"""Transform a sentence into a bulleted list.
Sentence: This is the catch up on three things for the Indian Express and I'm Flora Swain. It's the 11th of October and here are the headlines. Days after the Hamas attack, the Israeli military said that it is carrying out strikes in Lebanon after an anti-tank guided missile was fired from the neighboring nation at one of its posts near the blue line. As for reports, there was a massive buildup of troops along the Israel Gaza border as the country prepared for a ground invasion in the coming days. More than 2,000 people have lost their lives so far in the war which started on Saturday. The Supreme Court today took a serious exception to AIM's authorities seeking clarification of its order from the 9th of October which allowed the abortion of a 26-week-old fetus. The AIM's court cited some fresh concerns and asked why the concerns were not conveyed to the court earlier when it had sought a medical opinion on the women's request seeking permission for medical termination of pregnancy. A special bench of justices, B.V. Nagaratma and Hema Kohli also pulled up the center for approaching Chief Justice of India D.Y. Chandrachud's bench on Tuesday against its order. Samajwadi party president Akhilesh Yadav was denied permission to go inside the J.K.N.R. and International Center to offer floral tribute to Freedom Fighter J.K.N.R. on his birth anniversary. Officials cited security reasons for not allowing the former UPCM into the center today. After he was denied permission, Akhilesh reads the building and jumped the center's boundary wall along with other SP leaders and workers. K.H.N.R. ensued on the spot while the police tried to stop them from entering the premises. The poster girl of Kerala's adult literacy program, K.R.Y.H.A. Amma, died at the age of 101 at her house in Alapurha today. In 2018, she made headlines by becoming the top scorer in the state literacy mission's flagship adult literacy program Akshana Laksham. At the age of 96, K.R.Y.H.A. 
scored 98 out of 100 marks in the exam that tested writing and mathematical skills. CM Pinery Vijayan in his condolence message said K.R.Y.A. was Kerala's pride and a model the individual. Indian Afghanistan are battling each other in the ninth match of the ICC Cricket World Cup 2023 at the Orange JT Stadium in New Delhi today. India added to your favourites for this match having convincingly won their opening match against Australia. On the other hand, Afghanistan lost their opening match to Bangladesh but they will be looking to perform better against India. This was the Catch Up on Three Things by the Indian Express.
Bulleted: * Israeli military carried out strikes in Lebanon after an anti-tank guided missile was fired from the neighboring nation.
* SC took a serious exception to AIIMS authorities seeking clarification of its order on abortion of a 26-week-old fetus.
* Akhilesh Yadav was denied permission to go inside the J.K.N.R. and International Center to offer floral tribute to Freedom Fighter J.K.N.R.
* Poster girl of Kerala's adult literacy program, K.R.Y.H.A. Amma, died at the age of 101.
* India Afghanistan are battling each other in the ninth match of the ICC Cricket World Cup 2023 at the Orange JT Stadium in New Delhi today.
Sentence: {Sentence}
Bulleted:"""
response = palm.generate_text(
    **defaults,
    prompt=prompt
)
print(response.result)
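The response text comes back as a single string, so a small helper can turn it into a Python list of highlights for further use. The sketch below assumes the model follows the `* ` bullet style of the few-shot example and that the result may be empty or `None` (for instance, if the response is blocked). The helper name and the sample string are hypothetical; the sample is not real model output.

```python
def parse_bullets(result):
    """Split a bulleted response string into a list of bullet texts.

    Lines that do not start with "* " or "- " are ignored. An empty or
    missing result yields an empty list.
    """
    if not result:
        return []
    bullets = []
    for line in result.splitlines():
        line = line.strip()
        if line.startswith(("* ", "- ")):
            bullets.append(line[2:].strip())
    return bullets


# Made-up sample in the same format as the few-shot prompt.
sample_result = "* First headline.\n* Second headline.\n"

highlights = parse_bullets(sample_result)
for number, text in enumerate(highlights, start=1):
    print(number, text)
```

In the notebook this would be called as `parse_bullets(response.result)` after the `generate_text` call.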
Below is the resulting output of our podcast. Most of the content is accurate except for spelling and names of proper nouns, such as Dharamshala and Lashkar-e-Taiba, etc.
LLMs are powerful tools that can be combined with other tools to build quick prototypes, enabling us to test and experiment with various LLM use cases. Since LLMs are a very new technology, finding their potential uses and implementing them requires a lot of back-and-forth experimentation. This is where tools like Maker Suite empower data science and analytics professionals to quickly turn their ideas into code with minimal time and effort, so that they can focus on fine-tuning and improving the data and other essential elements.
A. Yes, the PaLM API is open to the public for free use, but production use isn’t free.
A. For now, Maker Suite only allows one model, Text-Bison.
A. GPT-4 reportedly has around 1 trillion parameters, although OpenAI has not disclosed the figure, compared to the 540 billion parameters of PaLM. GPT-4 also supports multimodal features such as images as input. So GPT-4 offers more features and services.
A. PaLM supports responses in other languages but is available only in one model, which is not open for public review and is a paid service.
A. The safety settings in the PaLM API can block violent, derogatory, medical, or sexual content in the model responses. In our podcast summary, the stricter settings blocked the violent content, but once we changed the settings and relaxed the filter, we got proper output.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.