Create Book Summarizer in Python with GPT-3.5 in 10 Minutes

Gyan Prakash Tripathi Last Updated : 14 Jul, 2023

Are you tired of reading lengthy books that take up much of your time? Do you wish to get a summary of the main points without having to go through the entire book? Well, look no further because we have the solution for you. This article will discuss creating a book summarizer in Python using OpenAI’s GPT-3.5 in just 10 minutes.

Steps on How to Create a Book Summarizer Using Python

Setting up the Environment
We will use Google Colab for this project. Books are usually downloaded as PDF files, so we will install the PyPDF2 library to read them. We will query OpenAI’s GPT-3.5 through the OpenAI API, which requires the openai package. The code in this article uses the openai.ChatCompletion interface, which was removed in version 1.0 of that package, so we pin an earlier version. Let’s install the required libraries:
```
!pip install "openai<1.0" PyPDF2
```
Initializing Libraries
After installing the required libraries, let’s import them and initialize the variables we need. We also need to specify the location of the PDF file and add our OpenAI API key.
```
import openai
import PyPDF2
import time
filepath = "<LOCATION OF YOUR PDF FILE>"
openai.api_key = "<YOUR OPENAI API KEY>"
```
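Hard-coding the key in a notebook makes it easy to share by accident. As an optional alternative, the key can be read from an environment variable. The sketch below assumes the key was stored under the name OPENAI_API_KEY; the helper name load_api_key is hypothetical and not part of the original code.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption; use whatever name you stored the key under.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# Possible usage in the notebook:
# openai.api_key = load_api_key()
```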

Setup for Querying the API

Now we will create a function for querying the GPT-3.5 Turbo model:


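```
def get_completion(prompt, model="gpt-3.5-turbo"):
  # Send a single user message and return the text of the model's reply.
  # Note: openai.ChatCompletion is the interface of openai versions before 1.0.
  messages = [{"role": "user", "content": prompt}]
  response = openai.ChatCompletion.create(
     model=model,
     messages=messages,
     temperature=0, # this is the degree of randomness of the model's output
  )
  return response.choices[0].message["content"]
```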
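If version 1.0 or later of the openai package is installed, openai.ChatCompletion is not available and requests go through a client object instead. The following is a minimal sketch of an equivalent function for that newer interface. The name get_completion_v1 and the choice to pass the client in as a parameter are assumptions made for illustration.

```python
def get_completion_v1(client, prompt, model="gpt-3.5-turbo"):
    """Hypothetical variant of get_completion for openai>=1.0.

    `client` is expected to be an openai.OpenAI instance, created for example
    with client = openai.OpenAI(api_key="<YOUR OPENAI API KEY>").
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the output as repeatable as possible
    )
    # In the 1.x interface the message is an object, not a dictionary
    return response.choices[0].message.content
```

Passing the client explicitly also makes the function easy to try out with a stand-in object before spending any API quota.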

Reading the PDF
Because OpenAI limits the size of the input prompt, we need to send the text to be summarized in parts. There are several ways to split the text. For simplicity, we will split the book by page. Splitting by paragraph would be a better strategy, because a page break can cut a sentence or an idea in half, but it would increase the number of API calls and therefore the overall run time.

We will store the text of each page in a list and then summarize the pages one by one.
```
text = []
summary = ' '
# opening the pdf in binary mode; the file is closed automatically after reading
with open(filepath, 'rb') as pdfFileObject:
  # creating a pdf reader object
  pdfReader = PyPDF2.PdfReader(pdfFileObject)
  # storing the pages in a list
  for i in range(len(pdfReader.pages)):
    # extracting text from the page
    pageObj = pdfReader.pages[i].extract_text()
    # removing tab/carriage-return pairs and non-breaking spaces
    pageObj = pageObj.replace('\t\r', '')
    pageObj = pageObj.replace('\xa0', '')
    text.append(pageObj)
```
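Whatever unit is used for splitting, consecutive pieces can be merged so that each request carries as much text as the prompt limit comfortably allows. Here is a minimal sketch that greedily groups pages under a character budget. The function name group_pages and the default of 6000 characters are assumptions; characters are only a rough proxy for tokens, so the budget should be tuned to the model in use.

```python
def group_pages(pages, max_chars=6000):
    """Greedily merge consecutive pages into chunks of at most max_chars characters.

    A single page longer than max_chars is kept as its own chunk.
    """
    chunks = []
    current = ""
    for page in pages:
        # +1 accounts for the newline that joins two pages
        if current and len(current) + len(page) + 1 > max_chars:
            chunks.append(current)
            current = page
        else:
            current = f"{current}\n{page}" if current else page
    if current:
        chunks.append(current)
    return chunks
```

The summarization loop could then iterate over group_pages(text) instead of text, which reduces the number of API calls.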

Prompting

Now we will start prompting. Finding the best prompt is a matter of experimentation, although there are a few basic guidelines for doing it efficiently. We will discuss the art of prompting in more detail in upcoming articles. For now, you can use the following prompt, which has worked well for me, and experiment with it:


```
result = []  # stores the response for each page
for i in range(len(text)):
  prompt =f"""
  Your task is to extract relevant information from a text on the page of a book. This information will be used to create a book summary.
  Extract relevant information from the following text, which is delimited with triple backticks.\
  Be sure to preserve the important details.
  Text: ```{text[i]}```
  """
  try:
    response = get_completion(prompt)
  except Exception:
    # retry once after a pause, for example if the rate limit was hit
    time.sleep(19)
    response = get_completion(prompt)
  print(response)
  summary= summary+' ' +response +'\n\n'
  result.append(response)
  time.sleep(19)  #You can query the model only 3 times in a minute for free, so we need to put some delay
```
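The loop above retries a failed request only once. For long books, a more patient retry with increasing waits may be worth considering. The following is a minimal sketch of exponential backoff; the helper name call_with_retries and its default values are assumptions, and the sleep function is passed in as a parameter only so that the helper can be tried out without actually waiting.

```python
import time

def call_with_retries(func, *args, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func(*args), retrying on any exception with exponentially growing waits.

    The wait before retry number k is base_delay * 2 ** (k - 1) seconds.
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

In the loop, the try/except block could then be replaced with response = call_with_retries(get_completion, prompt).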

Saving the Summary
Finally, we will save the summary we have obtained in a text file:
```
with open('summary.txt', 'w') as out:
  out.write(summary)
```

Advantages

This approach keeps the useful information while compressing the document by roughly 75%. To make the output shorter, you can add a length constraint to the prompt; for example, you can ask it to extract the helpful information within 30 or 40 words. Overall, it is a good way to summarize text for personal use.
Because the prompt asks the model to ‘extract’ the key information rather than to ‘summarize’ it, the output generally covers everything.
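The length constraint mentioned above can be added to the prompt programmatically. Below is a minimal sketch that builds the same prompt with an optional word limit. The function name build_prompt, the max_words parameter, and the exact wording of the extra sentence are assumptions that you may need to adjust by experiment.

```python
def build_prompt(page_text, max_words=None):
    """Build the extraction prompt, optionally asking for a word limit."""
    fence = "`" * 3  # triple backticks that delimit the page text
    prompt = (
        "Your task is to extract relevant information from a text on the page of a book. "
        "This information will be used to create a book summary.\n"
        "Extract relevant information from the following text, "
        "which is delimited with triple backticks. "
        "Be sure to preserve the important details.\n"
    )
    if max_words is not None:
        # Hypothetical wording for the length constraint
        prompt += f"Keep the extracted information within {max_words} words.\n"
    prompt += f"Text: {fence}{page_text}{fence}"
    return prompt
```

For example, build_prompt(text[i], max_words=40) asks for roughly 40 words per page, while build_prompt(text[i]) leaves the length open.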

Disadvantages

Summarizing takes a long time. At one page per query, with the delay needed between calls, a 300-page book takes around 100 minutes. You can speed this up by sending more than one page in each query.
Each API response starts with a phrase such as ‘This page says…’ or ‘The text talks about…’, so the resulting text may need thorough cleaning.
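Both points can be illustrated with a few lines of Python. The sketch below estimates the waiting time from the number of requests and strips a leading phrase such as ‘This page says’ from a response. The function names and the regular expression are assumptions; the pattern covers only a handful of common openings and will likely need extending for real output.

```python
import re

# Matches openings such as "This page says that" or "The text talks about"
LEAD_IN = re.compile(
    r"^\s*(?:this|the)\s+(?:page|text|passage)\s+"
    r"(?:says|talks about|discusses|describes|is about)\s*(?:that\s+)?[:,]?\s*",
    re.IGNORECASE,
)

def strip_lead_in(response):
    """Remove a generic opening phrase and re-capitalize the first letter."""
    cleaned = LEAD_IN.sub("", response, count=1)
    return cleaned[:1].upper() + cleaned[1:]

def estimate_minutes(num_requests, delay_seconds=19):
    """Minutes spent waiting between requests, ignoring the API response time."""
    return num_requests * delay_seconds / 60
```

With a 19-second delay, estimate_minutes(300) gives 95 minutes of waiting alone, which matches the figure of around 100 minutes once the response time of each request is added.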

Conclusion

Thus, creating a book summarizer in Python with the GPT-3.5 API is a quick and efficient way to summarize lengthy books. It does have a few disadvantages, such as the long run time and the need for basic cleaning of the output, but the advantages outweigh them. With this approach, you can extract the essential information from a book and compress it into a summary that can be read quickly, which makes it an excellent tool for students and avid readers alike. So go ahead, try it out for yourself, and make your reading experience more efficient and enjoyable.

Eli

What about if a sentence from one page continues to the second, or in general when the context in the end of a page and a start of the next page is the same? I think it would work better if you divide by X paragraphs and not by a page. A page isn't a good part, many times it's a byproduct of the book
