Interesting Python Projects With Code for Beginners – Part 2

Gaurav Sharma Last Updated : 19 Jan, 2022

6 min read

This article was published as a part of the Data Science Blogathon.

Introduction

In my previous article, I discussed three python projects with codes and explained them in detail. Also gave you some examples which you can try. All these projects were beginner-friendly. This time, we will look at some more python projects with codes again. And the more projects you will make, the more you will get better in the programming and the language.

Python Projects with Codes

Image Source: https://realpython.com

Let’s get started!

1. Text Extraction using OpenCV and OCR

OpenCV is a library of programming functions used mainly for computer vision tasks. With this, you can process images, resize images, object detection, etc. We will see how to extract text in a snap using contours.

Install these:

pip install pytesseract

pip install opencv-python

Python-tesseract is Google’s Tessaract-OCR engine used to get text from images. You will need this to execute a tesseract file and Download it from here.

Now let’s begin with the text extractions step by step:

1. Convert the image to Gray using cv2.COLOR_BGR2GRAY.

cv2.cvtColor(input_image, cv2.COLOR_BGR2GRAY)

2. Finding contours in the image:

To find contours use cv2.findContours(). It takes three parameters: the source image, contour retrieval mode, contour approximation method. This will return a python list of all contours. Contour is nothing but a NumPy array of (x,y) coordinates of boundary points in the object.

3. Apply OCR.

By looping through each contour, take x,y and width, height using cv2.boundingRect() function. Then draw a rectangle function in image using cv2.rectange(). This has five parameters: input image, (x, y), (x+w, y+h), boundary colour for rectangle, size of the boundary.

4. Crop the rectangular region and pass that to tesseract to extract text. Save your content in a file by opening it in append mode.

For more details, go through code comments also.

Code:

import cv2
import pytesseract
# path to Tesseract-OCR in your computer
pytesseract.pytesseract.tesseract_cmd = 'path_to_tesseract.exe'
img = cv2.imread("input.png") #input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # Converting image to gray scale
# performing OTSU threshold
ret, img_thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)

# give structure shape and kernel size
# kernel size increases or decreases the area of the rectangle to be detected.
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))
#dilation on the threshold image
dilation = cv2.dilate(img_thresh , rect_kernel, iterations = 1)
img_contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL,
                                                cv2.CHAIN_APPROX_NONE)
im2 = img.copy()
file = open("Output.txt", "w+") #text file to save results
file.write("")
file.close()
#loop through each contour
for contour in img_contours:
    x, y, w, h = cv2.boundingRect(contour)
    rect = cv2.rectangle(im2, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cropped_image = im2[y:y + h, x:x + w] #crop the text block
    file = open("Output.txt", "a")
    text = pytesseract.image_to_string(cropped_image) #applying OCR
    file.write(text)
    file.write("n")
    file.close()

Input image:

Python Projects with Codes Output image:

Python Projects with Codes

2. Convert your PDF File to Audio Speech

Say you have some book as PDF to read, but you are feeling too lazy to scroll; how good it would be then if that PDF is converted to an audiobook. So, let’s implement this using python.

We will need these two packages:

pyttsx3: It is for Text to Speech, and it will help the machine speak.

PyPDF2: It is a PDF toolkit. It is capable of extracting document information, merging documents, etc.

Install them using these commands:

pip install pyttsx3
pip install PyPDF2

Steps:

Import the required modules.
Use PdfFileReader() to read PDF file.
getPage() method is used to select the page to be read from.
Extract the text using extract text().
By using pyttx3, speak out the text.

Code:

# import the modules
import PyPDF2
import pyttsx3
  
# path of your PDF file
path = open('Book.pdf', 'rb')
  
# PdfFileReader object
pdfReaderObj = PyPDF2.PdfFileReader(path)
  
# the page with which you want to start
from_page = pdfReaderObj.getPage(12)
content = from_page.extractText()
# reading the text
speak = pyttsx3.init()
speak.say(content)
speak.runAndWait()

That’s it! It will do the job. This small code is beneficial to you when you don’t want to read; you can hear.

Next, you can provide a GUI to this project using tikinter or anything else. You can give a GUI to enter the pdf path, the page number to start from, a stop button. Try this!

Let’s move to the next project.

3. Reading mails and downloading attachments from the mailbox

Let’s understand what the benefit of reading the mailbox with Python is. So, let’s suppose if we are working on a project where some data comes daily in word or excel, which is required for the script as input or to Machine learning model as input. So, if you have to download this data file daily and give it to the hand, it will be hectic. But if we can automate this step, read this file, and download the required attachment, it would be a great help. So, let’s implement this.

We will use pywin32 to implement automatic attachment download from a particular mail. It can access Windows applications like Excel, PowerPoint, Word, Outlook, etc., to perform some actions. We will focus on Outlook and download attachments from the outlook mailbox.

Note: This does not need authentication like user email id or password. It can access Outlook that is already logged in to your machine. (Keep the outlook app open while running the script).

In the above example, we chose smtplib because it can only send emails and not download attachments. So, we will go with pywin32 to download attachments from Outlook, and it will be pretty straightforward. Let’s look at the code.

Command to install: pip install pywin32

Import module

import win32com.client

Now, establish a connection to Outlook.

outlook = win32com.client.Dispatch(“Outlook.Application”).GetNamespace(“MAPI”)

Let’s try to access Inbox:

inbox = outlook.GetDefaultFolder(number)

This function takes a number/integer as input which will tell the index of the inbox folder in our outlook app.

To check the index of all folders, just run this code snippet:

import win32com.client
outlook=win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
for i in range(50):
  try:
    box = outlook.GetDefaultFolder(i)
    name = box.Name
    print(i, name)
  except:
    pass

Output:

3 Deleted Items
4 Outbox
5 Sent Items
6 Inbox
9 Calendar

As you can see in the output Inbox index is 6. So we will use 6 in the function.

inbox = outlook.GetDefaultFolder(6)

If you want to print the subject of all the emails in the inbox, use this:

messages = inbox.Items
# get the first email
message = messages.GetFirst()
# to loop through all the email in the inbox 
while True:
  try:
    print(message.subject) # get the subject of the email
    message = messages.GetNext() 
  except:
    message = messages.GetNext()

There are other properties also like “message. subject”, “message. senton”, which can be used accordingly.

Downloading Attachment

If you want to print all the names of attachments in a mail:

for attachment in message.Attachments:
    print(attachment.FileName)

Let’s download an attachment (an excel file with extension .xlsx) from a specific sender.

import win32com.client
import re
import os
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
message = messages.GetFirst()
while True:
  try:
    if re.search('Data Report', str(message.Subject).lower()) != None and  re.search("ABC prasad", str(message.Sender).lower()) != None:
      attachments = message.Attachments
      for attachment in message.Attachments:
        if ".xlsx" in attachment.FileName or ".XLSX" in attachment.FileName:
         attachment_name = str(attachment.FileName).lower()
        attachment.SaveASFile(os.path.join(download_folder_path, attachment_name))
    else:
      pass
    message = messages.GetNext()
  except:
    message = messages.GetNext()
exit

Explanation

This is the complete code to download an attachment from Outlook inbox. Inside try block, you can change conditions. For example, I am searching for those mails which have subjects such as Data Report and Sender name “ABC prasad”. So, it will iterate from the first mail in the inbox, and if the condition gets true, it will then look if that particular mail has an attachment with the extension .xlsx or .XLSX. So you can change all these things subject, sender, file type and download the file you want. Once it finds the file, it is saved to a path given as “download_folder_path”.

End Notes

We discussed three projects in a previous article and three in this article. I hope these python projects with codes helped you to polish your skill set. Just do some hands-on and try these; you will enjoy coding them. I hope you find this article helpful. Let’s connect on Linkedin.

Thanks for reading 🙂

Happy coding!

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.

Gaurav Sharma

Love Programming, Blog writing and Poetry

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Intoduction to Python

Variables and data types

OOPs Concepts

Conditional statement

Looping Constructs

Data Structures

String Manipulation

Functions

Modules, Packages and Standard Libraries

Python Libraries for Data Science

Reading Data Files in Python

Preprocessing, Subsetting and Modifying Pandas Dataframes

Sorting and Aggregating Data in Pandas

Visualizing Patterns and Trends in Data

Programming

Interesting Python Projects With Code for Beginners – Part 2

Introduction

1. Text Extraction using OpenCV and OCR

2. Convert your PDF File to Audio Speech

3. Reading mails and downloading attachments from the mailbox

Downloading Attachment

Explanation

End Notes

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us

WebKlipperAuth

LinkedIn (16)

ln_or

JSESSIONID

li_rm

AnalyticsSyncHistory

lms_analytics

liap

visit

li_at

s_plt

lang

s_tp

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

s_pltp

s_tslv

li_theme

li_theme_set

Google (11)

_gcl_au

SID