Errors are the bane of a programmer’s existence. You write an awesome piece of code, are ready to execute it and build a powerful machine learning model, and then poof. Python throws up an unexpected error, ending your hope of quick code execution.
Every single one of us has faced this issue and emerged from it a better programmer. Dealing with bugs and errors is what builds our confidence in the long run and teaches us valuable lessons along the way.
We have some rules while writing programs in any programming language, such as not using a space when defining a variable name, adding a colon (:) after the if statement, and so on. If we don’t follow these rules, we run into Syntax Errors and our program refuses to execute until we squash those errors.
But there are occasions when the program is syntactically correct and it still throws up an error when we try to execute the program. What’s going on here? Well, these errors detected during the execution are called exceptions. And dealing with these errors is called exception handling.
We’ll be talking all about exception handling in Python here!
Why should you learn exception handling? Here is the answer, using a two-pronged argument:
Here’s a list of the common exceptions you’ll come across in Python:
ZeroDivisionError: Raised when you try to divide a number by zero
ImportError: Raised when you try to import a library that is not installed, or when you have provided the wrong name
IndexError: Raised when an index is out of range for a sequence. For example, if a list has 10 items (valid indices 0 to 9) and you try to access index 10, you will get this error
IndentationError: Raised when the code is not indented properly
ValueError: Raised when a built-in operation or function receives an argument of the right type but with an invalid value
Exception: Base class for most built-in exceptions, including all of those listed above. If you are not sure which exception may occur, you can catch this base class. It will handle all of them except system-exiting exceptions such as KeyboardInterrupt and SystemExit
You can read about more common exceptions here.
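To make the list above concrete, here is a small sketch that deliberately triggers each of these exceptions and reports the class that was raised. The helper name_of_raised_exception and the module name a_module_that_does_not_exist are made up for this illustration. Note that importing a missing module raises ModuleNotFoundError, which is a subclass of ImportError.

```python
def name_of_raised_exception(action):
    """Run action() and return the class name of the exception it raises, or None."""
    try:
        action()
    except Exception as error:
        return type(error).__name__
    return None


def import_missing_module():
    # Hypothetical module name, assumed not to be installed anywhere.
    import a_module_that_does_not_exist  # noqa: F401


examples = {
    "divide by zero": lambda: 1 / 0,
    "import a missing module": import_missing_module,
    "index past the end of a list": lambda: [1, 2, 3][3],
    # compile() parses the source text, so the indentation problem surfaces
    # as an exception we can catch at run time.
    "badly indented source": lambda: compile("if True:\nprint('hi')", "<example>", "exec"),
    "right type, wrong value": lambda: int("ten"),
}

for description, action in examples.items():
    print(description, "->", name_of_raised_exception(action))
```

Catching the Exception base class works for every case here because each of these classes derives from it.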
The try statement is used in Python to handle exceptions or errors that may occur during the execution of a block of code. It allows you to catch and handle exceptions gracefully, preventing your program from crashing.
Let’s define a function to divide two numbers a and b. It will work fine if the value of b is non-zero but it will generate an error if the value of b is zero:
def division(a, b):
    return a / b

# The function works fine when you divide by a non-zero number
print(division(10, 2))
# >> 5.0
print(division(10, -3))
# >> -3.3333333333333335

# Error when you try to divide the number by zero
print(division(10, 0))
# >> ZeroDivisionError: division by zero
We can handle this using the try and except statement. First, the try clause is executed, that is, the statements between the try and except keywords.
If no exception occurs, the except clause is skipped. On the other hand, if an exception occurs during the execution of the try clause, the rest of the try statements are skipped and the matching except clause runs instead:
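A minimal sketch of this flow, reusing the division function from above. The wrapper name safe_division and the choice to return None on failure are assumptions made for this example.

```python
def division(a, b):
    return a / b


def safe_division(a, b):
    """Return a / b, or None when b is zero."""
    try:
        result = division(a, b)
        # This line is skipped if division() raises.
        print("Division succeeded")
    except ZeroDivisionError as error:
        # Runs only when the try clause raised ZeroDivisionError.
        print("Error occurred:", error)
        return None
    return result


print(safe_division(10, 2))
print(safe_division(10, 0))
```

Naming the exception class in the except clause keeps unrelated bugs from being silently swallowed.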
In Python, we can also instruct a program to execute certain lines of code if no exception occurs using the else clause. Now, if no exception occurs in the above code, we want to print “No Error occurred!!”.
Let’s see how to do this:
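One way to write this is sketched below. The function name division_with_else is hypothetical, and the else clause runs only when the try clause finishes without raising.

```python
def division_with_else(a, b):
    """Divide a by b and report when no error occurred."""
    try:
        result = a / b
    except ZeroDivisionError:
        print("Cannot divide by zero")
        return None
    else:
        # Reached only if the try clause raised nothing.
        print("No Error occurred!!")
        return result


print(division_with_else(10, 2))
print(division_with_else(10, 0))
```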
Now, what if we need some action that will execute whether the error occurred or not (like maintaining logs)? For this, we have the finally clause in Python. It always gets executed, whether or not the program raises an exception.
We will see how we can use the finally clause to write the logs later in this article.
Now, in the above example, I want to print the value of a and b after every execution regardless of whether the error occurred or not. Let’s see how to do that:
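A possible version is sketched below, with a hypothetical function name. The finally clause prints a and b on both the success path and the error path.

```python
def division_with_finally(a, b):
    """Divide a by b and always print the inputs afterwards."""
    try:
        result = a / b
    except ZeroDivisionError:
        print("Cannot divide by zero")
        result = None
    else:
        print("No Error occurred!!")
    finally:
        # Runs after the try/except/else clauses, whichever path was taken.
        print("Value of a:", a, "| value of b:", b)
    return result


division_with_finally(10, 2)
division_with_finally(10, 0)
```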
So far, we have seen exception handling on some random data. How about taking it up a notch and understanding this using a real-life example?
We have data that contains the details of employees like their education, age, number of trainings undertaken, etc. The data is divided into multiple directories region-wise. The details of employees belonging to the same region are stored in the same file.
Now, our task is to read all the files and concatenate them to form a single file. Let’s start by importing some of the required libraries.
To view the directory structure, we will use the glob library and to read the CSV files, we will use the Pandas library:
View the directory structure using the glob.glob function and the path of the target directory. You can download the directory structure here.
We can see that the folder names are numbers. In the next step, we will go through each of the directories and see the files present:
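The two steps above can be sketched as follows. Since the real dataset is not available here, the example first builds a tiny stand-in tree in a temporary directory. The folder layout, the column names and the number of regions are assumptions, not the article's actual data.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    # Stand-in for the downloaded dataset: numbered folders, one CSV in each.
    for n in (1, 2, 3):
        os.makedirs(os.path.join(base_dir, str(n)))
        with open(os.path.join(base_dir, str(n), f"region_{n}.csv"), "w") as f:
            f.write("employee_id,education,age,no_of_trainings\n")

    # Step 1: list the region directories under the target directory.
    directories = sorted(glob.glob(os.path.join(base_dir, "*")))

    # Step 2: list the files inside each directory.
    files_by_directory = {
        os.path.basename(directory): [
            os.path.basename(path)
            for path in sorted(glob.glob(os.path.join(directory, "*")))
        ]
        for directory in directories
    }

for name, files in files_by_directory.items():
    print(name, "->", files)
```

With the real data, base_dir would simply be the path of the downloaded directory.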
In each of the folders, there is a CSV file that contains the details of the employees of that particular region. You can open and view any CSV file. Below is an image of what the data looks like in the region_1.csv file. It contains details of employees belonging to region 1:
Now, we know that there is a pattern in the directory and filename structure. In the directory n, there is a CSV file named region_n.csv. So now we will try to read all these files using a loop.
We know that the maximum number is 34 so we will use a for loop to iterate and read the files in order:
You can see that the file region_7 is not present. So, one of the simpler ways to deal with this is to put an if condition in the program – if the directory name is 7 then skip reading from that file.
But what if we have to read thousands of files together? It would be a tedious task to update the if condition every time we get an error.
Here, we will use the try and except statement to handle the errors. If there is any exception during the run time while reading any file, we will just skip that step and continue reading the next folder. We will print the file name with “File not found!!” if the error is FileNotFoundError and print the file name with “Another Error!!” if any other error occurs.
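Here is a sketch of that loop on a small stand-in dataset built in a temporary directory. Region 2 is deliberately missing and region 4 has stray lines on top. The path pattern, column names and helper names (build_sample_tree, read_all_regions) are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

HEADER = "employee_id,education,age,no_of_trainings\n"  # assumed columns


def build_sample_tree(base_dir):
    """Stand-in data: region 2 is missing, region 4 has two extra lines on top."""
    for n in (1, 3, 4):
        os.makedirs(os.path.join(base_dir, str(n)))
        with open(os.path.join(base_dir, str(n), f"region_{n}.csv"), "w") as f:
            if n == 4:
                f.write("exported by HR\ndo not edit\n")
            f.write(HEADER + f"{n}01,Bachelors,30,2\n")


def read_all_regions(base_dir, max_region):
    frames, messages = [], []
    for n in range(1, max_region + 1):
        file_name = f"region_{n}.csv"
        file_path = os.path.join(base_dir, str(n), file_name)
        try:
            frames.append(pd.read_csv(file_path))
        except FileNotFoundError:
            messages.append(f"{file_name}: File not found!!")
        except Exception:
            # Any other problem: report it and move on to the next folder.
            messages.append(f"{file_name}: Another Error!!")
    return frames, messages


with tempfile.TemporaryDirectory() as base_dir:
    build_sample_tree(base_dir)
    frames, messages = read_all_regions(base_dir, max_region=4)

print("\n".join(messages))
print("Files read:", len(frames))
```

The specific except clause comes first, since Python uses the first clause that matches and FileNotFoundError is also an Exception.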
We can see that we got another error in file number 32. Let’s try to read this file separately:
There are some issues with File 32’s format. If you open the file region_32.csv, you will see that there are some comments added on top of the file and we can read that file using the skiprows parameter:
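The sketch below reproduces this kind of problem on a stand-in file and then reads it with skiprows. The file contents and the number of lines to skip (2) are assumptions, so check the real file to see how many comment lines it carries.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "region_32.csv")
    # Stand-in file: two free-text lines above the real header row.
    with open(file_path, "w") as f:
        f.write("exported by HR\n")
        f.write("do not edit\n")
        f.write("employee_id,education,age,no_of_trainings\n")
        f.write("3201,Bachelors,30,2\n")

    # A plain read fails because the first lines have fewer fields than the rest.
    try:
        pd.read_csv(file_path)
        error_name = None
    except pd.errors.ParserError as error:
        error_name = type(error).__name__
        print("Plain read failed with", error_name)

    # Skipping the leading lines lets pandas find the real header.
    region_32 = pd.read_csv(file_path, skiprows=2)

print(region_32)
```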
Let’s see how to handle this.
We will create two boolean variables – parse_error, file_not_found – and initialize both of them as False at the start of each iteration. So, if we get FileNotFoundError, then we’ll set file_not_found as True.
Then, we will print that particular file as missing and skip that iteration. If we get ParserError, then we’ll set parse_error as True and print that the particular file has an incorrect format and read the file again using the skiprows parameter.
Now, if no exception occurs, then the code under the else statement will execute. In the else statement, we will append the data frame to the data frame list:
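The approach described above might look like the sketch below, again on a small stand-in dataset (region 2 missing, region 4 with two extra lines on top). The helper names, the column names and rows_to_skip=2 are assumptions. The final pd.concat call covers the original task of combining everything into one data frame.

```python
import os
import tempfile

import pandas as pd

HEADER = "employee_id,education,age,no_of_trainings\n"  # assumed columns


def build_sample_tree(base_dir):
    """Stand-in data: region 2 is missing, region 4 has two extra lines on top."""
    for n in (1, 3, 4):
        os.makedirs(os.path.join(base_dir, str(n)))
        with open(os.path.join(base_dir, str(n), f"region_{n}.csv"), "w") as f:
            if n == 4:
                f.write("exported by HR\ndo not edit\n")
            f.write(HEADER + f"{n}01,Bachelors,30,2\n")


def read_regions(base_dir, max_region, rows_to_skip=2):
    frames = []
    for n in range(1, max_region + 1):
        # Reset both flags at the start of every iteration.
        parse_error = False
        file_not_found = False
        file_name = f"region_{n}.csv"
        file_path = os.path.join(base_dir, str(n), file_name)
        try:
            data = pd.read_csv(file_path)
        except FileNotFoundError:
            file_not_found = True
        except pd.errors.ParserError:
            parse_error = True
            # Second attempt, skipping the lines above the header.
            data = pd.read_csv(file_path, skiprows=rows_to_skip)
            frames.append(data)
        else:
            # No exception: keep the data frame as it is.
            frames.append(data)

        if file_not_found:
            print(file_name, "is missing, skipping it")
        elif parse_error:
            print(file_name, "has an incorrect format, re-read with skiprows")
    return frames


with tempfile.TemporaryDirectory() as base_dir:
    build_sample_tree(base_dir)
    frames = read_regions(base_dir, max_region=4)

combined = pd.concat(frames, ignore_index=True)
print(combined)
```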
Let’s say you want to create a log file to keep track of which files are correct and which ones have errors. This is one use case of the finally statement. Whether you get an error or not, the finally statement will execute.
So in the finally clause, we will write to the file regarding the status of the file at each iteration:
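A sketch of this logging step is shown below, under the same assumptions as before: a stand-in dataset, made-up helper names, and rows_to_skip=2. The log file name read_log.txt and the wording of the log lines are also invented for this example.

```python
import os
import tempfile

import pandas as pd

HEADER = "employee_id,education,age,no_of_trainings\n"  # assumed columns


def build_sample_tree(base_dir):
    """Stand-in data: region 2 is missing, region 4 has two extra lines on top."""
    for n in (1, 3, 4):
        os.makedirs(os.path.join(base_dir, str(n)))
        with open(os.path.join(base_dir, str(n), f"region_{n}.csv"), "w") as f:
            if n == 4:
                f.write("exported by HR\ndo not edit\n")
            f.write(HEADER + f"{n}01,Bachelors,30,2\n")


def read_regions_with_log(base_dir, max_region, log_path, rows_to_skip=2):
    frames = []
    with open(log_path, "w") as log_file:
        for n in range(1, max_region + 1):
            parse_error = False
            file_not_found = False
            file_name = f"region_{n}.csv"
            file_path = os.path.join(base_dir, str(n), file_name)
            try:
                data = pd.read_csv(file_path)
            except FileNotFoundError:
                file_not_found = True
            except pd.errors.ParserError:
                parse_error = True
                data = pd.read_csv(file_path, skiprows=rows_to_skip)
                frames.append(data)
            else:
                frames.append(data)
            finally:
                # Runs on every iteration, whichever branch was taken above.
                if file_not_found:
                    status = "file not found"
                elif parse_error:
                    status = "incorrect format, re-read with skiprows"
                else:
                    status = "read successfully"
                log_file.write(f"{file_name}: {status}\n")
    return frames


with tempfile.TemporaryDirectory() as base_dir:
    build_sample_tree(base_dir)
    log_path = os.path.join(base_dir, "read_log.txt")  # hypothetical log location
    frames = read_regions_with_log(base_dir, max_region=4, log_path=log_path)
    with open(log_path) as log_file:
        log_lines = log_file.read().splitlines()

print("\n".join(log_lines))
```

One caveat of this sketch: an exception type that is not listed in the except clauses would still propagate after the log line is written, so a stricter version would track that case with an extra flag.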
I use exception handling while scraping data from multiple webpages. I’d love to know how you use exception handling so comment down below with your thoughts and share them with the community.
If you found this article informative, then please share it with your friends and leave your queries and feedback in the comments below. I have listed some amazing articles related to Python and data science below for your reference: