This article was published as a part of the Data Science Blogathon.
“You’re either the one that creates the automation or you’re getting automated.” Tom Preston-Werner.
Automation touches almost every aspect of modern life and can be applied in nearly any industry. It minimizes human input and takes over repetitive tasks.
Python is one of many excellent languages that can be used for extensive automation. Python scripts can be created for innovative automated solutions like scraping data, manipulating documents, working with image files, solving complex equations, working with file systems, etc. This article will use the file system and automate the structuring of files in the desired folder.
We will create a small Python script that automates the structuring of files in a folder. It takes three inputs –
1. Path which needs structuring
2. Desired folder name that will get created
3. File type which needs structuring
Currently, the code supports the following file types – SQL, DOCX, DOC, PDF, TXT, MSG, XLSX, XLS, PY, JPEG, PNG. However, the code can be easily extended to support other file types.
Note: The code can be extended to cover more use cases. This automation is for learning purposes and shows how Python can work with the file system.
Problem Statement: In this article, we have several files in a folder, including three Excel files and three documents. We want to move all document files into a single folder. We supply the three inputs listed above, and the script creates the desired folder containing the required files. We use a small folder as input to test the code, but the script becomes more useful when a folder holds a large number of files that need quick structuring.
Importing libraries
We are using four libraries – os, shutil, time and sys
1. The os module provides a way of using operating-system-dependent functionality. The official Python documentation covers it in more detail
2. The shutil module provides high-level operations on files and collections of files. The official Python documentation covers it in more detail
3. The time module provides time-related functions. The official Python documentation covers it in more detail
4. The exit function, imported from the sys module, stops the program when an error occurs. The official Python documentation covers it in more detail
    import os
    import shutil
    import time
    from sys import exit
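To get a feel for these modules before running the full script, here is a minimal sketch that exercises the same calls in a throwaway folder. The function name `demo_move`, the file name `report.docx` and the use of `tempfile` are assumptions made for illustration; they are not part of the original script.

```python
import os
import shutil
import tempfile


def demo_move(filename="report.docx", folder_name="BKP_DOCS"):
    """Create one empty file in a scratch folder, move it, and return both listings."""
    with tempfile.TemporaryDirectory() as root:
        # Create an empty file to work with
        open(os.path.join(root, filename), "w").close()
        before = sorted(os.listdir(root))

        # os.makedirs creates the destination folder
        destination = os.path.join(root, folder_name)
        os.makedirs(destination)

        # shutil.move relocates the file into the existing destination folder
        shutil.move(os.path.join(root, filename), destination)
        after = sorted(os.listdir(destination))
        return before, after


if __name__ == "__main__":
    before, after = demo_move()
    print("Files before the move:", before)
    print("Files inside the new folder:", after)
```

Because everything happens inside a temporary directory, the sketch can be run safely without touching real files.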
Taking inputs from the user
The script requires three user inputs.
1. Path – The user has to specify the full path of the folder that needs structuring. If the path is wrong or does not exist, the code exits without running any further lines
2. Folder name – The user has to specify the name of the new folder that will be created during structuring. The code is written to give users the flexibility to choose any name
3. File type – As the code handles a single file type per run, the user has to specify the file type that needs structuring
    path = input("Enter the path where you need restructuring. A new folder will be created there : ")
    if not (os.path.isdir(path)):
        print("The path - " + path + " does not exist. Enter the correct path. So exiting...")
        time.sleep(2)
        exit(1)
    folder_name = input("Enter the name of the folder you want to create : ")
    file_type = input("Enter the file type that you wish to structure from " + path + " in the above folder " + folder_name + " (SQL,DOCX,DOC,PDF,TXT,MSG,XLSX,XLS,PY,JPEG,PNG formats are acceptable): ")
    newpath = path + "/" + folder_name
    found_file_type = False
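The script compares the typed file type against a fixed set of spellings such as docx and .DOCX, so a mixed-case entry like Docx would not match. Below is a minimal sketch of one way to normalize the input first. `normalize_file_type` and `SUPPORTED_TYPES` are hypothetical names introduced here for illustration; they do not appear in the original script.

```python
# File types listed in the article; the tuple name is an assumption for this sketch
SUPPORTED_TYPES = ("sql", "docx", "doc", "pdf", "txt", "msg",
                   "xlsx", "xls", "py", "jpeg", "png")


def normalize_file_type(raw):
    """Return the lowercase extension without a leading dot, or None if unsupported."""
    cleaned = raw.strip().lower().lstrip(".")
    return cleaned if cleaned in SUPPORTED_TYPES else None


if __name__ == "__main__":
    for entry in (".DOCX", " pdf ", "Docx", "ABC"):
        print(repr(entry), "->", normalize_file_type(entry))
```

With the input reduced to a single canonical form, each later comparison only needs to check one spelling.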
Writing the main body of the Python script that performs the structuring
    try:
        # If newpath does not exist, create it
        if not os.path.exists(newpath):
            os.makedirs(newpath)
        # Windows-specific: step out to the drive root (the backslash must be escaped)
        os.chdir('C:\\')

        # Setting the source and destination folders
        dir_src = path + "/"
        dir_dst = newpath

        # For SQL files
        if file_type in ('SQL', 'sql', '.SQL', '.sql'):
            for filename in os.listdir(dir_src):
                if filename.endswith('.sql'):
                    shutil.move(dir_src + filename, dir_dst)
                    found_file_type = True

        # For DOCX files
        if file_type in ('docx', 'DOCX', '.docx', '.DOCX'):
            for filename in os.listdir(dir_src):
                if filename.endswith('.docx'):
                    shutil.move(dir_src + filename, dir_dst)
                    found_file_type = True

        # For DOC files
        if file_type in ('doc', 'DOC', '.doc', '.DOC'):
            for filename in os.listdir(dir_src):
                if filename.endswith('.doc'):
                    shutil.move(dir_src + filename, dir_dst)
                    found_file_type = True

        # For PDF files
        if file_type in ('pdf', 'PDF', '.pdf', '.PDF'):
            for filename in os.listdir(dir_src):
                if filename.endswith('.pdf'):
                    shutil.move(dir_src + filename, dir_dst)
                    found_file_type = True

        # For TXT files
        if file_type in ('txt', 'TXT', '.txt', '.TXT'):
            for filename in os.listdir(dir_src):
                if filename.endswith('.txt'):
                    shutil.move(dir_src + filename, dir_dst)
                    found_file_type = True

        # For Outlook message files
        if file_type in ('msg', 'MSG', '.msg', '.MSG'):
            for filename in os.listdir(dir_src):
                if filename.endswith('.msg'):
                    shutil.move(dir_src + filename, dir_dst)
                    found_file_type = True

        # For Excel files
        if file_type in ('xlsx', 'XLSX', '.xlsx', '.XLSX'):
            for filename in os.listdir(dir_src):
                if filename.endswith('.xlsx'):
                    shutil.move(dir_src + filename, dir_dst)
                    found_file_type = True

        # Legacy XLS files (a lowercase 'xlsx' input also sweeps up .xls files)
        if file_type in ('xls', 'XLS', '.xls', '.XLS', 'xlsx', '.xlsx'):
            for filename in os.listdir(dir_src):
                if filename.endswith('.xls'):
                    shutil.move(dir_src + filename, dir_dst)
                    found_file_type = True

        # For Python script files
        if file_type in ('py', 'PY', '.py', '.PY'):
            for filename in os.listdir(dir_src):
                if filename.endswith('.py'):
                    shutil.move(dir_src + filename, dir_dst)
                    found_file_type = True

        # For JPEG files
        if file_type in ('jpeg', 'JPEG', '.jpeg', '.JPEG'):
            for filename in os.listdir(dir_src):
                if filename.endswith('.jpeg'):
                    shutil.move(dir_src + filename, dir_dst)
                    found_file_type = True

        # For PNG files
        if file_type in ('png', 'PNG', '.png', '.PNG'):
            for filename in os.listdir(dir_src):
                if filename.endswith('.png'):
                    shutil.move(dir_src + filename, dir_dst)
                    found_file_type = True

        # Report success only when at least one file was moved
        if found_file_type:
            print(folder_name, 'created successfully!')
    except OSError:
        print("The path given by you : " + path + " does not exist")

    if not found_file_type:
        print("No ." + file_type + " file type is present in the path : " + path + ". So exiting...")
        # Remove the new folder only if it is still empty; os.rmdir leaves parent folders alone
        if os.path.isdir(newpath) and not os.listdir(newpath):
            os.rmdir(newpath)
        time.sleep(2)
        exit(1)
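The per-type blocks in the main body all repeat the same loop with a different extension. As a possible alternative, here is a minimal sketch that folds them into one function. `move_files_by_extension` is a hypothetical helper, not the author's code, and it differs from the script in two labeled ways: it matches extensions case-insensitively and it skips sub-folders.

```python
import os
import shutil
import tempfile


def move_files_by_extension(dir_src, dir_dst, extension):
    """Move every file in dir_src whose name ends with the extension into dir_dst.

    Returns the list of file names that were moved.
    """
    suffix = "." + extension.lower().lstrip(".")
    os.makedirs(dir_dst, exist_ok=True)
    moved = []
    for filename in os.listdir(dir_src):
        source = os.path.join(dir_src, filename)
        # Assumption: skip sub-folders and compare extensions case-insensitively
        if os.path.isfile(source) and filename.lower().endswith(suffix):
            shutil.move(source, os.path.join(dir_dst, filename))
            moved.append(filename)
    return moved


if __name__ == "__main__":
    # Try it on a scratch folder so no real files are touched
    with tempfile.TemporaryDirectory() as folder:
        for name in ("a.docx", "b.docx", "c.xlsx"):
            open(os.path.join(folder, name), "w").close()
        target = os.path.join(folder, "BKP_DOCS")
        print(sorted(move_files_by_extension(folder, target, "docx")))
```

With this shape, supporting a new file type needs no new branch; an empty return value plays the role of the found_file_type flag.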
It is always good to handle exceptions in your code so that it does not crash unexpectedly. We wrap the main body in a try-except block because user input can be unexpected. The script then keeps control and gives the user a clear message about whether it ran successfully.
There are two distinguishable kinds of errors – syntax errors and exceptions. Syntax errors are parsing errors raised when the code is written incorrectly, such as a missing colon after a for loop, missing brackets after print, or improper indentation after an if condition. Errors encountered during execution are called exceptions; examples are ZeroDivisionError, NameError and TypeError. Syntax errors have to be corrected in the source before the program can run, whereas exceptions can be handled at run time with a try-except block. If exceptions are not handled well, the program stops with an error message that end-users may not understand. A proper error message lets end-users see whether the program ran successfully. The errors and exceptions chapter of the official Python tutorial covers error handling in more detail.
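The difference between the two kinds of errors can be illustrated with a short sketch. The helper names `has_syntax_error` and `safe_divide` are made up for this example. The first uses the built-in `compile` to show that a missing colon is rejected at parse time; the second handles two run-time exceptions with try-except.

```python
def has_syntax_error(source):
    """Return True if the source text fails to parse as Python code."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None with a readable message on error."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        print("Cannot divide by zero. Please enter a non-zero denominator.")
    except TypeError:
        print("Both inputs must be numbers.")
    return None


if __name__ == "__main__":
    # Missing colon after the for loop: caught by the parser, before anything runs
    print(has_syntax_error("for i in range(3) print(i)"))
    # Exceptions raised while running: handled with a clear message
    print(safe_divide(10, 0))
    print(safe_divide("ten", 2))
```

In both failing calls the program continues after printing a message, instead of stopping with a traceback.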
Suppose my folder has the following files:
I want to structure all my DOCX files using this automation code. I run the Python file and give the user inputs as shown below. The folder we want to create is BKP_DOCS, and we want to structure all files of the DOCX type
The success message appears as shown below
The folder is arranged and looks like this –
It is necessary to test boundary conditions for any program to run uninterrupted. We will show the two exception conditions here –
1. Incorrect folder path – The user may misspell the path or specify a folder that does not exist. In this case, our program should not proceed and should give a proper failure message. In the image below, I enter a path that does not exist on my system and get the following message
2. Incorrect file type – The user may specify a random name as a file type, or a file type that is not yet coded into the program. In such a case too, the code should exit with a proper failure message, as shown below. In our case, we specified "ABC" as the file type. Since it is invalid, the code shuts down and does not execute further
The error messages are clear and easy to understand, so the user can correct the inputs based on them. In this way, we can create test cases for boundary conditions. We should test our programs thoroughly before they are sent to production or used by end-users.
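The two boundary conditions above can also be checked automatically rather than by hand. Here is a minimal sketch using the standard unittest module. `validate_inputs` and `SUPPORTED_TYPES` are hypothetical names standing in for the script's checks, and the message wording is an assumption, not the script's exact output.

```python
import os
import tempfile
import unittest

# File types listed in the article; the set name is an assumption for this sketch
SUPPORTED_TYPES = {"sql", "docx", "doc", "pdf", "txt", "msg",
                   "xlsx", "xls", "py", "jpeg", "png"}


def validate_inputs(path, file_type):
    """Return an error message for bad input, or None when both inputs are usable."""
    if not os.path.isdir(path):
        return "The path - " + path + " does not exist."
    if file_type.lower().lstrip(".") not in SUPPORTED_TYPES:
        return "File type " + file_type + " is not supported."
    return None


class BoundaryConditionTests(unittest.TestCase):
    def test_incorrect_folder_path(self):
        with tempfile.TemporaryDirectory() as folder:
            missing = os.path.join(folder, "no_such_folder")
            self.assertIn("does not exist", validate_inputs(missing, "docx"))

    def test_incorrect_file_type(self):
        with tempfile.TemporaryDirectory() as folder:
            self.assertIn("not supported", validate_inputs(folder, "ABC"))

    def test_valid_inputs(self):
        with tempfile.TemporaryDirectory() as folder:
            self.assertIsNone(validate_inputs(folder, ".DOCX"))
```

Saved to a file, the tests can be run with python -m unittest followed by the file name.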
I hope this article explained the automation problem statement and its solution in detail. This way, we can automate using a file system in Python. File system automation can be fun and challenging, and many different use cases can be created. One more sophisticated use case is to create a folder for every file type present in a user-supplied path, so that the user only needs to provide the folder path as input. The program would create as many folders as there are file types and bundle files of the same type into each folder. Interested readers can try this out and extend the code for this use case. Happy automating!
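As a starting point for that extension, here is a minimal sketch of one possible approach. The function name `group_files_by_type` and the folder naming scheme (extension in upper case plus _FILES) are assumptions made for illustration; readers may prefer a different layout.

```python
import os
import shutil
import tempfile


def group_files_by_type(path):
    """Move each file in path into a sub-folder named after its extension.

    Returns a dict mapping each folder name to the sorted list of files moved there.
    """
    grouped = {}
    for filename in os.listdir(path):
        source = os.path.join(path, filename)
        if not os.path.isfile(source):
            continue  # leave existing sub-folders alone
        extension = os.path.splitext(filename)[1].lstrip(".").upper()
        # Hypothetical naming scheme: DOCX_FILES, XLSX_FILES, and so on
        folder_name = (extension or "NO_EXTENSION") + "_FILES"
        destination = os.path.join(path, folder_name)
        os.makedirs(destination, exist_ok=True)
        shutil.move(source, os.path.join(destination, filename))
        grouped.setdefault(folder_name, []).append(filename)
    return {name: sorted(files) for name, files in grouped.items()}


if __name__ == "__main__":
    # Try it on a scratch folder so no real files are touched
    with tempfile.TemporaryDirectory() as folder:
        for name in ("a.docx", "b.docx", "c.xlsx"):
            open(os.path.join(folder, name), "w").close()
        print(group_files_by_type(folder))
```

The only input is the folder path; the file types are discovered from the files themselves.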
Image sources – Author
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.