CSV is a common file format used throughout the industry to store data, because it keeps data in a plain text file while maintaining a tabular structure. A good understanding of the CSV format therefore makes it much easier to handle the data you deal with on a day-to-day basis. In this beginner-friendly article, I'll introduce you to the CSV format and show you how to handle CSV files in Python with Pandas.
Learning Objectives
This article was published as a part of the Data Science Blogathon.
CSV (Comma Separated Values) is a simple file format used to store tabular data, like a spreadsheet or database. A CSV file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format.
In basic operations, we are going to understand the following three things:
Working with CSV files isn't a tedious task; it's pretty straightforward. However, depending on your workflow, there can be caveats that you might want to watch out for.
If you have a CSV file, you can open it in Excel without much trouble. Just open Excel and find the CSV file to work with (or right-click on the CSV file and choose Open in Excel). After you open the file, you'll notice that the data is simply plain text put into different cells. When dealing with CSV files in Python using Pandas, you can perform various operations like reading, writing, and manipulating the data easily.
If you want to save your current workbook as a CSV file, you have to use the following commands:
File -> Save As… and choose CSV file.
More often than not, you’ll get this warning:
Let’s understand what this error is saying to us.
Here Excel is trying to say that CSV files don't save any kind of formatting at all.
For example, column widths, font styles, colors, etc., won't be saved.
Just your plain old data is saved in a comma-separated file.
Note that even after you save it, Excel will still show the formats you just had, so don't be fooled into thinking that when you open the workbook again, your formats will still be there. They won't be.
Even when you open up a CSV file in Excel, if you apply even minor formatting, like adjusting the column widths to see the data, Excel will still warn you that it can't save the formats you added. You'll get a warning like this one:
Image Source: Google Images
So, the point to note is that your formats can never be saved in CSV files.
CSV files are the simplest way to exchange data between different applications, making them versatile for data exchange. Say you had a database application and wanted to export the data to a file. If you want to export it to an Excel file, the database application has to support exporting to XLS* files. Additionally, when working with CSV files in Python, particularly with Pandas, you can easily read, write, and manipulate tabular data for analysis and processing.
However, since the CSV file format is extremely straightforward and lightweight (much more so than XLS* files), it's easier for various applications to support it. In its basic usage, you have a line of text, with each column of data separated by a comma. That's it. And because of this simplicity, it's easy for developers to build export/import functionality with CSV files to transfer data between applications, rather than using more sophisticated file formats.
For example, let's take the tabular data shown below:
If we convert this data into a CSV Format, then it looks like this:
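Since the original table image may not render here, a small sketch can show the same conversion in code (the column names and values below are illustrative, not from the article). Writing a DataFrame out with to_csv() produces exactly the comma-separated layout described above; note that values containing only spaces (like a full name) need no special quoting, while values containing commas would be quoted automatically.

```python
import pandas as pd

# A small, hypothetical table (names and values are illustrative)
df = pd.DataFrame({
    "Language": ["Python", "R"],
    "Designed By": ["Guido van Rossum", "Ross Ihaka"],
})

# to_csv() with index=False returns the plain comma-separated text
csv_text = df.to_csv(index=False)
print(csv_text)
```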
Now we are done with all the basics of CSV files. In the later part of the article, we will discuss working with CSV files in a more detailed manner.
To read a CSV file using Pandas' read_csv function, you first need to import the Pandas library. Then, you can use the read_csv function by passing the path to the CSV file as an argument. This function will return a DataFrame containing the data from the CSV file. Here's an example code snippet:
import pandas as pd
# Path to the CSV file
csv_file_path = "path/to/your/file.csv"
# Read the CSV file into a DataFrame
df = pd.read_csv(csv_file_path)
# Now you can work with the DataFrame 'df'
Replace "path/to/your/file.csv" with the actual path to your CSV file. Once the CSV file is read into the DataFrame, you can perform various operations and analysis on the data using Pandas.
One of the easiest ways to read a CSV file in Python is with the Pandas library. It has a very useful function, read_csv(), which reads a CSV file and creates a Pandas DataFrame. A Pandas DataFrame is nothing but a two-dimensional data structure that stores data like a table, with rows and columns. So here I'll show you how to work with CSV files using Pandas.
First, we import the Pandas library.
import pandas as pd
Now we will see how to load and read the dataset easily.
If the file is in the same location as our Python file, then giving the file name alone is enough to load it; otherwise, you have to give the complete file path. Following is the syntax to read a CSV file and create a Pandas DataFrame from it.
df = pd.read_csv('aug_train.csv')
df
Output:
If the file is not present on our local machine and we have to fetch the data from a given URL, then we take the help of the requests module to load that data.
Python Code:
import requests
from io import StringIO
import pandas as pd
url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0"}
req = requests.get(url, headers=headers)
data = StringIO(req.text)
df = pd.read_csv(data)
print(df.head())
Output:
Suppose we have a dataset in which the entities in a row are not separated by a comma but by some other delimiter. In that case, we have to use the sep parameter to specify that separator or delimiter. For example, a tab or a semicolon could also be used as a separator.
Say we have a TSV file, i.e., the entities are tab-separated. If we try to load this data directly, all the entities end up combined into a single column.
import pandas as pd
pd.read_csv('movie_titles_metadata.tsv')
Output:
To solve the above problem, we have to set the sep parameter to '\t' instead of ',', which is the default separator.
import pandas as pd
pd.read_csv('movie_titles_metadata.tsv', sep='\t')
Output:
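Since movie_titles_metadata.tsv may not be available to run locally, here is a self-contained sketch of the same idea using io.StringIO with a semicolon-separated file (the data is made up for illustration):

```python
import pandas as pd
from io import StringIO

# Hypothetical semicolon-separated data
raw = "name;year\nToy Story;1995\nHeat;1995\n"

# Without sep=';' everything would land in one column;
# with it, the fields parse into separate columns
df = pd.read_csv(StringIO(raw), sep=";")
print(df)
```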
In the above example, we observed that the first row is treated as the column names. To assign our own custom column names instead, we have to pass a list of names via the names parameter.
pd.read_csv('movie_titles_metadata.tsv', sep='\t', names=['sno','name','release_year','rating','votes','genres'])
Output:
The index_col parameter allows us to set which column to use as the DataFrame's index. The default value for this parameter is None, and Pandas automatically adds a new integer index starting from 0.
So, it allows us to use a column as the row labels for a given DataFrame. This parameter comes in handy quite often. Let's say our dataset has an ID column that does not impact our predictions, so we make that column the row index instead of the default.
pd.read_csv('aug_train.csv',index_col='enrollee_id')
Output:
The header parameter allows us to specify which row will be used as column names for the DataFrame. It expects an int value or a list of int values.
The default value is header=0, which means the first row of the CSV file is treated as the column names.
pd.read_csv('test.csv',header=1)
Output:
The usecols parameter specifies which columns to import from the full dataset into the DataFrame. It accepts either a list of int positions or the column names directly.
This parameter comes in handy when we have to analyze just some columns of our dataset rather than all of them.
So, this parameter returns a subset of the columns from your dataset.
pd.read_csv('aug_train.csv',usecols=['enrollee_id','gender','education_level'])
Output:
If True and only one column is passed, the squeeze parameter returns a Pandas Series instead of a DataFrame. A Pandas Series is a one-dimensional data structure storing a single column with a single data type.
pd.read_csv('aug_train.csv',usecols=['gender'],squeeze=True)
Output:
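Note that the squeeze parameter was deprecated in pandas 1.4 and removed in pandas 2.0. In recent versions, the same result comes from calling .squeeze("columns") on the returned DataFrame. A minimal sketch with made-up data:

```python
import pandas as pd
from io import StringIO

# Hypothetical one-column data
raw = "gender\nMale\nFemale\n"

# read_csv returns a one-column DataFrame; squeeze it into a Series
s = pd.read_csv(StringIO(raw)).squeeze("columns")
print(type(s))
```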
The skiprows parameter is used to skip the specified rows while reading the file. This is useful when only some rows of the CSV data are relevant, for example when metadata or extra header rows sit at the top of the file.
pd.read_csv('aug_train.csv',skiprows=[0,1])
Output:
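skiprows also accepts a callable, which is evaluated against each row index in the file; this is handy for sampling large files. A self-contained sketch with illustrative data:

```python
import pandas as pd
from io import StringIO

raw = "a,b\n1,2\n3,4\n5,6\n7,8\n"

# Keep the header (row 0) and skip every second data row
df = pd.read_csv(StringIO(raw), skiprows=lambda i: i % 2 == 0 and i != 0)
print(df)
```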
The nrows parameter reads only a fixed number of rows from the file. It needs an int value.
This parameter comes in handy when we have a huge dataset, and we want to load our dataset in chunks instead of directly loading the complete dataset.
pd.read_csv('aug_train.csv',nrows=100)
Output:
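nrows only takes the first N rows; for true chunked processing of a large file, the chunksize parameter makes read_csv return an iterator of DataFrames instead. A minimal sketch with made-up data:

```python
import pandas as pd
from io import StringIO

raw = "x\n1\n2\n3\n4\n5\n"

# Process the file two rows at a time instead of loading it all at once
total = 0
for chunk in pd.read_csv(StringIO(raw), chunksize=2):
    total += chunk["x"].sum()
print(total)
```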
The encoding parameter specifies which character encoding to use when reading or writing the file.
Sometimes our files are not encoded in the default form, i.e., UTF-8. In that case, saving the file with a text editor or adding the parameter encoding='utf-8' doesn't work; in both cases, it returns an error.
So, to resolve this issue, we call read_csv with encoding='latin1', encoding='iso-8859-1', or encoding='cp1252' (some of the various encodings found on Windows).
pd.read_csv('zomato.csv',encoding='latin-1')
Output:
If some lines of a dataset have too many fields (for example, a CSV line with too many commas), then by default an exception is raised and no DataFrame is returned.
So, to resolve these types of issues, we set this parameter to False, and these "bad lines" are dropped from the DataFrame that is returned. (Only valid with the C parser.)
pd.read_csv('BX-Books.csv', sep=';', encoding="latin-1",error_bad_lines=False)
Output:
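error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; the replacement is the on_bad_lines parameter. Here is a self-contained sketch with an intentionally malformed line:

```python
import pandas as pd
from io import StringIO

# The second data record has one field too many
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# on_bad_lines='skip' drops malformed rows instead of raising an error
df = pd.read_csv(StringIO(raw), on_bad_lines="skip")
print(df)
```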
The dtype parameter stands for 'data type' and converts the data of specific columns. For example, {'a': np.float64, 'b': np.int32}.
This parameter comes in handy when we need to convert a column from one type to another, e.g., from float to int.
pd.read_csv('aug_train.csv',dtype={'target':int}).info()
Output:
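Since aug_train.csv may not be available to run locally, a self-contained sketch with illustrative data shows the effect: the target column would be inferred as int64, but dtype forces it to float64 at read time.

```python
import pandas as pd
from io import StringIO

raw = "id,target\n1,0\n2,1\n"

# Force the 'target' column to float64 instead of the inferred int64
df = pd.read_csv(StringIO(raw), dtype={"target": float})
print(df.dtypes)
```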
If we set this parameter to True, Pandas tries to parse the index as dates.
For example, [1, 2, 3] means try parsing columns 1, 2, and 3 each as a separate date column, while [[1, 3]] means combine columns 1 and 3 and parse them as a single date column.
pd.read_csv('IPL Matches 2008-2020.csv',parse_dates=['date']).info()
Output:
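A self-contained sketch of parse_dates with made-up data: without it, the date column stays a plain string (object dtype); with it, the column becomes a proper datetime.

```python
import pandas as pd
from io import StringIO

raw = "date,team\n2020-09-19,MI\n2020-09-20,DC\n"

# parse_dates turns the named column into datetime64 values
df = pd.read_csv(StringIO(raw), parse_dates=["date"])
print(df.dtypes)
```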
The converters parameter helps us transform the values in specified columns with a custom function supplied by the user.
def rename(name):
    if name == "Royal Challengers Bangalore":
        return "RCB"
    else:
        return name
rename("Royal Challengers Bangalore")
Output:
‘RCB’
pd.read_csv('IPL Matches 2008-2020.csv',converters={'team1':rename})
Output:
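Since the IPL matches CSV may not be available to run locally, here is a self-contained version of the same idea with illustrative data:

```python
import pandas as pd
from io import StringIO

raw = "team1,team2\nRoyal Challengers Bangalore,Mumbai Indians\n"

def rename(name):
    # Shorten one team name; leave everything else untouched
    if name == "Royal Challengers Bangalore":
        return "RCB"
    return name

# The converter runs on every value of team1 while the file is read
df = pd.read_csv(StringIO(raw), converters={"team1": rename})
print(df)
```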
As we know, missing values are represented as NaN by default. If we want other strings to be treated as NaN as well, we use the na_values parameter. It expects a list of strings as input.
Sometimes a dataset uses another kind of symbol to mark missing values, so this parameter tells Pandas to interpret those values as missing too.
pd.read_csv('aug_train.csv',na_values=['Male',])
Output:
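A self-contained sketch of na_values with made-up data, where '?' stands in for a missing value:

```python
import pandas as pd
from io import StringIO

# '?' marks missing values in this illustrative dataset
raw = "gender,age\nMale,25\n?,30\n"

# Tell Pandas to treat '?' as NaN while reading
df = pd.read_csv(StringIO(raw), na_values=["?"])
print(df)
```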
This completes our discussion!
NOTE: In this article, we only discussed the parameters that are most useful when working with CSV files daily. If you are interested in learning more parameters, refer to the official Pandas documentation.
So, in this beginner-friendly tutorial, we gained a good understanding of the CSV format and how to handle it in Python with Pandas. We looked at how to create a Pandas DataFrame from a CSV file using the read_csv() function. We also looked at various parameters of read_csv(), like sep, header, and nrows, to read CSV files in the most efficient manner for our purpose.
Key Takeaways
A. CSV operations, like reading, writing, and manipulating data, are essential parts of working with CSV files. This is covered in a Pandas CSV tutorial.
A. In Pandas, we read a CSV file using the read_csv() function. This function supports a lot of parameters, which makes reading CSV files very easy. For example, we can read a large CSV file in pieces using the nrows parameter, which reads only a set number of rows, or the skiprows parameter, which skips a certain number of rows while reading.
A. read_csv() function imports a CSV file to DataFrame format.
To work with CSV files using Pandas, follow a Pandas CSV tutorial. You can read, write, and manipulate data easily once you’ve imported the Pandas library in Python.
Learn how to iterate through CSV files with Pandas in a simple Pandas CSV tutorial. After reading the file into a DataFrame, you can access and manipulate each row of data.