Google Sheets is one of the most popular and widely used alternatives to Excel. Its collaborative environment offers features such as real-time editing, and version control, and its tight integration with Google Suite which allows you to call Google Sheets in Google Docs, helps to bring the best of the Google workspace. You can easily load and work with Excel files programmatically using Pandas, one of the most popular data science libraries. Similarly, you can replicate this setup with Google Sheets.
As Google Sheets is a SaaS offering, one needs to access Google Sheets data using its API. You can access the API using various programming languages, including Java, JavaScript, Node.js, PHP, Ruby, Python, and Google’s own AppScript. For this article, we will focus on using Python. By leveraging Python, we can efficiently load Google Sheets data into Pandas data frames, a powerful tool for data manipulation. This allows us to perform transformations and analyses quickly. Once our changes are complete, we can push them back to Google Sheets using the gspread Python library, which provides a convenient interface for connecting to and interacting with the Google Sheets API.
This article was published as a part of the Data Science Blogathon.
As mentioned earlier, Google Sheets is a SaaS offering, so you need to prepare extra steps for automation. Google Cloud Platform (GCP), a popular cloud computing platform, offers a variety of services that help to interact with Google products along with the deployment of your custom projects.
Broadly, we need to follow these 3 steps to get started with Google Sheets automation.
Head over to https://console.cloud.google.com/ and sign up for a free account. Next from the top left, click on the project selection menu and select new project. Provide a project name, leaving the organization as “No organization”, hit create and your GCP project is set now.
We need to enable the Google Sheets and Google Drive API and create a service account. This special type of account allows us to access and manage Google Cloud resources without requiring human interaction. To enable the Google Sheets API, search for sheets in the top search bar and select “Google Sheets API”. Click allow and it will redirect us to the API details page. Here click on “Create Credentials” and it will open up the credentials creation form.
Select “Application Data” from the selection box and click Next. On the next screen, provide a meaningful name for the service account, as you will use it in later steps. Then, select the “Editor” role. Finally, click Done at the bottom.
Similarly, the Google Drive API can be enabled. Please note that we don’t need to create another service account for this API. The existing service account will be able to access both APIs. Now, we need to download the credentials JSON, which our Python script will consume to access Google Sheets. Click on the newly generated service account email, switch to the keys tab, click on the add key button to select the create new key option, select JSON, and then create.
Our Python script will use the generated credentials to access Google Sheets. However, we need to manually grant access to the files that our script will use. To do this, copy the email generated for the service account (found in the details tab of the service account) and add that email as an Editor to the desired files.
gspread is a Python API wrapper for Google Sheets. It encapsulates a lot of functionalities offered by the Google Sheets API under separate classes and access methods. It makes interaction with sheets API easy to navigate and one can quickly pick it up.
To set up the library in the local environment, one can use a simple pip command, as with any other Python package. Install the library in a separate environment as a best practice to avoid any dependency conflicts.
pip install gspread
The cell addresses in Google Sheets can be referenced using two popular notations:
Now that we have set up the necessary access and libraries, let’s test our code. In the spreadsheet containing the service account email, enter some random text in the first cell. We’ll attempt to retrieve this value using our Python script.
We will use the JSON module to load our credentials and pass it to the gspread’s “service_account_from_dict()” function. This will return a Google Sheets client object and this object can be used to open any Google Sheet using the “open()” function. See the code below.
import gspread
import json
with open('creds.json') as f:
credentials = json.load(f)
gc = gspread.service_account_from_dict(credentials)
sh = gc.open("ArticleDemo")
There are two alternative ways to open a Google Sheet instead of a title name. These alternative ways eliminate the title name dependency as in Google Workspace, multiple files can have the same title. In the case of spreadsheets, if there are two files with the same title, then the latest file will be accessed by the API. We can access the spreadsheets using the file URL or the spreadsheet’s unique ID which proceeds the following link: “https://docs.google.com/spreadsheets/d/<unique_id>/edit”. Below is the code to access the spreadsheet via URL or unique ID.
## Access via unique ID
sh = gc.open_by_key("1R97twcM0FfFNSsrh_0FjDDg-HcQF5PLHbhRxu9pTV_Q")
## Access via URL
sh = gc.open_by_url("https://docs.google.com/spreadsheets/d/1R97twcM0FfFNSsrh_0FjDDg-HcQF5PLHbhRxu9pTV_Q/edit?gid=0#gid=0")
The following code will read the value entered earlier in the sheet. The code working will be explained in the later sections of the article.
print(sh.sheet1.acell('A1').value)
This will return the value present in the A1 cell in the sheet, which in our case is “GSheet: This is the first cell”. Now we are all set to deep dive into the gspread library and explore all the available options.
Note: The sh variable holds the spreadsheet object and it will be referred to throughout the guide
There could be many use cases where a spreadsheet is created programmatically. One could be building a solution to publish data and insights for their users. Along with this, they might want to share this sheet directly with the user.
create()
function of the gspread client. Pass the title of the new spreadsheet as a parameter, and if you want to specify the location, use the folder_id
parameter.sh = gc.create('ArticleDemoTest') ## Creating a new spreadsheet
sh.share(email_address='[email protected]', perm_type='user', role='writer', notify=True, email_message="This is a test file")
Every spreadsheet is a collection of worksheets. A simple analogy to this is how a book has multiple pages. Using gspread, users can access, modify, delete, or create new worksheets. Let’s take a look at each of these functions.
The worksheet(s) of a spreadsheet object can be accessed using the following methods of the worksheet object:
Here is the sample code for all the listed methods.
print(sh.get_worksheet(0))
print(sh.worksheet("ArticleWorkSheet1"))
print(sh.sheet1)
print(sh.get_worksheet_by_id(0))
print("Now fetching all sheets...")
## Returning all worksheets
for ws in sh.worksheets():
print(ws)
All of these print statements return the worksheet object
Apart from existing worksheets in the spreadsheet, we can programmatically create new worksheets in the same spreadsheet. This approach can be useful when processing data from an existing worksheet and publishing the results in a separate worksheet.
To create a new worksheet, we need to use the “add_worksheet()” function of the worksheet object. It takes the following parameters.
The below code will create a worksheet with 100 rows and 20 columns and place the worksheet in the second position.
sh.add_worksheet('ArticleWorkSheet1.5', rows=100, cols=20, index=1)
And it did place it in the second position (index + 1)
You can rename worksheet titles using the update_title()
function of the worksheet object, which accepts the new title as a parameter.
print(sh.worksheet("ArticleWorkSheet3").update_title("ArticleWorkSheet2.5"))
A worksheet can be deleted from a spreadsheet using the following function of the worksheet object:
The selection of which function to use depends on the use case the script is made. Below is the code sample demonstrating the usage of both functions.
sh.del_worksheet(sh.worksheet("ArticleWorkSheet2.5"))
sh.del_worksheet_by_id('602396579')
We are slowly narrowing down from top to bottom and reaching the smallest (and most important) unit of our worksheet, a cell. A cell is an intersection of a row and a column. For gspread library, it holds the following properties:
The below sample code accesses all the properties of a cell. The particular cell for inspection is returned using the cell function of the worksheet.
sampleCell = sh.worksheet("ArticleWorkSheet1").cell(row=1, col=1)
print('Row: {}\nColumn: {}\nValue: {}\nAddress: {}'.format(sampleCell.row, sampleCell.col, sampleCell.value, sampleCell.address))
All these accessors will come into play once we are dealing with higher-order functions of the library.
Let’s begin adding new rows and columns to our existing sample worksheet to have some data to work on for the later sections of this guide. The insertion as an operation is supported in two ways by the gspread library.
You can insert a row or column at a specific position using the insert_row()
, insert_rows()
, and insert_cols()
functions of the worksheet object. These functions allow us to add the row(s) or columns at a particular location in a worksheet. The function specifics are as below:
insert_row()
function, you determine the offset using the index
parameter, while in the insert_rows()
function, you specify it with the row
parameter.This insertion is only applicable to rows. It allows us to insert rows after a particular table range, where the position is unknown. Again, the insertion can be done in a single or multi-row manner.
Here is the sample code that:
sampleWorksheet.insert_row(
['A', 'B', 'C', 'D']
)
sampleWorksheet.insert_rows(
[
['KG', 54, 23, 12],
['OG', 34, 12, 34],
['ME', 23, 45, 90],
['YE', 65, 12, 54]
], row=2
)
Now, let’s do the following steps on top of this:
sampleWorksheet.append_rows(
[
['SN', 67, 87, 45],
['AR', 56, 23, 65]
],
table_range="A1:D5"
)
sampleWorksheet.insert_cols(
[
['E', 56, 34, 65, 34, 76, 45]
],
col=5
)
Note: The sampleWorksheet variable holds the worksheet object and it will be referred to throughout the guide.
In the last sections, we have prepared our data programmatically using various insertion operations. Now, we can fetch the data inserted using various read functions. We will see how to fetch cells and then move to fetch values from a range of cells and the entire worksheet.
The most basic read operation on a worksheet involves getting the value or any other cell property as described in the previous sections. To fetch a single cell, there are two functions:
Both these functions return a cell object and we have already seen how to get the value from these objects. The acell function was used in the section where we established a connection with the Google Sheets API.
print(sampleWorksheet.acell('A1').row)
print(sampleWorksheet.cell(1, 1).value)
range()
function of the worksheet object. This function accepts various input forms, such as A1 notation, numeric boundaries, or named ranges. If used without input, it returns all cells in the worksheet in a single API call. For use cases involving cell properties, this function helps filter out the desired cells and perform further actions.print(sampleWorksheet.get_all_cells())
print(sampleWorksheet.range('B4:E5'))
Users usually create multiple miniature tables in the same worksheet for better accessibility. In such cases, we need to refine our fetching range to the exact addresses of these table ranges. To fetch such table ranges, we can use the following two functions of the worksheet object:
print('Get Range: {}'.format(sampleWorksheet.get("A1:D4")))
print('Batch Get Range: {}'.format(sampleWorksheet.batch_get([
"A1:D4",
"B4:E3"
]))
We can fetch all the values of a row or column using the “row_values()” and “col_values()” functions of the worksheet object. Both functions take the position (numbering from 1) of a row or column and return the values in a list.
print(sampleWorksheet.row_values(1))
print(sampleWorksheet.col_values(4))
One of the best use cases of fetching an entire worksheet would be to load this data directly into a pandas data frame and then do the post-processing or analysis as per the requirement. The entire data can be returned using following functions of the worksheet object:
You can directly pass the outputs of both functions to the Pandas DataFrame function to obtain the worksheet table as a Pandas DataFrame.
import pandas as pd
print(pd.DataFrame(sampleWorksheet.get_all_records()))
print(pd.DataFrame(sampleWorksheet.get_all_values()))
Updating the existing data of the spreadsheet is the most crucial function that can be performed easily using gspread library functions. There are multiple ways to update the cells of a spreadsheet, single cell updation to multiple cells of a range and then to multiple ranges with a single API call.
A single cell of a worksheet can be updated using the following functions of the worksheet object.
print(sampleWorksheet.update_acell('A2', 'Kaustubh'))
print(sampleWorksheet.update_acell('A3', 'Oggy'))
print(sampleWorksheet.update([['Hello']], 'A4'))
You can update a range of cells in a worksheet using the following two functions from the worksheet object.
rangeOfCells = sampleWorksheet.range('B2:B7')
for cell in rangeOfCells:
newValue = int(cell.value) + 10
cell.value = newValue
print(sampleWorksheet.update_cells(rangeOfCells))
The above code fetches a range of cells, adds 10 to their value, and updates them in a single API call.
In the above section, we were able to update multiple cells in a range with a single API call. This behavior can be extended to multiple ranges as well. It means we can update multiple groups of cells with one call. The “batch_update()” function takes a list of dictionaries with keys as range and values. The range key value should be the A1 notation range or a named range and the values key value as the list of list of values.
range1 = 'C2:C7'
range2 = 'E2:E7'
bothRangeValues = sampleWorksheet.batch_get([
range1,
range2
])
range1Values, range2Values = bothRangeValues
range1UpdatedValues = [[int(x[0]) + 10] for x in range1Values]
range2UpdatedValues = [[int(x[0]) + 20] for x in range2Values]
print(sampleWorksheet.batch_update([
{
'range': range1,
'values': range1UpdatedValues
},
{
'range': range2,
'values': range2UpdatedValues
}
]))
The above code fetches two ranges using the “batch_get()” function, then updates their values locally, and then uses the “batch_update()” function to push back the updated values to the Google Sheets. The output of this update looks like this:
Till this point, we have inserted, read, and updated the data in the worksheet. We can perform delete operations to remove redundant or unnecessary data from the worksheet. The “delete_rows()” and “delete_colums()” function takes the “start_index” to be deleted. If “end_index” is specified, then it deletes all the columns in the index range of start and end.
print(sampleWorksheet.delete_columns(4))
print(sampleWorksheet.delete_rows(6))
The Google Sheets API enables you to search for cells by matching a string or a regular expression. You can perform case-sensitive or case-insensitive searches and narrow the search to specific rows or columns if desired. Use these two worksheet functions to find matching cells:
import re
print(sampleWorksheet.find('64', in_column=2))
searchRe = re.compile(r'(a|A)')
print(sampleWorksheet.findall(searchRe))
In Excel, you can format worksheets in various ways, including text highlights, formatting, borders, alignment, and numerous functions. The Google Sheets also offers a variety of formatting options for cells. The whole list of fields is available in Google Sheets Cells documentation.
You can use the format()
function of the gspread
worksheet object to specify the cell or range where you want to apply formatting. Provide the format as a JSON dictionary, which includes all the formatting key fields and their values.
The below code will apply borders to all the cells of the table.
borderFormatting = {
"style": "SOLID",
"colorStyle": {"rgbColor": {"red": 0, "green": 0, "blue": 0, "alpha": 1}},
}
print(
sampleWorksheet.format(
"A1:D6",
format={
"borders": {
"top": borderFormatting,
"bottom": borderFormatting,
"left": borderFormatting,
"right": borderFormatting,
},
},
)
)
We can also apply batch formatting to format multiple ranges at the same time. This saves a lot of time writing different format calls for every new change. The “batch_format()” function takes the list of dictionaries containing two important keys. The first key is the range key which defines the cell’s scope and the format key which contains the formatting dictionary.
Let’s do the following on our sample table using the “batch_format()” function:
borderFormatting = {
"style": "SOLID",
"colorStyle": {"rgbColor": {"red": 0, "green": 0, "blue": 0, "alpha": 1}},
}
formats = [
{
"range": "A1:D6",
"format": {
"borders": {
"top": borderFormatting,
"bottom": borderFormatting,
"left": borderFormatting,
"right": borderFormatting,
},
"horizontalAlignment": "CENTER",
},
},
{
"range": "A1:D1",
"format": {
"textFormat": {
"bold": True,
},
"backgroundColorStyle": {
"rgbColor": {"red": 0.8, "green": 0.8, "blue": 1, "alpha": 0.8}
},
},
},
]
print(sampleWorksheet.batch_format(formats))
And this is the final state of our table.
It might be possible that we want to clear the range before carrying out the operations. For clearing the cell ranges, the “batch_clear()” function of the worksheet object can be used. This takes the list of ranges that need to be cleared. Let’s clear column C from our sample table.
print(sampleWorksheet.batch_clear(["C1:C6"]))
Note: Clear function only clears the values and not the formatting applied.
The entire worksheet can be cleared using the “clear()” function of the worksheet object.
print(sampleWorksheet.clear())
We have done a lot of operations on Google Sheets using the gspread library. This library is just a wrapper that prepares the user-passed data into the format that’s acceptable and makes the API calls to Google projects associated with the sheets.It works so that the developer does not need to understand the underlying API calls, payloads, and responses. The developer interacts only with the abstracted functions.
While this is good for developers who are just playing around, for production and critical tasks, a developer needs to understand how the API calls are consuming the quota. While the usage of Google Sheets API is free, there are some restrictions to how many API calls can be made.
Due to such limitations, a lot of users encounter the famous 429 error that reads as “Too many requests”. For example, the current quota is 300 requests per minute per project. For some reason, if your script is sending more than 300 requests, then the additional requests will not be processed. Exponential backoff is one such method that implements a retry mechanism based on generating random wait times. Such mechanisms can be deployed to tackle these limitations.
In this guide we created a Google Service account to perform all the operations one would perform on the Google Sheets UI. We explored a lot of functions such as adding, updating, and deleting data. We also explored how to format sheets and Google Sheets Automation using Python.
The Google Sheets API offers a lot more functionalities such as merging cells, making protected ranges, hiding cells, adding notes, copy ranges, or even adding filters, all operations programmatically! While the documentation for the gspread library lacks these explanations, one can go ahead to explore the Google Sheets official documentation and also check the API reference section of the gspread documentation that gives high-level information about all the functions implemented in the library.
If you want to read/explore every article of mine, then head over to my master article list.
I hope you liked my article. For any doubts, queries, or potential opportunities, you can reach out to me via LinkedIn — in/kaustubh-gupta
A. A retry mechanism should be implemented that tries to make the requests again in some time. One such example is the exponential backoff algorithm
A. batch_get, batch_update, batch_format, and batch_clear are some of the commonly used batch functions.
A. You can use the format()
and batch_format()
functions to pass the range of cells and the formatting to be applied in a dictionary. The dictionary contains various fields that you can format.
A. You can set up Google Sheets automations using built-in tools like Macros and Google Apps Script. Macros allow you to record actions and replay them, while Google Apps Script lets you create custom automations using JavaScript. These automations can streamline repetitive tasks such as data entry, formatting, or running specific functions automatically in Google Sheets.
A. Popular Google Sheets automations for data analysis include automated data import from external sources, scheduled reports using Google Apps Script, and conditional formatting to highlight trends. These automations help optimize the data analysis process, making Google Sheets a powerful tool for managing and interpreting large datasets efficiently.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.