Note: I have written a Python version of this article, which you can access here. Both articles begin the same way: they walk you through how to sign up for the EIA API and obtain an API key.
The Energy Information Administration (EIA) is responsible for the US Department of Energy’s statistics and data. It offers a wealth of data and information on all things energy in the United States, including renewable energy, nuclear energy, crude oil, gasoline, jet fuel, diesel, electricity, and natural gas.
However, navigating and finding data on the EIA’s website can be tricky. To help with this, the EIA created an API for easier access. You can find the API at www.eia.gov/opendata. The EIA releases new data weekly, monthly, or annually, depending on the data type. The API is especially useful for refreshing weekly and monthly data as soon as new releases come out.
To begin, you need to obtain an API key from the EIA. Go to www.eia.gov/opendata and click on the “Register Now” button.
This will bring you to a registration form that looks like this. Enter your information and click “Register”. You will be sent an email containing your new API key. It might land in your junk folder, so be sure to check there if you haven’t received the key within a few minutes.
Now you can search for the data sets you want to pull via the API. You can browse the data sets by clicking on “API Query Browse” on the API’s homepage.
On the next page, you can click on the links under “Children Categories” to search for the data you need.
Another way to search for data is to click “Sources & Uses” at the top of the page and browse the website. When you come across a data set you want, look for its API link, which the EIA typically publishes alongside the data.
Every series of data comes with a particular Series ID. You’ll use this Series ID along with your API key to pull the data set from the database. For example, the picture below shows the Series ID for crude oil consumption data in PADD 3 (US Gulf Coast). The code allows you to pull multiple series at once, so keep track of all the Series IDs you want to use.
Fun fact: PADD stands for Petroleum Administration for Defense District. These districts were created in 1942 during World War II to organize the distribution of fuel like gasoline and diesel. Today PADDs are still used to organize data by region.
Now that we have our API key and the Series IDs, we can write the R code to access the data. First, install and load the two packages we need: httr, for making HTTP requests, and jsonlite, for parsing JSON.
# Install the packages (only needed the first time), then load them
install.packages(c("httr", "jsonlite"))
library(httr)
library(jsonlite)
Now, paste your API key into the code. Then paste in the series IDs you want to pull. Separate your series IDs by commas. You can also choose the date range you want to pull using the “startdate” and “enddate” variables.
# API Key from EIA
key <- 'PASTE YOUR API KEY HERE'

# Paste your Series IDs in the list, separated by commas
padd_key <- list('PET.MCRRIP12.M', 'PET.MCRRIP22.M', 'PET.MCRRIP32.M',
                 'PET.MCRRIP42.M', 'PET.MCRRIP52.M')

# Choose the start and end dates
startdate <- "2010-01-01" # YYYY-MM-DD
enddate <- "2020-01-01"   # YYYY-MM-DD
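If you want the same measure for all five PADDs, you don't have to type each ID by hand: the five IDs in padd_key differ only in the digit after "MCRRIP". The sketch below is a minimal example, assuming you want exactly these five series, that generates them with sprintf(). The name padd_ids is hypothetical, and treating that digit as the PADD number is an assumption based on the pattern of the list.

```r
# Hypothetical alternative to typing padd_key by hand: the five IDs differ
# only in the digit after "MCRRIP" (assumed to be the PADD number, 1 to 5)
padd_ids <- sprintf("PET.MCRRIP%d2.M", 1:5)

# Same shape as the hand-typed version (a list of strings)
padd_key <- as.list(padd_ids)

print(padd_ids)
```

Because the loop iterates with for (i in padd_key), this generated list works the same way as the hand-typed one; change the pattern or the 1:5 range if your series follow a different scheme.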
Finally, make calls to the API to pull the data in JSON format. The URL may change depending on which data set you are pulling. To check the exact URL needed, look at the “API CALL TO USE” link in the API Query Browser.
The “API CALL TO USE” link is the URL you need to call the API, and it may differ from the one in the code below. If necessary, replace the link in the ‘url’ variable with the appropriate “API CALL TO USE” link.
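If you swap URLs often, it can help to build them in one place. Here is a minimal sketch of a hypothetical helper, build_eia_url(), assuming the series endpoint used in this article (http://api.eia.gov/series/). If your “API CALL TO USE” link has a different shape, adjust base_url and the parameter names to match it. Note that the query string after /series/ must begin with ? and that the parameters are joined with &.

```r
# Hypothetical helper: build an EIA series request URL.
# Assumes the "series" endpoint used in this article; if the
# "API CALL TO USE" link for your data set looks different, change base_url.
build_eia_url <- function(key, series_id,
                          base_url = "http://api.eia.gov/series/") {
  # The query string starts with "?" and parameters are joined with "&";
  # URLencode() escapes any characters that are not URL-safe
  paste0(base_url,
         "?api_key=", URLencode(key, reserved = TRUE),
         "&series_id=", URLencode(series_id, reserved = TRUE))
}

build_eia_url("YOUR_KEY", "PET.MCRRIP32.M")
```

Inside the loop, the URL line would then become url <- build_eia_url(key, i).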
The code loops through every Series ID you’ve chosen, extracts the data in JSON format, converts it into an R data frame, merges the series together by date, and then keeps only the dates between startdate and enddate. Now you have an R data frame that is easy to manipulate, analyze, and visualize however you want!
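The conversion step is the fiddliest part of the loop. After fromJSON(), each series' data typically arrives as a two-column matrix of period strings such as "201912" and values; because the JSON arrays mix text and numbers, jsonlite usually simplifies it to a character matrix, so the values need converting to numbers. The base-R sketch below shows that conversion on its own. series_to_df() is a hypothetical helper, and the sample values are made-up placeholders, not EIA data.

```r
# Hypothetical helper: convert one series' raw data (a two-column matrix of
# "YYYYMM" periods and values) into a data frame with a Date and a numeric column.
series_to_df <- function(raw, series_name) {
  period <- raw[, 1]
  df <- data.frame(
    # Monthly periods look like "201912"; add day "01" to build a full date
    Date = as.Date(paste0(substr(period, 1, 4), "-", substr(period, 5, 6), "-01")),
    # Values arrive as text, so convert them to numbers
    Value = as.numeric(raw[, 2])
  )
  names(df)[2] <- series_name
  # Sort oldest first, in case the rows arrive newest first
  df[order(df$Date), ]
}

# Made-up sample values, shaped like the parsed API response
raw <- matrix(c("201912", "201911", "100.5", "98.25"), ncol = 2)
out <- series_to_df(raw, "Example series")
out
```

This assumes monthly (YYYYMM) periods, like the .M series used here; weekly or annual series use different period formats and would need a different date parser.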
Check out my GitHub to download the code.
# Start with no data; each series is merged in as it is pulled
data_final <- NULL
for (i in padd_key) {
  # Build the request URL (the query string must start with "?")
  url <- paste('http://api.eia.gov/series/?api_key=', key, '&series_id=', i, sep = "")
  # Make the call to the EIA's API and stop with an error if it failed
  res <- GET(url)
  stop_for_status(res)
  json_data <- fromJSON(rawToChar(res$content))
  data <- data.frame(json_data$series$data)
  data$Year <- substr(data$X1, 1, 4)
  data$Month <- substr(data$X1, 5, 6)
  data$Day <- "01"
  # Create date format (YYYY-MM-DD)
  data$Date <- as.Date(paste(data$Year, data$Month, data$Day, sep = '-'))
  # The values arrive as text, so convert them to numbers
  data$X2 <- as.numeric(data$X2)
  # Rename the column to its given name from the EIA
  colnames(data)[2] <- json_data$series$name
  # Drop the unnecessary date columns (X1, Year, Month, Day)
  data <- data[-c(1, 3, 4, 5)]
  if (is.null(data_final)) {
    data_final <- data
  } else {
    data_final <- merge(data_final, data, by = "Date")
  }
}
# Keep only the rows between the start and end dates
data_final <- subset(data_final, Date >= as.Date(startdate) & Date <= as.Date(enddate))
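The loop treats the first series specially and merges every later series into it. A common R alternative, sketched here, is to collect each series' data frame in a list and merge them all at once with Reduce(). The two data frames below are made-up placeholders; in practice, each list element would be the data frame built inside the loop for one Series ID (for example, series_list[[i]] <- data).

```r
# Made-up placeholder series sharing a Date column
a <- data.frame(Date = as.Date(c("2010-01-01", "2010-02-01", "2010-03-01")),
                A = c(1, 2, 3))
b <- data.frame(Date = as.Date(c("2010-02-01", "2010-03-01", "2010-04-01")),
                B = c(20, 30, 40))

series_list <- list(a, b)

# Merge every data frame on Date; by default merge() keeps only
# dates that appear in all series
data_final <- Reduce(function(x, y) merge(x, y, by = "Date"), series_list)

# Keep only the rows between the start and end dates
data_final <- subset(data_final,
                     Date >= as.Date("2010-01-01") & Date <= as.Date("2010-02-28"))
data_final
```

Pass all = TRUE to merge() if you would rather keep dates that are missing from some series; those cells are filled with NA.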
The final data frame will look like the image below.
Shu Lee
Shu is interested in using data to develop insights, solve real-world problems, and drive business decisions. She is passionate about data science, machine learning, economics, and statistics, and continues to learn every day.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.