The Complex yet Powerful World of DateTime in Data Science
I still remember coming across my first DateTime variable when I was learning Python. It was an e-commerce project where I had to figure out the supply chain pipeline – the time it takes for an order to be shipped, the number of days it takes for an order to be delivered, etc. It was quite a fascinating problem from a data science perspective.
The issue – I wasn’t familiar with how to extract and play around with the date and time components in Python.
There is an added complexity to the DateTime features, an extra layer that isn’t present in numerical variables. Being able to master these DateTime features will help you go a long way towards becoming a better (and more efficient) data scientist. It’s definitely helped me a lot!
And the date and time features are ubiquitous in data science projects. Think about it – they are a rich source of valuable information, and hence, can give some deep insights about any dataset at hand. Plus the amount of flexibility they offer when we’re performing feature engineering – priceless!
In this article, we will first have a look at how to handle date and time features with Python’s DateTime module and then we will explore Pandas functions for the same!
Note: I assume you’re familiar with Python and the Pandas library.
Table of Contents
The Importance of the Date-Time Component
Working with Dates in Python
Working with Time in Python
DateTime in Python
Updating old Dates
DateTime Formats
Advanced DateTime formatting with Strptime & Strftime
Timedelta
DateTime with Pandas
DateTime and Timedelta objects in Pandas
Date range in Pandas
Making DateTime features in Pandas
The Importance of the Date-Time Component
It’s worth reiterating: dates and times are a treasure trove of information, and that is why data scientists love them so much.
Before we dive into the crux of the article, I want you to experience this yourself. Take a look at the date and time right now. Try and imagine all kinds of information that you can extract from it to understand your reading habit. The year, month, day, hour, and minute are the usual suspects.
But if you dig a little further, you can determine whether you prefer reading on weekdays or weekends, whether you are a morning person or a night owl (we are in the same boat here!), or whether you accumulate all the interesting articles to read at the end of the month!
Clearly, the list will go on and you will gradually learn a lot about your reading habits if you repeat this exercise after collecting the data over a period of time, say a month. Now imagine how useful this feature would be in a real-world scenario where information is collected over a long period of time.
Date and time features find importance in data science problems spanning industries from sales, marketing, and finance to HR, e-commerce, retail, and many more. Predicting how the stock markets will behave tomorrow, how many products will be sold in the upcoming week, when is the best time to launch a new product, how long before a position at the company gets filled, etc. are some of the problems that we can find answers to using date and time data.
This incredible amount of insight that you can unravel from the data is what makes date and time components so fun to work with! So let’s get down to the business of mastering date-time manipulation in Python.
Working with Dates in Python
The date class in the DateTime module of Python deals with dates in the Gregorian calendar. It accepts three integer arguments: year, month, and day. Let’s have a look at how it’s done:
Python Code:
from datetime import date
# Create a date from year, month, and day
d1 = date(2020, 4, 23)
print(d1)
print(type(d1))
You can see how easy it was to create a date object with the date class. And it’s even easier to extract features like day, month, and year from the date, using the day, month, and year attributes. We will see how to do that on a date object for the current local date, which we will create using the today() method:
Python Code:
from datetime import date
# Today's local date
d1 = date.today()
print(d1)
# day
print('Day :', d1.day)
# month
print('Month :', d1.month)
# year
print('Year :', d1.year)
Working with Time in Python
time is another class of the DateTime module. It accepts integer arguments for the time, down to microseconds, and returns a time object.
You can extract features like hour, minute, second, and microsecond from the time object using the respective attributes. Here is an example:
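As a minimal sketch of both steps, creating a time object and then reading its attributes might look like this. The time values are arbitrary and chosen only for illustration:

```python
from datetime import time

# Illustrative values: hour, minute, second, microsecond
t1 = time(13, 20, 13, 40)
print(t1)        # 13:20:13.000040
print(type(t1))  # <class 'datetime.time'>

# Each component is available as an attribute
print('Hour :', t1.hour)
print('Minute :', t1.minute)
print('Second :', t1.second)
print('Microsecond :', t1.microsecond)
```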
This is just the tip of the iceberg. There is so much more we can do with DateTime features in Python and that’s what we’ll look at in the next section.
DateTime in Python
So far, we have seen how to create a date and a time object using the DateTime module. But the beauty of the DateTime module is that it lets you dovetail both the properties into a single object, DateTime!
datetime is a class in Python’s DateTime module, just like date and time. Its arguments are a combination of the date and time attributes, starting from the year and ending in microseconds.
So, let’s see how you can create a DateTime object:
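A minimal sketch, with illustrative values for each argument:

```python
from datetime import datetime

# Arguments: year, month, day, hour, minute, second, microsecond
d1 = datetime(2020, 4, 23, 11, 20, 30, 40)
print(d1)        # 2020-04-23 11:20:30.000040
print(type(d1))  # <class 'datetime.datetime'>
```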
Or you could even create an object on the local date and time using the now() method:
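For example, something along these lines should work. The printed values depend on when you run it, so no output is shown:

```python
from datetime import datetime

# Current local date and time
d1 = datetime.now()
print(d1)

# The same attributes as on date and time objects are available
print('Year :', d1.year)
print('Hour :', d1.hour)
```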
You can go on and extract whichever value you want to from the DateTime object using the same attributes we used with the date and time objects individually.
Next, let’s look at some of the methods in the DateTime class.
Updating old Dates
First, we’ll see how to separate date and time from the DateTime object using the date() and time() methods. But you could also replace a value in the DateTime objects without having to change the entire date using the replace() method:
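Here is a short sketch of all three methods on one illustrative DateTime object. Note that replace() returns a new object and leaves the original untouched:

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30, 40)  # illustrative value

# Split the datetime into its date and time parts
print('Date :', d1.date())  # 2020-04-23
print('Time :', d1.time())  # 11:20:30.000040

# Change only the day; d1 itself is not modified
d2 = d1.replace(day=24)
print('New date :', d2)     # 2020-04-24 11:20:30.000040
```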
One really cool thing that you can do with a DateTime object is to extract the day of the week! This is especially helpful in feature engineering because the value of the target variable can depend on the day of the week. For example, sales of a product are generally higher on a weekend, while traffic on StackOverflow could be higher on a weekday when people are working.
The weekday() method returns an integer value for the day of the week, where Monday is 0 and Sunday is 6. But if you wanted it to return the weekday value between 1 and 7, like in a real-world scenario, you should use isoweekday():
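A quick sketch comparing the two methods on an illustrative date that falls on a Thursday:

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30)  # a Thursday

weekday = d1.weekday()         # Monday is 0, Sunday is 6
iso_weekday = d1.isoweekday()  # Monday is 1, Sunday is 7
print('Weekday :', weekday)          # 3
print('ISO weekday :', iso_weekday)  # 4
```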
Alright, you know the day of the week, but do you know what week of the year it is? This is another very important feature that you can generate from the given date in a dataset.
Sometimes the value of the target variable might be higher during certain times of the year. For example, the sales of products on e-commerce websites are generally higher during vacations.
You can get the week of the year by indexing into the value returned by the isocalendar() method, which holds the ISO year, the ISO week number, and the ISO weekday:
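As a small sketch on an illustrative date, the week number is the second element of the result:

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30)

# isocalendar() gives (ISO year, ISO week number, ISO weekday)
iso = d1.isocalendar()
week = iso[1]
print('Week :', week)  # 17
```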
Want to check whether it is a leap year or not? You will need to use the isleap() function from the calendar module and pass the year as an argument:
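A minimal sketch, using 2020 as the illustrative year:

```python
import calendar
from datetime import date

d1 = date(2020, 4, 23)

# isleap() takes the year as a plain integer
is_leap = calendar.isleap(d1.year)
print(is_leap)                # True
print(calendar.isleap(2021))  # False
```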
Congratulations – you are living in a leap year! What did you do with the extra day? Oh, you missed it? Don’t worry! Just take a day this month and do the stuff that you love! But where are you going? You got your calendar right here!
Not free this month? You can have a look at the entire calendar for the year:
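The calendar module can print both views as plain text. Here is a sketch for April 2020 and for the whole of 2020, both chosen only as examples:

```python
import calendar

# Text calendar for a single month
month_view = calendar.month(2020, 4)
print(month_view)

# Text calendar for the entire year
year_view = calendar.calendar(2020)
print(year_view)
```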
Pretty cool, right? Plan your year wisely and take out some time to do the things you love!
DateTime Formats
The Datetime module lets you interchange the format of DateTime between a few options.
First up is the ISO format. If you wanted to create a DateTime object from the string form of the date in ISO format, use the fromisoformat() method. And if you intended to do the reverse, use the isoformat() method:
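A short round trip between the two, assuming Python 3.7 or later (where fromisoformat() is available) and an illustrative timestamp:

```python
from datetime import datetime

# String in ISO format -> datetime object
d1 = datetime.fromisoformat('2020-04-23T11:20:30')
print(d1)        # 2020-04-23 11:20:30

# datetime object -> string in ISO format
iso_string = d1.isoformat()
print(iso_string)  # 2020-04-23T11:20:30
```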
If you wanted to convert DateTime into a string format, you could use the ctime() method. This returns the date in a string format. And if you wanted to extract just the date from that, well, you would have to use slicing:
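As a sketch on an illustrative value, the first ten characters of the ctime() string hold the weekday, month, and day:

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30)

as_string = d1.ctime()
print(as_string)       # Thu Apr 23 11:20:30 2020

# Slice out the weekday, month, and day
print(as_string[:10])  # Thu Apr 23
```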
And if none of these functions strike your fancy, you could use the string format() method, which lets you define your own format:
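One way this might look, with a format string picked purely for illustration:

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30)

# The codes after the colon describe the desired layout
formatted = '{:%Y-%m-%d %H:%M}'.format(d1)
print(formatted)  # 2020-04-23 11:20
```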
Wait – what are these arguments I passed to the function? These are called format codes and we will look at them in detail in the next section.
Advanced DateTime Formatting with Strptime & Strftime
These functions are very important as they let you define the format of the DateTime object explicitly. This can give you a lot of flexibility with handling DateTime features.
strptime() creates a DateTime object from a string representing date and time. It takes two arguments: the date and the format in which your date is present. Have a look below:
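A minimal sketch with an illustrative date string. The format must mirror the string exactly, including the comma and the spaces:

```python
from datetime import datetime

date_string = '22 April, 2020 13:20:13'

# %d day, %B full month name, %Y year, %H:%M:%S time
d1 = datetime.strptime(date_string, '%d %B, %Y %H:%M:%S')
print(d1)        # 2020-04-22 13:20:13
print(type(d1))  # <class 'datetime.datetime'>
```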
You define the format using the formatting codes as I did above. There are a number of formatting codes and you can have a look at them in the documentation.
The strftime() method, on the other hand, can be used to convert the DateTime object into a string representing date and time:
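For instance, with an illustrative value and a day/month/year layout:

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30)

# datetime object -> string in a custom layout
as_string = d1.strftime('%d/%m/%Y %H:%M')
print(as_string)        # 23/04/2020 11:20
print(type(as_string))  # <class 'str'>
```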
But you can also extract some important information from the DateTime object like weekday name, month name, week number, etc. which can turn out to be very useful in terms of features as we saw in previous sections.
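A short sketch of three such codes on an illustrative date. Keep in mind that %W counts weeks starting on Monday from the first Monday of the year, so it can differ from the ISO week number returned by isocalendar():

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 20, 30)

weekday_name = d1.strftime('%A')  # full weekday name
month_name = d1.strftime('%B')    # full month name
week_number = d1.strftime('%W')   # week of the year, Monday first

print('Weekday :', weekday_name)  # Thursday
print('Month :', month_name)      # April
print('Week :', week_number)      # 16
```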
Timedelta
So far, we have seen how to create a DateTime object and how to format it. But sometimes, you might have to find the duration between two dates, which can be another very useful feature that you can derive from a dataset. This duration is, however, returned as a timedelta object.
The duration is stored as a number of days plus the number of seconds left over within the final day. You can retrieve these values for your features through the days and seconds attributes:
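Here is a sketch with two illustrative timestamps one year and a little over an hour apart:

```python
from datetime import datetime

d1 = datetime(2020, 4, 23, 11, 13, 10)
d2 = datetime(2021, 4, 23, 12, 20, 30)

# Subtracting two datetimes gives a timedelta
duration = d2 - d1
print(type(duration))  # <class 'datetime.timedelta'>
print(duration)        # 365 days, 1:07:20

print('Days :', duration.days)        # 365
print('Seconds :', duration.seconds)  # 4040
```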
But what if you actually wanted the duration in hours or minutes? Well, there is a simple solution for that.
timedelta is also a class in the DateTime module. So, you could use it to convert your duration into hours and minutes as I’ve done below:
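One way to sketch this is to divide the duration by a timedelta of one hour or one minute, which gives a plain float. The timestamps are illustrative:

```python
from datetime import datetime, timedelta

d1 = datetime(2020, 4, 23, 11, 13, 10)
d2 = datetime(2021, 4, 23, 12, 20, 30)
duration = d2 - d1

# timedelta / timedelta returns a float
hours = duration / timedelta(hours=1)
minutes = duration / timedelta(minutes=1)
print('Duration in hours :', hours)
print('Duration in minutes :', minutes)

# total_seconds() gives the whole duration in seconds
print('Duration in seconds :', duration.total_seconds())
```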
Now, what if you wanted to get the date 5 days from today? Do you simply add 5 to the present date?
Not quite. So how do you go about it then? You use timedelta of course!
timedelta makes it possible to add a duration to, or subtract one from, a DateTime object.
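The sketch below shows both the failing attempt and the working one. It uses a fixed, illustrative date instead of today’s date so that the output is reproducible:

```python
from datetime import datetime, timedelta

d1 = datetime(2020, 4, 23, 11, 20, 30)

# Adding a bare integer is not supported
try:
    d1 + 5
except TypeError as err:
    print('TypeError :', err)

# Wrap the number of days in a timedelta instead
later = d1 + timedelta(days=5)
earlier = d1 - timedelta(days=5)
print(later)    # 2020-04-28 11:20:30
print(earlier)  # 2020-04-18 11:20:30
```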
DateTime with Pandas
We already know that Pandas is a great library for data analysis tasks, so it should come as no surprise that it also supports Python DateTime objects. It has some great functions for handling dates and times, such as to_datetime() and to_timedelta().
DateTime and Timedelta objects in Pandas
The to_datetime() method converts the date and time in string format to a DateTime object:
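A minimal sketch, assuming pandas is installed and using an illustrative date string:

```python
import pandas as pd

# Parse a date string into a pandas Timestamp
ts = pd.to_datetime('2020-04-23 11:20:30')
print(ts)        # 2020-04-23 11:20:30
print(type(ts))  # a pandas Timestamp
```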
You might have noticed something strange here. The type of the object returned by to_datetime() is not DateTime but Timestamp. Well, don’t worry, it is just the Pandas equivalent of Python’s DateTime.
We already know that timedelta gives differences in times. The Pandas to_timedelta() method does just this:
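As a sketch, assuming pandas is installed, here is a 49-day Timedelta added to an illustrative Timestamp:

```python
import pandas as pd

# 49 expressed in days
td = pd.to_timedelta(49, unit='D')
print(td)

# A Timedelta can be added to a Timestamp
ts = pd.Timestamp('2020-04-23')
shifted = ts + td
print(shifted)  # 2020-06-11 00:00:00
```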
Here, the unit argument determines the unit of the value, whether that’s weeks, days, hours, minutes, or seconds. Months and years are not accepted as units because their length is not fixed.
Date Range in Pandas
To make the creation of date sequences a convenient task, Pandas provides the date_range() method. It accepts a start date, an end date, and an optional frequency code:
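A small sketch with illustrative start and end dates and a daily frequency, assuming pandas is installed:

```python
import pandas as pd

# One entry per day, both endpoints included
dates = pd.date_range(start='2020-04-24', end='2020-05-24', freq='D')
print(dates)
print(len(dates))  # 31
```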
Instead of defining the end date, you could define the number of periods you want to generate:
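For example, ten daily periods from an illustrative start date (again assuming pandas is installed):

```python
import pandas as pd

# Ten consecutive days starting on 24 April 2020
dates = pd.date_range(start='2020-04-24', periods=10, freq='D')
print(dates)
print(dates[-1])  # 2020-05-03 00:00:00
```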
Making DateTime features in Pandas
Let’s also create a series of end dates and make a dummy dataset from which we can derive some new features and bring our learning about DateTime to fruition.
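A sketch of such a dummy dataset follows. The column names, the dates, and the target values are all made up for illustration:

```python
import pandas as pd

# Illustrative start and end dates, one row per day
start_dates = pd.date_range(start='2020-04-24', periods=5, freq='D')
end_dates = pd.date_range(start='2020-05-24 06:00', periods=5, freq='D')

df = pd.DataFrame({
    'start_date': start_dates,
    'end_date': end_dates,
    'target': [12, 7, 30, 18, 25],  # made-up target values
})
print(df)
```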
Perfect! So we have a dataset containing start date, end date, and a target variable.
We can create multiple new features from the date column, like the day, month, year, hour, minute, etc. using the dt attribute as shown below:
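Here is a sketch of a few such features, including a duration column. It rebuilds a small illustrative DataFrame so that it runs on its own, and the column names are assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    'start_date': pd.date_range(start='2020-04-24', periods=5, freq='D'),
    'end_date': pd.date_range(start='2020-05-24 06:00', periods=5, freq='D'),
})

# Components of the start date via the dt accessor
df['start_day'] = df['start_date'].dt.day
df['start_month'] = df['start_date'].dt.month
df['start_year'] = df['start_date'].dt.year
df['start_weekday'] = df['start_date'].dt.dayofweek  # Monday is 0
df['end_hour'] = df['end_date'].dt.hour

# Difference between two datetime columns is a timedelta column
df['duration'] = df['end_date'] - df['start_date']
print(df)
```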
Our duration feature is great, but what if we would like to have the duration in minutes or seconds? Remember how in the timedelta section we converted the duration into hours and minutes? We could do the same here!
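A possible sketch, again on a small illustrative DataFrame with assumed column names: dividing by a one-minute Timedelta gives minutes, and the dt accessor gives total seconds.

```python
import pandas as pd

df = pd.DataFrame({
    'start_date': pd.date_range(start='2020-04-24', periods=5, freq='D'),
    'end_date': pd.date_range(start='2020-05-24 06:00', periods=5, freq='D'),
})
df['duration'] = df['end_date'] - df['start_date']

# Timedelta column divided by a Timedelta gives floats
df['duration_minutes'] = df['duration'] / pd.Timedelta(minutes=1)
df['duration_seconds'] = df['duration'].dt.total_seconds()
print(df[['duration', 'duration_minutes', 'duration_seconds']])
```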
Great! Can you see how many new features we created from just the dates?
Now, let’s make the start date the index of the DataFrame. This will help us easily analyze our dataset because we can use slicing to find data representing our desired dates:
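As a sketch on an illustrative DataFrame, setting the index and then slicing by date strings with loc might look like this:

```python
import pandas as pd

df = pd.DataFrame({
    'start_date': pd.date_range(start='2020-04-24', periods=10, freq='D'),
    'target': list(range(10)),  # made-up target values
})

# Use the start date as the index
df = df.set_index('start_date')

# Slice a range of dates; both endpoints are included
subset = df.loc['2020-04-25':'2020-04-27']
print(subset)

# Select every row from a given month
april = df.loc['2020-04']
print(april)
```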
Awesome! This is super useful when you want to do visualizations or any data analysis.
End Notes
I hope you found this article on how to manipulate date and time features with Python and Pandas useful. But nothing is complete without practice. Working with time series datasets is a wonderful way to practice what we have learned in this article.
I am on a journey to becoming a data scientist. I love to unravel trends in data, visualize it and predict the future with ML algorithms! But the most satisfying part of this journey is sharing my learnings, from the challenges that I face, with the community to make the world a better place!
Really nicely paced tutorial for an old Fortran/PL-1 hacker from the 20th
century like me. The power of contemporary programming systems is
really mind-blowing for old machine coders like myself!
It'd be great if you could talk about timezones.
i have learned more. Thanks