A Data Structure sounds like a very straightforward topic, yet many data science and analytics newcomers have no idea what it is. When I quiz these folks about the different data structures in Python and how they work, I’m met with a blank stare. Not good!
Python is an easy programming language to learn, but we need to get our basics right before diving into the attractive machine learning coding bits. That’s because behind every data exploration task we perform, and every analytics step we take, there is a basic element of storing and organizing the data.
And this is a no-brainer – extracting information when we store our data efficiently is so much easier. We save ourselves a ton of time thanks to our code running faster – who wouldn’t want that?
And that’s why I implore you to learn about data structures in Python.
In this article, we will explore the basic in-built data structures in Python that will come in handy when dealing with data in the real world. So whether you’re a data scientist or an analyst, this article is equally relevant.
Data structures are a way of storing and organizing data efficiently. This will allow you to access and perform operations on the data easily.
There is no one-size-fits-all kind of model when it comes to data structures. You will want to store data in different ways to cater to the needs of the hour. Maybe you want to store all types of data together, or you want something for faster searching of data, or maybe something that stores only distinct data items.
Luckily, Python has a host of in-built data structures that help us organize our data easily. Therefore, it is important to get acquainted with these first so that, when dealing with data, we know exactly which data structure will serve our purpose effectively.
Lists in Python are the most versatile data structure. They are used to store heterogeneous data items, from integers to strings or even another list! They are also mutable, meaning their elements can be changed even after creating the list.
Lists are created by enclosing elements within [square] brackets, and each item is separated by a comma:
Python Code:
# Creating a list
alist = ['science', 'math', 'english']
print(type(alist))
print(alist)
Since each element in a list has its own distinct position, having duplicate values in a list is not a problem:
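A quick illustration (the subject names, including the repeated 'math', are made up for this example):

```python
# A list may hold the same value more than once;
# each occurrence keeps its own position (index)
subjects = ['science', 'math', 'english', 'math']
print(subjects)                # ['science', 'math', 'english', 'math']
print(subjects.count('math'))  # 2
```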
To access elements of a list, we use Indexing. Each element in a list has an index related to it depending on its position in the list. The first element of the list has the index 0, the next element has index 1, and so on. The last element of the list has an index of one less than the length of the list.
But indexes don’t always have to be positive; they can be negative too. What do you think negative indexes indicate?
While positive indexes return elements from the start of the list, negative indexes return values from the end of the list. This saves us from the trivial calculation we would have to perform otherwise if we wanted to return the nth element from the end of the list. So instead of trying to return List_name[len(List_name)-1] element, we can simply write List_name[-1].
Using negative indexes, we can return the nth element from the end of the list easily. If we want to return the last element of the list, the associated index is -1. Similarly, the index for the second last element will be -2, and so on. Remember, the 0th index will still refer to the very first element in the list.
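A small sketch comparing positive and negative indexes on an illustrative list:

```python
subjects = ['science', 'math', 'english', 'history']

# Positive indexes count from the start, beginning at 0
print(subjects[0])                  # science
print(subjects[len(subjects) - 1])  # history (last element, the long way)

# Negative indexes count from the end, beginning at -1
print(subjects[-1])  # history
print(subjects[-2])  # english
```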
But what if we wanted to return a range of elements between two positions in the list? This is called Slicing. All we have to do is specify the start and end index within which we want to return all the elements – List_name[start : end].
One important thing to remember is that the element at the end index is never included. Only elements from the start index to the index equaling end-1 will be returned.
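A minimal slicing example (the numbers are arbitrary); note that the element at the end index is left out:

```python
numbers = [10, 20, 30, 40, 50, 60]

# Elements at indexes 1, 2 and 3; index 4 (the end index) is excluded
print(numbers[1:4])    # [20, 30, 40]

# Omitting start or end slices from the beginning or up to the end
print(numbers[:3])     # [10, 20, 30]
print(numbers[3:])     # [40, 50, 60]

# Negative indexes work in slices too
print(numbers[-3:-1])  # [40, 50]
```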
We can add new elements to an existing list using the append() or insert() methods:
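A short sketch of both methods on an illustrative list:

```python
subjects = ['science', 'math']

# append() adds a single element at the end of the list
subjects.append('english')
print(subjects)  # ['science', 'math', 'english']

# insert(index, element) places the element before the given index
subjects.insert(1, 'history')
print(subjects)  # ['science', 'history', 'math', 'english']
```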
Removing elements from a list is as easy as adding them and can be done using the remove() or pop() methods:
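For example (the list contents are illustrative), remove() works by value while pop() works by position:

```python
subjects = ['science', 'history', 'math', 'english', 'math']

# remove() deletes the first occurrence of a value (ValueError if absent)
subjects.remove('math')
print(subjects)  # ['science', 'history', 'english', 'math']

# pop() removes and returns the element at an index (the last one by default)
last = subjects.pop()
print(last)      # math
first = subjects.pop(0)
print(first)     # science
print(subjects)  # ['history', 'english']
```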
Sorting is one of the most common operations you will perform on a list, so it is essential to know about the sort() method. It sorts the list elements in place, in either ascending or descending order:
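A minimal example with made-up scores. It also shows the built-in sorted() function, which returns a new list instead of modifying the original:

```python
scores = [42, 7, 19, 88, 3]

# sort() reorders the list in place and returns None
scores.sort()
print(scores)  # [3, 7, 19, 42, 88]

# reverse=True sorts in descending order
scores.sort(reverse=True)
print(scores)  # [88, 42, 19, 7, 3]

# sorted() leaves the original untouched and returns a new list
original = [5, 1, 4]
print(sorted(original), original)  # [1, 4, 5] [5, 1, 4]
```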
But where things get a bit tricky is when you want to sort a list containing string elements. How do you compare two strings? Well, strings are compared using the numeric values of their characters: their Unicode code points, which match the ASCII values for standard English letters, digits, and punctuation. Each character in the string has an integer value associated with it, and these values determine the sort order.
On comparing two strings, we compare the integer values of each character from the beginning. If we encounter the same characters in both strings, we compare the next character until we find two differing characters. It is, of course, done internally, so you don’t have to worry about it!
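To see this in action, here is a small sketch with illustrative words. ord() reveals the integer behind a character, and because uppercase letters have smaller code points than lowercase ones, capitalized words sort first:

```python
words = ['banana', 'Apple', 'cherry', 'apple']

# ord() shows the integer (code point) behind each character
print(ord('A'), ord('a'), ord('b'))  # 65 97 98

# Uppercase letters have smaller code points, so 'Apple' comes first
words.sort()
print(words)  # ['Apple', 'apple', 'banana', 'cherry']

# 'apple' vs 'apricot': the first two characters match,
# so 'p' (112) vs 'r' (114) decides the order
print('apple' < 'apricot')  # True

# key=str.lower gives a case-insensitive order instead
print(sorted(['banana', 'Apple', 'cherry'], key=str.lower))  # ['Apple', 'banana', 'cherry']
```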
We can even concatenate two or more lists using the + symbol. This will return a new list containing elements from both the lists:
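A quick sketch with two illustrative lists; note that the originals are left unchanged:

```python
science = ['physics', 'chemistry']
humanities = ['history', 'english']

# + builds a new list; the original lists are unchanged
combined = science + humanities
print(combined)  # ['physics', 'chemistry', 'history', 'english']
print(science)   # ['physics', 'chemistry']
```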
A very interesting application of Lists is List Comprehension, which provides a neat way of creating new lists. These new lists are created by applying an operation on each element of an existing list. It will be easy to see their impact if we first check out how it can be done using the good old for-loops:
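Say we want to square every number in a list (the squaring is just an illustrative operation). With a for-loop, it looks like this:

```python
numbers = [1, 2, 3, 4, 5]

# Square every element with an explicit loop
squares = []
for n in numbers:
    squares.append(n ** 2)
print(squares)  # [1, 4, 9, 16, 25]
```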
Now, we will see how we can concisely perform this operation using list comprehensions:
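Here is a sketch that squares every number in a list with a single comprehension (squaring is only an illustrative operation). It also shows the optional if clause, which filters elements while the new list is built:

```python
numbers = [1, 2, 3, 4, 5]

# Square every element in a single expression
squares = [n ** 2 for n in numbers]
print(squares)  # [1, 4, 9, 16, 25]

# An optional if clause keeps only the elements that pass the test
even_squares = [n ** 2 for n in numbers if n % 2 == 0]
print(even_squares)  # [4, 16]
```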
See the difference? List comprehensions are a useful asset for any data scientist because you have to write concise and readable code on a daily basis!
A list is an in-built data structure in Python. But we can use it to create user-defined data structures. Two very popular user-defined data structures built using lists are Stacks and Queues.
A stack is a list in which elements are both added and removed at the same end. Think of it as a stack of books: whenever you need to add or remove a book, you do it from the top. It follows the simple concept of Last-In-First-Out.
Queues, on the other hand, are lists in which elements are added at the end, but removed from the front. You can think of it as a queue in the real world. The queue becomes shorter when people at the front exit, and longer when someone new joins at the end. It follows the concept of First-In-First-Out.
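A minimal sketch of both ideas built directly on a plain list (the item names are placeholders). For the queue, list.pop(0) has to shift every remaining element, so for large queues collections.deque with popleft() is the usual faster choice:

```python
# Stack (LIFO): push and pop at the end of the list
stack = []
stack.append('book1')
stack.append('book2')
stack.append('book3')
print(stack.pop())  # book3 (last in, first out)

# Queue (FIFO): add at the end, remove from the front
queue = []
queue.append('person1')
queue.append('person2')
queue.append('person3')
print(queue.pop(0))  # person1 (first in, first out)
```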
Now, as a data scientist or an analyst, you might not be employing this concept every day, but knowing it will surely help you when you have to build your own algorithm!
Tuples are another very popular in-built data structure in Python. These are quite similar to Lists except for one difference – they are immutable. This means that no value can be added, deleted, or edited once a tuple is generated.
We will explore this further, but let’s first see how to create a Python Tuple!
Tuples can be generated by writing values within (parentheses), with the elements separated by commas. But even if you write several values without any parentheses and assign them to a variable, you will still have a tuple! Have a look for yourself:
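A small sketch; the planet names are only sample data. It also covers a common gotcha: a single-element tuple needs a trailing comma.

```python
# Values inside parentheses form a tuple
planets = ('Mercury', 'Venus', 'Earth')
print(type(planets))  # <class 'tuple'>

# Comma-separated values without parentheses are also packed into a tuple
more_planets = 'Mars', 'Jupiter', 'Saturn'
print(type(more_planets))  # <class 'tuple'>

# A single-element tuple needs a trailing comma
single = ('Earth',)
not_a_tuple = ('Earth')
print(type(single), type(not_a_tuple))  # <class 'tuple'> <class 'str'>
```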
Now that we know how to create tuples, let’s talk about immutability.
Anything that cannot be modified after creation is immutable in Python. Every object in Python is either mutable or immutable.
Lists, dictionaries, and sets (we will explore these in further sections) are mutable objects, meaning they can be modified after creation. On the other hand, integers, floats, Booleans, strings, and even tuples are immutable objects. But what makes them immutable?
Everything in Python is an object. So, we can use the in-built id() function, which returns an object’s identity: a unique integer that stays constant for the object’s lifetime (in CPython, it is the object’s memory address). Let’s create a list and check the identity of the list and of its elements:
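A minimal sketch with an illustrative list. The printed numbers differ from run to run and machine to machine, so no expected output is shown:

```python
values = [1, 2, 3]

# id() returns the object's identity (its memory address in CPython)
print(id(values))     # identity of the list object
print(id(values[0]))  # identity of the object stored at index 0
```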
As you can see, both the list and its element have different locations in memory. Since we know lists are mutable, we can alter the value of its elements. Let’s do that and see how it affects the location values:
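A sketch of that experiment. Instead of printing raw addresses, it compares identities directly: the list keeps its identity, while index 0 now refers to a different object:

```python
values = [1, 2, 3]
list_id = id(values)
old_first = values[0]

# Replace the first element with a different object
values[0] = 10

print(id(values) == list_id)   # True: same list object, modified in place
print(values[0] is old_first)  # False: index 0 now refers to a new object
```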
The identity of the list did not change, but that of the element did. This means the list now holds a reference to a different object at that position, while the list object itself was modified in place. This is what is meant by mutable. A mutable object can change its state or contents after creation, but an immutable object cannot.
But we can call tuples pseudo-immutable because even though they are immutable, they can contain mutable objects whose values can be modified!
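A small example (the tuple contents are illustrative): the tuple rejects item assignment, yet the list stored inside it can still grow:

```python
# A tuple whose second item is a (mutable) list
record = ('Earth', [1, 2, 3])

# The tuple itself does not allow item assignment...
try:
    record[0] = 'Mars'
except TypeError as err:
    print('TypeError:', err)

# ...but the list it contains can still be modified in place
record[1].append(4)
print(record)  # ('Earth', [1, 2, 3, 4])
```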
As you can see from the example above, we could change the contents of a mutable object, a list, contained within an immutable tuple.
Tuple packing and unpacking are useful operations you can perform to assign values to a tuple of elements from another tuple in a single line.
We already saw tuple packing when we made our planet tuple. Tuple unpacking is just the opposite: assigning values to variables from a tuple:
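A sketch using sample planet names. The starred form is standard Python 3 syntax for collecting the leftover elements:

```python
planets = ('Mercury', 'Venus', 'Earth')

# Unpacking: each variable receives the element at the matching position
first, second, third = planets
print(first, second, third)  # Mercury Venus Earth

# A starred name collects any remaining elements into a list
head, *rest = planets
print(head, rest)  # Mercury ['Venus', 'Earth']
```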
It is handy for swapping values in a single line. Honestly, this was one of the first things that got me excited about Python: being able to do so much with such little coding!
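For instance, swapping two variables needs no temporary variable:

```python
a, b = 5, 10

# The right side is packed into a tuple, then unpacked into a and b
a, b = b, a
print(a, b)  # 10 5
```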
Although I said that tuple values cannot be changed, you can work around this by converting the tuple to a list using list(), making your changes, and then converting the result back to a new tuple using tuple().
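A minimal sketch of this round trip, with sample planet names. Note that the result is a new tuple object bound to the same name, not the original tuple modified:

```python
planets = ('Mercury', 'Venus', 'Earth')

# Copy into a list, modify the copy, then build a new tuple from it
temp = list(planets)
temp.append('Mars')
planets = tuple(temp)
print(planets)  # ('Mercury', 'Venus', 'Earth', 'Mars')
```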
This workaround, however, is relatively expensive because it copies the data twice: once into the list and once into the new tuple. But tuples come in handy precisely when you don’t want others to change the content of the data structure.
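A dictionary is another Python data structure for storing heterogeneous objects. Dictionaries are mutable, but their keys must be immutable (more precisely, hashable) values such as strings, numbers, or tuples. Since Python 3.7, dictionaries remember the order in which items were inserted; in older versions, you could not rely on the items coming back in the same order in which you inserted them.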
But what sets dictionaries apart from lists is how elements are stored. Elements in a dictionary are accessed via their key values instead of their index, as we did in a list. So, dictionaries contain key-value pairs instead of just single elements.
Dictionaries are generated by writing key-value pairs within { curly } braces, with a colon between each key and its value. Each key-value pair is separated by a comma:
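A small example; the student data is made up. The values can be of different types, including a list:

```python
# Keys and values separated by colons, pairs separated by commas
student = {'name': 'Alice', 'age': 21, 'subjects': ['math', 'science']}
print(type(student))  # <class 'dict'>
print(student)        # {'name': 'Alice', 'age': 21, 'subjects': ['math', 'science']}
```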
Using the key of the item, we can easily extract the associated value of the item:
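A sketch of key-based access on an illustrative dictionary. It also shows get(), which returns a default instead of raising an error when the key is missing:

```python
student = {'name': 'Alice', 'age': 21}

# Square brackets look up a key (KeyError if it is missing)
print(student['name'])  # Alice

# get() returns the given default instead of raising an error
print(student.get('grade', 'N/A'))  # N/A
```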
These keys are unique. If the same key appears more than once when you create a dictionary, only one item is kept, and its value is the one assigned last:
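For example, with an illustrative dictionary literal that repeats a key:

```python
# The key 'age' appears twice; the later value wins
student = {'name': 'Alice', 'age': 21, 'age': 22}
print(student)       # {'name': 'Alice', 'age': 22}
print(len(student))  # 2
```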
Dictionaries give fast access to items. Finding a value in a list or tuple may require scanning every element, whereas a dictionary runs the key through a hash function to jump almost directly to where its value is stored, so a lookup by key takes constant time on average, regardless of the dictionary’s size. This concept is called hashing.
You can access the keys from a dictionary using the keys() method and the values using the values() method. These we can view using a for-loop or turn them into a list using list():
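A short sketch with sample data. The list order shown relies on insertion order, which dictionaries preserve from Python 3.7 onward:

```python
student = {'name': 'Alice', 'age': 21, 'city': 'Paris'}

# Iterate over the keys view
for key in student.keys():
    print(key)

# Convert the views to lists
print(list(student.keys()))    # ['name', 'age', 'city']
print(list(student.values()))  # ['Alice', 21, 'Paris']
```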
We can even access these values simultaneously using the items() method, which returns the respective key and value pair for each element of the dictionary.
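For example, unpacking each (key, value) pair directly in the loop:

```python
student = {'name': 'Alice', 'age': 21}

# items() yields (key, value) pairs that can be unpacked in the loop
for key, value in student.items():
    print(key, '->', value)
# name -> Alice
# age -> 21
```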
Sometimes, you don’t want multiple occurrences of the same element in your list or tuple. It is here that you can use a set data structure. A Set is an unordered but mutable collection of elements that contains only unique values.
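A small sketch with illustrative values. One caveat: {} on its own creates an empty dictionary, so an empty set must be written as set().

```python
# Duplicates are dropped automatically
colors = {'red', 'green', 'blue', 'red'}
print(len(colors))  # 3
print(colors)       # the printed order is not guaranteed

# set() removes duplicates from an existing list
numbers = [1, 2, 2, 3, 3, 3]
unique_numbers = set(numbers)
print(len(unique_numbers))  # 3

# An empty set must be created with set(); {} is an empty dict
empty = set()
print(type(empty), type({}))  # <class 'set'> <class 'dict'>
```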
You will see that the values are not in the same order as entered in the set. This is because sets are unordered.
To add values to a set, use the add() method. It lets you add any hashable value, which rules out mutable objects such as lists, dictionaries, and sets:
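A sketch with illustrative values. Adding a list raises a TypeError, but a tuple of immutable values is accepted:

```python
colors = {'red', 'green'}

# add() inserts a single value; adding an existing value has no effect
colors.add('blue')
colors.add('red')
print(len(colors))  # 3

# Mutable objects such as lists are unhashable and rejected
try:
    colors.add(['yellow', 'pink'])
except TypeError as err:
    print('TypeError:', err)

# A tuple of immutable values is hashable and can be added
colors.add(('yellow', 'pink'))
print(len(colors))  # 4
```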
To remove values from a set, you have two options to choose from: the remove() and discard() methods.
If the value does not exist, remove() raises a KeyError, but discard() does not.
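A short sketch of the difference (the color names are placeholders):

```python
colors = {'red', 'green', 'blue'}

# remove() raises KeyError when the value is missing
colors.remove('red')
try:
    colors.remove('purple')
except KeyError:
    print('purple was not in the set')

# discard() silently does nothing when the value is missing
colors.discard('green')
colors.discard('purple')
print(colors)  # {'blue'}
```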
Using Python Sets, you can perform operations like union, intersection, and difference between two sets, just like you would in mathematics.
The Union of two sets gives the values from both sets. Each value appears only once, so if both sets contain the same value, only one copy is returned:
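A minimal example with two illustrative sets. Because set order is not guaranteed, the results are wrapped in sorted() so that the output is predictable:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# The shared values 3 and 4 appear only once
print(sorted(a | b))        # [1, 2, 3, 4, 5, 6]
print(sorted(a.union(b)))   # [1, 2, 3, 4, 5, 6]
```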
The Intersection of two sets returns only those values that are common to both sets:
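Using the same illustrative sets, with sorted() for predictable output:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Only the values present in both sets
print(sorted(a & b))              # [3, 4]
print(sorted(a.intersection(b)))  # [3, 4]
```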
The Difference of two sets, A - B, returns only those values of the first set that are not present in the second set:
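With the same illustrative sets, note that the order of the operands matters:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Values in a that are not in b, and vice versa
print(sorted(a - b))            # [1, 2]
print(sorted(b - a))            # [5, 6]
print(sorted(a.difference(b)))  # [1, 2]
```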
User-defined data structures refer to data structures that are created by the programmer based on their specific requirements and needs. These data structures are not built-in to the programming language but are designed and implemented by the programmer to store and organize data in a way that suits their application. User-defined data structures allow programmers to tailor the data storage and manipulation to match the problem they are trying to solve. Let’s look at the different types of user-defined data structures in Python.
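Arrays are a fundamental data structure that stores elements of the same data type in contiguous memory locations. Classic arrays have a fixed size and provide constant-time access to elements by index. In Python, the standard-library array module provides typed arrays of this kind.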
Sample Code:
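# Creating an array in Python with the standard-library array module
# ('i' means every element must be a signed integer)
from array import array

numbers = array('i', [10, 20, 30, 40, 50])

# Accessing elements of an array
print(numbers[2])  # Output: 30

# Modifying an element
numbers[1] = 25
print(numbers)  # Output: array('i', [10, 25, 30, 40, 50])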
Lists, also known as dynamic arrays, are similar to arrays but can grow or shrink in size dynamically. They’re implemented using arrays and provide more flexibility.
Sample Code:
# Creating a list in Python
names = ["Alice", "Bob", "Charlie"]

# Adding an element to the end of the list
names.append("David")
print(names)  # Output: ['Alice', 'Bob', 'Charlie', 'David']

# Removing an element from the list
names.remove("Bob")
print(names)  # Output: ['Alice', 'Charlie', 'David']
A stack is a linear data structure that follows the Last In First Out (LIFO) principle. Elements are added to and removed from the top of the stack.
Sample Code:
# Implementing a stack using Python's list
stack = []

# Pushing elements onto the stack
stack.append(10)
stack.append(20)
stack.append(30)

# Popping elements from the stack
print(stack.pop())  # Output: 30
print(stack.pop())  # Output: 20
A queue is a linear data structure that follows the First In First Out (FIFO) principle. Elements are added at the rear and removed from the front.
Sample Code:
# Implementing a queue using Python's collections module
from collections import deque

queue = deque()

# Enqueue elements
queue.append(5)
queue.append(10)
queue.append(15)

# Dequeue elements
print(queue.popleft())  # Output: 5
print(queue.popleft())  # Output: 10
A tree is a hierarchical data structure consisting of nodes connected by edges. Each node has a parent (except the root) and zero or more children.
Sample Code:
# Defining a simple binary tree node
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

# Creating a binary tree
root = TreeNode(10)
root.left = TreeNode(5)
root.right = TreeNode(15)
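To actually read values back out of a tree, you traverse it. Below is a minimal sketch of an in-order traversal (left subtree, node, right subtree), a standard technique. The helper name inorder and the extra node 2 are illustrative choices, not part of the example above:

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def inorder(node):
    """Hypothetical helper: values in left subtree, node, right subtree order."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)

root = TreeNode(10)
root.left = TreeNode(5)
root.right = TreeNode(15)
root.left.left = TreeNode(2)

print(inorder(root))  # [2, 5, 10, 15]
```

Because every left child here is smaller than its parent and every right child larger, the in-order traversal returns the values in ascending order.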
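A linked list is a linear data structure where each element (node) stores a value and a reference to the next node. Unlike arrays, the nodes do not need to sit in contiguous memory, so the list can grow and shrink dynamically, and inserting or removing a node at a known position only requires updating references rather than shifting elements. The trade-off is extra memory for the references and no constant-time access by index.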
Sample Code:
# Defining a linked list node
class ListNode:
    def __init__(self, value):
        self.value = value
        self.next = None

# Creating a linked list
head = ListNode(10)
head.next = ListNode(20)
head.next.next = ListNode(30)
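A sketch of two everyday operations: walking the list by following next references, and inserting a node after the head by updating two references. The helper name to_list is hypothetical:

```python
class ListNode:
    def __init__(self, value):
        self.value = value
        self.next = None

def to_list(head):
    """Hypothetical helper: walk the chain of next references and collect values."""
    values = []
    current = head
    while current is not None:
        values.append(current.value)
        current = current.next
    return values

head = ListNode(10)
head.next = ListNode(20)
head.next.next = ListNode(30)

# Insert 15 after the head: only two references change, nothing is shifted
new_node = ListNode(15)
new_node.next = head.next
head.next = new_node

print(to_list(head))  # [10, 15, 20, 30]
```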
A graph is a collection of nodes (vertices) connected by edges. Graphs can be directed (edges have a direction) or undirected.
Sample Code:
# Using Python's NetworkX library to create a simple undirected graph
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
G.add_nodes_from([1, 2, 3])
G.add_edges_from([(1, 2), (2, 3)])

nx.draw(G, with_labels=True, font_weight='bold')
plt.show()
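If you would rather not install third-party packages, a plain dictionary can serve as an adjacency list. This is a common alternative representation, not what NetworkX does internally. The sketch below builds the same three-node graph and visits it with breadth-first search; the helper name bfs is hypothetical:

```python
from collections import deque

# Undirected graph as a dict mapping each node to its neighbors
graph = {
    1: [2],
    2: [1, 3],
    3: [2],
}

def bfs(graph, start):
    """Hypothetical helper: return nodes in breadth-first order from start."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

print(bfs(graph, 1))  # [1, 2, 3]
```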
A hashmap (or dictionary) is a data structure that stores key-value pairs. It provides fast access to values using keys.
Sample Code:
# Creating a dictionary in Python
phonebook = {
    "Alice": "123-456-7890",
    "Bob": "987-654-3210",
    "Charlie": "555-123-4567"
}

# Accessing values using keys
print(phonebook["Alice"])  # Output: 123-456-7890
Isn’t Python a beautiful language? It provides you with many different options to handle your data more efficiently. Learning about data structures in Python is a key part of your learning journey. This article should serve as a good introduction to the in-built data structures in Python. If it got you interested in Python, and you are itching to learn more about it in detail and how to use it in your everyday data science or analytics work, I recommend going through further articles and courses on the topic.
Ans. The built-in data structures in Python include lists, tuples, sets, and dictionaries. Apart from these, you can build user-defined data structures in Python such as arrays, stacks, queues, trees, linked lists, graphs, and hashmaps.
Ans. Lists, tuples, and dictionaries are three of Python’s built-in data structures; together with sets, they make up the four covered in this article.
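Ans. Four of Python’s basic built-in data types are Integer (int), Float (float), String (str), and Boolean (bool). Python has other built-in types as well, such as complex, list, tuple, dict, and set.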
Ans. The 2 main types of data structures are primitive data structures and non-primitive or composite data structures.