This article was published as a part of the Data Science Blogathon
Because of its simplicity and ease of learning, Python has become very popular. It is used for data science, machine learning, web development, scripting, automation, and more. Python is one of the most in-demand skills for data scientists, and its simplicity is the first of several advantages in data science. Although some data scientists have a background in computer science or know other programming languages, many come from statistics, mathematics, or other technical disciplines and may not have much programming experience when they enter the industry. Python syntax is easy to read and write, which makes the language quick to learn. In this article, I will introduce more than 40 ideas and methods that can help you speed up your day-to-day data science work.
Alright, let’s get started…..
With a list comprehension, you can loop over the elements of a list and filter them in a single line. Let’s put it into practice using the following example:
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
even_numbers = [number for number in numbers if number % 2 == 0]
print(even_numbers)
The same thing can be done using dictionaries, sets, and generators. Let’s look at another example, this time with dictionaries:
dictionary = {'first_num': 1, 'second_num': 2, 'third_num': 3, 'fourth_num': 4}
oddvalues = {key: value for (key, value) in dictionary.items() if value % 2 != 0}
print(oddvalues)
Output: {'first_num': 1, 'third_num': 3}
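Since sets and generators are mentioned but not shown, here is a minimal sketch of both. The variable names (remainders, squares, total_of_squares) are my own and only for illustration.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8]

# Set comprehension: curly braces without key/value pairs; duplicates collapse.
remainders = {number % 3 for number in numbers}

# Generator expression: parentheses; values are produced lazily, one at a time.
squares = (number ** 2 for number in numbers)
total_of_squares = sum(squares)

print(remainders)
print(total_of_squares)
```

A generator expression is handy when you only need to pass over the values once, because it never builds the full list in memory.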
Enumerate is a helpful function for iterating over an object like a list, dictionary, or file. The function produces a tuple that includes the values acquired from iterating through the object as well as the loop counter (from the start position of 0). When you wish to write code depending on the index, the loop counter comes in handy. Let’s look at an example where the first and last elements might be treated differently.
sentence = 'Just do It'
length = len(sentence)
for index, element in enumerate(sentence):
    print('{}: {}'.format(index, element))
    if index == 0:
        print('The first element!')
    elif index == length - 1:
        print('The last element!')
Output: 0: J
The first element!
1: u
2: s
3: t
4:
5: d
6: o
7:
8: I
9: t
The last element!
Files can also be enumerated with the enumerate function. Before breaking out of the loop, we will print the first 10 rows of the CSV file in the example below. We’re not going to replicate the result because it’s too long. You may, however, use it on whatever files you have.
with open('heart.csv') as f:
    for i, line in enumerate(f):
        if i == 10:
            break
        print(line)
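The loop counter starts at 0 by default, but enumerate also accepts a start argument. A small sketch, assuming a made-up list of steps, that numbers items from 1:

```python
steps = ['load data', 'clean data', 'train model']

# start=1 makes the counter begin at 1 instead of the default 0.
numbered_steps = [f'{position}. {step}' for position, step in enumerate(steps, start=1)]

for line in numbered_steps:
    print(line)
```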
We frequently wish to return more than one value when designing functions. We’ll go through two typical approaches here:
Method 1:
Let’s start with the easiest option: returning a tuple. This technique is often used only when there are two or three values to return. When there are more values in a tuple, it’s simple to lose track of the order of the items.
In the code section below, get_student is an example function that returns a student’s first and last name as a tuple, based on the student’s ID number.
# returning a tuple.
def get_student(id_num):
    if id_num == 0:
        return 'Taha', 'Nate'
    elif id_num == 1:
        return 'Jakub', 'Abdal'
    else:
        raise Exception('No Student with this id: {}'.format(id_num))
When we call the function get_student with the number 0, we notice that it returns a tuple with two values: ‘Taha’ and ‘Nate’
student = get_student(0)
print('first_name: {}, last_name: {}'.format(student[0], student[1]))
Output: first_name: Taha, last_name: Nate
Method 2:
Returning a dictionary is the second option. Because dictionaries hold key-value pairs, we can name the values that are returned, which is more intuitive than a tuple.
Method 2 works the same way as Method 1, except that it returns a dictionary.
# returning a dictionary
def get_data(id_num):
    if id_num == 0:
        return {'first_name': 'Muhammad',
                'last_name': 'Taha',
                'title': 'Data Scientist',
                'department': 'A',
                'date_joined': '20200807'}
    elif id_num == 1:
        return {'first_name': 'Ryan',
                'last_name': 'Gosling',
                'title': 'Data Engineer',
                'department': 'B',
                'date_joined': '20200809'}
    else:
        raise Exception('No employee with this id: {}'.format(id_num))
It’s easier to refer to a specific value by its key when a result is a dictionary. We are calling the function with id_num = 0
employee = get_data(0)
print('first_name: {},\nlast_name: {},\ntitle: {},\ndepartment: {},\ndate_joined: {}'.format(
    employee['first_name'], employee['last_name'], employee['title'],
    employee['department'], employee['date_joined']))
Output: first_name: Muhammad,
last_name: Taha,
title: Data Scientist,
department: A,
date_joined: 20200807
If you have a value and wish to compare it to two other values, you may use the following basic mathematical expression: 1<x<30
That’s the kind of algebraic expression we learn in primary school. However, the identical statement may be used in Python as well. Yes, you read that correctly. Until now, you’ve presumably done comparisons in this format:
1<x and x<30
In Python, all you have to do is use the following: 1<x<30
x = 5
print(1 < x < 30)
Output: True
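To check that the chained form really matches the longer one, here is a small sketch that runs both over a few sample values. The function names and the sample list are my own.

```python
def in_range_chained(x):
    # Chained comparison: reads like the math notation.
    return 1 < x < 30

def in_range_explicit(x):
    # The same test written with "and".
    return 1 < x and x < 30

samples = [-5, 1, 2, 29, 30, 100]
results = [in_range_chained(x) for x in samples]
print(results)
print(results == [in_range_explicit(x) for x in samples])
```

Both bounds are exclusive here, so 1 and 30 themselves are not in range.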
Let’s imagine you got the input of a function as a string, but it should be a list like this:
input = [[1, 2, 3], [4, 5, 6]]
Rather than dealing with complex regular expressions, just import the module “ast” and call its function “literal_eval”:
import ast

def string_to_list(string):
    return ast.literal_eval(string)

string = "[[1, 2, 3],[4, 5, 6]]"
my_list = string_to_list(string)
print(my_list)
Output: [[1, 2, 3], [4, 5, 6]]
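One reason to prefer literal_eval is that it only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None) and refuses anything else. A minimal sketch, where parse_literal is a hypothetical helper of my own that returns None on bad input:

```python
import ast

def parse_literal(text):
    """Return the parsed literal, or None if the text is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None

parsed_list = parse_literal("[[1, 2, 3], [4, 5, 6]]")
parsed_dict = parse_literal("{'a': 1, 'b': (2, 3)}")
rejected = parse_literal("__import__('os').getcwd()")  # a function call, not a literal

print(parsed_list)
print(parsed_dict)
print(rejected)
```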
This trick applies to looping over a list. In general, when you want to iterate through a list, you use a for loop. In Python, you can also attach an else block to the loop, a feature that few other programming languages offer.
Let’s have a look at how it works. Suppose you want to check whether there is any even number in a list.
number_List = [1, 3, 7, 9, 8]
for number in number_List:
    if number % 2 == 0:
        print(number)
        break
else:
    print("No even numbers!!")
Output: 8
If an even number is found, the number will be printed and the else part will not execute since we pass a break statement. If the break statement never executes then the else block will execute.
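To see both branches, here is a small sketch that wraps the same loop in a function and calls it once with an even number present and once without. The function name first_even is my own.

```python
def first_even(values):
    for value in values:
        if value % 2 == 0:
            found = value
            break  # leaving via break skips the else block
    else:
        # Runs only when the loop finished without hitting break.
        found = None
    return found

print(first_even([1, 3, 7, 9, 8]))
print(first_even([1, 3, 7, 9]))
```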
By using the “heapq” module, you can find the n largest or n smallest elements of a list. Let’s see an example:
import heapq
numbers = [80, 25, 68, 77, 95, 88, 30, 55, 40, 50]
print(heapq.nlargest(5, numbers))
print(heapq.nsmallest(5, numbers))
Output: [95, 88, 80, 77, 68]
[25, 30, 40, 50, 55]
Prefixing a list with “*” unpacks it, so each element is passed to the function as a separate argument.
def Summation(*arg):
    total = 0
    for i in arg:
        total += i
    return total

result = Summation(*[8, 5, 10, 7])
print(result)
Output: 30
Just multiply the string by the number of times you want it to be repeated, and your work is done.
value = "Taha"
print(value * 5)
print("-" * 21)
Output: TahaTahaTahaTahaTaha
---------------------
Use “.index” to find the index of an element in a list.
cities = ['Vienna', 'Amsterdam', 'Paris', 'Berlin']
print(cities.index('Berlin'))
Output: 3
print("Analytics", end="")
print("Vidhya")
print("Analytics", end=" ")
print("Vidhya")
print('Data', 'science', 'blogathon', '12', sep=', ')
Output: AnalyticsVidhya
Analytics Vidhya
Data, science, blogathon, 12
A big number written as one long run of digits is hard to read in code. You can place underscores between groups of digits to make it easier to read; Python ignores them.
print(5_000_000_000_000)
print(7_543_291_635)
Output: 5000000000000
7543291635
When you slice a sequence, you pass the start index, the stop index, and the step size. To slice in reverse order, you just need to pass a negative step size. Note that the stop index itself is excluded, so the character at index 0 is not part of the results below. Let’s see an example:
sentence = "Data science blogathon"
print(sentence[21:0:-1])
# Step backward two characters at a time
print(sentence[21:0:-2])
Output: nohtagolb ecneics ata
nhaobenisaa
If you want to check whether two variables are pointing to the same object, then you need to use “is”
But if you want to check whether two variables are the same or not, then you need to use “==”.
list1 = [7, 9, 4]
list2 = [7, 9, 4]
print(list1 == list2)
print(list1 is list2)
list3 = list1
print(list3 is list1)
Output: True
False
True
The first statement is True because list1 and list2 hold the same values, so they are equal. The second statement is False because the two variables point to different objects in memory. The third statement is True because list1 and list3 both point to the same object in memory.
first_dct = {"London": 1, "Paris": 2}
second_dct = {"Tokyo": 3, "Seoul": 4}
merged = {**first_dct, **second_dct}
print(merged)
Output: {'London': 1, 'Paris': 2, 'Tokyo': 3, 'Seoul': 4}
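The two dictionaries above share no keys. When the same key appears in both, the value from the dictionary unpacked last wins. A short sketch with made-up settings (defaults and overrides are illustrative names):

```python
defaults = {'sep': ',', 'header': True, 'encoding': 'utf-8'}
overrides = {'sep': ';', 'encoding': 'latin-1'}

# Keys in overrides replace the same keys from defaults.
settings = {**defaults, **overrides}
print(settings)
```

Neither input dictionary is modified; the merge builds a new one.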
If you need to know whether a string starts with a specific letter, you can check it by indexing, which is the common approach. But you can also use a method called “startswith”, which tells you whether a string starts with the letter or word that you pass to it.
sentence = "Analytics Vidhya"
print(sentence.startswith("b"))
print(sentence.startswith("A"))
Output: False
True
If you need to know the Unicode code point of a character, use the function “ord” and pass it the character. Let’s see an example:
print(ord("T"))
print(ord("A"))
print(ord("h"))
print(ord("a"))
Output: 84
65
104
97
If you want to access the keys and values of a dictionary separately, you can do that using a method called “items()”.
cities = {'London': 1, 'Paris': 2, 'Tokyo': 3, 'Seoul': 4}
for key, value in cities.items():
    print(f"Key: {key} and Value: {value}")
Output: Key: London and Value: 1
Key: Paris and Value: 2
Key: Tokyo and Value: 3
Key: Seoul and Value: 4
False is considered as 0 and True is considered as 1
x = 9
y = 3
outcome = (x - False) / (y * True)
print(outcome)
Output: 3.0
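This behaviour comes in handy for counting: since True counts as 1, summing a series of comparisons tells you how many of them hold. A minimal sketch with made-up ages:

```python
ages = [15, 22, 37, 12, 45, 18]

# Each comparison yields True or False; sum() adds them up as 1 and 0.
adults = sum(age >= 18 for age in ages)
share_of_adults = adults / len(ages)

print(adults)
print(round(share_of_adults, 2))
```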
The “append” method adds a value at the last position of a list. If you want to add a value at a specific position instead, you can use a method called “insert”.
Syntax:
list_name.insert(position, value)
Let’s see an example.
cities = ["London", "Vienna", "Rome"]
cities.append("Seoul")
print("After append:", cities)
cities.insert(0, "Berlin")
print("After insert:", cities)
Output: After append: ['London', 'Vienna', 'Rome', 'Seoul']
After insert: ['Berlin', 'London', 'Vienna', 'Rome', 'Seoul']
The filter function does what its name says. It filters an iterable, keeping only the elements for which the function you pass returns True. It returns an iterator.
Syntax:
filter(function, iterable)
Let’s see an example with the filter function:
mixed_number = [8, 15, 25, 30, 34, 67, 90, 5, 12]
filtered_value = filter(lambda x: x > 20, mixed_number)
print(f"Before filter: {mixed_number}")
print(f"After filter: {list(filtered_value)}")
Output: Before filter: [8, 15, 25, 30, 34, 67, 90, 5, 12]
After filter: [25, 30, 34, 67, 90]
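Because filter returns an iterator, its values can be consumed only once. The sketch below shows a second pass coming back empty, next to a list comprehension that gives the same result as a reusable list. The names first_pass, second_pass, and same_with_comprehension are my own.

```python
mixed_number = [8, 15, 25, 30, 34, 67, 90, 5, 12]
filtered_value = filter(lambda x: x > 20, mixed_number)

first_pass = list(filtered_value)   # consumes the iterator
second_pass = list(filtered_value)  # nothing left to yield

# An equivalent list comprehension builds a list directly.
same_with_comprehension = [x for x in mixed_number if x > 20]

print(first_pass)
print(second_pass)
print(same_with_comprehension)
```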
You can create a function without fixing the number of parameters in advance. With “*” in front of a parameter name, you can pass as many positional arguments as you want when you call the function. Let’s see an example:
def multiplication(*arguments):
    mul = 1
    for i in arguments:
        mul = mul * i
    return mul

print(multiplication(3, 4, 5))
print(multiplication(5, 8, 10, 3))
print(multiplication(8, 6, 15, 20, 5))
Output: 60
1200
72000
You can iterate over a single list using enumerate function, but when you have two or more lists, you can also iterate over them using the “zip()” function.
capital = ['Vienna', 'Paris', 'Seoul', "Rome"]
countries = ['Austria', 'France', 'South Korea', "Italy"]
for cap, country in zip(capital, countries):
    print(f"{cap} is the capital of {country}")
Output: Vienna is the capital of Austria
Paris is the capital of France
Seoul is the capital of South Korea
Rome is the capital of Italy
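One thing to keep in mind is that zip stops at the shortest input. If the lists may differ in length, itertools.zip_longest from the standard library pads the missing values instead. A small sketch, where the lists and the 'unknown' fill value are made up for illustration:

```python
from itertools import zip_longest

capital = ['Vienna', 'Paris', 'Seoul']
countries = ['Austria', 'France', 'South Korea', 'Italy']

# zip silently drops 'Italy' because capital has only three items.
short_pairs = list(zip(capital, countries))

# zip_longest keeps going and fills the gap with fillvalue.
padded_pairs = list(zip_longest(capital, countries, fillvalue='unknown'))

print(short_pairs)
print(padded_pairs)
```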
If you want to flip the case of letters, i.e. upper case to lower case and lower case to upper case, you can do that using a method called “swapcase”. Let’s see an example:
sentence = "Data Science Blogathon."
changed_sen = sentence.swapcase()
print(changed_sen)
Output: dATA sCIENCE bLOGATHON.
To check the memory used by an object, first import the sys module, then use its function called “getsizeof”. It returns the size of the object in bytes. The exact number can vary between Python versions and platforms.
import sys
mul = 5 * 6
print(sys.getsizeof(mul))
Output: 28
The map() function applies a given function to every element of an iterable. Like filter, it returns an iterator.
Syntax:
map(function, iterable)
values_list = [8, 10, 6, 50]
quotient = map(lambda x: x / 2, values_list)
print(f"Before division: {values_list}")
print(f"After division: {list(quotient)}")
Output: Before division: [8, 10, 6, 50]
After division: [4.0, 5.0, 3.0, 25.0]
To reverse a string you can use the slicing method. Let’s see an example:
value = "Analytics Vidhya"
print("Reverse is:", value[::-1])
Output: Reverse is: ayhdiV scitylanA
When you train a machine learning or deep learning model, or simply run a block of code, you can check how long it took to run. In a Jupyter notebook or IPython, put the magic command “%%time” on the first line of the cell. It will show you the amount of time it took to run the cell. Let’s see an example:
%%time
sentence = "Data Science Blogathon."
changed_sen = sentence.swapcase()
print(changed_sen)
Output: dATA sCIENCE bLOGATHON.
Wall time: 998 µs
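The %%time magic only exists in IPython and Jupyter. In a plain Python script, a rough equivalent is to read a clock before and after the code. This is a sketch using time.perf_counter from the standard library, and the summation being timed is just a stand-in workload.

```python
import time

start = time.perf_counter()

# Stand-in workload: replace with the code you want to time.
total = sum(i * i for i in range(100_000))

elapsed = time.perf_counter() - start
print(f"Elapsed: {elapsed:.6f} seconds")
```

The measured time will differ from run to run and from machine to machine.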
There are two string methods called “rstrip()” and “lstrip()”. “rstrip” removes characters from the right end of a string and “lstrip” removes characters from the left end. By default, both remove whitespace, but you can pass a set of characters to remove instead.
sentence1 = "Data Science Blogathon "
print(f"After removing the right space: {sentence1.rstrip()}")
sentence2 = " Data Science Blogathon"
print(f"After removing the left space: {sentence2.lstrip()}")
sentence3 = "Data Science Blogathon .,bbblllg"
print("After applying rstrip:", sentence3.rstrip(".,blg"))
Output: After removing the right space: Data Science Blogathon
After removing the left space: Data Science Blogathon
After applying rstrip: Data Science Blogathon
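Two details are worth a quick sketch: strip() trims both ends at once, and the argument to rstrip is treated as a set of characters, not as a suffix. The file name below is a made-up example that shows how this can surprise you.

```python
padded = "   Data Science Blogathon   "
both_sides = padded.strip()  # trims left and right whitespace

filename = "test.txt"
# rstrip removes any trailing '.', 't', or 'x', so it eats into the name.
char_set_result = filename.rstrip(".txt")

# To drop an exact suffix, check for it and slice it off.
suffix = ".txt"
suffix_result = filename[:-len(suffix)] if filename.endswith(suffix) else filename

print(repr(both_sides))
print(char_set_result)
print(suffix_result)
```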
You can count the number of times an element appears in a list by running a for loop among them. But you can do it more easily, just by calling a method on the list called “count”. Here is an example:
cities = ["Amsterdam", "Berlin", "New York", "Seoul", "Tokyo", "Paris", "Paris", "Vienna", "Paris"]
print("Paris appears", cities.count("Paris"), "times in the list")
Output: Paris appears 3 times in the list
You can find the index of an element in a tuple or list just by calling a simple method called “index” on that tuple or list. Here is an example:
cities_tuple = ("Berlin", "Paris", 5, "Vienna", 10)
print(cities_tuple.index("Paris"))
cities_list = ['Vienna', 'Paris', 'Seoul', "Amsterdam"]
print(cities_list.index("Amsterdam"))
Output: 1
3
You can remove all elements from a list or set by applying a method called “clear” on that list or set.
cities_list = ['Vienna', 'Paris', 'Seoul', "Amsterdam"]
print(f"Before removing from the list: {cities_list}")
cities_list.clear()
print(f"After removing from the list: {cities_list}")
cities_set = {'Vienna', 'Paris', 'Seoul', "Amsterdam"}
print(f"Before removing from the set: {cities_set}")
cities_set.clear()
print(f"After removing from the set: {cities_set}")
Output: Before removing from the list: ['Vienna', 'Paris', 'Seoul', 'Amsterdam']
After removing from the list: []
Before removing from the set: {'Vienna', 'Amsterdam', 'Seoul', 'Paris'}
After removing from the set: set()
To join two sets, you can use the method called “union()”. It returns a new set containing the elements of both sets. Because sets are unordered, the elements may be printed in a different order on your machine.
set1 = {'Vienna', 'Paris', 'Seoul'}
set2 = {"Tokyo", "Rome", 'Amsterdam'}
print(set1.union(set2))
Output: {'Vienna', 'Tokyo', 'Seoul', 'Amsterdam', 'Rome', 'Paris'}
First, use “Counter” from the collections module to measure the frequency of each value, then call the method “most_common” on the resulting counter to sort the values of the list by their frequency.
from collections import Counter
count = Counter([7, 6, 5, 6, 8, 6, 6, 6])
print(count)
print("Sort values according their frequency:", count.most_common())
Output: Counter({6: 5, 7: 1, 5: 1, 8: 1})
Sort values according their frequency: [(6, 5), (7, 1), (5, 1), (8, 1)]
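If you only need the top few values, most_common also accepts a number n and returns just the n most frequent items. A minimal sketch with a made-up sentence:

```python
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the fox".split()
word_counts = Counter(words)

# Ask for only the two most frequent words.
top_two = word_counts.most_common(2)
print(top_two)
```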
First, convert the list into a set. This removes the duplicate values because a set cannot contain duplicates. Then convert the set back to a list. Keep in mind that a set does not preserve the original order of the elements.
cities_list = ['Vienna', 'Paris', 'Seoul', "Amsterdam", "Paris", "Amsterdam", "Paris"]
cities_list = set(cities_list)
print("After removing the duplicate values from the list:", list(cities_list))
Output: After removing the duplicate values from the list: ['Vienna', 'Amsterdam', 'Seoul', 'Paris']
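If the original order matters, one alternative (a different technique from the set conversion above) is dict.fromkeys, which keeps the first occurrence of each value in order on Python 3.7 and later. A short sketch:

```python
cities_list = ['Vienna', 'Paris', 'Seoul', 'Amsterdam', 'Paris', 'Amsterdam', 'Paris']

# Dictionary keys are unique and keep insertion order (Python 3.7+).
unique_in_order = list(dict.fromkeys(cities_list))
print(unique_in_order)
```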
By using a method called “join” you can join all the single elements of a list and make a single string or sentence.
words_list = ["Data", "science", "Blogathon"]
print(" ".join(words_list))
Output: Data science Blogathon
Python lets you return multiple values from a function at once and unpack them into separate variables. Let’s see an example:
def calculation(number):
    mul = number * 2
    div = number / 2
    summation = number + 2
    subtract = number - 2
    return mul, div, summation, subtract

mul, div, summation, subtract = calculation(10)
print("Multiplication:", mul)
print("Division:", div)
print("Summation:", summation)
print("Subtraction:", subtract)
Output: Multiplication: 20
Division: 5.0
Summation: 12
Subtraction: 8
First, convert the lists into sets, then call the method “symmetric_difference” on one of them and pass the other. This returns the elements that appear in exactly one of the two lists. The order of the result may vary because sets are unordered.
cities_list1 = ['Vienna', 'Paris', 'Seoul', "Amsterdam", "Berlin", "London"]
cities_list2 = ['Vienna', 'Paris', 'Seoul', "Amsterdam"]
cities_set1 = set(cities_list1)
cities_set2 = set(cities_list2)
difference = list(cities_set1.symmetric_difference(cities_set2))
print(difference)
Output: ['Berlin', 'London']
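In the example above only the first list has extra cities, so it is hard to tell a symmetric difference from a one-sided one. The sketch below uses two made-up sets that each have a unique element to show the contrast.

```python
set_a = {'Vienna', 'Paris', 'Berlin'}
set_b = {'Vienna', 'Paris', 'Tokyo'}

only_in_a = set_a.difference(set_b)                 # one-sided: in a but not in b
only_in_b = set_b - set_a                           # the "-" operator does the same
in_exactly_one = set_a.symmetric_difference(set_b)  # in either set, but not both

print(only_in_a)
print(only_in_b)
print(in_exactly_one)
```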
First, apply a zip function on these two lists, then convert the output of the zip function into a dictionary. Your work is done, it’s that easy to convert two lists into a single dictionary.
number = [1, 2, 3]
cities = ['Vienna', 'Paris', 'Seoul']
result = dict(zip(number, cities))
print(result)
Output: {1: 'Vienna', 2: 'Paris', 3: 'Seoul'}
First import the “heapq” module, then call the functions “nlargest” and “nsmallest”, passing the value of n and the list. This way you can get the n largest and n smallest elements of a list.
import heapq
numbers = [100, 20, 8, 90, 86, 95, 9, 66, 28, 88]
print(heapq.nlargest(3, numbers))
print(heapq.nsmallest(3, numbers))
Output: [100, 95, 90]
[8, 9, 20]
Thank you for sticking with me all the way to the end. Hope this article helped you learn something new.
I am an undergraduate student, studying Computer Science, with a strong interest in data science, machine learning, and artificial intelligence. I like diving into data in order to uncover trends and other useful information. You can connect with me on LinkedIn.