Top 8 Hidden Python Packages For Machine Learning in 2021

Akshay Last Updated : 24 Apr, 2021
This article was published as a part of the Data Science Blogathon.

Introduction

Python is one of the most loved languages in the data science and machine learning world. It is easy to learn, provides a wealth of libraries and packages, and has a strong developer community. Python libraries and packages are groups of modules that make our lives easier. There are more than 137,000 Python libraries and 198,826 Python packages available to ease developers' everyday programming work. These libraries and packages serve a wide variety of modern applications.

As a data science enthusiast, I have seen people always talking about a few famous libraries: pandas and NumPy for data manipulation; matplotlib, seaborn, plotly, and many more for data visualization; and scikit-learn, TensorFlow, etc. for modeling. In this article I'm not going to cover these libraries, as tons of blogs are already available; check my article on the most used Python libraries here. Instead, I am going to cover some hidden gems among Python libraries that are little known in the data science world. These are some important libraries you can check out in 2021.


These libraries cover functionality such as handling missing values in an organized way, handling emojis, converting number words into ints and floats, visual data exploration, time series modelling, and much more. They span a wide range of topics, from natural language processing to data visualization and time series. So let's get started.

 

Table of Contents

  1. Missingno
  2. Emot
  3. Bamboolib
  4. ppscore
  5. AutoViz
  6. Numerizer
  7. PyFlux
  8. FlashText

 

Missingno

Real-world datasets generally contain a lot of missing and null values. This can happen for various reasons, such as data being lost or simply not being available. Dealing with this kind of messy data can be very irritating. It needs special attention before it is fed into machine learning algorithms, as many of these algorithms cannot handle missing values.
We need a better approach to handle these missing values. Here comes the magic of the Python library called missingno. It helps us understand missing values much better with the help of data visualisations. It is based on matplotlib. As of April 2021, it has four types of plots for understanding the distribution of missing data: bar chart, heatmap, matrix, and dendrogram. So let's get started.

Installation

pip install missingno

Importing the library

import missingno as msno

In the bar plot below, you can see how complete each column is, and hence how many values are missing from it:

[Image: missingno bar plot]

For more information, check the official documentation: Link

 

Emot

Emojis are very common in chats, and handling them in natural language processing tasks can be tedious. Emot is a handy and popular Python library for detecting emojis and emoticons in text data so that they can be removed or translated. It works with both Python 2 and Python 3. It takes a string as input and returns the matches it finds along with their meanings and locations. So let's get started.

Installation

pip install emot

Importing the library

import emot

Code

import emot
text = "I love python 👨 :-)"
emot.emoji(text)
# Output: [{'value': '👨', 'mean': ':man:', 'location': [14, 14], 'flag': True}]
emot.emoticons(text)
# Output: {'value': [':-)'], 'location': [[16, 19]], 'mean': ['Happy face smiley'], 'flag': True}

 


For more information, check the official documentation: Link

 

Bamboolib

Analyzing and visualizing data is one of the most important and time-consuming steps. We need to spend a great deal of time to clearly understand what the data is about and what it is trying to tell us. We use various Python libraries to visualize the patterns and anomalies in a dataset and get familiar with it.

Bamboolib is a GUI for pandas DataFrames that enables anyone to work with Python in Jupyter Notebook or JupyterLab. It is a highly interactive and widely helpful library for exploring, visualizing, and manipulating data.

Even a person with a non-programming background can use it to draw insights from data, since it doesn't require any coding experience. Bamboolib isn't open-source, which means that you need to purchase it to use it, but it offers a 14-day free trial version so you can explore it fully and see how useful it could be for you.

 

Installation

pip install bamboolib

Importing the library

import bamboolib

 


For more information, check the official documentation: Link

 

Ppscore

The full form of ppscore is Predictive Power Score. This Python library was made by the bamboolib developers. The Predictive Power Score is an alternative to the correlation matrix. The score is asymmetric, so the score of column A predicting column B can differ from that of B predicting A, and it can detect both linear and non-linear relationships between two columns in our dataset. So let's get started with this library.
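To see why a plain correlation coefficient can miss a relationship, and what "asymmetric" means here, consider the toy sketch below. It uses only the standard library and does not reproduce how ppscore computes its score; the helper names pearson and is_determined_by are made up for this illustration.

```python
from math import sqrt

# Toy illustration only: this is NOT how ppscore computes its score.
# The helper names below are hypothetical.


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def is_determined_by(source, target):
    """True if every source value always maps to the same target value."""
    seen = {}
    for s, t in zip(source, target):
        if seen.setdefault(s, t) != t:
            return False
    return True


xs = list(range(-10, 11))
ys = [x * x for x in xs]  # y = x^2, a purely non-linear relationship

r = pearson(xs, ys)                    # close to 0: correlation misses it
x_predicts_y = is_determined_by(xs, ys)  # True: x fully determines y
y_predicts_x = is_determined_by(ys, xs)  # False: y = 100 fits x = -10 and x = 10
```

Here x fully determines y, yet the correlation is close to zero, and knowing y does not pin down x. A direction-aware score such as the Predictive Power Score is meant to surface exactly this kind of one-way, non-linear relationship.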

Installation

pip install ppscore

Importing the library

import ppscore

 


For more information, check the official documentation: Link

AutoViz

AutoViz is one of the most underrated Python libraries for exploratory data analysis. It automatically visualizes any kind of dataset, including large ones. Beautiful visualizations can be drawn with just a single line of code. You just have to provide your data file (txt, JSON, or CSV) and AutoViz will automatically pick suitable charts that help you derive insights within seconds. So let's get started.


Installation

pip install autoviz

Importing the library

import autoviz

For more information, check the official documentation: Link

 

Numerizer

Numerizer is a very interesting Python module for text processing. It converts numbers written in natural language into ints and floats, which is very useful in natural language processing tasks. For example, it converts 'forty-two' to 42 and 'one billion and one' to 1000000001. So let's get started.

Installation

pip install numerizer

Importing the library

from numerizer import numerize

Code

numerize('forty-two')
'42'
numerize('one billion and one')
'1000000001'
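To give a feel for how this kind of conversion can work, here is a minimal plain-Python sketch. It is not numerizer's implementation: the function words_to_int and the lookup tables are hypothetical, it handles only whole numbers, and it assumes the short scale (one billion = 10^9).

```python
# Illustrative number-word parser; NOT the numerizer implementation.
# All names here are hypothetical and only whole numbers are handled.

UNITS = {word: value for value, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {word: 10 * value for value, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def words_to_int(text):
    """Convert a phrase such as 'forty-two' into an integer."""
    total = 0    # completed groups, e.g. the 'one billion' part
    current = 0  # the group currently being read (always below 1000)
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = (current or 1) * 100
        elif word in SCALES:
            total += (current or 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(words_to_int("forty-two"))            # 42
print(words_to_int("one billion and one"))  # 1000000001
```

The sketch keeps a running group below one thousand and folds it into the total whenever a scale word such as "thousand" or "billion" appears.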

For more information, check the official documentation: Link

 

PyFlux

Time series analysis is one of the most frequently encountered problems in the machine learning domain. PyFlux is an open-source Python library built specifically for working with time series problems. The library has an excellent array of modern time series models including, but not limited to, ARIMA, GARCH, and VAR models. PyFlux also offers a probabilistic approach to time series modelling. So let's get started.

Installation

pip install pyflux

Importing the library

import pyflux

 


For more information, check the official documentation: Link

 

FlashText

FlashText is a Python library made specifically for searching for and replacing words in a document. FlashText takes a word or a list of words, plus a string. The words, which FlashText calls keywords, are then searched for or replaced in the string.

Let us look at how FlashText works in a bit more detail. When keywords are passed to FlashText for searching or replacing, they are stored in a trie data structure, which is very efficient at retrieval tasks. So let's get started.
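The trie idea described above can be sketched in plain Python. This is only an illustration of the technique, not FlashText's own code: the functions build_trie, find_matches, extract_keywords and replace_keywords below are hypothetical helpers written for this example, and matching is case-insensitive and restricted to whole words by assumption.

```python
# Illustrative trie-based keyword matcher; NOT FlashText's implementation.
# All names here are hypothetical.

END = "_end_"  # marker key: a keyword finishes at this node


def build_trie(keywords):
    """Store each keyword character by character in nested dicts."""
    root = {}
    for word, mapped_value in keywords.items():
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[END] = mapped_value
    return root


def _longest_match(trie, text, start):
    """Return (end_index, mapped_value) of the longest keyword at start."""
    node, best = trie, None
    for i in range(start, len(text)):
        node = node.get(text[i].lower())
        if node is None:
            break
        # Accept a keyword only if it ends on a word boundary.
        if END in node and (i + 1 == len(text) or not text[i + 1].isalnum()):
            best = (i + 1, node[END])
    return best


def find_matches(trie, text):
    """Yield (start, end, mapped_value) for each keyword, left to right."""
    i = 0
    while i < len(text):
        at_word_start = i == 0 or not text[i - 1].isalnum()
        match = _longest_match(trie, text, i) if at_word_start else None
        if match:
            end, value = match
            yield i, end, value
            i = end
        else:
            i += 1


def extract_keywords(trie, text):
    """Searching: list the mapped value of every keyword found."""
    return [value for _, _, value in find_matches(trie, text)]


def replace_keywords(trie, text):
    """Replacing: swap every keyword found for its mapped value."""
    pieces, last = [], 0
    for start, end, value in find_matches(trie, text):
        pieces.append(text[last:start])
        pieces.append(value)
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


trie = build_trie({"big apple": "New York", "bay area": "Bay Area"})
found = extract_keywords(trie, "I love Big Apple and Bay Area.")
print(found)     # ['New York', 'Bay Area']
replaced = replace_keywords(trie, "I love Big Apple.")
print(replaced)  # I love New York.
```

Because all keywords share one trie, each position in the text is checked against every keyword at once by walking the nested dictionaries, instead of testing the keywords one by one.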

Installation

pip install flashtext

Importing the library

import flashtext

Searching:

[Image: FlashText searching]

Replacing:

[Image: FlashText replacing]

For more information, check the official documentation: Link

Final Note

You can check my articles here: Articles

Thanks for reading this article and for your patience. Do let me know your feedback in the comment section. Please share this article; it will give me the motivation to write more blogs for the data science community.

Email id: gakshay1210@gmail.com

Follow me on LinkedIn: LinkedIn

The media shown in this article on Python Packages are not owned by Analytics Vidhya and are used at the Author's discretion.

Responses From Readers

Kris Bradley

Just a note, PyFlux is not maintained by the creator anymore and the package was built as an experiment only. It has not even been officially tested, so be careful using this package (taken from a note from the creator on GitHub).
