Data is growing more complex as the number of data sources increases. A data scientist’s job is to extract actionable insights from this data, but as more and more dimensions are added to it, this is no easy task. Humans perceive the world in three dimensions, so recognizing patterns across thousands, if not millions, of variables is a task we rely heavily on machines for.
But even machines can struggle with this. This is where the awesome technique of dimensionality reduction comes into the picture. In case you haven’t come across this term yet, you can check out AV’s article about it here. As the name suggests, it reduces the number of dimensions in a dataset to make it easier to work with. There are a few different techniques to achieve this, and one of the most common is PCA, or Principal Component Analysis.
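To make the idea concrete, here is a minimal sketch of PCA written with plain NumPy. It is only an illustration of the technique, not how HyperTools implements it: the helper name `pca_reduce` and the synthetic dataset (50 features driven by 3 hidden factors) are assumptions made up for this example.

```python
import numpy as np


def pca_reduce(data, n_components=3):
    """Project the rows of `data` onto its top principal components."""
    # PCA works on mean-centered data.
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix: the rows of `vt` are the principal directions,
    # ordered from most to least variance captured.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Share of the total variance captured by each component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ components.T, explained[:n_components]


# Hypothetical data: 200 samples, 50 features, generated from 3 hidden factors
# plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
data = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

reduced, explained = pca_reduce(data, n_components=3)
print(reduced.shape)      # 50 columns reduced to 3
print(explained.sum())    # fraction of variance kept by the 3 components
```

Because the 50 features here were built from only 3 underlying factors, three components retain nearly all of the variance. Real datasets are rarely this tidy, so it is worth checking the explained variance before trusting a low-dimensional view.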
HyperTools was designed with PCA and data visualization at its core. It’s a Python library for dimensionality reduction-based visual exploration of high-dimensional datasets (or a series of datasets).
How does it work? As input, you feed in the high-dimensional dataset. With a single function call, HyperTools reduces the dimensionality of the data and visualizes it as a plot. The library is built on top of a few popular Python libraries, like scikit-learn, seaborn and of course, matplotlib.
As mentioned by the developers, below are a few main features which HyperTools provides for data scientists:
To install the latest stable version of HyperTools with pip, run the command below:
pip install hypertools
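Once the package is installed, the single-call workflow described earlier might look something like the sketch below. The synthetic clustered data and the helper names `make_clusters` and `plot_reduced` are made up for illustration. The call itself assumes the `hyp.plot(data, '.')` form with the `ndims` argument from the HyperTools documentation, and argument names can differ between versions, so check the docs for the release you install.

```python
import numpy as np


def make_clusters(n_per_cluster=100, n_features=20, seed=0):
    """Build a hypothetical dataset: 3 Gaussian clusters in `n_features` dimensions."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(3, n_features))
    return np.vstack(
        [center + rng.normal(size=(n_per_cluster, n_features)) for center in centers]
    )


def plot_reduced(data):
    """Reduce `data` to 3 dimensions and plot it with one HyperTools call."""
    # Imported here so the data helper above works without HyperTools installed.
    import hypertools as hyp

    # '.' asks for a scatter plot of points; ndims=3 requests a 3D embedding.
    return hyp.plot(data, '.', ndims=3)


data = make_clusters()
print(data.shape)  # 300 samples with 20 features each
```

Calling `plot_reduced(data)` should open a 3D scatter plot in which the three clusters appear as separate clouds of points.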
You can check out the GitHub repository for HyperTools here and also read their research paper here. Also be sure to check out the short video below which introduces this library:
I love this library! Anyone who has handled a dataset with a lot of variables knows what a headache it can be. While performing PCA is often considered necessary, HyperTools makes it so much easier for a data scientist to deal with thousands or even millions of variables.
I’m a huge advocate of visualizing data, so this is quickly becoming one of my favourite libraries. The way it lets you look at your dimensions, in hyperspace and from all angles, is truly awesome. It’s no wonder the library has received almost 1,000 stars so quickly and has become popular in the data science community.
Try out this library and let us know how it worked out for you.
Is there a step-by-step example in Python using the iris dataset?
Hi Jason, thanks for reading the article! The Iris dataset is too small (only 4 variables) for dimensionality reduction techniques to be effective. You would need a much bigger dataset for HyperTools to be truly useful! I would suggest going through the article below and trying it out on the dataset mentioned in it: https://www.analyticsvidhya.com/blog/2016/03/practical-guide-principal-component-analysis-python/