Lux – Intelligent Visual Discovery of Data Using Python

Mayur Last Updated : 27 Apr, 2021


This article was published as a part of the Data Science Blogathon.

Overview

Introduction
Introduction to the Lux library
Installation of Lux
Setting up the Lux widget in Jupyter Notebook
Importing Important Libraries
User-based data visualization
Automatic Encoding of visualization
EndNote

“Visualization gives you answers to questions you didn’t know you had.”

– Ben Shneiderman

I hope this quote makes an impression on you. It carries a deep meaning: visualizing a whole situation often answers questions we never thought to ask. Why do I open with it? Read on and you will see.

Introduction

Since you are reading this article, you are probably working in data science, and you have surely heard the saying “a picture is worth a thousand words”. As a data scientist, you know how much those words matter in our field. Visualization is part of the whole data science life cycle, and there is a huge number of Python libraries and tools that ease this work.

You have probably also heard the term EDA, which stands for exploratory data analysis. Why am I pointing this out? Because EDA is one of the most important steps in machine learning. Briefly, EDA analyzes the data through visualization methods so that we are able to summarize the whole dataset. Its main task is to provide a better understanding of the variables and the relationships among them.

Suppose you want to visualize a dataset with Python. You would surely reach for well-known tools such as matplotlib, seaborn, or bokeh. If you are a beginner, though, building a visualization with them is not easy: there are many functions to remember, and these tools do not give us intelligent, automatic visualization. In the era of automation we want something new. So, is there a technique that makes the work faster and easier?

Yes, there is. A group of developers built an amazing library that automates the visualization process in a few lines of code. This library is called Lux.

Let’s discuss it.

Lux

Lux is a Python library that data scientists use to explore data and discover something meaningful that can feed into a machine learning project. Its features automate much of the visualization process, saving time and effort.

If you have no idea which visualization to build, Lux suggests some for you. This makes experimenting with data visualization faster.

The visualizations are displayed in an interactive widget, so we can browse through a large collection of them inside a Jupyter notebook.

We will use Jupyter Notebook as the Python environment, so let’s start with the installation of Lux.

Installation of Lux

To install Lux, open a command prompt (or use a Jupyter notebook cell) and run the following command. Note that the package is published on PyPI as lux-api, while the module you import is lux:

pip install lux-api

If you face any problem with the installation, ask me in the comments or read the documentation. After installing Lux, open Jupyter Notebook for the next steps.
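A quick way to confirm that the installation worked is to ask Python whether the module can be located before importing it. The sketch below uses only the standard library. The helper name is_installed is made up for illustration, and it assumes the package exposes a top-level module named lux, which is the import name used later in this article.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "lux" is the import name used later in this article.
print("lux available:", is_installed("lux"))
```

If this prints False, the package was most likely installed into a different Python environment than the one running the notebook.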

Setting up the Lux widget in Jupyter Notebook

To use Lux, we have to activate the luxwidget extension in Jupyter Notebook. Run the following commands in a terminal (or prefix each one with ! to run it from a notebook cell):

jupyter nbextension install --py luxwidget

jupyter nbextension enable --py luxwidget

Now check whether the widget is installed and enabled by running:

jupyter nbextension list

You can also use VSCode, but VSCode only supports luxwidget versions greater than 0.1.2.

Importing Important Libraries

# import Lux
import lux
# import pandas
import pandas as pd

Lux is designed to integrate tightly with pandas, so you can use it without modifying existing code. Next, we trigger the Lux logger. Note that Python's boolean is written True with a capital T; a lowercase true raises a NameError:

lux.logger = True

Now we read the data into a dataframe with pandas. We use a colleges dataset here:

df = pd.read_csv('college.csv')

If we print the dataframe, something new happens:

When the dataframe df is displayed, Lux automatically recommends a set of visualizations highlighting interesting trends and patterns in the dataset. A new toggle option is added to the output, and clicking it switches from the table to a collection of generated graphs and plots.

Notice that the plots in the output are organized into three tabs. These tabs are very useful because they let users steer their own exploration. Here is a brief introduction to each of them.

Correlation

This is the first tab in the output. It shows the relationships between pairs of quantitative features, ordered from the most correlated pair to the least correlated pair. Let’s see what it produces for our dataset.

The tab is described as “Show the relationship between two quantitative attributes”. In plot 1 of 3 there is a correlation between two features: the value of AverageFacultySalary changes noticeably with the median earnings.
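To make the idea of ordering pairs from most to least correlated concrete, here is a minimal sketch in plain Python. It is not Lux's implementation: it assumes Pearson correlation as the score, the helper names pearson and rank_pairs are made up, and the numbers are invented for illustration rather than taken from the colleges dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_pairs(columns):
    """Rank every pair of columns from most to least correlated (by |r|)."""
    scored = [
        ((a, b), pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical values, invented for illustration only.
columns = {
    "AverageFacultySalary": [5000, 6000, 7000, 8000, 9000],
    "MedianEarnings": [30000, 34000, 38000, 42000, 46000],
    "SATAverage": [1000, 1200, 900, 1300, 1100],
}

ranked = rank_pairs(columns)
for pair, r in ranked:
    print(pair, round(r, 2))
```

In this toy data the salary and earnings columns rise together in a straight line, so that pair comes first in the ranking.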

Distribution

This tab shows histograms of the quantitative features, so it is clear how the values of each feature are distributed.
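A histogram simply counts how many values fall into each interval. The sketch below shows that counting step in plain Python. The function name histogram, the choice of equal-width bins, and the sample values are all assumptions made for illustration; Lux chooses its own binning.

```python
def histogram(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals.

    Assumes the values are not all identical (otherwise the width is zero).
    """
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical SAT averages, invented for illustration only.
sat_average = [900, 950, 1000, 1050, 1100, 1100, 1150, 1200, 1300, 1500]
counts = histogram(sat_average, bins=3)
for i, count in enumerate(counts):
    print(f"bin {i}: {'#' * count}")
```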

Occurrence:

This tab shows a set of bar plots, one per categorical feature, counting how often each value occurs.
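The counting behind such a bar plot can be sketched with the standard library's Counter. The Region values below are invented for illustration only; they are not read from the colleges dataset.

```python
from collections import Counter

# Hypothetical Region values, invented for illustration only.
regions = ["New England", "Southeast", "Southeast", "Far West",
           "New England", "Southeast"]

occurrence = Counter(regions)
# most_common() orders categories from most to least frequent.
for region, count in occurrence.most_common():
    print(f"{region:<12} {'#' * count}")
```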

User-based data visualization

We can also visualize features of our own choosing. That is, we can plot exactly the pair of features we want by setting the attributes and values we are interested in through Lux's built-in intent property.

Let us perform the user-based data visualization on our dataset:

df.intent = ["AverageCost","SATAverage"]
df

Suppose we are interested in two features of the dataset, AverageCost and SATAverage. We set those features as the intent, as in the code above.

The current visualization is generated from what the user is interested in. Have you noticed the tabs at the top left? As in the discussion above, they hold sets of recommendations. We discuss them briefly below:

Enhance:

To enhance means to improve. Here, Lux enhances the current visualization by adding an extra feature to it.

We see that if we break down the relationship between SATAverage and AverageCost by HighestDegree, the data distribution differs among Associate, Bachelors, and Graduates.

Filter:

This tab filters the distribution of the user's intent features by the values of another variable.

Here the filter is applied on Region, as you can see in the image.

Generalize:

This tab removes the additional attributes and filters from the plots to display a more general view of the relationships between features.
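The three tabs can be thought of as three ways of editing the current intent: add an attribute, add a filter, or take something away. The sketch below expresses that idea in plain Python. It is a simplified illustration under that assumption, not Lux's actual recommendation logic, and the function names enhance, add_filter, and generalize are made up.

```python
def enhance(intent, all_attributes):
    """Add one attribute that is not yet part of the intent."""
    return [intent + [attr] for attr in all_attributes if attr not in intent]


def add_filter(intent, attribute, values):
    """Add one filter clause (attribute=value) for each candidate value."""
    return [intent + [f"{attribute}={value}"] for value in values]


def generalize(intent):
    """Remove one clause at a time to get a more general view."""
    return [intent[:i] + intent[i + 1:] for i in range(len(intent))]


intent = ["AverageCost", "SATAverage"]
attributes = ["AverageCost", "SATAverage", "HighestDegree"]

print(enhance(intent, attributes))
print(add_filter(intent, "Region", ["New England"]))
print(generalize(intent))
```

Each function returns a list of candidate intents, one per visualization that could be offered in the corresponding tab.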

Automatic Encoding of visualization

The idea behind Lux is that users should always be able to visualize anything they want. You can therefore also create your own visualization with the Vis class.

from lux.vis.Vis import Vis
Vis(["Region=New England", "MedianEarnings"], df)
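The list passed to Vis mixes two kinds of clauses: a plain attribute name (MedianEarnings) and a filter written as attribute=value (Region=New England). The sketch below shows one way to tell them apart. It only handles the = form seen in this article, the function name parse_intent is made up, and Lux's own parsing may accept more than this.

```python
def parse_intent(clauses):
    """Split clause strings into plain attributes and attribute=value filters."""
    attributes, filters = [], {}
    for clause in clauses:
        if "=" in clause:
            name, value = clause.split("=", 1)
            filters[name.strip()] = value.strip()
        else:
            attributes.append(clause.strip())
    return attributes, filters


attributes, filters = parse_intent(["Region=New England", "MedianEarnings"])
print(attributes)
print(filters)
```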

The visualizations can be saved to an HTML file with the following code:

df.save_as_html('File_name.html')

EndNote

I hope you now have a good understanding of the Lux library after reading this article. It is an amazing library to interact with. A friend suggested that I learn about Lux, and that made me want to write an article on it. I hope it was helpful for you.

Thank You.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.



Madhuri Verma

Tremendous!...That's really nice✨🤟

Rex Smith

LUX seems like a very interesting adjunct to pandas. But I get the following errors:
1) Usage: pip install [options] [package-index-options] ... pip install [options] -r [package-index-options] ... pip install [options] [-e] ... pip install [options] [-e] ... pip install [options] ... ambiguous option: --py (--pypi-url, --python-version?)
2) NameError Traceback (most recent call last) in ----> 1 lux.logger = true NameError: name 'true' is not defined
Thanks
