Exploratory Data Analysis (EDA) helps us uncover the underlying structure of data and its dynamics, through which we can maximize insights. EDA is also critical for extracting important variables and detecting outliers and anomalies. Even with the many algorithms available in Machine Learning, EDA remains one of the most critical steps in understanding and driving the business.
There are several ways to perform EDA on platforms like Python (matplotlib, seaborn) and R (ggplot2), and there are many good resources on the web, such as “Exploratory Data Analysis” by John W. Tukey and “Exploratory Data Analysis with R” by Roger D. Peng.
In this article, I am going to talk about performing EDA using Kibana and Elasticsearch.
Elasticsearch is an open-source, RESTful, distributed and scalable search engine. Elasticsearch is extremely fast at fetching results for simple or complex queries on large amounts of data (petabytes) because of its simple design and distributed nature. It is also much easier to work with than a conventional database constrained by schemas and tables.
Elasticsearch provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
Installation and initialization are quite simple:
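A sketch of a typical tarball installation, assuming Elasticsearch 5.3.2 (the version used in this tutorial) on Linux/macOS; adjust the version and paths to your setup:

```shell
# Download and unpack the Elasticsearch 5.3.2 tarball
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.3.2.tar.gz
tar -xzf elasticsearch-5.3.2.tar.gz
cd elasticsearch-5.3.2

# Start the instance with the default configuration
./bin/elasticsearch
```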
If you run with the default configuration, the Elasticsearch instance should be reachable at http://localhost:9200 in your browser.
Keep the terminal where Elasticsearch is running open to keep the instance alive. You could also use nohup to run the instance in the background.
Kibana is an open-source data exploration and visualization tool built on Elasticsearch that helps you understand data better. It provides visualization capabilities on top of the content indexed in an Elasticsearch cluster. Users can create bar, line and scatter plots, pie charts and maps on top of large volumes of data.
Installation and initialization are similar to those of Elasticsearch:
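A sketch of the matching Kibana 5.3.2 installation (Linux 64-bit build shown; pick the download matching your OS):

```shell
# Download and unpack Kibana 5.3.2
curl -L -O https://artifacts.elastic.co/downloads/kibana/kibana-5.3.2-linux-x86_64.tar.gz
tar -xzf kibana-5.3.2-linux-x86_64.tar.gz
cd kibana-5.3.2-linux-x86_64

# Start Kibana; it connects to Elasticsearch at http://localhost:9200 by default
./bin/kibana
```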
If you run with the default configuration, the Kibana instance should be reachable at http://localhost:5601 in your browser.
Keep the terminal where Kibana is running open to keep the instance alive. You could also use nohup to run the instance in the background.
There are mainly three steps to creating dashboards using Elasticsearch and Kibana. I will be using the Loan Prediction practice problem data to create a dashboard. Please register for the problem to be able to download the data, and check the data dictionary for more information.
Note: In this article, I will be using Python to read the data and insert it into Elasticsearch for creating visualizations through Kibana.
import pandas as pd

train_data_path = '../loan_prediction_data/train_u6lujuX_CVtuZ9i.csv'
test_data_path = '../loan_prediction_data/test_Y3wMUE5_7gLdaTN.csv'

train = pd.read_csv(train_data_path)
print(train.shape)
test = pd.read_csv(test_data_path)
print(test.shape)
(614, 13) (367, 12)
Elasticsearch indexes data into its internal format and stores documents in a basic data structure similar to a JSON object. Please find below the Python code to insert data into Elasticsearch. Install the pyelasticsearch library as shown below for indexing through Python.
Note: The code assumes that Elasticsearch is running with the default configuration.
pip install pyelasticsearch
from pyelasticsearch import ElasticSearch

CHUNKSIZE = 100

index_name_train = "loan_prediction_train"
doc_type_train = "av-lp_train"
index_name_test = "loan_prediction_test"
doc_type_test = "av-lp_test"

def index_data(data_path, chunksize, index_name, doc_type):
    # Read the CSV in chunks so large files need not fit in memory at once
    csvfile = pd.read_csv(data_path, iterator=True, chunksize=chunksize)
    es = ElasticSearch('http://localhost:9200/')
    # Drop the index if it already exists, then recreate it
    try:
        es.delete_index(index_name)
    except Exception:
        pass
    es.create_index(index_name)
    for i, df in enumerate(csvfile):
        # Replace NaN with None so missing values become JSON nulls
        records = df.where(pd.notnull(df), None).T.to_dict()
        list_records = [records[it] for it in records]
        try:
            es.bulk_index(index_name, doc_type, list_records)
        except Exception:
            print("error! skipping chunk!")
index_data(train_data_path, CHUNKSIZE, index_name_train, doc_type_train) # Indexing train data
index_data(test_data_path, CHUNKSIZE, index_name_test, doc_type_test) # Indexing test data
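The record conversion inside index_data can be illustrated on a toy DataFrame standing in for one chunk of the loan data (the rows below are made up for illustration):

```python
import pandas as pd

# Toy stand-in for a chunk of the train CSV
df = pd.DataFrame({
    "Loan_ID": ["LP001002", "LP001003"],
    "Gender": ["Male", None],          # a missing value, as in the real data
    "ApplicantIncome": [5849, 4583],
})

# NaN is not valid JSON, so replace it with None before indexing;
# .T.to_dict() turns each row into one dict, i.e. one Elasticsearch document
records = df.where(pd.notnull(df), None).T.to_dict()
list_records = [records[it] for it in records]
print(list_records[0]["Loan_ID"])
```

Each dict in list_records is then sent to Elasticsearch by bulk_index as one schema-free JSON document.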
DELETE /loan_prediction_train [status:404 request:0.010s]
DELETE /loan_prediction_test [status:404 request:0.009s]
(The 404 responses are expected on the first run, when there is no existing index to delete.)
Repeat the above four steps for loan_prediction_test. Now Kibana is linked with the train and test data present in Elasticsearch.
Voila!! Dashboard created.
Similarly for the Gender distribution; this time we will use a pie chart.
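Under the hood, Kibana's pie chart runs a terms aggregation on the Gender field. On a toy stand-in for that column (made-up values, not the real dataset), the equivalent computation in pandas is:

```python
import pandas as pd

# Toy stand-in for the Gender column of the indexed train data
gender = pd.Series(["Male", "Male", "Female", "Male", "Female"])

# A terms aggregation is essentially a count per distinct value;
# the normalized counts give the slice sizes of the pie chart
counts = gender.value_counts()
shares = gender.value_counts(normalize=True)
print(counts.to_dict())
```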
Beautiful, isn’t it?
Now I leave you here to explore more of Elasticsearch and Kibana and create various kinds of visualizations.
The search bar lets you explore the data via string queries, which helps in understanding how the data changes with one particular attribute, something that is not easy to do with visualizations alone.
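The search bar accepts Lucene query-string syntax, and the same query can be sent to Elasticsearch programmatically. A minimal sketch of the request body for a query like `Credit_History:0` (building the dict needs no running cluster):

```python
import json

# Equivalent of typing `Credit_History:0` into Kibana's search bar
query = {
    "query": {
        "query_string": {
            "query": "Credit_History:0"
        }
    }
}

# This JSON body would be POSTed to /loan_prediction_train/_search
print(json.dumps(query))
```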
Before
After
Insight: Most of the clients with credit history 0 did not receive a loan (loan status is N for 92.1% of them).
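The 92.1% figure comes from a computation of this shape, illustrated here on a handful of fabricated rows (so the proportion below is not the article's number):

```python
import pandas as pd

# Fabricated rows mimicking the Credit_History and Loan_Status fields
df = pd.DataFrame({
    "Credit_History": [0, 0, 0, 0, 1, 1, 1],
    "Loan_Status":    ["N", "N", "N", "Y", "Y", "N", "Y"],
})

# Share of rejected loans (status N) among clients with no credit history
no_history = df[df["Credit_History"] == 0]
rejected_share = (no_history["Loan_Status"] == "N").mean()
print(f"{rejected_share:.1%}")  # 75.0% on this toy sample
```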
That’s all!!
This article was contributed by Supreeth Manyam (@ziron) as part of The Mightiest Pen, DataFest 2017. Supreeth won the competition and also finished second on the overall leaderboard of DataFest 2017. Supreeth is a passionate Data Scientist keen on bringing insights to business and helping it get better by analyzing relevant data using Machine Learning and Artificial Intelligence.
Not able to import the Elasticsearch thing, throwing an error: from pyelasticsearch import Elasticsearch
AttributeError: 'Elasticsearch' object has no attribute 'create_index' - getting this error
Hi Ankur, please make sure you have installed pyelasticsearch correctly using `pip install pyelasticsearch` and that you are able to import it in Python. The following versions were used in this tutorial: pyelasticsearch 1.2.3, Elasticsearch 5.3.2, Kibana 5.3.2. PS: This error could also be caused by the typos in the article, as mentioned by Osmar Rodriguez. The typos are now fixed!