DIVE – MIT’s Open Source Tool for Data Exploration and Visualization for Data Scientists

Aishwarya Singh | Last Updated: 24 Jul, 2024


Overview

MIT has unveiled an open source tool, called DIVE, for performing data exploration and visualization
Features include a wide range of charts for visualization, as well as statistical analysis such as regression
We tested the tool and were pretty impressed; check out the results and details below

Introduction

Data cleaning is the most time-consuming process in the data science lifecycle, but data exploration might be the most important one when it comes to building a good model. I have personally seen model accuracy drop significantly when the dataset at hand was not explored properly. It’s critical that we know what the data represents, whether there are any biases, what features we can engineer, and so on. All of this falls under data exploration. And now you don’t even have to write code to do it!

MIT’s research team has built a web-based data exploration system called DIVE that lets you create stories from your data without writing any code. You can have a look at the public version of DIVE here. It brings several analysis and visualization tools together in one interface, which streamlines the exploration process for data scientists. Below is a brief summary of what you can expect from DIVE:

Intelligent Data Ingestion: DIVE can sample the data to infer the types of features and the structure of datasets
Semi-automated Visualization Recommendation: DIVE lets you select fields and recommends relevant visualizations. These visualizations can be sorted by effectiveness, expressiveness, and statistical properties such as correlation, entropy, and Gini
Point-and-click Statistical Analysis: Using DIVE, you can compare group means, explore relationships between fields, and perform statistical analysis with a single click
WYSIWYG Visual Narratives: It provides a ‘what-you-see-is-what-you-get’ editor where you can share stories with interactive content linked to dynamic data
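The article does not say exactly how DIVE computes the statistical properties it uses to sort visualizations, so the following is only a minimal sketch of the kinds of scores named above: Shannon entropy and Gini impurity for a categorical field, and Pearson correlation for a pair of numeric fields. The function names and the toy data are hypothetical and are not part of DIVE’s code or API.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Entropy (in bits) of a categorical field; higher means categories are spread more evenly."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gini_impurity(values):
    """Gini impurity: probability that two randomly drawn items fall in different categories."""
    counts = Counter(values)
    total = len(values)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric fields."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy fields, used only for illustration.
region = ["north", "south", "north", "east", "south", "north"]
ad_spend = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
sales = [100.0, 110.0, 135.0, 160.0, 170.0, 210.0]

print(f"entropy(region) = {shannon_entropy(region):.3f} bits")
print(f"gini(region)    = {gini_impurity(region):.3f}")
print(f"corr(ad_spend, sales) = {pearson_correlation(ad_spend, sales):.3f}")
```

A recommender could, for instance, rank scatter plots of numeric pairs by the absolute value of their correlation, or favour bar charts of categorical fields with higher entropy. That is one plausible design, not necessarily the one DIVE uses.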

When it comes to analysis, the tool currently offers the below 4 options:

Below is a demo video from the team that walks through DIVE, from uploading a dataset to exploring it with the tool. Have a look.

Here are the links to the front-end and back-end repositories provided by the team. For more information about DIVE, you can read their paper published in the proceedings of HILDA 2018.

Our take on this

Of course, this is not the first automated tool in this space. Competition among automated ML tools is fierce, but what makes DIVE stand out is its relatively lightweight interface for quick exploration.

I took DIVE for a test run and it impressed me a lot. It’s easy to use and efficient, and the fact that I don’t have to install anything (it’s web-based) is a major positive. I found the overall process extremely intuitive. Check out the screenshots below, where I uploaded a dataset and analysed it. This one is a simple statistical analysis of the variables in the dataset.
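The article does not list which statistics DIVE’s summary view reports, so as a rough, hypothetical illustration, here is a per-variable summary computed with Python’s standard `statistics` module: count, mean, sample standard deviation, minimum and maximum for each numeric column. The `summarize` helper and the sample columns are assumptions made for this example only.

```python
import statistics


def summarize(columns):
    """Return basic descriptive statistics for each numeric column.

    `columns` maps a column name to a list of at least two numbers
    (a hypothetical input format chosen for this sketch).
    """
    summary = {}
    for name, values in columns.items():
        summary[name] = {
            "count": len(values),
            "mean": statistics.mean(values),
            # Sample standard deviation (n - 1 in the denominator).
            "stdev": statistics.stdev(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Hypothetical sample data.
data = {
    "ad_spend": [10.0, 12.0, 15.0, 18.0, 20.0, 25.0],
    "sales": [100.0, 110.0, 135.0, 160.0, 170.0, 210.0],
}

for column, stats in summarize(data).items():
    print(column, {key: round(value, 2) for key, value in stats.items()})
```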

The one below is a summary of the linear regression model:
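The article does not state how DIVE fits its regression models. As a hedged sketch of the core quantities a typical linear regression summary reports, the code below fits a one-predictor ordinary least squares model in plain Python and returns the intercept, slope and R². The `fit_simple_ols` name and the data are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares and return a small summary."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return {"intercept": intercept, "slope": slope, "r_squared": r_squared}


# Hypothetical data: sales as a function of ad spend.
ad_spend = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
sales = [100.0, 110.0, 135.0, 160.0, 170.0, 210.0]

summary = fit_simple_ols(ad_spend, sales)
for key, value in summary.items():
    print(f"{key:>10}: {value:.3f}")
```

A library such as statsmodels adds standard errors, p-values and confidence intervals to this kind of summary; the sketch keeps only the point estimates and the goodness-of-fit measure.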

If you’re from a non-technical background, I would suggest trying out this tool. You don’t have to write a single line of code! Let me know your experience using it in the comments below.

Subscribe to AVBytes here to get regular data science, machine learning and AI updates in your inbox!

Aishwarya Singh

An avid reader and blogger who loves exploring the endless world of data science and artificial intelligence. Fascinated by the limitless applications of ML and AI; eager to learn and discover the depths of data science.
