Data cleaning is the most time-consuming step in the data science lifecycle, but data exploration might be the most important one when it comes to building a good model. I have personally seen model accuracy drop significantly when the dataset at hand was not explored properly. It’s critical that we know what the data represents, whether it contains any biases, what features we can engineer, and so on. All of this falls under data exploration. And now you don’t even have to write code to do it!
A research team at MIT has built a web-based data exploration system called DIVE that lets you create stories from your data without writing any code. You can have a look at the public version of DIVE here. It showcases the integration of advanced tools in MIT Data Science, streamlining the process for data scientists. Below is a brief summary of what you can expect from DIVE:
When it comes to analysis, the tool currently offers the following four options:
Below is a demo video by the team that walks through DIVE, from uploading a dataset to exploring it in the tool. Have a look.
Here are the links to the front-end repository and the back-end repository provided by the team. For more information about DIVE, you can read their paper, published in the proceedings of HILDA 2018.
Of course, this is not the first automated tool in this space. The competition in automated ML is fierce, but what makes DIVE stand out is its relatively lightweight appearance, which suits quick exploration.
I took DIVE for a test run and it impressed me a lot. It’s easy to use and extremely efficient, and the fact that I don’t have to install anything (it’s web-based) is a major positive. I found the overall process extremely intuitive. Check out the screenshots below, where I uploaded the dataset and analysed the data. This one is a simple statistical analysis of the variables in the dataset.
The one below is a summary of the linear regression model:
If you’re from a non-technical background, I would suggest trying out this tool. You don’t have to write a single line of code! Let me know about your experience with it in the comments below.