Data science is still a nascent field in India, despite the recent surge in interest. From agriculture to healthcare, the Government faces a plethora of challenges on a day-to-day basis, and that was the primary reason for founding a data science department under the NITI Aayog initiative. On this Independence Day, we thought: what better way to acquaint our community with these challenges, and with how our Government is using data science to tackle them, than to bring NITI Aayog’s Head of Data Science straight to you?
It was a thrilling experience to have one of India’s foremost data science leaders, Dr. Avik Sarkar, on our DataHack Radio podcast. He is an eloquent speaker and he talked about various topics, from his love of mathematics to his master’s and Ph.D thesis techniques. He also provided details about the work performed by the data science team under the NITI Aayog initiative, a must-listen for all Indians.
In this article, we look at the key points Dr. Avik made during his conversation with Kunal. Happy listening!
Dr. Avik’s penchant for numbers can be traced back to his childhood. He has been interested in mathematics since his school days, and that led him to do his bachelor’s in statistics and his master’s (from IIT-Bombay) in applied statistics and informatics. He also holds a Ph.D in computer science and statistics. As you can surmise from this, he was the perfect candidate for data science!
Before he joined NITI Aayog as the Head of the Data Science cell, Dr. Avik worked in senior roles at companies like Accenture, IBM and Nokia Siemens, among others. A trend emerges when you look at his profile – he worked with data well before data science became a buzzword, and thus has a very strong background in this domain.
When Dr. Avik was learning and working on Artificial Intelligence, the experience was quite different from how we see AI these days. This is what he had to say about how quickly the world of data science, machine learning and AI is advancing:
“In this domain, learning new things is something you have to do every year. It’s a rapidly evolving field – new technologies, new platforms and new coding languages come every year, so getting acquainted with these is very important.”
The subject of Dr. Avik’s master’s thesis was multi-topic text classification. He took this up because it was an important topic, given the hierarchical arrangement of information that was prevalent at that time (early 2000s). The main aim of the hierarchy was to arrange whatever text data you had into categories – it could be news articles, blogs, etc.
The internet was getting democratized as more and more Indians (and global users) started getting online in the late 90s/early 2000s. So, suddenly we went from seeing a few editors putting content online to a plethora of writers gaining access to the internet. The amount of content spiked, nothing close to what we see now, but enough to ensure that one could not manually categorize the articles into a hierarchy.
Dr. Avik saw a need for an automatic classification system that would identify these topics and put them into a hierarchy model. The more challenging problem, which he took up, was that some articles might be relevant for multiple topics.
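To make the "multiple topics per article" idea concrete, here is a minimal sketch of multi-topic (multi-label) classification. It is not the method from Dr. Avik's thesis, which the podcast summary does not specify; it assumes a simple one-vs-rest naive Bayes scorer, and the training sentences, topic names and function names (`train`, `predict`, `TRAINING_DOCS`) are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each article carries a SET of topics, not just one.
TRAINING_DOCS = [
    ("cricket match score", {"sports"}),
    ("election vote parliament", {"politics"}),
    ("minister attends cricket match", {"sports", "politics"}),
    ("budget tax market", {"business"}),
]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def train(docs):
    """Collect, for every topic, word counts inside and outside that topic."""
    topics = set().union(*(labels for _, labels in docs))
    vocab = {word for text, _ in docs for word in tokenize(text)}
    model = {}
    for topic in topics:
        pos, neg = Counter(), Counter()
        n_pos = n_neg = 0
        for text, labels in docs:
            if topic in labels:
                pos.update(tokenize(text))
                n_pos += 1
            else:
                neg.update(tokenize(text))
                n_neg += 1
        model[topic] = (pos, neg, n_pos, n_neg)
    return model, vocab


def predict(model, vocab, text):
    """Return every topic whose smoothed log-odds for the text is positive."""
    assigned = set()
    for topic, (pos, neg, n_pos, n_neg) in model.items():
        # Prior log-odds of the topic, with add-one smoothing.
        score = math.log((n_pos + 1) / (n_neg + 1))
        pos_total = sum(pos.values()) + len(vocab)
        neg_total = sum(neg.values()) + len(vocab)
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log((pos[word] + 1) / pos_total)
            score -= math.log((neg[word] + 1) / neg_total)
        if score > 0:
            assigned.add(topic)
    return assigned


model, vocab = train(TRAINING_DOCS)
print(sorted(predict(model, vocab, "cricket match election vote")))
# prints ['politics', 'sports']
```

Each topic gets its own yes/no decision, so an article can land in zero, one or several categories. Placing those categories into a hierarchy, as described above, would be a further step that this sketch does not attempt.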
His Ph.D was in text mining and statistical modeling on text distribution. If you are interested in NLP, do listen to this section where Dr. Avik explains why and how he took up this topic. He discusses the nuances of various techniques he used and how they helped him build up his study. It makes for fascinating listening!
“We are trying to make sense of the operational data to get a good picture about the state of the economy.”
The data science team at NITI Aayog, as Dr. Avik put it, is more of a horizontal organization. The range of analytics he and his team perform is vast. Even though he had over fifteen years of experience working with data prior to joining the Government, this was an almost entirely new body of work for him.
There is a lot of simulation and scenario modeling that they need to perform. He gave some really intuitive examples of how the team thinks about certain industries (like oil and automobiles), and the variables to consider when forecasting production and manufacturing. This qualifies as long-term forecasting.
The team also uses analytics for short-term challenges, which are operational in nature. For example, malnutrition is a major problem in India (and has been for decades). They extract insights about which districts need more funds to deal with the issue, and this has helped the people on the ground.
There are other areas where data science is helping the government tackle long-standing challenges. Taking the example of surveys, Dr. Avik explained how there is a lag of 2-3 years between initiating surveys and finally extracting meaningful insights from them. His current team at NITI Aayog is trying to do more of a real-time analysis of these things, especially in critical fields like healthcare, education and agriculture.
“80% of my day goes in phone calls!”
Data collection, as Kunal pointed out, would be a major obstacle for Dr. Avik’s team. All of the things they are doing are fairly new from an Indian perspective and nothing so far has been done in a systematic or structured manner. As the above quote summarized, he spends most of his day trying to convince people to share their data.
Often there are data quality issues. Since most of the data is operational, people assume it might not be used anywhere and hence it’s stored in a very unfocused manner. A lot of the fields need to be dropped because of the serious gaps in data quality. The hope is that with time, as Dr. Avik continues his work, departments will soon realize the need to properly store this data.
A lack of data also inevitably leads to biases in the model you build. Unfortunately this is a problem India faces in almost all sectors. Mitigating these issues has become a big challenge as well and Dr. Avik pointed out this is the biggest obstacle he had to deal with.
For energy modeling, a long-term initiative (it takes up to 1-2 years), the MESSAGE and TIMES-MARKAL models are the team’s tools of choice. For generating visualizations and dashboards to be shared with state governments, the team uses popular tools like:
Different countries have their unique challenges when it comes to adopting AI. For India, Dr. Avik believes it’s the obstacle of inclusion, or “AI for all”. This is what his team is piloting throughout the country.
Taking the example of healthcare, he explained how automating certain parts of a nurse and/or doctor’s job will help cut down on the time it takes to make a diagnosis, as well as spread the benefits of healthcare to rural places. Intriguing is the only word I can think of to describe the task Dr. Avik and his team are dealing with.
The podcast also includes details about how the team is working on certain agricultural issues throughout the country. This includes factors like yield, fertilizer, weather patterns, etc. Where does all the data come from, you ask? Most of it is collected through satellite imagery and then broken down to analyze and extract certain patterns. This helps them inform the farmers 2-3 weeks in advance that, for example, potato prices are going down, so don’t sow potatoes for now.
If you are a data scientist (or aspiring to be one) working in India, this podcast is a treasure trove of information. This article only covers the key takeaways – there are a ton of awesome nuggets in the podcast you are sure to find useful, like the reception to data science at NITI Aayog, the ranking system Dr. Avik’s team has pioneered, etc.
The power of data science isn’t just limited to research labs and big tech companies. It was truly inspiring to hear the different issues Dr. Avik’s team is trying to solve. I hope to see our community leverage data science for good causes in the foreseeable future. 🙂 And of course, happy Independence Day to everyone!