Building additional features & variables through open data sources

Tavish Srivastava Last Updated : 05 Nov, 2024

Power of Analytics

Recently, while travelling, I met a few people who perceived analytics as a passive industry with limited growth. On the contrary, I am always struck by the enormous amount of accessible data available at our fingertips, thanks in large part to search engines! Mining Twitter feeds for sentiment analysis is no longer a hard row to hoe.

Let’s understand this unseen power of analytics through an example.

Suppose you run an international chain of retail stores, say Bresco. You run all sorts of loyalty programs to collect customer data, and you have also tied up with commercial banks. As a result, you get all the necessary data about your customers, ranging from bank account details and card details to demographic information and food preferences. This data can help you create a ‘virtual image’ of each customer. Based on that image, predicting what type of food they will buy next, or their other future purchases, can do wonders for your store.

In this article, we’ll look at freely available sources of information and discuss how they can be used in the context of analytics.

Social Media

Social networks bring out two critical pieces of information that we could not have known otherwise.

First, the unrealized customer preference. Using the behavioural information from a customer’s social media, we can predict what the customer prefers. This can also corroborate what we already know about the customer. For instance, if a customer transacts a lot at restaurants, we might say that the customer is a foodie who likes to visit different restaurants. But this might just be a requirement of his job and not his preference. If the same inference also comes out of his social network, we can be more certain of what the customer really likes and dislikes.

Customer preference can be carved out from the customer’s network (if many of his connections have been referred to as foodies, he might be a foodie as well), the photos he has been tagged in and the places he has checked in (if he is tagged at multiple restaurants, he might be a foodie), his comments, hashtags etc. Social media can bring out such information, which can help us make our products more customer-centric.
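To make the idea concrete, here is a minimal sketch of how the two signals mentioned above (labels on a customer's connections and the places he is tagged in) could be blended into one feature. The function name `foodie_score`, the equal weights and the sample data are all assumptions made for illustration; real social media data would first have to be collected and labelled.

```python
def foodie_score(friend_labels, checkin_types, w_network=0.5, w_checkins=0.5):
    """Blend two social signals into a score between 0 and 1.

    friend_labels: one label per connection, e.g. "foodie" (hypothetical labels).
    checkin_types: one venue type per check-in or photo tag, e.g. "restaurant".
    The weights are illustrative assumptions, not tuned values.
    """
    # Share of connections who have been labelled as foodies.
    network_share = (
        sum(1 for label in friend_labels if label == "foodie") / len(friend_labels)
        if friend_labels
        else 0.0
    )
    # Share of check-ins / photo tags that happened at restaurants.
    checkin_share = (
        sum(1 for kind in checkin_types if kind == "restaurant") / len(checkin_types)
        if checkin_types
        else 0.0
    )
    return w_network * network_share + w_checkins * checkin_share


# Made-up example: 3 of 4 friends are foodies, 3 of 4 check-ins are restaurants.
score = foodie_score(
    friend_labels=["foodie", "foodie", "traveller", "foodie"],
    checkin_types=["restaurant", "restaurant", "gym", "restaurant"],
)
print(score)
```

A score like this can sit next to the transaction-based variables, so the model sees both what the customer spends on and what his social footprint suggests.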

ALSO SEE: Here is an article which can give you a kick start using Twitter Sentiment analysis.

Second, the customer network information. Social media can reveal the kind of network a customer has. Imagine we have a social media management team that can resolve 10,000 customer complaints in a day, but we start getting 100,000 complaints every day on social media. How should we prioritize these complaints? A very simple way is to quantitatively assess the network strength of each customer and address the stronger ones first. For instance, complaints coming from person X will be more important than those from person Y if the people X interacts with are more influential than those Y interacts with.
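One rough way to sketch this prioritization is shown below. It assumes we already have, for each customer, the list of people they interact with and some influence number per person (for example a follower count). The names `network_strength` and `prioritise` and all the figures are hypothetical.

```python
import heapq


def network_strength(customer, contacts, influence):
    """Sum the influence of everyone the customer interacts with."""
    return sum(influence.get(person, 0.0) for person in contacts.get(customer, []))


def prioritise(complaints, contacts, influence, capacity):
    """Return the `capacity` complaints whose authors have the strongest networks."""
    return heapq.nlargest(
        capacity,
        complaints,
        key=lambda c: network_strength(c["customer"], contacts, influence),
    )


# Made-up data: who each customer interacts with, and how influential those people are.
contacts = {"X": ["a", "b"], "Y": ["c", "d"], "Z": ["a"]}
influence = {"a": 900.0, "b": 400.0, "c": 50.0, "d": 20.0}
complaints = [
    {"id": 1, "customer": "Y"},
    {"id": 2, "customer": "X"},
    {"id": 3, "customer": "Z"},
]

# The team can only handle two complaints, so pick the two strongest networks.
top = prioritise(complaints, contacts, influence, capacity=2)
print([c["id"] for c in top])
```

Summing the influence of direct contacts is the simplest possible measure; a fuller network analysis could replace `network_strength` without changing the rest of the sketch.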

ALSO SEE: Here is an article which can give you a kick start using network analysis.

Google API

Google can help us create features in multiple ways. Here we will use Google in two different ways:

First, the direct information which can be extracted from Google. A few examples are as follows:

Google Maps can serve as a revolutionary step in measuring distances between places of interest. In the case of Bresco (see above), we have the locations of our stores and the addresses of our customers. Using this information together with customer preferences collected through social networks, we can recommend to each customer the most suitable offer at the nearest outlet. Until now, the most commonly used method has been the centroid-to-centroid distance between the store’s pin code and the customer’s pin code, which is highly inaccurate given that the area of each pin code is reasonably big.
Google Sheets and other shared drives can be accessed directly using APIs. Many public survey results are shared on Google Drive and can be retrieved through the API.
Google+ was another social network which could be harnessed to bring out relevant customer information (the consumer version was shut down in 2019).
Google Trends can also be used as an input to many time series models to understand the popularity of, and interest in, different products and topics.
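As an illustration of the store-distance feature, the sketch below finds the nearest outlet once customer addresses and store locations have been turned into latitude/longitude pairs. It uses the haversine (straight-line, great-circle) formula as a stand-in, not the road distance a maps service would return, and the coordinates and store names are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_store(customer, stores):
    """Return the name of the store closest to the customer's (lat, lon)."""
    return min(stores, key=lambda name: haversine_km(*customer, *stores[name]))


# Hypothetical coordinates for one customer and two outlets.
customer = (12.97, 77.59)
stores = {"Store A": (12.93, 77.62), "Store B": (13.10, 77.59)}

closest = nearest_store(customer, stores)
print(closest, round(haversine_km(*customer, *stores[closest]), 1), "km")
```

Working from address-level coordinates rather than pin code centroids is what removes most of the error described above; swapping the haversine call for a travel-distance lookup would refine it further.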

Second, the capability of Google can be leveraged directly in our analytics projects. Google is widely regarded as a leader in data science, and we can use its strong algorithms directly. Below are a few ways:

1. Google can auto-correct spellings. In text mining, this is like a superpower which can be leveraged directly. For example, I have a list of cricketers from the year 1970 to 2015 and I want to aggregate all the records made by each cricketer. But the information was typed manually and hence requires cleaning. One of the records reads Mahendra Singh Thoni! Should we combine this with Mahendra Singh Dhoni’s record or not? Of course the answer is yes, but we cannot go to each record and check. So we build an automated system which uses the Google API to search for the keyword and pick up the top 5-10 results. If all these results correspond to a single key (which in this case is MS Dhoni), we replace the original value with that key. Here is a video which can help you write Python code to bring out all the search links for a keyword.

2. Google also knows the popularity of different pages. Using this, we can check the popularity of different pages in different countries, which can reveal a few key trends for each country.

3. Google’s capability to recognize language can also be exploited to impute information in countries like Germany or Japan, where information is entered directly in the local language. This text can then be translated using Google Translate to standardize the entire data set.
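The spelling clean-up described in the first point can be sketched as a small consensus check over search results. The sketch below does not call any real search API: `search` is a hypothetical function you would supply yourself, and `fake_search` merely returns canned result titles so that the example runs offline. The function name `resolve_name` and its parameters are likewise assumptions.

```python
from collections import Counter


def resolve_name(raw_name, search, known_keys, top_n=10, min_share=1.0):
    """Map a possibly misspelt name to a canonical key via search-result consensus.

    search: callable taking a query string and returning a list of result titles.
    known_keys: canonical names we are willing to map to.
    min_share: fraction of the top results that must agree (1.0 means all of them).
    """
    titles = search(raw_name)[:top_n]
    votes = Counter()
    for title in titles:
        for key in known_keys:
            if key.lower() in title.lower():
                votes[key] += 1
                break  # count each result for at most one key
    if not votes:
        return None
    key, count = votes.most_common(1)[0]
    return key if count / len(titles) >= min_share else None


def fake_search(query):
    # Stand-in for a real search API call; returns canned result titles.
    return [
        "MS Dhoni - Wikipedia",
        "MS Dhoni profile and career stats",
        "MS Dhoni: latest news",
        "MS Dhoni records in ODI cricket",
        "MS Dhoni biography",
    ]


resolved = resolve_name(
    "Mahendra Singh Thoni", fake_search, ["MS Dhoni", "Sachin Tendulkar"]
)
print(resolved)
```

Returning `None` when the results disagree keeps doubtful records out of the automatic merge, so they can be reviewed by hand instead. Lowering `min_share` trades some of that safety for coverage.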

Video Sharing Website

Just like Google in search, YouTube is the undisputed worldwide leader among video sharing websites. The YouTube API can be used to find the popularity of videos and thereby the popularity of the topics they cover. All the information on likes, dislikes and comments can be tied together to understand trends in customer preferences.

ALSO SEE: Here is an article which will get you kick-started with harnessing YouTube information.
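As a rough sketch of how such statistics might be tied together, the snippet below rolls per-video counts up to topic level. It assumes the counts have already been fetched and that each video has been assigned a topic; the field names, the `topic_popularity` function and the numbers are illustrative, and dislike counts in particular may not be available for videos you do not own.

```python
from collections import defaultdict

FIELDS = ("views", "likes", "dislikes", "comments")


def topic_popularity(videos):
    """Aggregate per-video statistics into simple per-topic popularity features."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for video in videos:
        bucket = totals[video["topic"]]
        for field in FIELDS:
            bucket[field] += video[field]

    summary = {}
    for topic, t in totals.items():
        reactions = t["likes"] + t["dislikes"]
        summary[topic] = {
            "views": t["views"],
            # Share of reactions that are positive.
            "like_ratio": t["likes"] / reactions if reactions else 0.0,
            # How much discussion the topic generates relative to its audience.
            "comments_per_1k_views": (
                1000 * t["comments"] / t["views"] if t["views"] else 0.0
            ),
        }
    return summary


# Made-up statistics for three videos across two topics.
videos = [
    {"topic": "organic food", "views": 1000, "likes": 90, "dislikes": 10, "comments": 20},
    {"topic": "organic food", "views": 3000, "likes": 270, "dislikes": 30, "comments": 60},
    {"topic": "frozen meals", "views": 2000, "likes": 50, "dislikes": 50, "comments": 10},
]

summary = topic_popularity(videos)
print(summary["organic food"])
```

Computed per week or month, these topic-level numbers become a time series that can be compared with sales of the matching product categories.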

End Notes

My objective in writing this article was to ignite interest in upcoming data sources which can be readily used in different industries without much investment. The information sources stated above are easily accessible and carry massive potential to transform the analytics industry.

Did you find the article useful? Share with us all the new sources of information which you have used in your projects. Also share any links to related videos or articles on leveraging these data sources. Do let us know your thoughts about this article in the box below.

Tavish Srivastava

Tavish Srivastava, co-founder and Chief Strategy Officer of Analytics Vidhya, is an IIT Madras graduate and a passionate data-science professional with 8+ years of diverse experience in markets including the US, India and Singapore, domains including Digital Acquisitions, Customer Servicing and Customer Management, and industries including Retail Banking, Credit Cards and Insurance. He is fascinated by the idea of artificial intelligence inspired by human intelligence and enjoys every discussion, theory or even movie related to this idea.
