Hacking Google Maps to create distance features in your model / applications

Tavish Srivastava Last Updated : 19 Jul, 2020


This article is going to be different from the rest of my articles published on Analytics Vidhya, both in terms of content and format. I usually lay out my articles so that, after reading, the reader is left to think about how the ideas can be implemented on the ground.

In this article, I will start with a round of brainstorming around a particular type of business problem and then talk about a sample analytics-based solution to these problems. To get the most out of this article, make sure that you follow my instructions carefully.

Let’s start with a few business cases:

Retail bank: Optimize primary bank branch allocation for all customers. The goal is to make sure that the branch allotted to each customer is close to the customer's mailing or permanent address, for their convenience. This is especially relevant when a new branch opens and becomes the closest branch for many existing customers.
Retail store chain: Send special offers to your loyal customers. Offers can be region-specific, so the same offer cannot be sent to everyone. Hence, you first need to find the store closest to each customer and then mail the offer currently applicable at that store.
Credit card company that sells co-branded cards: You wish to find the partner stores closest to your existing client base and then mail those clients appropriate offers.
Manufacturing plant: You wish to find wholesalers near your plant for the components required to manufacture the product.

What is common to all the problems mentioned above? Each of them deals with getting the distance between multiple combinations of source and destination.

Exercise : Think about at least 2 such cases in your current industry and at least 2 cases outside your current industry, and write them in the comment section below.

A common approach

I have worked in multiple domains and have seen this problem solved in a similar fashion, one that gives approximate but quick results.

Exercise : Can you think of a method to do the same using your currently available data and resources?

Here is the approach :

You generally have a PIN CODE for both the source and the destination. Using these PIN CODES, you find the centroid of each region and look up its latitude and longitude. You then calculate the Euclidean distance between the two centroids and use this number as an approximation of the required distance. The following figure explains the process better :

The two marked areas refer to different PIN CODES, and the distance of 10 km is used as the approximate distance between the two points.
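The centroid approach can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the `PIN_CENTROIDS` lookup and its coordinates are illustrative placeholders, not official PIN CODE centroids. It also swaps the plain Euclidean distance for the haversine (great-circle) formula, because a degree of longitude covers less ground as you move away from the equator.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical lookup table: PIN CODE -> (latitude, longitude) of its centroid.
PIN_CENTROIDS = {
    "110001": (28.63, 77.22),
    "122001": (28.46, 77.03),
}


def pin_distance_km(pin_a, pin_b):
    """Approximate the distance between two addresses by their PIN CODE centroids."""
    lat1, lon1 = PIN_CENTROIDS[pin_a]
    lat2, lon2 = PIN_CENTROIDS[pin_b]
    return haversine_km(lat1, lon1, lat2, lon2)


print(round(pin_distance_km("110001", "122001"), 1))
```

In practice the lookup table would come from whatever PIN CODE reference data you have available.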

Exercise : Can you think of challenges with this approach ?

Here are a few I can think of :

If the point of interest is far away from the centroid, this approach will give inaccurate results.
Sometimes the centroid of a neighbouring PIN CODE is closer to the point of interest than the centroid of its own PIN CODE. But because the point falls within its own PIN CODE area, we still approximate it with that more distant centroid.
In cases where we need finer distances than the precision of the PIN CODE demarcation, this method leads nowhere. Imagine a scenario where two branches of a bank and the customer's address are all located in the same PIN CODE: we have no way to find the closest branch.
The distance calculated is a point-to-point distance, not the distance by road. Imagine a scenario where two PIN CODES are right next to each other but are separated by a valley that you need to drive around to reach the destination.
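The same-PIN-CODE problem is easy to reproduce. The sketch below uses made-up data (one hypothetical PIN CODE and centroid, two hypothetical branches) to show that the centroid approach cannot separate the two branches: both end up at exactly the same distance from the customer.

```python
import math

# Hypothetical centroid for a single PIN CODE (illustrative coordinates only).
PIN_CENTROIDS = {"560001": (12.976, 77.603)}


def centroid_distance(pin_a, pin_b):
    """Euclidean distance, in degrees, between the centroids of two PIN CODES."""
    (lat1, lon1), (lat2, lon2) = PIN_CENTROIDS[pin_a], PIN_CENTROIDS[pin_b]
    return math.hypot(lat1 - lat2, lon1 - lon2)


# Both branches and the customer share the same PIN CODE.
branch_pins = {"Branch A": "560001", "Branch B": "560001"}
customer_pin = "560001"

distances = {
    name: centroid_distance(customer_pin, pin)
    for name, pin in branch_pins.items()
}
print(distances)  # both branches get the same distance, so there is no way to rank them
```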

A manual Approach

Say you have two branches and a single customer. How will you decide which of the two branches is closer? Here is a step-by-step approach :

You choose the first branch-customer pair.
You feed the two addresses into Google Maps.
You pick the on-road distance/time.
You fill in the distance in the table of combinations (2 in this case).
You repeat the same process for the other combination.
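The manual steps above amount to filling in a table of customer-branch combinations and then picking the smallest distance per customer. Here is a rough sketch of that bookkeeping. The function names, the `MANUAL_LOOKUP` values and the customer and branch labels are all hypothetical; the hand-entered numbers stand in for what you would read off Google Maps.

```python
from itertools import product


def build_distance_table(customers, branches, road_distance_km):
    """One row per customer-branch combination, using the supplied distance function."""
    return [
        {"customer": c, "branch": b, "distance_km": road_distance_km(c, b)}
        for c, b in product(customers, branches)
    ]


def closest_branch(table):
    """Map each customer to the branch with the smallest distance in the table."""
    best = {}
    for row in table:
        current = best.get(row["customer"])
        if current is None or row["distance_km"] < current["distance_km"]:
            best[row["customer"]] = row
    return {customer: row["branch"] for customer, row in best.items()}


# Hypothetical hand-entered distances, as if read manually from a map.
MANUAL_LOOKUP = {
    ("Customer 1", "Branch A"): 4.2,
    ("Customer 1", "Branch B"): 2.7,
}

table = build_distance_table(
    ["Customer 1"],
    ["Branch A", "Branch B"],
    lambda customer, branch: MANUAL_LOOKUP[(customer, branch)],
)
print(closest_branch(table))
```

Passing the distance function in as an argument means the hand-entered lookup can later be replaced by an automated one without changing the rest of the logic.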

How to automate this approach?

Obviously, this process cannot be done manually for millions of customers and thousands of branches. But it can be automated well (although the Google Maps API caps the total number of searches). Simple Python functions can calculate the distance and travel time between two points on Google Maps; the complete code is given at the end of this article.

Exercise : Create a table with a few sources and destinations. Use these functions to find distance and time between those points. Reply “Done without support” if you are able to implement the code without looking at the rest of the solution.

Here is how we can read in a table of different source-destination combinations :

Notice that we have all types of combinations here. Combination 1 is a combo of two cities. Combo 4 is a combination of two detailed addresses. Combo 6 is a combination of a city and a monument. Let’s now try to get the distances and times and check if they make sense.
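The original table is not reproduced here, so the sketch below shows one plausible way to load a table of this shape using only the standard library. The column names `Source` and `Destination` match the ones used in the complete code further down; the three sample rows are hypothetical stand-ins for the kinds of combinations described above, not the author's actual data.

```python
import csv
import io

# Hypothetical sample data: a city pair, two detailed addresses, a city and a monument.
SAMPLE_CSV = """Source,Destination
Delhi,Mumbai
"MG Road, Bangalore","Indiranagar, Bangalore"
Agra,Taj Mahal
"""


def read_pairs(handle):
    """Read (source, destination) pairs from a CSV with Source and Destination columns."""
    return [
        (row["Source"].strip(), row["Destination"].strip())
        for row in csv.DictReader(handle)
    ]


pairs = read_pairs(io.StringIO(SAMPLE_CSV))
for source, destination in pairs:
    print(source, "->", destination)
```

For a real file, you would pass `open("cities.csv", newline="")` instead of the in-memory string.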

All the distance and time calculations in this table look accurate.

Exercise : What are the benefits of using this approach over the PIN CODE approach mentioned above? Can you think of a better way to do this task?

Here is the complete Code :

[stextbox id=”grey”]

import os
from datetime import datetime

import googlemaps
import pandas as pd

def finddist(source, destination):
    # Return the driving distance as text, or None if no route is found.
    gmaps = googlemaps.Client(key='XXX')  # replace XXX with your own API key
    now = datetime.now()
    directions_result = gmaps.directions(source, destination, mode="driving", departure_time=now)
    for map1 in directions_result:
        overall_stats = map1['legs']
        for dimensions in overall_stats:
            distance = dimensions['distance']
            return distance['text']
    return None

def findtime(source, destination):
    # Return the driving time as text, or None if no route is found.
    gmaps = googlemaps.Client(key='XXX')  # replace XXX with your own API key
    now = datetime.now()
    directions_result = gmaps.directions(source, destination, mode="driving", departure_time=now)
    for map1 in directions_result:
        overall_stats = map1['legs']
        for dimensions in overall_stats:
            duration = dimensions['duration']
            return duration['text']
    return None

os.chdir(r"C:\Users\Tavish\Desktop")
cities = pd.read_csv("cities.csv")

# Start with empty object columns so they can hold the text results.
cities["distance"] = None
cities["time"] = None
# Loop over every row instead of a hard-coded count, and write with .loc
# to avoid pandas chained-assignment problems.
for i in range(len(cities)):
    source = cities.loc[i, 'Source']
    destination = cities.loc[i, 'Destination']
    cities.loc[i, 'distance'] = finddist(source, destination)
    cities.loc[i, 'time'] = findtime(source, destination)
[/stextbox]
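The functions above return the human-readable `text` fields, which are awkward to use directly as model features. Each leg of a Directions API response also carries numeric `value` fields (metres for distance, seconds for duration). The sketch below pulls those out. The helper name `leg_metrics` is hypothetical, and `mock_result` is a hand-written stand-in with the same shape as a response, not real API output.

```python
def leg_metrics(directions_result):
    """Return (distance_km, duration_min) for the first leg of the first route, or None."""
    if not directions_result:
        return None
    legs = directions_result[0].get("legs", [])
    if not legs:
        return None
    leg = legs[0]
    distance_km = leg["distance"]["value"] / 1000.0  # metres -> kilometres
    duration_min = leg["duration"]["value"] / 60.0   # seconds -> minutes
    return distance_km, duration_min


# Hand-written stand-in shaped like a directions response (not real API output).
mock_result = [
    {
        "legs": [
            {
                "distance": {"text": "12.3 km", "value": 12300},
                "duration": {"text": "25 mins", "value": 1500},
            }
        ]
    }
]

print(leg_metrics(mock_result))
```

Numeric columns like these can be fed straight into a model or sorted to pick the nearest branch, with no string parsing needed.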

End Notes

The Google Maps API comes with a few limitations on the total number of searches. Have a look at the documentation if you see a use case for this algorithm.

Did you find the article useful? Share with us more use cases of the Google Maps API apart from the ones mentioned in this article. Also share any links to related videos or articles on leveraging the Google Maps API. Do let us know your thoughts about this article in the box below.
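One simple way to stay within such limits is to avoid looking up the same source-destination pair twice. The sketch below wraps any lookup function in a cache. The wrapper name `make_cached_lookup` and the `fake_fetch` stand-in are hypothetical; in real use you would pass in a function that calls the API.

```python
from functools import lru_cache


def make_cached_lookup(fetch):
    """Wrap a (source, destination) lookup so repeated pairs are fetched only once."""
    calls = {"count": 0}

    @lru_cache(maxsize=None)
    def cached(source, destination):
        calls["count"] += 1  # counts only the lookups that actually reach `fetch`
        return fetch(source, destination)

    return cached, calls


def fake_fetch(source, destination):
    """Stand-in for a real API call; returns a made-up label instead of a distance."""
    return f"{source} to {destination}"


lookup, calls = make_cached_lookup(fake_fetch)
lookup("Delhi", "Mumbai")
lookup("Delhi", "Mumbai")   # served from the cache, no second fetch
lookup("Delhi", "Agra")
print(calls["count"])
```

With many customers sharing the same addresses or PIN CODES, deduplicating pairs before calling the API can cut the number of searches considerably.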

If you like what you just read & want to continue your analytics learning, subscribe to our emails, follow us on twitter or like our facebook page.

Tavish Srivastava

Tavish Srivastava, co-founder and Chief Strategy Officer of Analytics Vidhya, is an IIT Madras graduate and a passionate data-science professional with 8+ years of diverse experience in markets including the US, India and Singapore, domains including Digital Acquisitions, Customer Servicing and Customer Management, and industry including Retail Banking, Credit Cards and Insurance. He is fascinated by the idea of artificial intelligence inspired by human intelligence and enjoys every discussion, theory or even movie related to this idea.


sumalatha

Is it a sophisticated implementation of any of the methods that are used to solve transportation problems (like modi method, VAM method etc)?

Hi Sumalatha, This article is on how to find distances between two points and has nothing to do with transportation problem frameworks. However, you can use the method described in this article to find distances between points and then use algorithms like VAM and Modi, to optimize routes. Thank you for sharing this thought. This just adds a new business case to the list I have mentioned in this article : Using distance mapping techniques in transportation problems. Tavish

Sunny

First of all, thanks a lot for sharing this wonderful article. It really helps students like me to think beyond text-books. I was trying to replicate this project and came up with some doubts (may be very silly). I will be obliged if you can answer it: a) While generating the Key from Google Maps - which one (Directions API/Distance Matrix API/any other) to select. Else if, the key is totally different than the three options above, can you please include a brief process to generate the Key. b) Will this code work for source and destination as latitude and longitude. If not, can something be done to calculate distance based on these. Thanks a lot in advance! It really helps :)
