One of the common challenges faced by analytics professionals is answering geo- and radius-related questions such as:
In all these scenarios, we need to implement a radius search to get an answer. We can use our favorite analytical package, such as R, SAS, or even PostgreSQL. However, when we move this type of analytics into a live application that requires fast responses, the infrastructure engineering can become tricky.
AWS CloudSearch, a cloud search service from Amazon Web Services, can help with some of these problems. It is a hosted search platform that can be used to search large collections of data such as web pages, document files, forum posts, or product information. Search indexing technologies such as Lucene have existed for a long time, but AWS CloudSearch relies on the AWS platform, which is used to power Amazon's own shopping search engine. This means that many of the kinks that usually need to be ironed out between a dev server and live operations have already been worked out.
AWS CloudSearch indexes and searches both structured data and plain text. Some of the features are:
We can get search results in JSON or XML, sort and filter results based on field values, and sort results alphabetically, numerically, or according to custom expressions.
We can follow these broad steps to build a search domain in AWS CloudSearch:
vHomeInsurance collected detailed data on home insurance and property values across the US and compared it within specific geographic and regional areas. For example, the data can be shown as a heat map of house prices across the US. Consumers and analysts need to understand where to live within those pockets, given the property values, home insurance rates, and other factors.
For example, if you want to live in Atlanta, you want to identify the cheapest home insurance in Atlanta and nearby locations. AWS CloudSearch helps you do that using geo-location searching. Here is an example data set for California, including places close to Los Angeles, with home insurance rates and property values.
City | Home Insurance | Property Value | Number of Homes | Zipcode | Lat | Long |
Los Angeles | $642 | $470,000 | 1419626 | 90001 | 33.7866 | -118.2987 |
San Diego | $635 | $426,100 | 861451 | 92104 | 32.7397 | -117.1293 |
San Jose | $672 | $659,100 | 592151 | 95101 | 37.3435 | -121.8887 |
Sacramento | $606 | $242,100 | 431564 | 94203 | 38.5854 | -121.4925 |
San Francisco | $687 | $750,900 | 375861 | 94103 | 37.7731 | -122.411 |
San Bernardino | $602 | $217,800 | 240838 | 92401 | 34.1054 | -117.2912 |
Fresno | $602 | $217,100 | 232708 | 93650 | 36.8419 | -119.7952 |
Ontario | $624 | $355,700 | 195756 | 91758 | 34.0635 | -117.6503 |
We index the above table using the CloudSearch API within the search domain. The indexed fields can include home insurance rates, home value, number of homes, city, state, zip code, population, and latitude/longitude details.
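As a rough illustration of the indexing step, the sketch below turns rows of the table into a JSON document batch. It assumes the `type`/`id`/`fields` batch layout of the CloudSearch 2013 document API. The field names, the `zip_` id scheme, and the `toBatch` helper are hypothetical, and the actual upload call is omitted. Coordinates are kept as plain decimal degrees here; a real index would have to store them in whatever encoding the search query expects.

```javascript
// Two rows taken from the table above (property names are illustrative).
const rows = [
  { city: "Los Angeles", insurance: 642, value: 470000, homes: 1419626, zip: "90001", lat: 33.7866, lon: -118.2987 },
  { city: "Ontario", insurance: 624, value: 355700, homes: 195756, zip: "91758", lat: 34.0635, lon: -117.6503 },
];

// Build one "add" operation per row.
// Assumption: the zip code is unique per row, so it can serve as the document id.
function toBatch(rows) {
  return rows.map((r) => ({
    type: "add",
    id: "zip_" + r.zip,
    fields: {
      city: r.city,
      home_insurance: r.insurance,
      property_value: r.value,
      number_of_homes: r.homes,
      zipcode: r.zip,
      lat: r.lat,
      long: r.lon,
    },
  }));
}

const batch = toBatch(rows);
console.log(JSON.stringify(batch, null, 2));
```

The resulting array is what would be sent to the domain's document endpoint; keeping the conversion in a pure function makes it easy to inspect the batch before uploading anything.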
The geo-location search shown here for AWS CloudSearch uses the spherical law of cosines. A brief explanation of the formula follows.
The spherical law of cosines is a simpler alternative to the haversine formula for calculating the distance between two latitude-longitude points. With 64-bit floating-point numbers, it gives well-conditioned results down to distances as small as around 1 metre. In view of this, in most situations it is probably worth using either the simpler law of cosines or the more accurate ellipsoidal Vincenty formula in preference to haversine.
Law of cosines:
d = acos( sin φ1 ⋅ sin φ2 + cos φ1 ⋅ cos φ2 ⋅ cos Δλ ) ⋅ R
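// Degrees-to-radians helper (JavaScript numbers have no built-in toRadians method)
var toRadians = function (deg) { return deg * Math.PI / 180; };
var φ1 = toRadians(lat1), φ2 = toRadians(lat2),
    Δλ = toRadians(lon2 - lon1), R = 6371; // R: mean Earth radius in km
var d = Math.acos( Math.sin(φ1)*Math.sin(φ2) + Math.cos(φ1)*Math.cos(φ2) * Math.cos(Δλ) ) * R; // distance in km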
Lat/Lon in degrees (spreadsheet-style functions, result in km):
d = ACOS( SIN(lat1*PI()/180)*SIN(lat2*PI()/180) + COS(lat1*PI()/180)*COS(lat2*PI()/180)*COS(lon2*PI()/180-lon1*PI()/180) ) * 6371
The formula above can be used to compute the distance from one geo-location to every document location in the search domain and return only the documents that fall within the specified radius.
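To make the radius search concrete, here is a minimal client-side sketch of the same idea, run over the table above instead of a search domain. The names `distanceMiles` and `cheapestWithin` are made up for illustration, and the clamp before `Math.acos` is an added safeguard that the formula itself does not include.

```javascript
const EARTH_RADIUS_KM = 6371;
const KM_TO_MILES = 0.6214;
const toRadians = (deg) => (deg * Math.PI) / 180;

// Spherical law of cosines between two points given in degrees; returns miles.
function distanceMiles(lat1, lon1, lat2, lon2) {
  const p1 = toRadians(lat1);
  const p2 = toRadians(lat2);
  const dLon = toRadians(lon2 - lon1);
  const c = Math.sin(p1) * Math.sin(p2) + Math.cos(p1) * Math.cos(p2) * Math.cos(dLon);
  // Rounding can push c slightly outside [-1, 1] (e.g. for identical points),
  // which would make Math.acos return NaN, so clamp it first.
  return Math.acos(Math.min(1, Math.max(-1, c))) * EARTH_RADIUS_KM * KM_TO_MILES;
}

// Rows from the table above (insurance in dollars).
const cities = [
  { city: "Los Angeles", insurance: 642, lat: 33.7866, lon: -118.2987 },
  { city: "San Diego", insurance: 635, lat: 32.7397, lon: -117.1293 },
  { city: "San Jose", insurance: 672, lat: 37.3435, lon: -121.8887 },
  { city: "Sacramento", insurance: 606, lat: 38.5854, lon: -121.4925 },
  { city: "San Francisco", insurance: 687, lat: 37.7731, lon: -122.411 },
  { city: "San Bernardino", insurance: 602, lat: 34.1054, lon: -117.2912 },
  { city: "Fresno", insurance: 602, lat: 36.8419, lon: -119.7952 },
  { city: "Ontario", insurance: 624, lat: 34.0635, lon: -117.6503 },
];

// Keep documents inside the radius, cheapest insurance first.
function cheapestWithin(docs, lat, lon, radiusMiles) {
  return docs
    .filter((d) => distanceMiles(lat, lon, d.lat, d.lon) <= radiusMiles)
    .sort((a, b) => a.insurance - b.insurance);
}

const nearby = cheapestWithin(cities, 33.7866, -118.2987, 50);
console.log(nearby.map((d) => d.city)); // [ 'Ontario', 'Los Angeles' ]
```

With the table's coordinates, only Ontario (roughly 42 miles away) and Los Angeles itself fall inside the 50-mile radius; San Bernardino is just over 60 miles away and is excluded.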
Here is an example search query to find the cheapest areas for home insurance within a 50-mile radius of Los Angeles:
dis_rank="&rank-dis=acos(sin(latitude)*sin(3.141*lat/(1000000*180))%2Bcos(latitude)*cos(3.141*lat/(1000000*180))*cos(longitude-(-3.141*(long-18100000)/(100000*180))))*6371*0.6214" ;
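threshold="&t-dis=..radius";
For example, for Los Angeles: latitude=33.7866, longitude=-118.2987, radius=50 (miles). Because the query passes latitude and longitude directly to sin and cos, convert them to radians before substituting them. Here lat and long are the indexed document fields, and the factor 0.6214 converts kilometres to miles. With these values, the query returns all documents within 50 miles of Los Angeles.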
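The substitution can be sketched as a small helper that fills the placeholders in the query strings above. The templates are copied verbatim from the article, including the scaling constants, which depend on how `lat` and `long` were indexed. `buildGeoParams` is a hypothetical name, and converting the search point to radians is an assumption based on the placeholders being passed straight to `sin` and `cos`.

```javascript
// Query templates from the article; `latitude`, `longitude` and `radius` are placeholders.
const rankTemplate =
  "&rank-dis=acos(sin(latitude)*sin(3.141*lat/(1000000*180))%2Bcos(latitude)*cos(3.141*lat/(1000000*180))*cos(longitude-(-3.141*(long-18100000)/(100000*180))))*6371*0.6214";
const thresholdTemplate = "&t-dis=..radius";

const toRadians = (deg) => (deg * Math.PI) / 180;

// Fill in the search point (given in degrees) and the radius (in miles).
function buildGeoParams(latDeg, lonDeg, radiusMiles) {
  const lat = toRadians(latDeg).toFixed(6);
  const lon = toRadians(lonDeg).toFixed(6);
  // Word boundaries keep the indexed field names `lat` and `long` untouched.
  const rank = rankTemplate
    .replace(/\blatitude\b/g, lat)
    .replace(/\blongitude\b/g, lon);
  const threshold = thresholdTemplate.replace("radius", String(radiusMiles));
  return rank + threshold;
}

const params = buildGeoParams(33.7866, -118.2987, 50);
console.log(params);
```

The returned string would be appended to the rest of the search request; only the placeholders change, so the expression itself stays exactly as written above.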
The simplicity and consistency of AWS CloudSearch let you do geo analytics on the fly, instead of building custom infrastructure and an entire team to support and maintain it.
We hope this article helps you with geo-location analytics and geo-radius questions.
---
This article has been contributed by vHomeInsurance.com. vHomeInsurance.com (www.vhomeinsurance.com) analyzes home insurance rates, home values and other factors to help home owners make better decisions about their insurance.