Guide to Land Cover Classification using Google Earth Engine

Soumyadarshani Dash Last Updated : 10 Jul, 2024

Introduction

Land segmentation is important in remote sensing and geographic information systems (GIS) for analyzing and classifying different land cover types in satellite imagery. This guide will walk you through building a land segmentation model using Google Earth Engine (GEE) and integrating it with Python for enhanced functionality. By the end of this guide, you'll understand how to load satellite imagery, process it, and apply machine learning techniques for land cover classification.

Learning Objective

Understand how to set up and authenticate the Google Earth Engine (GEE) API for geospatial analysis.
Learn to retrieve and preprocess satellite imagery, including cloud masking, using GEE.
Gain the ability to calculate the Normalized Difference Vegetation Index (NDVI) for assessing vegetation health.
Acquire skills in preparing training data and applying k-means clustering for land cover classification.
Develop proficiency in visualizing geospatial data and classification results using Folium.
Implement error handling to ensure the reliability and robustness of satellite imagery processing code.

This article was published as a part of the Data Science Blogathon.

Introduction to Google Earth Engine
Setting Up Your Environment
Retrieving and Preprocessing Satellite Imagery
Cloud Masking
Calculating NDVI
Training Data Preparation
K-Means Clustering for Land Cover Classification
Visualization
Error Handling
Future Applications
Frequently Asked Questions

Introduction to Google Earth Engine

Google Earth Engine is a cloud-based platform for planetary-scale environmental data analysis. It combines a multi-petabyte catalog of satellite imagery and geospatial datasets with powerful processing capabilities. GEE is widely used for remote sensing tasks like land segmentation because of its robust processing capabilities and extensive data library.

In this guide, we’ll walk through the process of land cover classification using Landsat imagery and GEE in Python. We’ll classify land cover into different classes using k-means clustering. Here’s what we’ll cover:

Setting up Google Earth Engine
Retrieving and Preprocessing Satellite Imagery
Cloud Masking
Calculating NDVI (Normalized Difference Vegetation Index)
Training Data Preparation
K-Means Clustering for Land Cover Classification
Visualization

Google Earth Engine provides all the data used in this model.

Setting Up Your Environment

First, install the Earth Engine API and authenticate your account using the following code:

# Install the Earth Engine API and Folium (notebook syntax)
!pip install earthengine-api folium

import ee
import folium

# Authenticate, then initialize with your own Google Cloud project ID
ee.Authenticate()
ee.Initialize(project='ee-dashsoumyadarshani')

The Earth Engine API is a powerful geospatial analysis platform developed by Google, providing access to a vast archive of satellite imagery and geospatial datasets. It allows users to perform large-scale processing and analysis of remote sensing data using Google's infrastructure.
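A common pattern is to try initializing first and fall back to interactive authentication only when that fails. The sketch below assumes that ordering suits your environment. `init_earth_engine` is a hypothetical helper, not part of the Earth Engine API, and it takes the `ee` module as an argument so the logic can be exercised without credentials.

```python
def init_earth_engine(ee_module, project):
    """Initialize Earth Engine, authenticating only if needed.

    ee_module: the imported `ee` package (passed in to keep this testable).
    project:   your Google Cloud project ID.
    Returns True if authentication had to be run, False otherwise.
    """
    try:
        ee_module.Initialize(project=project)
        return False
    except Exception:
        # Typically reached when no stored credentials are available
        ee_module.Authenticate()
        ee_module.Initialize(project=project)
        return True
```

In a notebook this would be called as `init_earth_engine(ee, 'your-project-id')` after `import ee`.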

This pop-up warns that any resources created using the API may be deleted if the API is disabled, and all code utilizing this project’s credentials to call the Google Earth Engine API will fail.

The background displays detailed metrics for various methods, including ListAlgorithms, ListOperations, ListAssets, and CreateMap, with their respective request counts, errors, and average latencies. The data indicates low usage and error rates, with latencies generally under half a second, except for CreateMap, which has a higher average latency of 1.038 seconds.

The “APIs & Services” dashboard on the Google Cloud Platform provides an overview of the API’s traffic, errors, and latency. According to the dashboard, there were 64 requests made to the Google Earth Engine API, with a 10.94% error rate, equating to 7 errors. The median latency stands at 229 milliseconds, while the 95th percentile latency reaches up to 2.656 seconds, indicating some variability in response times. The traffic and error graphs illustrate peaks at specific times, suggesting periods of higher activity or potential issues.

The Earth Engine API is a powerful tool for monitoring a range of environmental factors, such as activity, vegetation health, and land cover changes, using satellite imagery and geospatial data. This capability lets users analyze and track dynamic phenomena on Earth's surface over time, providing critical insights for environmental monitoring and management.

Retrieving and Preprocessing Satellite Imagery

Define your Area of Interest (AOI) and fetch Landsat imagery:

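# Area of interest as [west, south, east, north] in degrees (New York)
aoi = ee.Geometry.Rectangle([-73.96, 40.69, -73.92, 40.71])

# Fetch Landsat 8 surface reflectance imagery over the AOI.
# Note: Collection 1 (C01) is deprecated in Earth Engine; Collection 2
# (LANDSAT/LC08/C02/T1_L2) uses different band and QA names (SR_B*, QA_PIXEL).
landsat = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR') \
    .filterBounds(aoi) \
    .filterDate('2020-01-01', '2024-05-30')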

We use Landsat 8 imagery from the LANDSAT/LC08/C01/T1_SR dataset. Landsat 8, launched in 2013, is a satellite managed jointly by NASA and the U.S. Geological Survey (USGS). It carries two sensors: the Operational Land Imager (OLI), which captures data in nine spectral bands including visible, near-infrared, and shortwave infrared, and the Thermal Infrared Sensor (TIRS), which captures data in two thermal bands.

This dataset contains atmospherically corrected surface reflectance and brightness temperature derived from the data produced by these sensors. The bands most relevant to land cover work are:

Band 2 (Blue)
Band 3 (Green)
Band 4 (Red)
Band 5 (Near Infrared, NIR)
Band 6 (Shortwave Infrared 1, SWIR1)
Band 7 (Shortwave Infrared 2, SWIR2)

These bands are crucial for various remote sensing applications, including accurate assessment of different land cover types, cloud masking, and calculation of indices like NDVI for vegetation analysis. Together, these spectral bands support the comprehensive analysis needed for accurate land cover classification and vegetation assessment.
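Before processing, it can help to estimate how large the AOI rectangle is and how many 30 m Landsat pixels it covers. The sketch below is a rough approximation that assumes a spherical Earth with about 111.32 km per degree of latitude. `aoi_size` is a hypothetical helper written for this illustration and does not call Earth Engine.

```python
import math

def aoi_size(west, south, east, north, pixel_m=30):
    """Approximate width/height (km) and pixel count of a lon/lat rectangle."""
    mid_lat = math.radians((south + north) / 2)
    km_per_deg_lat = 111.32  # rough spherical-Earth value
    km_per_deg_lon = 111.32 * math.cos(mid_lat)  # shrinks away from the equator
    width_km = (east - west) * km_per_deg_lon
    height_km = (north - south) * km_per_deg_lat
    cols = int(width_km * 1000 / pixel_m)
    rows = int(height_km * 1000 / pixel_m)
    return width_km, height_km, cols * rows

# Same corner coordinates as the AOI rectangle used in this guide
width_km, height_km, n_pixels = aoi_size(-73.96, 40.69, -73.92, 40.71)
print(f"{width_km:.1f} km x {height_km:.1f} km, about {n_pixels} pixels at 30 m")
```

Under these assumptions the AOI comes out at a few kilometres on each side, which gives a sense of how many pixels are available for sampling.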

Cloud Masking

Cloud masking is the process of identifying and removing clouds and their shadows from satellite images to ensure clearer and more accurate analysis.

Create a function to mask clouds and apply it to the image collection:

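def maskL8sr(image):
    # In the pixel_qa band, bit 3 flags cloud shadow and bit 5 flags cloud
    cloudShadowBitMask = (1 << 3)
    cloudsBitMask = (1 << 5)
    qa = image.select('pixel_qa')
    # Keep a pixel only when both flags are zero (clear conditions)
    mask = qa.bitwiseAnd(cloudShadowBitMask).eq(0).And(
        qa.bitwiseAnd(cloudsBitMask).eq(0))
    return image.updateMask(mask)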

# Apply cloud masking function to the image collection
landsat = landsat.map(maskL8sr)

In remote sensing, clouds can obscure the Earth's surface, leading to incorrect data interpretation. By applying cloud masking, we filter out these unwanted elements, allowing us to focus on the actual land features and perform precise tasks like land segmentation.

In our project, cloud masking is crucial because it helps eliminate interference from clouds, ensuring that our analysis and classification of land cover types are based on reliable and unobstructed imagery.

We create a function to mask clouds using the pixel quality attributes from the Landsat 8 images and apply this function to the entire image collection to ensure clearer, more accurate analysis. This step is essential for removing cloud and cloud shadow interference in our land cover classification process.
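The bit test inside `maskL8sr` can be tried on plain integers to see which pixels would be kept. This is a minimal sketch of the same logic in pure Python. The QA values are made up for illustration and are not read from a real scene, and `is_clear` is a hypothetical name.

```python
CLOUD_SHADOW_BIT = 1 << 3  # bit 3: cloud shadow
CLOUD_BIT = 1 << 5         # bit 5: cloud

def is_clear(qa_value):
    """Return True when neither the cloud nor the cloud-shadow bit is set."""
    return (qa_value & CLOUD_SHADOW_BIT) == 0 and (qa_value & CLOUD_BIT) == 0

# Illustrative QA integers (invented for this example)
samples = {
    0b000000000: "no flags set",
    0b000001000: "shadow bit set",
    0b000100000: "cloud bit set",
    0b000101000: "both bits set",
    0b101000010: "other bits set, cloud/shadow bits clear",
}
for value, note in samples.items():
    print(f"{value:>3} ({note}): keep={is_clear(value)}")
```

Only the two flagged bits matter to the mask. Any other bits in the QA value are ignored, which is why the last sample is still kept.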

Calculating NDVI

Build a median composite from the masked collection and calculate NDVI from it:

median_landsat = landsat.median()
ndvi = median_landsat.normalizedDifference(['B5', 'B4']).rename('NDVI')
median_landsat_with_ndvi = median_landsat.addBands(ndvi)

We calculate the Normalized Difference Vegetation Index (NDVI) from the near-infrared (NIR) and red bands of the median composite. NDVI is a key indicator of vegetation health and density, and it is calculated as follows:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$$

where:

NIR is the reflectance in the near-infrared band (Band 5 for Landsat 8).
Red is the reflectance in the red band (Band 4 for Landsat 8).

NDVI helps distinguish vegetated areas from non-vegetated ones. Higher NDVI values indicate healthier vegetation, which helps in accurately classifying land cover types, particularly in distinguishing between vegetation and urban or barren areas.
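The normalized difference can be checked by hand on a few numbers. The sketch below applies the same NIR/red ratio in plain Python. The reflectance pairs are invented for illustration, and returning `None` for a zero denominator is an assumption of this example rather than a description of how Earth Engine handles such pixels.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns None when undefined."""
    total = nir + red
    if total == 0:
        return None  # avoid division by zero for empty pixels
    return (nir - red) / total

# Illustrative reflectance pairs (invented values, not measurements)
pixels = {
    "dense vegetation": (0.50, 0.05),
    "bare soil": (0.30, 0.25),
    "water": (0.02, 0.04),
}
for label, (nir, red) in pixels.items():
    print(f"{label}: NDVI = {ndvi(nir, red):.2f}")
```

Values near +1 correspond to strong NIR reflectance relative to red, values near 0 to similar reflectance in both bands, and negative values to surfaces that reflect more red than NIR.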

NDVI makes it possible to use satellite data to provide consistent, reliable, and expansive insights into the Earth's vegetative landscapes.

Training Data Preparation

Prepare training data by sampling pixels from the image:

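# Sample up to 1,000 pixels at 30 m resolution inside the AOI.
# The Python API takes keyword arguments here rather than a dict.
training = median_landsat_with_ndvi.select(['B4', 'B3', 'B2', 'NDVI']).sample(
    region=aoi,
    scale=30,
    numPixels=1000
)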

Prepare the training data by sampling pixels from the image. We select specific bands and calculate NDVI for each pixel, then sample these values over the defined AOI. This process involves extracting a representative set of pixels, which are used to train our clustering algorithm for land cover classification. The training data includes a specified number of pixels, ensuring a robust dataset for accurate model training.
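Conceptually, sampling draws a random subset of pixels so the clusterer trains on a manageable set instead of every pixel. The sketch below illustrates that idea on a stand-in grid in plain Python. `sample_pixels` is a hypothetical helper, and it is not a description of how Earth Engine implements `sample` internally.

```python
import random

def sample_pixels(pixels, num_pixels, seed=0):
    """Return up to num_pixels items chosen at random without replacement."""
    rng = random.Random(seed)  # fixed seed makes the draw repeatable
    k = min(num_pixels, len(pixels))
    return rng.sample(pixels, k)

# Stand-in "image": a 50 x 40 grid of (row, col) pixel coordinates
grid = [(r, c) for r in range(50) for c in range(40)]
training_pixels = sample_pixels(grid, 1000)
print(len(training_pixels), "of", len(grid), "pixels sampled")
```

Fixing the seed keeps the training set, and therefore the clusters, reproducible between runs.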

K-Means Clustering for Land Cover Classification

Perform k-means clustering on the training data:

num_clusters = 5
clusterer = ee.Clusterer.wekaKMeans(num_clusters).train(training)
result = median_landsat_with_ndvi.cluster(clusterer)

Perform k-means clustering on the training data to classify land cover types. This involves using the extracted pixel values, including the spectral bands and calculated NDVI, as input features for the clustering algorithm. K-means clustering groups the pixels into a specified number of clusters based on their spectral similarities, allowing us to categorize different land cover types such as urban areas, vegetation, water bodies, bare soil, and mixed land cover areas. This unsupervised machine learning technique helps identify distinct land cover classes without prior label information.
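To make the grouping step concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm) on a handful of invented feature vectors. It is only an illustration of the idea: `ee.Clusterer.wekaKMeans` runs a Weka implementation on Google's servers with its own initialization and options, and the function below is not that code.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Toy features: (red reflectance, NDVI) for six invented pixels
features = [(0.05, 0.80), (0.06, 0.75), (0.04, 0.85),
            (0.30, 0.10), (0.28, 0.05), (0.32, 0.12)]
centroids, labels = kmeans(features, k=2)
print(labels)
```

The three high-NDVI pixels end up in one cluster and the three low-NDVI pixels in the other. Which cluster gets the number 0 depends on the random start, which is why cluster numbers need to be interpreted after the fact.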

Visualization

Visualize the original and clustered images using Folium:

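# Folium has no built-in Earth Engine support, so register a tile-layer helper
def add_ee_layer(self, ee_image_object, vis_params, name):
    map_id_dict = ee.Image(ee_image_object).getMapId(vis_params)
    folium.raster_layers.TileLayer(
        tiles=map_id_dict['tile_fetcher'].url_format,
        attr='Google Earth Engine',
        name=name,
        overlay=True
    ).add_to(self)

folium.Map.add_ee_layer = add_ee_layer

# Visualization of the original (true-color) median composite
map_before = folium.Map(location=[40.70, -73.94], zoom_start=12)

vis_params_before = {
    'bands': ['B4', 'B3', 'B2'],
    'min': 0,
    'max': 3000,
    'gamma': 1.4
}

map_before.add_ee_layer(median_landsat_with_ndvi, vis_params_before, 'Median Image with NDVI')
map_before.add_child(folium.LayerControl())
map_before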

New York

# Visualization of clustered image
map_after = folium.Map(location=[40.70, -73.94], zoom_start=12)

vis_params_after = {
    'min': 0,
    'max': num_clusters - 1,
    'palette': ['red', 'green', 'blue', 'orange', 'gray']
}

map_after.add_ee_layer(result, vis_params_after, 'Clustered Image')
map_after.add_child(folium.LayerControl())
map_after

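The color palette assigns one color to each cluster index. Because k-means is unsupervised, the cluster numbers carry no inherent meaning, so compare the clustered map with the true-color image before labeling each cluster. The intended interpretation of the colors is: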

Red often represents urban or built-up areas due to their high reflectance in the visible red band, making it easy to identify high-density regions like cities or towns.
Green typically indicates vegetation, such as forests, grasslands, and agricultural fields, which have high reflectance in the near-infrared band and high NDVI values.
Blue is commonly used to depict water bodies, including rivers, lakes, and oceans, as water has low reflectance in most bands.
Orange represents bare soil or sparse vegetation, characterized by moderate reflectance in visible bands and lower NDVI values compared to dense vegetation.
Gray is used for areas not easily classified into other categories, such as mixed land cover types, shadowed regions, or barren lands with very low vegetation cover.
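Since cluster numbers are arbitrary, one way to decide which cluster should get which color is to summarize each cluster by its mean NDVI and rank them. The sketch below does this in plain Python on invented values. `mean_ndvi_by_cluster` is a hypothetical helper and the data are not taken from the New York result.

```python
def mean_ndvi_by_cluster(cluster_ids, ndvi_values):
    """Average NDVI of the pixels assigned to each cluster id."""
    sums, counts = {}, {}
    for cid, value in zip(cluster_ids, ndvi_values):
        sums[cid] = sums.get(cid, 0.0) + value
        counts[cid] = counts.get(cid, 0) + 1
    return {cid: sums[cid] / counts[cid] for cid in sums}

# Invented sample: cluster id and NDVI for eight pixels
cluster_ids = [0, 0, 1, 1, 2, 2, 2, 1]
ndvi_values = [0.05, 0.10, 0.75, 0.80, -0.20, -0.30, -0.25, 0.70]

means = mean_ndvi_by_cluster(cluster_ids, ndvi_values)
# Rank clusters from most to least vegetated before choosing colors
ranked = sorted(means, key=means.get, reverse=True)
print(ranked)
```

In this toy data the cluster with the highest mean NDVI would be the natural candidate for green, and the one with the lowest (negative) mean for blue.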

Error Handling

Adding error handling to the code makes it more robust and reliable:

try:
    # Code for retrieving and processing satellite imagery
    median_landsat = landsat.median()
    ndvi = median_landsat.normalizedDifference(['B5', 'B4']).rename('NDVI')
    median_landsat_with_ndvi = median_landsat.addBands(ndvi)
except Exception as e:
    print(f"An error occurred: {e}")
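Earth Engine builds most requests lazily, so calls such as `median()` mainly describe a computation, and server-side failures tend to surface only when a result is actually requested (for example when map tiles are fetched or `getInfo()` is called). One way to keep track of where a failure happened is to wrap each step and report its name. The sketch below shows the pattern with stand-in steps in plain Python. `run_step` is a hypothetical helper and no Earth Engine calls are made.

```python
def run_step(name, func, *args, **kwargs):
    """Run one pipeline step; return (result, error_message)."""
    try:
        return func(*args, **kwargs), None
    except Exception as exc:
        # Report which step failed instead of losing the context
        return None, f"{name} failed: {exc}"

def toy_ndvi(nir, red):
    # Stand-in for a processing step; raises ZeroDivisionError on 0, 0
    return (nir - red) / (nir + red)

ok_result, ok_error = run_step("ndvi", toy_ndvi, 0.5, 0.1)
print(ok_result, ok_error)

bad_result, bad_error = run_step("ndvi", toy_ndvi, 0.0, 0.0)
print(bad_result, bad_error)
```

Returning the error alongside the result lets the caller decide whether to stop, retry, or skip a step, instead of only printing a message.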

We also applied the same land cover classification model to the San Francisco area to evaluate its effectiveness in a different urban environment. Using the same process of retrieving Landsat imagery, cloud masking, NDVI calculation, and k-means clustering, we classified the land cover into five distinct types.

The resulting map shows a clear distinction between urban areas, vegetation, water bodies, bare soil, and mixed areas, demonstrating the model’s ability to segment diverse land cover types accurately. Below is the output image for San Francisco:

Future Applications

This land segmentation model can be extended and improved in several ways to address a range of future challenges:

Environmental Monitoring: Continuously monitor changes in vegetation health, urban expansion, and water bodies.
Disaster Management: Assess damage from natural disasters like floods and wildfires by comparing pre- and post-event imagery.
Agricultural Planning: Monitor crop health and predict yields using vegetation indices.
Urban Planning: Analyze land use changes and plan sustainable urban expansion.
Climate Change Studies: Track long-term changes in land cover and their correlation with climate data.

By leveraging Google Earth Engine's data processing capabilities and integrating them with Python, we can build robust models to address these challenges, providing valuable insights to researchers, policymakers, and planners.

Conclusion

This guide has walked you through the process of land cover classification using Google Earth Engine and Python. By retrieving and preprocessing satellite imagery, applying cloud masking, calculating NDVI, preparing training data, and using k-means clustering, we’ve classified land cover types in both New York and San Francisco. This methodology applies to various other regions and datasets, enabling the analysis of land cover changes, environmental monitoring, and urban planning. It allows for the classification of different land cover types and provides valuable insights into spatial patterns and dynamics.

Key Takeaways

The land segmentation model supports environmental monitoring, disaster management, agricultural planning, urban planning, and climate change studies.
GEE provides a cloud-based platform for accessing and processing large volumes of satellite imagery and geospatial data.
You can adjust the land cover classification strategy for different regions and datasets by modifying parameters such as the region of interest and date ranges.
NDVI distinguishes healthy vegetation from other land cover types, crucial for accurate classification and monitoring.
Combining GEE with Python enhances the development of robust land cover classification models, offering valuable insights for various stakeholders.

Frequently Asked Questions

Q1. What is land segmentation, and why is it important?

A. Land segmentation, also known as land cover classification, involves dividing a geographic area into segments based on land cover types such as vegetation, urban areas, water bodies, and bare soil. This process is crucial for environmental monitoring, urban planning, agriculture, and disaster management. It helps in understanding land use patterns, tracking changes over time, and making informed decisions for sustainable development.

Q2. How does Google Earth Engine facilitate land segmentation?

A. GEE provides a cloud-based platform with extensive satellite imagery and geospatial datasets. This enables efficient large-scale analyses for complex land segmentation tasks.

Q3. How is NDVI used in land segmentation?

A. NDVI is a key indicator of vegetation health and density. It is calculated using the reflectance values in the near-infrared (NIR) and red bands of satellite imagery. In land segmentation, NDVI helps distinguish vegetated areas from non-vegetated ones. Higher NDVI values indicate healthier vegetation, which helps in accurately classifying land cover types, particularly in distinguishing between vegetation and urban or barren areas.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

