The purpose of this project is to develop a Python program that automates the process of monitoring and tracking changes across multiple websites. We aim to streamline the meticulous task of detecting and documenting modifications in web-based content by utilizing Python. This capability is invaluable for real-time news tracking, immediate product updates, and conducting competitive analyses. As the digital landscape evolves rapidly, identifying website changes is essential for maintaining continuous awareness and comprehension.
Learning Objectives
Our learning objectives for this project cover the following components: scraping web pages with the requests and BeautifulSoup libraries, capturing baseline snapshots of page content, detecting changes through data comparison, sending email and SMS notifications, and logging and reporting the results.
We aim to devise a Python application in this project to oversee and catalog alterations on select websites. This application will incorporate web scraping, data comparison, change notifications, logging, and reporting.
The main aim of this project is to streamline the process of keeping tabs on specific websites. By crafting a Python application, we plan to track and catalog changes on a website of interest. This tool will offer timely updates about recent modifications in news articles, product listings, and other web-based content. Automating this tracking process will be time-saving and ensure immediate awareness about any modifications or additions made to the website.
To implement this project successfully, we will follow a high-level approach that involves the following steps: extract the page links from each target website, capture a baseline snapshot of the page content, compare subsequent snapshots against the baseline to detect changes, notify the team when changes are found, and log every run for reporting. A minimal sketch of how these steps fit together follows.
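This sketch is illustrative only: the helper functions named here (extract_links, capture_snapshot, compare_with_baseline, notify, log_run) are hypothetical placeholders standing in for the routines we build step by step in the rest of this article.
# Hypothetical driver sketch: each helper is a stub standing in for a routine
# developed later in this article (link extraction, baseline capture,
# comparison, notification, logging).

def extract_links(home_url):
    return []          # later: scrape <a> tags from the home page

def capture_snapshot(page_url):
    return ''          # later: download and save the page text

def compare_with_baseline(page_url, snapshot):
    return None        # later: diff the snapshot against the stored baseline

def notify(page_url, changes):
    pass               # later: send an email or SMS alert

def log_run(page_url, changes):
    pass               # later: record the outcome for reporting

def monitor_site(home_url):
    for page in extract_links(home_url):
        snapshot = capture_snapshot(page)
        changes = compare_with_baseline(page, snapshot)
        if changes:
            notify(page, changes)
        log_run(page, changes)

monitor_site('https://www.superdupertennis.com/')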
Imagine a company that gathers information about kids’ activities from numerous websites and consolidates them onto their own website. However, manually tracking changes on each website and updating their own platform accordingly poses significant challenges. This is where our specialized tool comes in to save the day, providing an efficient solution for overcoming these obstacles.
Examples of Monitored Websites:
We monitor various websites to curate information on kids’ activities. Here are a few examples:
This organization offers engaging programs such as lessons, camps, and parties to introduce children aged 2 to 7 to the world of tennis. Their focus is teaching tennis fundamentals, promoting fitness and coordination, and fostering good sportsmanship.
This performing arts school in Jersey City provides high-quality dance, voice, and acting classes. They cater to students of all skill levels, nurturing their creativity and confidence in a supportive and inspiring environment.
Renowned in Jersey City, this institution offers dance education for all ages and skill levels. With a diverse range of dance genres and the organization of performances and community outreach programs, they contribute to the local arts scene and foster an appreciation for dance.
We have developed a Python-based solution that utilizes web scraping techniques to automate the process. Our tool periodically monitors the selected websites to detect changes in the information about kids’ activities. Once a change is identified, the tool seamlessly updates the company’s website, consistently reflecting the most up-to-date information.
In addition to updating the website, our tool maintains a detailed log of changes, providing valuable data for analysis and reference purposes. It can also be configured to send real-time notifications, keeping the company’s team informed of any detected changes. Utilizing our tool allows the company to streamline its operations, ensuring its website always showcases the latest information from multiple sources.
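How often the tool runs is a deployment choice rather than part of the scraping logic. Below is a minimal sketch, assuming a simple in-process loop; the run_monitoring_cycle stub and the six-hour interval are illustrative placeholders, and a cron job or task scheduler would serve equally well.
import time

def run_monitoring_cycle():
    # Placeholder for one full scrape-compare-notify pass; the real logic
    # is developed in the sections that follow.
    print("Checking monitored sites for changes...")

# Illustrative interval: one cycle every 6 hours. Tune it to the sites'
# update frequency and keep the request rate polite.
CHECK_INTERVAL_SECONDS = 6 * 60 * 60

while True:
    run_monitoring_cycle()
    time.sleep(CHECK_INTERVAL_SECONDS)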
It is important to note that web scraping activities may have legal and ethical implications. Before engaging in any scraping activities, it is essential to verify whether the target websites permit scraping or obtain necessary permissions from the website owners. Adhering to the websites’ terms of service and respecting their policies is crucial. Additionally, it is important to be mindful of the frequency of requests and avoid practices that may disrupt the website’s operations. Always approach web scraping carefully and follow best practices to ensure a positive and compliant experience.
The first step is to extract the page links from the target website’s home page. Here’s a code snippet that uses Python’s BeautifulSoup library to extract the page links from the home page of Super Duper Tennis:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import pandas as pd

# URL of the home page
url = 'https://www.superdupertennis.com/'

# Retrieve the HTML content via a GET request
response = requests.get(url)
html_content = response.text

# Parse the HTML content with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# Extract the page links from anchor tags (<a>)
links = soup.find_all('a')

# Create a list to store the data
data = []
for link in links:
    # Remove leading/trailing whitespace from the page name
    page_name = link.text.strip()
    # Use get() to retrieve the 'href' attribute safely
    web_link = link.get('href')
    # Only keep anchors that actually have an 'href' attribute
    if web_link:
        # Construct the absolute web link with urljoin
        complete_link = urljoin(url, web_link)
        data.append({
            'Service Provider': 'Super Duper Tennis',
            'Page Name': page_name,
            'Web Link': complete_link
        })

# Create a pandas DataFrame from the collected data
df = pd.DataFrame(data)
To establish a baseline for future comparisons, perform an initial data capture by scraping the desired content from the website and storing it in a data structure such as a database or file. Here is the continuation of the previous code:
import time
from datetime import date

# Iterate over the provider, page name, and web link for each row
for a, b, c in zip(df['Service Provider'].to_list(),
                   df['Page Name'].to_list(),
                   df['Web Link'].to_list()):
    url = c
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
                             'AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/39.0.2171.95 Safari/537.36'}

    # Pause between requests to avoid overloading the site
    time.sleep(60)

    # Download the page
    response = requests.get(url, headers=headers)

    # Parse the downloaded page
    soup = BeautifulSoup(response.text, "lxml")

    # Discard scripts and styles
    for script in soup(["script", "style"]):
        script.extract()

    # Keep only the visible text, split into lines
    current_ver = soup.get_text().splitlines()

    # Save today's snapshot to a text file named after the provider, page, and date
    file_name = r"PATH\{}\{}\{}_{}_{}.txt".format(
        a, b, a, b, date.today().strftime('%Y-%m-%d'))
    with open(file_name, "a", encoding="utf-8") as file:
        file.write("\n".join(current_ver))
Using the pandas DataFrame, we iterate over the rows to access the service provider, page name, and web link. These variables are assigned to a, b, and c, respectively.
We set the url variable within the iteration to the web link (c). Additionally, we define the headers variable for the requests library.
To introduce a reasonable delay between requests, we use the time.sleep() function to pause for 60 seconds.
Next, we send a GET request to download the page’s content using the specified URL and headers. The response is stored in the response variable.
Using BeautifulSoup with the "lxml" parser, we parse the downloaded homepage and extract the text content. Scripts and styles are removed from the parsed content.
The extracted text is split into lines and assigned to the current_ver variable.
Finally, we open a file in append mode and write the current_ver text content to it. The file name is constructed from the service provider, page name, and current date. This captured data will serve as the baseline for future comparisons in the website monitoring project.
During subsequent executions, we retrieve the updated webpage content and contrast it with our stored baseline data to identify any deviations or alterations. Here’s a continuation of the previous script:
import glob
import os
import difflib

# DataFrame that will collect one row per monitored page
change_logs = pd.DataFrame()

for provider, page, link in zip(df['Service Provider'].to_list(),
                                df['Page Name'].to_list(),
                                df['Web Link'].to_list()):
    # Gather all saved snapshots for this provider/page,
    # newest first (sorted by file creation time)
    files = glob.glob(r"PATH\{}\{}\*.txt".format(provider, page))
    files_sorted = sorted(files, key=os.path.getctime, reverse=True)

    # Read the most recent snapshot and the one before it
    with open(files_sorted[0], 'r', encoding="utf-8") as f:
        current_content = f.readlines()
    with open(files_sorted[1], 'r', encoding="utf-8") as f:
        prior_content = f.readlines()

    # Compare the two snapshots line by line
    comparison = difflib.context_diff(current_content, prior_content,
                                      n=3, lineterm='\n')
    compared_text = "\n".join([line.rstrip() for line in
                               '\n'.join(comparison).splitlines() if line.strip()])

    # Date of the prior snapshot, taken from its file name
    prior_date = files_sorted[1].split('_')[2].split('.')[0]
    today = date.today().strftime('%Y-%m-%d')

    if compared_text == '':
        change_description = ('No alterations detected on ' + today +
                              ' compared to ' + prior_date)
    elif "We couldn't find the page you were looking for" in compared_text:
        change_description = ('URL modified on ' + today +
                              ' compared to ' + prior_date)
    else:
        change_description = ('Alterations detected on ' + today +
                              ' compared to ' + prior_date)

    # Append this page's result to the change log
    temp_log = pd.DataFrame({'Service Provider': pd.Series(provider),
                             'Section': pd.Series(page),
                             'Changes': pd.Series(change_description),
                             'Link': pd.Series(link)})
    change_logs = pd.concat([change_logs, temp_log], ignore_index=True)
We create an empty DataFrame called change_logs to store the details of any identified changes. Using the pandas DataFrame, we iterate over the rows to fetch the service provider, page name, and webpage link. Denote these as provider, page, and link.
Inside the loop, we gather a collection of files matching the pattern of the previously saved files. This collection is sorted according to the file creation time, with the most recent file coming first.
We then read the content of the current and prior files for comparison. The difflib.context_diff() function performs the comparison, storing the result in the comparison variable.
Depending on the content of compared_text, we can determine if there are any changes or if specific messages indicate the page is missing or the URL has changed.
Subsequently, we construct the change_description variable, noting the date and the reference date of the prior file for comparison. Using the retrieved data, we generate a temporary DataFrame, temp_log, which includes the service provider, page name, change description, and webpage link.
Finally, we add temp_log to the change_logs DataFrame, which gathers the details of all detected changes.
Employ a notification mechanism to alert the user upon detecting changes. You can use Python libraries or external APIs for notification delivery. First, import the necessary libraries for sending notifications, and depending on the method you choose, you may need to install additional libraries or APIs.
For email notifications, we will utilize the smtplib library to send emails via an SMTP server. Ensure you provide your email credentials and SMTP server details.
Below is a code snippet showcasing an email notification:
import smtplib

def send_email_notification(subject, message, recipient):
    # Sender credentials and SMTP server details (replace with your own)
    sender = 'your-email@example.com'
    password = 'your-email-password'
    smtp_server = 'smtp.example.com'
    smtp_port = 587

    # Build a minimal email body with a subject header
    email_body = f'Subject: {subject}\n\n{message}'

    with smtplib.SMTP(smtp_server, smtp_port) as server:
        server.starttls()
        server.login(sender, password)
        server.sendmail(sender, recipient, email_body)

# Usage:
subject = 'Website Change Notification'
message = 'Changes have been detected on the website. Please review.'
recipient = 'recipient@example.com'
send_email_notification(subject, message, recipient)
For SMS notifications, you can integrate external APIs like Twilio or Nexmo. These APIs enable programmatic SMS message sending. Register for an account, acquire the necessary API credentials, and install the respective Python libraries.
Below is an example code snippet demonstrating SMS notification using the Twilio API:
from twilio.rest import Client

def send_sms_notification(message, recipient):
    # Twilio account credentials and sender number (replace with your own)
    account_sid = 'your-account-sid'
    auth_token = 'your-auth-token'
    twilio_number = 'your-twilio-phone-number'

    client = Client(account_sid, auth_token)
    message = client.messages.create(
        body=message,
        from_=twilio_number,
        to=recipient
    )

# Usage:
message = 'Changes have been detected on the website. Please review.'
recipient = '+1234567890'
send_sms_notification(message, recipient)
As the organization would run this script frequently to track changes, it’s essential to maintain a record of the output of each run. Logging each execution, including the time, duration, and changes detected, can facilitate this process. We can use this data to generate summary reports that show trends over time and aid in understanding the frequency and nature of website updates.
We start by importing Python’s built-in logging library and configuring the logging level and log file format.
import logging

# Configure logging settings
logging.basicConfig(filename='website_monitoring.log', level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s')

# Log an informational message
logging.info('Website monitoring started.')

# Log the detected changes
logging.info('Changes detected on {date}: {details}'.format(
    date='2023-06-15', details='Updated content on homepage.'))

# Log an error message
logging.error('Error occurred while retrieving website content.')
To generate a report from the logging data, we can employ libraries like matplotlib or seaborn to create visualizations summarizing the changes over time. The choice of reports and visualizations will depend on the tracked changes.
Here’s an example code snippet for generating a simple line plot to illustrate the change frequency over time:
import matplotlib.pyplot as plt
import pandas as pd

# Read the log file into a pandas DataFrame
# (the multi-character ' - ' separator requires the Python parser engine)
log_data = pd.read_csv('website_monitoring.log', sep=' - ', engine='python',
                       header=None, names=['Timestamp', 'Level', 'Message'])

# Convert the Timestamp column to datetime format
# (asctime includes milliseconds, e.g. '2023-06-15 10:30:45,123')
log_data['Timestamp'] = pd.to_datetime(log_data['Timestamp'],
                                       format='%Y-%m-%d %H:%M:%S,%f')

# Group the data by date and count the number of changes per day
info_rows = log_data[log_data['Level'] == 'INFO']
changes_per_day = info_rows.groupby(info_rows['Timestamp'].dt.date).size()

# Plot the changes over time
plt.plot(changes_per_day.index, changes_per_day.values)
plt.xlabel('Date')
plt.ylabel('Number of Changes')
plt.title('Website Content Changes Over Time')
plt.xticks(rotation=45)
plt.show()
Several challenges may arise during the project implementation, requiring careful consideration. These limitations include website structure changes, legal or ethical constraints, and errors in the web scraping or data comparison processes.
Website Structure Changes: Dynamic websites often undergo modifications, impacting the web scraping process. Adapting the scraping code to accommodate these changes becomes necessary. Regularly monitoring and updating the scraping code can ensure compatibility with evolving website structures.
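One way to soften the impact of such structural changes is to try several selectors for the same piece of content and fall back gracefully when none match. The sketch below is illustrative only; the extract_activity_title function and the selectors it tries are hypothetical and would need to reflect the real markup of each monitored site.
from bs4 import BeautifulSoup

def extract_activity_title(html):
    # Hypothetical fallback chain: try the current selector first, then
    # older ones, so a markup change degrades gracefully instead of crashing.
    soup = BeautifulSoup(html, 'html.parser')
    candidate_selectors = ['h1.activity-title', 'div.program-name h2', 'h1']
    for selector in candidate_selectors:
        element = soup.select_one(selector)
        if element and element.get_text(strip=True):
            return element.get_text(strip=True)
    # Returning None signals that the scraper itself may need updating.
    return None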
Legal and Ethical Constraints: Following legal and ethical guidelines is crucial for web scraping. Websites may have terms of service that prohibit scraping or impose restrictions on data collection. Respecting these terms and using the scraped data responsibly is essential to ensure compliance.
Errors in Web Scraping and Data Comparison: Web scraping involves interaction with external websites, introducing the possibility of errors. Connection failures, timeouts, or server issues may occur during scraping. Employing robust error-handling mechanisms to gracefully handle such situations is vital. Additionally, ensuring the accuracy of data comparison processes and accounting for potential errors like false positives or negatives are crucial for reliable results.
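A minimal sketch of such error handling, assuming we simply retry failed downloads a few times with a pause in between, might look like this (the fetch_with_retries helper, retry count, and delay are illustrative choices):
import time
import requests

def fetch_with_retries(url, headers=None, retries=3, delay=30):
    # Retry transient failures (timeouts, connection errors, HTTP error codes)
    # a few times before giving up and reporting the problem.
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, headers=headers, timeout=30)
            response.raise_for_status()
            return response.text
        except requests.RequestException as error:
            print(f"Attempt {attempt} failed for {url}: {error}")
            if attempt < retries:
                time.sleep(delay)
    # The caller can treat None as "page could not be retrieved this run"
    return None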
Permissions and Website Policies: Verifying whether the target website permits scraping, or obtaining the necessary permissions from the website owner, is essential before initiating web scraping. Complying with the website’s robots.txt file, respecting their terms of service, and being mindful of the frequency of requests are important considerations to avoid policy violations.
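Python’s standard library ships urllib.robotparser for exactly this kind of check. Here is a minimal sketch, reusing the Super Duper Tennis URL from earlier, of consulting a site’s robots.txt before fetching a page (the 'programs' path is a hypothetical example):
from urllib.robotparser import RobotFileParser
from urllib.parse import urljoin

site = 'https://www.superdupertennis.com/'
page = urljoin(site, 'programs')   # hypothetical page to check

parser = RobotFileParser()
parser.set_url(urljoin(site, 'robots.txt'))
parser.read()

# Only fetch the page if robots.txt allows our user agent to crawl it
if parser.can_fetch('*', page):
    print('Scraping allowed for', page)
else:
    print('robots.txt disallows scraping', page)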
In conclusion, this project has produced a Python tool for tracking website updates through web scraping, with essential features such as data capture, data comparison, notifications, logging, and reporting.
Throughout this project, we have deepened our understanding of HTML, honed our skills in text processing, and mastered the art of data manipulation. Leveraging the capabilities of BeautifulSoup and requests, we have become proficient in web scraping and automated tasks using Python. Furthermore, we have developed a robust error-handling mechanism and acquired data analysis and reporting expertise.
Our tool is a reliable solution for tracking changes in news articles, product listings, and other web content. Automating the process eliminates the need for manual updates, ensuring that the information remains current and accurate.
Throughout this journey, we have gained valuable knowledge and skills in web scraping with BeautifulSoup and requests, data comparison with difflib, notification delivery, logging, and reporting.
Frequently Asked Questions
Q. What is web scraping, and how does it work?
A. Web scraping is a process that automatically pulls data from websites. It works by retrieving the website’s HTML code, parsing its pages, and collecting the necessary information. Tools like BeautifulSoup and requests in Python make this process easier.
Q. How can web scraping help monitor website changes?
A. Web scraping helps keep tabs on changes on websites. It regularly pulls data from sites and compares it with past data to spot any updates or changes. This is particularly handy for tracking updates on news, product details, or even competitor data.
Q. Are there legal considerations when scraping websites?
A. It’s crucial to abide by the law when scraping websites. Websites have terms of service that explain their rules about scraping. Always make sure you understand and follow these rules. Also, use the data you collect responsibly and avoid gathering sensitive or personal information.
Q. How do you handle errors during web scraping?
A. Sometimes, problems like changes in a website’s design or internet issues can lead to errors in web scraping. Good error-handling techniques can help address these problems. This includes using try-except blocks to handle errors, retry mechanisms for connection issues, and log files to track recurring issues.
Q. What are the best practices for scraping and monitoring websites?
A. To scrape and monitor websites successfully, respect the website’s rules, use headers and user-agent strings that mimic real browsing behavior, and avoid sending too many requests in quick succession. You should also update your scraping code regularly to deal with changes in the website.