In today’s digital world, we have an ocean of information waiting to be explored online. From tracking the latest trends to understanding what makes a website tick, digging into this data can reveal all sorts of valuable insights. And that’s where web scraping comes in—a nifty technique that lets us gather data from websites automatically. Rather than picking an unknown website, I decided to analyze Analytics Vidhya’s blogathon page, since we are all familiar with it. Because the current leaderboard does not have much data to work with, I am using an older leaderboard page with more data points.
Web scraping involves extracting data from websites and converting unstructured information into structured datasets for analysis and visualization. Python offers several libraries, such as BeautifulSoup and Scrapy, which facilitate this process.
The target webpage (AV Blogathon Leaderboard) contains a leaderboard displaying user names and their corresponding views. The idea is to inspect the HTML structure of the webpage, identify the relevant elements, and extract the desired data using BeautifulSoup’s intuitive syntax.
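To give a feel for that syntax, here is a minimal, self-contained sketch on a made-up table; the column layout (name in the third cell, views in the last) mirrors what we will extract from the real page later in the article.

# A minimal sketch of BeautifulSoup's syntax on a hypothetical leaderboard table;
# the real page is inspected and scraped later in the article.
from bs4 import BeautifulSoup

html = """
<table class="table-responsive">
  <tr><th>Rank</th><th></th><th>Name</th><th>Views</th></tr>
  <tr><td>1</td><td></td><td>alice</td><td>1200</td></tr>
  <tr><td>2</td><td></td><td>bob</td><td>950</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
for row in soup.find_all('tr'):
    cells = row.find_all('td')
    if cells:  # skip the header row, which only has <th> cells
        print(cells[2].text, cells[-1].text)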
To achieve this, we will leverage Python’s Tkinter library to build a graphical user interface (GUI), Selenium to scrape the data, and Plotly to visualize the leaderboard results.
Let’s start, but don’t worry if you’re not a coding whiz just yet! We’ll break down the process step by step.
import re
import requests
import pandas as pd
import tkinter as tk
from PIL import Image
from bs4 import BeautifulSoup
from selenium import webdriver
import plotly.graph_objects as go
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
In the above code block, we import the libraries and modules required for various tasks within the script.
def scrape_leaderboard_requests():
    # URL of the Analytics Vidhya leaderboard
    url = "https://datahack.analyticsvidhya.com/contest/data-science-blogathon-23/#LeaderBoard"
    # Headers for the HTTP request
    headers = {
        'authority': 'datahack.analyticsvidhya.com',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
        'accept-language': 'en-US,en;q=0.9',
        'cache-control': 'max-age=0',
        'sec-ch-ua': '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'none',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    }
    # Sending an HTTP GET request to fetch the webpage
    response = requests.get(url, headers=headers, verify=False, timeout=80)
    # Checking if the request was successful (status code 200)
    if response.status_code == 200:
        # Parsing the HTML content of the webpage using BeautifulSoup
        soup = BeautifulSoup(response.content, 'lxml')
        # Finding the leaderboard table on the webpage
        table = soup.find('table', attrs={'class': 'table-responsive'})
        # Checking if the table exists
        if table:
            print(table)  # Print the table content (for debugging)
            # parse the table here to get names and views data
        else:
            print('no such element found')  # Print a message if the table is not found
    else:
        print('invalid status code')  # Print a message if the HTTP request fails
    return names, views  # the global lists defined in the full script below
The function above uses the Python requests module to fetch the page content but fails because the content is dynamically loaded using JavaScript. In such cases, we can use Selenium. With Selenium, we can automate web interactions such as clicking buttons, filling out forms, and scrolling through web pages, mimicking human behavior in the virtual realm.
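Before reaching for Selenium, it is worth confirming that diagnosis. A quick check along these lines (a sketch using the same URL and selector as the function above, with headers and error handling omitted for brevity) shows that the table is simply absent from the raw HTML:

# A quick way to confirm the leaderboard is rendered by JavaScript: fetch the
# raw HTML with requests and look for the table markup.
import requests
from bs4 import BeautifulSoup

url = 'https://datahack.analyticsvidhya.com/contest/data-science-blogathon-23/#LeaderBoard'
response = requests.get(url, timeout=30)
soup = BeautifulSoup(response.content, 'lxml')

# If this prints None, the table is not in the initial HTML and must be
# injected later by JavaScript -- which is why Selenium is needed here.
print(soup.find('table', attrs={'class': 'table-responsive'}))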
def get_data(driver, url):
    cur_names = []
    cur_views = []
    driver.get(url)
    driver.implicitly_wait(10)
    all_elements = driver.find_elements(By.CLASS_NAME, 'table-responsive')
    if all_elements:
        last_ele = all_elements[-1]
        leaderboard_table = last_ele.get_attribute('outerHTML')
        soup = BeautifulSoup(leaderboard_table, 'html.parser')
        rows = soup.find_all('tr')
        for row in rows:
            cells = row.find_all('td')
            if len(cells) >= 3:  # Ensure the row contains the required data
                cur_names.append(cells[2].text.strip())
                cur_views.append(int(cells[-1].text.strip()))
    return cur_names, cur_views
def scrape_leaderboard():
    print('fetching')
    update_message(msg="Fetching leaderboard results, please wait...")
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--log-level=3")
    chrome_driver_path = "path to chromedriver executable file"
    service = Service(chrome_driver_path)
    driver = webdriver.Chrome(service=service, options=chrome_options)
    url = 'https://datahack.analyticsvidhya.com/contest/data-science-blogathon-23/#LeaderBoard'
    cur_names, cur_views = get_data(driver, url)
    names.extend(cur_names)
    views.extend(cur_views)
    last_page = None
    pagination_ele = driver.find_element(By.CLASS_NAME, 'page-link')
    if pagination_ele:
        pagination_ele = pagination_ele.get_attribute('outerHTML')
        last_page = re.search(r'Page\s+\d+\s+of\s+(\d+)', pagination_ele)
        if last_page:
            last_page = int(last_page.group(1))
    if last_page:
        for i in range(2, last_page + 1):
            url = 'https://datahack.analyticsvidhya.com/contest/data-science-blogathon-23/lb/%s/' % i
            cur_names, cur_views = get_data(driver, url)
            names.extend(cur_names)
            views.extend(cur_views)
    driver.quit()
    return names, views
The scrape_leaderboard() function coordinates the scraping process. It initializes a headless Chrome browser using WebDriver, then calls the get_data() function to fetch data from the main leaderboard page and subsequent pages if pagination exists. The script appends the extracted names and views to global lists (names and views), ensuring comprehensive data collection.
The get_data() function is responsible for scraping user names and views from a specified URL. It utilizes Selenium to navigate the webpage and extract data from the leaderboard table using BeautifulSoup.
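A small illustration of the pagination check: the regex used in scrape_leaderboard() pulls the total page count out of text such as "Page 1 of 5" found in the pagination widget (the sample string below is hypothetical).

import re

sample = 'Page 1 of 5'  # hypothetical fragment of the pagination element's outerHTML
match = re.search(r'Page\s+\d+\s+of\s+(\d+)', sample)
if match:
    print(int(match.group(1)))  # -> 5, so pages 2..5 remain to be scraped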
Data, in its raw form, can be overwhelming and difficult to comprehend. Data visualization serves as a beacon of light, illuminating patterns, trends, and insights hidden within the data. Plotly, a Python library for interactive data visualization, empowers us to create stunning visualizations that captivate and inform.
From scatter plots to bar charts, Plotly offers a diverse range of visualization options, each tailored to convey specific insights effectively. With its interactive features and customization capabilities, Plotly enables us to engage with data in meaningful ways, unlocking its full potential.
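As a small, hypothetical illustration of that flexibility, the same Name/Views data could be rendered as a bar chart with only minor changes to the scatter-plot approach used below (the sample DataFrame here is made up):

# A sketch of the same leaderboard data as a Plotly bar chart.
import pandas as pd
import plotly.graph_objects as go

df = pd.DataFrame({'Name': ['alice', 'bob', 'carol'], 'Views': [1200, 950, 730]})

fig = go.Figure(go.Bar(x=df['Name'], y=df['Views'],
                       marker=dict(color=df['Views'], colorscale='Viridis')))
fig.update_layout(template='plotly_dark', title='Views by User',
                  xaxis_title='User', yaxis_title='Views')
fig.show()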
The plot_data function transforms the extracted data into interactive scatter plots using Plotly, a versatile visualization library. These plots offer dynamic exploration capabilities, including hover tooltips with user details, customizable color schemes, and axis labels for enhanced clarity.
def plot_data(df, msg=''):
    update_message(msg="Generating report, please wait...")
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=df['Name'], y=df['Views'], mode='markers',
                             marker=dict(color=df['Views'], colorscale='Viridis', size=10),
                             text=[f"User: {name}<br>Views: {view}" for name, view in zip(df['Name'], df['Views'])],
                             hoverinfo='text'))
    bg_image = Image.open("bg.png")  # Replace "bg.png" with your actual image file
    fig.update_layout(images=[dict(source=bg_image, xref="paper", yref="paper", x=0, y=1, sizex=1, sizey=1, opacity=0.1, layer="below")])
    fig.update_layout(
        xaxis=dict(tickangle=45),
        yaxis=dict(range=[0, df['Views'].max() + 10]),
        template='plotly_dark',
        title='Views by User%s' % msg,
        xaxis_title='User',
        yaxis_title='Views'
    )
    fig.show()
    update_message('Report Generated...')
The code integrates a user-friendly GUI using Tkinter, a popular Python GUI toolkit. The GUI features interactive buttons that enable users to generate reports, access additional features, and receive real-time progress updates.
root = tk.Tk()
root.geometry("400x400")
root.title("AV Blogathon Report")
button_frame = tk.Frame(root)
button_frame.pack(side="bottom", pady=20)
button_width = 40
execute_button1 = tk.Button(button_frame, text="Get Leaderboard Report", command=get_full_report, width=button_width)
execute_button1.pack(pady=5)
execute_button2 = tk.Button(button_frame, text="Get Top 'N'", command=get_top_ten, width=button_width)
execute_button2.pack(pady=5)
execute_button3 = tk.Button(button_frame, text='Get article Links of user', command=get_article_link, width=button_width)
execute_button3.pack(pady=5)
message_label = tk.Label(button_frame, text="")
message_label.pack(side="bottom", pady=5)
disable_buttons()
root.after(100, check_data)
root.mainloop()
To optimize the user experience, data loading is deferred until after the GUI initializes: the check_data function is scheduled with root.after, so the window appears immediately and the leaderboard data is fetched once the event loop starts.
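Note that root.after only delays the scrape until the window is drawn; while scrape_leaderboard runs, the event loop is still blocked. One possible refinement (not part of the original script, and assuming scrape_leaderboard is adapted so its own GUI updates happen only on the main thread) is to run the scraper in a background thread and poll for its result:

# A sketch: run the scraper off the main thread so the window stays responsive.
# Tkinter widgets should only be touched from the main thread, so the worker
# stores its result and the GUI polls for it with root.after().
import threading

result = {}

def scrape_in_background():
    names, views = scrape_leaderboard()  # assumed adapted to avoid GUI calls
    result['names'], result['views'] = names, views

def poll_for_data():
    if 'names' in result:
        enable_buttons()
        update_message(msg='Data Fetched, please proceed to generate report..')
    else:
        root.after(200, poll_for_data)  # check again in 200 ms

threading.Thread(target=scrape_in_background, daemon=True).start()
root.after(200, poll_for_data)

With that caveat noted, here is the complete script, combining all the pieces discussed above.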
import re
import requests
import pandas as pd
import tkinter as tk
from PIL import Image
from bs4 import BeautifulSoup
from selenium import webdriver
import plotly.graph_objects as go
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
names = []
views = []
def scrape_leaderboard_requests():
    url = "https://datahack.analyticsvidhya.com/blogathon/#LeaderBoard"
    headers = {
        'authority': 'datahack.analyticsvidhya.com',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
        'accept-language': 'en-US,en;q=0.9',
        'cache-control': 'max-age=0',
        'sec-ch-ua': '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'none',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    }
    # Pass headers as a keyword argument; passing them positionally would send
    # them as query parameters instead of HTTP headers.
    response = requests.get(url, headers=headers, verify=False, timeout=80)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'lxml')
        table = soup.find('table', attrs={'class': 'table-responsive'})
        if table:
            print(table)
        else:
            print('no such element found')
    else:
        print('invalid status code')
    return names, views
def update_message(msg, level='info'):
    color = 'green'
    if level == 'alert':
        color = 'red'
    message_label.config(text=msg, fg=color)
    root.update()
    return
def get_data(driver, url):
    cur_names = []
    cur_views = []
    driver.get(url)
    driver.implicitly_wait(10)
    all_elements = driver.find_elements(By.CLASS_NAME, 'table-responsive')
    if all_elements:
        last_ele = all_elements[-1]
        leaderboard_table = last_ele.get_attribute('outerHTML')
        soup = BeautifulSoup(leaderboard_table, 'html.parser')
        rows = soup.find_all('tr')
        for row in rows:
            cells = row.find_all('td')
            if len(cells) >= 3:  # Ensure the row contains the required data
                cur_names.append(cells[2].text.strip())
                cur_views.append(int(cells[-1].text.strip()))
    return cur_names, cur_views
def scrape_leaderboard():
    print('fetching')
    update_message(msg="Fetching leaderboard results, please wait...")
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--log-level=3")
    chrome_driver_path = "path/to/chromedriver"  # add the correct path here
    service = Service(chrome_driver_path)
    driver = webdriver.Chrome(service=service, options=chrome_options)
    # url = "https://datahack.analyticsvidhya.com/blogathon/#LeaderBoard"
    url = 'https://datahack.analyticsvidhya.com/contest/data-science-blogathon-23/#LeaderBoard'
    cur_names, cur_views = get_data(driver, url)
    names.extend(cur_names)
    views.extend(cur_views)
    last_page = None
    pagination_ele = driver.find_element(By.CLASS_NAME, 'page-link')
    if pagination_ele:
        pagination_ele = pagination_ele.get_attribute('outerHTML')
        last_page = re.search(r'Page\s+\d+\s+of\s+(\d+)', pagination_ele)
        if last_page:
            last_page = int(last_page.group(1))
    if last_page:
        for i in range(2, last_page + 1):
            url = 'https://datahack.analyticsvidhya.com/contest/data-science-blogathon-23/lb/%s/' % i
            cur_names, cur_views = get_data(driver, url)
            names.extend(cur_names)
            views.extend(cur_views)
    driver.quit()
    return names, views
def plot_data(df, msg=''):
    update_message(msg="Generating report, please wait...")
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=df['Name'], y=df['Views'], mode='markers',
                             marker=dict(color=df['Views'], colorscale='Viridis', size=10),
                             text=[f"User: {name}<br>Views: {view}" for name, view in zip(df['Name'], df['Views'])],
                             hoverinfo='text'))
    bg_image = Image.open("bg.png")  # Replace "bg.png" with your actual image file
    fig.update_layout(images=[dict(source=bg_image, xref="paper", yref="paper", x=0, y=1, sizex=1, sizey=1, opacity=0.1, layer="below")])
    fig.update_layout(
        xaxis=dict(tickangle=45),
        yaxis=dict(range=[0, df['Views'].max() + 10]),
        template='plotly_dark',
        title='Views by User%s' % msg,
        xaxis_title='User',
        yaxis_title='Views'
    )
    fig.show()
    update_message('Report Generated...')
def get_full_report():
    plot_data(df)

def get_top_ten():
    df_sorted = df.sort_values(by='Views', ascending=False)
    top_10 = df_sorted.head(10)
    plot_data(top_10, msg='(Top 10)')

def get_article_link():
    update_message('error error error!!! Feature not developed yet.', level='alert')

def disable_buttons():
    execute_button1.config(state="disabled")
    execute_button2.config(state="disabled")
    execute_button3.config(state="disabled")

def enable_buttons():
    execute_button1.config(state="normal")
    execute_button2.config(state="normal")
    execute_button3.config(state="normal")
def check_data():
    global df
    names, views = scrape_leaderboard()
    if not names or not views:
        update_message(msg="No results found. Please try after some time...", level='alert')
        root.destroy()
        return  # safer than exit() inside a Tk callback
    enable_buttons()
    update_message(msg='Data Fetched, please proceed to generate report..')
    df = pd.DataFrame({'Name': names, 'Views': views})
root = tk.Tk()
root.geometry("400x400")
root.title("AV Blogathon Report")
button_frame = tk.Frame(root)
button_frame.pack(side="bottom", pady=20)
button_width = 40
execute_button1 = tk.Button(button_frame, text="Get Leaderboard Report", command=get_full_report, width=button_width)
execute_button1.pack(pady=5)
execute_button2 = tk.Button(button_frame, text="Get Top 'N'", command=get_top_ten, width=button_width)
execute_button2.pack(pady=5)
execute_button3 = tk.Button(button_frame, text='Get article Links of user', command=get_article_link, width=button_width)
execute_button3.pack(pady=5)
message_label = tk.Label(button_frame, text="")
message_label.pack(side="bottom", pady=5)
disable_buttons()
root.after(100, check_data)
root.mainloop()
A smooth user experience is paramount for engagement and usability. The codebase incorporates several strategies to this end: the buttons stay disabled until the leaderboard data is available, update_message reports progress in real time (in green for status updates, red for alerts), and data loading is kicked off via root.after so the window appears without delay.
While the provided code offers a solid foundation for real-time blogathon analytics, the journey doesn’t end here. Several enhancements and advanced applications are worth exploring: implementing the “Get article Links of user” feature (currently a placeholder), caching results between runs, scheduling periodic refreshes, or adding richer visualizations. One of these is sketched below.
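As one concrete example, here is a sketch of result caching (the file name leaderboard_cache.csv is hypothetical): persisting the scraped data to disk lets the app reload instantly instead of re-scraping on every launch.

# A sketch of result caching: reuse a saved CSV if one exists, otherwise
# scrape and save the results for next time.
import os
import pandas as pd

CACHE_FILE = 'leaderboard_cache.csv'  # hypothetical cache location

def load_or_scrape():
    if os.path.exists(CACHE_FILE):
        df = pd.read_csv(CACHE_FILE)
        return df['Name'].tolist(), df['Views'].tolist()
    names, views = scrape_leaderboard()  # fall back to the scraper above
    pd.DataFrame({'Name': names, 'Views': views}).to_csv(CACHE_FILE, index=False)
    return names, views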
This article provides a comprehensive exploration of web scraping, data visualization, and GUI development in Python. By dissecting the codebase, learners gain insight into automated data extraction using BeautifulSoup and Selenium, interactive visualization with Plotly, and building user-friendly interfaces with Tkinter. The article focuses on the analysis of the Analytics Vidhya Blogathon leaderboard, offering a practical application of these concepts. Learners can now embark on their own data-driven projects: extracting insights, creating engaging visualizations, and designing user interfaces.