This article was published as a part of the Data Science Blogathon.
In this article, we will learn how to use Python to keep track of our “wanna-buy” items on Amazon. We tend to buy a product only when its price drops below a specific threshold, to stay within our budget and maximize our savings. In this project, we will implement exactly that idea and build an Amazon price tracking system using Python.
So in this article, we will write a program that scrapes the Amazon page of the product we want and checks whether the current price is less than or equal to our target price. If it is, the program will notify us by email.
To achieve this, we will use the following Python libraries:
requests
BeautifulSoup (bs4)
smtplib
A brief about Requests in Python
In layman’s language, the Requests library is used to make HTTP requests to a specific link.
The Python requests module has built-in functions for sending HTTP requests to a given URI using methods such as GET, POST, PUT, PATCH, or HEAD. An HTTP request is used to retrieve data from a given URI or to send data to a server. HTTP itself works as a request-response protocol between a client and a server.
You can read more in their documentation here: link
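To see what such a request is made of, here is a minimal sketch that builds a GET request without sending it. The URL, the query parameter, and the user-agent string are made up for illustration; only `requests.Request` and its `prepare()` method come from the library.

```python
import requests

# Describe the request: method, target URL, query parameters and headers.
# The URL and header value are placeholders for illustration only.
request = requests.Request(
    method="GET",
    url="https://example.com/products",
    params={"id": "42"},
    headers={"User-Agent": "price-tracker-demo/0.1"},
)

# prepare() assembles exactly what would go over the wire, without sending it.
prepared = request.prepare()

print(prepared.method)                  # GET
print(prepared.url)                     # https://example.com/products?id=42
print(prepared.headers["User-Agent"])   # price-tracker-demo/0.1
```

Calling `requests.get(...)` performs the same preparation and then sends the request in one step.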
A brief about Beautiful Soup in Python:
BeautifulSoup is a Python package for extracting data from HTML and XML files, which makes it well suited to web scraping. It builds a parse tree from the page's markup, and that tree lets you pull out data in a far more legible and hierarchical manner.
You can refer to the documentation here: link
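To make the parse tree idea concrete, here is a minimal sketch on a small HTML snippet. The markup and the `price` id are invented for this example and are not Amazon's.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
html = '<div class="product"><h1>Monitor</h1><span id="price">₹ 13,999.00</span></div>'

# Build the parse tree with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

price_tag = soup.find(id="price")   # look an element up by its id
print(price_tag.name)               # span
print(price_tag.text)               # ₹ 13,999.00
print(price_tag.parent.name)        # div  (the tree keeps the hierarchy)
print(soup.h1.text)                 # Monitor
```

Note that `find` returns `None` when nothing matches, so real code should check the result before reading `.text`.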
Now that you are familiar with these libraries, let us begin!
Let us import the dependencies that we are going to use.
import requests
from smtplib import SMTP
from bs4 import BeautifulSoup as bs
The role of each library will become clear once we start using it. So stay with me.
Let us store the link of the product that we want to buy in a variable named URL.
URL = 'https://www.amazon.in/Acer-Nitro-23-8-1080-0-5-Response/dp/B088FLG41L/ref=sr_1_18?crid=2RUS8ZJDNRU05&keywords=monitor&qid=1652971294&sprefix=monitor%2Caps%2C296&sr=8-18'
Do note that you need to change this link to the product you want to track.
Next, to make the request, you need to specify a user agent, or else Amazon will return a 503 error. To avoid that, pass the user-agent value in the ‘headers’ argument.
page = requests.get(URL ,headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36"})
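This fetches the HTML source code of the page at the link we stored earlier.
The raw bytes in page.content will look something like this. (This particular sample happens to be Amazon's robot-check page, which Amazon sometimes serves instead of the product page.)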
b'\n\n\n\n\n<!--\n\n\n\n\n\n\n\n\nif (true === true) {\n var ue_t0 = (+ new Date()),\n ue_csm = window,\n ue = { t0: ue_t0, d: function() { return (+new Date() - ue_t0); } },\n ue_furl = "fls-eu.amazon.in",\n ue_mid = "A21TJRUUN4KGV",\n ue_sid = (document.cookie.match(/session-id=([0-9-]+)/) || [])[1],\n ue_sn = "opfcaptcha.amazon.in",\n ue_id = \'QZ8548Q6DHK81WWFGCC0\';\n}\n\n\n\n\n\n\n\n\n
Sorry, we just need to make sure you’re not a robot. For best results, please make sure your browser is accepting cookies.
Type the characters you see in this image:
\n \n if (true === true) {\n var head = document.getElementsByTagName(\'head\')[0],\n prefix = "https://images-eu.ssl-images-amazon.com/images/G/01/csminstrumentation/",\n elem = document.createElement("script");\n elem.src = prefix + "csm-captcha-instrumentation.min.js";\n head.appendChild(elem);\n\n elem = document.createElement("script");\n elem.src = prefix + "rd-script-6d68177fa6061598e9509dc4b5bdd08d.js";\n head.appendChild(elem);\n }\n \n\n'
In this form the content is raw and hard to work with. To fix that, let’s parse it and store the result in a variable named soup.
soup = bs(page.content,"html.parser")
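The sample response shown above is not a product page: the text about making sure "you’re not a robot" means Amazon served its robot check instead, and that page contains no price to extract. A minimal sketch of a guard you could run before parsing is given below. The helper name `looks_like_robot_check` is hypothetical, and the marker strings are simply copied from the sample output rather than from any documented Amazon behaviour, so they may need adjusting.

```python
# Marker substrings copied from the sample response above (an assumption:
# Amazon may change this page at any time).
ROBOT_CHECK_MARKERS = (
    b"opfcaptcha",
    b"Type the characters you see in this image",
)


def looks_like_robot_check(content: bytes) -> bool:
    """Return True if the raw response body resembles the robot-check page."""
    return any(marker in content for marker in ROBOT_CHECK_MARKERS)


# Small demonstration on two hand-written byte strings.
blocked = b'ue_sn = "opfcaptcha.amazon.in"'
normal = b"<html><body>product page</body></html>"
print(looks_like_robot_check(blocked))  # True
print(looks_like_robot_check(normal))   # False
```

In the tracker you would call `looks_like_robot_check(page.content)` and skip the price lookup (and try again later) when it returns `True`.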
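But we are not interested in the entire source code. We only want the price, which we can locate through the id of the element that holds it. You can find that id by inspecting the price element in your browser. Amazon changes its markup from time to time, so confirm the id on the live page. At the same time, we will clean up the text we get (dropping the currency symbol and the thousands separator) and convert it to a number, which we store in a variable.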
price = float(soup.find(id="priceblock-ourprice").text.split()[1].replace(",",""))
This returns the price of our product.
13999.0
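The one-liner above assumes that the price text contains a space between the currency symbol and the number, because `.split()[1]` picks the second whitespace-separated piece. It raises an `IndexError` on text such as "₹13,999". A slightly more forgiving sketch uses a regular expression instead; the helper name `parse_price` is hypothetical, and the sample strings are made up to mirror the format seen in this article.

```python
import re
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[float]:
    """Extract the first number from text such as '₹ 13,999.00'.

    Returns None when the text is missing or contains no digits.
    """
    if not text:
        return None
    # A digit, then any mix of digits and commas, then an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


print(parse_price("₹ 13,999.00"))            # 13999.0
print(parse_price("₹13,999"))                # 13999.0
print(parse_price("Currently unavailable"))  # None
```

You could combine it with the lookup as `tag = soup.find(id=...)` followed by `parse_price(tag.text if tag else None)`, which also covers the case where the element is not on the page.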
Let us wrap this entire piece in a function for easier access and reuse.
def extract_current_price():
    page = requests.get(URL, headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36"})
    soup = bs(page.content, "html.parser")
    price = float(soup.find(id="priceblock-ourprice").text.split()[1].replace(",", ""))
    return price
Just checking the current price is not enough. We also want the code to send us an email when the price is within our budget.
So let us set up the code for that.
To send email, we need to connect to a mail server. We will be using SMTP (Simple Mail Transfer Protocol) here. As I will be using Gmail, I need the following configuration.
SMTP_SERVER = "smtp.gmail.com"
Next, we need to set the port. Gmail's SMTP server accepts STARTTLS connections on port 587, so that is the port we will use.
PORT = 587
We will also specify the email address from which we want to send the notification, along with its password.
EMAIL_ID = "YOUR_EMAIL_ADDRESS"
PASSWORD = "YOUR_PASSWORD"
For better security, do not use your regular account password here. Generate an app password and use that instead. Check more about App Password here: link.
Connecting to the server:
server = SMTP(SMTP_SERVER, PORT)
And to enable transport layer security, and start a secure channel :
server.starttls()
Now continuing with logging in to mail,
server.login(EMAIL_ID, PASSWORD)
Now we also need to draft the mail we are going to send:
subject = "Price Dropped!! Should Buy Now!"
body = "Price of the product you want has fallen. You can consider buying it now. " + URL
Now we combine both to get the full message. The blank line (two newline characters) separates the header from the body:
msg = f"Subject: {subject}\n\n{body}"
Finally sending the mail:
server.sendmail(EMAIL_ID, TO_EMAIL_ID, msg)
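TO_EMAIL_ID is the address that should receive the notification. Define it alongside EMAIL_ID; it can be the same address if you want to mail yourself.
After all that is done, we need to close the connection to the server.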
server.quit()
Now finally creating a function for this too,
def notify_mail():
    SMTP_SERVER = "smtp.gmail.com"
    PORT = 587
    EMAIL_ID = "YOUR_EMAIL_ADDRESS"
    PASSWORD = "YOUR_PASSWORD"
    TO_EMAIL_ID = "RECIPIENT_EMAIL_ADDRESS"  # address that receives the alert
    # Connect, switch to an encrypted channel, then log in
    server = SMTP(SMTP_SERVER, PORT)
    server.starttls()
    server.login(EMAIL_ID, PASSWORD)
    subject = "Price Dropped!! Should Buy Now!"
    body = "Price of the product you want has fallen. You can consider buying it now. " + URL
    msg = f"Subject: {subject}\n\n{body}"
    server.sendmail(EMAIL_ID, TO_EMAIL_ID, msg)
    server.quit()
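As an alternative to assembling the message by hand, the standard library's `email.message.EmailMessage` can build the headers for you, and `SMTP` can be used in a `with` block so the connection is closed even if sending fails. The sketch below is one possible arrangement, not the only way to do it. The function names `build_message` and `send_message`, their parameters, and the example addresses are all hypothetical.

```python
from email.message import EmailMessage
from smtplib import SMTP


def build_message(sender: str, recipient: str, url: str) -> EmailMessage:
    """Create the notification email with proper Subject/From/To headers."""
    msg = EmailMessage()
    msg["Subject"] = "Price Dropped!! Should Buy Now!"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        "Price of the product you want has fallen. "
        "You can consider buying it now. " + url
    )
    return msg


def send_message(msg: EmailMessage, sender: str, password: str,
                 host: str = "smtp.gmail.com", port: int = 587) -> None:
    """Send msg over a STARTTLS connection; the with-block closes it afterwards."""
    with SMTP(host, port) as server:
        server.starttls()
        server.login(sender, password)
        server.send_message(msg)


# Building the message needs no network access, so it is easy to inspect.
demo = build_message("sender@example.com", "recipient@example.com",
                     "https://example.com/product")
print(demo["Subject"])
print(demo.get_content())
```

Only `send_message` talks to the mail server, so you can check the message contents before wiring in real credentials.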
Now we have to complete the logic: send the mail only when the current price of the product is at or below our budget price.
Let us say that our budget price is 10000 Rs. (You can change the budget to any value you want.)
BUDGET_PRICE = 10000  # Here put the value you want
if extract_current_price() <= BUDGET_PRICE:
    notify_mail()
That finishes our project on an Amazon price tracking system. In this article, we saw how to fetch a web page from Python and scrape it using Beautiful Soup. We also understood and implemented sending mail using Python's smtplib module.
Future Scope/Possible Additions in this Project
Things that I will leave for you to try are:
1) Add a component that runs the code regularly at a fixed interval (for example, daily).
2) Instead of hard-coding the email addresses and the password in the program, store them in a file in local storage and read them from there.
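Both ideas can be sketched with the standard library alone. The code below is a rough starting point under stated assumptions: the credentials live in a JSON file with `email` and `password` keys (a format chosen for this example), and the helper names `load_credentials` and `run_periodically` are hypothetical.

```python
import json
import time
from typing import Callable


def load_credentials(path: str) -> dict:
    """Read {"email": ..., "password": ...} from a local JSON file."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return {"email": data["email"], "password": data["password"]}


def run_periodically(check: Callable[[], bool], interval_seconds: float,
                     max_runs: int,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call check() up to max_runs times, pausing between calls.

    Stops early as soon as check() returns True and returns the number
    of calls that were made.
    """
    for run in range(1, max_runs + 1):
        if check():
            return run
        if run < max_runs:
            sleep(interval_seconds)
    return max_runs


# Possible use in the tracker (not executed here): check once a day for 30 days.
# def price_is_within_budget() -> bool:
#     if extract_current_price() <= BUDGET_PRICE:
#         notify_mail()
#         return True
#     return False
#
# run_periodically(price_is_within_budget, 24 * 60 * 60, max_runs=30)
```

The `sleep` parameter exists only so the loop can be tried out without actually waiting. For unattended use, an operating-system scheduler such as cron or Task Scheduler is a common alternative to a long-running loop. Keep the credentials file out of version control.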
That is it for now. If you have any further queries about this Amazon price tracking system, you can reach out to me on LinkedIn.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.