Greetings, everyone. I hope you are doing well. This blog is an introduction to using GitHub Actions and common Linux tools for automation and scraping in general. I have decided to demonstrate how to make a simple price tracker for Daraz that notifies the user when the price drops.
So let us begin with the scraping part. For the tutorial I have selected a random item with URL https://www.daraz.com.np/products/mi-xiaomi-router-ac1200-4-antenna-xiaomi-wifi-router-i120336993-s1032855436.html.
The item is listed at 5499, so let us search the page source for that number.
So, apparently the price of the item is stored in key-value pairs under the keys value and priceNumber. We shall proceed with priceNumber for the sake of simplicity in this tutorial. (Note: the remaining two hits are exact copies of value and priceNumber.)
Let us cook up a simple regex that searches for the word priceNumber and the digits after it. The regex I came up with looks like priceNumber":[0-9]+. Now let's grep to check the pattern.
Explanation:
priceNumber": matches the literal string priceNumber":
[0-9]+ matches one or more digits.
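To try the pattern without hitting the site, here is a minimal sketch that runs the same grep against a short sample string. The sample_html value and the extract_price_tokens helper are made up for illustration; the real page source is much larger and may be laid out differently.

```shell
# Hypothetical stand-in for the page source (assumption, not the real page).
sample_html='{"value":5499,"priceNumber":5499},{"value":5499,"priceNumber":5499}'

# -o prints only the matched text, -E enables extended regex so + means "one or more".
extract_price_tokens() {
  printf '%s\n' "$1" | grep -oE 'priceNumber":[0-9]+'
}

# Prints one line per match, so the duplicate key shows up twice.
extract_price_tokens "$sample_html"
```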
As we can see, the pattern is being matched but there are duplicate results. Simply piping the result through sort -u fixes this. Now we shall grab only the price portion of the result. This is pretty easy to do with the cut command.
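As a rough sketch of these two steps, the snippet below starts from a hand-written copy of the duplicated grep output (the matches variable is an assumption standing in for the real result) and applies sort -u and then cut.

```shell
# Hypothetical grep output containing the same match twice.
matches='priceNumber":5499
priceNumber":5499'

# sort -u sorts the lines and keeps a single copy of each distinct line.
unique_match=$(printf '%s\n' "$matches" | sort -u)
echo "$unique_match"

# cut splits on ":" and keeps the second field, which is the number.
price_only=$(printf '%s\n' "$unique_match" | cut -d ":" -f 2)
echo "$price_only"
```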
Explanation:
-d specifies the delimiter used to split our string. It is : in our case.
-f selects which field of the split string to keep. Here priceNumber" is the first field and the price is the second, so we pass 2.
Alright, we are done with the scraping portion. Now let us use GitHub Actions to track the price daily and send us a notification if the price drops. So what is GitHub Actions? GitHub Actions is a feature of GitHub that allows you to automate workflows directly within your GitHub repository. With GitHub Actions, you can build, test, and deploy your code right from GitHub. It's essentially a CI/CD (Continuous Integration/Continuous Deployment) service that is tightly integrated with your repository. If you want to learn more about GitHub Actions, please check out the official tutorial by GitHub.
An event triggers a workflow run. Events can be various GitHub activities, such as pushing code to a repository, creating a pull request, or releasing a new version. For our small monitoring project, we will be using the schedule event, which triggers a workflow at specific UTC times using cron syntax. Here's the basic syntax of the schedule event:
on:
  schedule:
    - cron: '30 5 * * *'
jobs:
  # do some job
I want the price to be fetched every day at 9 AM. The cron syntax for this would be 0 9 * * *.
The issue here is that cron schedules in GitHub Actions use UTC, not my local time (GMT+5:45). To convert, subtract the offset from GMT, which is 5 hours and 45 minutes, from the local time: 9:00 AM minus 5 hours and 45 minutes equals 3:15 AM. So the correct cron syntax is 15 3 * * *. This schedule will run every day at 3:15 AM GMT, which is equivalent to 9:00 AM in my local time zone (GMT+5:45).
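The subtraction can be checked with a few lines of shell arithmetic. This is only a sketch: local_to_utc_cron is a made-up helper name, and it assumes a daily schedule with an offset east of UTC.

```shell
# Convert a local wall-clock time to a daily UTC cron expression.
# Arguments: local hour, local minute, offset hours, offset minutes.
# Pass plain numbers without leading zeros (08 would be read as invalid octal).
local_to_utc_cron() {
  local_minutes=$(( $1 * 60 + $2 ))
  offset_minutes=$(( $3 * 60 + $4 ))
  # Add a full day before the modulo so the result never goes negative.
  utc_minutes=$(( (local_minutes - offset_minutes + 1440) % 1440 ))
  # Cron field order is: minute hour day-of-month month day-of-week.
  printf '%d %d * * *\n' $(( utc_minutes % 60 )) $(( utc_minutes / 60 ))
}

# 9:00 AM at GMT+5:45
local_to_utc_cron 9 0 5 45   # prints: 15 3 * * *
```

Working in minutes keeps the borrow across the hour boundary out of the way, which is exactly where the minute field is easy to get wrong by hand.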
Now that we are done configuring the schedule of the workflow, we shall create a job that uses our previously crafted scraping command. The job looks like:
jobs:
  scrap-price:
    runs-on: ubuntu-latest
    steps:
      - run: curl -s "https://www.daraz.com.np/products/mi-xiaomi-router-ac1200-4-antenna-xiaomi-wifi-router-i120336993-s1032855436.html" | grep -oE 'priceNumber":[0-9]+' | sort -u | cut -d ":" -f 2
But all this job does is run the command. It neither stores the price nor compares it with anything. So let's use the power of bash scripting for this. First we store the output of the command in a variable.
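A minimal sketch of that assignment is shown here. To keep it runnable offline, the page_source variable is a made-up stand-in for the output of curl -s on the product URL; in the workflow the pipeline would start with curl instead.

```shell
# Hypothetical stand-in for the output of: curl -s "<product URL>"
page_source='{"value":5499,"priceNumber":5499},{"value":5499,"priceNumber":5499}'

# $( ... ) captures the standard output of the whole pipeline into price.
price=$(printf '%s\n' "$page_source" | grep -oE 'priceNumber":[0-9]+' | sort -u | cut -d ":" -f 2)

echo "Current price: $price"
```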
Now we will use a simple if statement to check whether the price variable is less than or equal to our specified value:
if [ "$price" -le 5200 ]; then echo "Price has reduced"; fi
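The integer comparison uses -le, the shell test operator for "less than or equal to". Here is a small sketch that wraps it in a helper so both outcomes can be tried; is_within_budget is a hypothetical name, and the prices are sample values.

```shell
# Succeeds (exit status 0) when the price is at or below the budget.
# Arguments: price, budget (both integers).
is_within_budget() {
  [ "$1" -le "$2" ]
}

if is_within_budget 5100 5200; then echo "Price has reduced"; fi
if ! is_within_budget 5499 5200; then echo "Price is still above budget"; fi
```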
We want to be notified when the price has come down to our budget. For this I will send a message to my private Telegram channel using the Telegram API. The endpoint to send a message looks like https://api.telegram.org/bot[ApiKey]/sendMessage?chat_id=[chatID]&text=[message]. So let's insert this in our if condition.
if [ "$price" -le 5200 ]; then curl -s "https://api.telegram.org/bot[ApiKey]/sendMessage?chat_id=[chatID]&text=[message]"; fi
This should notify us in Telegram when the price of the item has reached 5200 or less. Now let's wrap this entire command in our GitHub Actions YAML file.
on:
  schedule:
    - cron: '15 3 * * *'
jobs:
  scrap-price:
    runs-on: ubuntu-latest
    steps:
      - name: Scrap and compare price
        run: |
          # "url" stands for the product URL used earlier
          price=$(curl -s "url" | grep -oE 'priceNumber":[0-9]+' | sort -u | cut -d ":" -f 2)
          if [ "$price" -le 5200 ]; then
            # Spaces in the message are URL-encoded as %20
            curl -s "https://api.telegram.org/bot[ApiKey]/sendMessage?chat_id=[chatID]&text=Price%20is%20less%20than%20or%20equal%20to%205200"
          fi
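Before putting real values into the Telegram URL, it may help to see how the placeholders get filled in. The sketch below only builds and prints the URL, it does not send anything. build_telegram_url is a made-up helper, and EXAMPLE_KEY and EXAMPLE_CHAT are dummy values, not real credentials.

```shell
# Build the sendMessage URL from its three parts.
# Arguments: API key, chat ID, message text.
build_telegram_url() {
  api_key=$1
  chat_id=$2
  # A URL cannot contain raw spaces, so encode them as %20.
  # Only spaces are handled here; other special characters need full URL encoding.
  message=$(printf '%s' "$3" | sed 's/ /%20/g')
  printf 'https://api.telegram.org/bot%s/sendMessage?chat_id=%s&text=%s\n' "$api_key" "$chat_id" "$message"
}

build_telegram_url "EXAMPLE_KEY" "EXAMPLE_CHAT" "Price has reduced"
# To actually send the message, pass the result to curl:
#   curl -s "$(build_telegram_url "$key" "$chat" "Price has reduced")"
```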
And with that we are done. With this configuration, the job named "scrap-price" will run every day at 3:15 AM UTC (9 AM in my local time). It will scrape the price from the specified URL and send a Telegram message if the price is less than or equal to 5200. This marks the end of part one of this blog. Part two will improve the tracker further, such as reading the API key from GitHub secrets, comparing the value with the previous day's value, and setting up analytics of the daily price. Stay tuned.