Python script to monitor website changes

May 8, 2020
Category: Python, Scripting

My local Sheriff’s office puts out a “Daily Activity Report” every weekday, and it’s a great way to keep an eye on the neighborhood. I had been meaning to find a way to watch that page and email myself an alert whenever it changes.  Here is the process flow that I came up with:

Step 1.  Hash the contents of a website
Step 2.  Wait X amount of seconds
Step 3.  Hash the contents again.
Step 4.  If there was a change, send me an email. If no change, wait X seconds and try again.
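
The four steps above can be sketched as one small function. This is only an illustration under some assumptions, not the script built below: `wait_for_change`, `fetch`, and `max_checks` are hypothetical names, and the page fetcher is passed in as an argument so the logic can be tried without touching the network.

```python
import hashlib
import time


def wait_for_change(fetch, interval=240, sleep=time.sleep, max_checks=None):
    """Return True once the content returned by fetch() changes.

    fetch      -- hypothetical callable that returns the page as bytes
    interval   -- seconds to wait between checks (the "X" in the steps)
    max_checks -- optional cap on re-checks; None means keep checking
    """
    # Step 1: hash the contents
    current = hashlib.sha224(fetch()).hexdigest()
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval)                                # Step 2: wait
        new = hashlib.sha224(fetch()).hexdigest()      # Step 3: hash again
        if new != current:                             # Step 4: changed?
            return True
        checks += 1
    return False
```

With a real page, `fetch` could be something like `lambda: urlopen(url).read()`; passing a fake `fetch` and a no-op `sleep` makes it easy to check the logic quickly.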

So using this simple logic, I used my rudimentary Python skills to write a short script to do this.  Let’s break the script down:


import time
import smtplib
from email.message import EmailMessage
import hashlib
from urllib.request import urlopen


Above are all of the modules we’re importing. Pretty standard stuff.


url = 'https://www.volusiasheriff.org/reports/district5-logs.stml'
response = urlopen(url).read()
currentHash = hashlib.sha224(response).hexdigest()


These are some variables we’re going to set. The url is the page we want to monitor. ‘response’ holds the raw bytes we get back from reading that URL, and ‘currentHash’ is the SHA-224 hash of the entire page. So these lines set the initial hash.

Note that while it works on this site, many pages serve dynamic content, including ads, relative dates, etc., and any change on a page, even one you can’t see, will cause the hash to change.  That is a use case for the BeautifulSoup library, which can pull out just the part of the page you care about; I will discuss it in the future.
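
As a rough illustration of the idea (hash only the part of the page you care about), here is a minimal sketch. It is not BeautifulSoup and not what the script in this post does: it swaps in the standard library’s `html.parser` so it runs without extra installs, and `TextOnly` and `text_hash` are hypothetical names. It hashes only the visible text and ignores anything inside script and style tags.

```python
import hashlib
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def text_hash(html: str) -> str:
    """Hash only the visible text of an HTML document."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    return hashlib.sha224(text.encode("utf-8")).hexdigest()
```

Two versions of a page that differ only in an embedded script would get the same `text_hash`, while a change to the visible text would not. Dynamic content that shows up in the visible text (a relative date, for example) would still trigger a change, so this only narrows the problem.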


while True:


I typically don’t use a ‘while True’ loop, but in this case, it really did make the most sense. It’s a script that I want to run continually, acting only when certain conditions are met.  Generally, it’s better practice to give a loop an explicit condition.


    try:

        response = urlopen(url).read()
        currentHash = hashlib.sha224(response).hexdigest()
        time.sleep(240)
        response = urlopen(url).read()
        newHash = hashlib.sha224(response).hexdigest()

        if newHash == currentHash:
            continue


As you can see above, I have things nested in a ‘try/except’ statement.  I did this because I would occasionally hit connectivity issues, and the script would fail.  Below ‘try’, the first two lines visit the page and hash the contents.  This produces a fingerprint, effectively a unique “ID”, for that page and everything on it.

I then wait for 240 seconds, and then revisit and grab the hash.

Next is an ‘if/else’ statement.  I compare the old and new hash, and if they match, I just continue the loop.


        else:

            msg = EmailMessage()
            msg.set_content(url)
            msg['From'] = '[email protected]'
            msg['To'] = '[email protected]'
            msg['Subject'] = 'New Daily Activity Report'
            fromaddr = '[email protected]'
            toaddrs = ['[email protected]']
            # Connect over implicit TLS on port 465; a second STARTTLS connection on 587 is not needed
            server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
            server.login('[email protected]', 'insertpasswordhere')
            server.send_message(msg)
            server.quit()
            response = urlopen(url).read()
            currentHash = hashlib.sha224(response).hexdigest()
            time.sleep(240)
            continue


If the hashes don’t match, I want to send an email letting me know the page was updated.  Above, you can see that I set the fields for the email to be sent.  I actually made a separate Gmail account just for sending emails – that way, because the password here is in plaintext, the exposure is limited to a throwaway account.  The ‘to’ field is my actual email address, and the ‘from’ is the new one, which is only used for these purposes.
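
If you would rather not keep the password in the script at all, one option is to read it from an environment variable. The sketch below is an assumption-laden illustration, not part of the original script: `build_alert`, `smtp_password`, and the variable name `DAR_SMTP_PASSWORD` are all hypothetical. It also pulls the message-building lines into one function, since the same fields get set again in the ‘except’ section.

```python
import os
from email.message import EmailMessage


def build_alert(url, sender, recipient, subject):
    """Build the alert email; the body is just the URL, as in the script."""
    msg = EmailMessage()
    msg.set_content(url)
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = subject
    return msg


def smtp_password(var_name="DAR_SMTP_PASSWORD"):
    """Read the SMTP password from an environment variable (hypothetical name)."""
    password = os.environ.get(var_name)
    if not password:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return password
```

The result of `smtp_password()` could then be passed to `server.login(...)` in place of the hard-coded string.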

So I set the fields, then use Google’s SMTP to send the email.  The content is just the URL, so that I open the email and click.  I suppose you could scrape the site and insert the fields into the body, and I may do that one day.

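After it’s sent, I re-hash the page so the new version becomes the baseline, sleep 240 seconds, and continue the loop.  Note that the top of the loop then hashes and sleeps another 240 seconds, so the next comparison actually happens about 480 seconds after the email.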


    except Exception as e:

        msg = EmailMessage()
        msg.set_content(url)
        msg['From'] = '[email protected]'
        msg['To'] = '[email protected]'
        msg['Subject'] = 'DAR NETWORK FAILURE'
        fromaddr = '[email protected]'
        toaddrs = ['[email protected]']
        # Connect over implicit TLS on port 465; a second STARTTLS connection on 587 is not needed
        server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
        server.login('[email protected]', 'insertpasswordhere')
        server.send_message(msg)
        server.quit()


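Here is the ‘except’ part of the try/except block. By nesting the primary functions in the ‘try’ section, if there is an issue with connecting to the site, I have the script emailing me to let me know there was an issue. There is no sleep in the ‘except’ section itself, so the loop starts over right away; the next wait is the 240-second sleep inside ‘try’. Also note that if the network is down, sending the failure email can itself raise an exception, which would stop the script.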
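
One way to add a pause between failed attempts is a small retry helper. This is a sketch under assumptions rather than the script’s actual behavior: `fetch_with_retry`, `retries`, and `wait` are hypothetical names, and `fetch` stands for whatever reads the page (for example, `lambda: urlopen(url).read()`).

```python
import time


def fetch_with_retry(fetch, retries=3, wait=240, sleep=time.sleep):
    """Call fetch() up to `retries` times, pausing between failed attempts.

    Re-raises the last error if every attempt fails.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except OSError as e:  # urllib.error.URLError is a subclass of OSError
            last_error = e
            if attempt < retries - 1:
                sleep(wait)   # pause before trying again
    raise last_error
```

Waiting inside the retry keeps a short outage from producing a burst of failure emails; an alert only needs to go out if the helper finally gives up.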

So that’s the script! It’s simple, and won’t work on a lot of websites, but it works great on this site.  Here’s the final script:


import time
import smtplib
from email.message import EmailMessage
import hashlib
from urllib.request import urlopen

url = 'https://www.volusiasheriff.org/reports/district4-logs.stml'
response = urlopen(url).read()
currentHash = hashlib.sha224(response).hexdigest()

while True:

    try:

        response = urlopen(url).read()
        currentHash = hashlib.sha224(response).hexdigest()
        time.sleep(240)
        response = urlopen(url).read()
        newHash = hashlib.sha224(response).hexdigest()

        if newHash == currentHash:
            continue

        else:

            msg = EmailMessage()
            msg.set_content(url)
            msg['From'] = '[email protected]'
            msg['To'] = '[email protected]'
            msg['Subject'] = 'New Daily Activity Report'
            fromaddr = '[email protected]'
            toaddrs = ['[email protected]']
            # Connect over implicit TLS on port 465; a second STARTTLS connection on 587 is not needed
            server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
            server.login('[email protected]', 'insertpasswordhere')
            server.send_message(msg)
            server.quit()
            response = urlopen(url).read()
            currentHash = hashlib.sha224(response).hexdigest()
            time.sleep(240)
            continue

    except Exception as e:

        msg = EmailMessage()
        msg.set_content(url)
        msg['From'] = '[email protected]'
        msg['To'] = '[email protected]'
        msg['Subject'] = 'DAR NETWORK FAILURE'
        fromaddr = '[email protected]'
        toaddrs = ['[email protected]']
        # Connect over implicit TLS on port 465; a second STARTTLS connection on 587 is not needed
        server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
        server.login('[email protected]', 'insertpasswordhere')
        server.send_message(msg)
        server.quit()


Comments

  • Hello! Nice script, I am attempting to write something similar. Shouldn’t there be a waiting instruction in the `except` clause too?

  • Thanks for the script. I used this and got it to work; however, for the website that I am monitoring I am getting nuisance alerts, possibly because there’s dynamic content as you mentioned (although I don’t see any on the website I’m monitoring). You mentioned BeautifulSoup may be used and might fix the problem, could you elaborate?

    FYI, I don’t know Python at all. I’ve simply copied your code and can understand it the minimal amount for me to modify it for my purposes.

  • Hey Jace, great script. But how do I know it works? When it runs, it loads the webpage I want to monitor; should the page refresh after the allocated seconds? I just get – “Process finished with exit code 0”.

  • Why do you have a 240 second wait right before the “continue” at the end of the “else” statement, when you will have to wait again at the beginning of the “try” statement. I may be wrong, but looks like you’re waiting double the time you would have to.

    Also, you said the Exception statement “waits for a while and then tries everything again”, but I don’t see any wait built into it?

  • Is this script actually working properly? I want to monitor a site and be notified if any changes are detected.
    I am running the script on Colab. Tell me if it’s working or not.
