Website Change Detector: Track HTML & Visual Updates

For: Me, Myself and I

Project Goal

This project aims to monitor web pages for changes, detecting both modifications to the HTML structure and visual differences via screenshots. It provides detailed reports, including an HTML diff, to keep track of website updates.

The scraper is deployed on Apify. Check it out here!

How it was built

The core functionality involves several key steps:

HTML Fetching: Uses httpx to retrieve the HTML content of specified URLs.
HTML Signature Generation: Computes a hash of the HTML content after removing dynamic attributes and comments and normalizing whitespace, so that only structural changes alter the signature.
Screenshot Capturing: Employs Selenium with a headless Chrome driver to capture full-page screenshots.
Change Detection: Compares the current and previous HTML signatures and screenshots to detect modifications, and generates HTML diffs with the htmldiff2 library.
Data Storage: Stores all results, including the URL, change flags, timestamps, HTML diffs, and screenshots, using Apify’s key-value store.
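
To make the signature and comparison steps more concrete, here is a minimal sketch using only the standard library. It is not the project's actual code: the list of "dynamic" attributes, the choice of SHA-256, and the function names html_signature and has_changed are all assumptions made for illustration. It also cleans the markup with regular expressions, which is simpler but less robust than a real HTML parser.

```python
import hashlib
import re

# Hypothetical list of attributes treated as "dynamic" (changing on every
# request). The write-up does not name them, so adjust this to the target site.
DYNAMIC_ATTRS = ("nonce", "data-csrf", "data-timestamp")

_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
_WHITESPACE_RE = re.compile(r"\s+")
_DYNAMIC_ATTR_RE = re.compile(
    r"\s+(?:" + "|".join(re.escape(name) for name in DYNAMIC_ATTRS) + r")"
    r"""\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def html_signature(html: str) -> str:
    """Return a hex digest that ignores comments, dynamic attributes and spacing."""
    text = _COMMENT_RE.sub("", html)               # drop HTML comments
    text = _DYNAMIC_ATTR_RE.sub("", text)          # drop per-request attributes
    text = _WHITESPACE_RE.sub(" ", text).strip()   # collapse whitespace runs
    # SHA-256 is an assumption; any stable hashlib digest would work here.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(previous_signature: str, current_html: str) -> bool:
    """Compare a stored signature against freshly fetched HTML."""
    return html_signature(current_html) != previous_signature


if __name__ == "__main__":
    old = '<div id="x" nonce="abc">\n  <!-- build 1 -->\n  <p>Hello</p>\n</div>'
    new = '<div id="x" nonce="xyz"> <!-- build 2 --> <p>Hello</p> </div>'
    print(has_changed(html_signature(old), new))
```

In this sketch the two sample pages differ only in a nonce value, a comment, and whitespace, so they yield the same signature. Adding or removing an element would change the digest and flag the page as modified.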

Technologies used

Python
asyncio
httpx
Selenium
Apify SDK
htmldiff2
hashlib
