Data Layer Product Scraper - Scrape product data from any e-commerce site

For: Me, Myself and I

Project Goal

This project extracts product data from websites by reading the dataLayer object, a JavaScript object commonly used by Google Analytics and other tracking tools to store information about the products viewed on a site. Instead of scraping site-specific HTML elements, as traditional scrapers do, it extracts data from the dataLayer that is already present on the page.

The scraper is deployed on Apify. Check it out here!

How it was built

The scraper is built with Python, Selenium, and Apify’s Actor platform. It starts a headless Chrome browser via Selenium and navigates to each of the specified URLs. It then waits, using a custom JavaScript function, until the dataLayer object is available before extracting product information. A recursive function traverses the dataLayer and collects the dictionaries that contain product data, which are identified by partial matches on keys such as “name”, “id”, and “price”. The extracted data is then pushed to the Apify dataset. The scraper also handles cookie acceptance prompts by accepting cookies via a CSS selector specified by the user.
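The recursive traversal described above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: the names PRODUCT_KEY_HINTS, looks_like_product, and find_products are hypothetical, and the rule that all three key fragments must match, as well as the choice not to descend into a matched dictionary, are assumptions.

```python
from typing import Any, Iterator

# Key fragments named in the write-up; requiring all three is an assumption.
PRODUCT_KEY_HINTS = ("name", "id", "price")


def looks_like_product(node: dict) -> bool:
    """Return True if every hint partially matches at least one key."""
    keys = [str(key).lower() for key in node]
    return all(any(hint in key for key in keys) for hint in PRODUCT_KEY_HINTS)


def find_products(node: Any) -> Iterator[dict]:
    """Walk nested dicts and lists depth-first, yielding product-like dicts."""
    if isinstance(node, dict):
        if looks_like_product(node):
            yield node
            return  # assumption: do not search inside a matched product
        for value in node.values():
            yield from find_products(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_products(item)
    # Strings, numbers, and None carry no nested data, so they are skipped.
```

Partial matching means that keys such as "item_id" or "item_name" qualify, but so would unrelated keys that merely contain "id" (for example "width"), so a real implementation may need stricter rules.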

Technologies used

Python: The primary programming language.
Selenium: For browser automation and interaction.
Apify Actor: For cloud-based execution and data management.
Chrome WebDriver: For controlling the headless browser.
JavaScript: Used to interact with the dataLayer.
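
To illustrate how Selenium and JavaScript work together here, the following sketch polls the page until the dataLayer holds at least one entry. It assumes only that the driver exposes an execute_script method, as Selenium's WebDriver does. The names READ_DATALAYER_JS and wait_for_datalayer, the timeout values, and the "non-empty means ready" rule are hypothetical and are not taken from the project's code.

```python
import time
from typing import Any, List

# JavaScript evaluated in the page: return the dataLayer if present, else null.
READ_DATALAYER_JS = "return window.dataLayer || null;"


def wait_for_datalayer(driver: Any, timeout: float = 10.0, poll: float = 0.5) -> List[Any]:
    """Poll the page until the dataLayer is non-empty or the timeout expires.

    `driver` is any object with an `execute_script(script)` method,
    for example a Selenium WebDriver instance.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = driver.execute_script(READ_DATALAYER_JS)
        if data:  # assumption: a non-empty result means the page has pushed events
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dataLayer not available after {timeout} s")
        time.sleep(poll)
```

With Selenium installed, the same wait could likely be expressed with WebDriverWait and a lambda that calls execute_script; the hand-written loop is used here only to keep the sketch free of dependencies.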
