Trustpilot Review Scraper & Translator


Project Goal

The aim of this project was to build a robust web scraper that extracts detailed review data from Trustpilot pages, with an option to translate the extracted reviews into a language of the user’s choice. The scraper is deployed on Apify, where it can be tried out.

How it was built

The scraper is built in Python, using the Selenium library for browser automation and BeautifulSoup4 for HTML parsing. It navigates through the specified Trustpilot pages, uses Selenium to wait for each page to load fully, and extracts review data from the JSON embedded in the HTML. The extracted data is then flattened to make it easier to work with. If translation is enabled, Google Translate, accessed through the deep-translator library, converts the review text and title to the desired language. Finally, the extracted and translated data is stored in an Apify dataset.
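To make the flattening step concrete, here is a minimal sketch of how nested review JSON could be collapsed into a single-level dictionary. It is not the project's actual function: the name `flatten`, the underscore separator, the index-based keys for list items, and the sample field names are all assumptions made for illustration.

```python
from typing import Any


def flatten(data: Any, parent_key: str = "", sep: str = "_") -> dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict.

    Nested keys are joined with `sep`; list items are keyed by their index.
    """
    flat: dict[str, Any] = {}
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, list):
        items = enumerate(data)
    else:
        # A bare scalar has nothing to flatten.
        return {parent_key: data} if parent_key else {}

    for key, value in items:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            # Recurse into non-empty containers.
            flat.update(flatten(value, new_key, sep))
        else:
            # Scalars and empty containers are stored as-is.
            flat[new_key] = value
    return flat


# Hypothetical review record; the field names are made up for illustration.
review = {
    "id": "abc123",
    "rating": 5,
    "consumer": {"name": "Jane", "country": "DE"},
    "dates": {"published": "2024-01-01"},
    "tags": ["delivery", "support"],
}
flat_review = flatten(review)
```

With this sketch, `consumer.name` ends up under the key `consumer_name` and the second tag under `tags_1`, so each review becomes one flat row that is easy to export as a table.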

Key features include:

  • Dynamic Page Handling: Selenium waits for the page to load, ensuring that all elements are present before parsing.
  • Data Extraction: Uses BeautifulSoup4 to parse HTML and extract JSON data containing reviews.
  • Data Flattening: A custom function flattens the nested JSON data structure into a single-level dictionary for easier handling.
  • Optional Translation: Uses the deep-translator library to translate review text and titles when requested.
  • Apify Integration: Uses the Apify SDK to manage the request queue and dataset storage.
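
The optional translation feature could look roughly like the sketch below. The helper names `make_google_translator` and `translate_review`, the `title` and `text` field names, and the choice to store translations in separate `*_translated` keys rather than overwrite the originals are assumptions for illustration, not the project's actual code. The only library call used is deep-translator's `GoogleTranslator(source=..., target=...).translate(...)`.

```python
from typing import Callable, Optional

Translate = Callable[[str], str]


def make_google_translator(target: str) -> Translate:
    """Build a translate function backed by deep-translator."""
    # Imported here so the dependency is only needed when translation is on.
    from deep_translator import GoogleTranslator

    return GoogleTranslator(source="auto", target=target).translate


def translate_review(
    review: dict,
    translate: Optional[Translate],
    fields: tuple[str, ...] = ("title", "text"),
) -> dict:
    """Return a copy of `review` with translated versions of `fields` added.

    If `translate` is None, translation is disabled and the copy is unchanged.
    """
    result = dict(review)
    if translate is None:
        return result
    for field in fields:
        original = review.get(field)
        if isinstance(original, str) and original.strip():
            try:
                result[f"{field}_translated"] = translate(original)
            except Exception:
                # Keep the record even if the translation service fails.
                result[f"{field}_translated"] = None
    return result
```

Calling `translate_review(review, make_google_translator("de"))` would request German translations, while passing `None` as the second argument skips translation entirely. Taking the translator as a plain callable keeps the function easy to test without network access.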

Technologies used

  • Python
  • Selenium
  • BeautifulSoup4
  • deep-translator
  • Apify SDK
