
Crossborderalex Substack Chatbot


Technical Explanation

Project Goal

The project aims to create a chatbot that provides information from the Crossborderalex Substack newsletter. It automates the process of scraping content, cleaning it, and storing it in a vector database to enable efficient question answering.

How it was built

The workflow starts by calling the ScrapingBee API to fetch the sitemap XML of the Crossborderalex Substack site. The XML is parsed into a list of unique URLs. For each URL, the FireCrawl node scrapes the page content, and custom JavaScript code then cleans it by removing UI fluff. The cleaned content is converted into embeddings with OpenAI’s Embeddings API and stored in a Supabase vector store. URLs that have already been processed are recorded in a Supabase database to prevent redundant scraping.

The chatbot is triggered by new chat messages. It retrieves relevant content from the Supabase vector store using OpenAI embeddings, and the Langchain Retrieval QA chain generates answers from the retrieved context using the gpt-4o-mini model.
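
The ingestion steps described above (parse the sitemap into unique URLs, skip URLs that were already processed, strip UI fluff) can be sketched outside n8n as follows. This is a minimal illustration in Python, not the workflow's actual JavaScript: the function names, the example.com URLs, and the fluff patterns are all assumptions made for the example, and the real cleaning rules are not documented here.

```python
import re
import xml.etree.ElementTree as ET

# Standard sitemap namespace defined by sitemaps.org.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical examples of UI fluff lines; the workflow's real rules may differ.
UI_FLUFF = re.compile(r"^(subscribe|share|sign in|leave a comment)$", re.IGNORECASE)


def extract_unique_urls(sitemap_xml):
    """Return the <loc> URLs of a sitemap, de-duplicated, in document order."""
    root = ET.fromstring(sitemap_xml)
    seen = set()
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def filter_unprocessed(urls, processed):
    """Drop URLs already recorded as processed (a set stands in for the database)."""
    return [url for url in urls if url not in processed]


def clean_content(text):
    """Remove lines that are pure UI fluff and collapse runs of blank lines."""
    kept = [
        line.strip()
        for line in text.splitlines()
        if not UI_FLUFF.match(line.strip())
    ]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(kept)).strip()


# Small demonstration with made-up data.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/p/post-a</loc></url>
  <url><loc>https://example.com/p/post-b</loc></url>
  <url><loc>https://example.com/p/post-a</loc></url>
</urlset>"""

all_urls = extract_unique_urls(sample_sitemap)
todo = filter_unprocessed(all_urls, processed={"https://example.com/p/post-a"})
print(todo)  # ['https://example.com/p/post-b']
print(clean_content("Subscribe\nHello world\n\n\n\nShare\nMore text"))
```

Checking the processed set before scraping is what keeps repeated runs cheap: only new posts reach the scraping and embedding steps.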

Technologies used

  • n8n
  • ScrapingBee API
  • FireCrawl API
  • Supabase (Vector Store and Database)
  • OpenAI API (Embeddings and Chat)
  • Langchain
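
To illustrate how the vector store and the embeddings fit together at question time, the sketch below ranks stored chunks by cosine similarity to a question vector and assembles a prompt from the best matches. It is a toy model under stated assumptions: the in-memory list stands in for the Supabase vector store, the two-dimensional vectors stand in for OpenAI embeddings, and the function names and prompt wording are hypothetical. In the actual workflow this work is delegated to Supabase and the Langchain Retrieval QA chain.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, k=2):
    """Return the k stored rows whose embeddings are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda row: cosine_similarity(query_vec, row["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, chunks):
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(chunk["content"] for chunk in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy store: real embeddings have far more dimensions.
store = [
    {"content": "Post about shipping costs.", "embedding": [1.0, 0.0]},
    {"content": "Post about hiring.", "embedding": [0.0, 1.0]},
    {"content": "Post about customs fees.", "embedding": [0.9, 0.1]},
]

top = retrieve([1.0, 0.0], store, k=2)
print(build_prompt("What affects cross-border costs?", top))
```

The prompt returned by `build_prompt` is what would then be sent to the chat model, so the answer stays grounded in the retrieved newsletter content.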