Crossborderalex Substack Chatbot

Technical Explanation

Project Goal

The project aims to create a chatbot that answers questions using content from the Crossborderalex Substack newsletter. It automates scraping the content, cleaning it, and storing it in a vector database so that answers can be generated from the relevant passages.

How it was built

The workflow starts by calling the ScrapingBee API to fetch the sitemap XML of the Crossborderalex Substack site. The XML is parsed to extract a list of unique URLs. For each URL, the FireCrawl node scrapes the page content, which custom JavaScript code then cleans to remove UI fluff. The cleaned content is converted into embeddings with OpenAI’s Embeddings API and stored in a Supabase vector store. URLs that have been processed are recorded in a Supabase database to prevent redundant scraping.

The chatbot is triggered by new chat messages. It retrieves relevant content from the Supabase vector store using OpenAI embeddings, and the LangChain Retrieval QA chain generates answers from the retrieved context using the gpt-4o-mini model.
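As a rough illustration of the ingestion steps described above (parse the sitemap, keep unique URLs, skip ones already processed, strip UI fluff), here is a minimal Python sketch. It is not the workflow's actual code, which runs as n8n nodes with custom JavaScript. The function names, the example UI phrases, and the idea of passing the processed URLs in as a plain set are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Standard sitemap namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical examples of UI-only lines; the real cleaning rules are not
# given in the write-up.
UI_FLUFF = re.compile(r"(subscribe|share|sign in)", re.IGNORECASE)


def extract_unique_urls(sitemap_xml: str) -> list[str]:
    """Return the <loc> URLs of a sitemap, de-duplicated, in document order."""
    root = ET.fromstring(sitemap_xml)
    seen: set[str] = set()
    urls: list[str] = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def filter_unprocessed(urls: list[str], processed: set[str]) -> list[str]:
    """Drop URLs that were already scraped (assumed to be loaded from the
    Supabase table of processed URLs by the caller)."""
    return [url for url in urls if url not in processed]


def clean_content(text: str) -> str:
    """Remove blank lines and lines that consist only of a UI phrase."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not UI_FLUFF.fullmatch(stripped):
            kept.append(stripped)
    return "\n".join(kept)
```

In this sketch, the output of `filter_unprocessed` would be the list handed to the scraper, and `clean_content` would run on each scraped page before it is embedded.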

Technologies used

n8n
ScrapingBee API
FireCrawl API
Supabase (Vector Store and Database)
OpenAI API (Embeddings and Chat)
LangChain
