Want to understand what customers really think? Learn how to extract Trustpilot reviews at scale using Python and BeautifulSoup. This guide shows you how to collect thousands of customer reviews, bypass anti-scraping blocks, and turn feedback into actionable business intelligence – whether you're tracking competitors, improving products, or analyzing market sentiment.
So here's the thing about Trustpilot – it's basically a goldmine of honest customer opinions. Every day, millions of people share what they actually think about products and services. And if you can tap into that data systematically, you're looking at some serious competitive advantages.
But scraping Trustpilot isn't just about running a script and calling it a day. The site has defenses (like any modern platform), and you need to approach it smartly. That's what we're going to walk through here.
For those who prefer to dive straight in, here's the full working scraper:
```python
from bs4 import BeautifulSoup
import requests
import csv

company = "nookmart.com"
base_url = f"https://www.trustpilot.com/review/{company}"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'accept-language': 'en-US,en;q=0.9'
}

payload = {
    'api_key': "YOUR_API_KEY",
    'url': base_url,
    'render': 'true',
    'keep_headers': 'true',
}

try:
    # Initial fetch of the first page (each page is re-fetched inside the loop below)
    response = requests.get('https://api.scraperapi.com', params=payload, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')

    pages_to_scrape = 10

    with open('trustpilot_reviews.csv', 'w', newline='', encoding='utf-8') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow(['Reviewer', 'Rating', 'Review', 'Date'])

        # range is end-exclusive, so +1 scrapes all pages_to_scrape pages
        for page in range(1, pages_to_scrape + 1):
            payload['url'] = f"{base_url}?page={page}"
            page_response = requests.get('https://api.scraperapi.com', params=payload, headers=headers)
            page_soup = BeautifulSoup(page_response.content, 'html.parser')
            reviews = page_soup.find_all('div', {"class": "styles_reviewCardInner__EwDq2"})

            for review in reviews:
                reviewer = review.find("span", attrs={"class": "typography_heading-xxs__QKBS8"}).text
                rating = review.find("div", attrs={"class": "styles_reviewHeader__iU9Px"})["data-service-review-rating"]
                content_element = review.find("p", attrs={"class": "typography_body-l__KUYFJ"})
                content = content_element.text if content_element else 'None'
                date = review.find("p", attrs={"class": "typography_body-m__xgxZ_ typography_appearance-default__AAY17"}).text
                csv_writer.writerow([reviewer, rating, content, date])

    print("Data Extraction Successful!")
except Exception as e:
    print("An error occurred:", e)
```
Just drop your API key into the payload, and you're set. If you don't have one yet, grab a free account – most providers give you enough credits to test things out properly.
Want to understand what's actually happening here? Let's break it down step by step.
Getting your environment ready is straightforward. You need Python (version 3.10 or later works great) and a couple of libraries that do the heavy lifting.
Open your terminal and run:
```bash
pip install requests beautifulsoup4
```
The requests library handles all the HTTP communication – it's what fetches the webpage content. Think of it as your messenger that goes to Trustpilot, asks for the page, and brings back the HTML.
BeautifulSoup (bs4) is your parser. It takes that messy HTML and lets you navigate through it like you're reading a map. You can pinpoint exactly where the review text lives, where the ratings are stored, and pull them out cleanly.
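Here's that idea in miniature. The HTML fragment below is a made-up stand-in for a fetched page, but the `find` calls work exactly the same way against real Trustpilot markup:

```python
from bs4 import BeautifulSoup

# A tiny made-up fragment standing in for a fetched page
html = "<div class='review'><span class='name'>Alice</span><p>Great shop!</p></div>"
soup = BeautifulSoup(html, "html.parser")

# Navigate straight to the pieces you care about
name = soup.find("span", class_="name").text   # Alice
body = soup.find("p").text                     # Great shop!
print(name, body)
```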
Before we start grabbing data, we need to understand where that data actually lives on the page. Every website organizes information differently, and Trustpilot has its own logic.
For this walkthrough, we're looking at reviews for "Nookmart" – an online store for Animal Crossing players. The principles work for any company's Trustpilot page, though.
When you inspect the page (right-click, select 'inspect'), you'll see the HTML structure. Each review is wrapped in a div with the class styles_reviewCardInner__EwDq2. That's our container – everything we need is nested inside.
The reviewer's name sits in a span tag with class typography_heading-xxs__QKBS8. Why do we care about names? Mostly for verification – they help confirm reviews come from real accounts, and you can spot patterns when the same users keep reviewing similar products.
The rating lives in a div with class styles_reviewHeader__iU9Px, but here's the trick – it's not in the visible text. It's stored in an attribute called data-service-review-rating. Ratings give you quantifiable sentiment at a glance.
The actual review content – the meat of what customers are saying – is in a p tag with class typography_body-l__KUYFJ. This is where the real insights hide. People describe specific problems, praise features, compare to competitors.
The experience date is in another p tag with classes typography_body-m__xgxZ_ and typography_appearance-default__AAY17. Context matters. A negative review from three years ago when the company was starting out tells a different story than one from last week.
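Before pointing the scraper at the live site, you can sanity-check these selectors against a saved snippet. The HTML below is a simplified stand-in for one review card – real Trustpilot markup is longer, and the hashed class suffixes (EwDq2, QKBS8, and so on) change when the site updates, so re-inspect if your results come back empty:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for a single Trustpilot review card
sample = """
<div class="styles_reviewCardInner__EwDq2">
  <span class="typography_heading-xxs__QKBS8">Jane D.</span>
  <div class="styles_reviewHeader__iU9Px" data-service-review-rating="5"></div>
  <p class="typography_body-l__KUYFJ">Fast delivery, items as described.</p>
</div>
"""

card = BeautifulSoup(sample, "html.parser").find(
    "div", class_="styles_reviewCardInner__EwDq2"
)
reviewer = card.find("span", class_="typography_heading-xxs__QKBS8").text
# The rating is stored in an attribute, not in visible text
rating = card.find("div", class_="styles_reviewHeader__iU9Px")["data-service-review-rating"]
print(reviewer, rating)  # Jane D. 5
```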
Now we know what we're looking for. Time to actually grab it.
First, we import our tools and define which company we're scraping:
```python
from bs4 import BeautifulSoup
import requests
import csv

company = "nookmart.com"
base_url = f"https://www.trustpilot.com/review/{company}"
```
Change that company variable to any business on Trustpilot, and the scraper adapts automatically.
Here's where things get interesting. We need to tell the server what kind of browser we are (even though we're a script). We also need to handle Trustpilot's defenses.
```python
headers = {
    'User-Agent': 'Mozilla/5.0 ...',
    'accept-language': 'en-US,en;q=0.9'
}

payload = {
    'api_key': "YOUR_API_KEY",
    'url': base_url,
    'render': 'true',
    'keep_headers': 'true',
}
```
The render: true parameter is crucial. Trustpilot loads reviews dynamically with JavaScript. Without rendering, you'd just get an empty page skeleton. When dealing with websites that have sophisticated bot detection or require JavaScript rendering, using a specialized scraping infrastructure can save you countless hours of troubleshooting.
👉 Get reliable access to any website's data without worrying about blocks or captchas
This setup routes your request through a service that handles all the complexity – rotating IPs, solving captchas, rendering JavaScript, the works. It's the difference between spending three days debugging blocks versus getting clean data in the first run.
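Even with a service handling blocks for you, individual requests can still fail transiently. A small retry helper keeps one hiccup from sinking a long run – this is a sketch with a hypothetical function name, and the retry count, backoff, and accepted status codes are all knobs you'd tune:

```python
import time
import requests

def fetch_with_retries(url, params, headers, retries=3, backoff=2.0):
    """Retry transient failures with a growing delay between attempts.
    A sketch: adjust retries, backoff, and status handling to taste."""
    for attempt in range(retries):
        try:
            resp = requests.get(url, params=params, headers=headers, timeout=60)
            if resp.status_code == 200:
                return resp
        except requests.RequestException:
            pass  # network hiccup; fall through and retry
        time.sleep(backoff * (attempt + 1))
    raise RuntimeError(f"Failed to fetch {url} after {retries} attempts")
```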
Once we have the response, BeautifulSoup takes over:
```python
response = requests.get('https://api.scraperapi.com', params=payload, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
```
Now soup contains a navigable version of the entire page. We can search through it like querying a database.
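One practical tip while we're here: those hashed class names break whenever Trustpilot ships a redesign, but data attributes like data-service-review-rating tend to be more stable. BeautifulSoup's select() accepts CSS attribute selectors, so you can target them directly – a small sketch with made-up markup:

```python
from bs4 import BeautifulSoup

html = """
<article>
  <div data-service-review-rating="4"></div>
  <div data-service-review-rating="1"></div>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS attribute selectors survive class-name churn better than hashed classes
ratings = [div["data-service-review-rating"]
           for div in soup.select("[data-service-review-rating]")]
print(ratings)  # ['4', '1']
```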
We create a CSV file with appropriate headers:
```python
with open('trustpilot_reviews.csv', 'w', newline='', encoding='utf-8') as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerow(['Reviewer', 'Rating', 'Review', 'Date'])
```
CSV is simple and universal. You can open it in Excel, import it into databases, or process it with pandas for analysis.
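Once the file exists, even the standard library is enough for a first pass at analysis. Here the CSV contents are stubbed with an in-memory string (made-up rows in the same format the scraper writes) to show the pattern:

```python
import csv
import io

# In-memory stand-in for trustpilot_reviews.csv, same columns as the scraper writes
data = io.StringIO(
    "Reviewer,Rating,Review,Date\n"
    "Alice,5,Great,Jan 1\n"
    "Bob,2,Slow shipping,Jan 3\n"
)
rows = list(csv.DictReader(data))

# Quick summary: count and average rating
avg = sum(int(r["Rating"]) for r in rows) / len(rows)
print(f"{len(rows)} reviews, average rating {avg:.1f}")
```

Swap the StringIO for `open('trustpilot_reviews.csv', encoding='utf-8')` to run it against your real output.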
This is where the scraper really works. We loop through multiple pages, extracting reviews from each:
```python
pages_to_scrape = 10  # number of listing pages to collect

# range is end-exclusive, so +1 scrapes all pages_to_scrape pages
for page in range(1, pages_to_scrape + 1):
    payload['url'] = f"{base_url}?page={page}"
    page_response = requests.get('https://api.scraperapi.com', params=payload, headers=headers)
    page_soup = BeautifulSoup(page_response.content, 'html.parser')
    reviews = page_soup.find_all('div', {"class": "styles_reviewCardInner__EwDq2"})

    for review in reviews:
        reviewer = review.find("span", attrs={"class": "typography_heading-xxs__QKBS8"}).text
        rating = review.find("div", attrs={"class": "styles_reviewHeader__iU9Px"})["data-service-review-rating"]
        content_element = review.find("p", attrs={"class": "typography_body-l__KUYFJ"})
        content = content_element.text if content_element else 'None'
        date = review.find("p", attrs={"class": "typography_body-m__xgxZ_ typography_appearance-default__AAY17"}).text
        csv_writer.writerow([reviewer, rating, content, date])
```
Notice the if content_element else 'None' check. Some reviews don't have written content – just ratings. Good scrapers handle missing data gracefully.
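You can push that same defensive idea further with a small helper, so any field – not just the review body – degrades gracefully when the tag is missing. The function name here is my own invention, not part of the scraper above:

```python
from bs4 import BeautifulSoup

def safe_text(element, default=""):
    """Return the element's stripped text, or `default` when the tag
    is missing, so one absent field doesn't crash the whole run."""
    return element.get_text(strip=True) if element else default

soup = BeautifulSoup("<div><p>hello</p></div>", "html.parser")
present = safe_text(soup.find("p"))           # hello
missing = safe_text(soup.find("span"), "N/A") # N/A (no span in the markup)
print(present, missing)
```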
Everything is wrapped in a try-except block:
```python
try:
    # All the scraping logic
    print("Data Extraction Successful!")
except Exception as e:
    print("An error occurred:", e)
```
Networks fail. Websites change structure. Servers timeout. Your scraper needs to handle this without crashing completely.
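One refinement worth considering: a single try-except around everything means one bad page aborts the whole run and discards pages already scraped. Guarding each page individually is more resilient. A sketch, with hypothetical names – fetch_page stands for any callable that returns a list of rows or raises on failure:

```python
def scrape_pages(fetch_page, pages):
    """Collect rows from each page, recording failures instead of
    aborting the whole run when one page errors out."""
    rows, failed = [], []
    for page in pages:
        try:
            rows.extend(fetch_page(page))
        except Exception as exc:
            failed.append((page, str(exc)))  # note it and move on
    return rows, failed

# Demo with a fake fetcher that fails on page 2
def fake_fetch(page):
    if page == 2:
        raise ValueError("boom")
    return [f"row-{page}"]

rows, failed = scrape_pages(fake_fetch, [1, 2, 3])
print(rows)    # ['row-1', 'row-3']
print(failed)  # [(2, 'boom')]
```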
Once you have thousands of reviews in a structured format, the possibilities open up. You can track sentiment trends over time. Identify recurring complaints about specific features. Compare your products to competitors based on actual customer feedback.
Businesses use this kind of data to prioritize product improvements, craft marketing messages that address real pain points, and spot emerging issues before they become widespread problems.
The technical process we covered – fetching pages, parsing HTML, storing data – is just the foundation. The real value comes from what you do with the information after you've collected it.
You now have a working Trustpilot scraper that handles pagination, deals with anti-scraping measures, and outputs clean CSV data. Change the company name, adjust the number of pages, and you're extracting insights from any business on the platform. The script handles the repetitive work while you focus on analysis and decision-making – which is exactly how it should be.