Making web scraping work at scale isn't about fancy code—it's about having a proxy solution that just works. This guide shows you exactly how to hook up ScraperAPI with Scrapy, so you can stop worrying about IP bans and start actually collecting data.
Look, we've all been there. You write a beautiful spider, run it, and within minutes you're blocked. Your IP is toast. You scramble to find new proxies, rotate headers, handle CAPTCHAs—it's exhausting.
ScraperAPI handles all that nonsense for you. Send them a URL, they manage the proxy rotation, header configuration, retries, and even CAPTCHA solving. You just get back the HTML you asked for.
Here's what your typical Scrapy request looks like normally:
```python
yield scrapy.Request(url='http://quotes.toscrape.com/page/1/', callback=self.parse)
```
Simple enough. But this goes straight to the target website, which means you're one suspicious pattern away from getting blocked.
Integrating ScraperAPI with Scrapy is surprisingly straightforward. You've got three options, depending on how you like to work.
The first method is the most explicit. You create a helper function that rewrites your URLs into ScraperAPI requests:
```python
import scrapy
from urllib.parse import urlencode

API_KEY = 'YOUR_API_KEY'

def get_scraperapi_url(url):
    payload = {'api_key': API_KEY, 'url': url}
    proxy_url = 'http://api.scraperapi.com/?' + urlencode(payload)
    return proxy_url

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = ['http://quotes.toscrape.com/page/1/']
        for url in urls:
            yield scrapy.Request(url=get_scraperapi_url(url), callback=self.parse)
```
Every request now routes through ScraperAPI's endpoint. Clean and visible.
The second method: if you don't want to write that helper function yourself, ScraperAPI provides an SDK that does it for you.
Install it:
```bash
pip install scraperapi-sdk
```
Then use it in your spider:
```python
import scrapy
from scraperapi_sdk import ScraperAPIClient

API_KEY = 'YOUR_API_KEY'
client = ScraperAPIClient(API_KEY)

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = ['http://quotes.toscrape.com/page/1/']
        for url in urls:
            yield scrapy.Request(url=client.scrapyGet(url), callback=self.parse)
```
Same result, less code on your end.
The third method is for anyone used to working with traditional proxy setups. ScraperAPI works that way too:
```python
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = ['http://quotes.toscrape.com/page/1/']
        proxy = 'http://scraperapi:YOUR_API_KEY@proxy-server.scraperapi.com:8001'
        for url in urls:
            yield scrapy.Request(url=url, meta={'proxy': proxy}, callback=self.parse)
```
Scrapy skips SSL verification by default, so you don't need to mess with certificates here.
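If every request in your project should go through the proxy, you can also centralize it in a small downloader middleware instead of setting `meta={'proxy': ...}` on each request. This is a sketch, not part of ScraperAPI's documentation: the class name is made up, and you'd enable it yourself via `DOWNLOADER_MIDDLEWARES` in `settings.py`.

```python
# A minimal sketch of a downloader middleware (e.g. in middlewares.py) that
# attaches the ScraperAPI proxy to every outgoing request. The class name is
# an assumption; enable it via DOWNLOADER_MIDDLEWARES in settings.py.
class ScraperAPIProxyMiddleware:
    PROXY = 'http://scraperapi:YOUR_API_KEY@proxy-server.scraperapi.com:8001'

    def process_request(self, request, spider):
        # Only set the proxy if the request hasn't picked one already.
        request.meta.setdefault('proxy', self.PROXY)
```

With this in place, spiders don't need to know about the proxy at all, and a single request can still opt out by setting its own `meta['proxy']`.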
Sometimes the default settings aren't enough. ScraperAPI lets you tweak how it handles requests by adding parameters.
Need JavaScript rendered? Add `render=true` to your request parameters:

```python
def get_scraperapi_url(url):
    payload = {'api_key': API_KEY, 'url': url, 'render': 'true'}
    proxy_url = 'http://api.scraperapi.com/?' + urlencode(payload)
    return proxy_url
```
Other useful parameters include:

- `country_code` – route requests through a specific country
- `device_type` – simulate a mobile or desktop device
- `session_number` – maintain the same IP across multiple requests
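These parameters can be combined in the helper from earlier. The `**params` version below is a sketch of my own, not ScraperAPI's code; the parameter names themselves come from the list above.

```python
from urllib.parse import urlencode

API_KEY = 'YOUR_API_KEY'

def get_scraperapi_url(url, **params):
    # Any extra keyword arguments (render, country_code, device_type,
    # session_number, ...) are forwarded to ScraperAPI as query parameters.
    payload = {'api_key': API_KEY, 'url': url, **params}
    return 'http://api.scraperapi.com/?' + urlencode(payload)
```

Then `get_scraperapi_url(url, country_code='us', session_number='42')` pins the request to a US IP and keeps the same IP for every request carrying that session number.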
When you're dealing with sophisticated sites that require specific configurations beyond basic scraping, having these options available makes all the difference. For sites with complex anti-bot systems, 👉 ScraperAPI's advanced features handle the heavy lifting automatically, so you can focus on extracting data rather than fighting blocks.
ScraperAPI automatically selects optimal headers for maximum success rates. Usually that's what you want.
But sometimes you need to send specific headers—maybe for authentication, or because the site requires a particular format. In those cases, add `keep_headers=true`:
```python
def get_scraperapi_url(url):
    payload = {'api_key': API_KEY, 'url': url, 'keep_headers': 'true'}
    proxy_url = 'http://api.scraperapi.com/?' + urlencode(payload)
    return proxy_url

# Then, inside your spider's start_requests:
headers = {"X-MyHeader": "123"}
yield scrapy.Request(url=get_scraperapi_url(url), headers=headers, callback=self.parse)
```
Your custom headers now pass through to the target site.
Here's where most people leave performance on the table. ScraperAPI can handle serious throughput, but only if you configure Scrapy properly.
ScraperAPI's whole point is letting you scale from hundreds to millions of pages. That requires parallel requests.
In your settings.py file, set concurrent requests to match your plan's thread limit:
```python
CONCURRENT_REQUESTS = 100  # adjust to match your plan's thread limit
```
Also make sure these settings are disabled:
```python
DOWNLOAD_DELAY = 0
RANDOMIZE_DOWNLOAD_DELAY = False
```
Those delay settings throttle your scraper—unnecessary when you're routing through ScraperAPI.
Even with a 97%+ success rate, some requests will fail. ScraperAPI returns a 500 status code for these and doesn't charge you.
Configure Scrapy to automatically retry failed requests:
```python
RETRY_TIMES = 3
```
After three retries, virtually everything succeeds unless the target site itself is down.
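One thing worth checking: Scrapy's retry middleware only retries the status codes listed in `RETRY_HTTP_CODES`. Its default list already includes 500, so ScraperAPI's failure responses are retried out of the box—but if you've overridden that setting, make sure 500 stays in it. A `settings.py` fragment (the list shown mirrors Scrapy's defaults):

```python
RETRY_ENABLED = True
RETRY_TIMES = 3
# Scrapy's default retry codes; 500 must be present so ScraperAPI's
# failed-request responses get retried rather than dropped.
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429]
```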
Scrapy checks robots.txt by default. When your requests route through the api.scraperapi.com endpoint, that means Scrapy fetches ScraperAPI's robots.txt rather than the target site's, which can interfere with your crawl.
Turn it off:
```python
ROBOTSTXT_OBEY = False
```
You're sending requests through ScraperAPI anyway, so this check is redundant.
Here's what your settings.py should look like:
```python
CONCURRENT_REQUESTS = 100
DOWNLOAD_DELAY = 0
RANDOMIZE_DOWNLOAD_DELAY = False
RETRY_TIMES = 3
ROBOTSTXT_OBEY = False
```
Integrating ScraperAPI with Scrapy isn't complicated—pick one of the three methods, adjust your settings, and you're scraping at scale. No more IP rotation headaches, no more CAPTCHA nightmares, no more manually managing proxy pools.
The real benefit shows up when you're scaling. Going from scraping a few hundred pages to millions becomes a configuration change rather than a complete infrastructure overhaul. For teams serious about extracting data without the operational overhead, 👉 ScraperAPI removes the infrastructure complexity entirely—you write spiders, they handle everything else.