Master programmatic control of n8n workflows to build scalable, code-driven scraping pipelines that handle JavaScript rendering, proxy rotation, and anti-bot protection automatically—no visual editor required.
The n8n API is a RESTful interface that lets you interact with n8n workflows through code instead of clicking around in a visual editor. Think of it as taking the training wheels off your scraping operation—you get full programmatic control over workflow creation, execution, and monitoring. This is especially useful when you're managing dozens of scraping targets, need dynamic workflows that adapt to different sites, or want to integrate scraping capabilities directly into your existing applications.
n8n actually gives you two APIs to work with. The REST API handles the heavy lifting—creating workflows, managing executions, storing credentials. The Webhook API is simpler but powerful for triggering workflows via HTTP requests. For web scraping, the REST API is where things get interesting because you can spin up new workflows on the fly, monitor their progress, and retrieve scraped data without touching the UI.
Before you can start making API calls, you need to flip a few switches in your n8n configuration. Add these environment variables to your setup:
```
N8N_API_ENABLED=true
N8N_BASIC_AUTH_ACTIVE=true
N8N_BASIC_AUTH_USER=your_username
N8N_BASIC_AUTH_PASSWORD=your_secure_password
N8N_HOST=localhost
N8N_PORT=5678
N8N_PROTOCOL=http
```
After updating these settings, restart your n8n instance. If you're running Docker, that's docker restart n8n. Using npm? Kill the process and run n8n start again. The API uses Basic Authentication, so every request needs your credentials attached.
Here's how authentication looks in practice:
Python:
```python
import requests
from requests.auth import HTTPBasicAuth

API_URL = "https://your-n8n-instance.com/api/v1"
auth = HTTPBasicAuth("your_username", "your_password")

response = requests.get(f"{API_URL}/workflows", auth=auth)
print(response.json())
```
JavaScript:
```javascript
const axios = require('axios');

const auth = {
  username: 'your_username',
  password: 'your_password'
};

// Run inside an async function (or a module with top-level await)
const response = await axios.get(
  'https://your-n8n-instance.com/api/v1/workflows',
  { auth }
);
console.log(response.data);
```
Creating workflows through the API means defining them as JSON structures. Each workflow contains nodes (the individual steps) and connections (how data flows between them). Here's a complete product scraper that fetches HTML and parses it:
```python
import requests
from requests.auth import HTTPBasicAuth

API_URL = "https://your-n8n-instance.com/api/v1"
auth = HTTPBasicAuth("username", "password")

workflow = {
    "name": "Product Price Scraper",
    "nodes": [
        {
            "parameters": {},
            "name": "Start",
            "type": "n8n-nodes-base.start",
            "typeVersion": 1,
            "position": [250, 300]
        },
        {
            "parameters": {
                "url": "https://api.webscraping.ai/html",
                "queryParameters": {
                    "parameters": [
                        {"name": "api_key", "value": "YOUR_API_KEY"},
                        {"name": "url", "value": "https://example.com/products"},
                        {"name": "js", "value": "true"}
                    ]
                },
                "method": "GET"
            },
            "name": "Scrape Website",
            "type": "n8n-nodes-base.httpRequest",
            "typeVersion": 3,
            "position": [450, 300]
        }
    ],
    "connections": {
        "Start": {
            "main": [[{"node": "Scrape Website", "type": "main", "index": 0}]]
        }
    },
    "active": False
}

response = requests.post(f"{API_URL}/workflows", json=workflow, auth=auth)
workflow_id = response.json()["id"]
```
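If you're generating many similar workflows, it helps to template the node JSON rather than hand-writing each dictionary. A minimal sketch (the `make_http_node` helper and its signature are illustrative conveniences, not part of n8n):

```python
def make_http_node(name, query_params, position,
                   endpoint="https://api.webscraping.ai/html"):
    """Build an n8n HTTP Request node definition for a GET scraping call."""
    return {
        "parameters": {
            "url": endpoint,
            "queryParameters": {
                "parameters": [
                    {"name": k, "value": v} for k, v in query_params.items()
                ]
            },
            "method": "GET"
        },
        "name": name,
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 3,
        "position": position,
    }

# Produces the same node JSON as the "Scrape Website" node above
node = make_http_node(
    "Scrape Website",
    {"api_key": "YOUR_API_KEY", "url": "https://example.com/products", "js": "true"},
    [450, 300],
)
```

From here, building a multi-target workflow is a loop over URLs that appends one node per target.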
When you need to handle complex web scraping at scale—dealing with JavaScript-heavy sites, rotating proxies, or bypassing anti-bot systems—integrating a dedicated scraping service makes sense. Services like ScraperAPI handle the infrastructure headaches (proxy rotation, CAPTCHA solving, browser fingerprinting) so your n8n workflows can focus on data extraction logic.
This integration pattern works especially well when you're scraping sites that actively fight automation, since you're offloading the detection-avoidance work to specialized infrastructure.
Once your workflow exists, you can fire it off programmatically and watch what happens:
```python
def execute_scraping_workflow(workflow_id, target_url):
    response = requests.post(
        f"{API_URL}/workflows/{workflow_id}/execute",
        json={"data": {"target_url": target_url}},
        auth=auth
    )
    return response.json()["executionId"]

urls = [
    "https://example.com/product/1",
    "https://example.com/product/2"
]

for url in urls:
    execution_id = execute_scraping_workflow("workflow_id_here", url)
    print(f"Started scraping {url}: {execution_id}")
```
Monitoring execution status means polling the API until the job finishes:
```python
import time

def monitor_execution(execution_id, timeout=300):
    start_time = time.time()
    while time.time() - start_time < timeout:
        response = requests.get(
            f"{API_URL}/executions/{execution_id}",
            auth=auth
        )
        execution = response.json()
        if execution["finished"]:
            result_data = execution["data"]["resultData"]
            if result_data.get("error"):  # .get avoids a KeyError on success
                print("Execution failed")
                return None
            return result_data["runData"]
        time.sleep(5)  # Poll every 5 seconds
    return None  # Timed out
```
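The wait loop above is a general polling pattern. Separating the fetch step from the loop makes it testable without a live n8n instance; this `poll_until` helper is a hypothetical refactor, not an n8n API:

```python
import time

def poll_until(fetch, is_done, timeout=300, interval=5):
    """Call fetch() repeatedly until is_done(result) is true or timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = fetch()
        if is_done(result):
            return result
        time.sleep(interval)
    return None  # Timed out

# Demo with a fake fetcher that reports "finished" on its third call
calls = {"count": 0}

def fake_fetch():
    calls["count"] += 1
    return {"finished": calls["count"] >= 3}

result = poll_until(fake_fetch, lambda r: r["finished"], timeout=10, interval=0)
```

In production, `fetch` would be a closure around `requests.get` on the executions endpoint and `is_done` would check the `finished` flag, exactly as in `monitor_execution`.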
Webhooks offer a simpler trigger mechanism—just POST to a URL and your workflow runs. Set up a webhook-triggered scraper like this:
```python
workflow = {
    "name": "Webhook-Triggered Scraper",
    "nodes": [
        {
            "parameters": {
                "path": "scrape-data",
                "responseMode": "lastNode"
            },
            "name": "Webhook",
            "type": "n8n-nodes-base.webhook",
            "typeVersion": 1,
            "position": [250, 300]
        },
        {
            "parameters": {
                "url": "https://api.webscraping.ai/html",
                "queryParameters": {
                    "parameters": [
                        {"name": "api_key", "value": "YOUR_API_KEY"},
                        {"name": "url", "value": "={{$json['url']}}"}
                    ]
                }
            },
            "name": "Scrape URL",
            "type": "n8n-nodes-base.httpRequest",
            "typeVersion": 3,
            "position": [450, 300]
        }
    ],
    "connections": {
        "Webhook": {
            "main": [[{"node": "Scrape URL", "type": "main", "index": 0}]]
        }
    },
    "active": True
}
```
Trigger it with a simple HTTP POST:
```python
response = requests.post(
    "https://your-n8n-instance.com/webhook/scrape-data",
    json={"url": "https://example.com", "js": True}
)
```
For batch scraping, you want parallel execution to speed things up:
```python
import time
import requests
from concurrent.futures import ThreadPoolExecutor
from requests.auth import HTTPBasicAuth

class N8nScraperAPI:
    def __init__(self, api_url, username, password):
        self.api_url = api_url
        self.auth = HTTPBasicAuth(username, password)

    def create_scraping_job(self, url, workflow_id):
        # Trigger one execution for a single URL (same call as earlier)
        response = requests.post(
            f"{self.api_url}/workflows/{workflow_id}/execute",
            json={"data": {"target_url": url}},
            auth=self.auth
        )
        return response.json()["executionId"]

    def wait_for_result(self, execution_id, timeout=300):
        # Poll the execution until it finishes or the timeout expires
        deadline = time.time() + timeout
        while time.time() < deadline:
            execution = requests.get(
                f"{self.api_url}/executions/{execution_id}",
                auth=self.auth
            ).json()
            if execution["finished"]:
                return execution["data"]["resultData"]["runData"]
            time.sleep(5)
        return None

    def scrape_urls_parallel(self, urls, workflow_id, max_workers=5):
        execution_ids = []
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = {
                executor.submit(self.create_scraping_job, url, workflow_id): url
                for url in urls
            }
            for future in futures:
                url = futures[future]
                execution_ids.append((url, future.result()))

        # Wait for completion and gather results
        results = {}
        for url, execution_id in execution_ids:
            results[url] = self.wait_for_result(execution_id)
        return results
```
Production scraping needs solid error handling. Implement exponential backoff for retry logic:
```javascript
class N8nScrapingClient {
  constructor(apiUrl, auth, maxRetries = 3) {
    this.apiUrl = apiUrl;
    this.auth = auth;  // { username, password }
    this.maxRetries = maxRetries;
  }

  async executeWithRetry(workflowId, data, retries = 0) {
    try {
      const response = await axios.post(
        `${this.apiUrl}/workflows/${workflowId}/execute`,
        { data },
        { auth: this.auth }
      );
      // waitForCompletion polls /executions as in the monitoring example
      return await this.waitForCompletion(response.data.executionId);
    } catch (error) {
      if (retries < this.maxRetries) {
        // Exponential backoff: 1s, 2s, 4s, ...
        const delay = Math.pow(2, retries) * 1000;
        await new Promise(resolve => setTimeout(resolve, delay));
        return this.executeWithRetry(workflowId, data, retries + 1);
      }
      throw error;
    }
  }
}
```
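For reference, the same backoff schedule is easy to compute in Python; a minimal sketch, with a cap added so long retry chains never sleep unboundedly (the cap is my addition, not in the JavaScript above):

```python
def backoff_delays(max_retries, base=1.0, cap=60.0):
    """Seconds to wait before each retry: base * 2**attempt, capped."""
    return [min(base * (2 ** attempt), cap) for attempt in range(max_retries)]

print(backoff_delays(5))  # doubles each attempt: 1s, 2s, 4s, 8s, 16s
```

Many production clients also add random jitter to these delays so that concurrent retries don't all hit the server at the same instant.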
Keep these principles in mind when building with the n8n API:
- Implement rate limiting on your end to avoid overwhelming your n8n instance. Even though n8n can handle concurrent executions, there's a practical limit before things start slowing down.
- Always wrap API calls in try-catch blocks and implement retry logic. Network hiccups happen, and your scraping pipeline should handle them gracefully.
- Monitor execution status actively rather than assuming success. Set up alerts for failed jobs so you catch issues before they pile up.
- Use environment variables for all credentials and API keys. Never hardcode them in your workflow definitions.
- Set appropriate timeouts based on actual page load times. If you're scraping JavaScript-heavy sites, factor in rendering time.
- Store workflow definitions in version control. Treat them like code because that's what they are—programmatic definitions of your data extraction logic.
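The client-side rate limiting mentioned above can be done with a sliding window; a minimal sketch (the `RateLimiter` class and its API are illustrative, and the injectable clock exists only to make it testable):

```python
import time

class RateLimiter:
    """Allow at most `rate` calls per `per` seconds (sliding window)."""

    def __init__(self, rate, per=1.0, clock=time.monotonic):
        self.rate = rate
        self.per = per
        self.clock = clock
        self.calls = []  # Timestamps of recent calls

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window
        self.calls = [t for t in self.calls if now - t < self.per]
        if len(self.calls) < self.rate:
            self.calls.append(now)
            return True
        return False

# Demo with a fake clock instead of real time
fake_now = [0.0]
limiter = RateLimiter(2, per=1.0, clock=lambda: fake_now[0])
first, second, third = limiter.allow(), limiter.allow(), limiter.allow()
fake_now[0] = 1.5  # Window slides past the first two calls
fourth = limiter.allow()
```

In a scraping pipeline you would call `allow()` (and sleep briefly when it returns `False`) before each API request to n8n.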
The n8n API transforms web scraping from a point-and-click operation into a programmable system you can integrate, automate, and scale. By combining it with robust scraping infrastructure that handles JavaScript rendering, proxy rotation, and anti-bot protection automatically, you get the flexibility of code with the reliability of enterprise-grade tooling. Whether you're building a monitoring dashboard, aggregating competitive intelligence, or running research at scale, the n8n API gives you the control to do it right.