Looking for a straightforward way to extract HTML from dynamic websites without wrestling with headless browsers yourself? SuperScraper API gives you a simple REST endpoint—just pass a URL and get back fully rendered content. It works seamlessly with popular scraping service interfaces, scales on demand, and handles the messy parts like proxy rotation and anti-blocking measures for you.
When you're building data pipelines or need to gather information from modern web pages, the technical overhead can pile up fast. You need headless browsers for JavaScript-heavy sites, proxy management to avoid blocks, and infrastructure that scales when your workload spikes. SuperScraper API handles all of this through a clean REST interface.
The API extracts HTML from any URL using a headless browser, so you capture content even when it's loaded dynamically through JavaScript. It automatically routes requests through datacenter or residential proxies and applies browser fingerprinting techniques to slip past common blocking mechanisms. When your scraping needs grow—whether that's hundreds or thousands of pages—the infrastructure scales without you lifting a finger.
You can also grab screenshots of pages, which comes in handy when you need visual proof of what a site looked like at a specific moment or want to verify rendering issues.
SuperScraper API runs on Apify's new Standby mode, which means it's not a traditional Actor you start from the console. Instead, you call it directly via HTTP. This makes integration into existing workflows simple—it's just another API endpoint.
You'll need an Apify API token, which you can grab from Settings > Integrations in the Apify Console. Once you have that, authentication works two ways: either pass the token in an Authorization header (recommended for production), or append it as a token query parameter (handy for quick browser tests).
Here's a basic example using curl:
```bash
curl -X GET \
  'https://super-scraper-api.apify.actor/?url=https://apify.com/store&wait_for=.ActorStoreItem-title&screenshot=true&json_response=true' \
  --header 'Authorization: Bearer <YOUR_APIFY_API_TOKEN>'
```
This request fetches the Apify store page, waits for elements with the class ActorStoreItem-title to appear, captures a screenshot, and returns a detailed JSON response.
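If you'd rather build the same request in Python, here's a minimal sketch using only the standard library. The endpoint and parameters come straight from the curl example above; the helper name and the token placeholder are ours, not part of the API:

```python
from urllib.parse import urlencode

# Base endpoint taken from the curl example above.
ENDPOINT = "https://super-scraper-api.apify.actor/"

def build_scrape_url(target_url: str, **params) -> str:
    """Assemble a SuperScraper request URL from a target URL plus extra parameters."""
    query = {"url": target_url, **params}
    return ENDPOINT + "?" + urlencode(query)

request_url = build_scrape_url(
    "https://apify.com/store",
    wait_for=".ActorStoreItem-title",
    screenshot="true",
    json_response="true",
)

# To actually send it, attach your token in an Authorization header, e.g.:
#   urllib.request.Request(request_url,
#       headers={"Authorization": "Bearer <YOUR_APIFY_API_TOKEN>"})
print(request_url)
```

Because the parameters travel in the query string, `urlencode` takes care of escaping the target URL and selector for you.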
The API supports a wide range of parameters borrowed from popular scraping services like ScrapingBee, ScrapingAnt, and ScraperAPI. This compatibility means if you've worked with those tools before, you'll feel right at home.
URL and rendering: The url parameter is required—it's the page you want to scrape. By default, render_js is set to true, meaning the API uses a headless browser to render JavaScript. If you're scraping a static page and want to save resources, you can flip this to false.
Timing controls: Sometimes pages take a moment to load everything you need. Use wait to pause for a specific number of milliseconds, wait_for to hold until a CSS selector appears, or wait_browser to wait for browser events like load, domcontentloaded, or networkidle.
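As a quick illustration, here's how those timing parameters might look assembled into a query string (the target URL and selector below are made up for the example):

```python
from urllib.parse import urlencode

# Illustrative combination: pause 2 seconds, wait for a hypothetical selector,
# and wait for the network to go idle before the page is considered ready.
timing_params = {
    "url": "https://example.com",
    "wait": 2000,                  # milliseconds
    "wait_for": ".product-grid",   # hypothetical CSS selector
    "wait_browser": "networkidle",
}
query = urlencode(timing_params)
print(query)
```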
Screenshots: Set screenshot to true for a viewport capture, or use screenshot_full_page to grab the entire scrollable page. You can even target specific elements with screenshot_selector. When json_response is enabled, screenshots come back as Base64-encoded strings.
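A short sketch of handling that Base64 payload: the snippet assumes the JSON response exposes the capture under a `screenshot` field (the payload here is mocked, not a real response) and decodes it back into PNG bytes:

```python
import base64

# Mocked json_response payload; the field name "screenshot" is an assumption
# for illustration. Real responses carry the full Base64-encoded image.
payload = {"screenshot": base64.b64encode(b"\x89PNG\r\n\x1a\n...").decode("ascii")}

png_bytes = base64.b64decode(payload["screenshot"])

# Persist it to disk:
with open("page.png", "wb") as f:
    f.write(png_bytes)
```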
Proxy options: By default, the API uses datacenter proxies. If you're hitting sites with aggressive blocking, switch on premium_proxy for residential IPs. You can also specify a country_code (using two-letter ISO codes) to route through a particular region—useful when content varies by location. For Google-related sites, there's a dedicated custom_google flag.
Custom behavior: The js_scenario parameter lets you run JavaScript instructions after the page loads—clicking buttons, filling forms, scrolling, or evaluating custom code. The extract_rules parameter accepts JSON-based rules to pull specific data from the page, saving you from parsing HTML yourself.
Response format: By default, you get back plain HTML. Set json_response to true for a verbose response that includes metadata like response headers, status codes, and any extracted data or evaluation results.
Instead of dealing with HTML parsing libraries, you can define extraction rules directly in your API call. These rules use CSS selectors to target elements and specify what data to pull—text content, HTML, attributes, or even nested structures.
For example, if you want to scrape all blog post links and titles from a page, you'd structure your extract_rules like this:
```json
{
  "posts": {
    "selector": "article.post",
    "type": "list",
    "output": {
      "title": "h2.title",
      "link": {
        "selector": "a.read-more",
        "output": "@href"
      }
    }
  }
}
```
The API returns the extracted data in a clean JSON structure, already organized the way you specified. You can nest rules, target attributes by prefixing them with @, and even scrape tables directly into JSON or array formats using table_json or table_array.
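Since extract_rules is JSON, it has to be serialized and URL-encoded when you pass it as a query parameter. A minimal sketch, reusing the rules from the example above (the target URL is made up):

```python
import json
from urllib.parse import urlencode

extract_rules = {
    "posts": {
        "selector": "article.post",
        "type": "list",
        "output": {
            "title": "h2.title",
            "link": {"selector": "a.read-more", "output": "@href"},
        },
    }
}

# The rules travel as a JSON string inside the extract_rules query parameter;
# urlencode handles the escaping.
query = urlencode({
    "url": "https://example.com/blog",
    "extract_rules": json.dumps(extract_rules),
})
print(query)
```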
The js_scenario parameter lets you automate interactions after a page loads. You define a sequence of instructions—waiting, clicking, scrolling, filling forms, or evaluating custom JavaScript—and the browser executes them in order.
Each instruction is a simple JSON object. For instance:
```json
[
  {"wait_for": "#cookie-banner"},
  {"click": "#accept-cookies"},
  {"wait": 2000},
  {"scroll_y": 1000},
  {"evaluate": "document.querySelectorAll('a').length"}
]
```
By default, scenarios run in strict mode, meaning if one instruction fails, the rest are skipped. You can disable this by wrapping your scenario in an object with "strict": false.
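The exact wrapper shape isn't spelled out here, but following the ScrapingBee-style interface the API mirrors, it would look something like this (treat the `instructions` key as an assumption):

```json
{
  "strict": false,
  "instructions": [
    {"click": "#accept-cookies"},
    {"wait": 2000}
  ]
}
```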
When you set json_response to true, the API includes results from any evaluate instructions in an evaluate_results field, letting you run quick data checks or calculations without a separate scraping step.
You pay based on actual Apify platform usage—compute, storage, and network resources. Costs vary depending on the target sites, your parameter choices, request volume, and variable factors like network conditions or site responsiveness.
The best way to understand your costs is to run a test with your real-world use case. Free accounts have higher per-unit pricing than paid plans, so if you're scaling up, you'll see better rates on higher tiers.
For reference, a typical test of 30 sequential requests plus 50 batched requests gives you a baseline to estimate your budget. From there, you can adjust parameters like block_resources (which blocks images and CSS by default) or premium_proxy usage to fine-tune costs.
You can control viewport dimensions with window_width and window_height, useful when scraping responsive sites that serve different content based on screen size. The device parameter switches between desktop (default) and mobile user agents, triggering mobile-specific layouts.
If you need to scrape authenticated content, pass cookies as a string via the cookies parameter in the format name1=value1;name2=value2. You can also bring your own proxy by specifying own_proxy in the format protocol://username:password@host:port.
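If your cookies live in a dict, a one-liner gets you the expected string (the helper name is ours, not part of the API):

```python
def format_cookies(cookies: dict) -> str:
    """Serialize a dict into the name1=value1;name2=value2 format
    the cookies parameter expects."""
    return ";".join(f"{name}={value}" for name, value in cookies.items())

cookie_header = format_cookies({"session": "abc123", "theme": "dark"})
print(cookie_header)  # session=abc123;theme=dark
```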
By default, HTTP headers starting with Spb- or Ant- prefixes are forwarded to the target page (without the prefix). This lets you pass custom headers while the API still sends its own required headers.
If you want exclusive control, use forward_headers_pure, which forwards only your prefixed headers and skips the API's defaults. This is helpful when you need to spoof specific request characteristics.
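To make the prefix stripping concrete, here's a small sketch of the mapping the API applies to your headers before forwarding them (the header names are illustrative):

```python
# Headers you send with a Spb- or Ant- prefix reach the target page with the
# prefix removed. These names are examples, not required by the API.
prefixed = {
    "Spb-Referer": "https://example.com/landing",
    "Ant-X-Custom": "my-value",
}

# What the target page would receive after the prefix is stripped:
forwarded = {name.split("-", 1)[1]: value for name, value in prefixed.items()}
print(forwarded)
```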
Normally, if the target page responds with a status code outside the 200-299 range (a 404, for example), the API returns a 500 error. This protects you from silently accepting failed requests.
If you'd rather handle status codes yourself, set transparent_status_code to true. The API will then pass through whatever status code the target site returned, letting you build custom retry logic or logging.
SuperScraper API intentionally mirrors the interfaces of ScrapingBee, ScrapingAnt, and ScraperAPI, making it easy to swap or compare services. A few parameters like session_id, block_ads, session_number, and autoparse aren't currently supported, but the core functionality covers most real-world scraping scenarios.
SuperScraper API strips away the complexity of browser automation, proxy management, and scaling headaches. You get a straightforward REST endpoint that handles dynamic rendering, anti-blocking measures, and flexible data extraction—all without managing infrastructure. Whether you're pulling structured data, capturing screenshots, or running custom JavaScript interactions, it adapts to your workflow.