Scraping modern websites feels impossible when JavaScript rules the page. Static tools fail, CAPTCHAs block you, and infinite scrolling makes you want to quit. Selenium changes that—it mimics real users, handles dynamic content, and unlocks data other tools can't touch.
This guide shows you exactly how to scrape JavaScript-heavy sites, bypass common obstacles, and scale your operations without burning through resources.
Selenium wasn't built for scraping—it was built for testing web apps. But that's exactly why it works so well for dynamic sites.
Traditional scrapers grab HTML and call it a day. Selenium actually opens a browser, waits for JavaScript to run, scrolls the page, and clicks buttons just like you would. If a site loads content as you scroll or hides data behind interactions, Selenium handles it.
The tradeoff? Speed and resources. Selenium runs slower than lightweight parsers because it's rendering entire pages. But when the data you need won't load without JavaScript, there's no substitute.
Use Selenium when:
Content loads dynamically after the page renders
You need to click, scroll, or fill forms to access data
The site uses infinite scrolling or pagination
Traditional scrapers return empty HTML
For simpler tasks, stick with BeautifulSoup or Scrapy. Save Selenium for when nothing else works.
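Not sure which camp a site falls into? One rough heuristic (a sketch — the `likely_needs_js` helper and its threshold are illustrative, not a Selenium feature) is to fetch the raw HTML once and check how much visible text the server actually sent. A JavaScript shell app typically ships an empty root div and a script bundle:

```python
import re

def likely_needs_js(raw_html: str, min_text_chars: int = 200) -> bool:
    """Guess whether static HTML is an empty JS shell that needs a real browser."""
    # Drop script/style blocks, then strip tags, then measure remaining text.
    no_scripts = re.sub(r"<(script|style)[^>]*>.*?</\1>", "", raw_html,
                        flags=re.S | re.I)
    visible = re.sub(r"<[^>]+>", " ", no_scripts)
    visible = re.sub(r"\s+", " ", visible).strip()
    return len(visible) < min_text_chars

# A typical client-rendered shell: one empty root div plus a bundle script.
shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(likely_needs_js(shell))  # True — almost no server-rendered text
```

If the check comes back `True`, reach for Selenium; otherwise a plain HTTP fetch plus BeautifulSoup will be far faster.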
Before diving in, make sure you have:
Python 3.10 or newer installed
The pip package manager
A web driver for your browser (we'll use ChromeDriver)
Install Selenium with one command:
```bash
pip install selenium
```
Download ChromeDriver from the official Chrome for Testing page and add it to your system PATH. If you're on Selenium 4.6 or newer, Selenium Manager ships with the library and downloads a matching driver automatically, so this manual step is optional.
Start every Selenium script with these imports:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
```
These give you everything needed to control browsers, locate elements, and handle dynamic content.
Here's your cheat sheet for basic scraping operations:
Open a website:
```python
driver.get("https://www.example.com")
```
Capture a screenshot:
```python
driver.save_screenshot('screenshot.png')
```
Scroll to bottom:
```python
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
```
Click an element:
```python
button = driver.find_element(By.ID, "button_id")
button.click()
```
Wait for an element:
```python
element = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.ID, "element_id"))
)
```
Handle infinite scrolling:
```python
import time

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give newly loaded content time to render
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
```
Pair with BeautifulSoup:
```python
from bs4 import BeautifulSoup

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
```
That covers 90% of what you'll do with Selenium. Now let's build something real.
Start by setting up Chrome's behavior:
```python
chrome_options = webdriver.ChromeOptions()
```
Want to run without opening a visible browser window? Enable headless mode:
```python
chrome_options.add_argument("--headless")  # on Chrome 109+, "--headless=new" selects the newer headless mode
```
Headless mode runs faster and uses fewer resources—perfect for automated tasks.
Launch Chrome with your configuration:
```python
driver = webdriver.Chrome(options=chrome_options)
```
Open your target site:
```python
driver.get("https://google.com/")
```
When you're done, close everything cleanly:
```python
driver.quit()
```
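One robust pattern is to guarantee that `driver.quit()` runs even when the scrape raises, so no orphaned browser processes pile up. A minimal sketch (the `managed_driver` helper is a hypothetical name, not part of Selenium's API):

```python
from contextlib import contextmanager

@contextmanager
def managed_driver(factory):
    """Create a driver from a factory callable and always quit it on exit."""
    driver = factory()  # e.g. lambda: webdriver.Chrome(options=chrome_options)
    try:
        yield driver
    finally:
        driver.quit()   # runs whether the block succeeded or raised

# Usage with Selenium would look like:
# with managed_driver(lambda: webdriver.Chrome(options=chrome_options)) as driver:
#     driver.get("https://www.example.com")
```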
Screenshots help debug issues:
```python
driver.save_screenshot('screenshot.png')
```
Many sites hide content until you scroll. Here's how to reach it.
Scroll to the bottom:
```python
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
```
Scroll to a specific element:
```python
element = driver.find_element(By.ID, "element_id")
driver.execute_script("arguments[0].scrollIntoView(true);", element)
```
Handle infinite scrolling:
```python
import time

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
```
Selenium offers multiple ways to locate elements. Choose the fastest method that works.
By ID (fastest):
```python
driver.find_element(By.ID, "element_id")
```
By class name:
```python
driver.find_element(By.CLASS_NAME, "element_class")
```
By CSS selector:
```python
driver.find_element(By.CSS_SELECTOR, "css_selector")
```
By XPath (slowest but most flexible):
```python
driver.find_element(By.XPATH, "xpath_expression")
```
Click buttons:
```python
button = driver.find_element(By.ID, "button_id")
button.click()
```
Type into text fields:
```python
textbox = driver.find_element(By.NAME, "username")
textbox.send_keys("your_username")
```
Extract text:
```python
element = driver.find_element(By.CLASS_NAME, "content")
print(element.text)
```
Grab attribute values:
```python
link = driver.find_element(By.TAG_NAME, "a")
print(link.get_attribute("href"))
```
Websites plant invisible elements to catch bots. Don't fall for them.
Check if elements are actually visible before interacting:
```python
elements = driver.find_elements(By.CSS_SELECTOR, '[style*="display:none"], [style*="visibility:hidden"]')
for element in elements:
    if not element.is_displayed():
        continue  # Skip honeypots
```
Always verify visibility:
```python
button_element = driver.find_element(By.ID, "fakeButton")
if button_element.is_displayed():
    button_element.click()
else:
    print("Detected honeypot, skipping")
```
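That visibility check can be factored into a tiny reusable helper (a sketch — `visible_only` is an illustrative name; it works with any object exposing Selenium's `is_displayed()` method):

```python
def visible_only(elements):
    """Keep only elements a real user could see, dropping hidden honeypots."""
    return [el for el in elements if el.is_displayed()]

# With Selenium this would look like:
# safe_links = visible_only(driver.find_elements(By.TAG_NAME, "a"))
```

Filtering once up front means the rest of your scraping loop never has to think about traps.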
Dynamic sites load content after the initial page renders. Don't interact with elements before they exist.
Here's a real example scraping Amazon search results:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://amazon.com/")

search_bar = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "twotabsearchtextbox"))
)
search_bar.send_keys("headphones")
search_bar.submit()

# Wait for result cards to appear instead of sleeping a fixed 10 seconds
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div[data-asin]"))
)

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')

products = []
for product in soup.select('div[data-asin]'):
    if product.attrs.get('data-asin'):
        products.append(product.attrs['data-asin'])

print(products)
driver.quit()
```
Save scraped data to CSV:
```python
import csv

filename = "amazon_asins.csv"
with open(filename, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["ASIN"])
    for asin in products:
        writer.writerow([asin])

print(f"Data saved to {filename}")
```
Or use JSON:
```python
import json

filename = "amazon_asins.json"
with open(filename, mode="w", encoding="utf-8") as file:
    json.dump(products, file, indent=4)
```
Remove duplicates and handle missing values with pandas:
```python
import pandas as pd

data = pd.DataFrame(products, columns=["ASIN"])
data = data.drop_duplicates()

missing_count = data["ASIN"].isnull().sum()
if missing_count > 0:
    data = data.dropna(subset=["ASIN"])
    print(f"Dropped {missing_count} rows with missing ASINs")

data.to_csv("cleaned_amazon_asins.csv", index=False)
```
Dynamic tables with pagination require special handling. Here's how to scrape all pages of a DataTable:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import csv

driver = webdriver.Chrome()
driver.get("https://datatables.net/examples/styling/stripe.html")

table = driver.find_element(By.ID, "example")
# Limit to thead so footer cells aren't duplicated into the header row
headers = [header.text for header in table.find_elements(By.CSS_SELECTOR, "thead th")]

table_data = []
while True:
    # Re-query rows each pass; pagination replaces the tbody contents
    rows = driver.find_elements(By.CSS_SELECTOR, "#example tbody tr")
    for row in rows:
        cells = row.find_elements(By.TAG_NAME, "td")
        if cells:
            table_data.append([cell.text for cell in cells])
    try:
        next_button = driver.find_element(By.CSS_SELECTOR, ".dt-paging-button.next")
        if "disabled" in next_button.get_attribute("class"):
            break
        next_button.click()
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#example tbody tr"))
        )
    except Exception as e:
        print(f"Navigation error: {e}")
        break

filename = "datatable_full.csv"
with open(filename, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(headers)
    writer.writerows(table_data)

print(f"Scraped {len(table_data)} rows")
driver.quit()
```
When you need data from multiple sources, navigate between pages systematically:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import csv
import time

driver = webdriver.Chrome()
driver.get("https://www.imdb.com/search/title/?groups=top_100&sort=user_rating,desc")

movies = driver.find_elements(By.CLASS_NAME, "ipc-metadata-list-summary-item")
data = []
for movie in movies:
    title = movie.find_element(By.CLASS_NAME, "ipc-title-link-wrapper")
    link = title.get_attribute("href")
    rating = movie.find_element(By.CLASS_NAME, "ipc-rating-star--rating").text
    data.append({"Title": title.text, "Link": link, "Rating": rating})

for movie in data:
    driver.get(movie["Link"])
    time.sleep(2)
    try:
        # Generated class names like this change often—re-check before running
        description = driver.find_element(By.CSS_SELECTOR, ".sc-3ac15c8d-3.bMUzwm").text
        movie["Description"] = description
    except NoSuchElementException:
        movie["Description"] = "N/A"

filename = "imdb_top_100.csv"
with open(filename, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=["Title", "Rating", "Description", "Link"])
    writer.writeheader()
    writer.writerows(data)

print(f"Data saved to {filename}")
driver.quit()
```
Websites block scrapers that send too many requests from one IP. Rotating proxies solves this.
Need reliable proxy rotation without the headache? 👉 ScraperAPI handles proxies, CAPTCHAs, and JavaScript rendering automatically, letting you focus on extracting data instead of fighting blocks.
Here's how to integrate ScraperAPI with Selenium:
```python
from seleniumwire import webdriver
from selenium.webdriver.common.by import By

API_KEY = 'YOUR_API_KEY'
proxy_options = {
    'proxy': {
        'http': f'http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001',
        'https': f'http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001',
        'no_proxy': 'localhost,127.0.0.1'
    }
}

driver = webdriver.Chrome(seleniumwire_options=proxy_options)
driver.get("https://quotes.toscrape.com/")
quote = driver.find_element(By.CLASS_NAME, "text")
print(quote.text)
driver.quit()
```
Make your scraper faster with these tweaks:
Block unnecessary resources:
```python
chrome_options = webdriver.ChromeOptions()
chrome_prefs = {
    "profile.default_content_setting_values": {
        "images": 2,      # block images
        "javascript": 2,  # block JavaScript—only if your target data doesn't need it
    }
}
chrome_options.add_experimental_option("prefs", chrome_prefs)
```
Use headless mode:
```python
chrome_options.add_argument("--headless")
```
Smart waits instead of sleep:
```python
wait = WebDriverWait(driver, 10)
element = wait.until(EC.presence_of_element_located((By.ID, "example_id")))
```
Choose fast locators:
```python
driver.find_element(By.ID, "example_id")                        # fast: direct lookup
driver.find_element(By.XPATH, "//div[@class='example']//span")  # slower: use sparingly
```
High resource consumption: Use headless mode and disable unnecessary features. For large projects, consider distributed solutions.
Slow execution: Use explicit waits, optimize locators, and only render JavaScript when needed.
Anti-bot measures: Rotate IPs and user agents. Better yet, let specialized services handle the complexity while you focus on data extraction.
Dynamic content loading: Implement proper scrolling and waiting mechanisms.
Scaling difficulties: Use Selenium Grid for parallel execution or offload heavy lifting to dedicated scraping infrastructure.
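For the user-agent half of that advice, rotation can be as simple as picking a fresh string each time a driver launches. A sketch (the UA strings and the `random_ua_argument` helper are illustrative, not a maintained list):

```python
import random

# Example UA strings only—keep a real list current with browser releases
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
]

def random_ua_argument() -> str:
    """Build the Chrome CLI flag that overrides the browser's user agent."""
    return f"--user-agent={random.choice(USER_AGENTS)}"

# Before launching the driver:
# chrome_options.add_argument(random_ua_argument())
```

Pair this with proxy rotation so the IP and the browser fingerprint change together.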
For JavaScript-heavy sites, you don't always need Selenium's overhead. Modern scraping APIs can render pages faster.
When dealing with complex JavaScript sites that need reliable rendering at scale, specialized tools can save you time and resources. They handle browser automation, proxy rotation, and CAPTCHA solving in one package—letting you skip the infrastructure headaches.
For sites requiring precise interactions like typing or clicking, rendering instruction sets provide the control you need without managing browser instances yourself.
You now know how to:
Configure Selenium for scraping dynamic sites
Locate and interact with elements while avoiding traps
Handle dynamic content with proper waiting strategies
Manage proxies to avoid IP bans
Scale operations efficiently
Render JavaScript content without browser overhead
Selenium works best when you need precise browser control. For large-scale operations or heavily protected sites, combining it with specialized scraping infrastructure gives you speed, reliability, and simplicity.
The key is matching your tool to your task. Use Selenium where its browser automation shines, and leverage APIs when efficiency matters more than granular control.
Ready to scrape smarter? Understanding when to use which tool separates successful projects from frustrating ones.