Selenium wasn't built for web scraping. It started life as an automation testing tool, helping developers simulate user interactions in web applications. But here's the thing: that same ability to click buttons, fill forms, and wait for pages to load makes it surprisingly good at scraping dynamic websites.
Most basic scraping tools like BeautifulSoup hit a wall when they encounter JavaScript-rendered content. They grab the initial HTML and call it a day, missing everything that loads afterward. Selenium doesn't have that problem because it actually runs a real browser in the background.
We're going to scrape investing.com to pull historical currency exchange rate data. Yes, there are APIs that do this more easily, but the techniques you'll learn here apply to thousands of other websites that don't offer convenient data access.
The page we're working with shows historical USD to EUR exchange rates in a table format. There's a date range selector at the top, which is exactly what makes this interesting from a scraping perspective.
The URL structure is straightforward: investing.com/currencies/usd-eur-historical-data. Swap "eur" for any other currency code and you'll get that currency's data against the dollar. Want EUR against GBP instead? Just replace "usd" with your base currency.
By default, the page only shows about 20 days of data. We need to interact with those date fields to get the full range we want.
Start with the essential imports. We need Selenium's webdriver and some helper classes for waiting and locating elements, plus a sleep function for strategic pauses and Pandas for data handling:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from time import sleep
import pandas as pd
```
Our main function takes a list of currency codes, a start date, an end date, and an optional flag for CSV export. We'll store each currency's data in a list of DataFrames:
```python
def get_currencies(currencies, start, end, export_csv=False):
    frames = []
```
The plan is simple: loop through each currency, scrape its data, and move on to the next one.
For each currency, we build the URL and fire up a Chrome driver. The headless option is set to False here so you can watch what's happening, but flip it to True for faster, invisible scraping:
```python
    for currency in currencies:
        my_url = f'https://br.investing.com/currencies/usd-{currency.lower()}-historical-data'
        option = Options()
        option.headless = False  # set to True for invisible scraping
        # Note: Selenium 4.8+ removed this property; use
        # option.add_argument("--headless=new") instead.
        driver = webdriver.Chrome(options=option)
        driver.get(my_url)
        driver.maximize_window()
```
Now comes the interesting part. We need to click the date selector button, clear the default dates, and enter our custom range. This is where WebDriverWait becomes crucial. It tells Selenium to wait up to 20 seconds for each element to become clickable before trying to interact with it:
```python
        date_button = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.XPATH,
                "/html/body/div[5]/section/div[8]/div[3]/div/div[2]/span")))
        date_button.click()
```
Next, locate the start date field, clear whatever's in there, and type in your date:
```python
        start_bar = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.XPATH,
                "/html/body/div[7]/div[1]/input[1]")))
        start_bar.clear()
        start_bar.send_keys(start)
```
Repeat the same process for the end date:
```python
        end_bar = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.XPATH,
                "/html/body/div[7]/div[1]/input[2]")))
        end_bar.clear()
        end_bar.send_keys(end)
```
Hit the Apply button and pause for five seconds to let the page reload with your custom date range:
```python
        apply_button = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.XPATH,
                "/html/body/div[7]/div[5]/a")))
        apply_button.click()
        sleep(5)
```
If you kept headless mode off, you'll literally watch the browser clicking through these steps like a ghost is controlling it.
Finally, use Pandas to grab all tables from the page source and close the driver:
```python
        # pandas 2.1+ prefers wrapping the HTML string:
        # pd.read_html(StringIO(driver.page_source))
        dataframes = pd.read_html(driver.page_source)
        driver.quit()
        print(f'{currency} scraped.')
```
Selenium can be temperamental. Pages sometimes load slowly, elements occasionally fail to appear, and network hiccups happen. Instead of letting one failure kill your entire scrape, wrap everything in a try-except block inside an infinite loop.
If something goes wrong, the code quits the driver (important for not clogging your memory), waits 30 seconds, and tries again:
```python
    for currency in currencies:
        while True:
            try:
                # All the scraping code from above goes here
                break
            except Exception:  # a bare except would also swallow KeyboardInterrupt
                driver.quit()
                print(f'Failed to scrape {currency}. Trying again in 30 seconds.')
                sleep(30)
                continue
```
This retry logic keeps your scraper running even when individual requests fail.
The page source contains multiple tables, but we only want one. Loop through the DataFrames and check if the column names match what we expect:
```python
        for dataframe in dataframes:
            if dataframe.columns.tolist() == ['Date', 'Price', 'Open', 'High', 'Low', 'Change%']:
                df = dataframe
                break
        frames.append(df)
```
If the user requested CSV exports, save each currency's data to a file:
```python
        if export_csv:
            df.to_csv(f'{currency}.csv', index=False)
            print(f'{currency}.csv exported.')
```
After looping through all currencies, return the list of DataFrames:
```python
    return frames
```
This scraper works well for a handful of currencies, but what if you want to scale up and scrape dozens of assets? You'll need to add more strategic delays between requests to avoid hammering the server.
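Randomizing those pauses makes the traffic pattern less mechanical than a fixed `sleep(5)`. Here's a minimal sketch; the function name and default values are my own, not part of the original script:

```python
import random
from time import sleep

def polite_sleep(base=5.0, jitter=3.0):
    """Pause for `base` seconds plus a random extra of up to `jitter` seconds.

    Returns the delay actually used, which is handy for logging.
    """
    delay = base + random.uniform(0, jitter)
    sleep(delay)
    return delay
```

Call `polite_sleep()` between currencies instead of a fixed sleep, bumping `base` up as your scrape grows.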
More importantly, you'll want to rotate your IP addresses. Websites track how many requests come from each IP, and sustained scraping from a single address triggers rate limits or outright blocks. Distributing requests across a pool of proxies, ideally residential ones, makes your scraper look like normal user traffic instead of a bot.
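With Selenium and Chrome, one way to wire this up is the `--proxy-server` command-line argument, cycling through your pool each time a driver is launched. A sketch, where the proxy addresses are placeholders for whatever pool you actually have:

```python
from itertools import cycle

def proxy_arguments(proxy_pool):
    """Yield Chrome --proxy-server arguments, cycling through the pool forever."""
    for proxy in cycle(proxy_pool):
        yield f'--proxy-server={proxy}'

# Hypothetical usage when building the Options object:
# rotation = proxy_arguments(['http://proxy1:8000', 'http://proxy2:8000'])
# option.add_argument(next(rotation))
```

Each new `webdriver.Chrome` instance then comes up behind the next address in the pool.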
You could also extend this code to scrape stocks, commodities, indices, or futures using the same interaction pattern. The logic remains identical; only the URLs and table structures change.
Another useful addition would be an update function that takes an existing DataFrame and refreshes it with data up to the current date. This would let you maintain historical datasets without re-scraping everything from scratch.
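The merge step of such an update function could look like the sketch below. The function name is my own, and it assumes the same `Date` column the scraped table uses; the idea is to prefer freshly scraped rows so revised figures overwrite stored ones:

```python
import pandas as pd

def merge_update(existing, fresh):
    """Combine a stored DataFrame with freshly scraped rows.

    Rows are deduplicated on 'Date', keeping the fresh values so that
    any revised figures overwrite the stored ones.
    """
    combined = pd.concat([fresh, existing], ignore_index=True)
    combined = combined.drop_duplicates(subset='Date', keep='first')
    return combined.reset_index(drop=True)
```

In practice you'd also want to parse and sort the `Date` column afterward, since the site returns dates as strings.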
The same Selenium techniques apply to countless other scenarios: logging into protected areas, navigating multi-step forms, handling dropdown menus, or waiting for search results to load. It's not always the most elegant solution, but when you need to mimic human behavior on a website, Selenium gets the job done.