Whether you're managing a personal investment portfolio or hunting for emerging market opportunities, automated stock data extraction gives you instant access to pricing trends and market movements. This guide walks you through building a Python-based scraper that pulls live stock prices from multiple companies and organizes everything into a clean CSV file for analysis.
We're going to scrape live stock prices from Microsoft, Coca-Cola, and Nike off investing.com and dump them straight into a spreadsheet. The whole thing uses Python and BeautifulSoup—nothing fancy, just what works.
The script runs fine on its own, but if you want to scale this thing up and avoid getting blocked, you'll want a proper scraping infrastructure. That's where having rotating proxies and automatic CAPTCHA handling becomes essential—more on that in a bit.
If you've never touched BeautifulSoup before, you might want to get familiar with the basics first. It's pretty straightforward once you understand how HTML parsing works.
Create a folder called "scraper-stock-project" and open it in your text editor. Fire up the terminal and install what you need:
```shell
pip3 install beautifulsoup4
pip3 install requests
```
Make a new file called "stockData-scraper.py" and import your dependencies.
Requests handles the HTTP calls to grab the HTML, then BeautifulSoup does the parsing. Let's test it on Nike's stock page and check the status code. You're looking for a 200—that means you're good to go.
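The setup described so far, as a minimal sketch. It uses the Nike page from later in this guide; note that live sites sometimes block or redirect automated requests, so the code you actually see may vary:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.investing.com/equities/nike'
page = requests.get(url)
print('Status code:', page.status_code)  # 200 means you're good to go
```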
Once you confirm the request worked, pass that response to BeautifulSoup for parsing:
```python
soup = BeautifulSoup(page.text, 'html.parser')
```
Any parser BeautifulSoup supports will work; we like html.parser because it ships with Python, so there's nothing extra to install.
Open https://www.investing.com/equities/nike in your browser. You'll see the company name, stock symbol, current price, and price change right there on the page.
Three things to figure out:

1. Is JavaScript loading this data?
2. What attributes can we use to grab the elements?
3. Are those attributes the same across all pages?
## Check for JavaScript
Right-click, View Page Source. If you can see the data in the raw HTML, you're clear. No JavaScript injection here, so Requests will work fine.
This matters because Requests can't execute JavaScript. If the data was hidden behind a script, you'd need something like Selenium instead.
## Picking the CSS Selectors
The company name and stock symbol sit inside an H1 tag with the class 'text-2xl font-semibold instrument-header_title__GTWDv mobile:mb-2'. Easy grab. One caveat: the hashed suffix in that class name is auto-generated by the site's build tooling, so verify it in your browser's DevTools before relying on it; it can change whenever the site is updated.
The price and price change are trickier. They're split into separate spans, and the CSS class changes depending on whether the stock is up or down.
Here's the move: go up the DOM tree and find a parent div you can target. Then use find_all('span') to get all the span elements as a list. Since it's a list, you can just index into it and grab what you need.
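Here's that pattern on a simplified stand-in for the real markup (the actual class names and span order on investing.com differ, so treat this purely as an illustration of the technique):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the real page structure
html = '''
<div class="price-wrapper">
  <span>115.34</span>
  <span>USD</span>
  <span>-1.52</span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
# Target the parent div, then pull all spans out as a list
spans = soup.find('div', {'class': 'price-wrapper'}).find_all('span')
price = spans[0].text   # first span: the price
change = spans[2].text  # third span: the price change
print(price, change)
```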
Your targets look like this:
```python
company = soup.find('h1', {'class': 'text-2xl font-semibold instrument-header_title__GTWDv mobile:mb-2'}).text
price = soup.find('div', {'class': 'instrument-price_instrument-price__3uw25 flex items-end flex-wrap font-bold'}).find_all('span')[0].text
change = soup.find('div', {'class': 'instrument-price_instrument-price__3uw25 flex items-end flex-wrap font-bold'}).find_all('span')[2].text
```
Run a test:
```python
print('Loading:', url)
print(company, price, change)
```
If it prints the data, you're good.
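If instead you get an AttributeError, it usually means `soup.find()` returned None because a class name changed. A small guard helper (a sketch, not part of the original script) keeps the failure readable; the `title` class here is made up for the demo:

```python
from bs4 import BeautifulSoup

def safe_text(node):
    """Return stripped text, or None if the element wasn't found.
    Guards against AttributeError when a class name changes."""
    return node.text.strip() if node is not None else None

soup = BeautifulSoup('<h1 class="title">NIKE Inc (NKE)</h1>', 'html.parser')
print(safe_text(soup.find('h1', {'class': 'title'})))    # found: prints the text
print(safe_text(soup.find('div', {'class': 'missing'})))  # not found: prints None
```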
One stock is boring. Let's scale this up.
Make a list of URLs and loop through them:
```python
urls = [
    'https://www.investing.com/equities/nike',
    'https://www.investing.com/equities/coca-cola-co',
    'https://www.investing.com/equities/microsoft-corp',
]

for url in urls:
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    company = soup.find('h1', {'class': 'text-2xl font-semibold instrument-header_title__GTWDv mobile:mb-2'}).text
    price = soup.find('div', {'class': 'instrument-price_instrument-price__3uw25 flex items-end flex-wrap font-bold'}).find_all('span')[0].text
    change = soup.find('div', {'class': 'instrument-price_instrument-price__3uw25 flex items-end flex-wrap font-bold'}).find_all('span')[2].text
    print('Loading:', url)
    print(company, price, change)
```
Works like a charm across all three pages. Keep adding more URLs to the list—just know you'll eventually hit anti-bot protection.
When you're sending dozens or hundreds of requests, websites start noticing. They use browser fingerprinting, CAPTCHAs, and IP monitoring to spot bots and block them.
If you're serious about scraping at scale, you need proper infrastructure handling these challenges automatically. Instead of building all that yourself, you can route requests through a service that manages IP rotation, handles CAPTCHAs, and optimizes headers for you.
👉 Get automated proxy rotation and CAPTCHA solving for reliable stock data extraction
Integrating a scraping API typically means adding just a couple lines of code. You pass your target URL through their endpoint, and they handle all the blocking prevention behind the scenes. For dynamic sites that load content with JavaScript, most services can render pages before sending back the response.
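The shape of that integration usually looks something like the sketch below. The endpoint, parameter names, and key are all placeholders, not a real provider's API, so substitute whatever your chosen service documents:

```python
from urllib.parse import urlencode

# Hypothetical endpoint and key -- replace with your provider's actual values
API_ENDPOINT = 'https://api.example-scraper.com/v1/'
API_KEY = 'YOUR_API_KEY'

def proxied_url(target_url):
    """Build a request URL that routes the target through the scraping API."""
    params = {'api_key': API_KEY, 'url': target_url}
    return API_ENDPOINT + '?' + urlencode(params)

# You'd then call requests.get(proxied_url(url)) instead of requests.get(url)
print(proxied_url('https://www.investing.com/equities/nike'))
```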
Between your URL list and your loop, add these three lines:
```python
file = open('stockprices.csv', 'w', newline='', encoding='utf-8')
writer = csv.writer(file)
writer.writerow(['Company', 'Price', 'Change'])
```
This creates the CSV and writes the header row. You'll also need `import csv` at the top of the file. Keep these lines outside the loop, or you'll reopen and truncate the file on every iteration. The `newline=''` argument is what the csv module expects in Python 3; without it you can get blank rows on Windows.
Inside your loop, add:
```python
writer.writerow([company, price, change])
```
Write the strings directly. Calling .encode('utf-8') on each value, as you'll see in old Python 2 examples, would write byte literals like b'NIKE' into your CSV under Python 3; opening the file with encoding='utf-8' already handles the encoding.
After the loop closes, add:
```python
file.close()
```
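If you'd rather not remember the close call, the same open/write/close structure fits in a `with` block, which closes the file automatically even if the loop raises an exception:

```python
import csv

# Equivalent structure using a context manager
with open('stockprices.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Company', 'Price', 'Change'])
    # ...the scraping loop and its writer.writerow(...) calls go here...
```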
Here's the full code. Drop in your own URLs and you're set:
```python
import requests
from bs4 import BeautifulSoup
import csv

urls = [
    'https://www.investing.com/equities/nike',
    'https://www.investing.com/equities/coca-cola-co',
    'https://www.investing.com/equities/microsoft-corp',
]

file = open('stockprices.csv', 'w', newline='', encoding='utf-8')
writer = csv.writer(file)
writer.writerow(['Company', 'Price', 'Change'])

for url in urls:
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    company = soup.find('h1', {'class': 'text-2xl font-semibold instrument-header_title__GTWDv mobile:mb-2'}).text
    price = soup.find('div', {'class': 'instrument-price_instrument-price__3uw25 flex items-end flex-wrap font-bold'}).find_all('span')[0].text
    change = soup.find('div', {'class': 'instrument-price_instrument-price__3uw25 flex items-end flex-wrap font-bold'}).find_all('span')[2].text
    print('Loading:', url)
    print(company, price, change)
    writer.writerow([company, price, change])

file.close()
```
The stock market isn't open 24/7. NYSE closes at 4 pm ET and doesn't reopen until 9:30 am the next business day. No point running your scraper on Saturday night.
The most volatile periods are market open and close. Running your script at 9:30 am, 11 am, and 4:30 pm will catch the major price movements. Monday mornings are especially active—lots of trading happens right after the weekend.
Unlike Forex, stocks don't swing wildly every hour. But major news can tank or spike a stock fast—remember when Meta crashed or GameStop exploded. Keep an eye on the news for the stocks you're tracking.
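A simple guard along these lines (a sketch using the standard library's zoneinfo, available in Python 3.9+) lets a scheduled job skip runs outside regular hours. It deliberately ignores market holidays, which a real scheduler would need a calendar for:

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo  # Python 3.9+

def market_is_open(now=None):
    """Rough check for NYSE regular hours: weekdays, 9:30 am to 4 pm Eastern.
    Ignores market holidays."""
    now = now or datetime.now(ZoneInfo('America/New_York'))
    if now.weekday() >= 5:  # Saturday or Sunday
        return False
    return time(9, 30) <= now.time() <= time(16, 0)

# Only scrape while the market is trading
if market_is_open():
    print('Market open: run the scraper')
else:
    print('Market closed: skip this run')
```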
You now have a working stock market scraper that pulls live data and organizes it for analysis. The basic version handles a handful of stocks just fine. When you're ready to scale up and track hundreds of tickers without getting blocked, 👉 a robust scraping infrastructure makes all the difference. Run it during market hours, keep your request rate reasonable, and you'll have clean, reliable stock data feeding whatever analysis or trading tools you're building.