Extract real-time job data, salary insights, and hiring trends from Indeed without the hassle of manual searches. This guide walks you through building a Python scraper that bypasses anti-bot protections, automates data collection, and structures job listings into CSV and JSON formats—giving you immediate access to the job market intelligence you need.
So you want to scrape Indeed? Smart move. You're sitting on one of the biggest job databases on the internet, and if you know how to pull that data out, you've got yourself a front-row seat to what's actually happening in the job market.
But here's the thing—Indeed doesn't exactly roll out the red carpet for scrapers. They've got CAPTCHAs, IP blocks, and anti-bot measures that'll stop you cold if you're not careful. That's where this guide comes in. I'm going to show you how to extract job titles, company names, locations, and salaries from Indeed using Python, and we'll do it without getting blocked.
By the end, you'll have a working scraper that can pull down job listings, save them in structured formats, and give you real insights into hiring trends. No fluff, no theory—just practical steps that actually work.
Ready? Let's get into it.
Indeed hosts millions of job postings from companies worldwide. If you need real-time insights into the job market, this is where you go.
Here's what you can do with Indeed job data:
Stay ahead of job market trends: See which industries are growing, what roles are in demand, and how hiring patterns are shifting. You're not guessing anymore—you're looking at actual data.
Research salaries and benefits: Compare compensation across job titles, industries, and locations. Know what companies are offering before you even apply.
Discover in-demand skills: See which qualifications, certifications, and technologies employers are looking for. Stay competitive by knowing what matters.
Monitor company hiring activity: Track job postings from specific companies. Understand their growth strategies and hiring needs.
Improve your hiring strategy: If you're a recruiter or employer, analyzing job descriptions helps you refine your listings, salary offerings, and benefits to attract top talent.
Find better job opportunities: If you're job hunting, analyzing postings helps you spot trends, find companies actively hiring, and tailor your applications accordingly.
When you need structured, real-time job data—whether for research, hiring decisions, or career planning—extracting information from Indeed gives you a significant advantage.
Before we start scraping, let's understand how Indeed structures its job postings so we can extract the correct data efficiently.
Open your browser and go to Indeed.com.
Enter a job title in the search bar (e.g., "Software Developer"). Enter a location in the "Where" field (e.g., "Las Vegas, NV"). Click Search, and you should see a list of job postings. Right-click on any job posting and select Inspect (Chrome) or Inspect Element (Firefox) to open Developer Tools.
Hover over different elements to see how job details are structured in the HTML.
After inspecting the page, we can see that all the job postings are stored in a <div> with the id "mosaic-jobResults".
Each individual job posting is stored inside a <li> tag with the class "css-1ac2h1w eu4oa1w0".
Inside each job container, we can extract the following details:
Job Title: Found inside an <a> tag with the class "jcs-JobTitle css-1baag51 eu4oa1w0".
Company Name: Located within a <span> tag with the class "css-1h7lukg eu4oa1w0".
Location: Appears in a <div> tag with the class "css-1restlb eu4oa1w0".
Salary (if available): Found in an <h2> tag with the class "css-1rqpxry e1tiznh50".
Job Type: Stored in a <div> tag with the class "css-18z4q2i eu4oa1w0", indicating whether the role is full-time, part-time, or contract-based.
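To sanity-check these selectors before writing the full scraper, you can run BeautifulSoup against a tiny HTML stub that mimics this structure. The stub below is invented for illustration; real Indeed pages are far larger, and these class names can change at any time:

```python
from bs4 import BeautifulSoup

# Invented stub mirroring the structure described above
html = """
<div id="mosaic-jobResults">
  <ul>
    <li class="css-1ac2h1w eu4oa1w0">
      <a class="jcs-JobTitle css-1baag51 eu4oa1w0" href="/viewjob?jk=123"><span>Software Developer</span></a>
      <span class="css-1h7lukg eu4oa1w0">Acme Corp</span>
      <div class="css-1restlb eu4oa1w0">Las Vegas, NV</div>
      <h2 class="css-1rqpxry e1tiznh50">$90,000 a year</h2>
      <div class="css-18z4q2i eu4oa1w0">Full-time</div>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", id="mosaic-jobResults")
# Passing the full multi-class string matches the exact class attribute
cards = container.find_all("li", class_="css-1ac2h1w eu4oa1w0")
title = cards[0].find("a", class_="jcs-JobTitle css-1baag51 eu4oa1w0").find("span").text
print(title)  # → Software Developer
```

If the selectors stop matching on the live site, this is the first thing to re-verify in Developer Tools.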
Now that we know how Indeed organizes job listings, we can move on to writing a scraper using ScraperAPI and Python to extract this data efficiently.
You'll need the right tools and setup to scrape Indeed job postings. Here are the key requirements to ensure a smooth and efficient scraping process:
You need Python 3.x installed on your system. You can download the latest version from python.org.
We'll use the following Python libraries to handle requests, parse data, and interact with dynamic content:
requests – To send HTTP requests and retrieve HTML content
BeautifulSoup – To parse and extract job data from HTML
Selenium – To interact with dynamic job listings using a headless browser
JSON and CSV – Built-in Python modules to store and organize scraped data (no installation needed)
You can install these libraries using:
bash
pip install requests beautifulsoup4 selenium
Indeed has strict anti-scraping measures, so using ScraperAPI helps avoid detection and bypass restrictions like CAPTCHAs and IP blocking. If you're serious about scraping job boards at scale without getting blocked, you need a tool that handles the technical headaches for you.
👉 Get reliable Indeed data without worrying about CAPTCHAs or IP bans
You'll need to sign up for an account and obtain an API key.
If you're using Selenium, you'll need to download the appropriate WebDriver for your browser:
ChromeDriver (for Google Chrome)
GeckoDriver (for Mozilla Firefox)
Ensure the driver version matches your browser version.
With these tools and libraries set up, you'll be ready to start scraping Indeed job listings efficiently.
Now that you have the necessary tools and understand how Indeed structures its job listings, it's time to write a Python script to scrape job postings. We'll use ScraperAPI to bypass anti-bot protections (and handle JavaScript rendering where needed) and BeautifulSoup to parse the HTML content.
Before writing any code, ensure you have the necessary Python libraries installed. Open your terminal or command prompt and run:
bash
pip install requests beautifulsoup4
Then, import the required modules in your Python script:
python
import requests
import json
import time
import random
from bs4 import BeautifulSoup
For this step, you'll need your ScraperAPI key: replace "YOUR_API_KEY" with your actual key in the configuration constants.
BASE_URL defines the Indeed search URL for "Software Developer" jobs in Las Vegas. The {page} placeholder will let us paginate through multiple pages of job listings.
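For reference, the configuration block the following snippets rely on (the same values appear in the complete script at the end) looks like this:

```python
API_KEY = "YOUR_API_KEY"  # replace with your actual ScraperAPI key
SCRAPERAPI_URL = "https://api.scraperapi.com/"
BASE_URL = "https://www.indeed.com/jobs?q=software+developer&l=Las+Vegas&start={page}"

START_PAGE = 0   # Indeed's pagination offset starts at 0
MAX_RETRIES = 3  # retry attempts per page on transient errors
MAX_PAGES = 5    # pages to scrape (~10 listings each)
```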
Next, let's create a function called scrape_indeed_jobs(). This function will handle sending requests to Indeed, extracting job details, and handling retries if something goes wrong.
python
def scrape_indeed_jobs(start_page):
    jobs = []  # Store job listings
    page_number = start_page
Indeed lists about 10 jobs per page. We'll scrape multiple pages by incrementing page_number.
python
for _ in range(MAX_PAGES):
    attempt = 0  # Keep track of retry attempts
Here, MAX_PAGES defines how many pages we scrape. If MAX_PAGES = 5, we scrape 5 pages (~50 jobs).
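Because Indeed paginates with the start query parameter in steps of 10, a five-page crawl visits these URLs (built from the BASE_URL defined earlier):

```python
BASE_URL = "https://www.indeed.com/jobs?q=software+developer&l=Las+Vegas&start={page}"
MAX_PAGES = 5

# start=0, 10, 20, 30, 40 -- one URL per results page
urls = [BASE_URL.format(page=page * 10) for page in range(MAX_PAGES)]
for url in urls:
    print(url)
```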
Each time we request a new page, we'll pass the correct URL to ScraperAPI:
python
while attempt < MAX_RETRIES:
    try:
        url = BASE_URL.format(page=page_number)
        params = {
            "url": url,
            "api_key": API_KEY,
        }
        response = requests.get(SCRAPERAPI_URL, params=params)
How it works: We replace {page} in BASE_URL with the actual page_number. ScraperAPI fetches the page while handling bot detection for us.
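You can inspect exactly what requests will send by preparing the request without firing it (using the placeholder key here):

```python
import requests

SCRAPERAPI_URL = "https://api.scraperapi.com/"
params = {
    "url": "https://www.indeed.com/jobs?q=software+developer&l=Las+Vegas&start=0",
    "api_key": "YOUR_API_KEY",
}

# Build the request without sending it, to see the final encoded URL
prepared = requests.Request("GET", SCRAPERAPI_URL, params=params).prepare()
print(prepared.url)
```

The target Indeed URL is percent-encoded into a single query parameter, which is how ScraperAPI knows which page to fetch on your behalf.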
If the request is successful (200 OK), we extract job details using BeautifulSoup:
python
if response.status_code == 200:
    soup = BeautifulSoup(response.text, "html.parser")
    job_elements = soup.find("div", attrs={"id": "mosaic-jobResults"})
    # Guard against pages where the results container is missing (e.g., a block page)
    individual_job_elements = job_elements.find_all("li", class_="css-1ac2h1w eu4oa1w0") if job_elements else []
This extracts all job listings inside <li class="css-1ac2h1w eu4oa1w0">.
We loop through each job posting and extract job title, company name, location, salary (if available), and job type (Full-time, Part-time, etc.).
python
for job_element in individual_job_elements:
    title_tag = job_element.find_next("a", class_="jcs-JobTitle css-1baag51 eu4oa1w0")
    job_title = title_tag.find("span").text if title_tag else "N/A"
    company_element = job_element.find_next("span", class_="css-1h7lukg eu4oa1w0") or "N/A"
    location_element = job_element.find_next("div", class_="css-1restlb eu4oa1w0") or "N/A"
    # Salary lives in an <h2>, job type in a <div> (see the inspection step above)
    salary_element = job_element.find_next("h2", class_="css-1rqpxry e1tiznh50") or "N/A"
    job_type_element = job_element.find_next("div", class_="css-18z4q2i eu4oa1w0") or "N/A"
Once we've extracted the data, we format it nicely and store it in a list:
python
jobs.append({
    "Job Title": job_title,
    "Company": company_element.text.strip() if company_element != "N/A" else "N/A",
    "Location": location_element.text.strip() if location_element != "N/A" else "N/A",
    "Salary": salary_element.text.strip() if salary_element != "N/A" else "N/A",
    "Job Type": job_type_element.text.strip() if job_type_element != "N/A" else "N/A"
})
Now, we have all job listings stored in jobs.
If we get a 500 error, we retry up to MAX_RETRIES times:
python
elif response.status_code == 500:
    print(f"Error 500 on attempt {attempt + 1}. Retrying in {2 ** attempt} seconds...")
    time.sleep(2 ** attempt)
    attempt += 1
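The 2 ** attempt delay is a standard exponential backoff: each retry waits twice as long as the previous one, giving a transient server error time to clear. With MAX_RETRIES = 3 the schedule is:

```python
MAX_RETRIES = 3

# attempt 0 -> 1s, attempt 1 -> 2s, attempt 2 -> 4s
delays = [2 ** attempt for attempt in range(MAX_RETRIES)]
print(delays)  # → [1, 2, 4]
```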
At the end of each loop, we move to the next page and add a random delay (to avoid detection):
python
page_number += 10 # Move to the next page
time.sleep(random.uniform(5, 10))
Finally, we run our scraper and save the results in a JSON file:
python
if __name__ == "__main__":
    job_listings = scrape_indeed_jobs(START_PAGE)
    if job_listings:
        with open("indeed_jobs.json", "w", encoding="utf-8") as json_file:
            json.dump(job_listings, json_file, indent=4, ensure_ascii=False)
        print("Saved job posting(s) to 'indeed_jobs.json'")
It will save a JSON file that looks like this:
json
[
    {
        "Job Title": "Slot Game Designer",
        "Company": "Rising Digital Corporation",
        "Location": "Las Vegas, NV 89146",
        "Salary": "$80,000 - $125,000 a year",
        "Job Type": "N/A"
    },
    {
        "Job Title": "Full Stack Developer",
        "Company": "Starpoint Resort Group",
        "Location": "Las Vegas, NV 89119",
        "Salary": "$70,000 - $85,000 a year",
        "Job Type": "N/A"
    }
]
Now we have all the Indeed job data we need.
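With the data on disk, you can load it straight back into Python for analysis. The sketch below writes a small sample in the scraper's output shape first so it runs standalone (the records are taken from the example output above):

```python
import json

# Sample records in the same shape the scraper produces
sample = [
    {"Job Title": "Slot Game Designer", "Company": "Rising Digital Corporation",
     "Location": "Las Vegas, NV 89146", "Salary": "$80,000 - $125,000 a year", "Job Type": "N/A"},
    {"Job Title": "Full Stack Developer", "Company": "Starpoint Resort Group",
     "Location": "Las Vegas, NV 89119", "Salary": "$70,000 - $85,000 a year", "Job Type": "N/A"},
]
with open("indeed_jobs.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, indent=4, ensure_ascii=False)

# Load it back and run a quick check: how many listings disclose a salary?
with open("indeed_jobs.json", encoding="utf-8") as f:
    jobs = json.load(f)

with_salary = [job for job in jobs if job["Salary"] != "N/A"]
print(f"{len(with_salary)} of {len(jobs)} listings include a salary")
```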
Here's the complete code for the scraper:
python
import requests
import json
import time
import random
from bs4 import BeautifulSoup
API_KEY = "YOUR_API_KEY"
SCRAPERAPI_URL = "https://api.scraperapi.com/"
BASE_URL = "https://www.indeed.com/jobs?q=software+developer&l=Las+Vegas&start={page}"
START_PAGE = 0
MAX_RETRIES = 3
MAX_PAGES = 5
def scrape_indeed_jobs(start_page):
    jobs = []
    page_number = start_page
    for _ in range(MAX_PAGES):
        attempt = 0
        while attempt < MAX_RETRIES:
            try:
                url = BASE_URL.format(page=page_number)
                params = {
                    "url": url,
                    "api_key": API_KEY,
                }
                response = requests.get(SCRAPERAPI_URL, params=params)
                if response.status_code == 200:
                    soup = BeautifulSoup(response.text, "html.parser")
                    job_elements = soup.find("div", attrs={"id": "mosaic-jobResults"})
                    # Guard against pages where the results container is missing
                    individual_job_elements = job_elements.find_all("li", class_="css-1ac2h1w eu4oa1w0") if job_elements else []
                    for job_element in individual_job_elements:
                        title_tag = job_element.find_next("a", class_="jcs-JobTitle css-1baag51 eu4oa1w0")
                        job_title = title_tag.find("span").text if title_tag else "N/A"
                        company_element = job_element.find_next("span", class_="css-1h7lukg eu4oa1w0") or "N/A"
                        location_element = job_element.find_next("div", class_="css-1restlb eu4oa1w0") or "N/A"
                        # Salary lives in an <h2>, job type in a <div> (see the inspection step above)
                        salary_element = job_element.find_next("h2", class_="css-1rqpxry e1tiznh50") or "N/A"
                        job_type_element = job_element.find_next("div", class_="css-18z4q2i eu4oa1w0") or "N/A"
                        jobs.append({
                            "Job Title": job_title,
                            "Company": company_element.text.strip() if company_element != "N/A" else "N/A",
                            "Location": location_element.text.strip() if location_element != "N/A" else "N/A",
                            "Salary": salary_element.text.strip() if salary_element != "N/A" else "N/A",
                            "Job Type": job_type_element.text.strip() if job_type_element != "N/A" else "N/A"
                        })
                    print(f"Scraped page {page_number // 10 + 1}")
                    break
                elif response.status_code == 500:
                    print(f"Error 500 on attempt {attempt + 1}. Retrying in {2 ** attempt} seconds...")
                    time.sleep(2 ** attempt)
                    attempt += 1
                else:
                    print(f"HTTP {response.status_code}: {response.text[:500]}")
                    return None
            except requests.exceptions.ReadTimeout:
                print(f"Timeout on attempt {attempt + 1}. Retrying in {2 ** attempt} seconds...")
                time.sleep(2 ** attempt)
                attempt += 1
            except requests.exceptions.RequestException as e:
                print(f"Request failed: {e}")
                return None
        page_number += 10
        time.sleep(random.uniform(5, 10))
    return jobs

if __name__ == "__main__":
    job_listings = scrape_indeed_jobs(START_PAGE)
    if job_listings:
        with open("indeed_jobs.json", "w", encoding="utf-8") as json_file:
            json.dump(job_listings, json_file, indent=4, ensure_ascii=False)
        print("Saved job posting(s) to 'indeed_jobs.json'")
So far, we've successfully scraped Indeed job postings using ScraperAPI with Requests and BeautifulSoup. But what if we need an alternative method to handle JavaScript-rendered content with an automated browser? That's where Selenium with a headless browser comes in.
Selenium by itself is often detected as a bot when scraping websites like Indeed. To avoid this, we're using SeleniumWire, which allows us to configure a proxy for all browser traffic. Instead of making requests directly, we'll route everything through ScraperAPI in proxy mode. This makes our browser activity look more like an actual user, helping us bypass bot detection.
First, install the necessary Python libraries:
bash
pip install undetected-chromedriver selenium selenium-wire beautifulsoup4
Here's what we're using: undetected-chromedriver helps bypass bot detection by avoiding Selenium fingerprinting, selenium-wire lets us route traffic through ScraperAPI as a proxy, and beautifulsoup4 extracts job data from HTML. Python's built-in csv module (no installation needed) saves the scraped data into a file.
Next, we need to set up Selenium with ScraperAPI as a proxy. We start by defining the ScraperAPI proxy URL, which includes our API key. The proxy URL is formatted to tell ScraperAPI that we need JavaScript rendering enabled and that we want traffic to come from the US:
python
APIKEY = 'YOUR_API_KEY'
indeed_url = "https://www.indeed.com/jobs?q=software+developer&l=Las+Vegas"
proxy_url = f"http://scraperapi.render=true.country_code=us:{APIKEY}@proxy-server.scraperapi.com:8001"
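The dot-separated parameters packed into the username are ScraperAPI's convention for enabling options in proxy mode. You can confirm how the URL decomposes with urllib (placeholder key here):

```python
from urllib.parse import urlparse

APIKEY = "YOUR_API_KEY"
proxy_url = f"http://scraperapi.render=true.country_code=us:{APIKEY}@proxy-server.scraperapi.com:8001"

parsed = urlparse(proxy_url)
print(parsed.hostname, parsed.port)  # where the browser traffic is routed
print(parsed.username)               # ScraperAPI options ride in the username
print(parsed.password)               # your API key is the proxy password
```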
We then configure Selenium options to prevent it from being detected as an automated browser. One common way websites detect bots is through "Blink Features", which are automation flags that Selenium leaves behind. By disabling them, we make the browser look more like a normal user session:
python
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--disable-blink-features=AutomationControlled")
Now, we configure SeleniumWire to route all browser traffic through ScraperAPI. Instead of making requests from our local IP, all traffic will go through ScraperAPI, which rotates IPs and handles CAPTCHAs automatically:
python
seleniumwire_options = {
    'proxy': {
        'http': proxy_url,
        'https': proxy_url,
    },
    'verify_ssl': False,
}
Next, we launch an undetected Chrome browser using SeleniumWire, ensuring that ScraperAPI handles all network requests:
python
# Use selenium-wire's bundled undetected_chromedriver so that
# seleniumwire_options is accepted by the Chrome constructor
import seleniumwire.undetected_chromedriver as uc

driver = uc.Chrome(options=options, seleniumwire_options=seleniumwire_options)
print(f"Fetching URL: {indeed_url}")
driver.get(indeed_url)
This command launches a Chrome browser, routes its requests through ScraperAPI, and opens Indeed's job search page for software developers in Las Vegas. Web pages don't always load instantly, so we add a short delay to give Indeed enough time to fully load job postings before scraping:
python
from time import sleep
print("Waiting for page to load...")
sleep(20)
If we scrape too quickly, we might get blocked or receive incomplete results. A 20-second wait time ensures the page loads fully before proceeding.
Now that the page has loaded, we wait until the job listings container appears. Websites sometimes delay rendering content, so we use WebDriverWait to ensure the data is available before we extract it.
If Indeed blocks our request or asks for a CAPTCHA, the script will exit to prevent unnecessary retries. Otherwise, we proceed by parsing the job listings using BeautifulSoup.
Once we have the job listings, we loop through each listing and extract key details:
python
jobs_data = []
for job_element in job_elements:
    job_title_tag = job_element.find("a", class_="jcs-JobTitle css-1baag51 eu4oa1w0")
    job_title = job_title_tag.find("span").text if job_title_tag else "N/A"
    company_element = job_element.find("span", class_="css-1h7lukg eu4oa1w0")
    company_name = company_element.text.strip() if company_element else "N/A"
    location_element = job_element.find("div", class_="css-1restlb eu4oa1w0")
    job_location = location_element.text.strip() if location_element else "N/A"
    # Salary lives in an <h2>, job type in a <div> (see the inspection step above)
    salary_element = job_element.find("h2", class_="css-1rqpxry e1tiznh50")
    salary = salary_element.text.strip() if salary_element else "N/A"
    job_type_element = job_element.find("div", class_="css-18z4q2i eu4oa1w0")
    job_type = job_type_element.text.strip() if job_type_element else "N/A"
    job_link = f"https://www.indeed.com{job_title_tag['href']}" if job_title_tag and job_title_tag.has_attr("href") else "N/A"
    jobs_data.append([job_title, company_name, job_location, salary, job_type, job_link])
We check whether each job detail exists before extracting it. If a particular element isn't available, we return "N/A" instead of causing an error.
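This guard matters because find() returns None when a tag is missing, and calling .text on None raises AttributeError. A tiny illustration with an invented listing that has no salary tag at all:

```python
from bs4 import BeautifulSoup

# Invented listing with a company name but no salary tag
html = '<li><span class="css-1h7lukg eu4oa1w0">Acme Corp</span></li>'
card = BeautifulSoup(html, "html.parser").li

salary_element = card.find("h2", class_="css-1rqpxry e1tiznh50")  # returns None
salary = salary_element.text.strip() if salary_element else "N/A"

company_element = card.find("span", class_="css-1h7lukg eu4oa1w0")
company = company_element.text.strip() if company_element else "N/A"

print(company, salary)  # → Acme Corp N/A
```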
Once we have collected all job postings, we store them in a CSV file so we can analyze them later.
python
import csv
if jobs_data:
with open("indeed_jobs.csv", "w", newline="", encoding="utf-8") as csv_file:
writer = csv.writer(csv_file)
writer.writerow(["Title", "Company", "Location", "Salary", "Job Type", "Job Link"])
writer.writerows(jobs_data)
print("Saved job posting(s) to 'indeed_jobs.csv'")
print("Scraping session complete.")
This ensures that all job listings are saved in a structured format, which you can use for analysis, job tracking, or market research.
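To work with the CSV afterwards, csv.DictReader returns each row as a dict keyed by the header. The sketch below writes a one-row sample file first so it runs on its own (the row values are invented):

```python
import csv

header = ["Title", "Company", "Location", "Salary", "Job Type", "Job Link"]
sample = [["Full Stack Developer", "Starpoint Resort Group", "Las Vegas, NV",
           "$70,000 - $85,000 a year", "N/A", "https://www.indeed.com/viewjob?jk=123"]]

with open("indeed_jobs.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(sample)

# Read it back: each row becomes a dict keyed by the header
with open("indeed_jobs.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(rows[0]["Title"], "-", rows[0]["Salary"])
```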
Here's the complete code for the scraper:
python
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# selenium-wire's bundled undetected_chromedriver accepts seleniumwire_options
import seleniumwire.undetected_chromedriver as uc
from bs4 import BeautifulSoup
import csv
from time import sleep

APIKEY = 'YOUR_API_KEY'
indeed_url = "https://www.indeed.com/jobs?q=software+developer&l=Las+Vegas"
proxy_url = f"http://scraperapi.render=true.country_code=us:{APIKEY}@proxy-server.scraperapi.com:8001"

options = Options()
options.add_argument("--disable-blink-features=AutomationControlled")

seleniumwire_options = {
    'proxy': {
        'http': proxy_url,
        'https': proxy_url,
    },
    'verify_ssl': False,
}

driver = uc.Chrome(options=options, seleniumwire_options=seleniumwire_options)
print(f"Fetching URL: {indeed_url}")
driver.get(indeed_url)

print("Waiting for page to load...")
sleep(20)

try:
    WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.ID, "mosaic-jobResults"))
    )
except Exception:
    print("Could not locate job results")
    driver.quit()
    raise SystemExit(1)

soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

job_results = soup.find("div", attrs={"id": "mosaic-jobResults"})
job_elements = job_results.find_all("li", class_="css-1ac2h1w eu4oa1w0") if job_results else []

jobs_data = []
for job_element in job_elements:
    job_title_tag = job_element.find("a", class_="jcs-JobTitle css-1baag51 eu4oa1w0")
    job_title = job_title_tag.find("span").text if job_title_tag else "N/A"
    company_element = job_element.find("span", class_="css-1h7lukg eu4oa1w0")
    company_name = company_element.text.strip() if company_element else "N/A"
    location_element = job_element.find("div", class_="css-1restlb eu4oa1w0")
    job_location = location_element.text.strip() if location_element else "N/A"
    # Salary lives in an <h2>, job type in a <div>
    salary_element = job_element.find("h2", class_="css-1rqpxry e1tiznh50")
    salary = salary_element.text.strip() if salary_element else "N/A"
    job_type_element = job_element.find("div", class_="css-18z4q2i eu4oa1w0")
    job_type = job_type_element.text.strip() if job_type_element else "N/A"
    job_link = f"https://www.indeed.com{job_title_tag['href']}" if job_title_tag and job_title_tag.has_attr("href") else "N/A"
    jobs_data.append([job_title, company_name, job_location, salary, job_type, job_link])

if jobs_data:
    with open("indeed_jobs.csv", "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["Title", "Company", "Location", "Salary", "Job Type", "Job Link"])
        writer.writerows(jobs_data)
    print("Saved job posting(s) to 'indeed_jobs.csv'")
print("Scraping session complete.")
Scraping job listings from Indeed gives you access to real-time hiring trends, salary insights, and in-demand skills without manual searching. In this guide, you learned how to scrape Indeed using Python, ScraperAPI, and Selenium, along with strategies to bypass anti-bot protections and structure your data efficiently.
ScraperAPI with Requests is the best method for most scraping needs—it's fast, lightweight, and avoids the overhead of running a browser. However, if JavaScript-heavy pages require automation, a headless browser with Selenium can help, though it comes with added complexity and detection risks.
If you want to scrape job listings quickly, reliably, and without the hassle of CAPTCHAs or IP blocks, 👉 ScraperAPI handles the technical complexity so you can focus on extracting insights from job data. It's built specifically for scenarios like this—where you need consistent, scalable data extraction without constant maintenance or debugging.