Extracting LinkedIn data helps you track job market trends, discover opportunities, and analyze competitor hiring patterns. This guide walks you through building efficient scrapers for job listings, profiles, and company data—while navigating LinkedIn's anti-scraping measures.
So you want to scrape LinkedIn? Smart move. Whether you're tracking job trends, building a candidate database, or keeping tabs on competitors, LinkedIn holds a goldmine of professional data. The trick? Getting it without getting blocked.
LinkedIn isn't exactly rolling out the welcome mat for scrapers. They've got anti-bot systems that can spot automated requests faster than you can say "HTTP 429." But here's the thing—with the right approach, you can collect the data you need without tripping alarms.
This guide shows you three practical methods: scraping public job data through a hidden API, collecting profile information with Selenium, and gathering company details through search results. Each approach has its uses. Let's figure out which one fits your project.
First, let's tackle job listings. These are public, searchable, and surprisingly accessible once you know where to look.
Here's a neat trick: LinkedIn uses infinite scrolling for job results, which means no "next page" buttons and no changing URLs. Annoying, right? But if you pop open Chrome DevTools and check the Network Tab, you'll spot something interesting.
When you scroll down, the browser sends fetch requests to a specific endpoint. That endpoint returns pure HTML with all the job data you want—no JavaScript rendering required. Copy that URL, tweak the start parameter, and you've got yourself a pagination system.
For a search like "Product Management" in San Francisco, the URL looks like this: `https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Product%20Management&location=San%20Francisco%20Bay%20Area&start=0`
Change start=0 to start=25 for the next page, start=50 for the one after, and so on. Simple.
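To see the pagination in action, you can generate the first few page URLs up front (a quick sketch using the search URL above):

```python
base_url = (
    "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"
    "?keywords=Product%20Management&location=San%20Francisco%20Bay%20Area&start="
)

# LinkedIn returns 25 results per request, so the start offset steps by 25
page_urls = [base_url + str(offset) for offset in range(0, 100, 25)]
```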
Start with the basics—install your dependencies:
```bash
pip3 install requests beautifulsoup4
```
Then set up your imports:
```python
import csv
import requests
from bs4 import BeautifulSoup
```
Now create a function that takes the base URL and a page number, combines them, sends the request, and parses the response:
```python
def linkedin_scraper(webpage, page_number):
    next_page = webpage + str(page_number)
    response = requests.get(next_page)
    soup = BeautifulSoup(response.content, 'html.parser')

    jobs = soup.find_all('div', class_='base-card')
    for job in jobs:
        job_title = job.find('h3', class_='base-search-card__title').text.strip()
        job_company = job.find('h4', class_='base-search-card__subtitle').text.strip()
        job_location = job.find('span', class_='job-search-card__location').text.strip()
        job_link = job.find('a', class_='base-card__full-link')['href']
        # Write to CSV (we'll set this up next)
```
Before your function runs, open a CSV file and create the headers:
```python
file = open('linkedin-jobs.csv', 'a', newline='', encoding='utf-8')
writer = csv.writer(file)
writer.writerow(['Title', 'Company', 'Location', 'Apply'])
```
Inside your loop, write each job's data:
```python
writer.writerow([
    job_title,
    job_company,
    job_location,
    job_link
])
```

Write the values as plain strings. Calling `.encode('utf-8')` here would produce bytes objects, and the CSV would end up full of `b'...'` literals; opening the file with `encoding='utf-8'` already handles non-ASCII characters.
Add a condition to keep scraping until you hit your limit:
```python
    if page_number < 975:  # LinkedIn caps results near 1,000, so 975 is the last start offset
        page_number = page_number + 25
        linkedin_scraper(webpage, page_number)
    else:
        file.close()
        print('Scraping complete')
```
Call your function with the base URL and starting page:
```python
linkedin_scraper('https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Product%20Management&location=San%20Francisco%20Bay%20Area&start=', 0)
```
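If you'd rather not recurse, the same pagination can be driven by a plain loop. Here's a sketch (the `scrape_all_pages` wrapper and its parameters are mine, not part of the original script):

```python
def scrape_all_pages(webpage, scrape_page, last_offset=975, step=25):
    """Call scrape_page(webpage, offset) for every start offset up to the cap."""
    for offset in range(0, last_offset + 1, step):
        scrape_page(webpage, offset)
```

With this shape, `file.close()` moves to after the loop instead of living in an `else` branch, and you avoid growing the call stack on long crawls.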
Here's where things get real. Scraping a few pages? No problem. Scraping thousands? You'll need more than basic requests.
LinkedIn watches for suspicious patterns—same IP hitting them repeatedly, requests coming in too fast, missing browser headers. Get flagged, and you're done.
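Before reaching for a paid service, you can at least soften these signals yourself: send browser-like headers and pause between requests. A minimal sketch (the header values and delay range are my own picks, not anything LinkedIn documents):

```python
import random
import time

import requests

# Browser-like headers; the default python-requests User-Agent is an instant giveaway
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

def polite_get(url, min_delay=2.0, max_delay=5.0):
    """Fetch a URL with browser-like headers after a randomized pause."""
    time.sleep(random.uniform(min_delay, max_delay))
    return requests.get(url, headers=HEADERS, timeout=30)
```

Tweaks like these help at small scale, but they won't save thousands of requests coming from a single IP.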
That's where a tool like ScraperAPI comes in handy. Instead of managing proxies, rotating IPs, and handling CAPTCHAs yourself, you route your requests through their API. They handle the messy stuff—IP rotation, header management, JavaScript rendering—so you can focus on extracting data. For sites as aggressive as LinkedIn, tools designed to handle heavy anti-scraping measures can save you days of headaches and keep your scraper running smoothly.
👉 If you're planning to scale beyond test runs and need reliable data extraction without the constant battle against blocks, ScraperAPI handles the infrastructure so you can focus on building. It's built specifically for challenging sites and takes care of proxy rotation, CAPTCHA solving, and all the technical overhead that usually stops scrapers in their tracks.
To use it, grab your API key and modify your request URL:
```python
import urllib.parse

API_KEY = "your_api_key_here"
target_url = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Product%20Management&start=0"
# URL-encode the target so its own query string survives being nested
url = f"http://api.scraperapi.com?api_key={API_KEY}&url={urllib.parse.quote(target_url, safe='')}"
response = requests.get(url)
```
For particularly stubborn pages, enable ultra-premium proxies:
```python
url = f"http://api.scraperapi.com?api_key={API_KEY}&ultra_premium=true&url=..."
```
Job listings are one thing. Profile data—names, locations, current roles—requires a different approach. You'll need to log in, which means using a headless browser.
Quick warning: Scraping behind login walls gets legally murky. Use a burner account, go slow, and don't violate terms of service.
Install the necessary libraries:
```bash
pip install selenium selenium-wire undetected-chromedriver beautifulsoup4 lxml
```
Import them:
```python
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep
import undetected_chromedriver as uc
from bs4 import BeautifulSoup
import csv
```
Configure Chrome to minimize detection:
```python
options = uc.ChromeOptions()  # use undetected-chromedriver's own options class
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--headless=new")
driver = uc.Chrome(options=options)
```
Navigate to your search URL (for example, "junior web developers" in the US and India):
```python
linkedin_url = 'https://www.linkedin.com/search/results/people/?keywords=junior%20web%20developer'
driver.get(linkedin_url)

# LinkedIn redirects unauthenticated visitors to its login form
username_field = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.ID, 'username'))
)
username_field.send_keys('your_email@example.com')
driver.find_element(By.ID, 'password').send_keys('your_password')
driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
sleep(10)  # Wait for login to complete
```
Loop through search result pages:
```python
with open('linkedin_profiles.csv', 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['Name', 'Profile Link', 'Job Title', 'Location', 'Current Role']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    for page_number in range(1, 6):
        page_url = linkedin_url + f"&page={page_number}"
        driver.get(page_url)
        sleep(5)

        soup = BeautifulSoup(driver.page_source, 'lxml')
        profiles = soup.find_all('li', class_='reusable-search__result-container')

        for profile in profiles:
            name_el = profile.find('span', {'aria-hidden': 'true'})
            name = name_el.get_text(strip=True) if name_el else "LinkedIn Member"

            link_el = profile.find('a', class_='app-aware-link')
            profile_link = link_el['href'] if link_el else "N/A"

            title_el = profile.find('div', class_='entity-result__primary-subtitle')
            job_title = title_el.get_text(strip=True) if title_el else "N/A"

            location_el = profile.find('div', class_='entity-result__secondary-subtitle')
            location = location_el.get_text(strip=True) if location_el else "N/A"

            writer.writerow({
                'Name': name,
                'Profile Link': profile_link,
                'Job Title': job_title,
                'Location': location,
                'Current Role': "N/A"
            })

driver.quit()
```
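Each field above repeats the same find-or-default dance. If that gets noisy, you can factor it into a tiny helper (`safe_text` is a name I'm introducing here, not a BeautifulSoup API):

```python
def safe_text(parent, tag, attrs=None, default="N/A"):
    """Return the stripped text of the first matching child element, or a default."""
    el = parent.find(tag, attrs or {})
    return el.get_text(strip=True) if el is not None else default

# Usage inside the profile loop, e.g.:
# job_title = safe_text(profile, 'div', {'class': 'entity-result__primary-subtitle'})
```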
Same concept, different target. Search for companies and extract their details: the name and profile URL come straight from the results page, while columns like industry, location, and follower count are left as placeholders you can fill in from each company's own page.
Navigate to a company search (for example, tech companies with 11-50 employees in the US):
```python
linkedin_url = 'https://www.linkedin.com/search/results/companies/?companyHqGeo=%5B%22103644278%22%5D&companySize=%5B%22C%22%5D'
driver.get(linkedin_url)
```
Log in the same way as before, then loop through results:
```python
with open('linkedin_companies.csv', 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['Company Name', 'Profile URL', 'Industry', 'Location', 'Number of Followers']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    for page_number in range(1, 6):
        page_url = linkedin_url + f"&page={page_number}"
        driver.get(page_url)
        sleep(5)

        soup = BeautifulSoup(driver.page_source, 'lxml')
        companies = soup.find_all('li', class_='reusable-search__result-container')

        for company in companies:
            name_el = company.find('span', class_='entity-result__title-text')
            company_name = name_el.get_text(strip=True) if name_el else "N/A"

            url_el = company.find('a', class_='app-aware-link')
            profile_url = url_el['href'] if url_el else "N/A"

            writer.writerow({
                'Company Name': company_name,
                'Profile URL': profile_url,
                'Industry': "N/A",
                'Location': "N/A",
                'Number of Followers': "N/A"
            })

driver.quit()
```
Scraping structured fields is useful, but what if you want summaries, career highlights, or comparisons? Parsing HTML for every detail gets tedious fast.
Instead, grab the entire profile as Markdown and feed it straight into a language model like Gemini. No cleanup, no manual extraction—just clean, structured text ready for analysis.
Request a LinkedIn profile in Markdown format:
```python
import requests

API_KEY = "your_scraperapi_key"
url = "https://www.linkedin.com/in/sundarpichai/"

payload = {
    "api_key": API_KEY,
    "url": url,
    "ultra_premium": "true",  # pass as a string so it serializes as ultra_premium=true
    "output_format": "markdown"
}

response = requests.get("http://api.scraperapi.com", params=payload)
markdown_data = response.text
```
Now send that Markdown to Gemini for analysis:
```python
import google.generativeai as genai

genai.configure(api_key="your_gemini_api_key")
model = genai.GenerativeModel(model_name="gemini-2.0-flash")

prompt = f"""
Based on this LinkedIn profile, provide:
1. A 3-sentence professional summary
2. Three key career milestones
3. Core skills and expertise

Profile:
{markdown_data}
"""

response = model.generate_content(prompt)
print(response.text)
```
Scraping LinkedIn doesn't have to be complicated. Pick the right method for your needs—hidden API for job listings, Selenium for logged-in data, Markdown for LLM processing. Each has its place.
The key? Don't rush. Test your selectors, handle errors gracefully, and respect rate limits. If you're scaling beyond a few hundred pages, invest in proper infrastructure to avoid getting blocked. For projects that need to handle thousands of profiles or job listings reliably, ScraperAPI manages the technical challenges so your scraper stays running—rotating proxies, bypassing CAPTCHAs, and keeping your IP safe from anti-scraping systems.
Happy scraping.
Is it legal to scrape LinkedIn?
Scraping public data is generally permitted, but accessing data behind login walls or violating LinkedIn's terms of service can lead to legal issues. Always check LinkedIn's policies and consider using official APIs when available.
Why do I keep getting blocked when scraping LinkedIn?
LinkedIn uses sophisticated anti-bot detection that monitors IP addresses, request patterns, and browser fingerprints. Frequent requests from the same IP, missing headers, or automation signals trigger blocks.
What's the difference between scraping with Requests and Selenium?
Requests is faster and lighter for static pages, while Selenium renders JavaScript and handles login-required pages. Use Requests for public job listings and Selenium for profile or company data behind authentication.
How many pages can I scrape before getting blocked?
It varies. Without proper IP rotation and rate limiting, you might get blocked after just a few dozen requests. With infrastructure designed for scraping at scale, you can collect thousands of pages reliably.
Can I use the official LinkedIn API instead of scraping?
LinkedIn's official API has strict limitations and doesn't provide access to much of the data available through scraping. For large-scale data collection, web scraping remains the more practical option.