Tired of manually browsing through endless job listings? Want to grab thousands of LinkedIn job postings in seconds instead of hours? This guide walks you through building a practical LinkedIn job scraper using Node.js and Cheerio—no fancy frameworks needed, just straightforward tools that get the job done. You'll learn how to handle JavaScript-rendered content, work around infinite scroll pagination, and export clean data ready for analysis.
Look, before we dive into code, let's address the elephant in the room.
Courts have repeatedly sided with scrapers of public data. In hiQ Labs v. LinkedIn, the Ninth Circuit ruled in 2019 that scraping publicly accessible pages likely doesn't violate the Computer Fraud and Abuse Act, and it reaffirmed that position in 2022 after the Supreme Court sent the case back for another look. Terms-of-service and other claims can still come into play, though, so treat this as background, not legal advice.
The key word here is "public." We're not logging into accounts or accessing private information. We're just grabbing what anyone can see without signing in—job titles, company names, locations. Think of it like copying down listings from a bulletin board that's posted in a public square.
We'll keep things respectful and straightforward. No shady tactics, no overloading servers, just clean data extraction.
First things first—you'll need Node.js installed. Head over to the Node.js website and grab the installer for your system. If you're on an M1 Mac like us, get the ARM64 version.
Once that's done, create a folder called "linkedin-scraper-project" and open it in your code editor. Fire up the terminal and initialize a new project:
```bash
npm init -y
```
Now install the two libraries we'll use:
```bash
npm install axios cheerio
```
Axios handles our HTTP requests, and Cheerio parses HTML—think of it as jQuery for the backend. Simple, efficient, no unnecessary bloat.
Create an index.js file and add these imports at the top:
```javascript
const axios = require('axios');
const cheerio = require('cheerio');
```
Open LinkedIn's homepage (make sure you're logged out) and you'll see a job search form right there. Type in something like "email developer" and hit search.
Each job appears in its own card with the title, company, location, and a link to the full listing. Inspect the page and you'll see everything sits inside `<li>` elements with a clean, predictable structure.
But here's the catch: LinkedIn uses infinite scrolling. As you scroll down, new jobs load automatically. No numbered pagination to work with.
Most people would reach for a headless browser at this point. Puppeteer, Selenium, the whole nine yards. But that's overkill for what we need. Let's be smarter.
Open your browser's DevTools and switch to the Network tab. Reload the page and scroll down past the initial batch of jobs.
Watch what happens—LinkedIn sends a fetch request to load more data. That's our golden ticket.
Click on that request and copy the URL. Paste it into your browser and boom—you've got a plain HTML page with the same job structure. No JavaScript rendering needed.
When dealing with complex websites that implement anti-bot measures, having reliable infrastructure becomes crucial. Tools like 👉 ScraperAPI handle proxy rotation, headless browsers, and CAPTCHA solving automatically, letting you focus on extracting data instead of fighting anti-scraping systems. It's especially useful when scaling beyond simple test scripts to production-level data collection.
Compare the URLs from page one and page two:
Page 1: ...&pageNum=0&start=0
Page 2: ...&pageNum=0&start=25
The start parameter increases by 25 for each page. Change it to start=0, start=25, start=50, and you've got sequential access to all listings. No scrolling, no browser automation, just direct URL manipulation.
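To keep the pagination logic tidy, you can wrap the captured URL in a small helper that swaps in the offset. This is just a sketch: the query string below is the one captured from DevTools for the "email developer" search, so adjust it for your own keywords and location.

```javascript
// Build the guest-API search URL for a given result offset.
// The query string is copied from the request captured in DevTools;
// only the start parameter changes between pages.
function buildSearchUrl(start) {
  return (
    'https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search' +
    '?keywords=email%2Bdeveloper&location=United%2BStates&geoId=103644278' +
    `&start=${start}`
  );
}

console.log(buildSearchUrl(25)); // ...&start=25
```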
Let's start with a simple request to grab the first page:
```javascript
const axios = require('axios');
const cheerio = require('cheerio');

const url =
  'https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=email%2Bdeveloper&location=United%2BStates&geoId=103644278&start=0';

axios(url)
  .then(response => {
    const html = response.data;
    const $ = cheerio.load(html);
    console.log(html); // dump the raw HTML to confirm the request works
  })
  .catch(console.error);
```
Run `node index.js` to test it. If you get HTML dumped to your console, you're in business.
Back in DevTools, inspect the job cards to find the class names we need:
- Job title: `h3.base-search-card__title`
- Company: `h4.base-search-card__subtitle`
- Location: `span.job-search-card__location`
- URL: `a.base-card__full-link`

Test these selectors in the browser console using `document.querySelectorAll()` to make sure they work before adding them to your script.
Now we iterate through each job listing and pull out the information:
```javascript
const linkedinJobs = [];

const jobs = $('li');

jobs.each((index, element) => {
  const jobTitle = $(element).find('h3.base-search-card__title').text().trim();
  const company = $(element).find('h4.base-search-card__subtitle').text().trim();
  const location = $(element).find('span.job-search-card__location').text().trim();
  const link = $(element).find('a.base-card__full-link').attr('href');

  linkedinJobs.push({
    'Title': jobTitle,
    'Company': company,
    'Location': location,
    'Link': link,
  });
});
```
Notice the .trim() method? LinkedIn's HTML has extra whitespace everywhere, so we clean it up before storing.
For the URL, we use .attr('href') instead of .text() because we want the actual link, not the text inside the element.
We know the start parameter increments by 25 and maxes out around 1000. A simple for loop handles this perfectly:
```javascript
for (let pageNumber = 0; pageNumber < 1000; pageNumber += 25) {
  const url = `https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=email%2Bdeveloper&location=United%2BStates&geoId=103644278&start=${pageNumber}`;
  // ... rest of your scraping code
}
```
This sends requests for offsets 0, 25, 50, 75, and so on up to 975, covering the roughly 1,000 results LinkedIn exposes per search.
Install the objects-to-csv package:
```bash
npm install objects-to-csv
```
Add it to your imports and create the CSV after each page:
```javascript
const ObjectsToCsv = require('objects-to-csv');

// ... inside your .then() block, after jobs.each()
const csv = new ObjectsToCsv(linkedinJobs);
csv.toDisk('./linkedInJobs.csv', { append: true });
```
Setting append: true adds new data to the file instead of overwriting it each time.
Here's everything put together:
```javascript
const axios = require('axios');
const cheerio = require('cheerio');
const ObjectsToCsv = require('objects-to-csv');

for (let pageNumber = 0; pageNumber < 1000; pageNumber += 25) {
  const url = `https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=email%2Bdeveloper&location=United%2BStates&geoId=103644278&start=${pageNumber}`;

  axios(url)
    .then(response => {
      const html = response.data;
      const $ = cheerio.load(html);

      // Collect this page's jobs in a fresh array. Since we append to the
      // CSV after every page, reusing one global array would write
      // earlier pages again as duplicates.
      const linkedinJobs = [];

      const jobs = $('li');

      jobs.each((index, element) => {
        const jobTitle = $(element).find('h3.base-search-card__title').text().trim();
        const company = $(element).find('h4.base-search-card__subtitle').text().trim();
        const location = $(element).find('span.job-search-card__location').text().trim();
        const link = $(element).find('a.base-card__full-link').attr('href');

        linkedinJobs.push({
          'Title': jobTitle,
          'Company': company,
          'Location': location,
          'Link': link,
        });
      });

      const csv = new ObjectsToCsv(linkedinJobs);
      csv.toDisk('./linkedInJobs.csv', { append: true });
    })
    .catch(console.error);
}
```
Run it with `node index.js` and watch thousands of job listings get saved to your CSV in seconds.
Want to take this further? Here are some ideas:
- Filter by keyword: Modify the script to only save jobs where the title contains "email developer" or "html email." You'll need to add an if statement before pushing to the array.
- Add rate limiting: Right now we're hitting LinkedIn fast and hard. Add a delay between requests using `setTimeout()` or the delay npm package. Five seconds between pages keeps things respectful.
- Multiple searches: Wrap the whole thing in another loop to search for different job titles or locations automatically.
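As a sketch of the first two ideas combined, here's one way to structure a sequential, rate-limited loop with a keyword filter. `scrapePage` is a hypothetical stand-in for the axios-plus-Cheerio logic from earlier (it takes an offset and resolves to that page's job array); the five-second pause and the 1,000-result cap match the numbers used above.

```javascript
// Pause for the given number of milliseconds.
function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// True if the job title contains any of the given keywords (case-insensitive).
function matchesKeywords(title, keywords) {
  const t = title.toLowerCase();
  return keywords.some(k => t.includes(k.toLowerCase()));
}

// Fetch pages one at a time, filtering as we go. `scrapePage(start)` is a
// stand-in for the request-and-parse logic from earlier sections.
async function scrapeAllPages(scrapePage, pauseMs = 5000) {
  const allJobs = [];
  for (let start = 0; start < 1000; start += 25) {
    const jobs = await scrapePage(start);
    if (jobs.length === 0) break; // stop early when LinkedIn runs out of results

    for (const job of jobs) {
      if (matchesKeywords(job.Title, ['email developer', 'html email'])) {
        allJobs.push(job);
      }
    }

    await delay(pauseMs); // be polite between requests
  }
  return allJobs;
}
```

Because the pages are fetched sequentially rather than all at once, this also removes the burst of 40 simultaneous requests the plain for loop produces.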
You now have a working LinkedIn job scraper that bypasses infinite scrolling, handles clean data extraction, and exports everything to CSV. No headless browsers, no complex frameworks—just straightforward Node.js and some clever URL manipulation.
The key takeaway? Understanding how websites load their data lets you access it directly instead of fighting with browser automation. Next time you're facing a tricky scraping challenge, open DevTools and see what's happening under the hood. For production-level scraping at scale, 👉 ScraperAPI offers the infrastructure to handle thousands of requests reliably without getting blocked, making it easier to build robust data pipelines.