This page is dedicated to an AI Internship experiment. It shows the main page and the sub page of the website that interns should scrape.
Task Description: We are going back to the scraping tool we built in Week 1. Recall that we scraped the job title, employer, job location and URL from this page (https://www.builtinnyc.com/jobs). This time, we will build a more comprehensive web scraping tool that scrapes the list of job postings at https://www.builtinnyc.com/jobs. It should not only collect the same information (Job Title, Employer, Job Location and the URL of the job posting description page) but also open each job posting description page and scrape any text block that describes the required skill sets or general requirements for the job. Again, the job posting description page can be reached by clicking on the job's ‘block’ or on the job title. Note that clicking on the employer name will NOT take you to the job posting description page; it opens an information page about the employer instead.
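A minimal sketch of the main-page step, using only Python's standard `html.parser`. The `data-id` attribute values below (`job-card`, `job-card-title`, `company-title`, `job-card-location`) are hypothetical placeholders, not confirmed builtinnyc.com markup; inspect the live page in your browser's developer tools and substitute the real attributes. The sketch also assumes each field's text sits directly inside its tagged element.

```python
from html.parser import HTMLParser

# HYPOTHETICAL attribute values: builtinnyc.com's real markup may differ and
# can change over time, so inspect the live page and adjust these before use.
CARD_ATTR = "job-card"
FIELD_ATTRS = {
    "job-card-title": "title",
    "company-title": "employer",
    "job-card-location": "location",
}


class JobCardParser(HTMLParser):
    """Collect title, employer, location and posting link from each job card."""

    def __init__(self):
        super().__init__()
        self.jobs = []       # one dict per job card
        self._field = None   # field whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        data_id = attrs.get("data-id")
        if data_id == CARD_ATTR:
            # Every card starts a new, initially empty record.
            self.jobs.append({"title": "", "employer": "", "location": "", "url": ""})
        elif data_id in FIELD_ATTRS and self.jobs:
            self._field = FIELD_ATTRS[data_id]
            if self._field == "title":
                # The title link (not the employer link) points to the posting page.
                self.jobs[-1]["url"] = attrs.get("href") or ""

    def handle_endtag(self, tag):
        # Stop capturing at the first closing tag (assumes flat field elements).
        self._field = None

    def handle_data(self, data):
        if self._field:
            self.jobs[-1][self._field] += data.strip()


def parse_job_cards(html):
    """Return a list of job-card dicts parsed from the listing page HTML."""
    parser = JobCardParser()
    parser.feed(html)
    parser.close()
    return parser.jobs
```

Reading the posting link from the title element, rather than from the employer element, follows the note above that only the job block or title leads to the description page.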
The code should be able to run at any point in time. This job posting website updates frequently throughout the day, so the code should scrape the current version of the website each time it runs, and its output will therefore differ from one run to the next.
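Because the site changes throughout the day, the page should be downloaded fresh on every run rather than saved once. A minimal sketch with `urllib.request`, assuming the site serves its listing HTML to a plain HTTP client; the `User-Agent` string is an arbitrary placeholder. The `scrape_timestamp` helper is a hypothetical optional addition, not part of the task, for recording when each run happened.

```python
import urllib.request
from datetime import datetime, timezone

LISTING_URL = "https://www.builtinnyc.com/jobs"


def build_request(url):
    # ASSUMPTION: a browser-like User-Agent; some sites reject Python's default.
    return urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (internship-scraper sketch)"}
    )


def fetch_html(url, timeout=30):
    """Download the page as it exists right now; nothing is cached between runs."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def scrape_timestamp():
    """Hypothetical helper: UTC time of this run, e.g. for a scraped_at column."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")
```

Calling `fetch_html(LISTING_URL)` at the start of every run ensures the scraper sees the current listings. If the site turns out to render its job cards with JavaScript, the downloaded HTML may not contain them, and a browser-automation tool would be needed instead.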
The scraped information should be saved as a table in a CSV file named ‘job_req_builtin.csv’.
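A minimal sketch of the saving step using Python's built-in `csv` module. The column names are hypothetical; the task lists the fields but not their headers or order. `csv.DictWriter` quotes values that contain commas or line breaks, which matters here because requirement text often spans several lines.

```python
import csv

# HYPOTHETICAL column names and order; the task only names the fields.
FIELDNAMES = ["job_title", "employer", "job_location", "url", "requirements"]


def save_jobs(rows, path="job_req_builtin.csv"):
    """Write one row per posting; the file is overwritten on every run."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```

Opening the file with `newline=""` is what the `csv` documentation recommends, so embedded line breaks in the requirements column survive a round trip.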
Main Page (https://www.builtinnyc.com/jobs )
This page contains the Job Title, Employer, Job Location and the URL of the job posting description page. For example, the first block in the screenshot below should produce a data output of Genius Sports, Account Executive - Advertising Sales, and its URL (in this case, https://www.builtinnyc.com/job/senior-account-executive-advertising-sales/7103426).
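The `href` on a job card may be relative (for example `/job/...`), so it is worth resolving it against the listing URL before saving it. The sketch below uses `urllib.parse.urljoin` and assumes, based only on the example URL above, that posting pages live under the `/job/` path; `is_job_posting` is a hypothetical safeguard for skipping employer links, whose real path is not documented here.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "https://www.builtinnyc.com/jobs"


def absolute_posting_url(href, base=BASE_URL):
    """Resolve a (possibly relative) href against the listing page URL."""
    return urljoin(base, href)


def is_job_posting(url):
    # ASSUMPTION: posting pages live under /job/, as in the example URL;
    # anything else (such as an employer page) is rejected.
    return urlparse(url).path.startswith("/job/")
```

For a relative href such as `/job/senior-account-executive-advertising-sales/7103426`, `absolute_posting_url` returns `https://www.builtinnyc.com/job/senior-account-executive-advertising-sales/7103426`, and `is_job_posting` accepts it.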
Sub Page (loaded by clicking 'Account Executive - Advertising Sales' under the first entry, 'Genius Sports')
(Here is the link to this page: https://www.builtinnyc.com/job/senior-account-executive-advertising-sales/7103426 )
This page should contain the job's requirements. Note that some job postings do not explicitly state their requirements. This posting, however, lists some under the heading 'What You'll Bring'. Feel free to open the posting and explore the page.
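Since postings label their requirements differently, or not at all, one approach is to look for headings whose text resembles a requirements title and collect the text that follows until the next heading. A minimal sketch with `html.parser`; the keyword pattern and the choice of `h1` to `h6` as heading tags are assumptions to tune against real postings. It returns an empty string when no matching heading exists, so postings without stated requirements still get a row.

```python
import re
from html.parser import HTMLParser

# ASSUMPTION: requirement sections start with a heading matching one of these
# phrases (straight or curly apostrophe); extend the pattern as needed.
REQUIREMENT_HEADINGS = re.compile(
    r"what you['’]?ll bring|requirements|qualifications|skills",
    re.IGNORECASE,
)
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}


class RequirementParser(HTMLParser):
    """Collect text lines that follow a requirement-style heading."""

    def __init__(self):
        super().__init__()
        self.lines = []            # text found under matching headings
        self._heading_text = None  # not None while inside a heading
        self._capturing = False    # True between a matching heading and the next one
        self._skip = False         # True inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._heading_text = ""
        elif tag in SKIP_TAGS:
            self._skip = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._heading_text is not None:
            # Each heading either opens a requirements section or closes one.
            self._capturing = bool(REQUIREMENT_HEADINGS.search(self._heading_text))
            self._heading_text = None
        elif tag in SKIP_TAGS:
            self._skip = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text or self._skip:
            return
        if self._heading_text is not None:
            self._heading_text += " " + text
        elif self._capturing:
            self.lines.append(text)


def extract_requirements(html):
    """Return the requirement text joined by newlines, or "" if none was found."""
    parser = RequirementParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Some postings use bold text instead of heading tags; adding `strong` or `b` to `HEADING_TAGS` would catch those, at the cost of treating bold words inside bullet points as headings.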
This is what the scraping result should look like for this particular example. Remember that this is an example for one job posting and is NOT THE ONLY OUTPUT your code should generate. Your code should generate the FULL LIST of job postings from the main page.
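Producing the full list means visiting every posting on the main page, not just the first one. A minimal sketch of that loop; `cards` and `fetch_requirements` are hypothetical interfaces (a list of dicts from a main-page parser, and a function that downloads one posting and returns its requirement text), passed in as parameters so the loop can be tried without network access. The one-second pause between requests is an assumed politeness default, not a documented site limit.

```python
import sys
import time


def build_rows(cards, fetch_requirements, delay=1.0):
    """Turn every job card from the main page into one output row.

    cards: list of dicts with "title", "employer", "location" and "url" keys.
    fetch_requirements: callable taking a posting URL, returning requirement text.
    Both are HYPOTHETICAL interfaces standing in for your own scraper code.
    """
    rows = []
    seen = set()
    for card in cards:
        url = card["url"]
        if url in seen:  # skip a posting that is listed more than once
            continue
        seen.add(url)
        try:
            requirements = fetch_requirements(url)
        except Exception as exc:  # one failing page should not stop the whole run
            print(f"warning: could not scrape {url}: {exc}", file=sys.stderr)
            requirements = ""
        rows.append({
            "job_title": card["title"],
            "employer": card["employer"],
            "job_location": card["location"],
            "url": url,
            "requirements": requirements,
        })
        time.sleep(delay)  # pause between sub-page requests
    return rows
```

An empty requirements value then covers both postings that state no requirements and pages that failed to load; the warning printed to stderr tells the two cases apart.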