The goals for project 4 are to work with your team to:
Find public corporate greenhouse gas emissions documents online, tracking how long it takes on average to find each one.
Find the Scope 1 CO2 emissions data within large corporate websites and PDF documents, tracking how long it takes on average to find the data.
For both of the above goals, you must capture and document a repeatable process that can be handed off to others, describing the optimal mix of human effort and AI tools.
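One way to track the average time per document is to keep a simple log of each search and compute the mean from it. The sketch below is only an illustration: the log layout, the function name `average_minutes`, and every company name and time in it are hypothetical placeholders, not real measurements or a required format.

```python
from statistics import mean

# Hypothetical timing log: (company, minutes spent finding the document).
# All names and times are made-up placeholders, not real measurements.
search_log = [
    ("Company A", 6.0),
    ("Company B", 3.5),
    ("Company C", 8.5),
]


def average_minutes(log):
    """Return the mean time in minutes across all logged searches."""
    return mean([minutes for _, minutes in log])


avg_find_minutes = average_minutes(search_log)
print(f"Average time to find a document: {avg_find_minutes:.1f} min")
```

The same log format could be reused for the data extraction goal by recording the minutes spent locating the Scope 1 figure instead of the document itself.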
Sustainable Fitch provides an assessment of the Environmental, Social, and Governance (ESG) qualities of corporate entities and their financial instruments or securities (e.g. stocks and bonds). Read the Sustainable Fitch overview document (also included below), which describes the goal of collecting Scope 1 greenhouse gas emissions data listed as CO2 emissions. Scope 1 emissions are defined as those coming directly from a company's own operations. That overview document describes:
Phase 1: AI Development & Validation with Ground Truth Data, and
Phase 2: Exploration and Extension, which includes discovering and collecting documents.
We will flip the order of the Phase 1 and Phase 2 tasks described in that document: we will do part of the Phase 2 tasks first, gathering ESG documents that contain Scope 1 greenhouse gas emissions data.
Final Document Dataset: Google Sheet (updated: 11.19.25)
To get a sense of what the task involves, look carefully through the ten sample PDF documents and the ground truth spreadsheet. You will be gathering data similar to what is in this spreadsheet.
For both document gathering and Scope 1 CO2 emissions data extraction, you should determine the optimal mix of human effort and AI assistance. You will likely need to try different kinds of AI tools, measuring your throughput over time.
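To compare different mixes of human effort and AI assistance, it can help to record each working session and compute documents per hour for each approach. The following is a minimal sketch under stated assumptions: the method labels, the record layout, the function name `throughput_by_method`, and all of the numbers are hypothetical and exist only to show the calculation.

```python
from collections import defaultdict

# Hypothetical trial records: (method, documents processed, minutes elapsed).
# The method labels and counts are illustrative placeholders only.
trials = [
    ("manual search", 4, 60),
    ("manual search", 5, 60),
    ("AI-assisted search", 9, 60),
    ("AI-assisted search", 12, 60),
]


def throughput_by_method(records):
    """Return documents per hour for each method, pooled over its trials."""
    docs = defaultdict(int)
    minutes = defaultdict(float)
    for method, n_docs, elapsed in records:
        docs[method] += n_docs
        minutes[method] += elapsed
    return {m: docs[m] / (minutes[m] / 60) for m in docs}


rates = throughput_by_method(trials)
# Print methods from fastest to slowest.
for method, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method}: {rate:.1f} documents/hour")
```

Pooling documents and minutes across sessions, rather than averaging per-session rates, keeps short sessions from skewing the result. Adding a date to each record would also let you see how throughput changes over time.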
Find your group's set of ~100 companies in the list shown below. The left-hand column has your team number on the row that is the starting point for your team:
NYSE Bond List
Place the components shown below on your website. Each will be graded on a scale of 0 to 3 as follows, for a maximum total of 12 points:
0 points: Not done
1 point: Partially done, with significant parts incomplete
2 points: Almost everything is there
3 points: Done completely, with all aspects addressed. Includes appropriate visual elements.
Link or embedding of your group's planning document, indicating
What exploratory work (learning) you did or still need to do outside of class
Timeline - Who/What/When: work back from the deadline to show what needs to happen when, and by whom
Link or embedding of your group's presentation slides.
A stand-alone description of your solution. This should have:
A one-sentence description of your solution
The average time per document, using your solution, to find emissions data in a PDF.
The estimated cost savings
A link to your solution, which could be a document, agent(s), or ML model.
Your personal contributions to the project as a STAR story (Situation, Task, Action, Result), the goal being that someday, when you are interviewing for a job, you could bring this up and use it to tell a compelling narrative about this project.
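For the estimated cost savings in the solution description, one possible approach is to compare your solution's average time per document against a manual baseline and convert the difference into labor cost. This is a rough sketch, not a prescribed method: the function `estimated_savings`, its parameters, the hourly rate, the tool cost, and all of the example figures are assumptions chosen for illustration.

```python
def estimated_savings(n_documents, baseline_minutes, solution_minutes,
                      hourly_rate, tool_cost=0.0):
    """Estimate labor cost saved relative to a manual baseline.

    baseline_minutes and solution_minutes are average times per document.
    tool_cost is any spend on AI tools for the same batch of documents.
    """
    minutes_saved = (baseline_minutes - solution_minutes) * n_documents
    return minutes_saved / 60 * hourly_rate - tool_cost


# Hypothetical inputs: 100 documents, 12 min each manually versus 3 min each
# with the solution, a 30.00 hourly labor rate, and 20.00 spent on tools.
savings = estimated_savings(
    n_documents=100,
    baseline_minutes=12.0,
    solution_minutes=3.0,
    hourly_rate=30.0,
    tool_cost=20.0,
)
print(f"Estimated savings: {savings:.2f}")
```

State the baseline and hourly rate you used alongside the result, since the estimate depends entirely on those inputs.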