This page lists the sources referenced or used in our project. Toggle the drop-down option to view our annotations. To view or download the sources in MLA format, please visit the following page.
The dataset from Opportunity Insights’ study “Diversifying Society’s Leaders? The Determinants and Causal Effects of Admission to Highly Selective Private Colleges” includes anonymized admissions data from private and public institutions linked to income tax records and standardized test scores (SAT and ACT). According to the codebook, the dataset includes variables such as demographics, academic metrics, socioeconomic background, college admission details, and post-college outcomes. It provides pipeline analysis numbers by college and parental income bin for students in the U.S. who took the SAT or ACT in 2011, 2013, or 2015. Parental income is measured as total household-level pre-tax income. To protect privacy, the dataset reports estimates rather than exact values: noise is added to the data, and any resulting negative values are replaced with missing values.
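The privacy step described above (add noise, then replace negative results with missing values) can be illustrated with a small sketch. This is not the procedure Opportunity Insights actually used: the Gaussian noise, the `noise_scale` value, the function name `add_noise_and_mask`, and the sample numbers are all assumptions made for illustration.

```python
import random


def add_noise_and_mask(values, noise_scale=2.0, seed=0):
    """Add random noise to each value, then replace negative results with None.

    Gaussian noise and the default noise_scale are illustrative assumptions,
    not the method documented in the dataset's codebook.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    noisy = [v + rng.gauss(0.0, noise_scale) for v in values]
    # A negative estimate is not meaningful here, so it is treated as missing.
    return [v if v >= 0 else None for v in noisy]


# Hypothetical cell values; small values are the ones most likely to go missing.
cell_values = [0.5, 12.0, 40.0, 1.0]
released = add_noise_and_mask(cell_values)
print(released)
```

One consequence worth keeping in mind when reading the data: because small values are the most likely to turn negative once noise is added, missing entries tend to cluster in the smallest cells.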
The dataset covers 139 selective colleges in the U.S., including Ivy-Plus colleges, other highly selective private colleges, highly selective public flagship colleges, New England Small College Athletic Conference (NESCAC) colleges, and other top-ranked institutions. Some colleges are omitted due to insufficient data or the inability to distinguish between specific campuses of multi-campus state universities.
The data reveals how admissions practices at highly selective private colleges disproportionately favor students from high-income backgrounds. Even when students have comparable test scores, those from wealthier backgrounds have a significant admission advantage. This insight underscores the systemic barriers faced by low-income students in accessing elite educational institutions.
Moreover, by analyzing factors such as alumni preferences, non-academic ratings, and athlete recruitment, the dataset highlights specific admissions practices that contribute to higher representation of affluent students. For instance, non-academic factors often favor students with more resources to invest in extracurricular activities and sports. Not only do students from wealthy families have the time to attend extracurricular activities instead of working, but their counselors at private high schools can also vouch for them, thus increasing their chances of standing out in college applications. Together, these advantages create a significant gap between students from families in the top 1% of the income distribution and those from middle- and lower-class backgrounds.
The use of anonymized admissions data linked to income tax records and standardized test scores raises concerns about data integrity and privacy. Although anonymization is intended to protect individual identities, there is still a risk of re-identification or misuse of sensitive information. Additionally, since the data is anonymized, it is impossible to verify the accuracy or originality of the data. This lack of transparency may discredit the dataset and the research conducted using it. Questions about the overall reliability and trustworthiness of the data emphasize the necessity for rigorous data handling practices and ethical standards in research.
The assumption that attending an Ivy-Plus college substantially increases a student’s chances of reaching the top 1% of the income distribution may reflect a bias towards the significance of elite institutions in shaping economic outcomes. This assumption may overlook the fact that most students attending these elite schools are already from wealthy families and that family background can play a crucial role in their success after graduation. Additionally, while one may argue that networking opportunities at these institutions can be advantageous, similar networks can be developed outside of classrooms. Other factors, such as personality, determination, resilience, and talent, also play essential roles in achieving success and are not fully accounted for in this dataset.