The financial crisis of 2007-8 was an extremely complex, world-wide event.  The U.S. government's response to the crisis was arguably as complex, but better documented.  We invite researchers with natural language processing expertise to consider a corpus of reports, hearings, bills, and other transcripts related to the crisis.  We have organized a research competition around the data and these questions:
  • Who was the financial crisis?  We seek to understand the participants in the lawmaking and regulatory processes that formed the government’s response to the crisis: the individuals, industries, and professionals targeted by those policies; the agencies and organizations responsible for implementing them; and the lobbyists, witnesses, advocates, and politicians who were actively involved -- and the connections among them.
  • What was the financial crisis? We seek to understand the cause(s) of the crisis, proposals for reform, advocates for those proposals, arguments for and against, policies ultimately adopted by the government, and the impact of those policies.
In contrast to “shared tasks” -- common exercises in the NLP community -- an unshared task does not specify a quantitative performance measure for comparing solutions and does not even specify what a solution might look like.  Instead, the organizers provide data and an open-ended prompt. Participants are invited to explore the use of NLP methods to help scholars in political science, communications, and other related fields make sense of a large, complicated corpus, and to show what they can do in the form of short papers describing exploratory research and optional demos.  We expect many such papers to discuss quantitative and qualitative analysis of existing NLP tools and systems on portions of the data, though new implementations are also welcome, as are newly processed datasets that may be more directly usable in future research projects. Read the papers here.

Papers will be reviewed by a panel of judges. The judges will write public responses that discuss the relevance of unshared task submissions and suggest uses that may be unfamiliar to NLP researchers, as well as new research directions.  Above all, the emphasis is on evaluating the potential for future interdisciplinary research stemming from unshared task entries. In addition, the panel of judges may present an award to the entry (or entries) with the greatest potential. Our hope is that new collaborations between NLP researchers and those with substantive interests in political science will develop as a result of the unshared task.

Timeline
  • January 1, 2014:  official data release
  • February 14, 2014:  deadline to register your team
  • April 25, 2014 (extended twice, from April 15 and then April 18):  deadline for unshared task submissions (paper in ACL 2014 format, up to four pages not counting citations, and optional web-based demo)
  • May 31, 2014:  deadline for public reviews from the panel of judges
  • June 18, 2014: papers and reviews posted
  • June 26, 2014:  announcement of awardees at the ACL Workshop on Language Technologies and Computational Social Science in Baltimore, Maryland

Official Dataset


The following comprise the official data sources for the unshared task.
  • Federal Open Market Committee (FOMC):
  • Financial Crisis Inquiry Commission (FCIC; an independent commission created by Congress to investigate the causes of the crisis):
  • Congressional reports:
  • Congressional bills:
    • Troubled Asset Relief Program, 2008 (TARP)
    • Dodd-Frank Wall Street Reform and Consumer Protection Act (2010)
    • American Recovery and Reinvestment Act of 2009 (Stimulus)
    • Housing and Economic Recovery Act of 2008
    • Public Company Accounting Reform and Investor Protection Act of 2002 (Sarbanes-Oxley)
    • Financial Services Modernization Act of 1999 (Gramm-Leach-Bliley)
    • In addition to the above financial reform bills, the text of all versions of all Congressional bills introduced in the 110th and 111th Congresses
  • Congressional hearings, segmented into turns:
    • Monetary policy (26)
    • TARP (12)
    • Dodd-Frank (61)
    • Other selected committee hearings relating to financial reform (15)
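The official hearing files are already segmented into turns. For readers unfamiliar with the idea, the sketch below shows one way a raw transcript could be grouped into (speaker, text) turns. It is only an illustration on a made-up snippet: the speaker pattern, the function name segment_turns, and the sample lines are all assumptions, and the actual files may follow a different convention.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: a turn opens with a title, a surname, and a period,
# e.g. "Mr. Smith." or "Chairman Jones." The real files may differ.
SPEAKER = re.compile(
    r"^(Mr\.|Ms\.|Mrs\.|Dr\.|Chairman|Senator) ([A-Z][A-Za-z'-]+)\.\s*"
)


def segment_turns(lines: List[str]) -> List[Tuple[str, str]]:
    """Group transcript lines into (speaker, text) turns."""
    turns: List[Tuple[str, str]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = SPEAKER.match(line)
        if match:
            # A new speaker label starts a new turn.
            speaker = f"{match.group(1)} {match.group(2)}"
            turns.append((speaker, line[match.end():]))
        elif turns:
            # Continuation line: append it to the current speaker's turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line}".strip())
    return turns


# Made-up sample, not taken from the corpus.
sample = [
    "Chairman Jones. The committee will come to order.",
    "We meet today to discuss monetary policy.",
    "",
    "Mr. Smith. Thank you, Mr. Chairman.",
]
turns = segment_turns(sample)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```

Lines that appear before the first recognized speaker label are dropped in this sketch; a real pipeline would probably want to keep them as front matter.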
Other Datasets

The additional sources below may be useful or interesting, but they are not part of the official collection.  You are welcome to use them.  If you are interested in using these data, we advise you to contact poli.informatics@gmail.com for more information and (in some cases) tools to help you.  We ask that any data you gather be shared publicly with other researchers in its original and most useful formats.

Finally, we suggest some other kinds of data that might be interesting, but which will require you to start from scratch.  We ask that any data you gather be shared publicly with other researchers in its original and most useful formats.
  • Media coverage of the financial crisis
  • Macroeconomic data during the crisis
  • Congressional floor debates
  • Legislator voting positions
  • Congressional press releases
  • Campaign contributions
  • Lobbying activity
  • Other committee hearings
  • Prior financial regulation legislation and laws
  • Election outcomes data
  • Population data (e.g., income, employment, health)
  • Policy reforms adopted outside the U.S. in response to the financial crisis
Tools and additional data
  • John Wilkerson has shared some Python scripts to help with obtaining 2008 FOMC transcripts.  If you have questions, please contact him (link below).  Download here.
  • William Li has shared the data from the 2008 FOMC meetings.  Download here.
  • Adam Dalton has shared scripts used to convert data from HTML to CSV format.  Download here.
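As a rough idea of what an HTML-to-CSV conversion can involve, here is a minimal sketch that uses only the Python standard library. It is not the shared script: it assumes the data sits in a simple, non-nested HTML table, and the names TableExtractor, html_table_to_csv, and the sample markup are invented for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample markup, not taken from the corpus.
sample_html = """
<table>
  <tr><th>speaker</th><th>text</th></tr>
  <tr><td>Mr. Smith</td><td>Thank you, Mr. Chairman.</td></tr>
</table>
"""
csv_text = html_table_to_csv(sample_html)
print(csv_text)
```

The csv module quotes any field that contains a comma, so free text such as hearing turns survives the round trip without manual escaping.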
Who is organizing this competition?

PoliInformatics leverages advances in computer science, machine learning, and data visualization to promote analyses of very large and unstructured datasets related to the study of government and politics. The PoliInformatics Research Coordination Network (PInet) is a working group funded by the National Science Foundation to build community and capacity for data-intensive research using open government data. PInet has focused its work on the 2007-8 financial crisis, government policy relating to the crisis, and public response to that policy.  PInet has provided the data and the panel of judges who will respond to the unshared task entries.

The NLP unshared task in PoliInformatics is being organized by:


The material found on this website is based upon work supported by the National Science Foundation under Grant No. 1243917 (Division of Social and Economic Sciences, Directorate for Social, Behavioral & Economic Sciences). Any opinions, findings, and conclusions or recommendations expressed are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Wilkerson-code.tgz (4k), Noah Smith, Mar 9, 2014, 3:53 PM