Crowdsourcing

This project started in 2002. It was among the first in the database research community to study crowdsourcing for data management (focusing on building data integration systems). It was also about nine years early: research on crowdsourcing in the database community didn't take off in earnest until around 2011.

Early Work (2002 - 2004)

I focused on crowdsourcing for schema matching. At the time, the term "crowdsourcing" didn't exist, so it was called "mass collaboration". The basic idea, however, was the same: pose a question, solicit multiple answers, and aggregate the answers (e.g., via majority voting).
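As a rough illustration of the pose / solicit / aggregate loop, here is a minimal Python sketch of majority voting. It is not the method from the papers below, which may aggregate answers differently; the function name `majority_vote` and the sample question are made up for this example.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer and the fraction of votes it received.

    Ties are broken in favor of the answer that appeared first
    (an arbitrary choice made for this sketch).
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


# Hypothetical question posed to five users:
# "Does source attribute 'loc' match schema attribute 'address'?"
answers = ["yes", "yes", "no", "yes", "no"]
winner, share = majority_vote(answers)
print(winner, share)
```

The returned vote share can serve as a crude confidence signal, for example to decide whether to solicit more answers before accepting a match.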

Building Data Integration Systems via Mass Collaboration, R. McCann, A. Doan, A. Kramnik, and V. Varadarajan. Proc. of the Int. Workshop on Web and Databases (WebDB-03).
Building Data Integration Systems: A Mass Collaboration Approach, A. Doan and R. McCann. Proc. of the IJCAI-03 Workshop on Information Integration on the Web.
Integrating Data from Disparate Sources: A Mass Collaboration Approach, R. McCann, A. Kramnik, W. Shen, V. Varadarajan, O. Sobulo, A. Doan. ICDE-05. Poster.
Matching Schemas in Online Communities: A Web 2.0 Approach, R. McCann, W. Shen, A. Doan. ICDE-08.

Work on Crowdsourcing Knowledge Bases (2005 - 2009)

Subsequently I focused on crowdsourcing to build community-centric knowledge bases, and deployed such a knowledge base called DBLife.

Community Information Management, A. Doan, R. Ramakrishnan, F. Chen, P. DeRose, Y. Lee, R. McCann, M. Sayyadian, and W. Shen. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29(1), 2006. Invited.
Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach, P. DeRose, W. Shen, F. Chen, A. Doan, R. Ramakrishnan. VLDB-07.
Building Community Wikipedias: A Human-Machine Approach, P. DeRose, X. Chai, B. Gao, W. Shen, A. Doan, P. Bohannon, J. Zhu. ICDE-08.
Efficiently Incorporating User Feedback into Information Extraction and Integration Programs, X. Chai, B. Vuong, A. Doan, J. Naughton. SIGMOD-09.

Surveys and Miscellaneous Work

Crowdsourcing Systems on the World-Wide Web, A. Doan, R. Ramakrishnan, A. Halevy. Communications of the ACM, 2011.

Crowdsourcing Work in Silicon Valley (2010 - date)

I have done a fair amount of crowdsourcing work in industry since 2010, and this work is still ongoing.

Social Media Analytics: the Kosmix Story, with many authors. IEEE Data Engineering Bulletin, Sept 2013.
Entity Extraction, Linking, Classification, and Tagging for Social Media: A Wikipedia-Based Approach, A. Gattani, D. Lamba, N. Garera, M. Tiwari, X. Chai, S. Das, S. Subramaniam, A. Rajaraman, V. Harinarayan, and A. Doan. VLDB-13, industrial paper.
Building, Maintaining, and Using Knowledge Bases: A Report from the Trenches, O. Deshpande, D. Lamba, M. Tourn, S. Das, S. Subramaniam, A. Rajaraman, V. Harinarayan, A. Doan. SIGMOD-13, industrial paper.
Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing, C. Sun, N. Rampalli, F. Yang, A. Doan. VLDB-14, industrial paper.
Corleone: Hands-Off Crowdsourcing for Entity Matching, C. Gokhale, S. Das, A. Doan, J. Naughton, N. Rampalli, J. Shavlik, J. Zhu. SIGMOD-14.