Data Cleaning
- Raman, Hellerstein. Potter's Wheel: An Interactive Data Cleaning System. VLDB 2001.
- Galhardas, Florescu and Shasha. Declarative Data Cleaning: Language, Model, and Algorithms. VLDB 2001.
- Dasu, Johnson, Muthukrishnan and Shkapenyuk. Mining database structure; or, how to build a data quality browser. SIGMOD 2002.
- Huhtala, Karkkainen, Porkka and Toivonen. TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies.
- Hellerstein. Quantitative Data Cleaning for Large Databases. Report for UNECE, 2008.
- Sarawagi and Bhamidipaty. Interactive Deduplication Using Active Learning. KDD 2002. [citeseer]
- See also their ICDE demo description on scaling the ALIAS system. [pdf]
- Other topics
Data Integration and Beyond
- Introduction to Data Integration. Chapters 1 & 2 of "Principles of Data Integration" by Doan, Halevy and Ives. The chapters will be made available closer to the date.
- Generic Schema Matching with Cupid - Madhavan, Bernstein, Rahm, VLDB 2001.
- Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach. AnHai Doan, Pedro Domingos, Alon Y. Halevy. SIGMOD 2001.
- Renée J. Miller, Laura M. Haas, Mauricio A. Hernández: Schema Mapping as Query Discovery. VLDB 2000.
- Popa, L., Velegrakis, Y., Hernández, M. A., Miller, R. J., and Fagin, R. 2002. Translating web data. VLDB 2002.
- Franklin, Halevy and Maier. From Databases to Dataspaces: A New Abstraction for Information Management. SIGMOD Record 34(4) 2005.
- Chiticariu, Kolaitis and Popa. Interactive Generation of Integrated Schemas. SIGMOD 2008.
- Dill et al. SemTag and Seeker: Bootstrapping the Semantic Web via Automated Semantic Annotation. WWW 2003.
- Noy. Order from Chaos: How can ontologies and the Semantic Web help us structure the world's semi-structured information? ACM Queue 3(8) 2005.
Visualization and Social Data Analysis
- Robertson, Czerwinski and Churchill. Visualization of Mappings Between Schemas. CHI 2005
- North and Shneiderman: Snap-Together Visualization: A User Interface for Coordinating Visualizations via Relational Schemata. AVI 2000.
- Balakrishnan, Fussell, and Kiesler. Do Visualizations Improve Synchronous Remote Collaboration?
- Mackinlay: Automating the Design of Graphical Presentations of Relational Information
- Mackinlay, Hanrahan and Stolte: Show Me: Automatic Presentation for Visual Analysis
- Heer J, Viégas F, Wattenberg M. Voyagers and Voyeurs: Supporting Asynchronous Collaborative Information Visualization, ACM Conference on Human Factors in Computing Systems (CHI’07) 2007.
- Heer J, Agrawala M. Design Considerations for Collaborative Visual Analytics. Information Visualization Journal, 7(1):49-62, 2008.
- Huynh D, Miller R, Karger D. Exhibit: Lightweight Structured Data Publishing. WWW, 2007.
- Viégas FB, Wattenberg M, van Ham F, Kriss J, McKeon M. Many Eyes: A Site for Visualization at Internet Scale. IEEE Transactions on Visualization and Computer Graphics (InfoVis’07) 2007; 13(6): 1121-1128.
- Chan B, Wu L, Talbot J, Cammarano M, Hanrahan P. Vispedia: Interactive Visual Exploration of Wikipedia Data via Search-Based Integration. IEEE Information Visualization, 2008.
- Yu and Jagadish: Schema Summarization. VLDB 2006.
Web Data and Social Networks
- Cafarella et al. WebTables: Exploring the Power of Tables on the Web. VLDB 2008.
- Cafarella et al. Uncovering the Relational Web. WebDB 2008
- Weld et al. Intelligence in Wikipedia. AAAI 2008.
- Wu, Hoffmann and Weld. Information Extraction from Wikipedia: Moving Down the Long Tail. KDD 2008.
- Kossinets, Kleinberg and Watts. The Structure of Information Pathways in a Social Communication Network. KDD 2008.
- Liben-Nowell and Kleinberg. Tracing information flow on a global scale using Internet chain-letter data. PNAS, 1/2008
- von Ahn and Dabbish. General Techniques for Designing Games with a Purpose. CACM 8/2008.
- von Ahn, et al. reCAPTCHA: Human-Based Character Recognition via Web Security Measures. Science 9/2008.
- Bryan Chan, Leslie Wu, Justin Talbot, Mike Cammarano, Pat Hanrahan. "Vispedia: Interactive Visual Exploration of Wikipedia Data via Search-Based Integration". InfoVis 2008
- Other proposed reading
- Benkler Y. Coase's Penguin, or, Linux and the Nature of the Firm. Yale Law Journal 2002; 112(369).
- Cheshire C. Selective Incentives and Generalized Information Exchange. Social Psychology Quarterly 2007; 70(1).
- Golder SA, Huberman BA. The Structure of Collaborative Tagging Systems. Journal of Information Science April 2006; 32(2).
- Ling K, Beenen G, Ludford P, Wang X, Chang K, Cosley D, Frankowski D, Terveen L, Rashid AM, Resnick P, Kraut R. Using social psychology to motivate contributions to online communities. Journal of Computer-Mediated Communication 2005, 10(4).
Viz and Collaboration in the Sciences
- Koop, Scheidegger, Callahan, Freire, and Silva. VisComplete: Automating Suggestions for Visualization Pipelines. IEEE Vis, 2008.
- Klasky, Barreto, Kahn, Parashar, Podhorszki, Parker, Silver, and Vouk. Collaborative visualization spaces for petascale simulations. CTS 2008.
Fun