Publications

2019

[C20] Slice Finder: Automated Data Slicing for Model Validation

Y. Chung, T. Kraska, N. Polyzotis, S. E. Whang

Accepted to IEEE Int'l Conf. on Data Engineering (ICDE), Macau SAR, China, Apr. 2019. Short paper.

2018

[J6] Data Management Challenges in Production Machine Learning: A Survey

N. Polyzotis, S. Roy, S. E. Whang, M. Zinkevich

ACM SIGMOD Record, June 2018 issue.

[C19] TFX Frontend: A Graphical User Interface for a Production-Scale Machine Learning Platform

P. Brandt, J. Cai, T. Gannert, P. Joshi, R. Khot, C. Koo, C. Kuang, S. Leong, C. Mewald, N. Polyzotis, H. Quiroz, S. Roy, P. Yang, J. Wexler, S. E. Whang

SysML Conference, Stanford, California, Feb. 2018.

[C18] Slice Finder: Automated Data Slicing for Model Interpretability

Y. Chung, T. Kraska, N. Polyzotis, S. E. Whang

SysML Conference, Stanford, California, Feb. 2018.

[C17] Data Infrastructure for Machine Learning

E. Breck, N. Polyzotis, S. Roy, S. E. Whang, M. Zinkevich

SysML Conference, Stanford, California, Feb. 2018.

2017 and before

[C16] TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

D. Baylor, E. Breck, H. Cheng, N. Fiedel, C. Foo, Z. Haque, S. Haykal, M. Ispir, V. Jain, L. Koc, C. Koo, L. Lew, C. Mewald, A. Modi, N. Polyzotis, S. Ramesh, S. Roy, S. E. Whang, M. Wicke, J. Wilkiewicz, X. Zhang, M. Zinkevich

In Proc. 2017 ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD), pp. 1387-1395, Halifax, Nova Scotia, Canada, Aug. 2017.

[T1] Data Management Challenges in Production Machine Learning

N. Polyzotis, S. Roy, S. E. Whang, M. Zinkevich

In Proc. 2017 ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD), pp. 1723-1726, Chicago, May 2017. Tutorial.

[C15] Lonlies: Estimating Property Values for Long Tail Entities

M. Farid, I. Ilyas, S. E. Whang, C. Yu

In Proc. 39th Int'l ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 1125-1128, Pisa, Italy, July 2016. Demonstration description.

[C14] Goods: Organizing Google's Datasets

A. Halevy, F. Korn, N. Noy, C. Olston, N. Polyzotis, S. Roy, S. E. Whang

In Proc. 2016 ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD), pp. 795-806, San Francisco, June 2016.

[C13] Discovering Structure in the Universe of Attribute Names

A. Halevy, N. Noy, S. Sarawagi, S. E. Whang, X. Yu

In Proc. 25th Int'l Conf. on World Wide Web (WWW), pp. 939-949, Montreal, Canada, Apr. 2016.

[W2] Discovering Subsumption Relationships for Web-Based Ontologies

D. Movshovitz-Attias, S. E. Whang, N. Noy, and A. Halevy

In Proc. 18th Int'l Workshop on the Web and Databases (WebDB), pp. 62-69, Melbourne, Australia, May 2015. (Best Paper Award)

[C12] ReNoun: Fact Extraction for Nominal Attributes

M. Yahya, S. E. Whang, R. Gupta, and A. Halevy

In Proc. 2014 Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp. 325-335, Doha, Qatar, Oct. 2014.

[C11] Biperpedia: An Ontology for Search Applications

R. Gupta, A. Halevy, X. Wang, S. E. Whang, and F. Wu

In Proc. 40th Int'l Conf. on Very Large Data Bases (PVLDB), pp. 505-516, Hangzhou, China, Sept. 2014.

[J5] Incremental Entity Resolution on Rules and Data

S. E. Whang and H. Garcia-Molina

The VLDB Journal, vol. 23, no. 1, pp. 77-102, Jan. 2014.

[J4] Joint Entity Resolution on Multiple Datasets

S. E. Whang and H. Garcia-Molina

The VLDB Journal, vol. 22, no. 6, pp. 773-795, Nov. 2013.

[C10] Disinformation Techniques for Entity Resolution

S. E. Whang and H. Garcia-Molina

In Proc. 22nd ACM Int'l Conf. on Information and Knowledge Management (CIKM), pp. 715-720, San Francisco, California, Oct. 2013. Short Paper.

[C9] Question Selection for Crowd Entity Resolution

S. E. Whang, P. Lofgren, and H. Garcia-Molina

In Proc. 39th Int'l Conf. on Very Large Data Bases (PVLDB), pp. 349-360, Trento, Italy, Aug. 2013.

[J3] Pay-As-You-Go Entity Resolution

S. E. Whang, D. Marmaros, and H. Garcia-Molina

IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 25, no. 5, pp. 1111-1124, May 2013.

[W1] A Model for Quantifying Information Leakage

S. E. Whang and H. Garcia-Molina

In Proc. 9th VLDB Workshop on Secure Data Management (SDM), pp. 25-44, Istanbul, Turkey, Aug. 2012.

[C8] Joint Entity Resolution

S. E. Whang and H. Garcia-Molina

In Proc. 28th IEEE Int'l Conf. on Data Engineering (ICDE), pp. 294-305, Washington, DC, Apr. 2012. Full Paper.

[C7] Managing Information Leakage

S. E. Whang and H. Garcia-Molina

In Proc. 5th Biennial Conf. on Innovative Data Systems Research (CIDR), pp. 79-84, Pacific Grove, California, Jan. 2011.

[C6] Entity Resolution with Evolving Rules

S. E. Whang and H. Garcia-Molina

In Proc. 36th Int'l Conf. on Very Large Data Bases (PVLDB), pp. 1326-1337, Singapore, Sept. 2010.

[C5] Evaluating Entity Resolution Results

D. Menestrina, S. E. Whang, and H. Garcia-Molina

In Proc. 36th Int'l Conf. on Very Large Data Bases (PVLDB), pp. 208-219, Singapore, Sept. 2010.

[C4] Indexing Boolean Expressions

S. E. Whang, C. Brower, J. Shanmugasundaram, S. Vassilvitskii, E. Vee, R. Yerneni, and H. Garcia-Molina

In Proc. 35th Int'l Conf. on Very Large Data Bases (PVLDB), pp. 37-48, Lyon, France, Aug. 2009.

[C3] Entity Resolution with Iterative Blocking

S. E. Whang, D. Menestrina, G. Koutrika, M. Theobald, and H. Garcia-Molina

In Proc. 2009 ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD), pp. 219-232, Providence, Rhode Island, June 2009.

[C2] QuickStart: an Upfront Client-based Design Advisor for Parallel Data Warehouses

M. Castellanos, I. Jimenez, N. Coddington, H. Zeller, S. Whang, U. Dayal

In Proc. 25th Int'l Conf. on Data Engineering (ICDE), pp. 1543-1546, Shanghai, China, Mar. 2009. Demonstration description.

[J2] Generic Entity Resolution with Negative Rules

S. E. Whang, O. Benjelloun, and H. Garcia-Molina

The VLDB Journal, vol. 18, no. 6, pp. 1261-1277, Feb. 2009.

[J1] Swoosh: A Generic Approach to Entity Resolution

O. Benjelloun, H. Garcia-Molina, D. Menestrina, Q. Su, S. E. Whang, and J. Widom

The VLDB Journal, vol. 18, no. 1, pp. 255-276, Jan. 2009.

[C1] A Practitioner's Approach to Normalizing XQuery Expressions

K. Lee, S. Kim, E. Whang, and J. Lee

In Proc. 11th Int'l Symposium on Database Systems for Advanced Applications (DASFAA), pp. 437-453, Hilton Hotel, Singapore, Apr. 2006.

Others (invited papers, thesis)

Managing Google's data lake: an overview of the Goods system

A. Halevy, F. Korn, N. Noy, C. Olston, N. Polyzotis, S. Roy, S. E. Whang

IEEE Data Engineering Bulletin, vol. 39, no. 3, pp. 5-14, Sept. 2016.

Data Analytics: Integration and Privacy

S. E. Whang

Ph.D. Thesis, June 2012.

Developments in Generic Entity Resolution

S. E. Whang and H. Garcia-Molina

IEEE Data Engineering Bulletin, vol. 34, no. 3, pp. 51-59, Sept. 2011.