My research interests are in databases and data mining. My thesis is titled "Efficient evaluation of contextual and reverse Pareto-optimality queries". For this work, I design algorithms and build prototypes.
Continuous Object Dissemination
Many real-world applications analyze user profiles to disseminate objects effectively. I formalized the problem as finding Pareto-optimal objects (https://en.wikipedia.org/wiki/Pareto_efficiency) with regard to strict partial orders, each representable as a directed acyclic graph. Technically, given an append-only table of objects with multiple attributes, and users' preferences on individual attributes described as strict partial orders, the goal is to find the users for whom a new object is Pareto-optimal. The algorithms I designed share computation across users with similar preferences. We studied the novel challenge of clustering users whose preferences are described as strict partial orders, and defined similarity functions to measure the similarity between two users or clusters. We simulated users' preferences as partial orders derived from two real datasets: movie (the Netflix dataset joined with data from IMDB) and publication (ACM Digital Library). The solutions outperformed the baseline by orders of magnitude. (Java and MySQL)
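The Pareto-optimality test described above can be illustrated with a minimal Java sketch. It is not the thesis implementation: the class and method names (ParetoSketch, PartialOrder, prefer, prefers, dominates, isParetoOptimal) and the sample values are hypothetical, and it checks one user at a time rather than sharing computation across users. It assumes each preference is stored as "better to worse" edges of a DAG, and that one object dominates another if it is equal or preferred on every attribute and preferred on at least one.

```java
import java.util.*;

class ParetoSketch {
    /** Strict partial order on one attribute, stored as "better -> worse" DAG edges (assumed encoding). */
    static final class PartialOrder {
        private final Map<String, Set<String>> worse = new HashMap<>();

        void prefer(String better, String worseValue) {
            worse.computeIfAbsent(better, k -> new HashSet<>()).add(worseValue);
        }

        /** True if a is strictly preferred to b, i.e. b is reachable from a in the DAG. */
        boolean prefers(String a, String b) {
            Deque<String> stack = new ArrayDeque<>();
            Set<String> seen = new HashSet<>();
            stack.push(a);
            while (!stack.isEmpty()) {
                String cur = stack.pop();
                for (String next : worse.getOrDefault(cur, Collections.emptySet())) {
                    if (next.equals(b)) return true;
                    if (seen.add(next)) stack.push(next);
                }
            }
            return false;
        }
    }

    /** x dominates y if x is equal or preferred on every attribute and preferred on at least one. */
    static boolean dominates(String[] x, String[] y, List<PartialOrder> prefs) {
        boolean strictlyBetter = false;
        for (int i = 0; i < x.length; i++) {
            if (x[i].equals(y[i])) continue;
            if (!prefs.get(i).prefers(x[i], y[i])) return false; // worse or incomparable
            strictlyBetter = true;
        }
        return strictlyBetter;
    }

    /** A new object is Pareto-optimal for a user if no existing object dominates it. */
    static boolean isParetoOptimal(String[] candidate, List<String[]> existing, List<PartialOrder> prefs) {
        for (String[] o : existing) {
            if (dominates(o, candidate, prefs)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        PartialOrder genre = new PartialOrder();   // hypothetical user preference
        genre.prefer("comedy", "drama");
        genre.prefer("drama", "horror");
        PartialOrder actor = new PartialOrder();
        actor.prefer("A", "B");
        List<PartialOrder> prefs = Arrays.asList(genre, actor);

        List<String[]> existing = new ArrayList<>();
        existing.add(new String[] {"drama", "A"});

        System.out.println(isParetoOptimal(new String[] {"horror", "A"}, existing, prefs));
        System.out.println(isParetoOptimal(new String[] {"comedy", "B"}, existing, prefs));
    }
}
```

Running this check independently for every user is the naive approach; the saving described above comes from evaluating it once for a group of users whose partial orders overlap.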
Newsworthy Fact Discovery
We often see interesting data-driven news statements, e.g., "Paul George had 21 points, 11 rebounds and 5 assists to become the first Pacers player with a 20/10/5 (points/rebounds/assists) game against the Bulls since Detlef Schrempf in December 1992" (http://espn.go.com/espn/elias?date=20130205) and "The social world's most viral photo ever generated 3.5 million likes, 170,000 comments and 460,000 shares by Wednesday afternoon" (http://www.cnbc.com/id/49728455). Motivated by such attention-seizing facts, we studied the novel problem of finding prominent situational facts in structured data to assist journalists. Technically, given an append-only database capturing real-world events, the goal is to discover whether a new tuple is a "standout". We treat the results of skyline queries (https://en.wikipedia.org/wiki/Skyline_operator) as "standout tuples". I proved that the problem is NP-hard and proposed algorithms that exploit query optimization. I further borrowed ideas from natural language generation to render each "standout tuple" as a "news statement". The experiments used three datasets: NBA box scores, NBA play-by-play, and UK weather forecasts. The solutions outperformed the baseline by orders of magnitude. The prototype FactWatcher won the Excellent Demonstration Award at VLDB 2014. (Java, MySQL, JSON)
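The "standout" test on an append-only table can be sketched as an incremental skyline check. This is a simplified illustration and not the FactWatcher algorithm: the class SkylineSketch and its methods are hypothetical, all measures are assumed to be numeric with larger values better, and the sketch uses a single fixed set of measures. Because dominance is transitive, comparing a new tuple against the current skyline is enough to decide whether any earlier tuple dominates it.

```java
import java.util.*;

class SkylineSketch {
    // Tuples not dominated by any tuple appended so far.
    private final List<double[]> skyline = new ArrayList<>();

    /** a dominates b if a >= b on every measure and a > b on at least one (larger is better). */
    static boolean dominates(double[] a, double[] b) {
        boolean strict = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] < b[i]) return false;
            if (a[i] > b[i]) strict = true;
        }
        return strict;
    }

    /** Appends a tuple and reports whether it is a "standout", i.e. enters the skyline on arrival. */
    boolean append(double[] t) {
        for (double[] s : skyline) {
            if (dominates(s, t)) return false; // an earlier tuple is at least as good everywhere
        }
        skyline.removeIf(s -> dominates(t, s)); // the new tuple displaces tuples it dominates
        skyline.add(t);
        return true;
    }

    int size() {
        return skyline.size();
    }

    public static void main(String[] args) {
        SkylineSketch games = new SkylineSketch();
        // Measures are (points, rebounds, assists); the values are made up for illustration.
        System.out.println(games.append(new double[] {21, 11, 5}));
        System.out.println(games.append(new double[] {20, 10, 5}));
        System.out.println(games.append(new double[] {30, 2, 1}));
        System.out.println(games.size());
    }
}
```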
Wikipedia Infobox Suggestion
Besides improving the quality of Wikipedia articles, infoboxes are used in Google's Knowledge Graph. I built Support Vector Machine (SVM) and Naive Bayes (NBC) classifiers that help Wikipedia authors by suggesting the most appropriate infobox types. I used three types of features for classification: words in the articles (the set of words in an article's content), categories of the articles (the set of Wikipedia categories assigned to an article), and named entities in the articles (the set of named entities hyperlinked from an article's content). The 2008-07-24 snapshot of English Wikipedia was used for the experimental evaluation. I achieved 92.03% accuracy. (Java, Weka, XML)
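To show how the three feature types can feed one classifier, here is a minimal multinomial Naive Bayes sketch in plain Java with add-one smoothing. It is not the Weka-based system described above: the class InfoboxNaiveBayesSketch, the "word:", "cat:" and "ent:" prefixes used to keep the feature types apart, and the tiny training examples are all assumptions made for illustration.

```java
import java.util.*;

class InfoboxNaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>();
    private final Map<String, Integer> featuresPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Records one labeled article, given as a collection of prefixed features. */
    void train(String label, Collection<String> features) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = featureCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String f : features) {
            counts.merge(f, 1, Integer::sum);
            featuresPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(f);
        }
    }

    /** Returns the label with the highest log-posterior, or null if nothing was trained. */
    String predict(Collection<String> features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double score = Math.log(docsPerLabel.get(label) / (double) totalDocs); // log prior
            Map<String, Integer> counts = featureCounts.get(label);
            int total = featuresPerLabel.getOrDefault(label, 0);
            for (String f : features) {
                int c = counts.getOrDefault(f, 0);
                // Add-one (Laplace) smoothing so unseen features do not zero out a label.
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        InfoboxNaiveBayesSketch nb = new InfoboxNaiveBayesSketch();
        // Illustrative training articles, not taken from the evaluation snapshot.
        nb.train("Infobox person",
                Arrays.asList("word:born", "cat:Living people", "ent:Harvard University"));
        nb.train("Infobox settlement",
                Arrays.asList("word:population", "cat:Cities in Texas", "ent:Texas"));
        System.out.println(nb.predict(Arrays.asList("word:born", "cat:Living people")));
    }
}
```

Prefixing each feature with its type keeps a word and a category with the same spelling from being counted as the same feature.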
Coverage & Connectivity in Mobile Sensor Network
As an undergraduate, I worked under the supervision of Dr. Mahmuda Naznin during my senior year at BUET. We studied the tradeoff between coverage and connectivity in mobile sensor networks. Based on linear programming, I proposed a self-adjusting coverage method that provides the desired connectivity in a heterogeneous mobile sensor network.
Talks
Continuous Monitoring of Pareto Frontiers on Partially Ordered Attributes for Many Users. Conference Presentation. EDBT, Vienna, 2018.
Incremental Discovery of Prominent Situational Facts. Conference Presentation. ICDE, Chicago, 2014.
Contextual Skyline. CSE 4334/5334 Data Mining, The University of Texas at Arlington, Spring 2014, Fall 2014.
Conflicting Goal Constrained Architecture of a Heterogeneous Mobile Sensor Network. Conference Presentation. NSyS, Dhaka, Bangladesh, 2016.
External Reviewer
Journal of Data and Information Quality 2019
IEEE International Conference on Big Data 2015, 2017
International Conference on Very Large Databases 2013, 2017
IEEE International Conference on Data Mining 2012, 2013, 2016
ACM Conference on Information and Knowledge Management 2012, 2013, 2016
Asia Pacific Web Conference 2016
International Conference on Web-Age Information Management 2012, 2016
ACM Special Interest Group on Management of Data 2015
ACM Conference on Knowledge Discovery and Data Mining 2015
International Conference on Extending Database Technology 2014
IEEE International Conference on Data Engineering 2013, 2014
International World Wide Web Conference 2013
Pacific Asia Conference on Knowledge Discovery and Data Mining 2013