Manuscripts:
Hyperedge Estimation using Polylogarithmic Subset Queries (with Arijit Bishnu, Arijit Ghosh, Gopinath Mishra)
Summary: We generalize the algorithmic frameworks developed by Beame et al. (ITCS 2018) for edge estimation and Bhattacharya et al. (ISAAC 2019) for triangle estimation to estimate the number of hyperedges in a hypergraph using only a polylogarithmic number of queries to a much more general query oracle.
Hardness of Approximation of Euclidean k-Median in Approx 2021 (with Dishant Goyal, Ragesh Jaiswal)
Summary: We give the first hardness of approximation result for the Euclidean k-median problem. More precisely, there exists a constant c > 0 such that, assuming the Unique Games Conjecture, no polynomial-time algorithm gives a (1+c)-approximation for the Euclidean k-median problem.
Even the Easiest (?) Graph Coloring Problem is not Easy in Streaming! to appear in ITCS 2021 (with Arijit Bishnu, Gopinath Mishra, Anannya Upasana)
Summary: We consider a variant of a graph coloring problem in the streaming setting where the coloring function is also streamed along with the edges of the graph. We study upper and lower bounds on the space complexity of one-pass streaming algorithms estimating the number of properly colored edges in the graph.
Streaming PTAS for Binary l_0-Low Rank Approximation to appear in FSTTCS 2020 (with Dishant Goyal, Ragesh Jaiswal, Amit Kumar)
Summary:
Disjointness Through the Lens of VC-dimension: Sparsity and Beyond in RANDOM 2020 (with Sourav Chakraborty, Arijit Ghosh, Gopinath Mishra, and Manaswi Paraashar)
Summary: We study the communication complexity of the set disjointness problem on set systems of bounded VC dimension, and provide upper and lower bounds on the deterministic and randomized communication complexities of the set disjointness problem in terms of the VC dimension of the underlying set system.
Noisy, Greedy and Not So Greedy k-means++ in ESA 2020 (with Jan Eube, Heiko Röglin, Melanie Schmidt)
Summary: Greedy k-means++, proposed by Arthur and Vassilvitskii, works as follows: in each of the k rounds, a number of candidate points are sampled using D^2-sampling, and the center for that round is chosen as the candidate that reduces the k-means objective function the most. No theoretical guarantees were previously known for greedy k-means++. We give a number of lower bounds showing that, both in the worst case and in expectation, greedy k-means++ is at least as bad as k-means++. We also propose a variant of k-means++, called noisy k-means++, in which an adversary can alter the sampling probability of a point by a (1 ± ε) factor, and we analyze the theoretical guarantees of noisy k-means++.
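The seeding procedure described above can be sketched in a few lines. This is a minimal one-dimensional illustration, not the implementation analyzed in the paper; the function names and the candidate-count parameter t are chosen here for exposition (with t = 1 the procedure reduces to plain k-means++ seeding).

```python
import random

def d2_sample(points, centers):
    """Sample one point with probability proportional to its squared
    distance to the nearest already-chosen center (D^2-sampling)."""
    weights = [min((p - c) ** 2 for c in centers) for p in points]
    return random.choices(points, weights=weights, k=1)[0]

def cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)

def greedy_kmeanspp(points, k, t):
    """Greedy k-means++ seeding: in each round, draw t candidates by
    D^2-sampling and keep the one that lowers the objective the most."""
    centers = [random.choice(points)]
    for _ in range(k - 1):
        candidates = [d2_sample(points, centers) for _ in range(t)]
        best = min(candidates, key=lambda c: cost(points, centers + [c]))
        centers.append(best)
    return centers
```

Note that a point already chosen as a center has weight zero under D^2-sampling, so the returned centers are distinct.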
Triangle Estimation using Polylogarithmic Queries in ISAAC 2019 (with Arijit Bishnu, Arijit Ghosh and Gopinath Mishra)
Summary: Estimating the number of triangles in a graph is a well-studied problem in theoretical computer science. In the query model of computation, tight bounds are known for the triangle estimation problem when the allowed queries are degree, edge existence and neighbor queries. Recently, Beame et al. (ITCS 2018) gave an algorithmic framework for counting the number of edges in a graph using bipartite independent set queries. In this work, we extend their framework to count the number of triangles in a graph using a polylogarithmic number of tripartite independent set queries.
Approximate Clustering with Same-Cluster Queries in ITCS 2018 (with Nir Ailon, Ragesh Jaiswal and Amit Kumar)
Summary: Recent works have explored semi-supervised learning frameworks that incorporate user feedback to design polynomial-time algorithms for NP-hard optimization problems. Ashtiani et al. designed one such semi-supervised active clustering (SSAC) framework with access to a same-cluster query oracle, which answers queries asking whether two objects belong to the same optimal cluster. They gave a polynomial-time algorithm for the k-means problem in this framework under some assumptions on the dataset. We give a D^2-sampling based (1 + ε)-approximation algorithm for k-means clustering in the SSAC framework without any assumption on the dataset, and study upper and lower bounds on the query complexity of (1 + ε)-approximation for the k-means problem in this framework. Our lower bound is conditioned on the Exponential Time Hypothesis (ETH). Our algorithms also generalize to the case where the query oracle is faulty, that is, it answers queries incorrectly with some probability.
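To make the oracle model concrete, here is a toy illustration (not the paper's D^2-sampling algorithm): given a same-cluster oracle, one can recover the clustering exactly by keeping one representative per discovered cluster and querying each new point against the representatives, at most k queries per point. The function and variable names are my own for this sketch.

```python
def cluster_with_queries(points, same_cluster):
    """Partition `points` using a same-cluster oracle: keep one
    representative per discovered cluster and query each new point
    against the representatives until a match is found."""
    reps, clusters = [], []
    for p in points:
        for i, r in enumerate(reps):
            if same_cluster(p, r):   # one oracle query
                clusters[i].append(p)
                break
        else:
            # no representative matched: p starts a new cluster
            reps.append(p)
            clusters.append([p])
    return clusters
```

In a simulation, the oracle can be implemented from a ground-truth labeling, e.g. `lambda a, b: labels[a] == labels[b]`; the point of the query-complexity results is that far fewer queries than this naive scheme suffice for (1 + ε)-approximation.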
Approximate Correlation Clustering Using Same-Cluster Queries in LATIN 2018 (with Nir Ailon and Ragesh Jaiswal)
Summary: Correlation clustering is a graph clustering problem where pairwise similarity (or dissimilarity) information is given for every pair of vertices, and the objective is to partition the vertices into clusters that minimize disagreement (or maximize agreement) with the pairwise information given as input. We study the correlation clustering problem in the semi-supervised learning framework of Ashtiani et al. We design polynomial-time approximation schemes (PTAS) for the maximization and minimization versions of the problem, denoted MaxAgree[k] and MinDisAgree[k] respectively, where k denotes the number of optimal clusters. We also give non-trivial upper and lower bounds on the number of same-cluster queries, the lower bound being based on the Exponential Time Hypothesis (ETH).
Summary: In order to better understand the k-means cost function, we study the relationship between the number of centers k and the optimal k-means cost for any dataset X ⊆ R^d. It is immediate that the optimal k-means cost decreases as k increases. For any ε > 0, we define l(k,ε,X) to be the smallest number such that the optimal cost with respect to l(k,ε,X) centers is at most ε times the optimal k-means cost on the dataset X. We give upper and lower bounds on l(k,ε,X); our upper bound is currently the best known for this problem, and our lower bound is the first. This result also gives a better understanding of the pseudo-approximation behaviour of the D^2-sampling based k-means++ seeding algorithm.
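The quantity l(k,ε,X) can be illustrated by brute force on a toy one-dimensional dataset, where optimal clusters are contiguous after sorting and exact k-means costs can be computed by dynamic programming. This is only a sketch of the definition, unrelated to the bounds proved in the paper; all names here are my own.

```python
def seg_cost(prefix, prefix2, i, j):
    """Cost of placing points[i:j] in one cluster centered at its mean."""
    n = j - i
    s = prefix[j] - prefix[i]
    s2 = prefix2[j] - prefix2[i]
    return s2 - s * s / n

def opt_kmeans_1d(points, k):
    """Exact optimal k-means cost for 1-D data: sort, then DP over
    contiguous segments (dp[c][j] = best cost of first j points with c clusters)."""
    pts = sorted(points)
    n = len(pts)
    prefix = [0.0] * (n + 1)
    prefix2 = [0.0] * (n + 1)
    for i, p in enumerate(pts):
        prefix[i + 1] = prefix[i] + p
        prefix2[i + 1] = prefix2[i] + p * p
    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(c - 1, j):
                dp[c][j] = min(dp[c][j],
                               dp[c - 1][i] + seg_cost(prefix, prefix2, i, j))
    return dp[k][n]

def ell(k, eps, points):
    """Smallest number of centers whose optimal cost is at most
    eps times the optimal k-means cost (the quantity l(k, eps, X))."""
    target = eps * opt_kmeans_1d(points, k)
    for m in range(1, len(points) + 1):
        if opt_kmeans_1d(points, m) <= target:
            return m
```

For instance, on the dataset {0, 1, 10, 11, 20, 21} with k = 2, three centers already bring the cost below 5% of the optimal 2-means cost.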
Faster Algorithms for the Constrained k-means Problem in STACS 2016 (with Ragesh Jaiswal and Amit Kumar). Full version appeared in Theory of Computing Systems (STACS special issue). Slides (PDF)
Summary: Ding and Xu introduced a unifying framework for studying constrained versions of the k-means problem, in which, in addition to optimizing the k-means cost function, any feasible clustering must satisfy some additional constraints. One example is the r-gather clustering problem, where the additional constraint is that every cluster must have size at least r. Ding and Xu gave a (1 + ε)-approximation algorithm for the constrained k-means problem. We design a much faster D^2-sampling based (1 + ε)-approximation algorithm for this problem. We also give some justification as to why the currently known (1 + ε)-approximation algorithms have running time exponential in k and 1/ε.
Sampling in Space Restricted Settings in COCOON 2015 (with Davis Issac, Ragesh Jaiswal and Amit Kumar). Full version appeared in Algorithmica (COCOON special issue). Slides (PPTX).
Summary: We design uniform and non-uniform sampling algorithms in the streaming setting that optimize both the number of random bits and the space used. The number of random bits used by our streaming sampling algorithm matches, within constant factors, the lower bound on the number of random bits used by any uniform sampling algorithm in the offline setting.
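As a point of comparison for uniform sampling in one streaming pass, here is the classical reservoir sampling baseline (Vitter's Algorithm R) — not the bit-optimized algorithm of the paper. It keeps a uniform sample of size s from a stream of unknown length using O(s) space.

```python
import random

def reservoir_sample(stream, s):
    """Return s items chosen uniformly at random (without replacement)
    from a stream of unknown length, in one pass and O(s) space."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < s:
            reservoir.append(item)
        else:
            # item i survives with probability s / (i + 1)
            j = random.randrange(i + 1)
            if j < s:
                reservoir[j] = item
    return reservoir
```

The baseline spends fresh random bits on every stream item; reducing that randomness is precisely what the algorithms in the paper optimize.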
A Tight Lower Bound Instance for k-means++ in Constant Dimension in TAMC 2014 (with Ragesh Jaiswal and Nir Ailon). Full version appeared in Theoretical Computer Science
Summary: Arthur and Vassilvitskii showed that k-means++ seeding yields an O(log k)-approximation in expectation for k-means clustering, and gave instances on which it yields an Ω(log k)-approximation. Brunsch and Röglin showed that k-means++ seeding yields an Ω(log k)-approximation with probability close to 1, and conjectured that it yields an O(log d)-approximation in expectation on d-dimensional instances. The conjecture, if true, would guarantee that k-means++ seeding gives very good solutions for the k-means problem, particularly on low-dimensional instances. We disprove this conjecture by giving an explicit construction of families of two-dimensional instances on which k-means++ seeding yields an Ω(log k)-approximation in expectation.
Pre-PhD work:
SIMD-based Implementations of Eta Pairing Over Finite Fields of Small Characteristics in SECRYPT 2012 (with Abhijit Das, Dipanwita Roy Chowdhury, Bhargav Bellur and Aravind Iyer)
GPU-Based Implementation of 128-Bit Secure Eta Pairing over a Binary Field in AFRICACRYPT 2013 (with Utsab Bose and Abhijit Das)
Use of SIMD features to speed up eta pairing in E-Business and Telecommunications, Communications in Computer and Information Science, DOI: 10.1007/978-3-662-44791-8_9, Volume 455, pp 137-154, Springer-Verlag, 2014 (with Sabyasachi Karati, Abhijit Das, Dipanwita Roychowdhury, Bhargav Bellur and Aravind Iyer)