I joined Google in May 2019. I work on both research and applied aspects of Deep Learning, currently focused more on Language and Vision Models.

My current work is focused on Multi-Modal Vision+Language Models and Representation Learning. This line of research can potentially be applied to improve Search Ranking by bringing Query, Document, and Image Understanding together. I am also interested in recent research advances in Graph Representation Learning, where the graph structure of the data is used as extra information when training deep models.

I have also served as an area chair for the Women in Machine Learning (WiML) workshop at NeurIPS 2019.

Before Google, I was an Applied Research Scientist on the Data Science and Machine Learning team at Expedia, where I worked on designing scalable optimization algorithms and machine learning models for online auctions and search ranking.

Short Bio:

I did my Ph.D. in the School of Computing at the University of Utah, where I was a graduate member of The Data Group. I worked under the supervision of Suresh Venkatasubramanian. In addition, I was fortunate to collaborate with Jeff M. Phillips, Justin Thaler, and Aditya Bhaskara.

I was a visiting research scholar at the Simons Institute for the Theory of Computing at UC Berkeley during 2013-2014, for the research program on Theoretical Foundations of Big Data Analysis. I was also a research intern on the Search and Content Science team at Yahoo Research, working with Kostas Tsioutsiouliklis and Stergios Stergiou, during summer 2016.

I did my Bachelor's in Computer Science at Sharif University of Technology in Iran, where Dr. Ebadollah S. Mahmoodian was my advisor. For my B.S. thesis, I worked mainly on Algorithms on Massive Graphs and Complex Networks.

Research Sketch and Projects (Ph.D.)

I successfully defended my Ph.D. dissertation, titled "Sublinear Algorithms for Massive Data Computations: Streaming and Coresets", in May 2017.

My Ph.D. research focused on designing algorithms for problems motivated by massive data analysis, balancing resources like time, space, and communication against accuracy. I also studied algorithmic design techniques in modern data models and explored how randomization and approximation can be used as powerful resources for developing beautiful, simple, and efficient algorithms with provable theoretical and performance guarantees.

In particular, I focused on research problems in the general area of sublinear algorithms, such as streaming, coresets, and sublinear verification, with a special interest in problems arising from data analysis, including data summarization, clustering, matrix problems, and massive graphs.

Sublinear Algorithms for Massive Data Computations

In recent years, there has been increasing interest in the design and analysis of algorithms for the massive data sets that now arise in many applications. Managing and analyzing such data sets requires reconsidering the traditional notion of an efficient algorithm: classic algorithmic models do not accurately capture the costs of modern large-scale applications, and even linear-cost algorithms can be too slow and expensive. Hence, there is a need for algorithms whose complexity is not merely polynomial but in fact sublinear in the input size n; these are called sublinear algorithms.

Constructing coresets is one technique for designing sublinear algorithms. A coreset summarizes a large data set with a proxy set, of potentially much smaller size, that guarantees bounded error for certain classes of queries. Here the assumption is usually that we have offline access to the data; however, due to the increasing size of data, this assumption may not hold. This motivates the study of streaming algorithms, where the data arrives in a streaming fashion and the goal is to extract only a small amount of information about the input (a "sketch") for computing an approximate answer to the query. However, many problems (especially on graph data) are known to be intractable in the streaming setting, requiring a prohibitive amount of space or number of passes over the data. This motivates a relaxation of the standard streaming model, called streaming interactive proofs, in which a powerful third party (a prover) assists with the computation to reduce the required memory, while correctness is still verified using a sublinear amount of communication.
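To make the "small summary, bounded error" idea concrete, here is a minimal Python sketch (purely illustrative, not taken from the dissertation) of the classic Misra-Gries frequent-items summary: one pass over the stream with only k counters, where each reported count undercounts the true frequency by at most n/(k+1) for a stream of length n.

```python
def misra_gries(stream, k):
    """Misra-Gries summary: one-pass frequency estimates using only
    k counters. Each estimate undercounts the true frequency by at
    most n/(k+1), where n is the stream length."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Decrement every counter; drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Any item occurring more than n/(k+1) times is guaranteed to survive.
stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2
print(misra_gries(stream, k=4))
```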

This research project consists of three main research components, as follows:

[1] Coresets and Summarization for High-Dimensional Probabilistic Data

One of the most popular approaches to processing massive data is to first extract a compact representation (or synopsis) of the data and then perform all further processing on the representation itself, obtaining a good approximation to the original input. This approach significantly reduces the cost of processing, communicating, and storing the data. Examples include techniques such as sampling, sketching, and coresets. Coresets offer a new approach to optimization in general and have had great success, especially in tasks that would otherwise require prohibitively large computation time or space. Furthermore, coresets yield algorithms for models such as distributed computing and streaming, where the space used by the algorithm must be significantly smaller than the input size: simply compute and store a coreset (typically much smaller than the input), and then run a more expensive algorithm on the coreset rather than on the original input.
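As one concrete instance of this pattern (my choice of example, not a construction from this dissertation): the lightweight-coreset sampler of Bachem et al. for k-means draws points with probability mixing a uniform term and each point's squared distance to the data mean, then reweights by inverse probability so that weighted clustering costs approximate the full-data costs.

```python
import numpy as np

def lightweight_coreset(X, m, seed=None):
    """Sample an m-point weighted coreset of X for k-means.

    Sampling probability q(x) mixes a uniform term with x's squared
    distance to the dataset mean; weights 1/(m*q(x)) make weighted
    cost estimates unbiased."""
    rng = np.random.default_rng(seed)
    n = len(X)
    dist_sq = np.sum((X - X.mean(axis=0)) ** 2, axis=1)
    q = 0.5 / n + 0.5 * dist_sq / dist_sq.sum()
    idx = rng.choice(n, size=m, p=q)  # sample with replacement
    weights = 1.0 / (m * q[idx])
    return X[idx], weights

# Example: a 200-point weighted proxy standing in for 100k points.
X = np.random.default_rng(0).normal(size=(100_000, 10))
C, w = lightweight_coreset(X, 200, seed=1)
```

Any k-means solver that accepts sample weights can then be run on (C, w) instead of X, at a fraction of the cost.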

[2] Streaming Algorithms and Sketching for Massive Graphs

Large-scale graphs are now widely used for representing real-world data. Many modern applications, such as search engines and social networks, must support various queries on such graphs efficiently. It is possible to store a massive graph on a large storage device, but random access on these devices is often quite slow, and computation becomes expensive. To overcome the challenges of computing on massive graphs, one important approach is to maintain a succinct representation that preserves certain properties of the graph (i.e., a coreset). Another popular approach is to process large graphs in the data stream model using a limited amount of space. The core task here is to construct a synopsis data structure that is easy to build in a streaming fashion while still yielding a good approximation of the properties of the graph. Many such synopses are constructed using sketching (computing a linear projection of the data) and sampling techniques.
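A toy illustration of the semi-streaming regime (again illustrative, not a result from this work): a single pass over an edge stream with O(n) space suffices to maintain connected components via union-find, without ever storing the edges themselves.

```python
def stream_components(n, edge_stream):
    """One-pass connected components over an edge stream.

    Uses O(n) space (a union-find array over the n vertices) and
    never stores the edges -- the semi-streaming regime."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in edge_stream:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    return len({find(u) for u in range(n)})

# Example: 5 vertices, edges arriving one at a time.
print(stream_components(5, iter([(0, 1), (1, 2), (3, 4)])))  # -> 2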

[3] Streaming Interactive Verification for Massive Data and Graphs

One of the main challenges in streaming computation on massive data is designing algorithms for problems whose computation or space requirements are prohibitive in the streaming model. In this case, we can consider outsourcing the storage and processing of the data stream to a more powerful third party, the cloud; the data owner, however, would like assurance that the desired computation has been performed correctly. In this setting, a resource-limited verifier (the data owner) sees a data stream and tries to solve the problem with the help of a more powerful prover (the cloud) who sees the same stream. The goal is to develop interactive proof protocols, efficient in both communication and space, for verifying computations that are streaming in nature, and to explore how interaction and communication can help attack problems in data analysis, as well as classic and fundamental problems in geometric and combinatorial algorithms and optimization, under streaming inputs.
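The following toy Python sketch conveys the flavor of such protocols (a simplification of my own, not a protocol from this work): the verifier keeps a constant-size polynomial fingerprint of the stream's frequency vector and later checks a full frequency vector claimed by the prover; a false claim passes with probability at most (universe size)/P. A real annotated-streams verifier would also consume the prover's message in streaming fashion rather than holding it in memory.

```python
import random

P = (1 << 61) - 1  # a large (Mersenne) prime modulus

def fingerprint(freqs, r):
    """Evaluate the frequency vector's polynomial at point r, mod P."""
    return sum(f * pow(r, i, P) for i, f in enumerate(freqs)) % P

def verify(stream, claimed_freqs):
    """Verifier: one streaming pass with O(1) fingerprint state, then
    check the prover's claimed frequency vector against it."""
    r = random.randrange(1, P)
    fp = 0
    for item in stream:                # each item i contributes r^i
        fp = (fp + pow(r, item, P)) % P
    return fp == fingerprint(claimed_freqs, r)

# Example: an honest claim passes; a corrupted one fails w.h.p.
stream = [0, 2, 2, 1, 0, 2]
print(verify(stream, [2, 1, 3]))   # True
print(verify(stream, [2, 2, 2]))   # False with high probability
```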

Publications

Sublinear Algorithms, Streaming:

  • The 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

  • Manuscript (Journal Version)

  • The 27th International Symposium on Algorithms and Computation (ISAAC 2016)

  • The 26th International Symposium on Algorithms and Computation (ISAAC 2015)

  • The 29th Annual ACM Symposium on Computational Geometry (SoCG 2013)

  • ACM XRDS Magazine, Spring 2014

Network Algorithms:

Technical Reports (Undergraduate Research)

  • A Survey on Search Strategies in Dynamical Complex Networks.
    B.S. Thesis (in Persian), Spring 2010

  • On the Security of the Reduced Round Salsa20 Stream Cipher from Viewpoints of Differential and Guess-and-Determine Attacks (with Taraneh Eghlidos).
    Technical Report, Research Proceedings, Electronics Research Institute, Sharif University of Technology, Tehran, Iran, 2011

  • An Inside to the Design Methods of the eSTREAM Candidates (with Taraneh Eghlidos).
    Technical Report, Research Proceedings, Electronics Research Institute, Sharif University of Technology, Tehran, Iran, 2010

News