Projects

GDPR SideCar

Work in progress. To be updated soon.

Data-CASE: Grounding Data Regulations for Compliant Data Processing Systems

The GDPR, a complex piece of legislation, was adopted by the European Union and came into effect in 2018 to regulate the use, and prevent the misuse, of personal data. In its ninety-nine articles, the GDPR grants rights to data subjects (to whom the data belongs), specifies the responsibilities of data controllers and processors, and outlines penalties for non-compliance. To cut through the complexity and vagueness of the legal formulation and give developers and researchers an adequate foundation for building GDPR-compliant infrastructures, this project presents the Ten Commandments of GDPR: a reformulation of the relevant GDPR policies in a new formal model that is rich enough to accommodate the policies of the GDPR while leaving room for flexibility in system design. Besides providing the means to demonstrate provable compliance with the GDPR, our model offers a unique perspective by treating compliance as a matter of degree rather than a binary state. We conclude with the specification of GDPR Sidecar, a possible way to attain GDPR compliance using our formal model.
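The idea of graded rather than binary compliance can be illustrated with a minimal sketch. The policy names below are invented for this example; they are not the actual Ten Commandments from the model.

```python
# Hypothetical illustration of graded compliance: score a system by the
# fraction of policies it satisfies, rather than a single yes/no verdict.
# Policy names here are made up for the sketch.
policies = {
    "lawful_basis_recorded": True,
    "erasure_on_request_supported": True,
    "breach_notified_within_72_hours": False,
}

# Degree of compliance: fraction of satisfied policies, in [0.0, 1.0].
degree = sum(policies.values()) / len(policies)
print(f"compliance degree: {degree:.2f}")
```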

Slides Presented at EDBT '24

Croesus: Multi-Stage Processing and Transactions for Video-Analytics in Edge-Cloud Systems

Emerging edge applications require both fast response latency and complex processing. This is infeasible without expensive hardware that can process complex operations, such as object detection, within a short time. Many approaches tackle this problem by reducing the complexity of the models (via model compression, pruning, and quantization) or by compressing the input. In this paper, we propose a different perspective on these performance challenges. Croesus is a multi-stage approach to edge-cloud systems that provides the ability to balance accuracy and performance.

Croesus consists of two stages: an initial stage and a final stage. The initial stage performs the computation in real time using approximate, best-effort computation at the edge. The final stage performs the full computation at the cloud and uses the results to correct any errors made in the initial stage. In this paper, we demonstrate the implications of such an approach on a video analytics use case and show how multi-stage processing yields a better balance between accuracy and performance. Moreover, we study the safety of multi-stage transactions via two proposals: multi-stage serializability (MS-SR) and multi-stage invariant confluence with apologies (MS-IA).
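The two-stage flow can be sketched as follows. This is an illustrative outline under assumed names (the functions and the toy detection results are not from the paper's implementation): the edge responds immediately with a cheap tentative answer, and the cloud later corrects it.

```python
# Sketch of Croesus-style two-stage processing (illustrative, not the
# paper's actual code).

def edge_initial_stage(frame):
    """Initial stage: fast, approximate/best-effort detection at the edge,
    standing in for a small or compressed model."""
    return {"objects": ["car"], "confidence": 0.6}

def cloud_final_stage(frame, tentative):
    """Final stage: full-accuracy detection at the cloud; its result is
    used to correct errors made by the initial stage."""
    full = {"objects": ["car", "pedestrian"], "confidence": 0.95}
    corrections = [o for o in full["objects"] if o not in tentative["objects"]]
    return full, corrections

frame = "frame-0001"
tentative = edge_initial_stage(frame)                     # answer in real time
final, corrections = cloud_final_stage(frame, tentative)  # correct afterwards
print("tentative:", tentative["objects"], "missed:", corrections)
```

The split is what lets the system trade accuracy for latency: users see the tentative result immediately, while correctness is restored asynchronously.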

Publications:

Full version on arXiv.

Databases Meet Computational Social Choice

http://dbcomsoc.org/

The Possible Winner (PW) problem, a fundamental algorithmic problem in computational social choice, concerns elections where voters express only partial preferences between candidates. Via a sequence of investigations, a complete classification of the complexity of the PW problem was established for all pure positional scoring rules: the PW problem is in P for the plurality and veto rules, and NP-complete for all other such rules. More recently, the PW problem was studied on classes of restricted partial orders that arise in natural settings, such as partitioned partial orders and truncated partial orders; in particular, it was shown that there are rules for which the PW problem drops from NP-complete to P on such restricted partial orders. Here, we investigate the PW problem on partial chains, i.e., partial orders that are a total order on a subset of their domains. Such orders arise naturally in a variety of settings, including rankings of movies or restaurants. We classify the complexity of the PW problem on partial chains by establishing that, perhaps surprisingly, this restriction does not change the complexity of the problem, namely, the PW problem is NP-complete for all pure positional scoring rules other than the plurality and veto rules. As a byproduct, we obtain a new and more principled proof of the complexity of the PW problem on arbitrary partial orders.
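To make the PW problem concrete, here is a small brute-force checker for positional scoring rules. It enumerates all linear extensions of each voter's partial order, so it runs in exponential time and is for intuition only; all names and the example votes are illustrative.

```python
# Brute-force Possible Winner check for positional scoring rules.
# Exponential in the number of candidates/voters; illustrative only.
from itertools import permutations, product

def extensions(partial, candidates):
    """All linear extensions of a partial order given as (a, b) pairs,
    meaning a is preferred to b."""
    return [p for p in permutations(candidates)
            if all(p.index(a) < p.index(b) for a, b in partial)]

def possible_winner(candidate, votes, candidates, scores):
    """Is there a completion of every vote under which `candidate`
    is a (co-)winner for the scoring rule `scores` (points by position)?"""
    for completion in product(*(extensions(v, candidates) for v in votes)):
        total = {c: 0 for c in candidates}
        for ranking in completion:
            for pos, c in enumerate(ranking):
                total[c] += scores[pos]
        if all(total[candidate] >= total[c] for c in candidates):
            return True
    return False

cands = ["a", "b", "c"]
votes = [[("a", "b")],                # partial chain: only a > b is known
         [("c", "a"), ("a", "b")]]   # complete chain: c > a > b
print(possible_winner("a", votes, cands, [1, 0, 0]))  # plurality
```

Here the first vote is a partial chain in the sense above: it totally orders the subset {a, b} and says nothing about c.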

Publications:

The Complexity of Possible Winners on Partial Chains

Algorithmic Techniques for Necessary and Possible Winners


Definability and Learning in Sparse Structures

Learning over sparse background structures, of at most poly-logarithmic degree, has recently been shown to be possible in poly-logarithmic time for concepts definable in First-Order Logic (FOL). The primary aim of this proposal is to investigate whether the claim holds for counting extensions of FOL and for Modal Logic. I will also explore the learnability of FOL-definable concepts over the field of the Real Numbers as a background structure. Extensions of FOL with basic counting and aggregation are expressible in infinitary counting logics. Such logics underpin the query languages of relational database management systems, such as SQL. This research thus bridges database management and machine learning.


There are a few general steps in the type of investigation we want to carry out. They are as follows.

1. Choose a background structure B based on the type of data available. Certain data can be best expressed as graphs. In such cases, we can choose B to be finitely labelled graphs. Some data may motivate the use of trees as a background structure.

2. Specify a model, generally parametric, using some formula of a logic, for example, first-order logic or second-order logic. The definability of a concept or hypothesis depends on the expressive power of the logic being used.

Stated thus, the aforementioned steps motivate two possible lines of inquiry. First, how does the choice of background structure influence learnability? Second, are some logics better than others for learning in this context? My project sought to answer the first question.
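The two steps above can be sketched on a toy instance. Here the background structure is a finitely labelled graph, the parametric concept is the first-order formula φ(x; p) = "x has a neighbour labelled p", and learning means finding a parameter p consistent with labelled examples. The graph, labels, and function names are invented for this illustration.

```python
# Toy sketch of learning a parametric FO-definable concept over a
# labelled graph background structure (all data here is made up).
graph = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3]}
label = {1: "red", 2: "blue", 3: "green", 4: "blue"}

def phi(x, p):
    """FO concept phi(x; p): node x has a neighbour labelled p."""
    return any(label[y] == p for y in graph[x])

def learn(positive, negative, parameters):
    """Enumerate candidate parameters; return one consistent with the
    positive and negative examples, or None."""
    for p in parameters:
        if all(phi(x, p) for x in positive) and not any(phi(x, p) for x in negative):
            return p
    return None

print(learn(positive=[1, 3], negative=[2], parameters=set(label.values())))
```

The sparsity of the background structure matters because evaluating φ touches only a node's neighbourhood, which is small when the degree is bounded poly-logarithmically.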


Project Proposal

Time and Space Complexity in Chemical Reaction Networks

Chemical reaction networks (CRNs) formally model chemistry in a well-mixed solution. CRNs are widely used to describe information processing occurring in natural cellular regulatory networks, and with upcoming advances in synthetic biology, CRNs are a promising programming language for the design of artificial molecular control circuitry. Due to a formal equivalence between CRNs and a model of distributed computing known as population protocols, results transfer readily between the two models.

Traditionally, CRNs have been used as a descriptive language to analyze naturally occurring chemical reactions. Recent work has instead viewed CRNs as a programming language for engineering artificial systems, and has shown CRNs to have eclectic algorithmic abilities. Researchers have investigated the power of CRNs to simulate Boolean circuits, neural networks, and digital signal processing. It has been shown that bounded-space Turing machines can be simulated with an arbitrarily small, non-zero probability of error by a CRN with only a polynomial slowdown. Space- and energy-efficient simulation of space-bounded Turing machines can be done by CRNs implementable by logically and thermodynamically reversible DNA strand displacement reactions; as a consequence, it is PSPACE-hard to predict whether a particular species is producible [22]. Even Turing-universal computation is possible with an arbitrarily small probability of error over all time. The computational power of CRNs also provides insight into why they can be computationally difficult to simulate and why certain questions about them are frustratingly difficult (even undecidable) to answer. The programming approach to CRNs has also, in turn, yielded novel insights into natural cellular regulatory networks.
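The CRN-population protocol correspondence can be seen in a small simulation. Below is a sketch of the standard three-state approximate-majority protocol, scheduled population-protocol style by drawing random ordered pairs of molecules from a well-mixed solution; in each rule the second molecule (the responder) changes state. The rule set is the textbook example, but the code itself is an illustration, not from any specific paper.

```python
# Population-protocol-style simulation of the three-state
# approximate-majority CRN:
#   A + B -> A + U,   B + A -> B + U,   A + U -> A + A,   B + U -> B + B
# (first reactant = initiator, second = responder; responder changes).
import random

random.seed(1)

def simulate(counts, steps=100_000):
    for _ in range(steps):
        # Well-mixed solution: draw an ordered pair of distinct molecules.
        pool = [s for s in ("A", "B", "U") for _ in range(counts[s])]
        i, r = random.sample(pool, 2)
        if (i, r) == ("A", "B"):
            counts["B"] -= 1; counts["U"] += 1
        elif (i, r) == ("B", "A"):
            counts["A"] -= 1; counts["U"] += 1
        elif (i, r) == ("A", "U"):
            counts["U"] -= 1; counts["A"] += 1
        elif (i, r) == ("B", "U"):
            counts["U"] -= 1; counts["B"] += 1
        # Stop once every molecule carries the same opinion.
        if counts["U"] == 0 and (counts["A"] == 0 or counts["B"] == 0):
            break
    return counts

print(simulate({"A": 30, "B": 20, "U": 0}))
```

With high probability the initial majority (here A) takes over the whole population; every rule preserves the total molecule count, which is the conservation property one would check first in such a simulation.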

GreenFLY: A Greener Flight Search Tool

GreenFLY (greenfly.ucdavis.edu) is an airline flight search website that prominently displays greenhouse gas emissions estimates along with the other important flight information, such as price and times, for each possible flight itinerary. We describe its software components and graphic design principles. Then we present a discrete choice experiment in which we asked participants to choose between itineraries presented in the GreenFLY format. Results suggest that consumers are willing to pay a significant amount for lower-emissions flights in the context of online flight search, especially when lower emissions are combined with fewer layovers. 
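The core display idea, surfacing an emissions estimate next to price for each itinerary, can be sketched as below. All field names, routes, and numbers are made-up placeholders, not data or code from the site.

```python
# Illustrative GreenFLY-style result ranking (all values are placeholders).
itineraries = [
    {"route": "SFO-JFK direct",  "price": 420, "kg_co2": 290, "layovers": 0},
    {"route": "SFO-DEN-JFK",     "price": 350, "kg_co2": 360, "layovers": 1},
    {"route": "SFO-LAX-ORD-JFK", "price": 300, "kg_co2": 450, "layovers": 2},
]

# Show emissions alongside price: rank by estimated emissions, then price.
ranked = sorted(itineraries, key=lambda it: (it["kg_co2"], it["price"]))
for it in ranked:
    print(f"{it['route']:18s} ${it['price']}  {it['kg_co2']} kg CO2e  "
          f"{it['layovers']} layover(s)")
```

Note how the lowest-emissions option here is also the direct flight, matching the finding that willingness to pay for greener itineraries is strongest when lower emissions coincide with fewer layovers.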

Publication:

GreenFly

Water-aware Recipes

Turing Machines and The Human Mind

In 1967, Kleene formulated the Church-Turing Thesis (CTT), one of the most fundamental theses concerning computation. More often than not, the CTT is misinterpreted as making a statement about the powers of the human mind, human intelligence, or the modern computer. These "variants" are not, as many claim, equivalent to the original thesis. In fact, most of them make claims stronger than the actual one and rest on overt assumptions that render them uninteresting. We reject such claims and argue that neither Turing nor Church intended their theses to be characterised as such. In 1980, Robin Gandy put forward his Thesis M and established that any function computable by discrete dynamical systems is also computable by Turing Machines. Some of the flawed readings are closer to Thesis M than they are to the CTT.

Project Report