Date and Venue: 30 Aug-1 Sep 2024, at Seminar Room 12, Victor Menezes Convention Center (VMCC), IIT Bombay
Speaker information and abstracts
Speaker: Dinesh Patra, Electrical Engineering, Indian Institute of Technology, Kharagpur
Abstract: We consider a target attack defense scenario where the attacker aims to reach the target while avoiding the defender, and the defender wants to protect the target by intercepting the attacker. While most prior works assume the target to be known to the defender, in our setting the defender has access only to the current state of the attacker at any instant and is not aware of the target coordinates. A novel approach is proposed to leverage past data of the attacker states to construct a future trajectory of the attacker. The defender deploys a model predictive control scheme to minimize the discrepancy between its own future trajectory and the predicted future trajectory of the attacker. Simulation results show that using estimated future trajectories protects the target more effectively than relying only on the current state of the attacker. The effectiveness of the proposed approach is also highlighted in the presence of obstacles.
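As a rough illustration of the prediction-then-pursuit idea (a toy sketch, not the speaker's implementation — all names are hypothetical, and a full MPC would optimize over a horizon rather than greedily chase one waypoint):

```python
def predict_attacker(history, horizon):
    """Extrapolate future attacker positions from past data, here via a
    simple constant-velocity model fitted to the last two observed states."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # velocity estimated from past states
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]

def defender_step(defender, predicted, vmax=1.0):
    """Move the defender toward the first predicted attacker position,
    subject to a maximum speed (a one-step stand-in for the MPC scheme)."""
    px, py = predicted[0]
    dx, dy = px - defender[0], py - defender[1]
    dist = (dx * dx + dy * dy) ** 0.5
    if dist <= vmax:
        return (px, py)
    return (defender[0] + vmax * dx / dist, defender[1] + vmax * dy / dist)
```

The point of the sketch is the information structure: the defender never sees the target, only past attacker states, and acts against the predicted trajectory.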
Speaker: Eshwar S R, Computer Science and Automation, Indian Institute of Science, Bengaluru
Abstract: Reinforcement learning has traditionally been studied with exponential discounting or vanilla averaging of sequential rewards, mainly due to their mathematical tractability. However, these ideas fall short of accurately capturing human behavior, which often has a bias towards instant gratification. Quasi-Hyperbolic (QH) discounting is a simple alternative for modeling this bias. Unlike in traditional discounting, though, the optimal QH-policy, starting from some time $t_1,$ can be different from the one starting from $t_2.$ Hence, the future self of an agent, if it is naive or impatient, can deviate from the policy that is optimal at the start, leading to sub-optimal overall returns. To prevent this behavior, an alternative is to work with a policy anchored in a Markov Perfect Equilibrium (MPE). In this work, we propose the first model-free algorithm for finding an MPE. Using a two-timescale analysis, we formally prove that our algorithm converges to invariant sets of a suitable Differential Inclusion (DI). Concurrently, we also show that any MPE would be an invariant set of our identified DI. Moreover, we show that, if our algorithm converges to a point, it must be an MPE. Finally, we discuss the utility of our algorithm for the standard inventory control problem with stochastic demands.
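For concreteness, QH discounting keeps the immediate reward undiscounted and scales every later reward by an extra present-bias factor beta; a minimal sketch of the resulting return (hypothetical helper name, not code from the talk):

```python
def qh_return(rewards, beta, gamma):
    """Quasi-hyperbolic discounted return: weight 1 on the immediate reward
    and beta * gamma**k on the reward k steps ahead; beta < 1 models the
    bias towards instant gratification, beta = 1 recovers exponential
    discounting."""
    if not rewards:
        return 0.0
    total = float(rewards[0])
    for k in range(1, len(rewards)):
        total += beta * (gamma ** k) * rewards[k]
    return total
```

Because the beta factor applies only to rewards strictly after "now", re-evaluating the same reward stream at a later start time changes the weights, which is exactly why the optimal policy can differ between $t_1$ and $t_2$.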
Speaker: Lakshmi Mandal, Computer Science and Automation, Indian Institute of Science, Bengaluru
Abstract: Reinforcement Learning (RL) algorithms combined with deep learning architectures have achieved tremendous success in many practical applications. However, the policies obtained by many Deep Reinforcement Learning (DRL) algorithms are seen to suffer from high variance, making them less useful in safety-critical applications. In general, it is desirable to have algorithms that give a low iterate-variance while providing a high long-term reward. In this work, we consider the Actor-Critic paradigm, where the critic is responsible for evaluating the policy while the feedback from the critic is used by the actor for updating the policy. The updates of both the critic and the actor in the standard Actor-Critic procedure are run concurrently until convergence. It has been previously observed that updating the actor once after every L > 1 steps of the critic reduces the iterate variance. In this paper, we address the question of which L-value is optimal to use in the recursions and propose a data-driven L-update rule that runs concurrently with the actor-critic algorithm, with the objective being to minimize the variance of the infinite-horizon discounted reward. This update is based on a random search (discrete) parameter optimization procedure that incorporates smoothed functional (SF) estimates. We prove the convergence of our proposed multi-timescale scheme to the optimal L and optimal policy pair. Subsequently, through numerical evaluations on benchmark RL tasks, we demonstrate the advantages of our proposed algorithm over multiple state-of-the-art algorithms in the literature.
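The loop structure behind the L-step scheme can be sketched generically as below (hypothetical driver; in the talk's algorithm L itself is tuned online by the SF-based rule, whereas here it is fixed for illustration):

```python
def run_actor_critic(critic_update, actor_update, L, total_steps):
    """Driver for the multi-step actor-critic scheme: the critic is updated
    at every step, while the actor is updated once after every L critic
    updates. critic_update and actor_update are caller-supplied callbacks."""
    for t in range(1, total_steps + 1):
        critic_update(t)        # policy evaluation step
        if t % L == 0:
            actor_update(t)     # policy improvement step, every L-th step
```

Letting the critic settle for L steps between actor moves is what reduces the iterate variance; the open question the talk addresses is how to pick L from data.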
Speaker: Nibedita Roy, Computer Science and Automation, Indian Institute of Science, Bengaluru
Abstract: Robustness to adversarial attacks in online distributed learning within a single parameter server and multiple worker node framework is a critical research area. Traditional methods for this setup achieve resilience through a two-step process in each iteration. First, all worker nodes synchronize to compute and communicate identical quantities, such as gradients at a shared point, to the server. Then, the server uses a robust aggregate of these quantities---obtained via techniques like the median or trimmed mean---to update its solution estimate. However, this approach falls short in applications like network tomography, where the measurements across different nodes are sporadic and heterogeneous. A novel two-timescale algorithm was recently proposed to deal with such scenarios. In this study, we establish that its convergence rate is \(O(1/\sqrt{n})\), which is optimal for non-strongly convex optimization. Separately, due to the sporadic nature of data, it is inevitable that this rate expression degrades when more honest agents are incorporated into the system. For a fixed number of adversaries, our work reveals that this degradation is of order $O(\sqrt{N}),$ where $N$ is the number of worker nodes. Lastly, we demonstrate the applicability of this algorithm and our theoretical results to network tomography.
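The trimmed mean mentioned above, used by the traditional two-step approach, can be sketched as follows (a generic scalar version, not the new two-timescale algorithm):

```python
def trimmed_mean(values, f):
    """Robust aggregation at the server: sort the reported values, discard
    the f smallest and f largest (suspected adversarial extremes), and
    average the remainder. Tolerates up to f arbitrary corruptions."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f reported values")
    s = sorted(values)
    core = s[f:len(s) - f]
    return sum(core) / len(core)
```

Note that this step presumes all workers report comparable quantities at a shared point — precisely the assumption that breaks down under the sporadic, heterogeneous measurements considered in the talk.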
Speaker: Soumyajit Guin, Computer Science and Automation, Indian Institute of Science, Bengaluru
Abstract: We consider, in this work, the risk-sensitive cost criterion with exponentiated costs for Markov decision processes and develop a model-free policy gradient algorithm in this setting. Unlike additive cost criteria such as average or discounted cost, the risk-sensitive cost criterion is less studied due to the complexity resulting from the multiplicative structure of the resulting Bellman equation. We develop an actor-critic algorithm with function approximation in this setting and provide its asymptotic convergence analysis. We also show the results of numerical experiments that demonstrate the superiority in performance of our algorithm over other recent algorithms in the literature.
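The exponentiated cost criterion in question is commonly written as (1/theta) * log E[exp(theta * C)] for a cumulative cost C and risk parameter theta > 0; a minimal Monte Carlo sketch of evaluating it (illustrative only, not the actor-critic algorithm from the talk):

```python
import math

def risk_sensitive_cost(total_costs, theta):
    """Monte Carlo estimate of the risk-sensitive criterion
    (1/theta) * log E[exp(theta * C)] from sampled per-trajectory total
    costs; theta > 0 penalizes variability, and the multiplicative
    exp(...) structure is what complicates the Bellman equation."""
    n = len(total_costs)
    mgf = sum(math.exp(theta * c) for c in total_costs) / n
    return math.log(mgf) / theta
```

For constant costs the criterion reduces to the cost itself, while any spread in the costs pushes it above the plain average — the sense in which it is risk-sensitive.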
Speaker: Ananta Kant Rai, Electrical Communication Engineering, Indian Institute of Science, Bengaluru
Abstract: Data-driven methods are a useful tool to study and analyse uncertain or unknown dynamical systems. These methods provide ways for system identification, controller design and other control objectives. In scenarios where identifying an exact model is difficult, data-driven methods leverage system data for direct computation of system parameters and properties without exact system identification. We study data-driven methods for two control problems – Stability Radius (SR) and Minimum Gain Pole Placement (MGPP). In the SR problem, we obtain Markov parameters from finite input-output data and utilise these to compute an approximate SR. We propose to solve the MGPP problem without system identification, using state and input experimental data of the system.
Speaker: Kushal Chakrabarti, TCS Research
Abstract: Adaptive gradient-descent optimizers are the standard choice for training neural network models. Despite their faster convergence than gradient-descent and remarkable performance in practice, the adaptive optimizers are not as well understood as vanilla gradient-descent. Modeling the adaptive family in a continuous-time closed-loop framework allows us to present intuitive convergence proofs of prominent adaptive optimizers. In discrete-time, our unified approach proves linear convergence of AdaGrad and Adam, for the first time, under the PL inequality.
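For reference, the discrete-time Adam recursion analyzed in this line of work is, in its standard scalar form (a textbook sketch, not the speaker's closed-loop model):

```python
def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One standard (scalar) Adam update: exponential moving averages of the
    gradient and its square are bias-corrected and used to scale the step,
    giving a per-coordinate adaptive step size."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)   # bias correction, first moment
    v_hat = v / (1 - b2 ** t)   # bias correction, second moment
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v
```

The m and v recursions are the feedback states that make the continuous-time closed-loop viewpoint natural.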
Speaker: Anand Singh Chauhan, TCS Research
Abstract: Power grids globally play a crucial societal and economic role by providing uninterrupted, reliable, and transient-free power to industries, businesses, and households. The advent of renewable power resources and electric vehicles has introduced uncertain generation and highly dynamic load demands, making robust operation of power networks essential to manage transient stability issues and localize blackout events. In light of the increasing stress on modern grid infrastructure and operators, this talk will cover HybridAgent, a reinforcement learning (RL) framework designed to mitigate the effects of unexpected network events and reliably maintain electricity across the network. HybridAgent leverages a heuristic-guided RL framework for optimal topology selection, ensuring safe and reliable grid operation without overloads. It has been extensively tested in the Learning to Run a Power Network (L2RPN) 2023 Challenge hosted by Réseau de Transport d'Électricité (RTE) and Delft University of Technology (TU Delft). With its state-of-the-art AI-based framework, HybridAgent outperformed other existing approaches on multiple datasets featuring forced contingencies in the IEEE-118 network. Despite its reduced action space, HybridAgent topped the leaderboard in the L2RPN NeurIPS 2020 challenge (Robustness track) and was the top-performing agent in the L2RPN WCCI 2020 challenge. Additionally, HybridAgent achieved third position in the L2RPN 2023 challenge hosted by RTE and TU Delft. Detailed analysis demonstrates HybridAgent's state-of-the-art performance in several test scenarios.
Speaker: Aashi Shrinate, Electrical Engineering, Indian Institute of Technology, Kanpur
Abstract: In a society, often there are individuals with expertise and credibility who influence the decisions of others. These influential agents can even alter the public discourse in critical matters relating to electoral politics, policy making, and consumer behaviour among others. Therefore, it is important to understand the underlying factors affecting their influence. In this talk, we will present an opinion dynamics model that takes into consideration the features of a realistic social network by including agents' prejudices, antagonistic interactions and a sparse network with closely tied sub-networks denoting communities. We will also discuss the evolution of opinions in the network and identify the influential agents arising out of the opinion dynamics and the network connectivity. In the given framework, we will show that the network interconnections have a critical effect on the influence of an agent. Finally, we will discuss some numerical simulations to illustrate these results.
Speaker: Tejaram S, Electrical Engineering, Indian Institute of Technology, Madras
Abstract: Motivated by risk-sensitive reinforcement learning scenarios, we consider the problem of policy evaluation for variance in a discounted reward Markov decision process (MDP). For this problem, a temporal difference (TD) type learning algorithm with linear function approximation (LFA) exists in the literature, though only asymptotic guarantees are available for this algorithm. We derive finite sample bounds that hold (i) in the mean-squared sense; and (ii) with high probability, when tail iterate averaging is employed with/without regularization. Our bounds exhibit exponential decay for the initial error, while the overall bound is O(1/t), where t is the number of update iterations of the TD algorithm. Further, the bound for the regularized TD variant is for a universal step size. Our bounds open avenues for analysis of actor-critic algorithms for mean-variance optimization in a discounted MDP.
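Tail iterate averaging, the device behind the exponential decay of the initial error in these bounds, can be sketched in one helper (generic scalar version, hypothetical name):

```python
def tail_average(iterates, tail_fraction=0.5):
    """Average only the last tail_fraction of the iterate sequence,
    discarding the early, high-error iterates; returning this average
    instead of the final iterate is what tail iterate averaging means."""
    n = len(iterates)
    start = int(n * (1 - tail_fraction))
    tail = iterates[start:]
    return sum(tail) / len(tail)
```

Averaging only the tail keeps the transient of the TD recursion out of the returned estimate while still reducing the variance of the averaged iterates.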
Speaker: Sumedh Sunil Gupte, Computer Science and Engineering, Indian Institute of Technology, Madras
Abstract: We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR), which is a popular risk measure in finance. In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of the classical sample average approximation (SAA) of UBSR. Next, in the context of UBSR optimization, we derive an expression for the UBSR gradient under a smooth parameterization. This expression is a ratio of expectations, both of which involve the UBSR. We use SAA for the numerator as well as denominator in the UBSR gradient expression to arrive at a biased gradient estimator. We derive non-asymptotic bounds on the estimation error, which show that our gradient estimator is asymptotically unbiased. We incorporate the aforementioned gradient estimator into a stochastic gradient (SG) algorithm for UBSR optimization. Finally, we derive non-asymptotic bounds that quantify the rate of convergence of our SG algorithm for UBSR optimization.
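Under one common convention, the UBSR of a random loss X is the smallest threshold t such that E[l(X - t)] <= lambda for a nondecreasing loss function l; the SAA estimator then solves the sample version by root finding. A bisection sketch under that convention (illustrative names, not the authors' code):

```python
import math

def ubsr_saa(samples, loss_fn, lam, lo=-100.0, hi=100.0, tol=1e-8):
    """Sample average approximation of UBSR: bisect for the smallest t with
    (1/n) * sum(loss_fn(x - t)) <= lam. Assumes loss_fn is nondecreasing,
    so the sample average is nonincreasing in t, and that the root lies in
    [lo, hi]."""
    def excess(t):
        return sum(loss_fn(x - t) for x in samples) / len(samples) - lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid    # constraint violated: threshold must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The same sample-average construction applied to the numerator and denominator of the gradient expression is what yields the biased-but-consistent gradient estimator discussed in the talk.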
Speaker: Vishesh Kaushik, Centre for Systems and Control, Indian Institute of Technology, Bombay
Abstract: Nuclei with intrinsic spin greater than 1/2 possess a nuclear electric quadrupole moment, which couples with the electric field gradient. This coupling perturbs the intrinsic nuclear spin Hamiltonian significantly. Controlling transitions between the multiple quantum spin states of quadrupolar nuclei is much more challenging and exciting than for spin-1/2 systems. This talk is about extending the techniques used for controlling spin-1/2 nuclei to quadrupolar nuclei.
Speaker: Viyom Vivek, Centre for Systems and Control, Indian Institute of Technology, Bombay
Abstract: I will talk about a retraction-map-based general method for developing numerical integrators on manifolds. This method will then be lifted to the tangent/cotangent bundles, where mechanical systems evolve, leading to structure-preserving integrators, in particular symplectic integrators. Finally, these integrators will be modified to satisfy nonholonomic constraints.
Speaker: Prabhat Lankireddy, C-MInDS, Indian Institute of Technology, Bombay
Abstract: The work analyses the interaction of online algorithms, particularly those used in recommendation systems, with their environments. Typically, these algorithms assume that user preferences remain static. However, in practice, the recommendations themselves can influence and change user preferences over time. We model the interaction between evolving user preferences and the parameters being learned by the algorithm using a dynamical system. Specifically, we consider a contextual bandit algorithm where users' 'state of mind' evolves based on the recommendations they receive. Our analysis reveals two key insights: First, as the algorithm shifts from exploration to exploitation, the environment tends to settle into a few equilibrium points determined by product attributes. Second, accurate tracking of user states is achievable only if the static attributes used by the algorithm meet certain criteria. This work highlights the importance of considering the dynamic nature of user preferences in online learning systems.
Speaker: Mizhaan Maniyar, C-MInDS, Indian Institute of Technology, Bombay
Abstract: We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function using sample trajectories. The first algorithm requires an exact solution of the cubic regularized problem in each iteration, while the second algorithm employs an efficient gradient descent-based approximation to the cubic regularized problem. We establish convergence of our proposed algorithms to a second-order stationary point (SOSP) of the value function, which results in the avoidance of traps in the form of saddle points. In particular, the sample complexity of our algorithms to find an $\epsilon$-SOSP is $O(\epsilon^{-3.5})$, which is an improvement over the state-of-the-art sample complexity of $O(\epsilon^{-4.5})$.
Speaker: Vedang Gupta, Electrical Engineering, Indian Institute of Technology, Bombay
Abstract: We consider the problem of best arm identification in stochastic multi-armed bandits, in the setting that each arm is sampled once in each round. This uniform sampling regime is a conceptually simple setting that is relevant to many practical applications. The aim is to stop and correctly identify the best arm with probability at least 1 - $\delta$, while keeping the number of rounds low. We derive a lower bound on the sample complexity for this setting. Thereafter, we propose two natural stopping rules for Bernoulli bandits: one based on PPR martingale confidence sequences, and the other based on the GLR statistic. Both rules are shown to match the lower bound as $\delta \to 0$. Our analysis and experiments suggest that the relative performance of the two stopping rules depends on a property of the bandit instance.
Speaker: Manish Kumar Singh, Centre for Systems and Control, Indian Institute of Technology, Bombay
Abstract: In the strategic classification problem, the sender misreports the feature vectors to the classifier during the testing phase. The sender has a preference for labels for each feature vector and thus misreports it at a cost to obtain the desired label. The classifier, aware of the sender’s strategic behaviour, must adapt to be robust against such gaming. We begin by examining a scenario where the classifier knows the ground truth, and then we characterize the necessary alterations to ensure robustness against such gaming. In the subsequent scenario, the classifier does not know the ground truth but has access to uncorrupted data. Here, we present two learning paradigms: Vanilla ERM and strategy-aware ERM for learning classifiers. In the latter paradigm, we incorporate the strategic structure of the problem during learning. We then define and characterize the learnability of a hypothesis class under strategic manipulation by the sender. Finally, we discuss the merits and demerits of each paradigm through an example.
Speaker: Siddhartha Ganguly, Centre for Systems and Control, Indian Institute of Technology, Bombay
Abstract: This talk will focus on a recent algorithm for generating one-shot approximate solutions for a class of robust min-max model predictive control problems under constraints. We employ techniques from convex semi-infinite programming theory to solve the min-max problem over a uniform grid, generating (state-action) data, and then use approximation and learning-theoretic tools to provide one-shot approximate policies with uniform guarantees of convergence. Numerically, the algorithm is orders of magnitude faster than conventional robust MPC techniques.
Speaker: Pritish Chakraborty, Computer Science and Engineering, Indian Institute of Technology, Bombay
Abstract: Influence maximization (IM) refers to the problem of finding a subset of nodes in a network through which we could maximize our reach to other nodes in the network. We explore a variant of the IM problem where we wish to reach out to and maximize the probability of infection of a small subset of bounded capacity $K$. We show that this problem does not exhibit the same submodular guarantees as the original IM problem, for which we resort to the theory of $\gamma$-weakly submodular functions. Subsequently, we develop a greedy algorithm that maximizes our objective despite the lack of submodularity. We also develop a suitable learning model that out-competes baselines on the task of predicting the top-$K$ infected nodes, given a seed set as input.
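The greedy scheme at the heart of such results can be sketched generically for any set-function objective (a standard skeleton, not the authors' specific algorithm):

```python
def greedy_select(candidates, objective, k):
    """Greedy maximization of a set function: at each step, add the element
    with the largest marginal gain. For gamma-weakly submodular objectives
    this style of greedy retains a (1 - e^{-gamma})-type guarantee."""
    chosen = []
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for c in candidates:
            if c in chosen:
                continue
            gain = objective(chosen + [c]) - objective(chosen)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        chosen.append(best)
    return chosen
```

The talk's contribution is showing that the infection-probability objective, while not submodular, is gamma-weakly submodular, so this greedy template still carries a guarantee.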
Speaker: Indradyumna Roy, Computer Science and Engineering, Indian Institute of Technology, Bombay
Abstract: In various search applications related to passage retrieval, text entailment, and subgraph search, a document is considered relevant if it "contains" the query. Recent applications model containment by encoding queries and documents as fixed-size vectors and checking for element-wise vector dominance. Existing LSH methods are designed for mostly symmetric or few simple asymmetric distance functions, which are not suitable for asymmetric vector dominance. In response, we propose FourierHashNet, where we first transform the dominance distance into a bounded dominance similarity measure, which is then Fourier-transformed into an expectation of inner products of functions in the frequency domain. Finally, the expectations are approximated with an importance-sampled estimate, which allows for the use of traditional LSH, but in the frequency domain. Our experimental results, published in NeurIPS'23, show that FourierHashNet provides a better query time vs. retrieval quality trade-off, compared to several baselines.
Speaker: Ninad Gandhi, C-MInDS, Indian Institute of Technology, Bombay
Abstract: The talk discusses the importance, challenges and complexities of the trajectory prediction problem in autonomous vehicles, and outlines the current approaches taken to tackle the problem.
Speaker: Soumyadeep Dey, Microsoft
Abstract: The smartphone camera has simplified the capture of various physical documents in digital form. The ease of sharing digital documents (e.g., via messaging/networking apps) has made them a popular source of information dissemination. However, readability of such digitized documents is hampered when the (original) physical document is degraded. For instance, the physical document may contain extraneous elements like stains, wrinkles, ink spills, or can undergo degradation over time. As a result, while scanning such documents (e.g., via a flat-bed scanner), these elements also get incorporated into the document image. When document images are captured via mobile cameras, they are prone to being impacted by shadow, non-uniform lighting, light from multiple sources, light source occlusion, etc. Such noisy elements not only affect the comprehensibility of the corresponding digitized document for human readers, they may also break the automatic (document-image) processing/understanding pipeline in various applications (e.g., OCR, bar code reading, form detection, table detection, etc.). This talk will cover various challenges associated with document scanning and its applications on mobile devices.
Speaker: Reema Deori, Centre for Systems and Control, IIT Bombay
Abstract: In any persuasion setting, it is natural to ask: what makes the receiver trust the sender's suggestion? We address this question in our talk. We study a Bayesian persuasion interaction between a sender and a receiver where the sender has access to private information about a source and the receiver attempts to recover this information from messages sent by the sender. The sender persuades the receiver by committing to a randomized signalling policy crafted to maximize its utility. The receiver, on the other hand, attempts to learn the sender's true information. Our contribution is a characterization of the sender's expected utility in a Stackelberg equilibrium as a linear program. We also obtain a linear programming characterization of the minimum amount of truth revealed to the receiver in any Stackelberg equilibrium. In the process we uncover a key element of such problems: persuasion must be subject to "trust constraints" for it to work. The sender's optimal policy must reveal some truth.
Speaker: Ashwin Aravind, Centre for Systems and Control, IIT Bombay
Abstract: We present a first-order distributed algorithm for solving a convex semi-infinite program (SIP) over a time-varying network. In this setting, the objective function associated with the optimization problem is a summation of a set of functions, each held by one node in a network. The semi-infinite constraint, on the other hand, is known to all agents. The nodes collectively aim to solve the problem using local data about the objective and limited communication capabilities depending on the network topology. Our algorithm is built on three key ingredients: consensus step, gradient descent in the local objective, and local gradient descent iterations in the constraint at a node when the estimate violates the semi-infinite constraint. The algorithm is constructed, and its parameters are prescribed in such a way that the iterates held by each agent provably converge to an optimizer. That is, as the algorithm progresses, the estimates achieve consensus, and the constraint violation and the error in the optimal value are bounded above by vanishing terms. A simulation example will illustrate our results.
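The three ingredients above can be sketched, for a single node and a scalar decision variable, as follows (a toy illustration of the structure, not the algorithm's actual parameter choices; all names are hypothetical):

```python
def node_update(estimates, weights, grad_f_local, g, grad_g, step):
    """One iteration of the sketched distributed scheme at a single node:
    (i) consensus averaging of neighbors' estimates (including its own),
    (ii) a gradient descent step on the local objective, and (iii) a
    corrective gradient step on the constraint whenever the current
    estimate violates it. g is the (most-violated) constraint value."""
    v = sum(w * x for w, x in zip(weights, estimates))  # consensus step
    x = v - step * grad_f_local(v)                      # local objective descent
    if g(x) > 0:                                        # constraint violated
        x = x - step * grad_g(x)                        # constraint correction
    return x
```

In the actual algorithm the step sizes and the number of constraint-correction iterations are prescribed so that the agents' iterates reach consensus and converge to an optimizer, with constraint violation vanishing.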