Abstracts

Here are this year's presentation abstracts, ordered by presenter last name.


Verifying Generalization in Deep Learning

Guy Amir, Osher Maayan, Tom Zelazny, Guy Katz, Michael Schapira

arXiv


Deep neural networks (DNNs) are the workhorses of deep learning, which constitutes the state of the art in numerous application domains. However, DNN-based decision rules are notoriously prone to poor generalization, i.e., they may prove inadequate on inputs not encountered during training. This limitation poses a significant obstacle to employing deep learning for mission-critical tasks, and also in real-world environments that exhibit high variability.

We propose a novel, verification-driven methodology for identifying DNN-based decision rules that generalize well to new input domains. Our approach quantifies generalization to an input domain by the extent to which decisions reached by independently trained DNNs are in agreement for inputs in this domain. We show how, by harnessing the power of DNN verification, our approach can be efficiently and effectively realized.
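As a toy sketch of this agreement-based score, the following uses two stand-in decision functions (hypothetical, in place of independently trained DNNs) and made-up input domains; the paper computes agreement via formal DNN verification rather than sampling:

```python
# Toy sketch: score generalization to a domain by the rate of agreement
# between independently trained decision rules (stand-in functions here).

def agreement_score(models, inputs):
    """Fraction of inputs on which all models reach the same decision."""
    agree = sum(1 for x in inputs if len({m(x) for m in models}) == 1)
    return agree / len(inputs)

# Two toy "decision rules" that agree near zero but diverge far from it,
# mimicking models that only generalize near their training distribution.
model_a = lambda x: int(x > 0)
model_b = lambda x: int(x > 0) if abs(x) < 10 else int(x < 0)

in_domain = [-2, -1, 1, 2]           # resembles the training data
out_of_domain = [-20, -15, 15, 20]   # a new, unfamiliar input domain

print(agreement_score([model_a, model_b], in_domain))      # 1.0: high agreement
print(agreement_score([model_a, model_b], out_of_domain))  # 0.0: low agreement
```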

We evaluate our verification-based approach on three deep reinforcement learning (DRL) benchmarks, including a system for real-world Internet congestion control. Our results establish the usefulness of our approach, and, in particular, its superiority over gradient-based methods.

More broadly, our work puts forth a novel objective for formal verification, with the potential for mitigating the risks associated with deploying DNN-based systems in the wild.


Tandem Attack: DDoS Attack on Microservices Auto-scaling Mechanisms

Michael Czeizler, Anat Bremler-Barr
INFOCOM 23 poster


Auto-scaling is a well-known mechanism for adapting systems to dynamic traffic loads by automatically increasing (scale-up) and decreasing (scale-down) the number of handling resources. As software development shifted to the micro-services architecture, large software systems are nowadays composed of many independent micro-services, each responsible for specific tasks. This breakdown into fragmented applications has also influenced the infrastructure side, where different services of the same application are given different hardware configurations and scaling properties. Although it was created to accelerate software development, the micro-services approach also presents a new challenge: as systems grow larger, incoming traffic triggers multiple calls between micro-services to handle each request.


Cooperative Multi-Agent Data Gathering in Wireless Sensor Networks Using Reinforcement Learning

Efi Dvir


We present a novel multi-agent distributed reinforcement learning approach for the optimization of report delivery in energy-harvesting wireless sensor networks.

The protocol is resilient to system changes, e.g., emerging and faulty sensors, while incurring minimal coordination overhead.

In contrast to well-known stationary policies, we show that the algorithm attains high throughput under a history-dependent deterministic policy in various scenarios.


Perfectly Covert Communication with a Reflective Panel

Or Elimelech, Itai Bitton, Eran Nehushtan, Asaf Cohen

arXiv


Covert communication, a sub-field of information security, is focused on hiding the mere existence of communication from unwanted listeners via the physical layer, i.e., via signal and noise characteristics, rather than assuming coding or secure protocols at the higher layers.

In this work, we consider the problem of perfect covert communication in wireless networks. Specifically, harnessing an Intelligent Reflecting Surface (IRS), we turn our attention to schemes which allow the transmitter to completely hide the communication, with zero energy at the unwanted listener (Willie) and hence zero probability of detection. Applications of such schemes go beyond simple covertness, as we prevent detectability or decoding even when the codebook, timings and channel characteristics are known to Willie. That is, perfect covertness also ensures Willie is unable to decode, even assuming communication took place and knowing the codebook. 

We define perfect covertness, give a necessary and sufficient condition for it in IRS-assisted communication, and define the optimization problem. For N=2 IRS elements, we compute the probability that a solution exists and give the solution analytically. For N>2, we analytically compute the probability of such a zero-detection solution and show that it tends to 1 as the number of IRS elements increases. We provide a perfectly covert scheme achieving it and prove its convergence. These findings are supported by simulations, showing that a small number of IRS elements allows a positive rate at the legitimate user, yet with zero probability of detection at an unwanted listener.
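To illustrate the zero-detection condition, the following sketch (with made-up complex channel gains, not the paper's notation) solves for two reflection phase shifts that make the direct and reflected paths cancel at Willie; a solution exists exactly when the triangle inequality on the channel magnitudes holds:

```python
import cmath
import math

def covert_phases(h_d, g1, g2):
    """Solve g1*e^{j*t1} + g2*e^{j*t2} = -h_d for IRS phase shifts (t1, t2),
    so the direct and reflected paths cancel at the unwanted listener.
    Feasible iff ||g1|-|g2|| <= |h_d| <= |g1|+|g2| (triangle inequality)."""
    a, b, r = abs(g1), abs(g2), abs(h_d)
    if not (abs(a - b) <= r <= a + b):
        return None  # no zero-forcing solution for these channel gains
    target = -h_d
    # Law of cosines gives the angle between `target` and the first component.
    alpha = math.acos((r * r + a * a - b * b) / (2 * r * a))
    phi1 = cmath.phase(target) + alpha
    phi2 = cmath.phase(target - a * cmath.exp(1j * phi1))
    return phi1 - cmath.phase(g1), phi2 - cmath.phase(g2)

h_d, g1, g2 = 0.8 + 0.3j, 0.6 - 0.2j, 0.5 + 0.4j  # made-up channel gains
theta = covert_phases(h_d, g1, g2)
residual = h_d + g1 * cmath.exp(1j * theta[0]) + g2 * cmath.exp(1j * theta[1])
print(abs(residual))  # ~0: zero received energy, hence zero detection probability
```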


Zero-copy packet processing on DPUs

Haggai Eran


Data-processing units (DPUs) incorporate a networking interface and programmable processing cores, offloading networking functions from the CPU. However, processing every packet on the DPU cores involves copying packets to the DPU, utilizing scarce resources such as memory bandwidth, NIC pipeline bandwidth, or cache capacity. While NIC offloads can bypass the DPU cores altogether, some applications cannot be expressed this way. Instead, we propose a way of reducing DPU resource utilization by passing packet headers along with descriptors to the DPU software stack instead of copying the full packet. A preliminary implementation on the BlueField-2 DPU shows a 6.5× reduction in DPU DDR bandwidth utilization and a 2.5× increase in application throughput for a basic packet forwarding microbenchmark.


Collaborative Secure Image Matching with Visual Cryptography

Alexander Fok, Shlomi Dolev, Michael Segal

IEEE NCA 2022


Collaborative secure image matching is a problem that is applicable in various domains, for both data at rest and data in motion. The problem is defined as follows: there is a secret image and a set of n mobile agents, and the mobile agents should match (compare) an observed image to the original secret image. In this research we discuss some of the existing approaches and present an alternative solution, applied and analyzed for different applications. The first application is a swarm of Unmanned Aerial Vehicles (UAVs) that search for a target specified by an image. The second application is a social network that serves as a smart storage device capable of performing distributed, secret image matching operations. Our solution is based on the well-known Visual Encryption Scheme (VES) and projections of visual bit maps, rather than the (quadratic-complexity) message exchange required to implement a Secure Multi-Party Computation (MPC) scheme. We present a perfect-information-theoretic secure solution for this problem. To preserve the original image's secrecy, at least k out of the n mobile agents are required to retrieve any information about the original image. A preliminary version appeared in CSCML 2022, PhD Track. Part of this research was published at the IEEE NCA 2022 conference as the paper “Swarming with (Visual) Secret (Shared) Mission” [1]. https://ieeexplore.ieee.org/document/10013507
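For intuition about the underlying visual-cryptography primitive, here is a minimal 2-out-of-2 sharing sketch in the classic Naor-Shamir style (a simplification; the paper's k-out-of-n scheme and bit-map projections are more involved):

```python
import random

def share_image(secret_bits, rng=random):
    """Naor-Shamir 2-out-of-2 visual sharing: each secret pixel becomes a
    pair of subpixels (1 = dark) in each share. Either share alone is a
    uniformly random pattern; stacking (pixel-wise OR) reveals the secret."""
    share1, share2 = [], []
    for bit in secret_bits:
        pattern = rng.choice([(1, 0), (0, 1)])
        share1.append(pattern)
        # White pixel (0): identical patterns, so the stack stays half-dark.
        # Black pixel (1): complementary patterns, so the stack is fully dark.
        share2.append(pattern if bit == 0 else (1 - pattern[0], 1 - pattern[1]))
    return share1, share2

def stack(share1, share2):
    """Physically stacking transparencies corresponds to a pixel-wise OR."""
    return [(a[0] | b[0], a[1] | b[1]) for a, b in zip(share1, share2)]

secret = [1, 0, 1, 1, 0]  # 1 = black pixel
s1, s2 = share_image(secret)
stacked = stack(s1, s2)
print([1 if p == (1, 1) else 0 for p in stacked])  # recovers the secret
```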


Using Deep Reinforcement Learning for mmWave Real-Time Scheduling

Barak Gahtan, Reuven Cohen, Alex M. Bronstein, Gil Kedar

arXiv (Under review at IFIP 2023) 


We study the problem of real-time scheduling in a multi-hop millimeter-wave (mmWave) mesh. We develop a model-free deep reinforcement learning algorithm called Adaptive Activator RL (AARL), which determines the subset of mmWave links that should be activated during each time slot and the power level for each link. The most important property of AARL is its ability to make scheduling decisions within the strict time slot constraints of typical 5G mmWave networks. AARL can handle a variety of network topologies, network loads, and interference models, and it can also adapt to different workloads. We demonstrate the operation of AARL on several topologies: a small topology with 10 links, a moderately sized mesh with 48 links, and a large topology with 96 links. For each topology, we compare the throughput obtained by AARL to that of a benchmark algorithm called RPMA (Residual Profit Maximizer Algorithm). The most important advantage of AARL over RPMA is that it is much faster: it can make the necessary scheduling decisions very rapidly during every time slot, while RPMA cannot. In addition, the scheduling decisions made by AARL outperform those made by RPMA.


Go-to-Controller is Better: Efficient and Optimal LPM Caching with Splicing

Itamar Gozlan, Chen Avin, Gil Einziger, Gabriel Scalosub

SIGMETRICS 23


Modern data center networks are required to support huge and complex forwarding policies as they handle the traffic of their various tenants. However, these policies cannot be stored in their entirety within the limited memory available at commodity switches. The common approach in such scenarios is to have SDN controllers manage the memory available at the switch as a fast cache, updating and changing the forwarding rules in the cache according to the workload dynamics and the global policy at hand. Many such policies, such as longest-prefix-match (LPM) policies, introduce dependencies between the forwarding rules. Ensuring that the cache content is always consistent with the global policy often requires the switch to store (potentially many) superfluous rules, which may lead to suboptimal performance in terms of delay and throughput. To overcome these deficiencies, previous work suggested the concept of splicing, where modified go-to-controller rules can be inserted into the cache to improve performance while maintaining consistency. These works focused mostly on heuristics, and it was conjectured that the problem is computationally intractable. As our main result, we show that the problem of determining the optimal set of rules, with splicing, can actually be solved efficiently, by presenting a polynomial-time algorithm that produces an optimal solution: for a given cache size, we find an optimal set of rules, some of which are go-to-controller, that maximizes the total weight of the cache while maintaining consistency. However, such optimality comes at a cost: our algorithm has a significantly larger running time than SoTA solutions that do not employ splicing. Therefore, we further present a heuristic exhibiting close-to-optimal performance with significantly improved running time, matching that of the best algorithm that does not employ splicing.
In addition, we present the results of an evaluation study that compares the performance of our solutions with that of SoTA approaches, showing that splicing can reduce the cache miss ratio by as much as 30%, without increasing the cache size. Lastly, we propose a simple and fast-to-compute metric (that is consistency-oblivious) in order to evaluate the potential benefits of splicing compared to classical LPM-caching, for a given policy and traffic distribution. We show that our metric is highly correlated with such benefits, thus serving as an indication of whether splicing should be incorporated within the system architecture.
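The consistency issue and the splicing fix can be sketched on a toy policy (hypothetical bit-string prefixes and actions; real deployments use IP prefixes and switch pipelines):

```python
def lpm(rules, addr):
    """Longest-prefix-match lookup: `rules` maps bit-string prefixes to
    actions, `addr` is a bit string; no match falls through to the controller."""
    best = None
    for prefix, action in rules.items():
        if addr.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, action)
    return best[1] if best else "controller"

# Full policy: a generic rule plus a more-specific exception beneath it.
policy = {"10": "port1", "1010": "port2"}

# Caching only "10" is inconsistent: address 1010... must be sent to port2.
naive_cache = {"10": "port1"}
# Splicing restores consistency without caching the whole dependent subtree:
# the more-specific prefix is replaced by a go-to-controller rule.
spliced_cache = {"10": "port1", "1010": "controller"}

for addr in ["1011", "1010", "0110"]:
    truth = lpm(policy, addr)
    spliced = lpm(spliced_cache, addr)
    # A cache hit must agree with the policy; "controller" is a safe miss.
    assert spliced in (truth, "controller")
print(lpm(naive_cache, "1010"), "vs policy:", lpm(policy, "1010"))  # inconsistent hit
```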


Cerberus: The Power of Choices in Datacenter Topology Design

Chen Griner, Johannes Zerwas, Andreas Blenk, Manya Ghobadi, Stefan Schmid, Chen Avin

ACM Proc. on Measurements and Analysis of Computing Systems


The bandwidth and latency requirements of modern datacenter applications have led researchers to propose various topology designs using static, dynamic demand-oblivious (rotor), and/or dynamic demand-aware switches.  However, given the diverse nature of datacenter traffic, there is little consensus about how  these designs would fare against each other. 

In this work, we analyze the throughput of existing topology designs under different traffic patterns and study their unique advantages and potential costs in terms of bandwidth and latency "tax". To overcome the identified inefficiencies, we propose Cerberus, a unified, two-layer leaf-spine optical datacenter design with three topology types.

Cerberus systematically matches different traffic patterns with their most suitable topology type: e.g., latency-sensitive flows are transmitted via a static topology, all-to-all traffic via a rotor topology, and elephant flows via a demand-aware topology. We show analytically and in simulations that Cerberus can improve throughput significantly compared to alternative approaches and operate datacenters at higher loads while being throughput-proportional.


JULIET-PUF: Enhancing the Security of IoT-Based SRAM-PUFs Using the Remanence Decay Effect

Amit Kama, Michael Amar, Snir Gaaton, Kang Wang, Yifan Tu, Yossi Oren

IEEE IOT Journal


The cloud-based Internet of Things (IoT) enables the creation of innovative computer applications based on sensing, analyzing, and controlling the physical world. IoT deployments, however, are at a particular risk of counterfeiting, through which an adversary can corrupt the entire ecosystem. Therefore, entity authentication of edge devices is considered an essential part of the security of IoT systems. This research addresses the challenge of generating a unique ID in IoT devices. Unique IDs allow the IoT system maker to identify each edge device, and to ensure that only genuine devices can upload data to the cloud. Traditional ID mechanisms are not feasible in IoT, due to the edge device’s constrained runtime environment, or the additional costs and the deployment difficulties that they introduce. In this work, we present JULIET-PUF, a novel PUF-based method for IoT identification, which relies on SRAM content retrieval after power glitches with time differences. Our scheme comes with no added hardware cost on the edge device. We evaluate JULIET-PUF using a data set of 24 units of a popular commercial IoT device, and show that it is on average 95.58 times more secure than the common use of SRAM-PUF.
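For background, the basic SRAM-PUF matching step can be sketched as follows (a simulation with made-up bias and noise parameters; JULIET-PUF additionally exploits remanence decay across timed power glitches, which is not modeled here):

```python
import random

def sram_startup(biases, noise=0.05, rng=random):
    """Simulate SRAM power-up: each cell has a manufacturing bias toward 0 or 1,
    occasionally flipped by noise. SRAM-PUFs read such bits after power-up."""
    return [b ^ (rng.random() < noise) for b in biases]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(7)
n = 256
device_bias = [rng.randint(0, 1) for _ in range(n)]  # this chip's fingerprint
other_bias = [rng.randint(0, 1) for _ in range(n)]   # a different chip

enrolled = sram_startup(device_bias, rng=rng)        # stored at enrollment
probe_same = sram_startup(device_bias, rng=rng)      # later power-up, same chip
probe_other = sram_startup(other_bias, rng=rng)      # counterfeit device

threshold = n // 4  # tolerate noisy bits, reject different devices
print(hamming(enrolled, probe_same) < threshold)     # True: authenticates
print(hamming(enrolled, probe_other) < threshold)    # False: rejected
```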


QueuePilot: Reviving Small Buffers With a Learned AQM Policy

Isaac Keslassy, Micha Dery, Orr Krupnik

INFOCOM 23


Internet router buffers help deal with congestion and traffic variability. Reducing buffer sizes may be one of the main outstanding open problems in networking, as small buffers would provide lower delays for users and free capacity for vendors. Unfortunately, with small buffers, a passive policy has an excessive loss rate and existing AQM (active queue management) policies, which signal developing congestion, can be unreliable.

In this talk, we introduce QueuePilot, a reinforcement learning-based AQM that enables small buffers in backbone routers, trading off high utilization with low loss rate and short delay. QueuePilot automatically tunes the probability of marking packets to signal congestion. After training once offline with a variety of settings, QueuePilot produces a single lightweight policy that can be applied online without further learning. We evaluate QueuePilot on real networks with hundreds of TCP flows, and show how its performance in small buffers exceeds that of existing algorithms, and even exceeds their performance with larger buffers.
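As a rough illustration of what an AQM marking policy computes, here is a RED-style stand-in that maps queue occupancy to a marking probability (the thresholds below are made up; QueuePilot instead learns this state-to-probability mapping offline with RL):

```python
def marking_probability(queue_len, buffer_size, min_th=0.2, max_th=0.8):
    """RED-style stand-in for a learned AQM policy: mark nothing below min_th
    occupancy, everything above max_th, and interpolate linearly in between."""
    occ = queue_len / buffer_size
    if occ < min_th:
        return 0.0
    if occ > max_th:
        return 1.0
    return (occ - min_th) / (max_th - min_th)

buffer_size = 50  # a small buffer
for q in (5, 25, 45):
    p = marking_probability(q, buffer_size)
    print(f"queue={q:2d} -> mark with prob {p:.2f}")
```

Each arriving packet would then be ECN-marked with this probability, signaling congestion to senders before the small buffer overflows.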


Multiplicative Partially Homomorphic CRT Secret Sharing

Yaniv Kleinman, Shlomi Dolev

IEEE NCA 2022


We present a new CRT-based multiplicatively homomorphic secret-sharing scheme with perfect information-theoretic (PIT) security. The scheme supports the evaluation of products of secrets belonging to multiplicative groups. Even though our scheme is only partially homomorphic, it may be regarded as fully homomorphic for practical scenarios, such as bounded-sized multi-cloud databases.
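The multiplicative property rests on the fact that reduction modulo each modulus commutes with multiplication. A minimal CRT sketch (illustrative only: plain mod-sharing as below lacks the scheme's PIT security and thresholding):

```python
from math import prod

def share(secret, moduli):
    """CRT share: the i-th share is simply secret mod m_i."""
    return [secret % m for m in moduli]

def reconstruct(shares, moduli):
    """Chinese Remainder reconstruction (pairwise-coprime moduli)."""
    M = prod(moduli)
    x = 0
    for s, m in zip(shares, moduli):
        Mi = M // m
        x += s * Mi * pow(Mi, -1, m)  # pow(., -1, m) is the modular inverse
    return x % M

moduli = [11, 13, 17, 19]  # pairwise coprime; range is prod(moduli) = 46189
s, t = 123, 35             # secrets, chosen so that s*t < 46189

# Multiplicative homomorphism: each party multiplies its two shares locally.
product_shares = [(a * b) % m
                  for a, b, m in zip(share(s, moduli), share(t, moduli), moduli)]
print(reconstruct(product_shares, moduli))  # 4305 == 123 * 35
```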


DNSRU - Real-Time DNS Update

Ariel Litmanovich


DNSRU (DNS Real-time Update) enables secure real-time updates of cached domain records in resolvers, before the associated TTLs have expired. Thus, Internet services can update their domain records in resolvers' caches in case a wrong IP address has been mistakenly distributed with a large TTL. DNSRU eliminates the main motivation for short TTL values, thus allowing larger TTL values in resolvers' caches, which reduce the traffic load on authoritative servers and resolvers. DNSRU is efficient, secure, backward compatible, and supports gradual deployment.


DOTE: Rethinking (Predictive) WAN Traffic Engineering

Yarin Perry, Felipe Vieira Frujeri, Chaim Hoch, Srikanth Kandula, Ishai Menache, Michael Schapira, Aviv Tamar

NSDI 23 Best Paper


We explore a new design point for traffic engineering on wide-area networks (WANs): directly optimizing traffic flow on the WAN using only historical data about traffic demands. Doing so obviates the need to explicitly estimate, or predict, future demands. Our method, which utilizes stochastic optimization, provably converges to the global optimum in well-studied theoretical models. We employ deep learning to scale to large WANs and real-world traffic. Our extensive empirical evaluation on real-world traffic and network topologies establishes that our approach's TE quality almost matches that of an (infeasible) omniscient oracle, outperforming previously proposed approaches, and also substantially lowers runtimes.
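The core idea, optimizing directly against historical demands instead of predicting future ones, can be sketched on a toy two-path network (made-up demands and capacities; a grid search stands in for the paper's stochastic gradient descent and deep learning machinery):

```python
# Historical demand samples for one source-destination pair (no prediction
# step: we optimize the routing split directly against past data).
history = [3.0, 5.0, 8.0, 4.0, 6.0]
cap = (10.0, 5.0)  # capacities of two parallel paths

def avg_max_utilization(split, demands):
    """Mean (over history) of the most-utilized link, for a given split ratio:
    a fraction `split` of each demand takes path 0, the rest takes path 1."""
    return sum(
        max(d * split / cap[0], d * (1 - split) / cap[1]) for d in demands
    ) / len(demands)

# Grid search over split ratios; DOTE uses end-to-end gradient descent instead.
best = min((s / 100 for s in range(101)),
           key=lambda s: avg_max_utilization(s, history))
print(f"optimized split: {best:.2f}, "
      f"cost {avg_max_utilization(best, history):.3f} vs "
      f"50/50 cost {avg_max_utilization(0.5, history):.3f}")
```

The optimum here balances the two paths in proportion to their capacities (split 2/3), which no fixed 50/50 rule achieves.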


NeuroLPM - Scaling Longest Prefix Match Hardware with Neural Networks

Alon Rashelbach


The Longest Prefix Match (LPM) algorithm is broadly used in computer systems and especially in modern network devices such as network interface cards, switches, and routers. However, existing LPM hardware fails to scale to the millions of rules required by modern applications, is often optimized for specific applications, and its performance is sensitive to the structure of the LPM rules. We describe NeuroLPM, a new architecture for multi-purpose LPM hardware that replaces lookups in traditional memory-intensive trie- and hash-table data structures with inference in a lightweight neural-network-based model, called RQRMI. NeuroLPM scales to millions of rules under a small on-die SRAM budget and achieves stable, input-agnostic performance, allowing its use in various applications. We solve several unique challenges when implementing RQRMI inference in hardware, including the elimination of floating-point computations while maintaining query correctness, and scaling to larger inputs while ensuring small, deterministic off-chip memory bandwidth. We prototype NeuroLPM in SystemVerilog and evaluate it on real-world packet forwarding rule-sets and network traces. We demonstrate that NeuroLPM offers substantial scalability benefits. For example, it efficiently serves a 950K-rule rule-set with only 2MB of SRAM, and reduces the DRAM bandwidth per query, the dominant performance factor, by up to 9× and 3× compared to the state-of-the-art Tree Bitmap and SAIL, respectively, even though it is general and lacks the network-forwarding-specific optimizations that they have.


Safe Defaulting for ML-Powered Systems

Noga H. Rotman


In recent years, the incorporation of deep learning (DL) into a variety of computer and networked systems has been proposed. A key challenge for DL-augmented systems, which poses significant obstacles to their real-world adoption, is that when the operational environment for such a system deviates from its training environment, the system is prone to poor decisions, inducing bad performance. We argue that reaping the benefits of DL-powered systems without compromising on safety requires building into such systems the capability to fall back to a robust default policy when the system’s ML-guided decisions are no longer deemed coherent.

We present a general framework for online defaulting by ML-powered systems. Our framework incorporates methodologies for quantifying decision uncertainty in sequential decision making and for building on this information to transition from ML-based decision making to a fallback policy. To illustrate the usefulness of our approach, we present and explore safe defaulting mechanisms for two recent proposals for ML-driven adaptive bitrate (ABR) selection in video streaming: the Pensieve ABR protocol, which reflects a deep-reinforcement learning approach, and the Puffer TV service, which incorporates Fugu, a deep-supervised-learning-based ABR protocol. We show, through extensive experimental and empirical analyses of large datasets of live video streaming performance, that, in both contexts, online defaulting is key to maintaining high performance when system decisions are sound, while avoiding degraded performance when this is not so.
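A minimal sketch of uncertainty-gated defaulting, using ensemble disagreement as the uncertainty proxy (the toy bitrate pickers and threshold below are hypothetical, not Pensieve's or Fugu's logic):

```python
def ensemble_decision(models, state, disagreement_threshold, default_policy):
    """Use the ML ensemble while its members agree; otherwise fall back to a
    robust default. Disagreement serves as a cheap decision-uncertainty proxy."""
    decisions = [m(state) for m in models]
    if len(set(decisions)) > disagreement_threshold:
        return default_policy(state)                     # low confidence
    return max(set(decisions), key=decisions.count)      # majority vote

# Toy bitrate pickers that agree in-distribution but scatter out of it.
models = [
    lambda bw: "HD" if bw > 5 else "SD",
    lambda bw: "HD" if bw > 6 else "SD",
    lambda bw: "4K" if bw > 100 else "HD" if bw > 4 else "SD",
]
default_policy = lambda bw: "SD"  # conservative, always-safe choice

print(ensemble_decision(models, 8, 1, default_policy))    # "HD": in-distribution
print(ensemble_decision(models, 200, 1, default_policy))  # "SD": fallback fires
```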


Smart Network Observability – Connection Tracking

Ronen Schaffer


Flow Logs Pipeline (a.k.a. FLP) is an observability tool that consumes flow logs from various inputs, transforms them, and exports logs to Loki and/or time-series metrics to Prometheus. While flow logs encompass a lot of valuable data, the flow-log level is often too low a vantage point for observing the network. In many cases, we are interested in observing it at a higher level: the level of connections.

In this work, we introduce a new processing stage in FLP that aggregates flow logs belonging to the same conversation: conversation tracking.
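A sketch of such a conversation-tracking aggregation step (field names are illustrative, not FLP's actual schema):

```python
from collections import defaultdict

def conversation_key(flow):
    """Canonical key: order the endpoints so that both directions of a
    conversation map to the same key."""
    a = (flow["src_ip"], flow["src_port"])
    b = (flow["dst_ip"], flow["dst_port"])
    return (flow["proto"],) + (a + b if a < b else b + a)

def track_conversations(flow_logs):
    """Aggregate per-flow records into per-conversation counters."""
    convs = defaultdict(lambda: {"bytes": 0, "packets": 0, "flows": 0})
    for f in flow_logs:
        c = convs[conversation_key(f)]
        c["bytes"] += f["bytes"]
        c["packets"] += f["packets"]
        c["flows"] += 1
    return dict(convs)

flows = [
    {"proto": "TCP", "src_ip": "10.0.0.1", "src_port": 41000,
     "dst_ip": "10.0.0.9", "dst_port": 443, "bytes": 1200, "packets": 4},
    {"proto": "TCP", "src_ip": "10.0.0.9", "src_port": 443,   # reverse direction
     "dst_ip": "10.0.0.1", "dst_port": 41000, "bytes": 5000, "packets": 6},
]
convs = track_conversations(flows)
print(len(convs))                  # 1: both directions aggregated together
print(next(iter(convs.values())))  # {'bytes': 6200, 'packets': 10, 'flows': 2}
```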


Constrained In-network Computing with Low Congestion in Datacenter Networks

Raz Segal, Chen Avin, Gabriel Scalosub

INFOCOM 22


Distributed computing has become a common practice nowadays, and recent focus has been given to the use of smart networking devices with in-network computing capabilities. State-of-the-art switches with near-line-rate computing and aggregation capabilities enable acceleration and improved performance for various modern applications like big data analytics and large-scale distributed and federated machine learning.

In this work, we formulate and study the theoretical algorithmic foundations of such approaches, and focus on how to deploy and use constrained in-network computing capabilities within the data center.

We focus our attention on reducing the network congestion, i.e., the load on the most congested link in the network, while supporting the given workload(s). We present an efficient optimal algorithm for tree-like network topologies and show that our solution provides as much as a 13× improvement over common alternative approaches. In particular, our results show that having merely a small fraction of network devices support in-network aggregation can significantly reduce the network congestion, both for single and multiple workloads.
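The congestion benefit of partial in-network aggregation can be sketched on a toy complete binary tree (the paper's algorithm optimally places a constrained set of aggregation-capable switches; here we simply toggle whole switch levels):

```python
def uplink_loads(num_leaves, aggregating_levels):
    """Loads on the uplinks of a complete binary tree in which every leaf
    (worker) sends one unit of data toward the root. A switch at an
    aggregating level combines its children's inputs into a single unit
    (e.g., a partial in-network sum); other switches forward everything."""
    loads, level = [], 0
    units = [1] * num_leaves  # what each node at the current level sends up
    while len(units) > 1:
        loads.extend(units)   # each node's uplink carries what it sends
        parents = []
        for i in range(0, len(units), 2):
            incoming = units[i] + units[i + 1]
            parents.append(1 if level + 1 in aggregating_levels else incoming)
        units, level = parents, level + 1
    return loads

print(max(uplink_loads(8, set())))   # 4: congestion grows toward the root
print(max(uplink_loads(8, {1})))     # 2: one aggregating level halves it
print(max(uplink_loads(8, {1, 2})))  # 1: full in-network aggregation
```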


AutoMon: Automatic Distributed Monitoring for Arbitrary Multivariate Functions

Hadar Sivan, Moshe Gabel, Assaf Schuster

SIGMOD 22


Approaches for evaluating functions over distributed data streams are increasingly important as data sources become more geographically distributed.  However, existing methodologies are limited to small classes of functions, requiring non-trivial effort and substantial mathematical sophistication to tailor them to new functions.

In this work we present AutoMon, the first general solution to this problem.  AutoMon enables automatic, communication-efficient distributed monitoring of arbitrary functions. 

Given source code that computes a function from centralized data, the AutoMon algorithm approximates the function over the aggregate of distributed data streams, without centralizing data updates.

Our evaluation shows that AutoMon sends the same number of messages as state-of-the-art techniques, or fewer, when monitoring specific functions for which a hand-crafted distributed solution is known. AutoMon, however, is far more powerful: it automatically generates a communication-efficient distributed monitoring solution for arbitrary functions, e.g., monitoring deep neural network inference tasks, for which no non-trivial solution was previously known.


NRDelegationAttack: Complexity DDoS attack on DNS Recursive Resolvers

Shani Stajnrod, Yehuda Afek, Anat Bremler-Barr

USENIX Security 23


Malicious actors carrying out distributed denial-of-service (DDoS) attacks are interested in requests that consume a large amount of resources and provide them with ammunition. We present a severe complexity attack on DNS resolvers, where a single malicious query to a DNS resolver can significantly increase its CPU load. Even a few such concurrent queries can result in resource exhaustion and lead to a denial of its service to legitimate clients. This attack is unlike most recent DDoS attacks on DNS servers, which use communication amplification attacks where a single query generates a large number of message exchanges between DNS servers.

The attack described here involves a malicious client whose request to a target resolver is sent to a collaborating malicious authoritative server; this server, in turn, generates a carefully crafted referral response back to the (victim) resolver. The chain reaction of requests continues, leading to the delegation of queries. These ultimately direct the resolver to a server that does not respond to DNS queries. The exchange generates a long sequence of cache and memory accesses that dramatically increase the CPU load on the target resolver. Hence the name non-responsive delegation attack, or NRDelegationAttack.

We demonstrate that three major resolver implementations, BIND9, Unbound, and Knot, are affected by the NRDelegationAttack, and carry out a detailed analysis of the amplification factor on a BIND9 based resolver. As a result of this work, three common vulnerabilities and exposures (CVEs) regarding NRDelegationAttack were issued by these resolver implementations. We also carried out minimal testing on 16 open resolvers, confirming that the attack affects them as well.


How to Attack and Congest Delay-Sensitive Applications on the Cloud

Jhonatan Tavori

INFOCOM 2023


The delay and service-blocking experienced by users are critical measures of quality of service in real-time distributed systems. Attacks directed at such facilities aim at disrupting the service and hurting these metrics. Our goal is to characterize worst-case attacks on such systems.

We use queueing models to study attackers who wish to maximize damage while constrained by attack resources. A key question pertaining to systems design is whether a damage maximizing attack should focus on heavily affecting a small number of facilities or spread its efforts over many facilities.

We analyze attacks that damage the system resources (i.e., capacity attacks) and show that optimal capacity attacks are concentrated. We further use a max-min (attacker and defender) analysis where the defender can migrate requests in response to the attack. An intriguing result is that under certain conditions an optimal attack will spread its efforts over many sites, in contrast to the attack-concentration predictions of prior studies (which are agnostic to queueing delays).

We also address DDoS (or Flow) attacks where attackers create loads of dummy requests and send them to the system. We prove that concentrating the attack efforts is always the optimal strategy, regardless of whether the system reacts by migrating requests, in contrast to the capacity attacks.
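The concentration result for capacity attacks (absent migration) can be illustrated with two M/M/1 queues, since the mean sojourn time 1/(μ − λ) is convex in the capacity μ (all numbers below are made up):

```python
def total_delay(lams, mus):
    """Sum of M/M/1 mean sojourn times, 1/(mu - lambda), over all facilities."""
    assert all(m > l for l, m in zip(lams, mus)), "queues must remain stable"
    return sum(1 / (m - l) for l, m in zip(lams, mus))

lams = [0.5, 0.5]  # offered load at two facilities
mus = [1.0, 1.0]   # nominal service capacities
budget = 0.4       # total capacity the attacker can destroy

concentrated = total_delay(lams, [mus[0] - budget, mus[1]])
spread = total_delay(lams, [m - budget / 2 for m in mus])
print(f"concentrated: {concentrated:.2f}")  # 1/0.1 + 1/0.5 = 12.00
print(f"spread:       {spread:.2f}")        # 2 * 1/0.3    =  6.67
```

Concentrating the budget on one facility inflicts roughly twice the total delay here, matching the intuition that convex per-facility damage favors concentration.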

The talk is based on a paper accepted to IEEE INFOCOM 2023.


MAC Protocol for Massive MIMO based on Group Testing

George Vershinin


The challenge of effective scheduling and resource sharing for uplink communications with a base station (BS) becomes computationally prohibitive with the ever-increasing number of per-household devices and the growing number of antennas at the BS. Motivated by this challenge, we devise and study a multiple-access protocol for massive multiple-input-multiple-output (MIMO) systems that is robust to channel state information (CSI) estimation errors, based on sparse coding techniques originating in group testing (GT) and index modulation, for systems with non-cooperative, self-scheduling users. Our suggested scheme has extraordinarily low complexity and no scheduling overhead; hence it is not only reliable but also practical.

In this study, we analyze our scheme's bit-error rate, decoding error probability, scaling laws, system sum-rate, and complexity. Moreover, we numerically evaluate how our system scales with increasing active devices and signal-to-noise ratio (SNR). Finally, we present a matching converse result for a growing number of users (both active and inactive), many messages per user, and an increasing SNR under a linear minimum mean-squared error estimation algorithm. The order-optimality of our scheme is given by both achieving the converse result asymptotically and by comparing our sum-rate with the perfect CSI model's ergodic sum-rate.
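For intuition about the group-testing ingredient, here is an idealized noiseless on-off sketch with COMP-style decoding (a simplification; the paper's scheme operates over a massive-MIMO channel with CSI errors):

```python
import random

rng = random.Random(1)
num_users, num_slots = 12, 10
# Random sparse codebook: user i transmits an "on" symbol in exactly 3 slots.
codebook = [frozenset(rng.sample(range(num_slots), 3)) for _ in range(num_users)]

active = {2, 7}  # self-scheduled active users (no coordination with the BS)
# Over a noiseless on-off channel, the BS observes the OR of active codewords.
observed = set().union(*(codebook[u] for u in active))

# COMP decoding: declare a user active iff all of its slots were occupied.
decoded = {u for u in range(num_users) if codebook[u] <= observed}
print(sorted(decoded))  # always contains 2 and 7; rare false positives
                        # vanish as the number of slots grows
```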


A PHYsical Layer Security Architecture for Ethernet

Qing Wang / Adi Molkho


This talk discusses and proposes PHYSec, a security (encryption/decryption) architecture for the Ethernet physical layer that can be implemented in an optical module. Existing cybersecurity mechanisms, such as IPsec and MACsec, are implemented at layer 2 and above. Based on our evaluation, PHYSec provides strong privacy protection along with better bandwidth utilization, lower implementation complexity, and higher energy efficiency, especially beyond 400G.


Semi-Decentralized Federated Learning with Collaborative Relaying in Unreliable Networks

Michal Yemini, R. Saha, E. Ozfatura, D. Gündüz and A. J. Goldsmith

ISIT 2022


Intermittent client connectivity is one of the major challenges in centralized federated edge learning frameworks. Intermittently failing uplinks to the central parameter server (PS) can induce a large generalization gap in performance especially when the data distribution among the clients exhibits heterogeneity. To mitigate communication blockages between clients and the central PS, this talk introduces the concept of knowledge relaying wherein the successfully participating clients collaborate in relaying their neighbors' local updates to a central PS in order to boost the participation of clients with intermittently failing connectivity. I will accomplish this vision by presenting a collaborative relaying-based semi-decentralized federated edge learning framework where at every communication round each client first computes a local consensus of the updates from its neighboring clients and eventually transmits a weighted average of its own update and those of its neighbors to the PS. Further, we will discuss how to appropriately optimize these averaging weights to reduce the variance of the global update at the PS while ensuring that the global update is unbiased, consequently improving the convergence rate. I will validate the theoretical results and demonstrate that this proposed scheme is superior to the Federated averaging benchmark especially when data distribution among clients is non-iid. Finally, if time permits I will also discuss the privacy ramifications of the presented scheme and how to update the collaborative relaying to adhere to privacy constraints.
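The unbiasedness requirement can be sketched in its simplest form, inverse-probability weighting of intermittently received updates (scalar updates and made-up success probabilities; the actual scheme additionally relays neighbor consensuses and optimizes the averaging weights to reduce variance):

```python
import random

rng = random.Random(0)
updates = [1.0, 2.0, 3.0, 4.0]  # clients' local model updates (as scalars)
p = [0.9, 0.5, 0.7, 0.3]        # uplink success probability of each client
true_mean = sum(updates) / len(updates)

def aggregate_once():
    """One communication round: each update that arrives is scaled by 1/p_i
    (inverse-probability weighting), keeping the aggregate unbiased despite
    intermittently failing uplinks."""
    total = sum(u / pi for u, pi in zip(updates, p) if rng.random() < pi)
    return total / len(updates)

rounds = [aggregate_once() for _ in range(20000)]
estimate = sum(rounds) / len(rounds)
print(round(true_mean, 2), round(estimate, 2))  # estimate ~ 2.5 (unbiased)
```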