Below is a list of papers you can choose from when deciding on a seminar topic. This list is not intended to be exhaustive, and you are encouraged to find other papers that may be of interest. An ideal seminar covers its topic comprehensively and may need to draw on more than one paper. Topics are assigned on a first-come, first-served basis.
Seminars will be presented in class by groups of 2-3 students on Mar 06 / Mar 13. Each seminar must include a 20-30 minute presentation on the topic. Slides must be submitted for evaluation.
Systems
Overview of Apache Flink
Overview of Storm -- search literature
The Evolution of Stream Processing Systems
Overview of Amazon Kinesis -- search literature
Runtime Adaptation of Data Stream Processing Systems: The State of the Art
Overview of Arcon
Overview of Ray
NEPTUNE: Real Time Stream Processing for Internet of Things and Sensing Environments
MillWheel: Google stream processing
Stream Programming
Ambrosia Distributed Programming
Stream Programming with Hardware Acceleration
SPADE: The System S declarative stream processing engine. SIGMOD, 2008.
KSQL: SQL for stream processing on Kafka
Optimizing Stream Processing
Load Shedding in a Data Stream Manager. VLDB, 2003.
Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing. VLDB, 2007.
Data Aware Load Shedding. VLDB, 2022.
Hone: Mitigating Stragglers in Distributed Stream Processing With Tuple Scheduling. IEEE TPDS, 2021.
SODA: An Optimizing Scheduler for Large-Scale Stream-Based Distributed Computer Systems. Middleware, 2008.
Auto-pipelining for data stream processing. IEEE TPDS, 2012.
Runtime Adaptation of Data Stream Processing Systems: The State of the Art, 2021
Fault Tolerance
Highly-Available, Fault-Tolerant, Parallel Dataflows. SIGMOD, 2004.
Fault-tolerance in the Borealis distributed stream processing system. ACM TODS, 2008.
Language-level checkpointing support for stream processing applications. DSN, 2009.
Fault injection-based assessment of partial fault tolerance in stream processing applications. DEBS, 2011.
High-Availability Algorithms for Distributed Stream Processing. ICDE, 2005.
A Cooperative, Self-Configuring High-Availability Solution for Stream Processing. ICDE, 2007.
Robust Distributed Stream Processing. ICDE, 2013.
Towards Optimal Resource Allocation in Partial-Fault Tolerant Applications. INFOCOM, 2008.
Stream Mining
Papers of your choice to cover
Stream data pre-processing: Descriptive Statistics, Sampling, Sketches, Transforms, Quantization, Dimensionality Reduction
Stream data mining: Classification, Regression, Clustering, Frequent Pattern Mining
Previous Student Seminars
Overview of Apache Flink
Overview of Storm
Overview of Amazon Kinesis
STREAM: The Stanford Stream Data Manager. IEEE Data Engineering Bulletin, 2003.
Gigascope: A Stream Database for Network Applications. ACM SIGMOD, 2003.
The Design of the Borealis Stream Processing Engine. CIDR, 2005.
The CQL Continuous Query Language: Semantic Foundations and Query Execution. VLDB Journal, 2005.
SPADE: The System S declarative stream processing engine. SIGMOD, 2008.
Towards a streaming SQL standard. VLDB, 2008.
IBM Streams Processing Language: Analyzing Big Data in Motion. IBM Journal of Research and Development, 2013.
SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems. VLDB, 2010.
Operator Scheduling in a Data Stream Manager. VLDB, 2003.
Flux: An adaptive partitioning operator for continuous query systems. IEEE ICDE, 2003.
Providing Resiliency to Load Variations in Distributed Stream Processing. VLDB, 2006.
Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing. VLDB, 2007.
SODA: An Optimizing Scheduler for Large-Scale Stream-Based Distributed Computer Systems. Middleware, 2008.
Load Shedding in a Data Stream Manager. VLDB, 2003.
Efficient Construction of Compact Shedding Filters for Data Stream Processing. ICDE, 2008.
Elastic scaling of data parallel operators in stream processing. IEEE IPDPS, 2009.
Auto-pipelining for data stream processing. IEEE TPDS, 2012.
Auto-parallelizing stateful distributed streaming applications. PACT, 2012.
Highly-Available, Fault-Tolerant, Parallel Dataflows. SIGMOD, 2004.
Fault-tolerance in the Borealis distributed stream processing system. ACM TODS, 2008.
Language-level checkpointing support for stream processing applications. DSN, 2009.
Fault injection-based assessment of partial fault tolerance in stream processing applications. DEBS, 2011.
High-Availability Algorithms for Distributed Stream Processing. ICDE, 2005.
A Cooperative, Self-Configuring High-Availability Solution for Stream Processing. ICDE, 2007.
Robust Distributed Stream Processing. ICDE, 2013.
Towards Optimal Resource Allocation in Partial-Fault Tolerant Applications. INFOCOM, 2008.
Overview of Apache Kafka (Kafka documentation)
Edge computing: Vision and challenges
Data-driven Stream Processing at the Edge
Realtime Data Processing at Facebook