Tools and Advice
Advice for Ph.D. Students
A PhD Is Not Enough!: A Guide to Survival in Science by Peter J. Feibelman
The Ph.D. Grind by Philip Guo
Advice Collection maintained by Tao Xie and Yuan Xie
Develop A Research Career by Francine Berman
Papers for Deep Learning Systems
Narayanan, Deepak, et al. "PipeDream: Generalized pipeline parallelism for DNN training." Proceedings of the 27th ACM Symposium on Operating Systems Principles. 2019.
Huang, Yanping, et al. "GPipe: Efficient training of giant neural networks using pipeline parallelism." Advances in Neural Information Processing Systems. 2019.
Paszke, Adam, et al. "PyTorch: An imperative style, high-performance deep learning library." Advances in Neural Information Processing Systems. 2019.
Agrawal, Akshay, et al. "TensorFlow Eager: A multi-stage, Python-embedded DSL for machine learning." arXiv preprint arXiv:1903.01855 (2019).
Jeong, Eunji, et al. "JANUS: Fast and Flexible Deep Learning via Symbolic Graph Execution of Imperative Programs." 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). 2019.
Sergeev, Alexander, and Mike Del Balso. "Horovod: fast and easy distributed deep learning in TensorFlow." arXiv preprint arXiv:1802.05799 (2018).
Chen, Tianqi, et al. "TVM: An automated end-to-end optimizing compiler for deep learning." 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 2018.
Lian, Xiangru, et al. "Can decentralized algorithms outperform centralized algorithms? a case study for decentralized parallel stochastic gradient descent." Advances in Neural Information Processing Systems. 2017.
Zhang, Hao, et al. "Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters." 2017 USENIX Annual Technical Conference (USENIX ATC 17). 2017.
Goyal, Priya, et al. "Accurate, large minibatch SGD: Training ImageNet in 1 hour." arXiv preprint arXiv:1706.02677 (2017).
Abadi, Martín, et al. "TensorFlow: A system for large-scale machine learning." 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 2016.
Chen, Tianqi, and Carlos Guestrin. "XGBoost: A scalable tree boosting system." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
Kumar, Arun, Jeffrey Naughton, and Jignesh M. Patel. "Learning generalized linear models over normalized data." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 2015.
Chen, Tianqi, et al. "MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems." arXiv preprint arXiv:1512.01274 (2015).
Xing, Eric P., et al. "Petuum: A new platform for distributed machine learning on big data." IEEE Transactions on Big Data 1.2 (2015): 49-67.
Li, Mu, et al. "Scaling distributed machine learning with the parameter server." 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). 2014.
Chilimbi, Trishul, et al. "Project Adam: Building an efficient and scalable deep learning training system." 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). 2014.
Ho, Qirong, et al. "More effective distributed ML via a stale synchronous parallel parameter server." Advances in Neural Information Processing Systems. 2013.
Dean, Jeffrey, et al. "Large scale distributed deep networks." Advances in neural information processing systems. 2012.
Recht, Benjamin, et al. "Hogwild: A lock-free approach to parallelizing stochastic gradient descent." Advances in neural information processing systems. 2011.
Papers for Machine Learning Systems Based on Relational Database, Spark, MapReduce
Jankov, Dimitrije, et al. "Declarative recursive computation on an RDBMS: or, why you should use a database for distributed machine learning." Proceedings of the VLDB Endowment 12.7 (2019): 822-835.
Luo, Shangyu, et al. "Scalable linear algebra on a relational database system." IEEE Transactions on Knowledge and Data Engineering 31.7 (2018): 1224-1238.
Boehm, Matthias, et al. "On optimizing operator fusion plans for large-scale machine learning in SystemML." Proceedings of the VLDB Endowment 11.12 (2018): 1755-1768.
Pansare, Niketan, et al. "Deep Learning with Apache SystemML." arXiv preprint arXiv:1802.04647 (2018).
Boehm, Matthias, et al. "SystemML: Declarative machine learning on Spark." Proceedings of the VLDB Endowment 9.13 (2016): 1425-1436.
Zhang, Ce, Arun Kumar, and Christopher Ré. "Materialization optimizations for feature selection workloads." ACM Transactions on Database Systems (TODS) 41.1 (2016): 1-32.
Meng, Xiangrui, et al. "MLlib: Machine learning in Apache Spark." The Journal of Machine Learning Research 17.1 (2016): 1235-1241.
De Sa, Christopher, et al. "DeepDive: Declarative knowledge base construction." ACM SIGMOD Record 45.1 (2016): 60-67.
Kumar, Arun, Jeffrey Naughton, and Jignesh M. Patel. "Learning generalized linear models over normalized data." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 2015.
Niu, Feng, et al. "DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference." VLDS 12 (2012): 25-28.
Low, Yucheng, et al. "Distributed GraphLab: a framework for machine learning and data mining in the cloud." Proceedings of the VLDB Endowment 5.8 (2012): 716-727.
Feng, Xixuan, et al. "Towards a unified architecture for in-RDBMS analytics." Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. 2012.
Ghoting, Amol, et al. "SystemML: Declarative machine learning on MapReduce." 2011 IEEE 27th International Conference on Data Engineering. IEEE, 2011.
Brown, Paul G. "Overview of SciDB: large scale array storage, processing and analysis." Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. ACM, 2010.
Papers for End-to-End Machine Learning
Kunft, Andreas, et al. "An intermediate representation for optimizing machine learning pipelines." Proceedings of the VLDB Endowment 12.11 (2019): 1553-1567.
Boehm, Matthias, et al. "SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle." arXiv preprint arXiv:1909.02976 (2019).
Palkar, Shoumik, et al. "Evaluating end-to-end optimization for data analytics applications in Weld." Proceedings of the VLDB Endowment 11.9 (2018): 1002-1015.
Miao, Hui, et al. "Towards unified data and lifecycle management for deep learning." 2017 IEEE 33rd International Conference on Data Engineering (ICDE). IEEE, 2017.
Baylor, Denis, et al. "TFX: A TensorFlow-based production-scale machine learning platform." Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017.
Sculley, David, et al. "Hidden technical debt in machine learning systems." Advances in neural information processing systems. 2015.
Papers for UDF-Centric Analytics
Crotty, Andrew, et al. "An architecture for compiling UDF-centric workflows." Proceedings of the VLDB Endowment 8.12 (2015): 1466-1477.
Palkar, Shoumik, et al. "Weld: A common runtime for high performance data analytics." Conference on Innovative Data Systems Research (CIDR). 2017.
Rheinländer, Astrid, et al. "Optimization of complex dataflows with user-defined functions." ACM Computing Surveys (CSUR) 50.3 (2017): 1-39.
Zou, Jia, et al. "Lachesis: Automatic partitioning for UDF-centric analytics." Proceedings of the VLDB Endowment 14.8 (2021): 1262-1275.
Foufoulas, Yannis, et al. "YeSQL: 'You extend SQL' with rich and highly performant user-defined functions in relational databases." Proceedings of the VLDB Endowment 15.10 (2022): 2270-2283.
Sichert, Moritz, et al. "User-defined operators: Efficiently integrating custom algorithms into modern databases." Proceedings of the VLDB Endowment 15.5 (2022): 1119-1131.
Foufoulas, Yannis, et al. "Efficient execution of user-defined functions in SQL queries." Proceedings of the VLDB Endowment 16.12 (2023): 3874-3877.