Literature

The following study materials are required readings for the written exam:
  1. Pritzker, P., & May, W. (2015). NIST Big Data Interoperability Framework (NBDIF): Volume 1: Definitions. NIST Special Publication 1500-1. Final Version 1. National Institute of Standards and Technology.
  2. Shenoy, A. (2014). Hadoop Explained: An introduction to the most popular Big Data platform in the world. Packt Publishing.
  3. Dean, J., & Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113.
  4. Ghemawat, S., Gobioff, H., & Leung, S. (2003). The Google file system. SIGOPS Operating Systems Review, 37(5), 29-43.
  5. Spruit, M., & Jagesar, R. (2016). Power to the People! Meta-algorithmic modelling in applied data science. In Fred, A. et al. (Eds.), Proc. 8th Int. Conf. on Knowledge Discovery (pp. 400–406). KDIR 2016, November 11-13, 2016, Porto, Portugal: SciTePress.
  6. Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203-1205.
In addition, the following three materials form the course foundation and are therefore required background reading:
  • Davenport, T. H., & Patil, D. J. (2012). Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, 90(5), 70-76.
  • Stair, R., & Reynolds, G. (2012). Fundamentals of Information Systems. Sixth Edition. Cengage: Boston, MA. ISBN-13: 978-0-8400-6218-5. NOTE: Chapters 1 and 3 ONLY (Information Systems in Perspective; Database Systems, Data Centers, and Business Intelligence). More recent editions are fine as well.
  • Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., & Wirth, R. (2000). CRISP-DM 1.0 Step-by-step Data Mining Guide. [@IBM]
Finally, various literature is recommended throughout the course, including but not limited to:
  • White, T. (2016). Hadoop: The Definitive Guide. Third edition. O'Reilly.
  • Chambers, B., & Zaharia, M. (in press). Apache Spark - The Definitive Guide. Excerpts from the upcoming book, Release 1, Databricks.
  • Manyika, J. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute, McKinsey & Company.
  • Linden, A., Krensky, P., Hare, J., Idoine, C., Sicular, S., & Vashisth, S. (2017). Magic Quadrant for Data Science Platforms. Gartner.
  • <for more, see the course slides on Slack's #materials channel>