Course schedule

Join us on Slack!

Announcements

  • Extra time (update) Based on the current status of a number of you with respect to the Mid-term Assignment, we have decided to give you an extra week to complete the assignment. The new DEADLINE for submitting the Mid-term assignment is now Tuesday Oct 17 at 15:00 CET. Furthermore, we will provide more step-by-step guidance in the workshop on Tue Oct 10 before the regular lecture, for those of you who have trouble with Python, the command line, and the like. Finally, some tips that emerged from the questions you had this morning: Use Debugging tip #1 in the lecture slides: test each mapper script first from the local machine's command line to see the actual output it results in, which will ...
    Posted 14 Oct 2017, 08:04 by Marco Spruit
  • Midterm report template (update) Here is a template for the written report with rationales, to accompany your midterm assignment. Good luck! UPDATE: When submitting your assignment, email it to me as a ZIP file containing at least the report PDF, the collection of complete scripts, and the outputs of each script. The latter could also be in the report, though. The raw input data I'd rather not receive 50 times ;-) even though neo.csv won't take that much space when compressed anyway. Note that other combinations of submission items are possible; just use your common sense (and document it) to submit all the materials we need to convince us that you did the good job that you did.
    Posted 16 Oct 2017, 11:55 by Marco Spruit
  • An inspiring pitch event Here are some random thoughts from our Pitch Event earlier this week... We had 16 five-minute pitches in two parallel sessions, so I missed half of them ;-) From what I heard afterwards, the other session was almost as good as the one I was in. I was pleased to see that most of you PLUS-ed the pitch delivery using techniques like asking questions to the audience to engage them, writing down key words on the whiteboard, providing analogies to better communicate key ideas, and much more. Apparently it worked, because the audience had questions for all pitchers, as we hoped would happen. We had all sorts of pitching styles: some were naturally casual, or business-y professional, or highly energetic ...
    Posted 21 Sep 2017, 08:34 by Marco Spruit
  • Book review assignment Well... it was surprisingly more complicated than last year to assign everyone a book from their Top-3 list. Bruce Schneier's Data and Goliath: The hidden battles to collect your data and control your world was the winner with 4 #1 positions. Quite a few of you had more or less similar books in mind. I did manage to assign everyone a Top-3 book, though I think around 10 people have been assigned their third choice. C'est la vie! Enjoy...
    Posted 8 Sep 2017, 13:11 by Marco Spruit

Assignments

  • End-term assignment: Epidemiology The goal of this final assignment is to demonstrate your familiarity and hands-on experience with big data technology tools, in casu Spark, either using Python or R, to the extent that you can apply these tools to answer real-life questions from domain experts. This end-term assignment of the INFOMDSS course constitutes 30% of your final grade. Note that this assignment is an individual endeavour. We do check your scientific integrity… This assignment has been kindly proposed by the Epidemiology staff of the UMCU/Julius Center. The assignment consists of four parts. The first part “Preprocess data” requires several basic, generic and essential Spark data processing skills to complete. Upon completion (in a correct and well-described ...
    Posted 17 Oct 2017, 11:27 by Marco Spruit
  • Mid-term Assignment: Neonatology Below you'll find the mid-term assignment on processing and analysing Neonatology data within the Hadoop distributed computing environment. This is your chance to demonstrate your familiarity and hands-on experience with big data technology tools, in casu the Hadoop ecosystem, to the extent that you can apply these tools to answer real-life questions from domain experts! This mid-term assignment of the course constitutes 25% of your final grade. Note that this assignment is an individual endeavour. This assignment has been kindly proposed by the Neonatology staff of the UMCU. See the attachment for all details and tasks. We are aware that some of the tasks are probably much easier to perform within other environments. However ...
    Posted 2 Oct 2017, 13:24 by Marco Spruit
  • Tutorial 4: Spark This is the fourth in our series of tutorial exercises -- Spark: again the ubiquitous example -- to help you get started and acquainted with the Hadoop big data technologies environment. As mentioned on the last slide, and as communicated earlier as well, it is required to go through these tutorials, even though you won't be graded for doing them. Below (and on the final slide in the attachment) is our lightweight solution to communicate to us that you have completed the tutorial: Upload a screendump of the VM in a non-maximised window, displaying the command line window after executing the ls -l command in the main user directory, made with the PrintScreen (PS) button on your keyboard (thus showing ...
    Posted 21 Sep 2017, 06:44 by Marco Spruit
  • Tutorial 3: MapReduce This is the third in a series of tutorial exercises -- Map/Reduce: its ubiquitous example -- to help you get started and acquainted with the Hadoop big data technologies environment. As mentioned on the last slide, and as communicated earlier as well, it is required to go through these tutorials, even though you won't be graded for doing them. Below (and on the final slide in the attachment) is our lightweight solution to communicate to us that you have completed the tutorial: Upload a screendump of the VM in a non-maximised window, displaying the command line window after executing the ls -l command in the main user directory, made with the PrintScreen (PS) button on your keyboard (thus showing ...
    Posted 21 Sep 2017, 06:46 by Marco Spruit
  • Tutorial 2: Pig & Hive This is the second installment in a series of tutorial exercises to help you get started and acquainted with the Hadoop big data technologies environment. As mentioned on the last slide, and as communicated earlier as well, it is required to go through these tutorials, even though you won't be graded for doing them. Below (and on the final slide in the attachment) is our lightweight solution to communicate to us that you have completed the tutorial: Upload a screendump of the VM in a non-maximised window, displaying the command line window after executing the ls -l command in the main user directory, made with the PrintScreen (PS) button on your keyboard (thus showing the entire screen incl ...
    Posted 20 Sep 2017, 08:45 by Marco Spruit