CS410/510: Data Engineering

Mondays and Wednesdays: 14:00 - 15:50
Room: Lincoln Hall #247 (zoom: http://pdx.zoom.us/my/birvin)

Bruce Irvin - Office Hours: Wednesdays 13:00 - 14:00 in/near classroom, and on zoom by appointment

TA - Vysali Kallepalli (vysali@pdx.edu)
Office hours by appointment: vysali.youcanbook.me

Slack channel: #dataeng-spring-2024-sec1 in the birvin.slack.com workspace

Looking for the Tuesday/Thursday section of Data Engineering with Mina Vu? Find it here.

Course Description

This course explores the challenges of designing, building and maintaining data processing pipelines. We focus on concepts, techniques and technologies for gathering, validating, transporting, transforming, enhancing, storing, integrating and maintaining the diverse data sets common to modern enterprises. Throughout the course, we tie the material to relevant ethical issues in the gathering, storing and processing of data.

Prerequisites

CS 486/586 - Introduction to Database Management Systems (recommended)

This class requires programming in Python. If you don't know Python then you will need to spend additional time learning it.

Goals

Upon the successful completion of this class, students will be able to:

This course focuses on fundamental concepts which are independent of today’s technologies. Most course assignments and projects will be implemented in Python, so the student will gain considerable practice with Python and prepare themselves for learning/using new data engineering technologies.

Textbook

This course will use readings from research papers, the popular press and technical publications.

Topics

Project

Students will work in groups (or alone) to build, develop, test and monitor a small-scale data pipeline. The project will include four assignments with each assignment emphasizing a specific aspect of Data Engineering.

Grading

(the following is subject to change)

50%: Class Participation

Each week, your class participation score will be based on your attendance and your participation in in-class work activities. Class participation also includes two peer code reviews, in which your code is critiqued and you review your teammates' code.

50%: Project

You can do the project alone or in a group of up to 4 students. You will build, develop, test and monitor a small-scale data pipeline. The project will be split into four parts, with each part related to a specific sub-area of Data Engineering.

Grade Scale for this Class:

A=93+, A-=90+, B+=87+, B=83+, B-=80+, C+=77+, C=73+, C-=70+
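As an illustration only, the weights and cutoffs above can be sketched in Python. This sketch assumes that each component is scored from 0 to 100 and that no rounding is applied; neither assumption is stated in this syllabus, and the function names are hypothetical. The actual grade computation may differ, and the scheme is subject to change.

```python
# Cutoffs copied from the grade scale above, highest first.
GRADE_CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
    (80, "B-"), (77, "C+"), (73, "C"), (70, "C-"),
]


def overall_score(participation, project):
    """Combine the two components using the 50/50 weights.

    Assumption: both inputs are on a 0-100 scale.
    """
    return 0.5 * participation + 0.5 * project


def letter_grade(score):
    """Return the letter for a 0-100 score, or None below the listed scale."""
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return None  # the scale above lists no letter below C-


if __name__ == "__main__":
    # 95 for participation and 88 for the project average to 91.5.
    print(letter_grade(overall_score(95, 88)))  # prints A-
```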

We do not plan to use Canvas in this course. Assignment scores will be provided via shared documents.

Grading for Late Submissions

Policies

Attendance: Class attendance (in person or via zoom) is required. There will be weekly in-class activities with an emphasis on writing code. You may attend up to four class periods remotely; the rest of your attendance must be in person in the classroom.

Assignments are due prior to the posted deadline. If an extraordinary situation (for example, hospitalization) prevents you from working for a period of time, contact me as soon as possible to discuss your situation and arrange a modified schedule.

Requests for regrading must be submitted to me in writing within one week of the time the graded assignment was made available to you. Be specific in saying why you feel your answer deserves additional credit. A request for regrade may result in a re-evaluation of the entire assignment and your total grade may increase or decrease as a result.

Makeup assignments will only be given in cases of severe medical or family emergencies. You must contact the instructor to arrange for a special circumstance. Note: personal or business travel is not considered a valid excuse for missing an assignment. If you know in advance that you will miss class or miss an assignment then contact me immediately to discuss.

Disability Resource Center: If you have, or think you may have, a disability that may affect your work in this class and feel you need accommodations, contact the Disability Resource Center: 503-725-4150, drc@pdx.edu, https://www.pdx.edu/drc.

If you already have accommodations, please contact me to make sure that I have received a faculty notification letter and discuss your accommodations.

Academic Honesty

Do not submit work as your own if you did not create it yourself. This includes assignments in which a substantial amount of the material was done by someone else. Some assignments will be done in teams; be sure to list all team members' names when submitting these. When working alone, students need to be especially careful that, in the process of discussing problems with other students, they do not end up using someone else's work. Similarly, failing to cite a source that contributed substantially to the solution of a problem is also considered cheating. Any literature consulted should be referenced precisely. Do not make your solutions available to other students in the class.

In the event a case of cheating is discovered, the student will receive a score of zero (0) for that assignment.