DS220: Data Management for Data Sciences (DM4DS)
Logistics
Time: Mon/Wed 6pm-7:15pm
Location: E205 Westgate
Instructor: Dongwon Lee (dul13). Office Hour: WED 3-5pm @ E353 Westgate
TA: Tracy Shen (jqs5443). Office Hour: TUE 1:30-3:30pm @ E343 Westgate
Overview
The course will introduce advanced relational databases and issues/techniques related to managing (large-scale) non-relational data. It builds upon knowledge gained in IST 210 Organization of Data. This course has two major components:
Advance students’ knowledge of relational databases and their skills in using SQL and database indexing
Introduce NoSQL databases such as document-oriented, key-value, column-oriented, and graph databases
In the first component, the course will review the techniques learned in IST 210, strengthen students’ skills in using advanced SQL queries, and introduce students to indexing and scalability issues in relational databases.
While relational databases are quite practical for many situations, the needs of big data have driven a new class of other data storage methods with different strengths and weaknesses. This course will introduce students to non-relational data storage methods and the characteristics that distinguish these tools from relational databases. We will introduce both the concepts of NoSQL databases and how the concepts are implemented in the various open-source database systems, including models such as key-value, column family, document and graph databases.
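As a rough illustration of the distinction drawn above, the sketch below stores the same record once in a relational table and once as a document under a key. It is only a toy: Python's built-in `sqlite3` module and a plain dictionary stand in for the database systems used in the course, and the `student` table, its columns, and the `student:s1` key are hypothetical names chosen for the example.

```python
import json
import sqlite3

# Relational style: a fixed schema declared up front, queried with SQL.
# (The table and column names are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, major TEXT)")
conn.execute("INSERT INTO student VALUES (?, ?, ?)", ("s1", "Alice", "DS"))
row = conn.execute(
    "SELECT name, major FROM student WHERE id = ?", ("s1",)
).fetchone()

# Key-value / document style: a self-describing value looked up by key.
# A dict stands in for the store; the value may carry fields (such as a
# list of clubs) that no schema declared in advance.
store = {}
store["student:s1"] = json.dumps({"name": "Alice", "major": "DS", "clubs": ["AI"]})
doc = json.loads(store["student:s1"])

print(row)           # the relational row as a tuple
print(doc["clubs"])  # a nested field from the document
```

The relational version requires every row to fit the declared columns, while the document version accepts whatever structure each value happens to carry.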
Learning Objectives
At the completion of this course, students are expected to be able to:
Identify the characteristics, strengths, and weaknesses of relational, key-value, column-oriented, graph, and vector space data models and databases
Choose an appropriate data model and database solution for a given application
Use the chosen database to organize, manage, query, and use data
Textbook
This course does NOT have a mandatory textbook. Instead, the instructor will draw course materials from various sources. However, the following books could be useful for understanding the course materials better:
Fundamentals of Data Management Systems (2nd Edition), 2011
Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement (2nd Edition), 2018
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems (1st Edition), 2018
DBMS
This course uses Microsoft SQL Server in IST's VLabs environment. If you are within the PSU IP domain, you can read related books from O'Reilly's Safari for free. Our choice of DBMS has nothing to do with the superiority of the product. Therefore, if you are already familiar with another one, you are free to use a different DBMS for your projects (e.g., MySQL, PostgreSQL, mSQL, DB2, or Oracle).
Projects
Two hands-on team projects are planned:
Project #1: Proj1 (Nittany AI Challenge)
Project #2: Proj2 (App using NoSQL DB)
Grading Weights
Subject to change:
Attendance (randomly checked; need at least 90% attendance) and Class participation: 5%
Assignments: 45%
HW #1: 6%
HW #2: 6%
HW #3: 6%
Project #1: 10% (Bonus for Nittany AI Challenge Phase 2: 3%)
Project #2: 17% (Bonus for invited team presentation: 1%)
Exams: 50%
Midterm: 20%
Final: 30%
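As a rough sketch of how the weights above combine into a course percentage, the snippet below computes a weighted total. The component names and the convention of recording each score as a fraction between 0 and 1 are assumptions made for illustration, and the bonus items are left out. Since the weights are subject to change, treat this as an estimate only.

```python
# Weights copied from the list above, as percent of the final grade.
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "participation": 5,
    "hw1": 6,
    "hw2": 6,
    "hw3": 6,
    "project1": 10,
    "project2": 17,
    "midterm": 20,
    "final": 30,
}


def weighted_total(scores):
    """Return the course percentage for a dict of fractional scores.

    `scores` maps a component name to the fraction earned (0.0 to 1.0).
    Components missing from `scores` count as zero. Bonus points are
    not modeled here.
    """
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)


# The weights should account for the whole grade.
print(sum(WEIGHTS.values()))
```

For example, `weighted_total({"midterm": 0.5})` counts half of the midterm's 20% and nothing else.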
Grading Scale & Curving
The instructor will use both the fixed grading scale below and curved grading to determine the final course grade.
Grade   Percentage       Points
---------------------------------------------
A       94% to 100%      940 to 1000
A-      90% to 93.9%     900 to 939
B+      87% to 89.9%     870 to 899
B       83% to 86.9%     830 to 869
B-      80% to 82.9%     800 to 829
C+      77% to 79.9%     770 to 799
C       70% to 76.9%     700 to 769
D       60% to 69.9%     600 to 699
F       Less than 60%    Less than 600
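The fixed scale above can be read as a lookup from a percentage to a letter. The sketch below treats each row's lower bound as a cutoff, which is an assumption about how values between rows (such as 93.95%) are handled. It does not model the curve, so the actual final grade may differ.

```python
# (minimum percentage, letter) pairs from the fixed scale, highest first.
SCALE = [
    (94, "A"),
    (90, "A-"),
    (87, "B+"),
    (83, "B"),
    (80, "B-"),
    (77, "C+"),
    (70, "C"),
    (60, "D"),
]


def letter_grade(percentage):
    """Map a course percentage to a letter using the fixed scale only."""
    for cutoff, letter in SCALE:
        if percentage >= cutoff:
            return letter
    return "F"


print(letter_grade(88.5))
```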
Assignment Submission Policy
Homework and projects are usually assigned during the WED class
Assignments are due by default on SUN at 11:59pm (EST)
Students can submit late with a penalty of a 25% deduction for every 12 hours late (up to 2 days)
After 2 days, late submissions are no longer accepted
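The late policy above can be sketched as a small function. Two points are assumptions rather than stated policy: a partial 12-hour period is rounded up to a full one, and the 25% is deducted from full credit for each period rather than compounded. Please confirm with the instructor how partial periods are actually counted.

```python
import math

PENALTY_PER_PERIOD = 25  # percent deducted per 12-hour period
PERIOD_HOURS = 12
MAX_LATE_HOURS = 48      # 2 days; nothing is accepted after this


def late_penalty(hours_late):
    """Return the percent deducted, or None if no longer accepted.

    Assumption: any started 12-hour period counts as a full period.
    """
    if hours_late <= 0:
        return 0
    if hours_late > MAX_LATE_HOURS:
        return None
    periods = math.ceil(hours_late / PERIOD_HOURS)
    return min(100, periods * PENALTY_PER_PERIOD)


print(late_penalty(5))   # within the first 12-hour period
print(late_penalty(50))  # past the 2-day limit
```

Under these assumptions, a submission 13 hours late falls in the second period and loses 50%.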
Academic Integrity
According to the Penn State Principles and University Code of Conduct: Academic integrity is a basic guiding principle for all academic activity at Penn State University, allowing the pursuit of scholarly activity in an open, honest, and responsible manner. In accordance with the University’s Code of Conduct, you must not engage in or tolerate academic dishonesty. This includes, but is not limited to, cheating, plagiarism, fabrication of information or citations, facilitating acts of academic dishonesty by others, unauthorized possession of examinations, submitting work of another person, or work previously used without informing the instructor, or tampering with the academic work of other students. Any violation of academic integrity will be investigated, and where warranted, punitive action will be taken. For every incident when a penalty of any kind is assessed, a report must be filed.
Plagiarism (Cheating): Talking over your ideas and getting comments on your writing from friends are NOT examples of plagiarism. Taking someone else's words (published or not) and calling them your own IS plagiarism. Plagiarism has dire consequences, including flunking the paper in question, flunking the course, and university disciplinary action, depending on the circumstances of the offense. The simplest way to avoid plagiarism is to document the sources of your information carefully.
Disability Access Statement
Americans with Disabilities Act: The School of Information Sciences and Technology welcomes persons with disabilities to all of its classes, programs, and events. If you need accommodations, or have questions about access to buildings where IST activities are held, please contact us in advance of your participation or visit. If you need assistance during a class, program, or event, please contact the member of our staff or faculty in charge. Access to IST courses should be arranged by contacting the Office of Human Resources, 332 IST Building: (814) 865-8949.
Students with Disabilities: It is Penn State’s policy to not discriminate against qualified students with documented disabilities in its educational programs. (You may refer to the Nondiscrimination Policy in the Student Guide to University Policies and Rules.) If you have a disability-related need for reasonable academic adjustments in this course, contact the Office for Disability Services (ODS) at 814-863-1807 (V/TTY). For further information regarding ODS, please visit the Office for Disability Services Web site at http://equity.psu.edu/ods/.
In order to receive consideration for course accommodations, you must contact ODS and provide documentation (see documentation guidelines at http://equity.psu.edu/ods/guidelines/documentation-guidelines). If the documentation supports the need for academic adjustments, ODS will provide a letter identifying appropriate academic adjustments. Please share this letter and discuss the adjustments with your instructor as early in the course as possible. You must contact ODS and request academic adjustment letters at the beginning of each semester.
Statement on Nondiscrimination & Harassment (Policy AD42)
The Pennsylvania State University is committed to the policy that all persons shall have equal access to programs, facilities, admission and employment without regard to personal characteristics not related to ability, performance, or qualifications as determined by University policy or by state or federal authorities. It is the policy of the University to maintain an academic and work environment free of discrimination, including harassment. The Pennsylvania State University prohibits discrimination and harassment against any person because of age, ancestry, color, disability or handicap, national origin, race, religious creed, sex, sexual orientation, gender identity or veteran status. Discrimination or harassment against faculty, staff or students will not be tolerated at The Pennsylvania State University. You may direct inquiries to the Office of Multicultural Affairs, 332 Information Sciences and Technology Building, University Park, PA 16802; Tel 814-865-0077 or to the Office of Affirmative Action, 328 Boucke Building, University Park, PA 16802-5901; Tel 814-865-4700/V, 814-863-1150/TTY.
For reference to the full policy (Policy AD42: Statement on Nondiscrimination and Harassment): http://guru.psu.edu/policies/AD42.html