Courses in Fall 2004
CS 331-F24
DESCRIPTION
This course is a continuation of the study of internal and external data structures in algorithms with an on-going emphasis on the application of software engineering principles. Trees, graphics and the basic algorithms for creating, manipulating and using them will be covered. Various types of hash and index random access file structures will be discussed and implemented. B-trees and internal file storing will be introduced. Internal and external data-file organization and algorithm will be compared and analyzed. Students will carry out a number of programming projects which will include the various interface aspect of the software development process. Prerequisites : CS 112
GOALS
- Apply software engineering practices learned in CS 112 to larger, inter-related programs.
- Reinforce skills in the interface aspects of software development.
- Further develop knowledge and implementation experience of various types of binary search trees and their uses.
- Introduce graphics and simple graph algorithms and discuss implementation issues.
- Develop an appreciation of internal vs. external data storage/access time/space consideration.
- Provide an understanding of and programming experience using random access file structures to enable appropriate choice to be made.
PHILOSOPHY
- Software engineering issues will continue to be incorporated in the assignments and the discussion of them, building on students' CS 112 background.
- The applied aspects of data and file structures will be emphasized.
- An empirical/"intuitive appreciation of time/space complexity issues will be emphasized leaving the formal aspects to CS 531.
- Class time will not be spent on advance language issues since it is assumed that the students have sufficient experience from CS 112.
- Students will be aspected to demonstrate use of good software development practices having had sufficient experience (reinforce with adequate evaluation/grading of such) in CS 112.
PRERQUISITES BY TOPIC:
(Assumes CS 112 as 4 hour course including laboratory.)
- Understanding of:
- binary vs. ASCII files, ASCII/integer representation/conversion.
- trees in general and binary search trees
- various implemetations, search, update and traversals
- tape and disk characteristics including
- time/space calculations
- blocking and buffering
- Understanding and actual programming of :
- using binary files and file dumps
- high-level language advanced concepts
including
- include files
- external subroutines, parameter passing, local/global scope, run-time library subroutines
- files of records (i.e., GET/PUT whole record)
- variant record files
- generic parameter passing, file parameter passing
- sequential and random file handling commands
- linked structures (linked lists and stacks or queues)
- internal searching (linear, binary)
- internal sorting (bubble, insertion, selection, merge, quick, heap)
- sequential file handling (batch update algorithm)
- simple random file handling (e.g., a direct address file) : create, print, query, update of direct address relative file
- Understanding and actual experience of (including "white box" grading of) software engineering aspects :
- use good programming practices (e.g., generic code instead of near duplicate routines, good modularizing, main as big-picture, good data/subroutine naming, shallow nesting, visual clarity, efficient code, selecting the best code structure, etc.)
- follow detailed data, interface, program and system specifications
- re-use modules, modify existent programs (student's own work)
- build up a library of commonly used subroutines
- interface with other students' and instructor's work (e.g., re-use/modify others' code, share external modules, query file created by another student, test other student's program)
- critique others' code
- do program maintenance
- design algorithms which mimic a manual process
- use hierarchy chart, IPO diagrams, the basics of macro-level flowchart
- write experimental programs to test how language details work
TOPICS
Trees and Graphs
- binary search trees : perfectly balance, weighted, 2-3 trees (or AVL) search, insert, delete, implemetation issues (3hrs)
- heaps (review) for use in external sorting (1 hr)
- graphs : concepts, terminology and representations
- breadth-first/depth-first traversal, spanning trees, shortest paths (6 hrs)
Problem Solving Paradigms
- brief introduction to Greedy and Divide and Conquer (1 hr)
Hash Files (6 hrs)
- direct address files (review) as a special case of hash files
- hash functions and collision resolution techniques
- create, query, random_read_all, update functions
- hash tables and perfect hashing - time and space considerations
- implementation issues (e.g., tombstone, using blocking, 1 vs 2 files)
- advantages/disadvantages
Indexed Files
- VAX Pascal's (or VAX C's) built-in index file handling facilities (1 hr)
- indexed random (inverted) files (4 hrs)
- internal vs external index structures
- implementation issues, e.g., separate vs integrated index and data files, seperate key-value and data block pointer files
- create, seq_read_all, query, update
- time and space considerations
- advantages and disadvantages
- indexed sequential files (4 hrs)
- comparison with inverted files
- cylinder and surface (ISAM - briefly)
- index and data block (VSAM) algorithms & implementation issues
- create, seq_read_all, query, update
- time and space considerations
- advantages/disadvantages
- B trees and B+ trees for external indexes (3 hrs)
- create, seq_read_all, query, update
- implementation issues
- comparison of B and B+ trees
- simple analysis of block size
- efficiency issues
- multi-key files (multilists) - just and introduction/overview
External Sorting (3 hrs)
- external merge sort (balanced), using internal heap sort, using replacement selection (priority queue)
- efficiency considerations
Software Engineering issues
- integrated with other topics in conjunction with assignments (4 hrs)
Course intro and exams (4 hrs)
Courses in Fall 2005
CS3310 - F25
Mondays and Wednesdays, 4:00-5:15pm
INSTRUCTOR: Ajay Gupta - ajay.gupta@wmich.edu - B239 CEAS office hours: 1:30pm-2:50pm M and W and by appointment. CRN 40955
CATALOG DESCRIPTION: This course is a continuation of the study of internal and external data structures and algorithms with an on-going emphasis on the application of software engineering principles. Trees, graphs and the basic algorithms for creating, manipulating and using them will be covered. Various types of hash and indexed random access file structures will be discussed and implemented. B-trees and external file sorting will be introduced. Internal and external data/file organizations and algorithms will be compared and analyzed. Students will carry out a number of programming projects which will include the various interface aspects of the software development process.
PREREQUISITE Topics & Experiences:
Courses CS1110 (aka CS111) & CS1120 (aka CS112) (or equivalent “CS2” course which builds on a “CS1” course where both use the same programming language)
including the following:
- a thorough understanding of these concepts (including related algorithms):
- data structures: arrays (1 & 2-dim.), linked lists, stacks, queues, (possibly an intro to trees)
- other concepts: sequential access files, recursion, simple order of complexity, internal sorting
- a very good understanding of a programming language (i.e., 2 semester’s using the same language
as in CS1 & CS2) so that you can use it as well as look up, learn about, experiment with and
implement new aspects of that language
- actual programming experience (as an individual programmer, not mainly on group projects) using:
- array, linked list, stack or queue, record structures, sequential access file I/O
- ”extensive” programming experience including:
- the equivalent of 5+ programs in CS1 & 5+ programs in CS2 of a reasonable size and degree
of difficulty for that level
- all (or virtually all) programs done as individual projects, not team projects
- whole programs designed and written on your own, given a pictorial/verbal/written problem
description or pseudo-code algorithm by instructor (not just adding modules to
instructor’s prewritten code)
- tested & debugged programs on your own, making up your own test data
- followed detailed specifications for the program, the user interface and the deliverable
NOTE: If you did NOT have CS1120 (aka cs112) at WMU with a grade "C" or better (undergrad's) or a "B" or better (grads) you MUST SEE ME at the end of class by Wed. August 31, 2005 or you’ll be DROPPED from the class.
COURSE OBJECTIVES: The in-class “lectures” use an active, problem-solving approach, presenting material as a development process rather than a finished product. The course is assignment-driven – where problems reflect real-world applications (simplified). Each class starts with addressing students’ questions about the current project. New concepts are then introduced which will help solve the current problem – as well as considering advantages and disadvantages of alternative approaches and techniques. For maximum and efficient learning, you should begin working on an assignment as soon as it is given out. This allows you to get your questions answered in class - and it provides motivation for learning the new concepts and approaches presented in class (and in the readings). This course is designed to provide the knowledge and experiences to:
- learn new data & file structure concepts and algorithms (see “TOPICS” below)
- learn when and where these concepts would be used in real-word applications and programming contexts
- apply many of these concepts/algorithms by using them in programming projects
- do projects which give further programming practice beyond CS1120-level – i.e.,
- larger, more complex programs
- projects and concepts presented at a more abstract level than pseudo code (e.g., functionality)
- many language and program design structure details and choices left to the student to determine
- individual rather than team projects so everyone works through the full experience
- greater independence for testing/debugging (vs. direct 1-on-1 dependence on one’s TA)
- further develop their general programming skills including design, language specifics, testing,
debugging, format/style
- learn further details about their programming language through formal presentation of new material,
and through individual investigation of program behavior and debugging efforts
- acquire further programming support tools, skills and concepts (i.e., software engineering “in the small”)
- learn further problem-solving skills
- learn to follow written specifications (program, user interface, project deliverable)
- further develop a professional attitude towards the programming enterprise
LEARNING OUTCOMES: Students who earn a “C” or better in this course should be able to:
- knowledgeably converse about the concepts covered in the course (see “TOPICS” below) including
using commonly accepted terminology (e.g., future CS courses, employment interviews)
- select appropriate data & file structures (from those covered in CS1120 & CS3310) for a particular problem
- apply the concepts covered in the course to new problem areas (e.g., in the CS4000/5000-level courses)
- be effective, independent programmers and problem-solvers
- write readable / maintainable programs with fewer bugs
- follow written specifications (e.g., program functionality, user interface, deliverable format)
- teach themselves new aspects of their programming language through the use of human and written
resources and through writing/running experimental test programs
- more quickly and confidently debug programs
“TEXTBOOKS”:
Reqired:
1. “Data Structures and Algorithms in C++” – M. Goodrich, R. Tamassia, and D. Mount, Wiley 2004, ISBN 0-471-20208-8 [at both bookstores]
Recommended:
1. "File Structures in C++" - Folk & Zoellick & Riccardi – Addison Wesley 1998, ISBN 0-201-87401-6
[at both bookstores]
2. “Trees & Graphs” booklet (from Pothering & Naps C++ data structures book) [at Bernhard bookstore]
GREEN cover (or use your CS112 Data Structures book, if it has full chapters on these 2 topics)
3. "C++ File Handling Examples" booklet - bright PINK cover [at Bernhard bookstore]
4. "Intro to Algorithms" - Cormen, Leiserson, Rivest and Stein - McGraw Hill, 2003.
REFERENCE MATERIAL:
- Scott Fleming’s “Instructional Pages” for CS111/112, www.cs.wmich.edu/~sdflemin/instr_pgs/ includes:
- programming in a UNIX environment: Essential UNIX Commands, Makefile’s,
GNU Debuggers (gdb & ddd)
- C++ programming: Compiling Basics, Command-line Arguments, GetOpt Demonstration,
Output Formatting
E-MAILING LIST: At times, I send email regarding asgn specifications, clarifications, demo data, announcements, etc. However, the course website will always contain the latest information. I’ll assume you browse the class website frequently and read your email regularly. I’ll use your WMU (Bronco) email address. (NOTE: According to summer announcements, as GoWMU is activated, you will no longer be able to automatically forward your WMU mail to some outside email account like yahoo/hotmail/…). I will not read any emails not coming from an account at WMU, your email address must end in wmich.edu or it will be deleted.
ASSIGNMENTS: 4-5 programming asgn's (some multi-part) – each worth 10-12.5% (total 50%) of your final grade
- “written” asgn specifications given in class or emailed to you
- keep all GRADED hardcopy programs as proof of your marks until the semester is over
- points lost for NOT following all program SPECS exactly (written, emailed, discussed in class)
- NOT having CORRECT OUTPUT (predict it ahead of time, then compare output with that)
- NOT following all DEMO specs exactly (in the order specified, including highlighting)
- NOT using appropriate program FORMAT (see booklet and generally acceptable style)
- 0 points if no grade-able output – I don’t grade on "time/effort" – so use incremental development
- start working on your asgn’s as soon as they’re talked about in class – that allows you to ask questions in
class and understand in-class material which addresses other people’s questions
ASSIGNMENT DELIVERABLES (= “what to hand in”) [unless specified otherwise in class]:
- a hardcopy (paper) script including the output, the program(s) and the running of the program
- follow the written demo specs given out (or emailed) with each assignment – EXACTLY!!!
- run things in the EXACT ORDER specified
- the 1st version of the asgn turned in will be graded (a later version turned in would count as LATE)
- use MY demo data (NOT yours) for the final demo to hand in (make up your own data for TESTING)
- highlight (yellow or light see-through color) or circle (noticeable) the things specified
(making sure they're still readable)
- write your name & asgn # & class time (M W 4pm) on the top right of the top page
- I may also ask you to submit your asgn files electronically so that they can be run through a program
analyzer to check for program-similarity!
DUE DATES & TIMES for Assignments:
- due date specified at the top of the asgn specs (any extensions discussed in class & emailed later)
- “ON TIME” = hardcopy is TURNED IN at the BEGINNING of class on the due date (else it’s LATE)
- asgn’s turned in LATE lose 30% for each class period after the due date that they’re late (“turned in time”)
- you can NOT turn in the FINAL ASGN late – it must be ON TIME
- 1 GRACE PERIOD will be given for the semester (= 1 asgn turned in 1 class period late with no penalty)
- you can NOT use this on the final asgn of the semester
- this is for reasons such as illness, missed bus, exam in another class, oversleeping, started the
asgn too late, no Unix/Linux machines free on campus, etc.
[- serious illnesses, death in the family, car accidents, etc. will be treated separately on a case-by-case basis]
PROGRAMMING LANGUAGE:
- assume C++ or C. Other languages are fine – but see me if you don’t use C/C++.
- language specifics are generally not covered in class with a few exceptions
- new C++ language concepts are described/demonstrated in the recommended book/booklet
- the final script that you turn in for each asgn must be run on Unix/Linux system so that you can
demonstrate the running of the program(s) [i.e., that THIS program produced THIS output]
- if you use Visual C++ for developing your program, allow sufficient time for porting/changing it
ATTENDANCE: - may be taken at the BEGINNING of each class (if you’re LATE, you’re marked absent)
- excessive absences (> 3?) results in you not earning “assumed learning” points on exams, if any are given
- you may attend the other 3310 session on occasion if you have to miss your regular class
- missed class? Do NOT ask me to repeat/summarize the material covered, nor provide notes/handouts, nor
answer questions, nor explain the asgn, etc.! And you are still responsible for any announced
changes in schedule, asgn, due date, etc.
- you may not be registered for a class on main campus in the hour just preceding or following this class
CLASS CANCELLED ?
- to check if WMU’s closed for weather problems call 387-1001 or see www.wmich.edu (WMU News)
- if I am ill on a Monday/Wednesday and will be canceling classes, I will try to email you by 10:00am that day
FINAL GRADE: 50% based on assignments (10-12.5 % each), 45% exams (20-25% each), 5% popQuizzes
However, regardless of overall points - to PASS THE COURSE, more than half the assignment work must be done (fairly completely/correctly) - and to get an A, BA, or B, all assignments must be done (fairly completely/correctly). Assuming all asgn’s are done, the grading scale is:
A (93-100%), BA (88-92%), B (83-87%), CB (78-82%), C (73-77%),
DC (68-72%), D (60-67%), E (<60%)
- NOTE: since CS3310 is a PREREQUISITE course for CS 4000’s and 5000’s, you must earn at least a “C”
– otherwise you’ll need to repeat the course
- No EXTRA CREDIT work
- No INCOMPLETE’s except under extreme, substantiated, documented circumstances, where you are currently passing and caught up on assignments. An "I" cannot be given as a substitute for an E or for falling behind on the work.
EXAMS: exam1 (20%) & exam2 (25%) - worth 45% of your final grade
a number of pop quizzes - total worth 5% of your grade
- no make-up exams except in EXTREME, documented cases
DATES TO NOTE:
Aug. 29 (Monday) 1st CS3310 class
Sept 2 (Friday) last day to drop with 100% refund for class
Sept 5 (Monday) NO CLASSES at WMU – Labor Day
Sept 7 (Wednesday) last day to drop without academic penalty
Sept 12 50% refund if you drop the class
Sept 19 25% refund if you drop the class
??? asgn 1/2/3/4 due dates – to be announced as assignments are given out
Oct 12 (Wednesday) EXAM1 (tentative)
Nov 23 (Wednesday) NO CLASSES at WMU after 12:00 – Thanksgiving Recess
Nov 30 (Wednesday) TENTATIVE: Senior Engineering Design Project Conference No Lectures till 4:00
Dec 02 (Friday) last asgn due by 2:00pm – it will NOT BE ACCEPTED LATE
Dec 05 (Monday) EXAM2 at 5:00-7:00pm in regular classroom
ACADEMIC DISHONESTY:
“You are responsible for making yourself aware of and understanding the policies and procedures in the Undergraduate (pp. 274-276) [Graduate (pp.25-27)] Catalog that pertain to Academic Honesty. These policies include cheating, fabrication, falsification and forgery, multiple submission, plagiarism, complicity and computer misuse. If there is reason to believe you have been involved in academic dishonesty, you will be referred to the Office of Student Conduct. You will be given the opportunity to review the charge(s). If you believe you are not responsible, you will have the opportunity for a hearing. You should consult with me if you are uncertain about an issue of academic honesty prior to the submission of an assignment or test.” [WMU Policy]
My policy on any incident of suspected academic dishonesty is to follow the University policy and immediately report the incident to the Office of Student Judicial Affairs. If a finding of academic dishonesty is determined, I will then (at most, depending on the particular situation): 1) give that assignment/exam 0 points, and 2) lower the final overall grade by a full letter grade or down to a DC, whichever lower. [Note: you need a C or better in CS3310 to take CS4000 or 5000-level classes]. This includes both the “copier” and the person allowing someone to copy their work.
It is considered "dishonest":
- to hand in something which is not COMPLETELY your own work (unless I specify it as a team project),
- or to help another student to do so;
- to copy all or part of someone else's work (e.g., someone currently in CS3310, someone who did CS331 in
the past, someone who’s never taken CS331 here, downloading a solution from internet, etc.)
- or to let someone copy all/part of your work;
- to have someone else write all or part of your work (whether they're in the class or not)
- or to write all/part of a program for someone else;
- to WORK TOGETHER with someone on all or part of an assignment or project;
- to discuss the design of your program with another student to such a degree so that the 2
programs look “substantially alike”;
- to use someone else's work "JUST FOR REFERENCE"
- or to give them your work “just for reference”;
- to discuss (verbally, in writing, via email, via phone, etc.) the exam material with students in the other
section of the class until after they’ve taken the exam.
[Please read the catalog for further explanation. If you are in any doubt about some practice, see me AHEAD of time]. If there are similar assignments turned in, BOTH students will be penalized (regardless of who actually wrote it). Do not leave old versions of your assignments in university waste baskets; do not print out your work unless you collect it immediately.
It is NOT considered “dishonest”:
to help (or be helped by) another student with a bug in their or your work;
to discuss OVERALL strategy or structures for solving a problem;
to demonstrate or ask how a particular concept or language statement works;
to give or get clarification about assignment specifications;
to verify whether your output agrees with someone else's (after you have both completed the work independently).
But when pen gets put to paper, (or fingers to keyboard) you MUST DO THIS ALL ON YOUR OWN !.
TOPICS: (not in this order)
Core-concept-wise, about 40% of the course focuses on data and file structures, 60% focuses on algorithmics and related issues. However, software engineering issues are integrated with the main topics & assignments throughout the course. Specific content topics include:
- data structures - terminology, representation, implementation, algorithms for:
- trees, especially binary search trees (traversal/search/update, height-balanced) & heaps
- graphs: traversals, spanning trees, shortest path
- hash tables
- file structure/storage/processing fundamentals - terminology, concepts, implementation issues including:
file organization, sequential vs. random access, ASCII vs. binary files, file dumps, files of records, physical vs. logical issues, C++ language-specifics for these issues
- basic file organization & choices options:
- serial, key-sequential files, direct address, hash, indexed (Btree & B+tree)
- create, query, print, insert, delete, dump algorithms
- issues related to implementation, advantages/disadvantages, practical use, alternatives, time & space
- software engineering issues at the “programming” level
- program design, algorithm design
- program implementation, incremental development, investigating language details
- program interface with the “the outside” including data files, users, other programs, designer’s
specs, external modules, other programmers, client
- test plans, good test data
- program format/style/readability, maintenance
- following specifications for the program, user interface, data files, project deliverable
Tentative SCHEDULE
Week 1 syllabus, intro to course, pre-req check, email list, “life-line”
records, files, direct access files, create/query/print/update, asgn 1
Week 2 asgn 1 continued, C++ read/write/seekp/seekg, record structures, “holes”, null-term.
Week 3 binary vs. ASCII files, dumping files, complexity analysis - frequency counting, recurrence relations
Week 4 static hash files, asgn 2
Week 5 asgn 2 continued, disks, blocking, test plan
Week 6 trees, binary search trees
Week 7 asgn 3, tree traversals, AVL trees, EXAM1
Week 8 more trees and more asgn 3
Week 9 heaps, external sorting
Week 10 indexed files, asgn 4
Week 11 B-trees, B+ trees
Week 12 graphs, asgn 5
Week 13 graphs
Week 14 graphs, more external sorting
Exam Week EXAM2
CEAS “CAE Computer Lab” (Parkview): in C section of building [see signs for lab open hours]
- main lab has 13 Sun Workstations with Unix, 2+8 Linux machines, and lots of PC’s
- 4 additional PC labs in back – to be opened as overflow of the main lab
- CS1110/1120 (Linux) lab in room C0224 [Open policy during free time when not used for CS1110/1120]
- you don’t have to do your asgn’s at the Parkview Lab – there are Unix/Linux machines in various labs
on main campus – or you may use your own system
LAPTOPS in Class:
- you may use a laptop in class FOR NOTE-TAKING ONLY
- other uses (games, email, surfing, instant messaging, homework, etc.) are UNACCEPTABLE
- those using laptops must sit in the front rows
Student Computer Support: C208 (Parkview) (check signs for open days/times)
- any CS student (including non-CS students taking regular CS major classes – e.g., CPE) can have Linux and/or a new Microsoft OS (and other MS software like Visual Studio, except not MS Office) installed on their personal computers at no charge by the staff here
NEW TO WMU?
- get a CS account - for the CS Unix network (where many CS3310 students do their asgn’s)
- passed out later this week (after that, see JohnH/Ed/Joe (sysadmin@cs.wmich.edu )
- get a general WMU “Bronco account” for the University’s computer network for free access to:
- the internet, including various on-line University services (grades, registration, . . .)
- the Unix and Linux systems (where many CS 331 students do their asgn’s)
in the CAE Lab (Parkview) and on main campus (UCC, Bernhard, etc.)
- the PC network in the CAE Lab & on main campus
- the CS Unix network
NOTE: you may also access the CS Unix system or the WMU Unix/Linux systems from home or
elsewhere on campus – see CAE Lab for instructions and help
- I will try to arrange a workshop in the CS1110/1120 Linux lab for you next week
- see Scott Fleming’s web sites for CS1110/1120 instructional pages [see “References” section above]
- utilize your “CS3310 buddy”
QUESTIONS: Ask questions in class since generally other students have the same questions as you do. I can then address them ONCE, with the help of the board and with your follow-up questions. For non-general questions see me during office hours. Email is for setting up appointments if you can’t come in my office hours and for non-pressing issues – I may not respond for several days if I see no urgency. Email is also only for VERY BRIEF questions needing VERY BRIEF answers – long issues are best handled in person. Do not send me your code for debugging – that’s YOUR job. When you come to see me, bring your laptop and code with you so that I can answer your question(s) more precisely rather than in general terms.
PROGRAM DEBUGGING: I do not debug student programs at this level of CS. That is one of the prerequisite skills you should already have and will be further developing in this course through practice. Follow good software engineering practices which put fewer bugs in your program and makes them easier to detect and fix. For example,
Incrementally develop and implement the program in a modular fashion so you can more easily pinpoint which code caused the problem. The program should’ve been working correctly (100% sure) before a NEW module is added.
Do a thorough mental & paper/pencil “walk-through” of your algorithms and pseudo-code at the design stage, and then again on each new module’s code.
In debugging, it often helps to leave the computer and just sit down with a printout of your program to do a thorough “desk-check” of a “walk-through” of the code. And don’t make leaps and assumptions!
Try to narrow down where the bug might be – where in the code is the last place where you’re sure that things were working correctly – and are you 100% sure or have you just assumed things were ok at that point?
Consider whether it’s the program itself or the data which caused the problem. This is particularly true when dealing with I/O files. E.g., did you run your update program earlier, which changed the contents of the file itself?
Use the commonly acceptable format/style/practices (see booklet) to reduce the potential for bugs: e.g., good naming, aligning/indenting, “if/else/…” for mutually exclusive conditions, “while” for event-controlled looping and “for” for count-controlled looping.
Check for common types of errors: e.g., “edge cases” (first/last record/node in a file or array or data structure), starting at 0 or 1 (when the opposite should be used), wrong read/loop structure (use priming “read”, with process/”read” loop), mix up of “and” and “or” in a condition.