Unit 9 Projects: Working with Files and Real-World Data
Project 1: Student Test Scores
Description:
Write a program that reads a file in which each line holds a student's name, two test scores, and a boolean indicating whether the student passed. For each student, display their name, scores, and pass/fail status. Also compute the average of all scores across all students.
Starter File (students.txt):
Alice 90 95 true
Bob 80 85 false
Charlie 100 100 true
Example Output:
Alice: 90, 95, Passed? true
Bob: 80, 85, Passed? false
Charlie: 100, 100, Passed? true
Average Score: 91.67
Optional Dataset Questions:
What if a student's scores are missing? How will your program handle it?
Could this dataset have bias if it only contains high-performing students?
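One possible approach to Project 1, sketched in Python (the handout does not name a language, so that choice is an assumption). The function names parse_student and report are hypothetical, and treating a line with no scores as "no scores" is just one way to answer the first question above.

```python
def parse_student(line):
    """Split a line such as 'Alice 90 95 true' into (name, scores, passed)."""
    parts = line.split()
    name = parts[0]
    scores = [int(p) for p in parts[1:-1]]  # raises ValueError on non-numeric scores
    passed = parts[-1].lower() == "true"
    return name, scores, passed


def report(lines):
    """Return the display lines for an iterable of student records."""
    out = []
    all_scores = []
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines are ignored
        name, scores, passed = parse_student(line)
        all_scores.extend(scores)
        # assumption: a student with no scores is shown as "no scores"
        score_text = ", ".join(str(s) for s in scores) or "no scores"
        out.append(f"{name}: {score_text}, Passed? {str(passed).lower()}")
    if all_scores:
        average = sum(all_scores) / len(all_scores)
        out.append(f"Average Score: {average:.2f}")
    return out


# The list below stands in for the starter file. With a real file:
#     with open("students.txt") as f:
#         rows = report(f)
sample = ["Alice 90 95 true", "Bob 80 85 false", "Charlie 100 100 true"]
for row in report(sample):
    print(row)
```

Because an open file is an iterable of lines, the same function works on the file object directly.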
Project 2: Movie Ratings
Description:
Read a CSV file of movies (title, rating, year). Display only movies with a rating higher than 8.0.
Starter File (movies.csv):
The Godfather,9.2,1972
Inception,8.8,2010
Titanic,7.8,1997
Example Output:
The Godfather (1972) - Rating: 9.2
Inception (2010) - Rating: 8.8
Optional Dataset Questions:
How might missing ratings affect your results?
Could there be bias in which movies were included?
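A minimal sketch for Project 2, assuming Python and its standard csv module. The function name high_rated is hypothetical, and skipping rows with a missing or non-numeric rating is one possible policy, not a requirement of the project.

```python
import csv


def high_rated(lines, threshold=8.0):
    """Return display lines for movies rated strictly above the threshold."""
    out = []
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # assumption: skip rows that lack a field
        title, rating_text, year = (field.strip() for field in row)
        try:
            rating = float(rating_text)
        except ValueError:
            continue  # assumption: skip rows whose rating is not a number
        if rating > threshold:
            out.append(f"{title} ({year}) - Rating: {rating}")
    return out


# The list below stands in for the starter file. With a real file:
#     with open("movies.csv", newline="") as f:
#         rows = high_rated(f)
sample = ["The Godfather,9.2,1972", "Inception,8.8,2010", "Titanic,7.8,1997"]
for row in high_rated(sample):
    print(row)
```

Using csv.reader rather than splitting on commas means a quoted title that contains a comma is still read as one field.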
Project 3: Word Count in a Text
Description:
Read a file containing a paragraph. Count the number of words, find the average word length, and identify the longest word.
Starter File (paragraph.txt):
The quick brown fox jumped over the lazy dog.
Example Output:
Number of Words: 9
Average Word Length: 4.0
Longest Word: jumped
Optional Dataset Questions:
How would punctuation affect word counting?
Could this dataset be biased if it only contains short sentences?
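For Project 3, the sketch below (Python assumed) strips leading and trailing punctuation from each word before measuring it, which is one possible answer to the punctuation question above. The function name word_stats is hypothetical.

```python
import string


def word_stats(text):
    """Return (word count, average word length, longest word) for a text."""
    # Strip punctuation from both ends so "dog." is measured as "dog".
    words = [w.strip(string.punctuation) for w in text.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    count = len(words)
    average = sum(len(w) for w in words) / count if count else 0.0
    longest = max(words, key=len) if words else ""  # first word wins a tie
    return count, average, longest


# The string below stands in for the starter file. With a real file:
#     with open("paragraph.txt") as f:
#         count, average, longest = word_stats(f.read())
count, average, longest = word_stats("The quick brown fox jumped over the lazy dog.")
print(f"Number of Words: {count}")
print(f"Average Word Length: {average:.1f}")
print(f"Longest Word: {longest}")
```

With punctuation stripped, the starter sentence has 36 letters across 9 words. If the final period were counted as part of "dog.", the total would be 37 letters instead.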
Project 4: Letters in Words
Description:
Read a file containing a sentence. Ask the user for a letter and display all words containing that letter.
Starter File (sentence.txt):
Where are they now?
Example Input:
Letter: e
Example Output:
Where
are
they
Optional Dataset Questions:
What if the sentence has mixed case letters? How will you handle that?
Could the dataset contain typos that affect results?
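A sketch of the matching step in Project 4, assuming Python. The comparison is case-insensitive, so "e" also matches a word containing "E", which is one way to handle the mixed-case question above. The function name words_with_letter is hypothetical.

```python
import string


def words_with_letter(sentence, letter):
    """Return the words in the sentence that contain the given letter."""
    target = letter.strip().lower()
    if not target:
        return []  # assumption: empty input matches nothing
    matches = []
    for raw in sentence.split():
        word = raw.strip(string.punctuation)  # "now?" becomes "now"
        if word and target in word.lower():
            matches.append(word)
    return matches


# In the full program the sentence would come from sentence.txt and the
# letter from input("Letter: "); fixed values are used here instead.
for word in words_with_letter("Where are they now?", "e"):
    print(word)
```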
Project 5: Real-World Data Exploration
Description:
Pick a dataset from a real-world source (CSV or TXT). Write a program to analyze it for a specific question. For example, you could analyze:
Average temperatures from a weather dataset
Top-rated books or movies
Student grades from a school dataset
Starter File (data.csv):
Students choose a small real dataset.
Example Output:
Varies based on dataset and question.
Optional Dataset Questions:
Is the dataset complete and accurate?
Is there any bias in the dataset?
Could the dataset answer your question appropriately?
Project 6: Advanced Scores Analysis
Description:
Read students.txt containing names, 3 test scores, and a boolean for pass/fail. Compute:
Each student's average
Class average
Students whose average is below the class average
Starter File (students.txt):
Alice 90 95 88 true
Bob 80 85 78 false
Charlie 100 100 100 true
Example Output:
Alice Average: 91
Bob Average: 81
Charlie Average: 100
Class Average: 90.67
Students Below Average: Bob
Optional Dataset Questions:
How does missing or incorrect data affect averages?
Could this dataset contain bias if some groups are underrepresented?
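One way to organize Project 6, sketched in Python (an assumed language). The class average here is the mean of the per-student averages, which matches the mean of all scores only when every student has the same number of scores. The function name analyze is hypothetical.

```python
def analyze(lines):
    """Return (per-student averages, class average, names below the class average)."""
    averages = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # assumption: skip lines lacking a name, a score, and a flag
        name = parts[0]
        scores = [int(s) for s in parts[1:-1]]  # the last field is the pass/fail flag
        averages[name] = sum(scores) / len(scores)
    class_average = sum(averages.values()) / len(averages) if averages else 0.0
    below = [name for name, avg in averages.items() if avg < class_average]
    return averages, class_average, below


# The list below stands in for the starter file. With a real file:
#     with open("students.txt") as f:
#         averages, class_average, below = analyze(f)
sample = ["Alice 90 95 88 true", "Bob 80 85 78 false", "Charlie 100 100 100 true"]
averages, class_average, below = analyze(sample)
for name, avg in averages.items():
    print(f"{name} Average: {avg:.0f}")
print(f"Class Average: {class_average:.2f}")
print("Students Below Average: " + ", ".join(below))
```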
Project 7: Sports Stats Analyzer
Description:
Read a file of players with points scored in multiple games. Compute each player's total points and average per game, and display the players whose per-game average is above the average across all players.
Starter File (players.txt):
LeBron 25 30 28
Durant 20 22 27
Curry 30 32 29
Example Output:
LeBron Total: 83, Average: 27.7
Durant Total: 69, Average: 23.0
Curry Total: 91, Average: 30.3
Above Average Players: LeBron, Curry
Optional Dataset Questions:
How would you handle missing scores?
Could the dataset be biased toward certain types of games or players?
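A possible shape for Project 7 in Python (an assumed language). It assumes single-word player names, as in the starter file, and reads "above average" as above the mean of the per-player averages. The function names player_stats and above_average are hypothetical.

```python
def player_stats(lines):
    """Map each player name to (total points, average points per game)."""
    stats = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # assumption: skip players with no recorded games
        points = [int(p) for p in parts[1:]]
        stats[parts[0]] = (sum(points), sum(points) / len(points))
    return stats


def above_average(stats):
    """Return the players whose per-game average exceeds the overall mean."""
    if not stats:
        return []
    overall = sum(avg for _total, avg in stats.values()) / len(stats)
    return [name for name, (_total, avg) in stats.items() if avg > overall]


# The list below stands in for the starter file. With a real file:
#     with open("players.txt") as f:
#         stats = player_stats(f)
sample = ["LeBron 25 30 28", "Durant 20 22 27", "Curry 30 32 29"]
stats = player_stats(sample)
for name, (total, avg) in stats.items():
    print(f"{name} Total: {total}, Average: {avg:.1f}")
print("Above Average Players: " + ", ".join(above_average(stats)))
```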
Project 8: Temperature Trends
Description:
Read a file of daily temperatures (date, high, low). Compute the highest and lowest temperatures and the average high/low for the month.
Starter File (temps.txt):
2025-09-01 85 70
2025-09-02 88 72
2025-09-03 90 75
Example Output:
Highest Temperature: 90
Lowest Temperature: 70
Average High: 87.7
Average Low: 72.3
Optional Dataset Questions:
How might missing days affect averages?
Could the dataset contain bias if it only includes one city or region?
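For Project 8, a short Python sketch (language assumed) that collects the highs and lows in two lists and summarizes them. The function name temperature_summary and the dictionary keys are hypothetical. Lines that do not have exactly three fields are skipped, so a missing day simply shrinks the sample rather than counting as zero.

```python
def temperature_summary(lines):
    """Return highest, lowest, and average high/low, or None if there is no data."""
    highs, lows = [], []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # assumption: skip incomplete or blank lines
        _date, high, low = parts
        highs.append(int(high))
        lows.append(int(low))
    if not highs:
        return None
    return {
        "highest": max(highs),
        "lowest": min(lows),
        "avg_high": sum(highs) / len(highs),
        "avg_low": sum(lows) / len(lows),
    }


# The list below stands in for the starter file. With a real file:
#     with open("temps.txt") as f:
#         summary = temperature_summary(f)
sample = ["2025-09-01 85 70", "2025-09-02 88 72", "2025-09-03 90 75"]
summary = temperature_summary(sample)
print(f"Highest Temperature: {summary['highest']}")
print(f"Lowest Temperature: {summary['lowest']}")
print(f"Average High: {summary['avg_high']:.1f}")
print(f"Average Low: {summary['avg_low']:.1f}")
```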
Project 9: Book Ratings Filter
Description:
Read a file of books (title, author, rating, genre). Ask the user for a minimum rating and display all books meeting that rating.
Starter File (books.txt):
The Hobbit, Tolkien, 9.0, Fantasy
1984, Orwell, 8.5, Dystopia
Twilight, Meyer, 5.5, Romance
Example Input:
Minimum Rating: 8.0
Example Output:
The Hobbit by Tolkien - Rating: 9.0
1984 by Orwell - Rating: 8.5
Optional Dataset Questions:
How would missing ratings or genres affect filtering?
Could the dataset be biased if it only contains popular books?
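A sketch of the filtering step in Project 9, assuming Python. It splits each line on commas, so it assumes no title or author contains a comma. It also treats "meeting that rating" as greater than or equal to the minimum. The function name books_at_least is hypothetical.

```python
def books_at_least(lines, minimum):
    """Return display lines for books rated at or above the minimum."""
    out = []
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            continue  # assumption: skip lines without all four fields
        title, author, rating_text, _genre = fields
        try:
            rating = float(rating_text)
        except ValueError:
            continue  # assumption: skip books whose rating is missing or invalid
        if rating >= minimum:
            out.append(f"{title} by {author} - Rating: {rating}")
    return out


# In the full program the lines would come from books.txt and the minimum
# from float(input("Minimum Rating: ")); fixed values are used here instead.
sample = [
    "The Hobbit, Tolkien, 9.0, Fantasy",
    "1984, Orwell, 8.5, Dystopia",
    "Twilight, Meyer, 5.5, Romance",
]
for row in books_at_least(sample, 8.0):
    print(row)
```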
Project 10: Sales Data Analysis
Description:
Read a file of sales (employee, amount sold). Compute total sales per employee, overall total, and highest-selling employee.
Starter File (sales.txt):
Alice 200
Bob 150
Charlie 300
Example Output:
Alice Total: 200
Bob Total: 150
Charlie Total: 300
Overall Total: 650
Top Seller: Charlie
Optional Dataset Questions:
How would missing sales entries affect totals?
Could the dataset be biased toward certain employees or products?
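Project 10 asks for totals per employee, which suggests an employee may appear on more than one line. The Python sketch below (language assumed) accumulates amounts in a dictionary for that reason. It assumes whole-number amounts, as in the starter file, and the function name sales_summary is hypothetical.

```python
def sales_summary(lines):
    """Return (totals per employee, overall total, top seller or None)."""
    totals = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: skip lines missing a name or an amount
        name, amount = parts[0], int(parts[1])
        totals[name] = totals.get(name, 0) + amount  # add to any earlier entries
    overall = sum(totals.values())
    top = max(totals, key=totals.get) if totals else None
    return totals, overall, top


# The list below stands in for the starter file. With a real file:
#     with open("sales.txt") as f:
#         totals, overall, top = sales_summary(f)
sample = ["Alice 200", "Bob 150", "Charlie 300"]
totals, overall, top = sales_summary(sample)
for name, total in totals.items():
    print(f"{name} Total: {total}")
print(f"Overall Total: {overall}")
print(f"Top Seller: {top}")
```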
Project 11: Movie Genre Counts
Description:
Read a file of movies (title, genre). Count the number of movies in each genre and display the results.
Starter File (movies_genre.txt):
Titanic, Romance
Inception, Sci-Fi
The Godfather, Crime
La La Land, Romance
Example Output:
Romance: 2
Sci-Fi: 1
Crime: 1
Optional Dataset Questions:
Could missing or miscategorized genres affect counts?
Is the dataset appropriate for analyzing trends across genres?
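Counting by category, as in Project 11, can be sketched in Python with collections.Counter (the language is an assumption). The genre is taken as everything after the last comma, so a title that itself contains a comma is still handled. The function name genre_counts is hypothetical.

```python
from collections import Counter


def genre_counts(lines):
    """Count how many movies fall into each genre."""
    counts = Counter()
    for line in lines:
        _title, sep, genre = line.rpartition(",")
        genre = genre.strip()
        if not sep or not genre:
            continue  # assumption: skip lines with no comma or an empty genre
        counts[genre] += 1
    return counts


# The list below stands in for the starter file. With a real file:
#     with open("movies_genre.txt") as f:
#         counts = genre_counts(f)
sample = [
    "Titanic, Romance",
    "Inception, Sci-Fi",
    "The Godfather, Crime",
    "La La Land, Romance",
]
# most_common() sorts by count; equal counts keep first-seen order.
for genre, count in genre_counts(sample).most_common():
    print(f"{genre}: {count}")
```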
Project 12: Word Frequency Counter
Description:
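Read a paragraph from a file. Count how many times each word appears, ignoring case and punctuation, and display the 3 most frequent words. When several words share the same count, list them in order of first appearance.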
Starter File (text.txt):
The quick brown fox jumps over the lazy dog. The dog was not amused.
Example Output:
the: 3
dog: 2
quick: 1
Optional Dataset Questions:
How would punctuation or capitalization affect counting?
Could this dataset contain bias if it is too short or unrepresentative?
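A compact sketch for Project 12, assuming Python. Words are lowercased and stripped of surrounding punctuation before counting, so "The" and "the" are the same word and "dog." counts as "dog". The function name top_words is hypothetical.

```python
import string
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent words as (word, count) pairs."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    counts = Counter(w for w in words if w)
    # Words with equal counts are returned in the order first encountered.
    return counts.most_common(n)


# The string below stands in for the starter file. With a real file:
#     with open("text.txt") as f:
#         pairs = top_words(f.read())
sample = "The quick brown fox jumps over the lazy dog. The dog was not amused."
for word, count in top_words(sample):
    print(f"{word}: {count}")
```

In the starter text nine words appear exactly once, so the third place is a tie, and "quick" is shown only because it appears first.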