Prog2: OutOfSorts

9/15 Link to quicksort added, shown in blue.

9/18 Made results item #1 consider upper/lower case, and results item #2 then ignore it, which is switched from what it was originally.  

Results must be shown in the order specified.  Link to useful string functions is provided. These changes are shown in red.

9/21 Updated description of what to do if there are no duplicate passwords in a file.  Substrings to be considered must be of length 4 or greater.  Added specification for filename for extra credit. Sort doesn't need to be quicksort, but must be faster than bubble sort.  These additions shown in green.

Description

Given a file of passwords people have used, find attributes of the most common passwords.  Write a C program to do the sorting and matching.  Include the results of your analysis as header documentation within your program, clearly labelled for each of the categories described below.  The analysis must come from your own program's output.

Use this 51KB file of ~12K passwords that were once used on a dating site before that site got hacked. (Note that this is a ".bz2" compressed file, which can be opened on a Mac but may not open by default on a PC unless you have an unzip program that can handle it.  Use this local .zip version if this is an issue for you.)

This program is both very easy and hard.  Easy because each component I'm asking for is not that difficult, but hard because dealing with large real-world data can be a pain.
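As an example of one such component, here is a minimal sketch of finding the most frequent entry once the array is sorted: identical passwords end up next to each other, so the longest run of equal neighbours is the most common one. The function name most_common, the row width MAX_LEN, and the fixed-width array layout are assumptions made for illustration, not part of the assignment.

```c
#include <string.h>

#define MAX_LEN 64  /* assumed row width, including the terminating '\0' */

/* Scans an already-sorted array for the longest run of identical strings.
   Returns the index where that run starts and stores its length in *run_len.
   Returns -1 (with *run_len set to 0) when n <= 0. */
int most_common(char words[][MAX_LEN], int n, int *run_len)
{
    int best_start = -1;
    int best_len = 0;
    int start = 0;                       /* start of the current run */

    for (int i = 1; i <= n; i++) {
        /* A run ends at the end of the array or when the string changes. */
        if (i == n || strcmp(words[i], words[start]) != 0) {
            if (i - start > best_len) {
                best_len = i - start;
                best_start = start;
            }
            start = i;
        }
    }
    *run_len = best_len;
    return best_start;
}
```

If the reported run length is 1, no password in the file occurs more than once.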

What you Need to Know

Reading from a file; Using arrays; Sorting; Quicksort or some other sort that is better than BubbleSort; string functions
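As a rough sketch of the sorting piece, the standard library's qsort can sort an array of fixed-width strings with either a case-sensitive or a case-insensitive comparison. Whether a library sort is acceptable here or you are expected to write your own is not stated, so treat this only as an illustration. The names sort_words, cmp_exact, cmp_nocase and the row width MAX_LEN are assumptions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 64  /* assumed row width, including the terminating '\0' */

/* Case-sensitive comparison of two fixed-width rows. */
static int cmp_exact(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Case-insensitive comparison: compares characters after tolower(). */
static int cmp_nocase(const void *a, const void *b)
{
    const unsigned char *s = (const unsigned char *)a;
    const unsigned char *t = (const unsigned char *)b;

    while (*s != '\0' && tolower(*s) == tolower(*t)) {
        s++;
        t++;
    }
    return tolower(*s) - tolower(*t);
}

/* Sorts the first n rows in place; pass nonzero ignore_case to ignore case. */
void sort_words(char words[][MAX_LEN], int n, int ignore_case)
{
    qsort(words, (size_t)n, sizeof words[0],
          ignore_case ? cmp_nocase : cmp_exact);
}
```

With the case-sensitive comparison, uppercase letters sort before lowercase ones in ASCII, so "Apple" comes before "apple".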

Notes

The first step is figuring out how to read from a file into an array.  I suggest you try out your ideas with a very small data file (e.g. only 20 entries) that you can inspect by eye to verify that your results are correct.
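One possible way to read a file into an array is sketched below, assuming one password per line. The function name read_words and the limits MAX_WORDS and MAX_LEN are assumptions chosen for illustration; check them against the actual data before relying on them.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 20000  /* assumed capacity, enough for ~12K entries */
#define MAX_LEN   64     /* assumed row width, including the terminating '\0' */

/* Reads up to max_words non-blank lines from the named file into words[].
   Returns the number of entries stored, or -1 if the file cannot be opened.
   A line longer than MAX_LEN - 1 characters is split across several
   entries by fgets, so MAX_LEN must be large enough for the data. */
int read_words(const char *filename, char words[][MAX_LEN], int max_words)
{
    FILE *fp = fopen(filename, "r");
    if (fp == NULL) {
        return -1;
    }

    int count = 0;
    while (count < max_words && fgets(words[count], MAX_LEN, fp) != NULL) {
        /* Cut the string at the first '\r' or '\n', if any. */
        words[count][strcspn(words[count], "\r\n")] = '\0';
        if (words[count][0] != '\0') {
            count++;                     /* keep only non-blank lines */
        }
    }
    fclose(fp);
    return count;
}
```

A caller might declare the array as static char words[MAX_WORDS][MAX_LEN]; (static so that it does not live on the stack) and call read_words("words.txt", words, MAX_WORDS). Printing the first few entries of a 20-line test file is a quick way to confirm the read worked.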

Grading

Your results must be shown in the order given here.  Your program should indicate which of these have been completed and which have not.  Your datafile must be named "words.txt".

Turning it In

Turn this in on Blackboard under the Program 2 Assignment.  This time turn in only your source code as a .cpp file, where the name of the file is your UIC netid.  Once again you need to zip this file before you submit it.  For instance, if your netid is reed, you would name your file reed.cpp, zip it so it becomes reed.zip, and submit reed.zip.  Do NOT turn in your data file.

Extra Credit

Do your analysis on this 60MB file, which probably has around 7 million passwords.

You should first do your analysis for the original non-extra-credit part of the assignment shown above, and then in addition do the analysis and display results for the extra credit portion.  Name the extra credit datafile "extra.txt".