Prog 3: Best Wordle Word

Wordle is a fun online word-guessing game. If you are not familiar with it, go play Wordle online. Think about word choices carefully, because you can only play it once a day!

(You may work with a partner on this program. It can be a CS 211 student from this semester in any of the class sections. To work with a partner you must register at least one week ahead of time, as described here in the course syllabus.)

After playing a few times, players start to strategize and think about best starting words, such as words with lots of vowels, or words with the most common letters, or a combination of those. For this program our goal is to find the best starting word(s), and for each of those also the best second word(s).

By "best", we mean the valid dictionary word, found in either the guesses or answers file, that has the most letter matches when compared against every word in the answers file. Several program runs using the Tiny file versions are shown below, where the single input is the menu option.

Your program will start by reading in a guesses file and an answers file. When playing Wordle, the word you are trying to guess is one of the (more common) words in the answers file. Your guesses are limited to valid words found in either the answers file or the guesses file (which holds less common words). There are Large and Tiny versions of these files shown at the bottom of this page. We recommend that you develop your program with the smaller versions (guessesTiny.txt with 6 words, and answersTiny.txt with 5 words), debugging and working your way up to answersLarge.txt with 2309 words, and guessesLarge.txt with 10638 words.

The number of words in your answers and guesses files must be computed by your program! Hard-coding them will result in a 15 point deduction, plus you may fail hidden test cases that use answers and guesses files that are hidden to you. Guesses files do not include the words in the answers files, so the valid words for any move are those found in either the answers file or the guesses file.
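To illustrate computing the word counts rather than hard-coding them, here is a minimal sketch in C. It is not the Replit starter code. The names `Word`, `countWords`, and `readWords` are hypothetical, and the sketch assumes every word is exactly five letters and that words are separated by whitespace.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORD_LENGTH 5                 /* assumed: all words have five letters */

typedef char Word[WORD_LENGTH + 1];   /* five letters plus the '\0' terminator */

/* Returns the number of whitespace-separated words in the file,
   or -1 if the file cannot be opened. */
int countWords(const char *fileName) {
    FILE *in = fopen(fileName, "r");
    char buffer[81];
    int count = 0;

    if (in == NULL) {
        return -1;
    }
    while (fscanf(in, "%80s", buffer) == 1) {
        count++;
    }
    fclose(in);
    return count;
}

/* Reads words from the file into words[start], words[start + 1], ...
   and never writes at or beyond index `capacity`.
   Returns the number of words stored, or -1 if the file cannot be opened. */
int readWords(const char *fileName, Word *words, int start, int capacity) {
    FILE *in = fopen(fileName, "r");
    int stored = 0;

    if (in == NULL) {
        return -1;
    }
    /* The width 5 in "%5s" must agree with WORD_LENGTH. */
    while (start + stored < capacity &&
           fscanf(in, "%5s", words[start + stored]) == 1) {
        stored++;
    }
    fclose(in);
    return stored;
}
```

One possible use: call `countWords` on both files, `malloc` an array of `answerCount + guessCount` elements of type `Word`, then call `readWords` once per file, passing `0` as the start index for the answers and `answerCount` as the start index for the guesses.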

Sample starting code provided in Replit has code for the menu, and sample code to read in from a file, shown below:

Steps

This program can get complex and confusing if you are not organized. I highly recommend you take it one step at a time, and that you organize your thoughts in your code using comments before you write the code itself. I also highly recommend that you thoroughly test and display intermediate results at each stage before going on.

  1. Open the files in turn and read the words in them one at a time until you reach the end of the file. Increment a counter as you do this so you know how many words there are in each file.

  2. Create a struct to store the score and word together, since you will later need to sort them and need to keep the scores associated with the words they belong to.

  3. Use the file sizes to malloc space for all the words in each file. Don't forget to leave space for the '\0' character that will automatically be included at the end of each word if you plan on using string functions, which I recommend. You should allocate space for an array that contains both the answers and guesses together, since that is the set of all the words that can be used for guessing.

  4. Re-read the files, this time storing the words into the space you allocated in the previous step. You can't do this on the first pass through the files because you don't know the total size of each file ahead of time.

  5. Step through each word in the array of all the words, computing its score as you compare against all answer words, accumulating points when letters match. When a letter matches and is also in the correct position add 3 points. When a letter matches but is not in the correct position add 1 point.

  6. Sort the results in descending order by score, so highest scoring words are first. It is possible that multiple words will be tied with the same top score. For words that have the same score also sort them in ascending alphabetical order.

  7. Now you are ready to find the highest scoring second words. For each high-scoring first word you will again step through each word in the array of all the words, computing its score as you compare against all answer words, accumulating points when letters match. This time, however, you must ignore any letters that already were accounted for in the high-scoring first word. There are different ways to do this.
    One way to do this is to make a copy of the answer words and go through and blank out letters that would have already been covered by the first guess word. After this has been done you can do the same scoring that you did before, this time using this modified copy of all the answer words.
    Be careful in the letter "blanking out" process. It is not as simple as taking each letter in the highest-scoring word and removing all occurrences of that letter, since in a game only the first occurrence of that letter would be marked as matching. In other words, if the best first word were clapt and you were eliminating letters in the answer word adapt, then the a in clapt would blank out the middle letter in adapt making it ad pt, and the first a in adapt would be left alone.
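The scoring in step 5 and the blanking in step 7 could be sketched with one helper, as below. This is one possible approach, not necessarily the intended solution. The names `matchAndBlank`, `scoreWord`, and `totalScore` are hypothetical, and the order of the passes (exact-position matches first, then the first remaining occurrence for out-of-position letters) is an assumption about how repeated letters should be handled.

```c
#include <assert.h>
#include <string.h>

#define WORD_LENGTH 5   /* assumed: all words have five letters */
#define BLANK ' '       /* marks a letter that has already been matched */

/* Scores `guess` against `answer` and blanks every matched letter in
   `answer` (the caller should pass a modifiable copy).
   Exact-position match: 3 points. Match elsewhere: 1 point. */
int matchAndBlank(const char *guess, char *answer) {
    char g[WORD_LENGTH + 1];
    int score = 0;

    assert(strlen(guess) == WORD_LENGTH && strlen(answer) == WORD_LENGTH);
    strcpy(g, guess);

    /* Pass 1: letters in the correct position. */
    for (int i = 0; i < WORD_LENGTH; i++) {
        if (g[i] != BLANK && g[i] == answer[i]) {
            score += 3;
            g[i] = BLANK;
            answer[i] = BLANK;
        }
    }
    /* Pass 2: remaining guess letters matched against the first
       remaining occurrence of that letter in the answer. */
    for (int i = 0; i < WORD_LENGTH; i++) {
        if (g[i] == BLANK) {
            continue;
        }
        char *hit = strchr(answer, g[i]);
        if (hit != NULL) {
            score += 1;
            *hit = BLANK;
        }
    }
    return score;
}

/* Scores `guess` against one answer without modifying the answer. */
int scoreWord(const char *guess, const char *answer) {
    char copy[WORD_LENGTH + 1];
    strcpy(copy, answer);
    return matchAndBlank(guess, copy);
}

/* Sums the score of `guess` over every answer word. */
int totalScore(const char *guess, char answers[][WORD_LENGTH + 1], int answerCount) {
    int total = 0;
    for (int i = 0; i < answerCount; i++) {
        total += scoreWord(guess, answers[i]);
    }
    return total;
}
```

For second words, one option is to copy the whole answers array, call `matchAndBlank(firstWord, copy[i])` on each copied answer, and then call `totalScore(candidate, copy, answerCount)` for each candidate. Under this sketch, blanking adapt with clapt removes the middle a and also the p and t, since those match in position too, while the first a is left alone.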

Suggestions

Start with the small answersTiny.txt and guessesTiny.txt files to test and debug your program, before graduating to the official larger files. Those files contain:

In answersTiny.txt:

abuts
adapt
cleft
leant
trait

In guessesTiny.txt:

adept
clear
clapt
darns
sours
tried

Do extensive debugging. As examples, below are debugging printouts used in developing this program, using answersTiny.txt and guessesTiny.txt. Your debugging output may be different, and your final output should not include this debugging information, but it is very helpful to have along the way as you are working on your program.

The first debugging output below shows the words and scores, already in sorted order, which can be used to find the top first word(s):

The next debugging output below was used to validate the letter-blanking-out process, as part of finding the best second words for each top first word:

(added 10/31) If there are multiple tied top-scoring second words, the words and score for each should all be displayed on one line using this format specifier for each one: printf(" %s %d", ...

Along the way in the development process you may find it useful to create hidden menu options that you can select to set values without you having to type them in each time, such as a hidden menu option 5 to set the filenames to answersLarge.txt and guessesLarge.txt.

Consider using the built-in qsort function to do the sorting for you. See the starter code below for an example.
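As an illustration of the ordering described in step 6 (descending by score, then ascending alphabetically for ties), a qsort comparator could look like the sketch below. The struct name `WordScore`, its fields, and the function name `compareWordScores` are hypothetical and may differ from what the starter code uses.

```c
#include <stdlib.h>
#include <string.h>

#define WORD_LENGTH 5   /* assumed: all words have five letters */

/* Keeps each word together with its score so sorting does not separate them. */
typedef struct {
    char word[WORD_LENGTH + 1];
    int score;
} WordScore;

/* qsort comparator: higher scores first; equal scores in alphabetical order. */
int compareWordScores(const void *a, const void *b) {
    const WordScore *x = a;
    const WordScore *y = b;

    if (x->score != y->score) {
        /* Positive when y has the higher score, so y sorts ahead of x. */
        return (y->score > x->score) - (y->score < x->score);
    }
    return strcmp(x->word, y->word);
}
```

A call would then take the form `qsort(allWords, wordCount, sizeof(WordScore), compareWordScores);`, where `allWords` and `wordCount` stand for your own array and its computed size.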

Files used in this program are shown below and are also available as part of the Replit starter code project.