In this project you will implement the logic for a single-player word game, and then write a four simple AIs that solve the game by determining a good (or the best) play. You are allowed to use list comprehensions, but some problems will require recursion. Please do not use higher-order functions yet. Scavenge is a simple game which proceeds as follows:
The player is dealt a hand of n letters.
The player makes a word using some of the letters. They are removed from the hand.
That word is scored as in Scrabble.
The player then repeats, picking words until either all letters are used or they give up.
Unused letters are not scored.
To solve the game, we will consider four strategies. The first is greedy algorithm that is not guaranteed to find the highest-scoring list of words, but should do well in practice. The second and third are brute-force algorithms: you will compute the powerset of all words, and choose the highest-scoring set. This is still not guaranteed to find the best list, as it does not consider repeated words. The last is a game-tree algorithm, which considers all words as the greedy algorithm does, but instead chooses the word that leads to best overall play rather than the best first move.
Wednesday 9/18 - Project Released.
Tuesday 9/24 - Milestone deadline.
Monday 9/30 (extended to Tuesday)- Project submission deadline.
Wednesday 10/2 (extended to Thursay)- Late submission deadline.
Be sure to push your changes to the repository for each deadline. Late submissions are automatically accepted, but will be assessed a 5% (⅓rd letter grade) penalty.
Scavenge.hs This is the file you will edit. All type aliases are included. The functions are listed
in roughly the suggested implementation order. Problem details are given in the file.
TestCases.hs This file contains useful test cases, including many used in the automated grading tests.
Main.hs This file contains the I/O actions that turn your code into an executable. You do not
need to understand anything in this file.
Coloring.hs Contains ANSI code. Very boring, and you do not need to understand.
Dictionary.hs Contains the small dictionaries we use to determine valid words.
words.txt Contains the large (86k words) dictionary. Trinity ID (i.e. sfogarty).
README.md This file contains a brief project description, as well as spots for your name and ID.
Makefile The makefile supports make and make clean. The executable is output to scavenge.
Be certain make runs before you submit!
The executable scavenge can be built using make. The executable and compiled source files can be removed using make clean. By default scavenge will play the game in interactive mode, but it supports the following flags (see -h).
-h or --help Prints a help message and exits.
--test Runs unit tests on your code.
You must complete all milestone tests before the remaining tests will run.
--handSize=n Set hand size to n. Default 10.
-q or --quiet Only print error messages on tests, or minimal output when solving.
-d[n] or --dict[=n] Uses smaller dictionaries for faster execution.
Values range from 0 (default, 86k words) to 4 (10 words).
-s str or --solve=str Solve the game, instead of playing interactively. str must be greedy, nBrute, sBrute, or best.
For the milestone, you must implement the basic mechanics of Scavenge. You are given the following type aliases to differentiate all the strings you will be handling.
type Hand = [Char]
type Dictionary = [String]
type Move = String
type Play = [Move]
We break the mechanics of the game into the following areas:
Scoring a move (a single string) or a play (a list of moves).
Making a move. This involves updating a hand by removing all the letters of a move from the hand.
Note a letter may occur multiple times in a hand. You should only remove it once for each time that letter occurs in the move.
Determining if a move/play if valid. For a move of a string to be valid, the string must be a word (in a dictionary), and the hand must be able to make the string. Note that a letter may occur multiple times in a string. For the play to be valid, that letter must occur at least that many times in the hand.
Determining all the independently valid moves for a given hand and dictionary. Since the moves are independent, using letters from the hand to make one move does not exclude these letters from being used in a different way for a different move.
For the full project you must write four solvers for scavenge. Each solver takes a dictionary and a hand, and return a play that does well. Only the last solver is guaranteed to return the best possible play.
Greedy algorithm: The greedy algorithm finds a list of all valid moves, and then greedily selects the single best move. After doing so, the hand is updated and the process repeats. Note that the highest-scoring single move might preclude better combinations of words. For example, using the 4k word dictionary and the hand "CLOSEFLOOR", the best possible play is "CLOSE" (7 points) and "FLOOR" (8 points) with a total of 15 points. However, "FORCE" is worth 10 points, and so the greedy algorithm will select that next. After "FORCE" the only word that can be made with the remaining hand LSLOO is "SO", netting a total of 12 points.
Brute-Force Algorithms: The brute force algorithms are based on the powerset function. The powerset of a set contains every possible subset of that set. For instance, the powerset of {7,1,3} is { {}, {7}, {1}, {3}, {7,1}, {7,3}, {1,3}, {7,1,3} }. The powerset of a set of n elements will have 2n elements. This is the more complicated component of the brute-force algorithms: do not underestimate this problem.
The naive brute-force algorithm computes a list of (almost) every possible play, by taking the powerset of the dictionary. Do not try this with anything but the 10-word dictionary. Each play is then scored, and the maximum play is chosen. Since the powerset of the dictionary will contain every possible combination of words, we must score and find the maximum play.
The smart brute-force algorithm avoids taking the powerset of the entire dictionary. Rather, the smart brute-force algorithm first prunes the dictionary to only the valid moves, and then computes the powerset of the valid moves. This should allow you to solve small hands with reasonable efficiency, even using a very large dictionary. For larger hands, however, the algorithm will be too slow.
Note that the brute-force algorithms fail when the best play contains a word more than once: since sets can't contain repetitions, the brute force approach misses out on this possibility. There are ways to fix this, but you do not need to consider them.
Recursive Game-Tree algorithm: The recursive game-tree algorithm resembles the greedy algorithm, but chooses the best move more carefully. Instead of just considering the highest-scoring individual move , it computes the best play available for the remaining hand. It then chooses the move with the maximum total score of the move and the resulting best play. This is conceptually the most complicated algorithm, and I highly recommend helper functions.
It is easy to blow-up your complexity by implementing inefficient primitive functions. For full credit, the following functions must have the listed complexities:
canMake - Linear or n log(n), instead of quadratic.
validMoves - Linear in the size of the dictionary, assuming a fixed-size hand.
greedyPlay - Low-order polynomial, instead of exponential.
powerset - Exponential, instead of polynomial
Even the game-tree algorithm will be very slow. This is because the branching factor is very large. There are a number of ways to improve the game-tree algorithms by avoiding repetitive work.
Pruning the dictionary: The algorithm goes through the dictionary repetitively. When considering the original hand, it goes through the dictionary to find the valid moves. Then, for each move the algorithm compute the remaining hand and finds the best play for that hand. The first step in finding the best play is, in turn, to compute the valid moves for the remaining hand. However, the remaining hand can't possibly have more valid moves than the original. Thus, the algorithm can safely replace the dictionary with the list of valid moves at each step.
Avoiding permutations of plays: Each play is a list of moves. However, some plays are permutations: they contain the same words in different orders. This means that if the best play is "CLOSE","OF", the game-tree algorithm will consider both "CLOSE","OF" and "OF","CLOSE". This is unnecessary. To avoid this, consider only sorted plays. Different sorting orders might result in different speedups. Note that removing unsorted plays after computing them is probably too late: the algorithm must only generate sorted plays. The best way to do this is left as an exercise for the reader.
Other improvements are possible, but not likely to help much. For instance, modifying the dictionary. Some words are permutations of each-other: they use the same letters in the same combinations, just in different order. These could be removed. On another track, the dictionary can be stored more efficiently: for instance by grouping all the words beginning with a letter, similar to how buildCorpus worked in Project 1. If you change the type of Dictionary, be sure to update buildDict.
The following test cases should help you evaluate your program.
Your greedy algorithm should fail to properly solve
./scavenge -d2 -s greedy CLOSEFLOOR
Giving you "FORCE SO" instead of "CLOSE FLOOR" (or another 15-point solution)
However, your smart brute force should be able to solve that case pretty quickly:
./scavenge -d2 -s sBrute CLOSEFLOOR
As you increase the size of the input, you can check the efficacy of your optimizations. The following chart displays the running times for my implementation in seconds. Your times may vary considerably, but what you can solve shouldn't change very much. These use -d2, for example:
./scavenge -d2 -s best CLOSEFLOOR
As you can see, each step of "avoiding unnecessary work' dramatically improves the scalability of the algorithm. None the less, the algorithm remains rather exponential.
It's also interesting to note that the smart brute-force algorithm managed to scale rather well, since it doesn't consider any repetitions of the words. Thus it doesn't really generate any more plays for the longer strings: there aren't that many words that can be made with CLOSEFLOORCLOSEFLOORCLOSE that requires that third C. However, it cannot exactly solve all inputs.
Scavenge Mechanics 33
Scavenge Solvers 42
Algorithm Complexities 10
Avoiding Unnecessary Work 12
Implementation Showdown 3