I took Paul Franzon's ECE 520 ASIC Design class in 2007 and designed a Sudoku ASIC capable of solving 300,000 puzzles per second. Before designing the chip in Verilog, my classmate and I carefully considered different approaches to this problem, since our grade depended on both speed and chip area (and getting the project done on time). I wrote algorithms in C to study their performance and ensure they could solve the sample puzzles. Since the ASIC didn't have onboard RAM, and external RAM had a high access/area cost, we needed an algorithm that minimized use of external RAM. Our two final candidate algorithms were a stack-based algorithm and a parallel row/column/box scanning algorithm. The stack algorithm was elegant and required very little chip area, but suffered a large external RAM access penalty. We finally settled on a parallel scanning algorithm. The final chip design was quite elegant, with three scanning engines and a scoreboard to keep track of progress toward a solution. We finished the project ahead of schedule and received a top score for our solution.
I think the main lessons I learned from this exercise were (a) keep it simple and (b) do a lot of design/research up-front. We really concentrated on the simplest possible algorithms we could find, since it was no use having a highly-optimized algorithm that we could not possibly implement within the allotted schedule. We found that that time spent talking about the different algorithms, doing napkin sketches, and simulating in C was well-worth the effort in the long run. It seemed like we were burning a lot of time up-front, but once we started coding Verilog, our work went very smoothly, and we spent very little time tweaking our simulation once we got it running.
Final report: http://sites.google.com/site/edwardcraighyatt/projects/sudoku-asic/sudoku.pdf?attredirects=0
Sudoku project definition: http://sites.google.com/site/edwardcraighyatt/projects/sudoku-asic/Project07.pdf