A number of factors affect the performance of SharpNEAT. The following set-up will give you the best possible performance (with some caveats):
Core i7 CPU with Hyper-Threading (HT) enabled. I run a four-core i7 with HT enabled, which gives eight logical cores. Enabling HT yields something like a 50% speed-up. A Core i7 also provides something like a 25% speed-up per core over a Core 2, which in turn provides a similar per-core speed-up over P4 and Athlon era CPUs.
The .NET 4 CLR and framework currently provide the fastest platform for CPU-intensive .NET applications. One possible exception is the LLVM support in Mono, which I have not yet tested. Another factor I know of is the difference between the garbage collector algorithms in Mono and MS .NET: MS uses a compacting garbage collector, which is preferable for long-running CPU-bound applications. In tests the ParallelKMeansSpeciation strategy ran 30% faster on .NET 4 than on .NET 2. It is not known whether this is due to improvements to the Parallel Extensions (.NET 2 relies on Reactive Extensions to provide the Parallel Extensions framework classes) or to improvements in .NET 4's implementation of sync locking. Possibly both factors are helping, along with other improvements in the .NET 4 CLR and framework classes.
Performance differences may also be observed between operating systems. For example, Windows 7 has an improved thread scheduling algorithm which may be beneficial; however, in tests I saw no improvement when upgrading from Vista to Windows 7.
MS .NET provides two distinct garbage collectors: a workstation GC and a server GC. The workstation GC is tuned to reduce latency and so improve the responsiveness of GUIs. When the workstation GC runs, all threads are stopped, so it favours short, frequent collections; "a little GC, often" gives a broad indication of the workstation GC algorithm. The server GC is less concerned with latency and more with overall throughput, so it may defer a collection at the expense of using more of the available memory. Doing collections less often generally reduces the overall amount of GC work required (less repeated traversal of the object reference graph/pointers). In some scenarios the server GC is enabled by default, e.g. an ASP.NET application running on IIS on a Windows Server OS. In addition, it can be switched on by enabling the gcServer runtime setting in the application config XML file (named after the executable with .config appended, e.g. MyApp.exe.config).
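A minimal sketch of that config entry, assuming the standard gcServer runtime element and a hypothetical executable named MyApp.exe (so the file would be MyApp.exe.config, placed alongside the exe):

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- MyApp.exe.config (hypothetical file name; use your own exe name) -->
<configuration>
  <runtime>
    <!-- Request the server garbage collector instead of the workstation GC -->
    <gcServer enabled="true"/>
  </runtime>
</configuration>
```

To confirm the setting has taken effect, the System.Runtime.GCSettings.IsServerGC property should return true at runtime. On a machine with a single logical core the runtime uses the workstation GC regardless of this setting.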
Note that .NET 4 has an improved workstation GC algorithm (see CLR 4.0: Garbage Collection Changes), but the server GC remains unchanged (or mostly unchanged). It seems the server GC does not run concurrently with application code: it performs collections on its own dedicated GC threads and suspends all other threads while it operates. Not having to handle a changing object graph, and the thread synchronisation that would require, are the primary reasons for its higher throughput at the cost of some additional latency.
I hope to provide a simple-to-use benchmark routine that will allow the performance of different set-ups to be compared, along with a number of baseline performance figures so that you can determine whether you are getting the best possible speed from your hardware and software.