Dear Chess Friends,

I am pleased to announce that I have organized another new round of GUI/engine testing!
First of all, let me start with this: any engine's rating depends on the conditions we use:
- Hardware speed
- Opening book/suite
- MP or 1 core
- Ponder ON/OFF
- Time control
- Hyper-Threading ON/OFF
- Hash table size
- GUI
- Tester/TD
And so on.
This time, only two top engines were tested, under Arena 3.5.1 and Cutechess 1.2.0.
The target is to see the influence of running matches simultaneously!

One of the main differences between the two GUI tests:
For the simultaneous Arena matches, several GUIs were running, and each one played its games from a separate folder of engines. I mean, for each Arena GUI I installed a separate copy of the engines.
With Cutechess, only one GUI was running, and all the games were played from the same copy/folder of engines.

As you may know, the Cutechess GUI has an option for simultaneous play: 'Concurrent games' (playing matches automatically at the same time).
So it is not required to run several GUIs at the same time.

Note: the Concurrent games option can be used on machines with many CPUs.
For example, my 2x 2686 has 36 real cores (Hyper-Threading OFF):
- 1 core, Ponder OFF: max. 36 simultaneous matches
- 1 core, Ponder ON: max. 18 simultaneous matches
Btw, Cutechess crashed with Concurrent games set to 60 (1 thread, HT ON).
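The core counts above follow a simple rule of thumb, which can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name and parameters are hypothetical, and it assumes that with Ponder OFF only one engine per game is searching at any moment, while with Ponder ON both engines search at once.

```python
def max_concurrent_games(physical_cores: int,
                         threads_per_engine: int = 1,
                         ponder: bool = False) -> int:
    """Rough upper bound on simultaneous games so that every
    searching engine thread gets its own physical core."""
    # Ponder OFF: only the side to move is searching.
    # Ponder ON: both engines are searching at the same time.
    busy_engines_per_game = 2 if ponder else 1
    cores_per_game = threads_per_engine * busy_engines_per_game
    return physical_cores // cores_per_game


if __name__ == "__main__":
    # 36 real cores, as on the 2x 2686 machine with Hyper-Threading OFF
    print(max_concurrent_games(36, threads_per_engine=1, ponder=False))
    print(max_concurrent_games(36, threads_per_engine=1, ponder=True))
    # Not stated in the post, only derived from the same rule:
    print(max_concurrent_games(36, threads_per_engine=3, ponder=True))
```

The first two calls give the 36 and 18 quoted above. The third case (3 cores, Ponder ON) is only what the same rule would suggest, not a figure reported in the test.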
Now one of the main questions (for matches on the same hardware) is:
- What is the right and best way of testing chess engines?

According to my experience, there is no perfect GUI, engine, opening book, etc.
Sure, it is also true that many testers are doing their best and produce a lot of useful data... BIG thanks to all!

Anyhow, I see tests where not much info is given about the conditions.
Just a few examples: whether Hyper-Threading or pondering is enabled or not, etc.
Some don't even care to mention the hardware or the opening book used.

Btw, note also that the openings are a really important factor!
E.g. with weak, critical openings, I suggest thousands of games per player, but using many different openings.
Otherwise the Elo error margin can be too high... just saying!
Whereas with strong openings, 500-1000 games (per player) is quite reasonable and enough data.

Another important issue that is often missing (regarding not much info) is running matches simultaneously (e.g. under Cutechess):
I assume that the simultaneous games are played by the same engine exe.
I mean, a separate engine copy is not installed for each game...?!
It seems the Cutechess GUI automatically runs the same engine copy...

On the other hand, here is how I usually test the engines (I mean in previous tours):
MP, mainly under the Arena and Fritz GUIs, Ponder ON, Hyper-Threading OFF.
Plus, all my engine games are played by a separate engine copy.
E.g. for each Arena or Fritz GUI, I install a separate folder of engines.
So I believe this is not such a bad method (for best/max. performance).

Sure, anyone is free to test under any conditions, it's OK (from my side)!
But in reality... do we know for sure exactly what we are testing?!
Who knows... maybe there is something wrong in our testing?!
Continuing...
Meanwhile, as far as I am aware, many testers run games with Ponder OFF (simultaneously, with exactly the same exe), probably to save time and to produce more games.
About this issue I have to say: keep no more cats than can catch mice!

Now the most important question (for the methods below):
Under which GUI do the current top engines play at full strength?
In short, which chess GUI produces more accurate results?
And for this reason, a new idea was born:
- Testing the same versions of the engines under two different GUIs!
And here are the results:

1st Test: Cutechess 1.2.0 GUI: 30sec+0.5sec, 1 Core, Ponder OFF
Cfish 031120 performed 65+ Elo better:

1 Cfish 031120      +208/-24/=768   59.20%   592.0/1000
2 Stockfish 12      +24/-208/=768   40.80%   408.0/1000
--------------------------------------------------
2nd Test: Arena 3.5.1 GUI: 30sec+0.5sec, 1 Core, Ponder OFF
Cfish 031120 performed 40+ Elo better:

1 Cfish 031120      +147/-32/=821   55.75%   557.5/1000
2 Stockfish 12      +32/-147/=821   44.25%   442.5/1000
--------------------------------------------------
3rd Test: Cutechess 1.2.0 GUI: 3min+0.5sec, 3 Cores, Ponder ON
Cfish 031120 performed 42+ Elo better:

1 Cfish 031120 MP   +61/-1/=438     56.00%   280.0/500
2 Stockfish 12 MP   +1/-61/=438     44.00%   220.0/500
--------------------------------------------------
4th Test: Arena 3.5.1 GUI: 3min+0.5sec, 3 Cores, Ponder ON
Cfish 031120 performed 19+ Elo better:

1 Cfish 031120 MP   +33/-5/=462     52.80%   264.0/500
2 Stockfish 12 MP   +5/-33/=462     47.20%   236.0/500
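For anyone who wants to check the Elo figures, here is a minimal Python sketch that converts a win/loss/draw record into an Elo difference using the usual logistic formula, plus a rough 95% margin. The function names are my own, and the margin is only a normal approximation that treats all games as independent (which is not strictly true when openings are replayed with reversed colors), so it is an estimate and not the output of any rating tool.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, low, high) for one player's W/L/D record.

    Assumption: games are independent, so the per-game score
    variance can be taken from the observed W/D/L frequencies.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(variance / n)
    return (elo_diff(score),
            elo_diff(score - z * stderr),
            elo_diff(score + z * stderr))


if __name__ == "__main__":
    # Cfish records from the four tests above (wins, losses, draws)
    tests = {
        "1st (Cutechess, bullet)": (208, 24, 768),
        "2nd (Arena, bullet)": (147, 32, 821),
        "3rd (Cutechess, blitz MP)": (61, 1, 438),
        "4th (Arena, blitz MP)": (33, 5, 462),
    }
    for name, (w, l, d) in tests.items():
        elo, low, high = elo_with_margin(w, l, d)
        print(f"{name}: {elo:+.1f} Elo (approx. 95%: {low:+.1f} to {high:+.1f})")
```

The point estimates come out close to the 65, 40, 42 and 19 Elo quoted for the four tests. The margins are useful mainly for judging whether the Cutechess and Arena results differ by more than the statistical noise.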
Games
More details:
- All games were played on the same hardware (2x E5-2686, 2.00 GHz)
- 128 MB hash (for Bullet), 512 MB hash (for Blitz), 4-men Syzygy
- Both engines were run as NNUE (with the default EvalFiles), as BMI2
- The Balsa opening suite was used (up to 5 moves), with reversed colors

It seems Cfish performs better under Cutechess, or Stockfish suffers there.
I am not sure exactly which GUI produces more accurate results?!
Of course I can be wrong... but my sixth sense says: the Arena GUI.
Sure, I am referring to my current GUI test conditions!

For more info, questions, answers... please feel free to contact me.
And I hope all these tests will be useful... sure, we need more of them.
However, at least now we have some new data to compare!

Best,
Sedat