1. Save as much intermediate and final data as possible in a pickle, csv, json, h5, or similar format
2. Use a spreadsheet to analyze every trial, and make notes
3. Save videos/visualizations for every trial, and put a link to them in the spreadsheet
The purpose of this article is to share how I have recently been managing and analyzing my experiments, in the hopes that you can learn something from it. I'd also love to hear what works for you; you can talk to me @petermitrano on twitter. For a bit of context, I'm a PhD student in robotics and my experiments are usually done inside physics simulations (Gazebo, Mujoco, etc.), but these techniques also apply to real-world robot experiments. They probably don't work well for big-data style experiments, because they involve manual work for every single trial, so if you run more than about 100 trials this is probably not for you.
Let's start with my first point, which is about how you save your data. I recommend a file format that can be loaded and read without the code you used to generate it. This makes it future-proof, and personally I use json (or hjson for data I want to read or edit by hand in a text editor) and pickle most often. Another tip here is to save everything you can, because you never know what you'll need later. Of course the trade-off here is disk space, but disk space is cheap and your time as a researcher is worth more than disk space costs. As a specific example, when I run an RRT I save every state and action in the planning tree, not just the final actions. I also save all the inputs to the planner (environment state, hyperparameters, etc.). I also like to save the command line arguments I used when running the trial, and any other info that I might want to know later but will have forgotten.
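To make that concrete, here's a minimal sketch of how the saving code might look. The directory layout and the `planner_inputs`/`tree`/`metrics` names are hypothetical placeholders for whatever your experiment produces; the point is just to split human-readable metadata (json) from bulky data (pickle), and to record `sys.argv` while you still have it.

```python
import json
import pickle
import sys
from pathlib import Path

def save_trial(trial_dir: Path, planner_inputs, tree, metrics):
    """Save everything from one trial: inputs, the full planning tree, and metrics."""
    trial_dir.mkdir(parents=True, exist_ok=True)

    # Human-readable metadata: the exact command used, hyperparameters, and
    # final metrics (everything here must be json-serializable).
    metadata = {
        'argv': sys.argv,
        'hyperparameters': planner_inputs['hyperparameters'],
        'metrics': metrics,
    }
    with (trial_dir / 'metadata.json').open('w') as f:
        json.dump(metadata, f, indent=2)

    # Bulky data (every state/action in the tree, environment state) goes in a pickle.
    with (trial_dir / 'data.pkl').open('wb') as f:
        pickle.dump({'planner_inputs': planner_inputs, 'tree': tree}, f)
```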
Now let's say you've finished running the trials and are going to analyze the results. You probably have metrics you can calculate automatically, and that's great, but I would argue that in robotics you always need to actually look at the trials and see what happened -- seeing is believing. This is especially true if you are doing reinforcement learning research, where your agent might do something that you didn't expect. Before you get too excited and start writing your paper, take the time to really understand the different strengths and weaknesses of your agent, and keep careful track of the types of failures. Recently I've been using spreadsheets (Google Sheets specifically) for this task. I've included a real copy of one of these spreadsheets below for you to examine! But I'll summarize the general things I'd think about when making your own (see the sketch after this list for one way to bootstrap such a sheet):
If there is something in particular you are looking for in each trial, make a column for it and mark it when you review/analyze that trial
Leave a column for writing miscellaneous notes to yourself
Add a link to a visualization/video of the trial for cases that are particularly interesting
Use conditional formatting to highlight examples you want to look at again/more closely
Use frozen columns or rows to make it easier to see column headings and trial # as you scroll around
For me, this spreadsheet was a way to track down the biggest failure modes of my methods. You'll also see other pages in this sheet where I do an in-depth analysis of a binary classifier. There I didn't go through every example in my validation set, only a few.
As a final point, I cannot recommend strongly enough that you take videos or make detailed visualizations. Just looking at "average task error" or something like that gives you very little information, and summary statistics can be misleading. You will catch strange bugs and unexpected behavior, and have an all-around better understanding of your results, if you invest the time to make visualizations! (RViz is great if you are using ROS.)
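If your simulator can render an RGB array per timestep, saving a video is only a few lines. Here's a minimal sketch using imageio (one option among many; OpenCV or matplotlib animations work too). The dummy gradient frames are stand-ins for whatever your simulator's render call actually returns.

```python
import imageio
import numpy as np

def save_trial_video(frames, filename, fps=30):
    """Write a sequence of HxWx3 uint8 RGB frames to a video file."""
    # Writing mp4 requires the imageio-ffmpeg backend to be installed.
    with imageio.get_writer(filename, fps=fps) as writer:
        for frame in frames:
            writer.append_data(frame)

# Dummy frames for illustration; in practice each frame comes from your
# simulator's render call (e.g. an offscreen camera in Gazebo or Mujoco).
frames = [np.full((240, 320, 3), t % 256, dtype=np.uint8) for t in range(90)]
save_trial_video(frames, 'trial_0042.mp4')
```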
Example Spreadsheet
Example Visualizations