Haile and KS data
Phil Haile's data
Phil Haile's timber auction data is a popular reference. However, the data is stored in a somewhat unfriendly "tapelayout" format and contains several "broken" observations. I wrote a small parser in Python that handles both issues and produces a friendly dataset.
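The parser itself is not reproduced here. To give a rough idea of the approach, here is a minimal sketch that assumes the tapelayout describes fixed-width records. The LAYOUT positions and converters, and the rules for flagging a line as "broken" (too short, a field that cannot be converted, or a sale_method other than A or S), are hypothetical and would have to be replaced by the ones in the actual tapelayout.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# HYPOTHETICAL layout: (field name, start, end, converter) with 0-based,
# end-exclusive character positions. Replace with the positions in the tapelayout.
LAYOUT: List[Tuple[str, int, int, Callable[[str], object]]] = [
    ("sale", 0, 6, str),
    ("sale_method", 6, 7, str),
    ("bid_value_1", 7, 17, float),
]


def parse_record(line: str, layout=LAYOUT) -> Optional[Dict[str, object]]:
    """Parse one fixed-width line; return None if it looks broken."""
    line = line.rstrip("\n")
    if len(line) < max(end for _, _, end, _ in layout):
        return None  # truncated line
    record: Dict[str, object] = {}
    for name, start, end, convert in layout:
        raw = line[start:end].strip()
        try:
            # An empty field is kept as a missing value (None).
            record[name] = convert(raw) if raw else None
        except ValueError:
            return None  # e.g. text in a numeric column
    if record.get("sale_method") not in ("A", "S"):
        return None  # neither English (A) nor sealed (S)
    return record


def parse_lines(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse every line and drop the ones flagged as broken."""
    parsed = (parse_record(line) for line in lines)
    return [rec for rec in parsed if rec is not None]
```

Returning None for a broken line and filtering afterwards keeps the validation rules in one place, so they are easy to adjust once the real layout is plugged in.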
The data contains both sealed-bid (first-price) and English (equivalent to second-price under private values) auctions. The columns are named as in the tapelayout. For more details, read Haile's original paper, as well as another useful ECTA paper.
The most important variables are:
sale - sale id
sale_method - A for English, S for sealed
bid_value_i - total bid of firm i
volume_j - volume of the j-th part of the tract
advertised_rate_j - advertised (estimated) price per unit of volume for the j-th part of the tract
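Since bids are stored in wide format (one bid_value_i column per firm), a common first step is to collect each sale's bids and split sales by sale_method. This is a minimal sketch assuming each sale is a dict keyed by the column names above and that a firm without a bid has None in its bid_value_i slot; the helper names bids and summarize are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Summary = Dict[str, List[Tuple[object, int, Optional[float]]]]


def bids(record: Dict[str, object]) -> List[float]:
    """Return the non-missing bid_value_i entries of one sale, ordered by i."""
    keys = [k for k in record if k.startswith("bid_value_")]
    # Sort numerically so bid_value_10 comes after bid_value_2.
    keys.sort(key=lambda k: int(k[len("bid_value_"):]))
    # Assumption: a firm that did not bid has None in its bid_value_i slot.
    return [float(record[k]) for k in keys if record[k] is not None]


def summarize(records: List[Dict[str, object]]) -> Summary:
    """Group sales by sale_method ("A" English, "S" sealed).

    Each entry is (sale, number of bidders, highest bid).
    """
    out: Summary = defaultdict(list)
    for rec in records:
        b = bids(rec)
        out[str(rec["sale_method"])].append((rec["sale"], len(b), max(b) if b else None))
    return dict(out)
```

Keeping English and sealed sales in separate groups matters because the two formats are usually analysed with different bidding models.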
To compare a total bid with the advertised value of the tract, multiply each volume_j by the corresponding advertised_rate_j and sum over j = 1, ..., 13.
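The computation is advertised value = volume_1 * advertised_rate_1 + ... + volume_13 * advertised_rate_13. Here is a minimal sketch, assuming one sale is a dict keyed by the column names and that unused parts of a tract have a missing (None) volume or rate. The function names advertised_value and bid_ratio are hypothetical.

```python
from typing import Dict, Optional

N_PARTS = 13  # parts j = 1..13 of a tract


def advertised_value(record: Dict[str, object]) -> float:
    """Sum of volume_j * advertised_rate_j over j = 1..13."""
    total = 0.0
    for j in range(1, N_PARTS + 1):
        volume = record.get(f"volume_{j}")
        rate = record.get(f"advertised_rate_{j}")
        # Assumption: an unused part has a missing volume or rate and adds nothing.
        if volume is None or rate is None:
            continue
        total += float(volume) * float(rate)
    return total


def bid_ratio(record: Dict[str, object], i: int) -> Optional[float]:
    """Total bid of firm i divided by the tract's advertised value."""
    bid = record.get(f"bid_value_{i}")
    value = advertised_value(record)
    if bid is None or value == 0:
        return None
    return float(bid) / value
```

For example, a tract with two parts (100 units at 20 and 50 units at 10) has an advertised value of 2500, so a total bid of 3000 gives a ratio of 1.2.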
Krasnokutskaya and Seim data
Krasnokutskaya and Seim's road construction data is another popular reference. The data comes as a spreadsheet, but the column names are missing. The beginning of the main.m script shows how the relevant variables are created from the original data.
After reading the paper and the comments carefully, the 41 columns in the data appear to correspond to:
proj_id - auction id
co_id - firm id
bidamount - firm's bid
sbpref_act - dummy for the firm being a certified small business
estimate - engineer's estimate of the project cost
wordays - work difficulty, measured in days
sbnum - number of small firms
lbnum - number of large (i.e., not small) firms
sbplanh - potential number of small firms (plan holders)
lbplanh - potential number of large firms (plan holders)
y1...y3 - year dummies (3 dummies for 4 years)
m1...m11 - month dummies (11 dummies for 12 months)
cat1...cat4 - probably work type (my guess)
small - small contract dummy
medium - medium contract dummy
d1...d11 - I have no idea
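Here is a minimal sketch for attaching these names to the unnamed spreadsheet. It assumes the columns appear in exactly the order listed above; the list fixes the names, but the order is my assumption. The count check guards against miscounting the dummy blocks. The helper decode_dummies is hypothetical and turns a block such as y1...y3 back into one category index, assuming the omitted category is coded as all zeros.

```python
from typing import List, Sequence


def ks_column_names() -> List[str]:
    """Return the 41 column names, assuming the order listed above."""
    names = ["proj_id", "co_id", "bidamount", "sbpref_act", "estimate",
             "wordays", "sbnum", "lbnum", "sbplanh", "lbplanh"]
    names += [f"y{k}" for k in range(1, 4)]     # y1..y3
    names += [f"m{k}" for k in range(1, 12)]    # m1..m11
    names += [f"cat{k}" for k in range(1, 5)]   # cat1..cat4
    names += ["small", "medium"]
    names += [f"d{k}" for k in range(1, 12)]    # d1..d11
    assert len(names) == 41, "expected 41 columns"
    return names


def decode_dummies(row: Sequence[float]) -> int:
    """Map a block of 0/1 dummies to a category index.

    Returns 0 for the omitted baseline (all zeros) and k when dummy k is set.
    """
    ones = [k for k, v in enumerate(row, start=1) if v == 1]
    if len(ones) > 1:
        raise ValueError("more than one dummy is set")
    return ones[0] if ones else 0
```

If the spreadsheet is loaded with pandas, for example df = pd.read_excel(path, header=None) with path as a placeholder, then df.columns = ks_column_names() attaches the names.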