Haile and KS data

Phil Haile's data

Phil Haile's timber auction data is a popular reference. However, the data is stored in a somewhat unfriendly "tapelayout" format and contains several "broken" observations. I wrote a small parser in Python that takes care of that and produces a friendly dataset.

This data contains both sealed bid (first price) and English (equivalent to second price) auctions. The columns in the data are named as in the tapelayout. For more details, read Haile's original paper, as well as another useful ECTA paper.

The most important variables are

  • sale - sale id

  • sale_method - A for English, S for sealed bid

  • bid_value_i - total bid of firm i

  • volume_j - volume of the j-th part of the tract

  • advertised_rate_j - per-volume estimate for the j-th part of the tract

To compare the advertised value of a tract with the total bid, multiply each volume_j by the corresponding advertised_rate_j and sum the products over j (j runs from 1 to 13).
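The sum over tract parts can be sketched as follows. This is only an illustration, not the parser itself: it assumes each observation is a dict whose keys follow the pattern volume_1 ... volume_13 and advertised_rate_1 ... advertised_rate_13, and that unused parts are missing or empty. The helper name advertised_value and the numbers in example_row are made up.

```python
def advertised_value(row, n_parts=13):
    """Sum volume_j * advertised_rate_j over the parts of one tract.

    Assumption: keys are named volume_1..volume_13 and
    advertised_rate_1..advertised_rate_13; parts that are missing
    or empty are skipped.
    """
    total = 0.0
    for j in range(1, n_parts + 1):
        volume = row.get(f"volume_{j}")
        rate = row.get(f"advertised_rate_{j}")
        if volume in (None, "") or rate in (None, ""):
            continue  # this part of the tract is not used
        total += float(volume) * float(rate)
    return total


# Made-up observation with two tract parts and one bidder.
example_row = {
    "sale": 1,
    "sale_method": "S",
    "bid_value_1": 1500.0,
    "volume_1": 100,
    "advertised_rate_1": 10.0,
    "volume_2": 50,
    "advertised_rate_2": 4.0,
}

appraisal = advertised_value(example_row)          # 100*10 + 50*4
bid_ratio = example_row["bid_value_1"] / appraisal  # bid relative to advertised value
```

Dividing each bid_value_i by this sum gives bids on a scale that is comparable across tracts of different sizes.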

Krasnokutskaya and Seim data

Krasnokutskaya and Seim's road construction data is another popular reference. The data is in a spreadsheet, but column names are missing. The beginning of the main.m script shows how the relevant variables were created from the original data.

After looking at the paper and comments carefully, we can see that the 41 columns in the data should correspond to:

  • proj_id - auction id

  • co_id - firm id

  • bidamount - firm's bid

  • sbpref_act - dummy for whether the firm is a certified small business

  • estimate - engineer's estimate of the work

  • wordays - work difficulty in days

  • sbnum - number of small firms

  • lbnum - number of large firms (that are not small)

  • sbplanh - potential number of small firms (plan holders)

  • lbplanh - potential number of large firms (plan holders)

  • y1...y3 - obviously year dummies (3 out of 4 years)

  • m1...m11 - obviously month dummies (11 out of 12 months)

  • cat1...cat4 - I guess work type?

  • small - small contract dummy

  • medium - medium contract dummy

  • d1...d11 - I have no idea
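A minimal sketch of attaching these names to the unnamed columns. It assumes the 41 columns appear in exactly the order listed above, which is my reading and not something the spreadsheet confirms; the helper name label_row is hypothetical.

```python
# Column names in the order listed above (assumed to match the spreadsheet).
COLUMN_NAMES = (
    ["proj_id", "co_id", "bidamount", "sbpref_act", "estimate",
     "wordays", "sbnum", "lbnum", "sbplanh", "lbplanh"]
    + [f"y{i}" for i in range(1, 4)]     # y1..y3
    + [f"m{i}" for i in range(1, 12)]    # m1..m11
    + [f"cat{i}" for i in range(1, 5)]   # cat1..cat4
    + ["small", "medium"]
    + [f"d{i}" for i in range(1, 12)]    # d1..d11
)

# Sanity check: the list must cover all 41 columns.
assert len(COLUMN_NAMES) == 41


def label_row(values):
    """Turn one spreadsheet row (a sequence of 41 values) into a dict."""
    if len(values) != len(COLUMN_NAMES):
        raise ValueError(
            f"expected {len(COLUMN_NAMES)} values, got {len(values)}"
        )
    return dict(zip(COLUMN_NAMES, values))


# Placeholder row: each value is just its column position.
row = label_row(list(range(41)))
```

The length check is there to catch an export that dropped or added a column before any names get attached to the wrong data.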