YA Question Sample (2015)

For your convenience, here is a sample of 1000 Yahoo Answers (YA) question IDs (QIDs) from August 2013, drawn from the list of categories recently announced for the TREC LiveQA track:

- Arts & Humanities

- Beauty & Style

- Computers & Internet

- Health

- Home & Garden

- Pets

- Sports

- Travel

All QIDs are associated with human answers. The list of QIDs is available at:

    https://github.com/yuvalpinter/LiveQAServerDemo/tree/scraping/data
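As a rough illustration of how the QID list might be consumed, the sketch below turns each QID into a question page URL. It rests on two assumptions that this README does not confirm: that the list holds one QID per line, and that question pages follow the `https://answers.yahoo.com/question/index?qid=<QID>` pattern. The class name `QidUrls` and its methods are hypothetical and are not part of the linked repository.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class QidUrls {
    // Assumed URL pattern for a YA question page (not stated in this README).
    static final String QUESTION_URL_PREFIX =
            "https://answers.yahoo.com/question/index?qid=";

    // Keeps the non-blank lines, trimmed. Assumes one QID per line.
    static List<String> parseQids(List<String> lines) {
        List<String> qids = new ArrayList<>();
        for (String line : lines) {
            String qid = line.trim();
            if (!qid.isEmpty()) {
                qids.add(qid);
            }
        }
        return qids;
    }

    // Builds the question page URL for a single QID.
    static String toQuestionUrl(String qid) {
        return QUESTION_URL_PREFIX + qid;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java QidUrls <path-to-local-qid-list>");
            return;
        }
        // args[0] is the path to a local copy of the QID list.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String qid : parseQids(lines)) {
            System.out.println(toQuestionUrl(qid));
        }
    }
}
```

Run against a local copy of the list, this prints one URL per QID. The resulting URLs could then be fetched and parsed.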

We also provide sample code that extracts some of the fields for a given QID, including the question title, question body, top-level category, and date:

    https://github.com/yuvalpinter/LiveQAServerDemo/blob/master/src/org/trec/liveqa/GetYAnswersPropertiesFromQid.java

The code can easily be modified to extract any other content field that appears on YA pages.

Please note that the sample code depends on two third-party libraries, which you need to download yourself (and whose licenses you need to accept):

 - Apache Commons HttpClient (commons-httpclient-3.1.jar): http://mvnrepository.com/artifact/commons-httpclient/commons-httpclient/3.1

 - jsoup core library (jsoup-1.8.2.jar): http://jsoup.org/download