Online Services

iArabicWeb16

iArabicWeb16 is a freely available Web-based tool that makes the ArabicWeb16 dataset more accessible to the research community through both a Web interface and a programming API.

  1. Web Search interface: allows registered users to run interactive searches, similar to commercial search engines.
  2. Programming API: a Java client API that lets developers and researchers run searches with different configurations and retrieve documents directly by their IDs from within their own programs. The API provides three main functions:
        • search: issues a query on the collection with the specified configuration (e.g., ranking function, number of returned results). It returns a JSON-formatted string that can be parsed into an array of results using the parseResults function (not shown in the figure).
        • retrieveSingleDoc: returns the document with the given ID.
        • retrieveBatchOfDocs: writes the content of the requested documents, in a compressed format, to the destination file specified by the user.
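
To make the shape of these three functions concrete, here is a minimal offline sketch. It is not the actual iArabicWeb16 client: only the function names search, retrieveSingleDoc, and retrieveBatchOfDocs come from the description above. The interface name ArabicWebClient, the stand-in class InMemoryClient, every parameter list, the JSON layout of the search result, and the choice of gzip as the compressed format are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

// Hypothetical interface: function names follow the description, signatures are assumed.
interface ArabicWebClient {
    String search(String query, String rankingFunction, int numResults);
    String retrieveSingleDoc(String docId);
    void retrieveBatchOfDocs(List<String> docIds, Path destination) throws IOException;
}

// In-memory stand-in for the remote service, so the sketch runs without network access.
class InMemoryClient implements ArabicWebClient {
    private final Map<String, String> docs = new LinkedHashMap<>();

    InMemoryClient(Map<String, String> docs) {
        this.docs.putAll(docs);
    }

    @Override
    public String search(String query, String rankingFunction, int numResults) {
        // Naive substring match in insertion order; rankingFunction is accepted but ignored.
        // IDs are written without JSON escaping, which is only safe for simple IDs.
        StringBuilder json = new StringBuilder("[");
        int hits = 0;
        for (Map.Entry<String, String> entry : docs.entrySet()) {
            if (hits == numResults) {
                break;
            }
            if (entry.getValue().contains(query)) {
                if (hits > 0) {
                    json.append(",");
                }
                json.append("{\"id\":\"").append(entry.getKey()).append("\"}");
                hits++;
            }
        }
        return json.append("]").toString();
    }

    @Override
    public String retrieveSingleDoc(String docId) {
        // Returns null when the ID is unknown.
        return docs.get(docId);
    }

    @Override
    public void retrieveBatchOfDocs(List<String> docIds, Path destination) throws IOException {
        // Assumed output layout: one "id<TAB>content" line per document, gzip-compressed.
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(destination)),
                StandardCharsets.UTF_8)) {
            for (String id : docIds) {
                String content = docs.get(id);
                if (content != null) {
                    out.write(id + "\t" + content + "\n");
                }
            }
        }
    }
}

class IArabicWebSketch {
    public static void main(String[] args) throws IOException {
        Map<String, String> corpus = new LinkedHashMap<>();
        corpus.put("doc-1", "sample text about search engines");
        corpus.put("doc-2", "another sample document");

        ArabicWebClient client = new InMemoryClient(corpus);

        // Interactive-style query, limited to five results.
        System.out.println(client.search("sample", "bm25", 5));

        // Direct lookup by document ID.
        System.out.println(client.retrieveSingleDoc("doc-2"));

        // Batch export to a compressed file.
        Path destination = Files.createTempFile("iarabicweb16-batch", ".gz");
        client.retrieveBatchOfDocs(List.of("doc-1", "doc-2"), destination);
        System.out.println("Batch written to " + destination);
    }
}
```

Against the real service, only the implementation behind the interface would differ. The calling pattern (query, then fetch by ID, then batch export) is the part this sketch is meant to illustrate.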

For details, check our LREC/OSACT3 paper:

Khaled Yasser, Reem Suwaileh, Abdelrahman Shouman, Yassmine Barkallah, Mucahid Kutlu, and Tamer Elsayed. iArabicWeb16: Making a Large Web Collection More Accessible for Research. Proceedings of the 3rd Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT 2018) at LREC 2018, pp. 75-79, Miyazaki, Japan, May 2018.

Get access to iArabicWeb16 here.