Computational Linguistics is a relatively young academic discipline that brings together resources from Linguistics and Computer Science to address the question of how human language is used to transmit, store and process information, and how these processes can be modeled on a computer and made available to specific applications.
I completed a master's degree in this discipline, in which I characterized the major and minor features of Arabic and evaluated both its suitability for automated processing and the search engines that handle Arabic.
Title of this thesis: Indexing Arabic content on the Web; an empirical study.
Date of Publication: 2013
Abstract: This study surveys Arabic Web content and reviews the literature of experimental and evaluative studies in information retrieval systems, from the mid-1950s through 2010, concentrating on those that dealt with Arabic. The challenges that search engines encounter in dealing with electronic resources are discussed, including crawling and indexing Web content. The characteristics of Arabic Web content are evaluated in bibliometric terms, using Bradford’s Law and Brookes’ Measure for Categorical Dispersion. Two experiments are conducted: one tests how well the characteristics of Arabic (morphology, grammar, semantics and lexicography) lend themselves to automated systems, compared with English; the other measures the performance of automated systems that handle Arabic (Google, Yahoo!, Bing, Araby and Ayna) in terms of recall, precision, search failures, search results ranking, novelty and use of metadata. Two electronic questionnaires were emailed to a sample of researchers in information science. Results are summarized and recommendations are presented.
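Bradford’s Law, mentioned in the abstract, is commonly stated as follows: when sources are ranked by productivity and divided into zones that each hold roughly the same number of articles, the number of sources per zone grows approximately geometrically (1 : n : n²). The following is a minimal Python sketch of that zone split. The function name `bradford_zones`, the choice of three zones and the article counts are illustrative assumptions, not data or code from the thesis.

```python
def bradford_zones(article_counts, zones=3):
    """Split sources into zones holding roughly equal shares of articles.

    article_counts: number of articles per source (hypothetical data).
    Returns the number of sources that fall into each zone.
    """
    ordered = sorted(article_counts, reverse=True)  # most productive first
    total = sum(ordered)
    sizes = [0] * zones
    if total == 0:
        return sizes

    running = 0  # articles accumulated before the current source
    for count in ordered:
        # The zone is chosen from the cumulative share reached so far.
        zone = min(running * zones // total, zones - 1)
        sizes[zone] += 1
        running += count
    return sizes


# Made-up counts: one very productive source and a long tail.
counts = [5, 30, 10, 5, 10, 5, 5, 10, 5, 5]
print(bradford_zones(counts))  # [1, 3, 6]
```

In this toy data each zone holds 30 of the 90 articles, while the number of sources per zone rises from 1 to 3 to 6, which is the kind of dispersion pattern the law describes.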
The results of the study are presented as follows:
1. Very little English-language literature by Arab authors appears in Arabic electronic journals.
2. Scientific research is the most prominent category in the Arabic Web content.
3. Qualitative productivity could not be analyzed statistically because of the selectivity that affected the Arabic Web content of books.
4. Arabic Web content is constantly increasing. The study of Web content has dominated authors’ interests in this field. Quantitatively and qualitatively, Saudi Arabia ranks first among Arab countries, followed by Egypt.
5. There are no significant differences between Arabic and English in terms of recall and precision in retrieval techniques.
6. English is more precise than Arabic, whether in dealing with nominal forms that result from morphological and grammatical phenomena or in dealing with synonyms, which relate to semantics.
7. There is a difference between Arabic and international search engines in terms of the efficiency of information retrieval. Yahoo!, Google and Live Search (Bing) outperformed Ayna and Araby.
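Several of the results above rest on recall and precision. Under their standard definitions, precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal Python sketch follows. The function name `precision_recall` and the document identifiers are hypothetical, and this is not the evaluation procedure used in the thesis.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs returned by a search engine (hypothetical).
    relevant:  document IDs judged relevant for the query (hypothetical).
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were retrieved

    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Four documents retrieved, three relevant overall, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```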
For more information about this thesis, follow this link (the thesis is written in Arabic).
More publications will be posted here over time.