Integration of innovation and technology transfer concepts to provide market-driven solutions through incubation programs
Extracting metadata and keywords from a large, heterogeneous collection of scientific documents, for purposes such as classification and search, is important yet challenging. The documents are written in several languages, including Arabic, and contain many types of information to extract; they vary greatly in format and structure; and some are available only as scanned images in a non-machine-readable format.
Through this project, we aim to overcome these challenges by building a model and developing a software tool for the automatic extraction of metadata and keywords, to be used for searching and classifying scientific documents in any scientific context. The model can be generalized to the repository of any research area, and the developed tools can be used in any context, either standalone or integrated with any research repository.
To develop a general machine learning model and a software tool for the automatic extraction of metadata and keywords, to be used for searching and classifying scientific documents within the IUG Space repository.
Using the IUG Space repository as a corpus of Arabic and English research documents and selecting a collection of them as a use case for experimental purposes.
Extracting important metadata from these documents, such as title, abstract, author(s), institution, dates, and other important features and attributes related to the research topic. This will improve the search mechanism, reducing both time and effort, and will support accurate classification of research documents within the repository.
Extracting important keywords from the documents, which will support the classification mechanism.
Developing a model to classify document groups and implementing this model as a software tool to be deployed in the cloud.
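As an illustration of the keyword extraction objective, the following is a minimal sketch that ranks the terms of each document by TF-IDF weight. TF-IDF is used here only as a simple, well-known baseline; it is an assumption for illustration and not the model the project will build. The function names, the stop-word list, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; a real tool would need
# fuller lists for both Arabic and English.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "for", "is", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    In Python 3, \\w is Unicode-aware for str patterns, so Arabic and
    English words are both matched.
    """
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def extract_keywords(documents, top_k=3):
    """Return the top_k terms of each document, ranked by TF-IDF score."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


if __name__ == "__main__":
    sample_docs = [
        "solar energy solar panels energy storage",
        "water desalination water quality energy",
    ]
    for keywords in extract_keywords(sample_docs, top_k=2):
        print(keywords)
```

Terms that occur in every document, such as "energy" in the sample above, receive a score of zero and fall to the bottom of the ranking, which is why the extracted keywords tend to be the terms that distinguish one document from the rest of the corpus. Keywords obtained this way could then serve as input features for a document classifier.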
The need for an effective, accurate, usable, and unified national repository is agreed upon among Palestinian practitioners, librarians, and researchers in the West Bank and Gaza Strip. This is confirmed by participants from various Palestinian universities in similar projects.
One such project is ROMOR (Erasmus+), within which the IUG Space repository was originally developed. The repository is up and running and hosts more than 6000 research articles, a number that needs to keep growing. It is now used by many researchers from various institutions.
In addition, UCAS and IUG are currently working, in a joint project, to form and implement a national-level federation of research repositories that will improve the management and sharing of research results from all concerned Palestinian institutions.
The model to be built and the software to be developed can be generalized and extended to any repository and to any research area and format. It can be deployed later as a standalone cloud-based tool to be used by anyone.
The idea of the project is considered innovative because it stems from a real need to enhance and empower research repositories with helpful tools based on well-established research results. It merges up-to-date machine learning and data extraction technologies with research information management.
It improves the current situation and the effective use of a research repository by adding a more accurate classification mechanism, which in turn attracts more researchers and users to research repositories. It will enable these researchers to search more effectively and retrieve the most relevant research articles and documents.
The idea of the project is based on information technology and its exploitation in diverse fields, an area with extensive and ongoing entrepreneurial activity. Since the idea is information-based, it can be developed and extended to other fields and can find its way to the market. ResearchGate, Academia and Scholar are examples of similar research output management ideas that have turned into big and successful entrepreneurial projects.
The project has a direct and positive effect on the community of scientific researchers across all research-related institutions, such as universities, private institutions conducting research-related work, and public institutions searching for specific research results or data. Since the users and beneficiaries of the repository are diverse and are likely to search for specific, exact information in research repositories, the developed tools will enable them to do so with less time and effort.
The project will likely foster the ongoing effort by Palestinian institutions to form a federation of research repository systems. Such a federation will need effective data extraction tools to make the federated repository more effective and productive.