Preface
Abbreviations
List of figures

Chapter one
General information search techniques
1.1 Introduction
1.2 Information search techniques
  1.2.1 Search formulation (subject definition)
  1.2.2 Identifying appropriate sources of information
  1.2.3 Applying the right search tools and techniques
    1.2.3.1 Phrase searching
    1.2.3.2 Boolean searching
    1.2.3.3 Truncation
1.3 Common words
1.4 Relevance feedback

Chapter two
Searching electronic databases
2.1 Overview
2.2 Library catalogues
2.3 Electronic journals

Chapter three
Subject directories and search engines
3.1 Overview
3.2 Subject directories
3.3 Internet search engines
3.4 Deep Web
3.5 Web 2.0 technology

Chapter four
Open access scholarly content
4.1 Overview
4.2 Open access journals
4.3 Self-archiving
  4.3.1 Disciplinary repositories
  4.3.2 Institutional repositories

Chapter five
Quality control and citations
4.1 Evaluating information sources
4.2 Plagiarism
5.3 Citations
  5.3.1 Citing within the text
  5.3.2 Citation in the list of references/bibliography
  5.3.3 Citing electronic information

Bibliography
CHAPTER ONE
GENERAL INFORMATION SEARCH TECHNIQUES

1.1 Introduction

Information is fundamental to the success of many human activities. In the current information society, people from all walks of life require accurate, reliable and timely information to carry out their activities. Virtually every task, simple or complex, requires information of some kind. While some information is readily available, other information requires extensive searching and consultation of multiple sources. In an academic environment, information is a particularly important input for learning, teaching, research, and outreach activities. Individuals who have timely access to relevant information therefore stand a higher chance of academic success.

Developments in Information and Communication Technologies (ICTs) have greatly changed the whole information environment. ICTs have led to a rapid increase in the volume of information (the information explosion); information is made available through multiple media and formats (print, electronic, graphical, audio, and textual); information is managed and stored in a variety of resources, such as CD-ROMs¹, OPACs², digital libraries, and web-based databases; and scholarly communication and publishing increasingly take place in electronic form. While libraries increasingly obtain licensed access to electronic databases, the amount of information freely available on the Internet also continues to grow. Yet access to print information resources in libraries remains essential. Thus, the current problem is no longer one of not having enough information; it is just the opposite: too much information, in various formats, and of uneven quality.

Searching for information in the electronic environment is, in most cases, time consuming and frustrating for those lacking the required skills.
This is due to the enormous amount of information available on the web; the different ways in which information is stored and retrieved; the diversity of search tools available; the overall lack of organization of web content; the uneven quality of web content; and the constant evolution of electronic sources of information. For that reason, information users require conceptual knowledge to convert information needs into searchable queries, semantic knowledge to construct queries for a given system, and technical knowledge to enter queries as specific search statements.

¹ Compact Disc Read-Only Memory
² Online Public Access Catalogues

Consequently, information literacy skills become very important in such a complex information environment. The hit-or-miss process that worked for some users to obtain needed information from libraries in the past is not effective today. Broadly defined, information literacy entails a range of skills required for locating, retrieving, evaluating, and using information from various sources effectively and efficiently. An information literate person is able to determine the information needed; locate and access it effectively and efficiently; evaluate information and its sources; use information to accomplish a specific purpose; and understand the socio-economic and ethical issues surrounding information use. Without these skills, information searchers tend to spend many hours retrieving irrelevant documents that do not match their needs.

While computer literacy is a prerequisite for information literacy, a person who knows all computer hardware and software, and every keystroke, may still not be information literate. Information literacy is more than computer literacy because it focuses on the efficient use of information resources. The American Library Association (1989:1) describes information literate people as:

Those who have learned how to learn.
They know how to learn because they know how knowledge is organized, how to find information and how to use information so that others can learn from them. They are people prepared for lifelong learning because they can always find the information needed for any task or decision at hand.

1.2 Information search techniques

Information search is a process that aims at gathering recorded information from various sources for a defined purpose. For searchers to retrieve information, their needs must be expressed as queries that the information system can interpret. However, the ability of search tools to understand what a person wants is often very limited. Many search tools look for occurrences of keywords, but do not understand what the keywords mean or why they are important to the searcher. In other words, search tools have not yet reached the point where humans and information systems understand each other well enough to communicate clearly. As a result, most queries tend to be quite brief and may not satisfy the normal rules of language syntax. From the information search viewpoint, the major task is to establish the relationship between people's information needs and the documents that meet those needs. A successful information search process can be summarized in the following steps:

• Search formulation (subject definition)
• Identifying appropriate sources of information
• Applying the right search tools and techniques
• Search evaluation (this is explained in chapter five)

1.2.1 Search formulation (subject definition)

Obtaining information from electronic databases depends to a great extent on how precise the search queries are. Generally, poor queries return poor results while good queries return good results. This suggests that before one embarks on the search exercise, it is important to define the subject matter of a given topic thoroughly.
Thus, search formulation is the analysis of a topic to recognize the key concepts and words that describe it and how they relate to each other. Search formulation results in appropriate terms and keywords to be used in the search process. The process also helps one to understand fully what one wants to search for and to clarify unfamiliar terms. During search formulation, the following linguistic diversities should be considered:

• Synonyms: different words that have similar meanings e.g. swine/pig.
• Variant spellings, such as those in British and American spelling e.g. behaviour/behavior.
• Distinctive words or phrases e.g. cardiovascular system; food security.
• Plurals and other endings e.g. woman/women.
• Acronyms e.g. CaMV = Cauliflower mosaic virus.
• Broader, narrower and related terms e.g. livestock - cattle/sheep/donkey.
• Scientific and common names e.g. Pawpaw/Carica papaya.
• Homonyms: words that share the same spelling and pronunciation but have different meanings e.g. mercury (metal)/mercury (planet).
• Compound words: words that may be split into two e.g. wastewater/waste-water.
• Variations of a root word e.g. feminism, feminist, feminine.

One way to analyze a topic is to use a concept mapping technique. This is simply a way to display concepts and their relationships. The technique helps to organize ideas and define a topic. Fig. 1 illustrates how a concept map on the topic "sustainable development" is developed:

• First, write down the main idea in the center. This will be the starting point for the concept map. In this case, "sustainable development" is the central theme.
• Next, think of some issues related to the central theme. Place these issues around the central theme, and draw lines connecting them.
  In this example, there are three major categories of "sustainable development":
  • Economic development
  • Environmental protection
  • Social development
• For each issue, think of sub-concepts that relate to it and connect them to the appropriate issue. Each of the three issues has sub-ideas for possible research exploration.
• You can continue to brainstorm possible ideas. For example, "resource use" can be divided into "renewable resources" and "nonrenewable resources".
• Not all concepts need to be included in the search, but this approach enables one to see the relationships among topics and concepts.

Relevant search terms can also be obtained from:

• Printed and online dictionaries, encyclopedias and thesauri.
• Keywords and descriptors used in key journal articles.
• Subject headings and lists of controlled vocabulary in individual databases.

As a general guideline, the broader the term, the greater the chances of getting information, both relevant and irrelevant. Narrower terms usually give fewer but more relevant results. It is therefore recommended to start searching with specific terms and, if these do not return enough information, to move to broader terms.

1.2.2 Identifying appropriate sources of information

After obtaining appropriate subject headings, terms and keywords, one should identify the right information source(s) to search. It is important to begin a search with the most relevant source for the subject area and search it thoroughly before moving on to another source. In other words, it is important to consult specific information sources suited to particular types of information. A comprehensive search requires one first to scan secondary sources such as library catalogues and bibliographic materials. This implies that information searchers should be familiar with a range of information sources in order to pick the right starting place.

[Fig. 1: A concept map on "sustainable development". The central theme, sustainable development, is linked to three major categories: economic development, environmental protection and social development. Sub-concepts shown include economic growth, social services, resource use, greenhouse effect, agricultural production, gender equity, cost internalization, income generation, political participation, population stabilization and deforestation.]

1.2.3 Applying the right search tools and techniques

The subject headings, terms and keywords developed in the first step are used for the actual search with appropriate search tools. In electronic databases, searching can be carried out by entering keywords in search boxes or by browsing the organized topics. In most electronic databases, queries can be searched:

• As a phrase query,
• In a combined form using Boolean operators,
• By truncation (wildcards).

1.2.3.1 Phrase searching

Phrase searching is the use of quotation marks (" ") to locate documents containing a distinctive string of words associated with a topic. Phrases are terms that naturally belong together, such as sustainable development and poverty reduction strategies. When using phrases, spaces between words are as important as any other character. For example, if a double space is included between any two words that make up a phrase, the search will not produce the same results as one with a single space. Databases vary in the way they treat phrases. Some databases provide options for phrases and others do not, but almost all allow one to enter a phrase in quotes, ignoring the quotation marks if they are not supported. In most databases, if a phrase is not specified in a search statement, the default search is one in which any of the words may be present. This can lead to thousands of useless hits.

1.2.3.2 Boolean searching

Information searching in most electronic databases is based on the principles of Boolean logic.
This is the logical relationship among search terms, and is named after the British mathematician and logician George Boole. Boolean logic is used to construct search statements in which search terms are joined by logical operators in a specified syntax. Boolean operators improve search results by broadening, narrowing or excluding results. Typically, the Boolean operators are OR, AND and NOT (AND NOT). The underlying principle of Boolean logic is set theory. For example, the AND operator is equivalent to the set intersection operation, while the OR operator is equivalent to the set union operation.

Sometimes symbols are used to represent Boolean operators. In some databases, the plus sign (+) is similar to the AND operator and the minus sign (-) is similar to the NOT operator. The absence of a symbol is also significant, as a space between keywords defaults, in many cases, to the AND operator. Some databases offer a search template which allows the user to choose the Boolean operator from a menu. In such templates, the logical operator is usually expressed in substitute wording rather than by the operator itself.

AND: This operator retrieves records in which ALL of the search terms are present. In other words, the search terms on both sides of the operator must be present somewhere in the document for it to count as a result. The AND operator tends to narrow a search and make it more focused. In Fig. 2, records that contain both the words environment and development will be retrieved. The more concepts combined in a search with AND logic, the fewer records will be retrieved, as in the case of environment AND development AND poverty.

[Fig. 2: Boolean operator AND]

The AND operator only requires that the terms immediately on both sides of it all appear in the document. It specifies neither the location of the terms in the document with respect to one another nor the sequence of the search terms.
This operator can be used to attach several required terms together, all of which must be present for a document to be retrieved. For example, the query environment AND development AND poverty would only return documents that contain all three terms.

OR: This operator is equivalent to the UNION operator in set theory. The OR operator retrieves documents that contain EITHER of the search terms and usually broadens search results. It is used when looking for synonyms, spelling variations, or foreign spellings. It should be noted that the number of results from an OR query is not strictly the sum of the results of the two separate queries, although it approximates it: a document that contains both terms is still counted only once. For example, in Fig. 3, records in which at least one of the search terms "atmospheric pollution" OR "air pollution" is present will be retrieved. The more terms combined in a search with the OR operator, the more records will be retrieved, as in the case of "atmospheric pollution" OR "air pollution" OR "smog". Unless it is used in a controlled way to expand results, most often as part of a parenthetical argument dealing with synonyms, the OR operator is not recommended.

[Fig. 3: Boolean operator OR]

NOT: This operator excludes documents that contain keywords the searcher does not want to appear in the results. Documents containing the term after the NOT operator are rejected. It is used when many search results with unwanted terms are anticipated. NOT is a unary operator, meaning that it only works on the term or phrase that immediately follows it. It does not evaluate terms or phrases on both sides of the operator. In Fig. 4, records that contain the word pollution but not "greenhouse effect" will be retrieved. In some databases AND NOT is used in place of NOT.

[Fig. 4: Boolean operator NOT]

NEAR: The AND operator only requires that the terms on both sides of it appear somewhere in the document. The NEAR operator additionally requires the two terms to be within a specified word count of one another for the document to count as a result. Generally, most search engines that support the NEAR operator use a fixed maximum distance of ten words. Some engines use ADJ (for adjacent) as the equivalent of NEAR. The NEAR operator does not care which of the phrases or terms on either side comes first, only that the two are within the specified distance. The NEAR operator is useful for ensuring that search terms occur within the same sentence or paragraph. It can exclude large web sites that merely contain a lot of terms.

BEFORE and AFTER: These operators work in exactly the same manner as the NEAR operator, except that one can specify which term needs to come first or second. With the BEFORE operator, the first term must occur before the second term within the specified word distance. With the AFTER operator, the first term must occur after the second term within the specified word distance. These operators provide greater control over searches.

Nested searching

Many databases allow the creation of complex queries using Boolean operators, phrases and parentheses. This is called nested searching. Each bounded argument set off by parentheses is called a Boolean expression. Search services that support structured syntax do not always read from left to right. Instead, they read "inside-out", in order of precedence of the logical connectives. The combination within the parentheses is evaluated before being combined with the terms outside the parentheses. For example, in the Boolean expression environment AND ("atmospheric pollution" OR "air pollution"), the phrases in the parentheses will be evaluated first.
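The set-theory reading of the Boolean operators described above can be illustrated with a minimal Python sketch. The four-document collection, the helper names matching and near, and the use of simple substring matching are assumptions made for illustration only; real databases index and tokenize text in their own ways.

```python
# Toy collection invented for illustration: document id -> text.
DOCS = {
    1: "environment and development policy",
    2: "atmospheric pollution harms the environment",
    3: "air pollution and the greenhouse effect",
    4: "poverty and development",
}


def matching(term):
    """Return the set of ids of documents whose text contains the term or phrase."""
    return {doc_id for doc_id, text in DOCS.items() if term.lower() in text.lower()}


def near(text, first, second, max_distance=10):
    """True if the two single words occur within max_distance words of each other."""
    words = text.lower().split()
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(abs(i - j) <= max_distance
               for i in first_positions for j in second_positions)


# AND is set intersection: both terms must be present.
and_result = matching("environment") & matching("development")

# OR is set union: a document containing both terms is still counted once.
or_result = matching("atmospheric pollution") | matching("air pollution")

# NOT is set difference: drop documents containing the unwanted phrase.
not_result = matching("pollution") - matching("greenhouse effect")

# Nested query: environment AND ("atmospheric pollution" OR "air pollution").
# The parenthesised part (or_result) is evaluated first.
nested_result = matching("environment") & or_result

print(sorted(and_result), sorted(or_result), sorted(not_result), sorted(nested_result))
# prints: [1] [2, 3] [2] [2]
```

In this toy collection, near(DOCS[3], "pollution", "greenhouse", max_distance=3) is true because the two words are three positions apart, while a maximum distance of 2 would reject the same document.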
1.2.3.3 Truncation

Truncation, or wildcard searching, is a useful way of retrieving all words that share a particular word root without having to type them all separately. Truncation symbols differ from database to database, but an asterisk (*) is the most common. Truncation can be performed in different ways:

• Right hand truncation. When * is placed after a word or part of a word, it locates documents containing words that start with that combination of letters. For example, agricultur* will retrieve documents that contain the terms agriculture, agricultural, agriculturalist, etc.
• Left hand truncation. This is useful for searching complex (organic) compounds if a searcher does not know the root of the compound but knows the ending (functional group) e.g. *ethylene.
• Middle truncation. For example, wom*n will retrieve documents that contain the terms women and woman.
• Some databases support truncation in phrases. For example, the query "poverty * strategies" will return documents that contain the phrases poverty reduction strategies and poverty alleviation strategies.

When using truncation, care should be taken that word roots are neither too short nor too common, so that irrelevant records are not retrieved (e.g. cat* retrieves cat, cats, catastrophe, catalogue, caterpillar etc.). Most web-based search engines now increasingly search automatically for all words sharing a particular word root, so there is often no need to truncate words in such engines.

1.3 Common words

Many search services ignore common words and characters known as "stop words". These include a, about, an, and, are, as, at, be, by, from, how, i, in, is, it, of, on, or, that, the, this, to, we, what, when, where, which, with. Though such words are essential connectors in language, search facilities ignore them because they occur so frequently. Search services include these common words in their "stop-lists" so that whenever they are entered in a query they are ignored.
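The truncation patterns of section 1.2.3.3 and the stop-list just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the behaviour of any particular database: the function names are hypothetical, the asterisk is taken to stand for any run of letters (including none), and the stop-list is the one quoted in the text.

```python
import re

# Stop-list copied from the text above.
STOP_WORDS = {
    "a", "about", "an", "and", "are", "as", "at", "be", "by", "from", "how",
    "i", "in", "is", "it", "of", "on", "or", "that", "the", "this", "to",
    "we", "what", "when", "where", "which", "with",
}


def remove_stop_words(query):
    """Lower-case the query, split it into words and drop the stop words."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [word for word in words if word not in STOP_WORDS]


def truncation_to_regex(pattern):
    """Turn a truncated term such as agricultur* into a compiled regular expression.

    Assumption: each * matches any run of letters, possibly empty.
    """
    parts = [re.escape(part) for part in pattern.lower().split("*")]
    return re.compile(r"[a-z]*".join(parts))


def expand(pattern, vocabulary):
    """Return the words in vocabulary that the truncated pattern would retrieve."""
    regex = truncation_to_regex(pattern)
    return [word for word in vocabulary if regex.fullmatch(word.lower())]


print(remove_stop_words("what is the meaning of biodiversity?"))
# prints: ['meaning', 'biodiversity']
print(expand("wom*n", ["woman", "women", "wombat"]))
# prints: ['woman', 'women']
print(expand("cat*", ["cat", "cats", "catastrophe", "dog"]))
# prints: ['cat', 'cats', 'catastrophe']
```

The last call shows why very short roots are risky: cat* also retrieves catastrophe.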
Because of stop-lists, search engines do not respond to a query the same way humans do. For example, if one types what is the meaning of biodiversity? (without quotation marks) in Google, the search engine will drop the words what, is, the and of.

1.4 Relevance feedback

The process of searching for information is often uncertain. It frequently happens that the initial query does not provide satisfactory results. There are several possible reasons for this:

• A searcher may have only a vague idea of what information is needed (may not be very clear about what he/she is searching for);
• A searcher may not be able to express his/her conceptual ideas as suitable queries;
• A searcher may not know the organization and content of the system, or may not have a good idea of what information is available for retrieval;
• The vocabulary used by a searcher may be quite different from that of the system;
• A searcher may pose a query that is broader or narrower than what is needed, thus returning either far too many documents or not enough.

Relevance feedback entails using retrieved records that seem relevant to obtain more records on a particular topic. It is a modification of the search process that improves accuracy by incorporating information obtained from prior relevance judgments. Relevance feedback exploits the notion that although it may be difficult to formulate a good query, it is easy to judge the retrieved documents. Seeing some retrieved documents may also lead information searchers to refine their understanding of the information they are seeking. The basic procedure in the relevance feedback process, as illustrated in Figure 5, is:

• The user issues a query.
• The system returns an initial set of retrieval results.
• The user examines some of the returned documents for relevance.
• The user adds words from known relevant documents to the query (the user may get new search terms on the topic).
• The system displays a revised set of retrieval results.
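The procedure above can be sketched as a small Python example. Everything in it is hypothetical: the five-document collection, the function names search and expand_query, and the rule of adding the two most frequent new words from the documents judged relevant. It is one simple way to act on relevance judgments, not a description of how any particular system does it.

```python
from collections import Counter

# Toy collection invented for illustration: document id -> text.
DOCS = {
    1: "sustainable development and poverty reduction",
    2: "poverty alleviation strategies in rural areas",
    3: "rural livelihoods and poverty alleviation",
    4: "football league results",
    5: "rural income alleviation programmes",
}


def search(terms):
    """Return the ids of documents containing at least one query term (an OR search)."""
    return sorted(doc_id for doc_id, text in DOCS.items()
                  if set(terms) & set(text.split()))


def expand_query(terms, relevant_ids, extra=2, ignore=frozenset({"and", "in"})):
    """Add the most frequent new words from the relevant documents to the query."""
    counts = Counter(
        word
        for doc_id in relevant_ids
        for word in DOCS[doc_id].split()
        if word not in terms and word not in ignore
    )
    return list(terms) + [word for word, _ in counts.most_common(extra)]


# Steps 1 and 2: the user issues a query and the system returns initial results.
initial = search(["poverty"])

# Step 3: the user judges documents 2 and 3 to be relevant (a manual decision).
relevant = [2, 3]

# Step 4: words from the relevant documents are added to the query.
revised_query = expand_query(["poverty"], relevant)

# Step 5: the system returns a revised set of results.
revised = search(revised_query)

print(initial, revised_query, revised)
```

With this toy data the revised query gains the words alleviation and rural, and the revised results additionally include document 5, which the initial query missed.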
Relevance feedback can go through one or more iterations of this sort. This means searchers can build their searches on what they have already found. In the library, catalogues tend to offer alternative subject headings that one can search again to obtain other records on similar topics. One can also use the references listed in relevant documents to find further information. The relevance feedback concept also applies on library shelves, as books tend to be shelved together by subject area. Thus, if a useful book is found on the shelf, there will probably be others nearby.

[Fig. 5: Relevance feedback example. Search terms are run against a database or document collection; the documents retrieved are judged relevant or not relevant; the most relevant documents supply modified or new search terms, which are used to retrieve further relevant documents.]

CHAPTER TWO
SEARCHING ELECTRONIC DATABASES

2.1 Overview

An electronic database is an organized collection of related records in a standardized format, searchable in a variety of ways, such as browsing or entering search terms, with the help of computer software. Electronic databases include, but are not limited to, library catalogues, citation indexes, CD-ROMs, subject directories, e-journal databases and local databases attached to individual computers. These databases are mostly searched using computer software known as search engines. Following rapid advances in ICTs, the search interfaces of electronic databases evolve rapidly as well. The search interface, or the whole database, may change quickly, but the search techniques remain the same. The following sections present examples of electronic databases and how to search for information in them.

2.2 Library catalogues

Library catalogues are databases that contain organized bibliographic records describing the materials available or accessible in a particular library. The card catalogue, which was used in libraries for generations, has been replaced by the Online Public Access Catalogue (OPAC).
The OPAC is an online database of the materials held by a particular library. To locate library materials such as books, searching can be carried out in two stages:

• Searching for bibliographic information,
• Identification of physical documents.

Searching the OPAC enables the user to obtain a classification number (call number or class mark), which is a unique identifier for the library material. Taking the Sokoine National Agricultural Library (SNAL) as an example, catalogued library materials are arranged according to the Library of Congress Classification System (Table 1). This system uses a combination of letters and numbers to determine the location of documents on the shelves. The CALL NUMBER (e.g. TA330.D3) consists of one or two letters (TA in this case) indicating the main subject or classification, a number (330) for the special aspect of the document, and a further letter (D) that is the code of the author or title of the document.

Table 1: Library of Congress Classification System

A General Works
    General encyclopedias, reference books, dictionaries etc.
B Philosophy-Religion
    B-BJ Philosophy, including BF, Psychology
    BL-BX Religion
C Auxiliary Sciences of History
    CB History of civilization (general)
    CC Archeology
    CS Genealogy
    CT Biography (general)
D History: General & Old World
    D World history, including world wars
    DA-DR Europe
    DS Asia
    DT Africa
E-F History of America
    E America (general)
    E 151-857 United States
    F Local U.S. history
    F 446-460 Kentucky
    F 1001-1140 Canada
    F 1201 Other countries, Latin America
G Geography, Anthropology, Folklore
    G Geography (general)
    GB-GC Physical Geography
    GN Anthropology
    GR Folklore
    GV Recreation
H Social Science
    HA Statistics
    HB-HJ Economics, Business, Commerce, Management
    HM-HX Sociology
J Political Science
    JA-JC Political science
    JF-JQ Constitutional history and public administration
    JS Local government
    JX International law
K Law
    KFK 1200 Kentucky law
L Education
M Music
N Fine Arts
    NA Architecture
    NB Sculpture
    NC Graphic arts
    ND Painting
    NK Decorative arts
P Language & Literature
    P Philology and linguistics
    PA-PN Language and literatures
    PE English language dictionaries
    PR English literature
    PS American literature
    PZ Juvenile literature
Q Science
    QA Mathematics
    QA76 Computer science
    QB Astronomy
    QC-QH Physical and earth sciences
    QK-QR Life sciences
R Medicine
    RT Nursing
T Technology
    TA-TH General and civil engineering
    TJ-TP Mechanical, electrical, and chemical engineering
    TR Photography
    TS Manufactures
    TT Handicrafts, arts and crafts
    TX Home economics
U Military School.