5.7 How to Handle Search Results

No matter how carefully you formulate your search query in the Web of Science or any other source, your search results may still contain irrelevant topics.

I recommend the following strategy. Instead of refining your query endlessly, take the dataset that may include irrelevant topics and let CiteSpace differentiate the various topics. Until then, keep an open mind: you can determine whether a topic is truly irrelevant only after you have had a chance to examine the visualized results.

In most cases, irrelevant topics are straightforward to spot because they end up in an isolated cluster all by themselves. If it is hard to tell, then the “irrelevant topics” may not be as irrelevant as you thought after all!

Here is an example to illustrate this point. The dataset was collected by a topic search for ‘hacker*’, intended to capture topics relevant to hackers, hacker behavior, and related subjects.

The largest component shown above evidently contains relevant topics such as intrusion detection systems, validating trust measures, cyber terrorism, and vulnerabilities. On the other hand, the visualization also reveals an interesting second largest component.

The second largest component is not about the ‘hacker’ topics we wanted. Instead, these items were included in the original search result because the term Hacker is, as it turns out, the surname of a prolific author: Hacker, P. M. S., a philosopher who published a number of articles in 1996 and 2003 as well as a book on Scepticism, Rules, and Language. The two clusters contained in the second largest component are on topics irrelevant to hackers in the context of computer security.
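Once such a source of noise is identified, the offending records can be removed from the dataset before re-running the analysis. The sketch below, a minimal illustration rather than part of CiteSpace itself, parses records in the Web of Science plain-text export format (which uses two-letter field tags such as AU for author and an ER line to terminate each record), counts first authors to surface a suspiciously frequent name, and filters the matching records out. The sample records and the author string to exclude are made up for illustration.

```python
from collections import Counter

# Minimal sample in Web of Science plain-text export format. The field
# tags (AU, TI) and the ER record terminator follow the real format;
# the records themselves are invented for this example.
SAMPLE = """\
PT J
AU Smith, J
TI Intrusion detection with neural networks
ER

PT J
AU Hacker, PMS
TI Wittgenstein on rules
ER

PT J
AU Hacker, PMS
TI Scepticism and language
ER
"""

def parse_records(text):
    """Split a WoS plain-text export into records (lists of lines)."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "ER":      # end-of-record marker
            records.append(current)
            current = []
        elif line.strip():
            current.append(line)
    return records

def first_author(record):
    """Return the first AU field value of a record, or None."""
    for line in record:
        if line.startswith("AU "):
            return line[3:].strip()
    return None

records = parse_records(SAMPLE)
counts = Counter(first_author(r) for r in records)
print(counts.most_common(1))  # a suspiciously frequent author stands out

# Drop the records attributed to the author who caused the false hits.
filtered = [r for r in records if first_author(r) != "Hacker, PMS"]
print(len(records), "->", len(filtered))  # prints "3 -> 1"
```

In practice you would read the exported file (e.g. savedrecs.txt) instead of an inline string, and inspect the author frequencies before deciding which names, if any, mark off-topic records.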

It is a good idea to use a broader search query rather than a narrow one and defer the differentiation to the visual analytic process later on. The relevance of a topic is much easier to judge at that later stage.