The attached PDF is a presentation on how to evaluate ECA tools - a kind of Early Case Assessment Tool Evaluation Tool!
As we evaluate these tools it gives us some good thoughts to keep in mind.
One suggestion is a form or checklist that focuses on the primary requirements and on how these various 'thoughtware' programs handle, or fail to handle, the needed functions.
Here is a brief evaluation form - using a scale of 0 to 10 with 10 being the best and 0 being the worst.
1. ___ PST files - how they are viewed and indexed, and how accurate the search results are.
2. ___ Attachments - how well they are indexed, whether text is extracted accurately, and which, if any, native file attachments caused problems.
3. ___ When a problem attachment or file occurs during processing, how well the program handles and reports it.
4. ___ Cost of the program
5. ___ In use by other law firms.
6. ___ Stability of the software
7. ___ Speed of the program.
8. ___ Percent of documents in a sample collection it can handle well.
9. ___ How well it handles large volumes of data.
10. ___ How much eyes-on supervision it requires during processing.
11. ___ How it integrates with other e-discovery 'thoughtware'.
12. ___ How does the thoughtware deal with audio, video and image files - does it index them?
13. ___ What kind of search methods does it incorporate (see list below).
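As a rough illustration of how a completed form could be tallied, here is a minimal Python sketch. The tool names, the scores and the `average_score` helper are all hypothetical and not taken from the presentation; items left blank are entered as None and skipped.

```python
from statistics import mean

def average_score(scores):
    """Average the 0-10 scores on one form, ignoring unanswered (None) items."""
    answered = [s for s in scores.values() if s is not None]
    for s in answered:
        if not 0 <= s <= 10:
            raise ValueError(f"score {s} is outside the 0-10 scale")
    return mean(answered) if answered else None

# Hypothetical forms for two made-up tools (illustrative numbers only).
forms = {
    "Tool A": {"PST files": 8, "Attachments": 6, "Cost": 4, "Speed": None},
    "Tool B": {"PST files": 5, "Attachments": 9, "Cost": 7, "Speed": 6},
}

for name, scores in forms.items():
    print(name, average_score(scores))
```

A plain average treats every item as equally important; a firm that cares more about, say, PST handling than cost might prefer a weighted total.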
Some of the search methodologies
So, when evaluating the various ECA/EDA software programs, ask the company: "Does your software give us the ability to use these search methods?"
Boolean searches (and, or, not)
Wildcard searches (*auto*, *tion)
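To make the Boolean and wildcard ideas concrete, here is a small Python sketch. It assumes simple whitespace tokenization and uses shell-style patterns from the standard `fnmatch` module; the helper names and the sample sentence are made up for illustration, and real ECA products will have their own query syntax.

```python
from fnmatch import fnmatchcase

def words(text):
    """Lowercase the text and split it into words, trimming common punctuation."""
    return [w.strip('.,;:!?"()') for w in text.lower().split()]

def wildcard_match(text, pattern):
    """True if any word in the text matches a pattern such as *auto* or *tion."""
    return any(fnmatchcase(w, pattern.lower()) for w in words(text))

def boolean_match(text, all_of=(), any_of=(), none_of=()):
    """AND across all_of, OR across any_of, NOT across none_of."""
    present = set(words(text))
    return (all(t in present for t in all_of)
            and (not any_of or any(t in present for t in any_of))
            and not any(t in present for t in none_of))

sample = "The automobile recall notification was sent."
print(wildcard_match(sample, "*auto*"))
print(wildcard_match(sample, "*tion"))
print(boolean_match(sample, all_of=["recall"], none_of=["draft"]))
```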
Proximity searches
A proximity operator is a character or word used to narrow search results by limiting them to those in which the query keywords appear within a specific number of words of each other. PROXIMITY (or NEAR) returns results for terms that appear close to one another within the resulting page hit. The proximity operators are composed of a letter (N or W) and a number (to specify the number of words). ... Using the W/n connector: use the W/n connector to find documents with search words that appear within "n" words of each other.
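A W/n style check can be sketched in a few lines of Python. This is only an illustration: the `within` helper is hypothetical, it measures distance as the difference in word positions, and it accepts the terms in either order. Individual products may count the distance differently or treat N and W as distinct operators.

```python
import re

def within(text, term_a, term_b, n):
    """True if term_a and term_b occur within n words of each other (either order)."""
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    # Compare every occurrence of one term with every occurrence of the other.
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

sentence = "The contract was signed after the merger closed"
print(within(sentence, "contract", "merger", 5))
print(within(sentence, "contract", "merger", 3))
```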
Thesaurus/Synonym search
For example: meeting, conference, etc.
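A thesaurus search amounts to expanding the query with synonyms before matching. The sketch below assumes a tiny hand-made thesaurus; the `THESAURUS` table, the word "gathering" and the sample file names are hypothetical, and a real tool would ship or load a much larger list.

```python
import re

# Hypothetical thesaurus for illustration only.
THESAURUS = {"meeting": {"conference", "gathering"}}

def expand(term):
    """Return the search term together with its listed synonyms."""
    term = term.lower()
    return {term} | THESAURUS.get(term, set())

def synonym_search(docs, term):
    """Names of documents containing the term or any of its synonyms."""
    wanted = expand(term)
    return [name for name, text in docs.items()
            if wanted & set(re.findall(r"\w+", text.lower()))]

docs = {"a.msg": "Conference moved to Friday", "b.msg": "Lunch order attached"}
print(synonym_search(docs, "meeting"))
```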
Fuzzy searching
A fuzzy matching program can operate like a spell checker and spelling-error corrector. For example, if a user types "Misissippi" into Yahoo or Google (both of which use fuzzy matching), a list of hits is returned along with the question, "Did you mean Mississippi?" Alternative spellings, and words that sound the same but are spelled differently, are given. A fuzzy matching program can compensate for common input typing errors, as well as errors introduced by optical character recognition (OCR) scanning of printed documents. The program can return hits with content that contains a specified base word along with prefixes and suffixes. For example, if "planet" is entered as a search word, hits occur for sites containing words such as "protoplanet" or "planetary." The program can also find synonyms and related terms, working like an online thesaurus or encyclopedic cross-reference tool. In the Ask Jeeves search engine, if the word "galaxy" is entered, hits are returned such as "Galaxy Photography," "Milky Way," and "The Nine Planets Solar System Tour."
Fuzzy matching programs usually return irrelevant hits as well as relevant ones. Superfluous results are likely to occur for terms with multiple meanings, only one of which is the meaning the user intends. If the user has only a vague or general idea of the topic, or does not know exactly what to look for, the ratio of relevant hits to irrelevant hits tends to be low. (The ratio is even lower, however, when an exact matching program is used in this situation.)
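The spelling-correction side of fuzzy matching can be tried out with Python's standard `difflib` module. This is a sketch, not how any particular search engine does it: the `suggest` helper, the small vocabulary and the 0.8 similarity cutoff are assumptions chosen for illustration, and it does not cover the prefix/suffix or synonym behaviour described above.

```python
from difflib import get_close_matches

def suggest(word, vocabulary, cutoff=0.8):
    """Return vocabulary words that are spelled similarly to the input word."""
    return get_close_matches(word.lower(), vocabulary, n=3, cutoff=cutoff)

# Hypothetical index vocabulary.
vocabulary = ["mississippi", "missouri", "michigan"]

hits = suggest("Misissippi", vocabulary)
if hits:
    print(f"Did you mean {hits[0].title()}?")
```

Lowering the cutoff finds more candidates but also more irrelevant ones, which is the trade-off described in the paragraph above.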
Stemming
Example (tokens from the text "ran, slept"): ran (word), "," (punctuation), " " (whitespace), slept (word)
Stemming maps a word to its common lemma (stem). Thus, in the example above, ran stems to the verb run and slept stems to the verb sleep. Like tokenization, stemming rules are language-specific. An unstemmed search matches only the word form you're searching for. For example, searching for ran will not match a document containing runs. When stemmed search is enabled, the search matches the exact term, plus words with the same stem. Thus, a search for ran will also match documents containing runs or running because they all share the stem run in English.
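The difference between unstemmed and stemmed search can be shown with a toy stemmer. The sketch below is deliberately simplistic and English-only: the `IRREGULAR` table and the two suffix rules are hypothetical stand-ins for the language-specific rules a real product would use.

```python
import re

# Hypothetical table of irregular forms; real stemmers use far larger rule sets.
IRREGULAR = {"ran": "run", "slept": "sleep"}

def stem(word):
    """Map a word to a crude stem using a lookup table and two suffix rules."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing") and len(word) > 5:
        base = word[:-3]
        # Undo consonant doubling: "runn" -> "run".
        if len(base) >= 2 and base[-1] == base[-2]:
            base = base[:-1]
        return base
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def stemmed_search(docs, term):
    """Names of documents containing any word that shares the term's stem."""
    target = stem(term)
    return [name for name, text in docs.items()
            if any(stem(w) == target for w in re.findall(r"\w+", text))]

docs = {"a": "She runs every morning", "b": "He slept late", "c": "They were running"}
print(stemmed_search(docs, "ran"))
```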
Conceptual searching
In a concept search, the ideas expressed in the information retrieved are relevant to the ideas contained in the text of the query. For example, a search for "meeting" will also find "get together", "conference", etc.
Adaptive pattern recognition
This is sometimes used to determine whether an image, or an audio pattern in an audio file, matches the search terms.
Associative retrieval
Retrieval based on terms that appear frequently in the vicinity of the terms for which the user is searching.
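One way to read this definition is as a co-occurrence count: find the words that most often appear near the search term. The Python sketch below follows that reading only; the `nearby_terms` helper, the window size and the sample texts are hypothetical, and actual products may implement associative retrieval quite differently.

```python
import re
from collections import Counter

def nearby_terms(texts, term, window=3, top=5):
    """Count words appearing within `window` words of the search term."""
    counts = Counter()
    term = term.lower()
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == term:
                # Words just before and just after this occurrence.
                lo = max(0, i - window)
                neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
                counts.update(t for t in neighbours if t != term)
    return counts.most_common(top)

texts = ["merger talks with Acme", "Acme merger approved", "lunch menu"]
print(nearby_terms(texts, "merger", window=3, top=1))
```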
Clusters of related phrases
https://www.phrases.org.uk/cgi-bin/phrase-thesaurus/pf.cgi?w=group
Sedona: “The use of search and information retrieval tools does not guarantee that all responsive documents will be identified in large data collections, due to characteristics of human language. Moreover, differing search methods may produce differing results, subject to a measure of statistical variation inherent in the science of information retrieval.”