It may not be the most exciting activity, but searching is critical to successful eDiscovery. When searching, the first step is to decide whether you want to narrow or expand a particular document set. This seems like a basic question, but it's vital for aligning expectations with search results. The search strategy differs depending on whether your goal is to find a key document, narrow the set, or expand it: search is bigger than just choosing a few keywords.
We will cover several search strategies here, including filtering, keyword search, and analytics (including concept searching).
Filtering is commonly done by date, file type, and custodian, although you can filter on any field the data set contains. You can often reduce a set quickly by excluding file types that are irrelevant to the particulars of the case or by removing operating system files.
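To make the filtering logic concrete, here is a minimal Python sketch. It assumes each document is a simple record with hypothetical custodian, date, and file-type fields, plus a hypothetical list of system-file extensions to exclude. Review platforms do this through their own filter interfaces; the sketch only shows how the criteria combine.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical document record; real review platforms expose many more fields.
@dataclass
class Doc:
    doc_id: str
    custodian: str
    sent: date
    file_type: str

# Hypothetical set of operating-system/system file extensions to exclude.
SYSTEM_TYPES = {"dll", "exe", "sys", "tmp"}

def filter_docs(docs, start, end, custodians, excluded_types=SYSTEM_TYPES):
    """Keep documents inside the date range, from the chosen custodians,
    and not of an excluded file type."""
    return [
        d for d in docs
        if start <= d.sent <= end
        and d.custodian in custodians
        and d.file_type.lower() not in excluded_types
    ]

docs = [
    Doc("D1", "Anne", date(2020, 3, 1), "msg"),
    Doc("D2", "Anne", date(2020, 3, 2), "dll"),    # system file: excluded
    Doc("D3", "Bob", date(2019, 12, 31), "docx"),  # wrong custodian and date
    Doc("D4", "Carol", date(2020, 6, 15), "pdf"),
]
kept = filter_docs(docs, date(2020, 1, 1), date(2020, 12, 31), {"Anne", "Carol"})
print([d.doc_id for d in kept])  # ['D1', 'D4']
```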
When using keywords and search terms, the logic is easy to trip over. The most basic search operators (or connectors) are AND, OR, and NOT. AND is limiting: it requires both listed words to be present (Anne AND Bob). OR expands a search and requires only one of them (Anne OR Bob). NOT is also limiting: Anne NOT Bob finds documents that mention Anne but not Bob.
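Boolean connectors behave like set operations on the documents that contain each word. The sketch below assumes a tiny made-up corpus and a hypothetical has() helper that returns the IDs of documents containing a word: AND is intersection, OR is union, and NOT is set difference.

```python
# Toy corpus; each document is reduced to its set of lowercased words.
docs = {
    "D1": "Anne met Bob for lunch",
    "D2": "Anne called the supplier",
    "D3": "Bob signed the contract",
    "D4": "Carol reviewed the invoice",
}
terms = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}

def has(word):
    """Return the IDs of documents containing the word."""
    return {d for d, words in terms.items() if word.lower() in words}

anne_and_bob = has("Anne") & has("Bob")  # AND: both words required
anne_or_bob = has("Anne") | has("Bob")   # OR: either word suffices
anne_not_bob = has("Anne") - has("Bob")  # NOT: Anne present, Bob absent

print(sorted(anne_and_bob))  # ['D1']
print(sorted(anne_or_bob))   # ['D1', 'D2', 'D3']
print(sorted(anne_not_bob))  # ['D2']
```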
For more granular searches, you can use proximity, wildcards (carefully), and quotes. Proximity searches use the connectors NEAR and WITHIN (W/); for example, a search for Anne W/2 Bob finds any instance where Anne appears within two words of Bob, either before or after it. Wildcards such as * or ! match any characters that follow the root. For example, gas* will return gasoline and gaslight, but also unrelated words such as gasket, which is why wildcards should be used carefully.
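A rough sketch of both features follows. It assumes a simple word tokenizer and treats W/n as "no more than n word positions apart" (platforms differ on exactly how they count the distance). The helper names within() and expand_wildcard() are hypothetical, and only a trailing * or ! is handled.

```python
import re

def tokenize(text):
    """Lowercase words with punctuation stripped."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within(text, a, b, n):
    """True if word a occurs within n word positions of word b, in either order.
    Assumption: distance = difference in word positions; platforms vary."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def expand_wildcard(pattern, vocabulary):
    """Expand a trailing * or ! against the words in the index."""
    if pattern.endswith(("*", "!")):
        root = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.startswith(root))
    return [pattern.lower()] if pattern.lower() in vocabulary else []

print(within("Anne and Bob met", "Anne", "Bob", 2))               # True
print(within("Anne wrote a long memo to Bob", "Anne", "Bob", 2))  # False
vocab = set(tokenize("gasoline gaslight gasket gas station"))
print(expand_wildcard("gas*", vocab))  # ['gas', 'gasket', 'gaslight', 'gasoline']
```

Note how gas* pulls in gasket as well; that kind of over-inclusion is exactly what makes wildcards risky on a large data set.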
You can use quotation marks when you know the exact phrase or words you are looking for; they are especially helpful in cases involving industry-specific terms or descriptions. Because quoted terms are matched exactly, a search for “quotes” in quotation marks would not return documents that contain only the word “quote.”
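Phrase matching can be sketched as checking that the quoted words appear consecutively and exactly as typed, with no stemming or variant forms. The phrase_match() helper below is hypothetical, and real platforms may treat punctuation and noise words differently.

```python
import re

def tokenize(text):
    """Lowercase words with punctuation stripped."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase_match(text, phrase):
    """True if the phrase's words appear consecutively and exactly as typed."""
    words, target = tokenize(text), tokenize(phrase)
    if not target:
        return False
    k = len(target)
    return any(words[i:i + k] == target for i in range(len(words) - k + 1))

print(phrase_match("Send the quotes by Friday", "quotes"))        # True
print(phrase_match("Send the quote by Friday", "quotes"))         # False
print(phrase_match("the force majeure clause", "force majeure"))  # True
print(phrase_match("majeure force", "force majeure"))             # False
```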
There are four ways data is searched: live, indexed (the most common), clustered, and metadata. Searches can also be combined, for example, a keyword search together with a search on a metadata or coding field. Combine searches carefully, paying attention to the order of operations: grouping the same terms differently with parentheses changes which documents are returned.
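Indexed search is fast because the index is built once, mapping each word to the documents that contain it, so queries avoid rescanning every file. The sketch below uses made-up documents, a hypothetical custodian field, and hypothetical has_word() and field() helpers. It also shows why order of operations matters when keyword and field searches are combined: the same three criteria, grouped differently, return different sets.

```python
from collections import defaultdict

# Toy documents with text plus one metadata/coding field.
docs = {
    "D1": {"text": "pricing memo from Anne", "custodian": "Anne"},
    "D2": {"text": "Bob forwarded the pricing deck", "custodian": "Bob"},
    "D3": {"text": "contract draft for Bob", "custodian": "Carol"},
    "D4": {"text": "Anne approved the contract", "custodian": "Carol"},
}

# Build an inverted index once: word -> set of document IDs.
index = defaultdict(set)
for doc_id, doc in docs.items():
    for word in doc["text"].lower().split():
        index[word].add(doc_id)

def has_word(word):
    """Keyword lookup against the index (no live scan of the documents)."""
    return index.get(word.lower(), set())

def field(name, value):
    """Metadata/coding-field lookup."""
    return {d for d, doc in docs.items() if doc[name] == value}

# Same three criteria, different grouping.
grouped_first = (has_word("pricing") | has_word("contract")) & field("custodian", "Carol")
or_last = has_word("pricing") | (has_word("contract") & field("custodian", "Carol"))

print(sorted(grouped_first))  # ['D3', 'D4']
print(sorted(or_last))        # ['D1', 'D2', 'D3', 'D4']
```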
Analytics is more than just predictive coding: it means aggregating data, understanding timelines, and asking bigger questions about a data set. It's the next step up, letting you see common threads. For example, what is the most common email subject? Who sent most of these emails? What are the most common words in this data set? Clustering builds a library of the words that occur in the data and groups documents by their common ideas. That can take you back to the beginning of the process and inform which search terms to choose.
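The "common threads" questions above reduce to simple frequency counts. This minimal sketch assumes a few hypothetical email records and a small ad hoc stopword list, and answers them with Python's collections.Counter. It is not clustering or predictive coding, which rely on much richer models; it only illustrates the aggregation step.

```python
import re
from collections import Counter

# Hypothetical email records.
emails = [
    {"sender": "anne@example.com", "subject": "Q3 pricing", "body": "Pricing update attached."},
    {"sender": "bob@example.com", "subject": "Q3 pricing", "body": "See revised pricing."},
    {"sender": "anne@example.com", "subject": "Lunch", "body": "Lunch on Friday?"},
]

# Ad hoc stopword list; real tools use much larger ones.
STOPWORDS = {"the", "a", "an", "on", "of", "and", "see"}

subjects = Counter(e["subject"] for e in emails)
senders = Counter(e["sender"] for e in emails)
words = Counter(
    w for e in emails
    for w in re.findall(r"[a-z0-9]+", e["body"].lower())
    if w not in STOPWORDS
)

print(subjects.most_common(1))  # [('Q3 pricing', 2)]
print(senders.most_common(1))   # [('anne@example.com', 2)]
print(words.most_common(1))     # [('pricing', 2)]
```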
Unexpected results are part of searching. It's hard to know what you're going to get from particular search terms, and that's true whether you're a seasoned eDiscovery professional or new to the process. Help is available from experts and from the full capabilities of analytics.