
Why eDiscovery is Like Algebra


This article originally appeared on the ACEDS Blog and was written by Gavin W. Manes.


With back to school on our minds this season, we discovered some parallels between algebra and eDiscovery. Go with us on this one for a minute (or, maybe, for the rest of this blog post). You won’t have to do any actual math, we promise.

In algebra, it’s all about solving for that variable. What’s the equivalent in eDiscovery? What’s the one variable that, if you don’t figure it out, leaves the problem a problem instead of a solution?

Certainly it’s not the same for everyone or for every case. But in every eDiscovery project, there are three fundamental elements: human, data, and technology (tools and processes). Each of these could be the thing that leads to solving that equation – or that thorny eDiscovery problem.

The Human Element

The most important “variable” to get right is the eDiscovery Project Manager. Those who have depth of knowledge and have done this many times before can look ahead and anticipate the likely outcome. It helps to have someone on the team who knows all the factors that go into the discovery phase, eDiscovery specifically, and the other parts that may not seem germane to getting data processed but actually make the entire operation smoother: motion practice, deadlines, and, perhaps most importantly, managing expectations.

As an example, suppose you need to run search terms in Microsoft Purview and have the results reviewed in three days. If you take that request to a skilled Project Manager, they will be able to set realistic expectations about whether that can happen and what resources would need to be rallied to have the best chance of success.

The most important piece of this is, of course, to take those questions to the Project Manager and to do so as early as possible. Timing issues, newly discovered data, and broken processes do happen, and the more warning everyone has, the better.


The Data Element

Data is, after all, why we’re here, and its collection is the key variable in the eDiscovery equation.

Forensically collected data is the gold standard, but circumstances frequently don’t allow for it. It is important to note that how data is collected can drive many future issues, including unexpected data errors, manipulated data leading to false conclusions, or data that’s missing entirely. If a forensic collection isn’t possible, it’s best to at least keep careful documentation of what was done and to understand the who, what, where, and why of each collection event.

Communication between the collecting party and the processing or receiving party is generally done in the form of a chain of custody, which describes any issues that were addressed during the collection. Furthering the above example in Purview, a common undocumented mistake is to collect mail for multiple custodians into one .PST (the default option in the software). Someone with less experience, or someone thinking only of efficiency, might believe that having the smallest amount of data is what matters most. But from an eDiscovery perspective, tracking which documents were in which custodian’s possession is more important than having a small amount of data.

When an eDiscovery tech receives a collection of email, it should be obvious whether the collection was custodian-centric or not. If it isn’t (and there’s time), it’s advisable to re-collect by custodian – this saves hours and dollars in the long run, especially for motion practice or deposition preparation. It’s impossible to determine from email metadata alone which emails were in a given custodian’s possession – they might have deleted an email, it might have gone to junk, or any of several other scenarios. Just because an email address is in the To, From, CC, or BCC field doesn’t mean that custodian saw or possessed that email.
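As a rough illustration of that documentation (the field names here are hypothetical, not Purview’s or any other tool’s actual schema), a chain-of-custody entry for a custodian-centric export might record the custodian, the source, a hash of the export file, and who collected it:

```python
# Illustrative only: a minimal chain-of-custody record for a custodian-centric export.
# Field names are hypothetical and not tied to any particular collection tool.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def custody_entry(custodian, source, export_path, collected_by, notes=""):
    """Build one chain-of-custody entry for a single custodian's export file."""
    data = Path(export_path).read_bytes()
    return {
        "custodian": custodian,                       # one custodian per export file
        "source": source,                             # e.g. a mailbox or file share
        "export_file": str(export_path),
        "sha256": hashlib.sha256(data).hexdigest(),   # fixes the file contents at hand-off
        "collected_utc": datetime.now(timezone.utc).isoformat(),
        "collected_by": collected_by,
        "notes": notes,                               # issues addressed during collection
    }

if __name__ == "__main__":
    # Placeholder file so the sketch runs end to end; in practice this is the real export.
    Path("jane_doe.pst").write_bytes(b"placeholder")
    entry = custody_entry(
        custodian="Jane Doe",
        source="Exchange Online mailbox (Purview export)",
        export_path="jane_doe.pst",
        collected_by="Examiner A",
        notes="Single-custodian PST; default multi-custodian export not used.",
    )
    print(json.dumps(entry, indent=2))
```

However it is recorded, the point is the same: one export per custodian, documented at the moment of hand-off, so possession questions don’t have to be reconstructed later.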

When collecting data, using multiple methods or tools is advised. Phones are not like computers: there is no way to get a bit-by-bit forensic copy of a cell phone. Instead, we perform extractions, where we ask the phone to provide a certain type of data using a certain method. The “physical copy” we can make of a computer hard drive is not normally possible with a phone’s memory (and it doesn’t render usable results anyway, due to encryption and device security). The methods of collection for mobile devices are known as logical, advanced logical, or file system collections. Depending on the software and hardware, one or more of those methods may not be available, and one collection method may retrieve deleted text messages while another might not.
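As a small sketch (the class and field names are illustrative, not any vendor’s API), a simple collection log per device can make it obvious at a glance whether only one extraction method was attempted:

```python
# Illustrative sketch: log which extraction methods were performed on each device,
# so the receiving tech can see whether only one method was used.
from dataclasses import dataclass, field

EXTRACTION_TYPES = ("logical", "advanced logical", "file system")

@dataclass
class DeviceCollection:
    device_id: str
    extractions: dict = field(default_factory=dict)  # method -> outcome notes

    def record(self, method: str, outcome: str) -> None:
        if method not in EXTRACTION_TYPES:
            raise ValueError(f"unknown extraction type: {method}")
        self.extractions[method] = outcome

    def single_method_only(self) -> bool:
        """True when only one extraction method was performed on this device."""
        return len(self.extractions) == 1

phone = DeviceCollection(device_id="Custodian Jane Doe, iPhone")
phone.record("logical", "completed; deleted messages not recovered")
print(phone.single_method_only())  # True: worth attempting a second method if one is available
```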

An eDiscovery or forensics tech receiving extractions from a phone can usually tell whether only one extraction was performed, based on the types of files provided during the evidence-gathering stage. There are other sources of mobile device data (like online backups) that should be sought during mobile device collection, but that’s a whole other post.

Taking the time to collect the device correctly saves thousands of dollars in motion practice when facing motions for sanctions, spoliation, or failure to preserve evidence. In a cell phone case, there are two parties to every message – the sender and the receiver – and one of them is most likely going to have the data.


The Technology Element

The most important part of technology in eDiscovery is having tools and processes that normalize data types. This means that documents fed into a tool can be easily analyzed and reviewed, and ultimately produced in an investigation or lawsuit.

It’s easy to load common data like email or loose files into document review tools, but modern data such as Slack, text messages, and other collaboration content requires pre-conversion before it can be loaded into review systems. For example, a complex issue that we often handle is the collection of text messages from various cell phones and custodians. De-duping that information across custodians and devices makes sense from an efficiency standpoint. But review is often performed with those conversations in “bubble” format, which is then presented as a PDF.

In reality, what’s happening there is the creation of a demonstrative of the underlying evidence from the view of one participant in the conversation. The appropriate way to honor the evidence is to keep the individual messages as records and use a tool to look at the conversation from the perspective of each recipient. An even better solution is to look at it from a third-person view rather than focusing on any one sender or receiver.
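As a minimal sketch of that idea (the keying and field names are illustrative, not a specific review platform’s schema), messages can be de-duplicated by content while keeping a record of every custodian each message was collected from:

```python
# Illustrative sketch: de-duplicate identical messages collected from multiple devices
# while preserving, for each unique message, every custodian it was found with.
import hashlib
from collections import defaultdict

def message_key(sender, sent_at, body):
    """A simple identity for 'the same message' seen on multiple devices."""
    return hashlib.sha256(f"{sender}|{sent_at}|{body}".encode()).hexdigest()

def dedupe_preserving_possession(collected):
    """collected: iterable of dicts with sender, sent_at, body, found_on_custodian."""
    merged = {}
    possession = defaultdict(set)
    for msg in collected:
        key = message_key(msg["sender"], msg["sent_at"], msg["body"])
        merged.setdefault(key, {k: msg[k] for k in ("sender", "sent_at", "body")})
        possession[key].add(msg["found_on_custodian"])  # possession is never discarded
    return [dict(m, custodians=sorted(possession[k])) for k, m in merged.items()]

records = dedupe_preserving_possession([
    {"sender": "+1555...1", "sent_at": "2024-03-01T10:02", "body": "On my way",
     "found_on_custodian": "Alice (sender device)"},
    {"sender": "+1555...1", "sent_at": "2024-03-01T10:02", "body": "On my way",
     "found_on_custodian": "Bob (recipient device)"},
])
print(records[0]["custodians"])  # ['Alice (sender device)', 'Bob (recipient device)']
```

Because possession travels with each unique message, the same record set can be rendered from any participant’s perspective – or from a neutral third-person view – without rebuilding the collection.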

Of course, AI is now a part of every equation being considered. If you had asked me a year ago if computing technology such as AI would replace first-pass review of documents, I would have said no; however, I have since witnessed technology that can review a million documents in three days for under half a million dollars, with precision, accuracy, and recall rates in the high 90s. The difference with the new AI tech is that it requires immense computing power and carefully built training prompts to be successful. Computing costs alone for sets in the millions of documents can run into the hundreds of thousands of dollars just for the cloud computing time. Adding in project management and QC, total costs can reach the hundreds of thousands, but that may often still be less expensive than human review.
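For a rough sense of scale, the round figures above work out as follows (illustrative arithmetic only; real pricing varies widely by vendor, model, and matter):

```python
# Back-of-envelope arithmetic from the round figures mentioned above.
documents = 1_000_000
total_cost = 500_000   # "under half a million dollars"
days = 3

print(f"~${total_cost / documents:.2f} per document")   # ~$0.50 per document
print(f"~{documents / days:,.0f} documents per day")    # ~333,333 documents per day
```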


Conclusion

Things change, but there will always be elements of these three things that matter. Data shifted from email to more complex collaborative data, which caused the tools to change, which in turn caused the humans to change their processes. That cycle is likely to repeat when the next type of unique data comes along or a better technology arises. There’s an ebb and flow to this equation, but those three variables will always be present.

 


What did you think? Any good takeaways? Let us know here.