
Generative AI and the Evidence it Creates


Guest post by Doug Austin, Editor of eDiscovery Today

Generative AI is the “elephant in the room” these days – you can’t get away from talking about it. Whether it’s the amazing things it can do (like serving as officiant in a wedding or finding the right diagnosis for a boy’s pain after 17 doctors failed to do so), the issues it causes (like hallucinations here and here & data privacy/cybersecurity concerns here and here), litigation related to it (like here and here), or misuse of the technology (like here and here), generative AI dominates the conversation.

Even the “soap opera” associated with the company (OpenAI) that makes ChatGPT, the leading generative AI platform today, is compelling. OpenAI fired its CEO (Sam Altman), then re-hired him. Same as it ever was. However, the story may not be over yet.

See what I mean? The discussion about generative AI is never ending. Altman was even named CEO of the Year by TIME Magazine, which had never named a “CEO of the Year” before, but did so at the same time it named Taylor Swift Person of the Year. Hey, at least somebody is being discussed more than AI these days!

Many people are talking about the capabilities and challenges associated with generative AI. Those capabilities and challenges extend to eDiscovery, where many providers have released (or at least announced) generative AI capabilities. But I’ve seen hardly anybody talking about the aspect of generative AI that may impact eDiscovery professionals as much as, or more than, the capabilities being added to their eDiscovery platforms: the evidence it generates. Evidence created by generative AI algorithms may be the next big wave of evidence that is discoverable in litigation and other eDiscovery use cases – and the possibilities for the evidence it may eventually create are endless.

Examples of Generative AI-Related Evidence

I had already been thinking about those possibilities when eDiscovery Today & LTMG published an infographic last month on 12 legal use cases for generative AI. In response to my post about it on LinkedIn, Dr. Gavin Manes, CEO of Avansic, pointed out one notable, relatively new use of generative AI: the AI Companion capability in Zoom, which does an impressive job of transcribing meetings conducted on the platform. However, as Gavin noted, those meeting minutes (and any meeting videos associated with them) are potentially discoverable in litigation.

That’s the other side of the coin when it comes to generative AI technology – the evidence it creates is potentially discoverable.

Another example I’ve observed of generative AI related evidence is the interactions with the generative AI platform itself – those are tracked, recorded, and accessible for the foreseeable future. I signed up for a ChatGPT Plus account early on so that I could get access to GPT-4, which is much more advanced than GPT-3, reportedly having roughly 1.76 trillion parameters compared with a “mere” 175 billion for GPT-3.

As a ChatGPT Plus user, one of the things I noticed after a while was that the various queries I’ve run against GPT-4 (and GPT-3 before that) are still stored in the platform’s environment. That’s great from a convenience standpoint – I can go back to a query I performed months ago and still access that information. However, so can opposing parties if I’m involved in litigation and my query history in GPT-4 is relevant to the case. As with any ESI source, it has become important to consider implementing retention periods for generative AI activity.
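To make the retention point concrete, here is a minimal sketch of how an organization might sweep exported generative AI conversation history against a retention window. This is a hypothetical illustration: the `create_time` field and list-of-conversations shape mirror the general structure of a ChatGPT data export, but the field names, the 90-day window, and the `apply_retention` helper are assumptions for the example, not a documented API.

```python
import time

RETENTION_DAYS = 90  # assumed policy window; adjust to your retention schedule


def apply_retention(conversations, now=None, retention_days=RETENTION_DAYS):
    """Return only the conversations still inside the retention window.

    Each conversation is assumed to be a dict with a Unix-epoch
    'create_time' field, loosely mirroring an exported chat history file.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 24 * 60 * 60
    return [c for c in conversations if c.get("create_time", 0) >= cutoff]


# Synthetic example: one recent query thread, one stale one.
sample = [
    {"title": "contract clause summary", "create_time": time.time() - 5 * 86400},
    {"title": "old research thread", "create_time": time.time() - 400 * 86400},
]
kept = apply_retention(sample)
print([c["title"] for c in kept])  # only the recent conversation survives
```

Of course, any such sweep would need to be suspended for custodians under a litigation hold – the same preservation obligations that apply to email and other ESI apply here.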

Want one more example? How about “discovery about discovery” of the generative AI results generated from eDiscovery platforms during the discovery process? We’re over 11 years removed from technology assisted review (TAR)/predictive coding being court approved for use in discovery, yet parties still negotiate a TAR-based workflow much more stringently than a search terms/manual review-based workflow (even though the latter isn’t any more reliable). Do you think they are going to just accept generative AI outputs at face value? Of course not. Parties will want to know when generative AI has been used and how it has been used (or will be used, if it’s still being negotiated). That’s especially true when there are deficiencies with the produced ESI that open the window for “discovery on discovery”.

The Sky is the Limit with Generative AI-Related Evidence

Those are just a few examples of potential generative AI related evidence – the sky is the limit. Why? Here are two statistics that should illustrate the potential extent of generative AI related evidence:

  • ChatGPT currently has over 180 million registered users: This statistic, about a single AI product (that, admittedly, has captured the imagination of the world), shows just how extensive the use of these products has already become. Do you think that this statistic means that there are scores of “shadow IT” users in organizations everywhere using ChatGPT? Me too. Custodian interviews will have to include questions about custodians’ use of generative AI tools, along with any other potential uses of undocumented technology solutions.
  • 60% of the data being used to develop AI and analytics will be artificially produced by 2024: That’s what Gartner predicted (as shown here by TechTarget). That prediction was issued back in July of 2021 – long before the explosion of interest in generative AI that has swept over the world. I can only imagine what they would predict today, given what we know now.

What to Expect in the Future

That last point illustrates the unpredictability of generative AI related evidence – we don’t know what to expect over the long term. There will likely be sources of generative AI related evidence we haven’t even anticipated yet. One thing we do know: there will be lots of evidence created by generative AI solutions, and evidence of all types. We can anticipate some of those sources today – others will emerge over the years.

If you’re an eDiscovery professional, good news! This adds to your job security for the next several years. We will see more litigation involving the use of generative AI solutions, and more data sources to discover than ever before. If you’re an organization trying to keep eDiscovery costs manageable, generative AI will help make eDiscovery more efficient and cost effective, but it will also add to the evidence that must be discovered. Be prepared to talk about and address both aspects of generative AI for eDiscovery.


Doug Austin

Doug Austin is the Editor of eDiscovery Today and an established eDiscovery thought leader with over 30 years of experience providing eDiscovery best practices, legal technology consulting, and technical project management services to numerous commercial and government clients.