Legal and expert opinion depends on the simple premise that preserved documents can be searched. But human interaction with document processing can result in key evidence not being indexed and as such not being responsive to discovery searches.
Charlie Woodley offers words of warning and urges those conducting searches during eDiscovery to be on guard and to periodically check that they can search for what they see. The alternative is to risk overlooking potentially relevant facts, missing that elusive smoking gun or incurring otherwise avoidable cost.
Parties, legal advisors and experts all rely upon eDiscovery platforms to search, review, identify and organise relevant data. Confidence of opinion is largely enabled by optical character recognition (OCR), including pre-processing designed to improve its accuracy, so understanding any vulnerability is essential if the risk is to be mitigated.
OCR is imperfect even when processing completes without error. The best way to understand the limitations of traditional OCR, the kind incorporated into eDiscovery processing workflows, is to see for yourself. Factful has published a side-by-side comparison of seven leading OCR tools using multiple kinds of documents. Credit Ted Han and Amanda Hickman.
In the rush to preserve documents or compile them for disclosure there is a misconception, held by some, that consolidating large document volumes into single files is helpful. In fact it is counterproductive. The result is a single PDF of many hundreds of pages with highly varied resolution and format, which can trigger an OCR error that stops the process prematurely.
The configuration and management of document processing are critical if such errors are to be resolved without compromising the underlying assumption of the parties and advisors of total visibility i.e. that every document disclosed by the parties can be searched. The processing logs of forensic tools, such as Nuix, are intended to prompt review where the OCR process is not completed as expected. If such programmatic warnings are ignored then partially OCR'd documents are introduced creating a blindspot with associated risk to certainty and cost.
How big a blindspot?
Whether the content is correspondence, periodic reports or financial records, bundling it into inappropriately large collections can trigger an unactioned OCR error. There is then a material risk that relevant content will be missed. If only a handful of pages of a 500-page document are processed, say five, then the document blindspot is a staggering 99%. If this issue repeats across all documents with a high page count, the blindspot rapidly multiplies, as does the risk that relevant content will be missed.
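The arithmetic behind the 99% figure can be sketched in a few lines of Python. This is an illustration only: it takes "a handful" to mean five pages, and the function names and the sample collection are hypothetical, not drawn from any eDiscovery platform.

```python
def blind_spot(pages_with_text: int, total_pages: int) -> float:
    """Fraction of a document's pages that cannot be searched."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    if not 0 <= pages_with_text <= total_pages:
        raise ValueError("pages_with_text must lie between 0 and total_pages")
    return (total_pages - pages_with_text) / total_pages


def collection_blind_spot(docs) -> float:
    """Page-weighted blind spot across (pages_with_text, total_pages) pairs."""
    docs = list(docs)
    total = sum(pages for _, pages in docs)
    searchable = sum(with_text for with_text, _ in docs)
    if total <= 0:
        raise ValueError("collection has no pages")
    return (total - searchable) / total


# Assumed example: five of 500 pages were OCR'd before the process stopped.
print(f"{blind_spot(5, 500):.0%}")

# Hypothetical collection: two truncated bundles and one fully processed letter.
print(f"{collection_blind_spot([(5, 500), (3, 300), (10, 10)]):.1%}")
```

Weighting by page count rather than averaging per document reflects the point made above: a few very large bundles can dominate the unsearchable volume.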
A real-world example
I provide an example from a matter in international arbitration for which I provided quantum expert assistance. My task was routine: to reconcile invoiced sums claimed against contemporaneous management reports, relying on the unique document references included in opposing expert evidence.
On the surface, there was no issue. Searches using the document references identified month-end bundles of invoices and associated backup. However, these PDF bundles ran to many hundreds of pages and the opposing experts did not identify the page numbers for the claimed invoices.
Of course, this should be no issue given the forensic capability of the host platform: a keyword search for the invoice reference or value within the bundle should immediately identify the specific invoice. The underlying problem only became apparent when these in-document searches returned no results, an issue confirmed by a separate review of the extracted text for this and numerous other high page count records.
This will invariably, and avoidably, have increased the legal and expert costs for both parties, who will have been required to conduct labour-intensive manual reviews to advise on, compile or opine on the quantum of the claim. Notably, the Claimant's expert did not include page references either. This suggests either that they were not aware of the issue, exposing outdated, labour-intensive review, or that gamesmanship was at play. The first is arguably not a surprise given the relative underutilisation of technology in expert practice; the latter breaches the expert witness's overriding duty to the Tribunal. Of course, uncovering such antics can in itself convey advantage.
What you can do
How this issue could impair legal or expert opinion is obvious. Awareness prompts action. eDiscovery managers, legal advisors and experts should take note and keep these vulnerabilities front of mind when designing, conducting or drawing conclusions from searches.
There are procedural remedies for eDiscovery managers, who can use file size and page count to identify at-risk documents in advance and introduce post-processing measures to verify that OCR errors have been appropriately actioned.
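One way such a check might look is sketched below in Python. It assumes the platform can export, for each document, a page count and the extracted text per page. The DocRecord structure, the at_risk function and both thresholds are hypothetical choices for illustration, not features of Nuix or any other tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DocRecord:
    doc_id: str
    page_count: int        # pages in the native file
    page_texts: list[str]  # extracted text per page; may be shorter than page_count


def at_risk(records, max_pages: int = 100, min_coverage: float = 0.9):
    """Return (doc_id, reasons) for documents that merit a manual OCR check.

    max_pages and min_coverage are assumed thresholds; tune them per matter.
    """
    flagged = []
    for rec in records:
        # Count pages for which OCR produced any non-whitespace text.
        pages_with_text = sum(1 for text in rec.page_texts if text.strip())
        coverage = pages_with_text / rec.page_count if rec.page_count else 0.0
        reasons = []
        if rec.page_count > max_pages:
            reasons.append("high page count")
        if coverage < min_coverage:
            reasons.append(f"text on only {pages_with_text} of {rec.page_count} pages")
        if reasons:
            flagged.append((rec.doc_id, reasons))
    return flagged


# Hypothetical export: a 500-page bundle where OCR stopped after five pages,
# and a short letter that processed fully.
records = [
    DocRecord("BUNDLE-001", 500, ["invoice text"] * 5),
    DocRecord("LETTER-002", 3, ["page one", "page two", "page three"]),
]
for doc_id, reasons in at_risk(records):
    print(doc_id, "->", "; ".join(reasons))
```

Pages that are genuinely blank or purely graphical will also lower the coverage figure, so a flag here is a prompt for review rather than proof of an OCR failure.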
Legal and expert team members conducting searches should be mindful of the risk and proactively conduct test searches for what they see as a matter of course. Doing so will quickly identify where the problem exists so that a flag can be raised and remedial action taken.
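Where the extracted text of a document can be exported, the same "search for what you see" test can be run outside the platform. The Python sketch below is a minimal illustration: the function names and the invoice references are hypothetical, and it tests for plain substring matches only, so it will not mirror a platform's own tokenisation or fuzzy search.

```python
import re


def normalise(text: str) -> str:
    """Collapse whitespace and ignore case so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def unsearchable_terms(extracted_text: str, visible_terms) -> list[str]:
    """Return the terms a reviewer can see on the page but the extracted text lacks."""
    haystack = normalise(extracted_text)
    return [term for term in visible_terms if normalise(term) not in haystack]


# Hypothetical extracted text for a bundle whose later pages were never OCR'd.
extracted = "Invoice No.  INV-0042\nTotal due 1,250.00"

# References read off the rendered pages by the reviewer (hypothetical values).
seen_on_screen = ["inv-0042", "INV-0099", "1,250.00"]

print(unsearchable_terms(extracted, seen_on_screen))
```

Any term returned is a candidate blindspot: it is visible in the image but absent from the searchable text, which is the cue to raise a flag.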
What we do
At Dispute Data we are wise to how seemingly innocuous actions taken in response to information requests can manifest as problems and avoidable cost during claim preparation, rebuttal or expert evidence. By providing information expertise in contemplation of claims or at the outset of a dispute, we help avoid, detect and resolve foreseeable usability issues that impact efficiency and inflate costs. We:
Act for claimants and respondents directly to prevent them contributing to informational challenges whether on an ad-hoc basis or in the role of integrator where we manage the collation, provision and stakeholder interaction with project records to drive efficiency and control cost
Act for legal advisors or provide expert assistance for teams facing challenging project records commonplace in engineering and construction claims or disputes. We apply information expertise to better inform strategy, protect the budget resource, overcome analytical bottlenecks and increase the robustness of evidence
We understand our clients' commercial and legal objectives. Our knowledge of new and emerging technologies and their practical applications, often for the first time in dispute resolution, allows us to drive innovation and achieve outcomes unattainable by traditional methods.
About the author
Charlie Woodley, Managing Director, has provided analytical firepower to the claims and disputes sector for over 12 years. He combines established legal forensics with emerging technologies and provides solutions that unlock efficiency gains in claim preparation and everyday business operation. Charlie’s experience incorporates contracting, claim and expert advisory and expert witness practice. He has supported technical, quantum and delays teams and been instructed as data expert in international arbitration. Convinced of the potential of technology to disrupt dispute resolution, Charlie writes from a digital perspective incorporating his commercial experience and research in the fields of dispute causation and behavioural economics. email@example.com
About Dispute Data
Dispute Data is a specialist data wrangling consultancy servicing the global engineering and construction sector. They harness technology to find, collate, clean and unify messy and complex data sets for easy access and analysis. Spanning commercial operation, claims, dispute resolution and expert services they provide analytical firepower and information expertise when it is needed most. For further information please visit www.disputedata.io