In this short article we talk with Nomio about Data Capture – how to efficiently aggregate dense document data and manage it like a piece of software.
What is meant by Data Capture?
A legal document can be thought of as having some implicit structure to it. In using the word structure, we are talking specifically about the data-points within the document and their relation to one another.
A simple example would be a commencement date and a renewal period. We can find the first renewal date using the formula: commencement date + renewal period = first renewal date.
The core tenet of data capture is to determine such relations and represent them in accordance with a suitable data structure. For some pieces of data, like the above example, this can be hard-coded into any capture system.
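As a minimal sketch of how such a relation might be hard-coded, the formula above can be written as a small function. The names `add_months` and `first_renewal_date` are our own illustrations, and we assume the renewal period is expressed in whole months; a real capture system may well represent periods differently.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` months after `start`.

    If the target month is shorter than the start month, the day is
    clamped to the last day of the target month (e.g. 31 Jan + 1 month
    gives the last day of February).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def first_renewal_date(commencement: date, renewal_period_months: int) -> date:
    """commencement date + renewal period = first renewal date."""
    # Assumption: the renewal period is a whole number of months.
    return add_months(commencement, renewal_period_months)


# Hypothetical contract: commences 15 March 2021, renews every 12 months.
renewal = first_renewal_date(date(2021, 3, 15), 12)
print(renewal.isoformat())
```

Clamping to the end of the month is one design choice among several; how a contract treats such edge cases is itself a matter for interpretation.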
However, it would be naive to think that the semantics of a legal contract could all be summarised as such. For instance, consider a Force Majeure clause. It is not a simple exercise to reduce such a clause to a one-dimensional piece of data, be that a number or a boolean – such a clause may be incredibly nuanced.
It is far wiser to label such a clause for what it is, retain human involvement in the process of managing its implications, and so properly interpret the semantics behind the clause.
This is the key idea behind data capture: treat a document, or a set of documents, as a relational database, rather than trying to go full whack and extract the underlying logic of its legal expressions. Indeed, as Artificial Lawyer points out in their article, the analogy to computer science begins to break down when considering the nuance of the semantics in such statements.
Nor does the extraction approach begin to consider the expressivity of human language. Absent interpretation, the mapping from wording to meaning is many-to-one. With interpretation, it becomes many-to-many.
A key hypothesis driving data capture is that the inherent structure of the data is straightforward, but the semantic interpretation is not, and thus, having a human in the loop to subsequently interpret and manage the document as a whole is much preferred. Trying to treat human language in the same way as a set of logics, such as a programming language, is an error – the former is ambiguous and context dependent, whilst the latter is not.
The overall approach may also be encapsulated in the idea of separation of concerns (see our previous article or wiki). The data in the document, the ability to view it, and its processing, are decoupled. Up until now, everything has been bundled into one.
For instance, the amount of a loan – £100,000,000 – might appear:
- in multiple locations within a single document, e.g. clause 3 on page 9 and clause 7.7 on page 12 of Contract A; and/or
- in multiple locations across many documents, e.g. clause 3 on page 9 and clause 7.7 on page 12 of Contract A plus clauses 11 and 24 of Contract B.
Updating the loan amount requires the user to manually and independently update each reference to this value wherever it appears in each document.
This is an inefficient way of managing data, since you will have to switch back and forth between each reference and each contract to make independent updates to shared values. Far better to treat the data as one object, with links to the documents within which it is represented.
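To make the idea concrete, here is a rough sketch of the loan amount held as one object with links to where it appears. The class names `Reference` and `DataPoint` and the `render` helper are hypothetical illustrations, not a description of any particular product's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Reference:
    """A place where a data-point appears (hypothetical structure)."""
    document: str
    clause: str
    page: Optional[int] = None


@dataclass
class DataPoint:
    """A single value, stored once, with links to every appearance."""
    name: str
    value: int
    references: List[Reference] = field(default_factory=list)


# The loan amount from the example above, stored once.
loan_amount = DataPoint(
    name="loan_amount",
    value=100_000_000,
    references=[
        Reference("Contract A", "3", 9),
        Reference("Contract A", "7.7", 12),
        Reference("Contract B", "11"),
        Reference("Contract B", "24"),
    ],
)


def render(point: DataPoint, ref: Reference) -> str:
    """Show the current value as it would appear at one reference."""
    return f"{ref.document}, clause {ref.clause}: £{point.value:,}"


# One update to the shared value is reflected at every reference.
loan_amount.value = 120_000_000
for ref in loan_amount.references:
    print(render(loan_amount, ref))
```

Because each reference points back to the same object, there is only one place to make the change, however many documents mention the value.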
Databases vs Documents
With the data captured, and a database established, the task of managing a document becomes incredibly simple. Databases are optimised for querying, storage, transmission and modification whilst documents are optimised for human viewing. John Warnock, inventor of the PDF, described its aims in the following manner:
“[to] effectively capture documents from any application, send electronic versions of these documents anywhere, and view and print these documents on any machine.”
This difference becomes extremely important when we begin to try and manage the data within a document. Suppose we were using just a PDF, or Word document. Searching for all occurrences of a piece of data across such documents requires us to go into each document, hit Ctrl+F, and type in our search. With a database, we can do this in a matter of seconds, and return all occurrences of the specific data-point we seek in one fell swoop. This is a trivial but powerful example of how databases are superior to documents.
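The contrast can be sketched with Python's built-in `sqlite3` module. The table name `data_points`, its columns, and the sample rows are assumptions made up for illustration, reusing the loan amount example from earlier.

```python
import sqlite3

# An in-memory database standing in for the captured data (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_points (document TEXT, clause TEXT, name TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO data_points VALUES (?, ?, ?, ?)",
    [
        ("Contract A", "3", "loan_amount", "100000000"),
        ("Contract A", "7.7", "loan_amount", "100000000"),
        ("Contract A", "1", "commencement_date", "2021-03-15"),
        ("Contract B", "11", "loan_amount", "100000000"),
        ("Contract B", "24", "loan_amount", "100000000"),
    ],
)


def find_occurrences(connection: sqlite3.Connection, name: str) -> list:
    """Return every (document, clause) where the named data-point appears."""
    cursor = connection.execute(
        "SELECT document, clause FROM data_points "
        "WHERE name = ? ORDER BY document, clause",
        (name,),
    )
    return cursor.fetchall()


# One query covers every document, with no need to open each file in turn.
occurrences = find_occurrences(conn, "loan_amount")
for document, clause in occurrences:
    print(f"{document}, clause {clause}")
```

The same question asked of PDFs or Word files would mean opening Contract A and Contract B separately and searching each by hand.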
A practical, real-world example of where transforming to a database improves the handling of data can be found in contract management. The data-points within the contract, say, the dates of obligations, are easily related to one another within the database and so a timeline can be constructed. For a recent customer, constructing a timeline with the database approach found twice as many obligations as the customer had found themselves through manual management.
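As a simple sketch of the timeline idea, once obligation dates are captured as data they can be ordered across contracts in a few lines. The `Obligation` class, the `build_timeline` function, and the sample obligations are all hypothetical and are not taken from any customer's data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Obligation:
    """A captured obligation and the date it falls due (illustrative)."""
    document: str
    description: str
    due: date


def build_timeline(obligations: List[Obligation]) -> List[Obligation]:
    """Order obligations from all documents by due date, earliest first."""
    return sorted(obligations, key=lambda obligation: obligation.due)


# Made-up obligations drawn from two contracts.
obligations = [
    Obligation("Contract B", "Deliver annual report", date(2024, 6, 30)),
    Obligation("Contract A", "Send renewal notice", date(2024, 1, 15)),
    Obligation("Contract A", "Repay first instalment", date(2024, 3, 1)),
]

timeline = build_timeline(obligations)
for item in timeline:
    print(item.due.isoformat(), item.document, item.description)
```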
Nomio is a tech startup with a focus on capturing data in documents. This helps companies manage the data within their documents exceptionally well. With significant traction in the infrastructure sector, they are beginning to serve an even greater number of markets.
To find out more visit nomio.com.
Why the name Nomio?
The name is a take on the Greek word for Law: νόμος, romanised to nómos. Unfortunately, Nomos was taken, so we settled on Nomio instead.