Enriching the EU Taxonomy Regulation using NLP

Uwais Iqbal • 2022-10-26

Working with the EU Taxonomy regulation is challenging. Briink teamed up with simplexico to improve the user experience of working with the regulation. We used NLP to enrich the original regulatory content by extracting legal citations and creating hyperlinks to the relevant cited resources automatically.

EU Sustainable Finance Regulation

As part of the EU Green Deal, the European Union has released a set of ambitious sustainable finance regulations to facilitate the allocation of capital to truly sustainable businesses. The EU Taxonomy is the flagship regulation of this new wave. It defines a common framework to classify sustainable activities in a standardized way.

Briink has developed a SaaS platform that uses machine learning to go beyond reporting and help you collect and analyse the data you need to complete your EU Taxonomy assessment. Briink also provides the Briink App to help users navigate and understand the EU Taxonomy.

Working with The EU Taxonomy

While the Taxonomy is a powerful framework, the regulation itself contains many cross-references to other legislative acts, including treaties, statutes, regulations, decisions and directives. In the original text of the EU Taxonomy regulation, these cross-references are not hyperlinked. As a result, if a user of the Briink App wants to explore the cited material, they have to copy and paste the citation, search for it outside the application, find the relevant resource on the web, scroll to the relevant section, read and understand the cited text and then return to the Taxonomy.

Not only is this a painful and tedious exercise, it also forces users to switch between applications as they complete their assessment. It creates unnecessary friction and a disjointed user experience, making it more difficult to work with the regulation.

Samuel King, Briink CTO, provides an overview of the new enriched content within the Briink App

NLP for Enriching Regulation

The goal of the project was to improve the user experience of the Briink App by enriching the text of the EU Taxonomy regulation with hyperlinks to the relevant cited material. This would allow users to directly click through a cited reference and land on the relevant web resource.

The project consisted of two main tasks:

  1. Extracting mentions of citations from the text (e.g. Article 29(7), point (b), of Directive (EU) 2018/2001)
  2. Mapping each citation to the relevant EU web resource and creating the hyperlink (e.g. linking Article 29(7), point (b), of Directive (EU) 2018/2001 to the page for that directive)

Given the standardised structure of the citations and the lack of annotated data, we opted for a rules-based approach, using regular expressions (regex) to extract the citations. EUR-Lex, the official website for EU law and other public EU documents, provides a standardised way of creating stable hyperlinks to the resources it hosts.
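To make the extraction step concrete, here is a minimal sketch of regex-based citation extraction in Python. It is not the pattern used in the project: the names CITATION_RE and extract_citations are hypothetical, and the pattern only covers the "Regulation/Directive/Decision (EU|EC) [No] x/y" style seen in the examples above. It also assumes the usual numbering convention that citations written with "No" are number/year, while those without are year/number.

```python
import re

# Hypothetical pattern for illustration only; a production rule set would
# need many more variants (older directive formats, treaties, annexes, ...).
CITATION_RE = re.compile(
    r"(?P<kind>Regulation|Directive|Decision)\s+"
    r"\((?P<domain>EU|EC)\)\s+"
    r"(?P<no>No\s+)?"
    r"(?P<first>\d+)/(?P<second>\d+)"
)


def extract_citations(text):
    """Return one dict per citation found in `text`."""
    citations = []
    for match in CITATION_RE.finditer(text):
        # Assumption: "No 1893/2006" is number/year, "2018/2001" is year/number.
        if match.group("no"):
            number, year = match.group("first"), match.group("second")
        else:
            year, number = match.group("first"), match.group("second")
        citations.append(
            {
                "text": match.group(0),
                "kind": match.group("kind"),
                "year": int(year),
                "number": int(number),
                "span": match.span(),
            }
        )
    return citations


sample = (
    "as referred to in Article 29(7), point (b), of Directive (EU) 2018/2001 "
    "and in Regulation (EC) No 1893/2006."
)
for citation in extract_citations(sample):
    print(citation["kind"], citation["year"], citation["number"])
```

Keeping the character span of each match makes it possible to insert links back into the original text later without searching for the citation a second time.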

“We worked in tight collaboration with Simplexico to define the project goals and iterate rapidly to build a really powerful solution in less than a month. By taking a pragmatic approach, Simplexico was able to achieve the required deliverables ahead of schedule and move on to tackling additional nice-to-have requirements!” - Samuel King, Briink CTO

We combined the regex extraction with the rules-based hyperlink construction to create a content enrichment pipeline that adds hyperlinks to the cited material throughout the EU Taxonomy text.
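As a rough illustration of such a pipeline, the sketch below wraps each matched citation in an HTML link. The article does not say which EUR-Lex URL scheme the project used; this example assumes CELEX-based URLs, where a legislative act is identified by sector "3", the year, a type letter (R for regulations, L for directives, D for decisions) and a four-digit number. The names celex_url and enrich are hypothetical, and the pattern is deliberately narrow.

```python
import re

# Illustrative pattern only: covers "Directive (EU) 2018/2001" and
# "Regulation (EC) No 1893/2006" style citations with four-digit years.
CITATION_RE = re.compile(
    r"(Regulation|Directive|Decision)\s+\((?:EU|EC)\)\s+(No\s+)?(\d+)/(\d+)"
)
CELEX_LETTER = {"Regulation": "R", "Directive": "L", "Decision": "D"}


def celex_url(kind, year, number):
    """Build a EUR-Lex URL from a CELEX identifier (assumed URL scheme)."""
    celex = f"3{year}{CELEX_LETTER[kind]}{number:04d}"
    return f"https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:{celex}"


def enrich(text):
    """Wrap every matched citation in an <a> tag pointing at EUR-Lex."""

    def link(match):
        kind, has_no, first, second = match.groups()
        # Assumption: citations with "No" are number/year, others year/number.
        number, year = (first, second) if has_no else (second, first)
        url = celex_url(kind, int(year), int(number))
        return f'<a href="{url}">{match.group(0)}</a>'

    return CITATION_RE.sub(link, text)


print(enrich("as defined in Directive (EU) 2018/2001"))
```

Because extraction and link construction are both pure functions of the text, the same code can be rerun unchanged whenever the source text changes.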

To iterate rapidly, we created a test data set of over 390 sample citations to evaluate performance. The final solution achieved an average F1 score of 94% in extracting citations across the different types of citation found in the EU Taxonomy.
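For readers unfamiliar with the metric, the following sketch shows one way an average F1 score across citation types could be computed. The exact matching criterion and averaging method used in the project are not stated here, so this assumes exact string matches and a simple unweighted (macro) average; the function names and the toy data are made up for illustration.

```python
def f1_score(predicted, gold):
    """F1 over two collections of citation strings, using exact matches."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(results_by_type):
    """Unweighted mean of per-type F1 scores.

    `results_by_type` maps a citation type to a (predicted, gold) pair.
    """
    scores = [f1_score(pred, gold) for pred, gold in results_by_type.values()]
    return sum(scores) / len(scores)


# Toy data, not the project's test set.
toy_results = {
    "directive": (["Directive (EU) 2018/2001"], ["Directive (EU) 2018/2001"]),
    "regulation": (
        ["Regulation (EC) No 1893/2006", "Regulation (EU) 2019/1"],
        ["Regulation (EC) No 1893/2006", "Regulation (EU) 2020/852"],
    ),
}
print(macro_f1(toy_results))
```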

The solution was designed with future revisions of the EU Taxonomy regulation in mind. If the EU Taxonomy is updated, the same pipeline can be applied to automatically enrich the updated regulation text.

The structured data of the extracted citations opens the door for analytics on the EU Taxonomy regulation. These analytics can help to provide further insights to Briink users. For example, Regulation (EC) No 1893/2006 is cited 101 times in the EU Taxonomy. It’s probably a good idea for users of the EU Taxonomy to make themselves intimately familiar with Regulation (EC) No 1893/2006. The second most cited reference is Article 29(7), point (b), of Directive (EU) 2018/2001, which is cited only 19 times across the EU Taxonomy.
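Counts like these fall out directly once citations are available as structured data. A minimal sketch, assuming the extracted citations are held as a plain list of strings (the function name most_cited and the toy list are hypothetical):

```python
from collections import Counter


def most_cited(citations, n=2):
    """Return the `n` most frequent citations as (citation, count) pairs."""
    # Collapse runs of whitespace so line-wrapped citations count as one.
    normalised = (" ".join(citation.split()) for citation in citations)
    return Counter(normalised).most_common(n)


# Toy input, not the real extraction output.
extracted = [
    "Regulation (EC) No 1893/2006",
    "Directive (EU) 2018/2001",
    "Regulation (EC)  No 1893/2006",
    "Regulation (EC) No 1893/2006",
]
for citation, count in most_cited(extracted):
    print(f"{citation}: cited {count} times")
```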

NLP and The Future of Regulation

Regulation governs how individuals and companies operate. Making regulation as easy as possible to use, understand and navigate is key to ensuring that it is properly adopted and complied with.

Regulation is a means to an end. The textual nature of regulation means that NLP techniques can be used to improve the user experience of working with regulation. A better user experience can have a follow-on impact to improve adoption, facilitate compliance and ensure proper implementation.

“As sustainable finance regulations get more complex, NLP will play an increasingly critical role in ensuring these crucial regulations are understood and adopted in a timely manner.” - Samuel King, Briink CTO

Here at simplexico we believe that technology should be in the service of humans. AI and NLP solutions should be designed and delivered to help aid knowledge professionals and make them better at what they do.

If you are interested in finding out more about our work or are interested in working with us, book a free discovery call here.