TNLS: a new taxonomic name matching system in the age of Big Data

On 1 May 2024, the Taxonomic Name Linking Services (TNLS) project officially launched as part of the TETTRIs cascade funding program. It is one of 12 innovative projects designed to transform the field of taxonomy through collaborative solutions. Supported by a €150,000 grant, TNLS is tackling a pressing issue in modern taxonomy: matching species names across different datasets.

By developing permanent identifiers that are specific to individual species, TNLS sets out to produce a unified system for matching species names.

The complexity of name matching in modern taxonomy

For more than two centuries, Latin scientific names have been used to classify organisms, providing a universal way to identify species within the discipline known as taxonomy. However, the surge in Big Data – massive amounts of information that are difficult to process with traditional tools – has introduced new challenges.

Differences in how species are recorded, such as inconsistent spelling, different names for the same species (synonyms), identical names applied to different organisms (homonyms), and varying taxonomic opinions, have made merging datasets a major challenge. In fact, 10–20% of species names fail to match perfectly. These cases require either human intervention, which costs a great deal of time, or the acceptance of errors, which leaves inaccuracies in the merged datasets.

This phenomenon, which researchers call “data fog,” is becoming a significant barrier to advancing biodiversity research. With big datasets that contain thousands of species, the need for a standardized system to match names is more critical than ever.
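The matching problem described above can be illustrated with a minimal sketch. It uses only Python's standard `difflib` module; the reference list, the `normalise` and `match_name` helpers, and the 0.85 similarity cutoff are illustrative assumptions and not the method used by TNLS or any of the services it surveys.

```python
import difflib

# Hypothetical reference list; a real service would query a taxonomic backbone.
REFERENCE_NAMES = ["Gadus morhua", "Carex nigra", "Bellis perennis"]


def normalise(name: str) -> str:
    """Collapse whitespace and apply 'Genus epithet' casing."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def match_name(name: str, cutoff: float = 0.85):
    """Return (matched_name, match_type).

    match_type is "exact", "fuzzy" or "none"; matched_name is None
    when no reference name is similar enough.
    """
    cleaned = normalise(name)
    if cleaned in REFERENCE_NAMES:
        return cleaned, "exact"
    # Fall back to approximate string matching for spelling variants.
    close = difflib.get_close_matches(cleaned, REFERENCE_NAMES, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "none"
```

In this sketch, a name that differs only in case or spacing is resolved as an exact match after normalisation, a transposed spelling such as "Gadus morhau" falls through to the fuzzy step, and a name absent from the reference list is returned as unmatched for human review. String similarity alone cannot resolve synonyms or homonyms, which is where identifiers become necessary.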

About TNLS 

To address this challenge, TNLS is developing a unified name matching system capable of accurately matching species names across multiple big datasets. The project will achieve this through the use of stable identifiers, which act as an ID number for a species, even if the name or classification of a species is different across datasets. 
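As a rough illustration of the idea, the sketch below links two datasets through an identifier rather than through the name string. The `Taxon` and `NameIndex` classes, the `link_datasets` function and the identifier format are all hypothetical and chosen for this example; they do not describe the actual TNLS design.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    stable_id: str                      # hypothetical identifier format
    accepted_name: str
    other_names: set = field(default_factory=set)  # e.g. synonyms, variants


class NameIndex:
    """Maps every known name of a taxon to one stable identifier.

    Simplification: homonyms (one name, several taxa) are ignored here;
    a later entry would silently overwrite an earlier one.
    """

    def __init__(self):
        self._by_name = {}
        self._by_id = {}

    def add(self, taxon: Taxon) -> None:
        self._by_id[taxon.stable_id] = taxon
        for name in {taxon.accepted_name, *taxon.other_names}:
            self._by_name[name.lower()] = taxon.stable_id

    def resolve(self, name: str):
        """Return the stable ID for a name, or None if the name is unknown."""
        return self._by_name.get(name.strip().lower())


def link_datasets(index: NameIndex, dataset_a: dict, dataset_b: dict) -> dict:
    """Join two {name: value} datasets on stable ID instead of name string."""
    ids_a = {index.resolve(name): value for name, value in dataset_a.items()}
    linked = {}
    for name, value in dataset_b.items():
        sid = index.resolve(name)
        if sid is not None and sid in ids_a:
            linked[sid] = (ids_a[sid], value)
    return linked
```

With an index that lists "Gadus callarias" as another name for "Gadus morhua" (used here purely as illustrative data), a record filed under one name in the first dataset and under the other in the second ends up linked under the same identifier.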

Project coordinator Bart Vanhoorne explains: “The project is developing a unified and transparent approach to name matching Big Data, creating lasting tools that make biodiversity research more accurate, connected, and impactful.”

After the development of the name-matching system, TNLS will focus on: 

  1. Integrating the system into existing datasets, including those on marine and plant species.  
  2. Providing practical support to other institutions integrating the system.
  3. Creating a notification service to keep researchers up to date on taxonomic changes.

 

The team behind TNLS

TNLS is led by the Flanders Marine Institute (VLIZ) and supported by the Royal Botanic Garden Edinburgh (RBGE). The project also draws on the expertise of the renowned consultants Walter Berendsohn and Andreas Müller from the Botanischer Garten Berlin.

  • VLIZ, based in Belgium, serves as both the financial administrator and project manager. VLIZ is also the host institute of the World Register of Marine Species, which manages the Aphia database, a backbone for marine biodiversity data.
  • RBGE, based in Scotland, contributes to the project as part of its broader work on World Flora Online, in particular by hosting the WFO Plant List, a comprehensive database for plant taxonomy.

Figure 1. The TNLS team at the kick-off meeting in Edinburgh in May. From left to right: Roger Hyam (RBGE), Andreas Müller (consultant), Walter Berendsohn (consultant), Liesbeth Lyssens (VLIZ), Bart Vanhoorne (VLIZ).

What has been achieved so far?

One of TNLS’s first major achievements since its launch has been creating a detailed list of how name-matching services work across top European providers. This analysis will identify commonalities and discrepancies in their methodologies, which will inform the development of a unified system for name matching.  

In parallel, TNLS is working to update the Aphia database to include non-marine taxonomy. This expansion will benefit a wider range of biodiversity projects and researchers, including other TETTRIs-funded initiatives such as iSedge, as it will allow efficient linking and matching across marine, non-marine, animal and non-animal species.

Figure 2. How the Aphia platform operates

What’s next for TNLS?

Looking ahead, the project will focus on finalising the list of name-matching services and defining a common interface – an essential step for establishing standardization across datasets. This work will be followed by further enhancements to the Aphia database, ensuring it is aligned with TNLS’s findings. Alongside these technical developments, the team will also concentrate on promoting the adoption of TNLS tools and services within the wider research community.
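To make the notion of a common interface more concrete, here is a minimal sketch of what one could look like in code. The `NameMatcher` protocol, the `MatchResult` fields and the toy `DictMatcher` provider are assumptions made for illustration; the interface TNLS eventually defines may look quite different.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class MatchResult:
    query: str                      # name as submitted
    matched_name: Optional[str]     # name found by the provider, if any
    identifier: Optional[str]       # provider's identifier, if any
    source: str                     # which provider answered


class NameMatcher(Protocol):
    """Hypothetical shared contract: every provider answers the same call."""

    def match(self, name: str) -> MatchResult: ...


class DictMatcher:
    """Toy provider backed by an in-memory dict; stands in for a real service."""

    def __init__(self, source: str, names: dict):
        self.source = source
        self._names = {k.lower(): (k, v) for k, v in names.items()}

    def match(self, name: str) -> MatchResult:
        hit = self._names.get(name.strip().lower())
        if hit is None:
            return MatchResult(name, None, None, self.source)
        return MatchResult(name, hit[0], hit[1], self.source)


def match_everywhere(name: str, providers: list) -> list:
    """Send one query to every provider and collect comparable results."""
    return [provider.match(name) for provider in providers]
```

Because each provider returns the same result shape, answers from a marine source and a plant source can be compared side by side without provider-specific parsing.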

Stay connected

Explore TNLS findings, recommendations for researchers and system implementers, and case studies here:

 

