PEPPOL Statistics updated
2025-12-09
We have updated the tooling that creates our Peppol Statistics pages. The aggregation algorithm has been updated, and the tooling has been made more resistant to unexpected data in the Peppol Directory. In this article I will describe what changed, why it changed, and why the numbers are now slightly different from what they were.
The data source for our statistics
These statistics are based on the data provided by the Peppol Directory. The directory provides a very helpful download option for all of its public records.
Every day, the statistics server downloads this data and processes it into the graphs you all know and love.
The problem with the data source
However, creating statistical data from that data source is not as straightforward as walking through that file and counting the records.
This has to do with the way participants on the Peppol network are addressed: by their participant identifier (also known as their Peppol Identifier, or perhaps their Electronic Address Scheme (EAS) identifier). This is usually their national business register ID, their VAT number, or an international number such as a GLN. Companies can be reachable through multiple such identifiers, and often choose to be.
On the network, there is no direct way to tell that two different identifiers belong to the same organization; each identifier stands on its own, and each identifier gets a separate ‘participant’ entry in the Peppol Directory database.
Now, since we want to try and count organizations, not individual identifiers, we’ll have to aggregate these somehow.
Entries in the Peppol Directory may specify a list of ‘additional identifiers’, but the semantics of that field are unclear: does it mean that those identifiers also belong to that organization, or does it mean that those identifiers are also published on the Peppol network? Perhaps more importantly, most of the entries don’t have this value set anyway.
What the Directory does provide (and the network itself does not), is the name of the organization.
Therefore, since the first version of the statistics page, we’ve been aggregating the receivers in each country by their name. If the organization name for multiple identifiers within a country is the same, we consider it the same organization, with a few specific exceptions, such as the companies ‘—’, ‘Unknown’, and ‘Deleted Company’.
This name matching has one other addition: many registered organizations have additional information within their ‘name’, between parentheses. Anything after the first parenthesis is ignored in the name aggregator.
This matching isn’t perfect; many organizations are not only registered with multiple identifiers, but those registrations use different names as well. These are counted as separate organizations for the statistics.
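As an illustration, the name-based aggregation described above could be sketched roughly like this in Python. This is not the actual code behind the statistics pages: the function names, the (country, identifier, name) record shape, and the whitespace trimming are assumptions made for the example.

```python
from collections import defaultdict

# Placeholder names that are never merged (the examples named in this article).
# "\u2014" is the dash-only placeholder name.
PLACEHOLDER_NAMES = {"\u2014", "Unknown", "Deleted Company"}


def name_key(name: str) -> str:
    """Drop everything from the first '(' onwards and trim whitespace."""
    return name.split("(", 1)[0].strip()


def count_organizations(records):
    """Count organizations per country.

    `records` is an iterable of (country, identifier, name) tuples
    (a hypothetical shape, not the Peppol Directory export format).
    """
    groups = defaultdict(set)
    for country, identifier, name in records:
        key = name_key(name)
        if not key or key in PLACEHOLDER_NAMES:
            # Placeholder names stay separate: one organization per identifier.
            groups[country].add(("id", identifier))
        else:
            groups[country].add(("name", key))
    return {country: len(keys) for country, keys in groups.items()}
```

Two identifiers in the same country registered as ‘Example NV’ and ‘Example NV (head office)’ would count as one organization here, while two entries named ‘Unknown’ would still count as two.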
Changes in this latest version
We’ve now expanded on this aggregation, following the observation that in a number of countries, the VAT number is equal to the business registration number.
In the data present in the Peppol Directory, this applies mostly to Belgium, where many organizations are registered with both their KBO number and their VAT number, and in a nontrivial number of cases, under different names.
As of today, those registrations are aggregated as well.
And, while the number of occurrences is much lower, we’ve also found a number of other scheme pairs where the same number is registered under both schemes in the directory.
The following schemes are now aggregated on their identifier number:
- NO:VAT (9909) and NO:ORG (0192)
- BE:VAT (9925) and BE:EN (0208)
- DE:VAT (9930) and DE:LWID (0204)
- EE:VAT (9931) and EE:CC (0191)
- LV:VAT (9939) and LV:URN (0218)
If there are any others, please let us know and we’ll add them to the list!
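To make the idea concrete, here is a minimal Python sketch of how identifiers from these scheme pairs could be mapped to one shared key. It is only an illustration, not our actual implementation: the `canonical_id` helper is hypothetical, and it assumes that keeping only the digits of the value (so that a VAT-style value with a country prefix lines up with the bare register number) is enough to make the two schemes comparable.

```python
import re

# Scheme pairs from the list above: VAT scheme -> business register scheme.
VAT_TO_REGISTER = {
    "9909": "0192",  # NO:VAT -> NO:ORG
    "9925": "0208",  # BE:VAT -> BE:EN
    "9930": "0204",  # DE:VAT -> DE:LWID
    "9931": "0191",  # EE:VAT -> EE:CC
    "9939": "0218",  # LV:VAT -> LV:URN
}
REGISTER_SCHEMES = set(VAT_TO_REGISTER.values())


def canonical_id(participant_id: str) -> str:
    """Map a 'scheme:value' identifier to a key shared by both schemes of a pair."""
    scheme, _, value = participant_id.partition(":")
    register = VAT_TO_REGISTER.get(scheme)
    if register is None and scheme not in REGISTER_SCHEMES:
        # Not part of any pair: leave the identifier untouched.
        return participant_id
    # Assumption: keeping only the digits makes paired values comparable.
    digits = re.sub(r"\D", "", value)
    return f"{register or scheme}:{digits}"
```

With this helper, `9925:BE0123456789` and `0208:0123456789` both map to `0208:0123456789`, so they would be counted once, while identifiers in other schemes pass through unchanged.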
Another change is that the statistics now show unexpected data as well, such as ‘scheme ID’ 7676 or 9999 and unknown Peppol Document Type Identifiers, rather than ignoring it or trying to interpret what the values were supposed to mean.
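A rough sketch of what ‘showing unexpected data’ can look like when counting identifiers per scheme is given below. The `KNOWN_SCHEMES` table is a hypothetical and deliberately incomplete stand-in for the real code list, and the label format is an assumption for the example.

```python
from collections import Counter

# Hypothetical, deliberately incomplete table of known scheme IDs.
KNOWN_SCHEMES = {
    "0088": "GLN",
    "0192": "NO:ORG",
    "0208": "BE:EN",
    "9925": "BE:VAT",
}


def count_by_scheme(participant_ids):
    """Count identifiers per scheme, keeping unrecognized scheme IDs visible."""
    counts = Counter()
    for participant_id in participant_ids:
        scheme = participant_id.partition(":")[0]
        # Unknown schemes are reported under their raw ID instead of being dropped.
        label = KNOWN_SCHEMES.get(scheme, f"unknown ({scheme})")
        counts[label] += 1
    return counts
```

The point of the design is that an unrecognized scheme such as 7676 ends up as its own row in the output, so odd data in the directory is visible instead of silently disappearing from the totals.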
Some numbers are now different
This update has been applied retroactively to all data from the last half year, and older historic data has been disabled. So if you suddenly see numbers that are slightly different than before, this is why.
The resulting numbers are largely unchanged for most countries, but for Belgium the total count is now a bit lower.
