Validating Peppol Documents

2023-08-17

Peppol is a global interoperability framework for the exchange of business documents, such as orders and invoices. An important part of that interoperability, and a requirement in the Peppol agreements, is that only valid documents are sent over the network.

The best way to make sure all of your documents are valid is to explicitly validate them before sending them over the network, but it’s not always clear to everyone what this means in practice.

In this article, we’ll discuss what it means for documents to be valid, the standards that are involved in validation, and how you can make sure the documents you create and send conform to the relevant specification.


What it means to be a ‘valid’ document

In the most general sense, “valid” could mean a number of things: is it structurally correct? Does it make sense? Is it true (for some definition of true)? For instance, in an invoice, is the price amount for a given product correct? Is the seller a company that the buyer actually does business with?

However, when we say that documents sent over the Peppol network must be valid, the term has a narrower meaning.

When talking about document validity on the Peppol network, what we mean is that the document conforms to a specific set of rules, so that the recipient of the document can automatically process that document, without running into issues due to unexpected or missing data elements.

An example of a simple rule is “An invoice should have a total amount.” An example of a more complex rule is “The total amount for an invoice should be the sum of the invoice line amounts, plus all charges, minus all allowances.”
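To make these two rules concrete, here is a small hand-coded check in Python. It is only a sketch: the amounts are passed in as plain `Decimal` values rather than read from an actual invoice, and the function names are hypothetical, not part of any Peppol tooling.

```python
from decimal import Decimal


def expected_total(line_amounts, charges=(), allowances=()):
    """Sum of the invoice line amounts, plus all charges, minus all allowances."""
    zero = Decimal("0")
    return sum(line_amounts, zero) + sum(charges, zero) - sum(allowances, zero)


def check_total(declared_total, line_amounts, charges=(), allowances=()):
    """Return a list of rule violations; an empty list means both rules pass."""
    if declared_total is None:
        # Simple rule: an invoice should have a total amount.
        return ["An invoice should have a total amount."]
    expected = expected_total(line_amounts, charges, allowances)
    if declared_total != expected:
        # Complex rule: the declared total must match the calculated amount.
        return [f"Total amount {declared_total} does not match calculated amount {expected}."]
    return []


# 100.00 + 25.50 + 10.00 - 5.50 = 130.00
print(check_total(Decimal("130.00"),
                  [Decimal("100.00"), Decimal("25.50")],
                  charges=[Decimal("10.00")],
                  allowances=[Decimal("5.50")]))  # → []
```

Using `Decimal` rather than floats avoids binary rounding surprises when comparing monetary amounts.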

The Peppol network allows many different types of documents, and each type has its own set of such rules. For instance, both the structural and semantic rules are very different between Peppol BIS Orders and XRechnung CII CreditNotes, as these are completely different classes of documents. But there are also differences between the UBL versions of NLCIUS (SI-UBL 2.0) and Peppol BIS 3 Invoices, although far fewer, and many documents could easily be valid under either.

In many cases, documents must even adhere to multiple sets of rules, something many implementers may not realize.

Rules for different document types

The Peppol requirements state that documents sent over the network must be valid, but Peppol does not prescribe how you ensure this; you are free to hand-code a validator that checks all the rules for all the different document types, should you want to. There are, however, tools to help you with this. For most document types these come in two forms:

  1. XML Schema definitions, one per document type
  2. Schematron definitions, zero or more per document type

XML Schemas

For XML, a common approach is to use XML Schemas. From Wikipedia:

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

In other words, XML Schemas can be used to make sure that XML documents adhere to a predefined structure: which elements occur where, how often, and in which order. There is also some capacity to check the values of elements and element attributes, but this capacity is very limited.

Schematron

In many cases there are more complex requirements, for which XML Schemas are not sufficient. For instance, there might be requirements that span multiple fields, or requirements that contain calculations, or even dependencies on other requirements.

For such cases, there is a standard called Schematron. From Wikipedia:

Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath languages. In many implementations, the Schematron XML is processed into XSLT code for deployment anywhere that XSLT can be used.

Schematron is capable of expressing constraints in ways that other XML schema languages like XML Schema and DTD cannot. For example, it can require that the content of an element be controlled by one of its siblings. Or it can request or require that the root element, regardless of what element that is, must have specific attributes. Schematron can also specify required relationships between multiple XML files. Constraints and content rules may be associated with “plain-English” (or any language) validation error messages, allowing translation of numeric Schematron error codes into meaningful user error messages.
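As an illustration of a cross-field constraint that XML Schema cannot express, the sketch below hand-codes a simplified rule in the spirit of those found in the Peppol rulesets: every `currencyID` attribute in a UBL invoice must match the document's `DocumentCurrencyCode`. It uses only Python's standard `xml.etree.ElementTree`; the function name and the sample are hypothetical, the sample is deliberately minimal (it would not pass XML Schema validation), and the real rulesets define exceptions that this sketch ignores. In practice such rules are written in Schematron rather than hand-coded.

```python
import xml.etree.ElementTree as ET

# Namespace of UBL basic components such as DocumentCurrencyCode and amounts.
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"


def check_currency_consistency(xml_text):
    """Return one error per element whose currencyID differs from DocumentCurrencyCode."""
    root = ET.fromstring(xml_text)
    code_el = root.find(f"{{{CBC}}}DocumentCurrencyCode")
    if code_el is None or not (code_el.text or "").strip():
        return ["DocumentCurrencyCode is missing."]
    currency = code_el.text.strip()
    errors = []
    # Walk every element in the document, wherever it appears.
    for el in root.iter():
        currency_id = el.get("currencyID")
        if currency_id is not None and currency_id != currency:
            local_name = el.tag.split("}")[-1]
            errors.append(f"{local_name} uses currencyID {currency_id}, expected {currency}.")
    return errors


SAMPLE_INVOICE = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
  <cac:LegalMonetaryTotal>
    <cbc:LineExtensionAmount currencyID="EUR">100.00</cbc:LineExtensionAmount>
    <cbc:PayableAmount currencyID="USD">121.00</cbc:PayableAmount>
  </cac:LegalMonetaryTotal>
</Invoice>"""

print(check_currency_consistency(SAMPLE_INVOICE))
# → ['PayableAmount uses currencyID USD, expected EUR.']
```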

XML Schema and Schematron for Peppol

For Peppol BIS documents, there is always an XML Schema and one or more Schematron definitions. The XML Schema is usually either a specific UBL document type, as defined by OASIS, or a UN/CEFACT D16B CII document. Since these schemas are more general than just the Peppol documents, they are not published by OpenPeppol, but by the organization that manages them.

OpenPeppol does publish Schematron files for all of the Peppol BIS documents. In most cases there is just a single Schematron file. The exception is invoices, which have two Schematron files: one contains rules to check compliance with the European Norm EN-16931, the other contains rules to check compliance with Peppol-specific rules.

Document types that are registered for use on the Peppol network but not maintained by Peppol itself often have similar rules. Refer to the documentation for those document types to find the requirements and, where available, the XML Schema and/or Schematron files.

The most important thing to realize here is that if there are multiple files to validate against, you must validate against all of them, not just one. Validating only against the UBL XML Schema will not catch violations of Peppol-specific rules, but the reverse is true too: the Schematron rules assume that XML Schema validation has already been performed.

How to validate XML files

There are many tools and software libraries to parse and create XML documents, and most of them can perform XML Schema validation as well. So, in practice and in most frameworks, XML Schema validation is as simple as loading the schema and validating the document against it.

For Schematron, the story is slightly different. Schematron definitions are generally not used directly (though some implementations do support this); instead, they are transformed into XSLT (XSL Transformations) files, which can be used by any XSLT processor to transform a given document into an SVRL (Schematron Validation Report Language) document.

Put simply, the SVRL output is a new XML document that lists the errors and warnings found in the validated document. If the document adheres to all rules defined in the Schematron file, these lists are empty. By checking for the presence or absence of errors in the SVRL output, you can determine whether a given XML document is valid.
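Checking an SVRL report can be done with any XML parser. The sketch below, using Python's standard library, collects the `svrl:failed-assert` elements and splits them into errors and warnings based on their `flag` attribute. It assumes that, as in the Peppol Schematron files, warnings are flagged `warning` and errors `fatal`; treating any other (or missing) flag as an error is a conservative choice of this sketch, and the rule IDs in the sample report are made up.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"


def summarize_svrl(svrl_text):
    """Split the failed assertions in an SVRL report into (errors, warnings).

    Each entry is a (rule id, location, message) tuple.
    """
    root = ET.fromstring(svrl_text)
    errors, warnings = [], []
    for assertion in root.iter(f"{{{SVRL_NS}}}failed-assert"):
        text_el = assertion.find(f"{{{SVRL_NS}}}text")
        message = (text_el.text or "").strip() if text_el is not None else ""
        entry = (assertion.get("id"), assertion.get("location"), message)
        # Assumption: only flag="warning" is a warning; anything else counts as an error.
        if assertion.get("flag") == "warning":
            warnings.append(entry)
        else:
            errors.append(entry)
    return errors, warnings


SAMPLE_SVRL = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:fired-rule context="/Invoice"/>
  <svrl:failed-assert id="RULE-1" flag="fatal" location="/Invoice" test="cbc:ID">
    <svrl:text>[RULE-1] An invoice shall have an invoice number.</svrl:text>
  </svrl:failed-assert>
  <svrl:failed-assert id="RULE-2" flag="warning" location="/Invoice" test="cbc:Note">
    <svrl:text>[RULE-2] A note is recommended.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>"""

errors, warnings = summarize_svrl(SAMPLE_SVRL)
print("valid" if not errors else f"invalid: {len(errors)} error(s), {len(warnings)} warning(s)")
# → invalid: 1 error(s), 1 warning(s)
```

Note that warnings do not make a document invalid; only the errors list decides the outcome.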

A full description of how Schematron works is out of scope for this article, but if you would like to see that as well, let us know!

For now, it is enough to know that Schematron source files (.sch) can be used directly by some implementations, or can be used to create an .xsl stylesheet that ‘transforms’ a given document into a list of errors and warnings about that document.

Validation in practice

There is one more caveat regarding Schematron and the resulting XSLT files: a Schematron document is written for a specific version of XSLT. In many cases this is XSLT 2.0, which is only supported by a limited number of libraries, in an even more limited number of programming languages. Even if you use Schematron files directly rather than the derived XSLT 2.0 file, your environment must still support XPath 2.0, on which XSLT 2.0 is built.

With that in mind, let’s go through a simple scenario to validate a document. In this example, we assume that we already have the correct schemas and schematron files available.

  1. Validate the document against the correct XML Schema. The schema to use is determined by the namespace and name of the root element. For instance, if the root element is Invoice and its namespace is urn:oasis:names:specification:ubl:schema:xsd:Invoice-2, the document is a UBL 2 Invoice. For UBL, the specific 2.x version can be determined from the UBLVersionID element; for Peppol, it defaults to 2.1. If the document does not conform to the XML Schema, the process can stop immediately, as the document is not valid.

  2. If it does conform to the schema, we can move to the next step. Now that we know the document is UBL, we can determine the specific document type, which UBL specifies in the CustomizationID element. If our document is a Peppol BIS 3 Invoice, this element contains the value urn:cen.eu:en16931:2017#compliant#urn:fdc:peppol.eu:2017:poacc:billing:3.0. Peppol BIS 3 invoice documents must comply with two Schematron rulesets: that of the European Norm, EN-16931, and that of Peppol BIS 3 itself.

  3. To determine whether the document conforms to the European Norm, a Schematron file (or its derived XSLT file) can be used to transform the document into an SVRL document with a list of errors and warnings. If the list of errors is not empty, the document is invalid and the validation process can stop.

  4. If the list of errors from step 3 is empty, we can move on to the second Schematron validation. Again, the document is transformed to an SVRL document with a list of errors and warnings, this time using the schematron definition for Peppol BIS 3 Invoices itself. Again, if the list of errors is not empty, the document is invalid.

  5. If the list of errors is empty, the document is valid.
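Steps 1 and 2 start by working out what kind of document you are dealing with. A minimal sketch in Python, assuming a UBL document and using only the standard library, might read the root element's namespace and name, the UBLVersionID (defaulting to 2.1 as described above), and the CustomizationID. The function names and the shape of the returned dictionary are hypothetical; in the order described above, the CustomizationID would only be relied on after XML Schema validation has passed.

```python
import xml.etree.ElementTree as ET

CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
UBL_INVOICE_NS = "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
PEPPOL_BIS3_INVOICE = "urn:cen.eu:en16931:2017#compliant#urn:fdc:peppol.eu:2017:poacc:billing:3.0"


def identify(xml_text):
    """Read the details needed to pick the XML Schema and Schematron rulesets."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as "{namespace}LocalName".
    if root.tag.startswith("{"):
        namespace, _, local_name = root.tag[1:].partition("}")
    else:
        namespace, local_name = "", root.tag

    def child_text(name):
        el = root.find(f"{{{CBC}}}{name}")
        return el.text.strip() if el is not None and el.text else None

    return {
        "namespace": namespace,
        "root": local_name,
        # For Peppol, UBLVersionID defaults to 2.1 when it is absent.
        "ubl_version": child_text("UBLVersionID") or "2.1",
        "customization_id": child_text("CustomizationID"),
    }


def is_peppol_bis3_invoice(info):
    return (info["namespace"] == UBL_INVOICE_NS
            and info["root"] == "Invoice"
            and info["customization_id"] == PEPPOL_BIS3_INVOICE)


SAMPLE = f"""<Invoice xmlns="{UBL_INVOICE_NS}" xmlns:cbc="{CBC}">
  <cbc:CustomizationID>{PEPPOL_BIS3_INVOICE}</cbc:CustomizationID>
  <cbc:ID>INV-001</cbc:ID>
</Invoice>"""

info = identify(SAMPLE)
print(info["ubl_version"], is_peppol_bis3_invoice(info))  # → 2.1 True
```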

Peppol BIS 3 invoices are unusual in that they require two Schematron validations; most document types only require XML Schema validation and a single Schematron validation, and a number of document types currently only have an XML Schema available. The exact validations to run can be found in the documentation of the document type you are validating.
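Because the required validations differ per document type, it can help to keep them as configuration and run them in order, stopping at the first stage that reports errors. The sketch below assumes each stage is a function that takes the raw document and returns a list of error messages; the stage names, the `REQUIRED_STAGES` mapping and the placeholder validators are hypothetical, and real stages would wrap an XML Schema validator and XSLT 2.0-capable Schematron processing. For simplicity, the stages are selected up front from the CustomizationID.

```python
from typing import Callable, Dict, List, Tuple

# A stage takes the raw document and returns a list of error messages.
Stage = Callable[[bytes], List[str]]

PEPPOL_BIS3_INVOICE = "urn:cen.eu:en16931:2017#compliant#urn:fdc:peppol.eu:2017:poacc:billing:3.0"

# Hypothetical configuration: which validations each document type needs, in order.
REQUIRED_STAGES: Dict[str, List[str]] = {
    PEPPOL_BIS3_INVOICE: ["ubl-2.1-xsd", "en16931-schematron", "peppol-bis3-schematron"],
}


def build_stages(customization_id: str, validators: Dict[str, Stage]) -> List[Tuple[str, Stage]]:
    """Look up the ordered list of stages for a document type."""
    if customization_id not in REQUIRED_STAGES:
        raise ValueError(f"No validations configured for {customization_id!r}")
    return [(name, validators[name]) for name in REQUIRED_STAGES[customization_id]]


def run_pipeline(document: bytes, stages: List[Tuple[str, Stage]]) -> Tuple[bool, str, List[str]]:
    """Run stages in order; stop at the first one that reports errors."""
    for name, stage in stages:
        errors = stage(document)
        if errors:
            return False, name, errors
    return True, "", []


def placeholder(document: bytes) -> List[str]:
    # Stand-in for a real XML Schema or Schematron validation.
    return []


validators = {name: placeholder for name in REQUIRED_STAGES[PEPPOL_BIS3_INVOICE]}
print(run_pipeline(b"<Invoice/>", build_stages(PEPPOL_BIS3_INVOICE, validators)))
# → (True, '', [])
```

Keeping the per-type list in one place makes it straightforward to add document types that need only an XML Schema, or only a single Schematron file.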

Where to get XML Schema and Schematron files

For each type of document you want to validate, you’ll need to find and download the correct rule definition files. These are generally published by the organization that defines the rules. For instance, the Schematron files for Peppol can be found here:

Peppol also publishes its validation files on GitHub:

In the Netherlands, the Schematron files and derived XSLT files for SI-UBL 2.0 (the UBL implementation of NLCIUS) are published on GitHub as well:

Note that such files are updated regularly; the Peppol validation files, for instance, get update releases twice a year, as do the SI-UBL 2.0 files. The release and update policy differs per document type and organization.

Validation Tools

Online tools

There are several tools online that can help you validate documents. These are not intended to provide production-level service, but can be of use when developing a specific document type and you want to find out whether your output is correct.

There are a number of companies that do provide production-level validation, but these generally require a contract. Search for ‘Validate Peppol Documents’ and you will get several relevant results.

ion-docval

Of course, the problem with using an online validator in production, apart from capacity, costs, and service availability, is that you are sending potentially sensitive documents to an external service. You may want, or need, to perform all document validation locally.

Ionite has published a freely available, open source toolkit to perform validation against (multiple) XML Schema and Schematron files. The toolkit is written in Java, but is specifically intended to be usable from other environments as well, through a local web API.

For more information, see the ion-docval website or its GitHub page.

Conclusion

Validating documents completely and correctly is more involved than many people realize, and it is very important to get it right. Not just because Peppol requires it in its agreements, but also because the general document formats are very broad, and an important step towards interoperability is that the data exchanged between systems follows all the rules and restrictions that the receiver expects.

To a valid future!