An essential part of Blockcerts verification is checking the integrity of the certificate: we want to determine if the certificate under inspection has changed since its creation. A certificate that has been altered should fail.
All versions of Blockcerts have used certificate hashing as part of this process. This has the property that changes to the certificate will change the hash, causing the certificate to be rejected during verification.
In version 1.2 we switched from computing a hash of the binary file to computing a hash of the JSON-LD normalized -- or, more accurately, canonicalized -- file.
This post talks about why we made that change, and the improvements it enabled.
Before 1.2, the Blockcerts issuing process read the (certificate) file in its binary form, took a hash of that, and issued a transaction with that value on the blockchain. During verification, the hash of the binary file would be recomputed to ensure it hadn't changed.
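To make the fragility of this approach concrete, here is a minimal sketch of hashing a certificate's raw bytes. SHA-256, the helper name `hash_certificate_bytes`, and the sample certificate content are assumptions chosen for illustration; this is not the actual issuing code.

```python
import hashlib


def hash_certificate_bytes(raw: bytes) -> str:
    """Return the SHA-256 hex digest of the certificate's raw bytes."""
    return hashlib.sha256(raw).hexdigest()


# The same JSON content, byte-for-byte different (only whitespace changed).
original = b'{"recipient": "Alice", "title": "Diploma"}'
reformatted = b'{"recipient":"Alice","title":"Diploma"}'

# Hash recorded at issuing time (hypothetical).
issued_hash = hash_certificate_bytes(original)

# Verification recomputes the hash from whatever bytes it is given.
print(hash_certificate_bytes(original) == issued_hash)     # unchanged bytes match
print(hash_certificate_bytes(reformatted) == issued_hash)  # reformatted bytes do not
```

Even a harmless re-serialization changes the bytes, so the file has to be preserved exactly as issued.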
This approach complicated implementations and reduced convenience in a couple of ways:
1. You could not store proof of the transaction in the same file as the certificate, because (by design) any change to the file would change the hash of the file, thus invalidating the certificate.
2. The binary file would have to be stored. Because a JSON object is an unordered set of key/value pairs, using a JSON document store (as opposed to preserving the binary file) would likely corrupt the order.
In other words, pre-v1.2 verifiers needed to retrieve both the certificate in its original binary encoding and the evidence or transaction receipt.
Recall that before 1.2, we used gridfs in cert-viewer to store the certificate in its original form so that we could predictably take the hash of the file. We stored transaction data separately, and verification would use both these pieces.
V1.2 and beyond
In V1.2 we wanted to make issuing certificates more economical by issuing them in batches. The well-known technique for doing this is to combine the hashes of the documents into a Merkle tree, obtain the root, and create a transaction containing the Merkle root as an output.
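The batching idea can be sketched as follows. This is a simplified illustration, not the Blockcerts implementation: the use of SHA-256, the convention of duplicating the last node on odd-sized levels, and the names `sha256` and `merkle_root` are all assumptions.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(leaf_hashes: List[bytes]) -> bytes:
    """Combine leaf hashes pairwise, level by level, until one root remains."""
    if not leaf_hashes:
        raise ValueError("need at least one leaf hash")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: duplicate the last node on odd-sized levels.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical batch of three certificates.
certificates = [b"certificate-a", b"certificate-b", b"certificate-c"]
leaves = [sha256(c) for c in certificates]

# A single value that commits to the whole batch.
root = merkle_root(leaves)
print(root.hex())
```

Only `root` needs to go into the transaction, regardless of how many certificates are in the batch.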
This approach requires a change to verification: the certificate hash has to be combined with the Merkle proof to trace the contents back to the value stored on the blockchain. The result is yet more information (not part of the certificate) that must be stored alongside the certificate to enable verification. This would further complicate storage and manageability for recipients and issuers, and it motivated us to improve convenience.
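A rough sketch of that verification step: starting from the certificate hash, each proof entry supplies a sibling hash and which side it sits on, and folding them together should reproduce the Merkle root. The proof layout (a list of `(side, sibling)` pairs), SHA-256, and the function name `verify_merkle_proof` are assumptions for illustration, not the actual receipt format.

```python
import hashlib
from typing import List, Tuple


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def verify_merkle_proof(leaf: bytes, proof: List[Tuple[str, bytes]], root: bytes) -> bool:
    """Fold the leaf hash with each sibling hash and compare against the root."""
    current = leaf
    for side, sibling in proof:
        if side == "left":
            current = sha256(sibling + current)
        elif side == "right":
            current = sha256(current + sibling)
        else:
            raise ValueError("side must be 'left' or 'right'")
    return current == root


# A four-leaf tree built by hand (hypothetical certificates).
leaves = [sha256(c) for c in (b"cert-a", b"cert-b", b"cert-c", b"cert-d")]
left = sha256(leaves[0] + leaves[1])
right = sha256(leaves[2] + leaves[3])
root = sha256(left + right)

# Proof for cert-c: its sibling cert-d is on the right, then the left subtree.
proof_for_c = [("right", leaves[3]), ("left", left)]

print(verify_merkle_proof(leaves[2], proof_for_c, root))
print(verify_merkle_proof(sha256(b"tampered"), proof_for_c, root))
```

The proof is exactly the extra information that has to travel with the certificate.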
Therefore, in V1.2 our goal was to improve the usability of blockchain certificates by embedding the proof in the certificate. That meant we could no longer rely on the hash of the binary file. We wanted a way to deterministically obtain the hash of the contents, excluding the signature or proof.
As described in #2, a JSON object is an unordered set of key/value pairs. In fact, popular libraries in different languages will (by default) serialize JSON objects differently. So for cross-platform verification to work, one might consider serializing JSON objects to a string in a deterministic order. In Python, for example, you can sort the keys during serialization through use of the
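A minimal sketch of deterministic serialization in Python, using the standard library's `json.dumps` with `sort_keys=True`. The sample data, the compact `separators` setting, and the `digest` helper are illustrative assumptions.

```python
import hashlib
import json

# The same logical object, built with keys in a different order.
a = {"title": "Diploma", "recipient": "Alice"}
b = {"recipient": "Alice", "title": "Diploma"}


def digest(text: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Default serialization follows insertion order, so the strings differ.
default_a = json.dumps(a)
default_b = json.dumps(b)

# Sorting keys (and fixing separators) yields the same string for both.
sorted_a = json.dumps(a, sort_keys=True, separators=(",", ":"))
sorted_b = json.dumps(b, sort_keys=True, separators=(",", ":"))

print(digest(default_a) == digest(default_b))  # hashes of default output
print(digest(sorted_a) == digest(sorted_b))    # hashes of sorted output
```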
But JSON-based canonicalization only works for JSON. As our long-term goals include increased alignment with Verifiable Claims, where signatures are syntax agnostic, we turned to JSON-LD canonicalization. This syntax-agnostic approach converts JSON-LD to RDF format, and can be used to predictably format the certificate JSON document before hashing. This approach works across platforms.
(The original motivation for use of JSON-LD in Blockcerts V1.2 was the fact that Open Badges began using JSON-LD contexts -- canonicalization was a useful side effect of this work.)
JSON-LD canonicalization allows the following benefits over the previous technique (in which we hashed the binary file):
- no need for extra binary storage of the file
- the proof needed to verify the certificate can live within the same JSON file (without breaking the hash, of course)
- everything but the signature block is canonicalized and hashed, and verification can duplicate this
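The last point can be sketched as below. To keep the example self-contained, sorted-key JSON serialization stands in for JSON-LD canonicalization, which is a different algorithm; the `signature` key, the fields inside it, and the function name are likewise hypothetical.

```python
import hashlib
import json


def hash_without_signature(cert: dict, signature_key: str = "signature") -> str:
    """Hash the certificate contents, ignoring the embedded signature block."""
    unsigned = {k: v for k, v in cert.items() if k != signature_key}
    # Stand-in for JSON-LD canonicalization: sorted keys, fixed separators.
    canonical = json.dumps(unsigned, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Issuing: hash the unsigned certificate, then embed the proof in the same document.
unsigned_cert = {"recipient": "Alice", "title": "Diploma"}
issued_hash = hash_without_signature(unsigned_cert)
signed_cert = dict(
    unsigned_cert,
    signature={"targetHash": issued_hash, "merkleRoot": "placeholder-root"},
)

# Verification: recompute the hash from the signed document itself.
print(hash_without_signature(signed_cert) == issued_hash)

# Changing the contents (not the signature block) changes the hash.
tampered = dict(signed_cert, title="Doctorate")
print(hash_without_signature(tampered) == issued_hash)
```

Because the signature block is excluded before hashing, adding the proof to the file does not invalidate the certificate.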
JSON-LD canonicalization is just one aspect of how JSON-LD has been useful in Blockcerts. Using JSON-LD will also allow us to more easily define extensible Blockcerts, with different vocabularies and semantic mapping (per domain). Those are topics for a future post.