Towards an easier application process: Payload PDFs with JSON Blockcerts

Hi!

Our team just came up with an idea to ease the application process for students receiving Blockcerts, and I would like to hear your feedback on it!

Problem statement

One of our concerns is that students would have to apply by sending employers their certificate as a JSON file, or a URL pointing to that certificate.

We already issue certificates as PDF files and we want Blockcerts to sit on top of that process. We don’t want future employers to worry about or handle JSON files, and we want students to get PDF files they can view, share, and store the way they already do with other documents.

Idea 1: Connect the PDF to the outside world

The first idea we had was to add a QR code to the PDF. The QR code would encode the URL of the online certificate, so the employer could verify the certificate independently.
However, PDFs can be easily tampered with, and any student could modify the QR code to make it point to a forged Blockcert of their own.

Idea 2: Connect the PDF to the outside world and protect it

The second idea was the same, except that we would also digitally sign the PDF, so the employer receiving the file can be sure that it hasn’t been modified and that the QR code is indeed the one the university put there.
However, we don’t want to rely on a third-party company to issue us a signing certificate so we can sign PDFs. We want the PDFs to stand on their own.

Current idea: Payload the PDF with the JSON

Our current idea is to embed the JSON Blockcert inside the PDF (like the JSON embedded in PNG Open Badges). With this setup, students can apply by sending the PDF of their certificate. Employers can drop the PDF onto the online verifier and get the same verification process they would have with the JSON file. We can still add a QR code pointing to the online certificate (or to some documentation explaining the purpose of Blockcerts, etc.).

To make sure that the PDF hasn’t been modified by the student, we write the hash of the PDF we generate inside the JSON file before issuing it on the blockchain. A student who changes the PDF content would also have to change the hash written inside the JSON certificate, and that change makes the certificate fail verification.
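Here is a minimal sketch of the issuing side, only to illustrate the idea. The field name `pdfHash`, the choice of SHA-256, and the placeholder bytes are assumptions made for this example and are not part of the Blockcerts schema.

```python
import hashlib
import json


def stamp_certificate(cert: dict, pdf_bytes: bytes) -> dict:
    """Return a copy of the certificate that carries the PDF's SHA-256 digest.

    "pdfHash" is a hypothetical field name chosen for this sketch.
    """
    stamped = dict(cert)
    stamped["pdfHash"] = hashlib.sha256(pdf_bytes).hexdigest()
    return stamped


# Placeholder bytes standing in for the PDF generated by the university,
# taken before the JSON is embedded in it.
pdf_bytes = b"%PDF-1.7 placeholder bytes standing in for the real diploma"
cert = stamp_certificate({"recipient": "student@example.org"}, pdf_bytes)
print(json.dumps(cert, indent=2))

# Any change to the PDF bytes produces a different digest,
# so the stored hash no longer matches.
tampered = pdf_bytes.replace(b"diploma", b"doctorate")
print(hashlib.sha256(tampered).hexdigest() == cert["pdfHash"])  # False
```

The stamped JSON is what would then be issued on the blockchain, so the hash is covered by the same verification as the rest of the certificate.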

With this system, students get a PDF and still benefit from the power of Blockcerts. Employers, for their part, can trust the PDF without compromising the verification process.

I have attached an illustration below.
I am looking forward to your feedback,
Florent.

[Illustration: out]

Hi @f-dufour, interesting idea. Do PDFs have a mechanism for embedding additional hidden data inside the file? I like the idea of Blockcerts inside PDFs and of changing the verification process to verify Blockcert-embedded PDFs.

I think the flaw you identified in idea #1 is still present here, though.

However, PDFs can be easily tampered with, and any student could modify the QR code to make it point to a forged Blockcert of their own.

This can still happen in your process as well. PDFs are alterable, and if a student is going to the effort of creating their own Blockcert in an attempt to trick a verifier, they can go the extra step: create a new PDF, hash it, put the hash in a new Blockcert, and embed that back into the PDF. It’s more steps, but quicker than a college degree :wink:
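To make that concrete, here is a small sketch under some assumptions: the certificate stores a SHA-256 digest in a hypothetical `pdfHash` field, and the issuer URL is made up. It shows that a forged pair is self-consistent, so the hash comparison alone cannot tell it apart from a genuine one.

```python
import hashlib


def hash_matches(cert: dict, pdf_bytes: bytes) -> bool:
    """Check only that the PDF digest equals the one stored in the certificate."""
    return cert.get("pdfHash") == hashlib.sha256(pdf_bytes).hexdigest()


def forge(pdf_bytes: bytes, issuer: str) -> dict:
    """Build a brand-new certificate around an arbitrary PDF (the attack)."""
    return {"issuer": issuer, "pdfHash": hashlib.sha256(pdf_bytes).hexdigest()}


fake_pdf = b"%PDF-1.7 placeholder bytes for a home-made diploma"
fake_cert = forge(fake_pdf, "https://forger.example/issuer.json")

# The forged PDF and forged certificate agree with each other.
print(hash_matches(fake_cert, fake_pdf))  # True
```

In other words, the hash only ties a PDF to a certificate. Whether the certificate itself comes from the real university still has to be established by the verifier.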

Answer Aronning payloading pdfs

Hi @aronning!

Thanks, I really like this idea too!

The PDF standard does indeed allow any kind of file to be embedded in a document. This is documented in the PDF specification, and in greater detail in this article. In fact, the payload is written as a stream inside the PDF itself, so the final file is completely standalone.

In this blog post by Didier Stevens you can also find a simple Python script that generates a PDF and embeds any file from your hard drive in it.

Here is a minimalistic PDF I was able to generate with a sample Blockcert:

Bottom line:

  • Embedding a file is supported by the PDF standard (no hack required).
  • As the payload is written as a binary stream, the PDF is completely standalone.
  • As the payload is written as a binary stream, it is a bit harder for students to mess around with it.
  • For further security, it is possible to encrypt the JSON before embedding it in the PDF, so that only the verifier is able to decrypt it. This could be a solution to the concern stated above.
  • It is possible to extract any embedded file from a PDF (the iText library does that well: example).
  • It might be important to make sure that JSON embedded in a PDF is not flagged by antivirus software.

Another way to embed the data could be to use steganography and write the content of the Blockcert inside an image (for example, the university logo in the header of the PDF). In that case, the verifier would have to:

  1. Extract the logo of the university as a bitmap file.
  2. Use steganography to extract content hidden in the image.
  3. (If needed, decrypt the data.)
  4. Submit the extracted (and decrypted) data to the generic Blockcerts verifier.
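Step 2 above could look roughly like the following sketch. It assumes a simple least-significant-bit scheme over raw pixel bytes with a 4-byte length prefix. That scheme, and the function names `hide` and `reveal`, are choices made for this example only. Decoding the logo into raw pixel bytes (step 1) is left out.

```python
def hide(pixels: bytes, payload: bytes) -> bytes:
    """Write payload bits into the lowest bit of each pixel byte."""
    data = len(payload).to_bytes(4, "big") + payload  # length prefix, then payload
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for payload")
    out = bytearray(pixels)
    for index, bit in enumerate(bits):
        out[index] = (out[index] & 0xFE) | bit
    return bytes(out)


def reveal(pixels: bytes) -> bytes:
    """Read the length prefix, then the payload, from the lowest bits."""

    def read(start: int, count: int) -> bytes:
        if start + count * 8 > len(pixels):
            raise ValueError("image does not contain a complete payload")
        result = bytearray()
        for offset in range(count):
            value = 0
            for bit_index in range(8):
                value = (value << 1) | (pixels[start + offset * 8 + bit_index] & 1)
            result.append(value)
        return bytes(result)

    length = int.from_bytes(read(0, 4), "big")
    return read(32, length)


# Stand-in for the decoded logo bitmap: 1024 raw pixel bytes.
logo = bytes(range(256)) * 4
blockcert_json = b'{"id": "demo-certificate"}'
print(reveal(hide(logo, blockcert_json)) == blockcert_json)  # True
```

Each pixel byte changes by at most one, which is why the logo looks the same to the eye. The downside is that any re-encoding of the image (for example lossy compression) would destroy the hidden data.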

Note: Open Badges already embed their data in PNG images, although they store it in a PNG text chunk (iTXt) rather than using steganography in the strict sense.

Cheers,
Florent.

Personally, all of this sounds more complicated than simply using signed JSON files, which are both human-readable and machine-readable. (The code to display the information is embedded.) The entire goal is to have a digital record whose verifiable display shows the same information as the data contained in the record. Any separation creates problems. Hiding a JSON file inside the PDF still leaves room to manipulate the display, which most human-driven workflows depend upon. Even using an image as a pointer to the “real” information hosted somewhere else limits usefulness. I’ve written about this at greater length here.

I think that putting a hash of the PDF (minus the Blockcert) inside the Blockcert, so that the two can be cross-referenced, should allow one to verify that the display of the PDF is unaltered.

The same thing can be done if they are two separate files (much like how OpenTimestamps works).

I think this isn’t a bad idea if a company/issuer wants to implement PDFs like this on top of the standard: just add pdfHash as an additional field in your Blockcerts schema and write your own code that does the embedding/extraction and comparison process. The Blockcert JSON should still verify on its own in a standard Blockcerts verifier.
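The comparison part of that custom code could be as small as the sketch below. It assumes `pdfHash` holds a hex SHA-256 digest and that the caller has already extracted the Blockcert JSON and recovered the PDF bytes without it. The function name `pdf_matches_cert` is made up for this example, and the extraction itself is not shown.

```python
import hashlib
import json


def pdf_matches_cert(pdf_without_cert: bytes, cert_json: str) -> bool:
    """Compare the PDF's digest with the pdfHash field of the Blockcert."""
    cert = json.loads(cert_json)
    expected = cert.get("pdfHash")
    if not isinstance(expected, str):
        return False  # field missing or malformed: treat as a mismatch
    actual = hashlib.sha256(pdf_without_cert).hexdigest()
    return actual == expected.lower()


pdf_bytes = b"%PDF-1.7 placeholder bytes for the issued diploma"
cert_json = json.dumps({"pdfHash": hashlib.sha256(pdf_bytes).hexdigest()})

print(pdf_matches_cert(pdf_bytes, cert_json))            # True
print(pdf_matches_cert(pdf_bytes + b" edit", cert_json))  # False
```

This check would run in addition to, not instead of, the standard Blockcerts verification of the JSON.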

@aronning good point! While this may be technically possible, this thread began with the intent of making it easier for holders and verifiers … but using two files for every credential, instead of one, doesn’t seem easier for holding, submitting, or verifying.