One-way Hashes & Privacy

What a one-way hash is

SHA-256 turns an arbitrary input (text, image bytes, a PDF file) into a fixed 256-bit output. If the input changes by even one character, the output changes completely. The important property here is one-wayness: given only the hash, you can’t realistically recover the input.

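Deterministic: the same input always produces the same hash.

Collision-resistant: it is infeasible to find two different inputs that produce the same hash.

Preimage-resistant: given only a hash, it is infeasible to find an input that produces it.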
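The first of these properties, together with the "one changed character changes everything" behaviour, is easy to observe locally. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_hex` and `differing_bits` are made up for this illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the UTF-8 bytes of `text` as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


first = sha256_hex("hello")
again = sha256_hex("hello")
changed = sha256_hex("hellp")  # same input with one character changed

print(first == again)  # deterministic: same input, same digest
print(len(first) * 4)  # digest length in bits (64 hex characters)
print(differing_bits(first, changed), "of 256 bits differ")
```

Collision resistance and preimage resistance cannot be demonstrated by running code; they are statements about what no one has found a feasible way to do.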

Examples

These examples are for exact UTF‑8 byte sequences (no newline). They are easy to verify locally.

Input	SHA-256
`hello`	`2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824`
`hello world`	`b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9`
pg10.txt (Project Gutenberg “King James Bible” plain text)	`2623a56935731e3628e86114f5e776a14b6f61225c5c29791f31e40ff6e6774e`
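The two string rows of the table can be checked with a few lines of Python. This sketch only uses the standard library; the `EXPECTED` dictionary is simply the table values copied by hand.

```python
import hashlib

# Expected digests copied from the table above.
EXPECTED = {
    "hello": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
    "hello world": "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9",
}

for text, expected in EXPECTED.items():
    # Hash the exact UTF-8 bytes, with no trailing newline.
    actual = hashlib.sha256(text.encode("utf-8")).hexdigest()
    print(f"{text!r}: {'match' if actual == expected else 'MISMATCH'}")
```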

Hashing works the same way for PDFs or any other file type: SHA-256 is over the file bytes. Two “King James Bible” files from different sources (or in different formats) will almost certainly have different hashes because their bytes differ (line endings, metadata, fonts, layout, etc.). To hash a specific file you have locally:


    # Linux, Windows via WSL, and macOS where sha256sum is installed
    # (if it is missing on macOS, use: shasum -a 256 pg10.txt)
    # Project Gutenberg “King James Bible” in plain text
    sha256sum pg10.txt
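The same file hash can be computed in Python, which may be convenient where no command-line tool is available. This is a sketch, not part of Live Verify: `sha256_file` is a hypothetical helper, and the 1 MiB chunk size is an arbitrary choice that keeps memory use flat for large files.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's raw bytes in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read until read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_file(Path("pg10.txt"))` on a local copy should print the same value as `sha256sum pg10.txt` for that copy, since both hash the file's bytes.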

Why publishing hashes is safe

An SHA-256 hash is 256 bits long, so brute-forcing an arbitrary input requires on the order of 2²⁵⁶ attempts (≈ 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 possibilities).
Issuers publish only hashes and a small status (“OK”, “Valid ID”, “Awarded”, etc.)—not the document text.
Anyone who already has the original can recompute the hash to verify it matches; outsiders can’t read the original from the hash.

To translate that giant number into something human-readable: it’s roughly 10⁷⁷ (a 1 followed by ~77 zeros). Even if you imagined all data centers on Earth doing SHA-256 guesses all day, every day, the time-to-crack is still absurd:

At 1 trillion guesses/sec (10¹²): about 3.7×10⁵⁷ years.
At 1 quintillion guesses/sec (10¹⁸): about 3.7×10⁵¹ years.
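The arithmetic behind those two figures can be reproduced directly. The sketch below assumes an attacker who has to try the whole 2²⁵⁶ space and uses a 365.25-day year; the function name `years_to_exhaust` is made up for this illustration.

```python
SEARCH_SPACE = 2 ** 256  # number of possible 256-bit outputs
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: 365.25-day year


def years_to_exhaust(guesses_per_second: float) -> float:
    """Years needed to try every value in the space at a constant guess rate."""
    return SEARCH_SPACE / guesses_per_second / SECONDS_PER_YEAR


for rate in (1e12, 1e18):
    print(f"{rate:.0e} guesses/sec: {years_to_exhaust(rate):.1e} years")
```

On average a match would turn up after half the space has been searched, which halves these numbers without changing the conclusion.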

For comparison, even if you measure time from 1 AD (roughly two thousand years), these estimates are still on the order of 10⁴⁸ to 10⁵⁴ times longer. These are not “marketing numbers”; they’re the implication of a 256-bit one-way function.

And even in the fantasy scenario where someone could run guesses for “billion‑billion‑billion” years and eventually find some bytes that match a target SHA‑256, those bytes would almost certainly be meaningless noise, not a human-readable document. The chance of stumbling onto the exact original PDF/text bytes is vanishingly small.

Caveat: this gets less impossible when the attacker already knows most of the structure and many of the fields are predictable. If a credential format is fixed and the remaining unknowns come from a small set (low entropy), an attacker can guess those unknowns and hash each guess. That is why adding an issuer-generated random “salt line” (see “Adding unpredictability” below) is a simple and effective defense: it enlarges the search space so guessing attacks become impractical again.

One subtle point: people sometimes mix up two different kinds of attacks—collisions and reversing. A collision means “two different inputs produce the same hash”. Reversing (more precisely, a preimage) means “given a specific hash, find any input that produces it”. Live Verify relies on preimage resistance: if you only know the hash, you still can’t feasibly invent bytes that match it.

SHA‑1: an earlier hashing algorithm

SHA‑1 was a widely used hash function for many years (software downloads, file integrity checks, and more). Back then, it was considered strong enough.

In 2017, researchers publicly demonstrated a practical SHA‑1 collision by producing two different PDF files with the same SHA‑1 hash (the “SHAttered” demonstration). That matters because if collisions are feasible, the hash stops being a reliable fingerprint for integrity and authenticity workflows.

The industry response was to move to newer, stronger hashes—most commonly SHA‑256 (part of the SHA‑2 family). Live Verify uses SHA‑256 because it’s widely deployed, fast, and (today) has no known practical collision or “reverse-the-hash” attacks.

Issuer obligations

Hashes can be public, but the underlying documents and registries should be treated as sensitive systems. In the era of constant breaches, issuers should assume strong obligations for the plain text that would produce the hashes—think PCI-class separation and stringency (see PCI DSS).

Store plain-text documents behind strong access controls and logging.
Segment hashing services from customer-facing endpoints so that compromising one does not let an attacker pivot to the other.
Monitor for anomalies or repeated hash lookups that suggest credential harvesting.

For very high-value credentials, some issuers may choose a stronger operational model: keep the plain-text registry in an internal environment (even air-gapped), and publish only a one-way feed of hashes outward on a schedule. That way, the public verification endpoint never has direct access back to the sensitive source records.

Important caveat: hashes are not encryption. If the thing you hash is too short or too predictable, attackers can sometimes guess it. For example, if the input space is “very small” (common phrases, short IDs, a tiny set of templates), an attacker can hash guesses and look for matches.

Low-entropy inputs and “guessing attacks”

The one-way property protects you when the underlying content has high “entropy” (lots of unpredictable variation). It is weaker when the content is short or highly guessable. This is why issuers must think about what is being hashed:

Good: Rich documents where the exact text varies significantly per person (many fields, many combinations).
Risky: Tiny claims with only a few possible values (e.g., “Over 18: YES/NO” with no other context).
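To make the "risky" case concrete, here is a sketch of a guessing attack against a two-value claim. The claim wording ("Over 18: YES") follows the example above; the exact text an issuer would hash is an assumption, and `guess_input` is a hypothetical helper.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 bytes of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guess_input(target_hash, candidates):
    """Hash each candidate and return the one that matches, or None."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# The attacker sees only this published hash ...
published_hash = sha256_hex("Over 18: YES")

# ... but knows the claim can take only two forms, so two guesses suffice.
candidates = ["Over 18: YES", "Over 18: NO"]
recovered = guess_input(published_hash, candidates)
print(recovered)
```

Nothing was reversed here: the attacker simply enumerated every possible input, which is only feasible because the input space is tiny.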

If a document falls into the “risky” bucket, issuers can add unpredictability so there’s no small guessable space to brute force.

See the “Elvis Aaron Presley” sample credential below for what that can look like in practice.

Adding unpredictability (a random salt line)

A simple hardening technique is to include a random, issuer-generated salt line as part of the hashed text—something like: random chars: so3iewf 8fhs rwef. This makes guessing attacks impractical because an attacker would need to guess the salt too.

The salt line must be printed (or otherwise present) so the verifier can OCR it and include it in the hash.
The salt should be generated securely and be long enough to be unguessable.
In Live Verify terms: it becomes part of the normalized text that is hashed, so the issuer must store the corresponding hash.

Example (carriage returns shown as new lines). This is the kind of plain text that would be normalized and hashed:

Elvis Aaron Presley
DOB: 1935-01-08
State: TN
DL#: P1234567
Expires: 1960-01-08
random chars: so3iewf 8fhs rwef

Without that last line of random chars, the regular structure of the name/DOB/State/DL/expiry fields would be easier to guess by brute force. The value of the knowledge "oh look, Elvis has a suspended Tennessee driving license" is questionable though.
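The following sketch shows how an issuer might generate a salt line and hash the sample credential. Several things here are assumptions rather than Live Verify behaviour: the normalization is reduced to joining lines with a single newline and encoding as UTF-8, the salt is 24 characters drawn from lowercase letters and digits (roughly 124 bits of randomness), and the names `make_salt` and `credential_hash` are invented for the example.

```python
import hashlib
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # assumed salt alphabet


def make_salt(length: int = 24) -> str:
    """Generate a salt from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def credential_hash(fields: list[str], salt: str) -> str:
    """Append the salt line, join with newlines (assumed normalization), hash."""
    lines = fields + [f"random chars: {salt}"]
    text = "\n".join(lines)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


FIELDS = [
    "Elvis Aaron Presley",
    "DOB: 1935-01-08",
    "State: TN",
    "DL#: P1234567",
    "Expires: 1960-01-08",
]

salt = make_salt()
print(f"random chars: {salt}")
print(credential_hash(FIELDS, salt))
```

Using `secrets` rather than `random` matters: the salt only helps if an attacker cannot predict it.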

One-way hash + verification workflow

The phone app (or future camera app) performs OCR + normalization → SHA-256 locally, then requests https://issuer.example.com/c/<hash>. The issuer hosts a simple response (often static): “OK”, “Valid ID”, “Awarded”, “Denied”, etc. Because the hash is intended to be public, no login is required for lookups; HTTPS still protects the request in transit.
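The client side of that workflow can be sketched as follows. The URL pattern comes from the paragraph above; everything else is an assumption for illustration, in particular the `normalize` rules (trim each line, drop blank lines, join with a newline), which stand in for whatever normalization Live Verify actually specifies, and the function names `verification_url` and `fetch_status`.

```python
import hashlib
import urllib.request

BASE_URL = "https://issuer.example.com/c/"  # placeholder issuer from the text


def normalize(ocr_text: str) -> str:
    """Hypothetical normalization: trim lines, drop blanks, join with a newline."""
    lines = [line.strip() for line in ocr_text.splitlines()]
    return "\n".join(line for line in lines if line)


def verification_url(ocr_text: str, base_url: str = BASE_URL) -> str:
    """Hash the normalized text locally and build the lookup URL."""
    digest = hashlib.sha256(normalize(ocr_text).encode("utf-8")).hexdigest()
    return base_url + digest


def fetch_status(url: str, timeout: float = 10.0) -> str:
    """Fetch the issuer's status text (for example "OK") over HTTPS."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Only the hash leaves the device: `verification_url` runs entirely locally, and `fetch_status` sends nothing but the resulting URL to the issuer.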
