One-way hashes stay safe in public

Live Verify relies on a simple idea: you can safely publish a list of SHA-256 hashes on a public web server, because a hash is a one-way “fingerprint” of the original content—useful for matching, useless for reading.

Live Verify: the next logical step up from the iPhone's Live Text.

What a one-way hash is

SHA-256 turns an arbitrary input (text, image bytes, a PDF file) into a fixed 256-bit output. If the input changes by even one character, the output changes completely. The important property here is one-wayness: given only the hash, you can’t realistically recover the input.
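Python's standard `hashlib` module is a quick way to see two properties: the output size is fixed, and changing one character changes everything. This is a minimal sketch. The inputs are arbitrary examples, and the number of changed bits depends on which inputs you pick. For SHA-256 it tends to land near half of the 256 bits.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the UTF-8 bytes of `text` as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(a: str, b: str) -> int:
    """Count how many of the 256 output bits differ between the hashes of a and b."""
    x = int(sha256_hex(a), 16) ^ int(sha256_hex(b), 16)
    return bin(x).count("1")

a, b = "hello", "hellp"  # inputs that differ in a single character
print(sha256_hex(a))
print(sha256_hex(b))
print(len(hashlib.sha256(b"").digest()) * 8, "bits per digest")  # same size for any input
print(differing_bits(a, b), "of 256 output bits changed")
```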

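Deterministic: the same input always produces the same hash.
Collision-resistant: finding two different inputs with the same hash is infeasible.
Preimage-resistant: given only a hash, finding any input that produces it is infeasible.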

Examples

The first two examples are the exact UTF‑8 bytes shown, with no trailing newline. The third is the hash of one specific downloaded file. All of them are easy to verify locally.

Input SHA-256
hello 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
hello world b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
pg10.txt (Project Gutenberg “King James Bible” plain text) 2623a56935731e3628e86114f5e776a14b6f61225c5c29791f31e40ff6e6774e
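You can check the first two rows with Python's standard `hashlib` module. The bytes hashed are exactly `hello` and `hello world`, with no trailing newline. A shell command such as `echo hello | sha256sum` hashes `hello` plus a newline and gives a different result. `printf 'hello' | sha256sum` matches the table.

```python
import hashlib

# Expected values copied from the table above.
EXPECTED = {
    "hello": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
    "hello world": "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9",
}

for text, expected in EXPECTED.items():
    actual = hashlib.sha256(text.encode("utf-8")).hexdigest()
    print(f"{text!r}: {'match' if actual == expected else 'MISMATCH'}")

# A trailing newline (what `echo` adds) produces a completely different digest.
with_newline = hashlib.sha256(b"hello\n").hexdigest()
print("hello + newline matches?", with_newline == EXPECTED["hello"])
```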

Hashing works the same way for PDFs or any other file type: SHA-256 is over the file bytes. Two “King James Bible” files from different sources (or in different formats) will almost certainly have different hashes because their bytes differ (line endings, metadata, fonts, layout, etc.). To hash a specific file you have locally:


    # Linux, macOS, and Windows via WSL
    # Project Gutenberg "King James Bible" in plain text (pg10.txt)
    sha256sum pg10.txt
    # On macOS without sha256sum, the built-in equivalent is:
    shasum -a 256 pg10.txt
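You can also compute the same file hash in Python, for example on Windows without WSL. The sketch below streams the file through `hashlib` in chunks, so large files never need to fit in memory. It is a minimal sketch under two assumptions: the 64 KiB chunk size is arbitrary, and `pg10.txt` has already been downloaded into the current directory. On Python 3.11+, `hashlib.file_digest` does the same job.

```python
import hashlib
import sys

def sha256_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a file's raw bytes, reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:  # binary mode: hash the exact bytes, no newline translation
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    # Usage: python sha256_file.py pg10.txt
    for path in sys.argv[1:]:
        print(f"{sha256_file(path)}  {path}")  # same layout as sha256sum output
```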

Why publishing hashes is safe

There are 2^256 possible SHA-256 outputs. To translate that giant number into something human-readable: it’s roughly 10^77 (a 1 followed by about 77 zeros). Even if you imagined all data centers on Earth doing SHA-256 guesses all day, every day, the time-to-crack is still absurd.

For comparison, measured against the roughly two thousand years since 1 AD, such brute-force estimates are still on the order of 10^48 to 10^54 times longer. These are not “marketing numbers”; they’re the implication of a 256-bit one-way function.
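The arithmetic behind these figures is easy to redo. The sketch below assumes hypothetical aggregate rates of 10^12, 10^15 and 10^18 SHA-256 guesses per second. These are illustrative values, not measurements of any real infrastructure. It computes how long trying 2^256 guesses would take. On average an attacker would succeed after about half that many, which barely changes the picture.

```python
SEARCH_SPACE = 2 ** 256               # number of possible SHA-256 outputs
SECONDS_PER_YEAR = 365.25 * 24 * 3600
YEARS_SINCE_1_AD = 2000               # "roughly two thousand years"

def years_to_exhaust(guesses_per_second: float) -> float:
    """Years needed to try 2**256 guesses at a constant rate (worst case; about half on average)."""
    return SEARCH_SPACE / guesses_per_second / SECONDS_PER_YEAR

print(f"2**256 is about {SEARCH_SPACE:.2e}")
for rate in (1e12, 1e15, 1e18):  # hypothetical aggregate guess rates
    years = years_to_exhaust(rate)
    print(f"{rate:.0e} guesses/s: {years:.1e} years, "
          f"{years / YEARS_SINCE_1_AD:.1e} x the time since 1 AD")
```

Rates between 10^12 and 10^18 guesses per second land in the 10^48 to 10^54 range quoted above. Each factor of 1,000 in speed removes only three of the roughly 77 zeros.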

And even in the fantasy scenario where someone could run guesses for “billion‑billion‑billion” years and eventually find some bytes that match a target SHA‑256, those bytes would almost certainly be meaningless noise, not a human-readable document. The chance of stumbling onto the exact original PDF/text bytes is vanishingly small.

Caveat: guessing becomes practical when the attacker already knows most of the structure and many of the fields are predictable. If a credential format is fixed and the remaining unknowns come from a small set (low entropy), an attacker can enumerate those unknowns and hash each guess. That is why adding an issuer-generated random “salt line” (see “Adding unpredictability” below) is a simple and effective defense: it enlarges the search space so guessing attacks become impractical again.

One subtle point: people sometimes mix up two different kinds of attacks—collisions and reversing. A collision means “two different inputs produce the same hash”. Reversing (more precisely, a preimage) means “given a specific hash, find any input that produces it”. Live Verify relies on preimage resistance: if you only know the hash, you still can’t feasibly invent bytes that match it.
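The gap between the two attacks is easiest to see on a deliberately weakened hash. This toy sketch keeps only the top 16 bits of SHA-256, something never to deploy. It counts how many inputs each search needs. A collision typically turns up after a few hundred tries because of the birthday effect, on the order of 2^(n/2) for an n-bit hash. Hitting one specific target takes tens of thousands of tries, on the order of 2^n. For full 256-bit SHA-256, those generic costs become roughly 2^128 and 2^256.

```python
import hashlib
from itertools import count

BITS = 16  # deliberately tiny truncated hash so both searches finish quickly

def tiny_hash(data: bytes) -> int:
    """Top BITS bits of SHA-256: a toy hash used only for this demonstration."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - BITS)

def find_collision():
    """Collision: ANY two different inputs with the same toy hash."""
    seen = {}
    for i in count():
        msg = f"msg-{i}".encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

def find_preimage(target: int):
    """Preimage: an input whose toy hash equals one SPECIFIC target."""
    for i in count():
        msg = f"guess-{i}".encode()
        if tiny_hash(msg) == target:
            return msg, i + 1

a, b, tries = find_collision()
print(f"collision after {tries} hashes: {a!r} and {b!r}")
target = tiny_hash(b"the original document")
msg, tries = find_preimage(target)
print(f"preimage after {tries} hashes: {msg!r}")
```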

SHA‑1: an earlier hashing algorithm

SHA‑1 was a widely used hash function for many years (software downloads, file integrity checks, and more). Back then, it was considered strong enough.

In 2017, researchers publicly demonstrated a practical SHA‑1 collision by producing two different PDF files with the same SHA‑1 hash (the “SHAttered” demonstration). That matters because if collisions are feasible, the hash stops being a reliable fingerprint for integrity and authenticity workflows.

The industry response was to move to newer, stronger hashes—most commonly SHA‑256 (part of the SHA‑2 family). Live Verify uses SHA‑256 because it’s widely deployed, fast, and (today) has no known practical collision or “reverse-the-hash” attacks.

Issuer obligations

Hashes can be public, but the underlying documents and registries should be treated as sensitive systems. In the era of constant breaches, issuers should take on strong obligations to protect the plain text behind the hashes, with PCI-class separation and stringency (see PCI DSS).

For very high-value credentials, some issuers may choose a stronger operational model: keep the plain-text registry in an internal environment (even air-gapped), and publish only a one-way feed of hashes outward on a schedule. That way, the public verification endpoint never has direct access back to the sensitive source records.
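One way to picture the one-way feed is an export step that runs inside the internal environment. It hashes each record and writes nothing but hash-named status files for the public server to host. This is a hypothetical sketch. The record list, the `c/<hash>` file layout (borrowed from the lookup URL format in the verification workflow), and the status strings are illustrative assumptions. A real issuer would also apply whatever normalization its verifier app uses.

```python
import hashlib
from pathlib import Path

def export_hash_feed(records: list[tuple[str, str]], out_dir: Path) -> list[str]:
    """Write one static file per record, named by the SHA-256 of its text.

    `records` holds hypothetical (plain_text, status) pairs from the internal
    registry. Only the hash (as a filename) and the status leave this function.
    """
    feed_dir = out_dir / "c"
    feed_dir.mkdir(parents=True, exist_ok=True)
    published = []
    for plain_text, status in records:
        digest = hashlib.sha256(plain_text.encode("utf-8")).hexdigest()
        (feed_dir / digest).write_text(status + "\n", encoding="utf-8")
        published.append(digest)
    return published
```

The output directory can then be copied outward to a static web host on a schedule. Nothing in it gives the public side a path back into the registry.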

Important caveat: hashes are not encryption. If the thing you hash is too short or too predictable, attackers can sometimes guess it. For example, if the input space is “very small” (common phrases, short IDs, a tiny set of templates), an attacker can hash guesses and look for matches.
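Here is a guessing attack against a deliberately low-entropy input. The template `Member: NNNN`, with a four-digit number, is a hypothetical example and not a real credential format. It has only 10,000 possibilities, so an attacker who sees the published hash can hash every candidate until one matches.

```python
import hashlib
from typing import Optional

def h(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# What the issuer publishes: the hash of a short, templated (hypothetical) record.
published_hash = h("Member: 4821")

def guess(target: str) -> Optional[str]:
    """Enumerate every value of the 4-digit field and return the text that matches."""
    for n in range(10_000):
        candidate = f"Member: {n:04d}"
        if h(candidate) == target:
            return candidate
    return None

print(guess(published_hash))  # prints "Member: 4821" after at most 10,000 hashes
```

The same loop works for any template whose unknown fields can be listed. The attacker's cost is simply the number of possible combinations.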

Low-entropy inputs and “guessing attacks”

The one-way property protects you when the underlying content has high “entropy” (lots of unpredictable variation). It is weaker when the content is short or highly guessable. This is why issuers must think about what is being hashed.

If a document is short or follows a predictable template, issuers can add unpredictability so there is no small, guessable space to brute force.

See the “Elvis Aaron Presley” sample credential below for what that can look like in practice.

Adding unpredictability (a random salt line)

A simple hardening technique is to include a random, issuer-generated salt line as part of the hashed text—something like: random chars: so3iewf 8fhs rwef. This makes guessing attacks impractical because an attacker would need to guess the salt too.

Example (carriage returns shown as new lines). This is the kind of plain text that would be normalized and hashed:

Elvis Aaron Presley
DOB: 1935-01-08
State: TN
DL#: P1234567
Expires: 1960-01-08
random chars: so3iewf 8fhs rwef

Without that last line of random chars, the regular structure of the name/DOB/State/DL#/expiry fields would make the record much easier to guess by brute force. Admittedly, the value of learning “oh look, Elvis has a suspended Tennessee driving license” is questionable.
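The sketch below shows how an issuer might generate such a salt line and hash the resulting text. It uses Python's `secrets` module, a cryptographically secure random source. Two details are assumptions: the group sizes (7, 4 and 4 characters) simply mirror the example above, and joining lines with `\n` stands in for whatever normalization Live Verify actually applies. With 15 characters drawn from a 36-symbol alphabet, the salt adds about 77.5 bits of unpredictability (15 × log2 36). That means roughly 2^77 extra guesses per candidate record.

```python
import hashlib
import math
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # 36 symbols
GROUP_SIZES = (7, 4, 4)                             # mirrors "so3iewf 8fhs rwef"

def salt_line() -> str:
    """Return an issuer-generated line such as 'random chars: ab3defg 1hij k2lm'."""
    groups = ("".join(secrets.choice(ALPHABET) for _ in range(size)) for size in GROUP_SIZES)
    return "random chars: " + " ".join(groups)

def salt_bits() -> float:
    """Entropy contributed by the salt, assuming every character is uniformly random."""
    return sum(GROUP_SIZES) * math.log2(len(ALPHABET))

record_lines = [
    "Elvis Aaron Presley",
    "DOB: 1935-01-08",
    "State: TN",
    "DL#: P1234567",
    "Expires: 1960-01-08",
    salt_line(),
]
text = "\n".join(record_lines)  # hypothetical normalization: newline-joined lines
digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
print(text)
print(digest)
print(f"salt adds about {salt_bits():.1f} bits")  # 77.5
```

The salt line is part of the credential text itself, so the verifier captures it along with every other line. The issuer only needs to keep it with the record.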

One-way hash + verification workflow

The phone app (or future camera app) performs OCR + normalization → SHA-256 locally, then requests https://issuer.example.com/c/<hash>. The issuer hosts a simple response (often static): “OK”, “Valid ID”, “Awarded”, “Denied”, etc. Because the hash is intended to be public, no login is required for lookups; HTTPS still protects the request in transit.
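The client side of that workflow fits in a few lines. Everything here beyond the URL shape is an assumption. OCR is out of scope, so the input is text that has already been recognized. The `normalize` rules (trim each line, collapse internal whitespace, drop blank lines, join with `\n`) are hypothetical placeholders, not Live Verify's actual normalization. `issuer.example.com` is the placeholder host from the text.

```python
import hashlib
import urllib.request

ISSUER_BASE = "https://issuer.example.com/c/"  # placeholder host from the text

def normalize(ocr_text: str) -> str:
    """Hypothetical normalization: trim lines, collapse spaces, drop blank lines."""
    lines = (" ".join(line.split()) for line in ocr_text.splitlines())
    return "\n".join(line for line in lines if line)

def lookup_url(ocr_text: str) -> str:
    """Hash the normalized text locally and build the public lookup URL."""
    digest = hashlib.sha256(normalize(ocr_text).encode("utf-8")).hexdigest()
    return ISSUER_BASE + digest

def verify(ocr_text: str, timeout: float = 10.0) -> str:
    """Fetch the issuer's status text (e.g. "OK") over HTTPS; no login involved."""
    with urllib.request.urlopen(lookup_url(ocr_text), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Only the hash leaves the device; the recognized text itself is never sent. A hash the issuer never published would typically return a 404, which `urlopen` raises as `urllib.error.HTTPError`.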
