In search of the perfect identifier: about subjects, URLs, CIDs and more #1114

@joepio

Most IDs are bad

Many systems identify resources with numbers or UUIDs. But such an ID only means something inside the system that issued it: if you get that data, you can't point anyone else to it, so you'll have to share the entire thing with others. You can't link to it without providing a bunch of context.

URLs are awesome

That's why the URL is one of the most impactful inventions on the planet. I feel like too few people understand how fundamentally powerful URLs are:

  • Globally unique identifier
  • Resolvable

It's pretty much why I fell in love with linked data in the first place, and why I designed Atomic Data to be the way that it is.

HTTP: fast, readable

The HTTP URL is by far the most used kind of URL. HTTP URLs are fast (because DNS is fast) and human-readable (when well-designed, at least), but they have one big flaw:

HTTP URLs only work as long as that server works. If anything happens to that machine, too bad. If someone renames the target resource filename, whoops.

So now our internet is aging and more and more of the original web is dying. Help!

CIDs: resilient

Content Identifiers (CIDs) use hashes (and DHTs) to remove the dependency on any specific server. The hash algorithm guarantees that the contents are as you want them to be, and haven't changed. If you combine that with a DHT and a smart protocol, you get something like torrent magnet URLs or IPFS. Yay, resilience!
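
To make the hashing part concrete, here is a minimal sketch of content addressing in Python. The `sha256:` prefix and the function names are assumptions made up for this illustration; real IPFS CIDs use a more elaborate encoding, so don't read this as the actual CID format.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the bytes themselves (hypothetical format)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_id: str) -> bool:
    """Check that bytes fetched from anywhere match the ID we asked for."""
    return content_id(data) == expected_id


original = b'{"title": "sup world"}'
cid = content_id(original)

# Any untrusted server or peer may serve the bytes: the hash catches tampering.
assert verify(original, cid)
assert not verify(b'{"title": "sup w0rld"}', cid)
```

The same property explains the "no changes" cost: edit a single byte and the identifier is a different one.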

But at what cost?

  • Readability. A bunch of random characters doesn't convey any semantic meaning, contrary to how HTTP URLs tend to look.
  • Speed. Getting a piece of content can take a long time (>10s) depending on how many hops are needed in the network.
  • Authorship. It becomes hard to prove who made the data.
  • No changes. If the resource is changed by the author, it gets a new ID.

Signatures: traceable

We can solve the authorship issue by using cryptographic signatures!
Sign a message with a private key, and share the public key.
We create an object for the initial commit like this:

{
  "pubkey": "ed25519:6ff7a2a3d88bb4e5a9a4d9bb124fddc8f3bc53b3d7b46a31c1ef9ed9f53a44b2",
  "signature": "MEUCIQDg0B63PfK4epDJCeNENbq5NqsnDNwLKQa0YQwMe9X1rgIgUgO4vwRgW7zfgzY8FczRmF8MEK5Pdk6bK/V3BLgPaF0=",
  "timestamp": "2025-09-03T15:45:00Z",
  "content": {
    "title": "sup world"
  }
}

We then follow all commits from this one to construct the latest version.
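
A rough sketch of what "following the commits" could look like. Everything beyond the fields in the object above is an assumption: the `previous` field (the signature of the commit being followed), the idea that `content` is merged key by key, and the `is_valid` callback, which stands in for real signature verification (Python's standard library has no Ed25519). The keys and signatures in the example are placeholders, not real values.

```python
from typing import Callable, Dict, List, Optional

Commit = Dict[str, object]


def latest_version(
    initial: Commit,
    later: List[Commit],
    is_valid: Callable[[Commit], bool],
) -> Dict[str, object]:
    """Replay a linear chain of commits, starting at the initial one."""
    # Index later commits by the signature of the commit they follow
    # ("previous" is a hypothetical field, not part of the object above).
    by_previous = {c.get("previous"): c for c in later}
    state: Dict[str, object] = {}
    seen = set()
    current: Optional[Commit] = initial
    while current is not None and current["signature"] not in seen:
        seen.add(current["signature"])  # guards against cycles
        # Stop at the first commit by another key or with a bad signature.
        if current["pubkey"] != initial["pubkey"] or not is_valid(current):
            break
        state.update(current["content"])
        current = by_previous.get(current["signature"])
    return state


initial = {"pubkey": "ed25519:aaa", "signature": "sig-1",
           "content": {"title": "sup world"}}
edit = {"pubkey": "ed25519:aaa", "signature": "sig-2", "previous": "sig-1",
        "content": {"title": "hello world"}}
forged = {"pubkey": "ed25519:bbb", "signature": "sig-3", "previous": "sig-2",
          "content": {"title": "pwned"}}

# is_valid accepts everything here; a real client would verify each signature.
result = latest_version(initial, [forged, edit], is_valid=lambda c: True)
assert result == {"title": "hello world"}
```

The commit signed by a different key is ignored, so the identity of the resource stays tied to the key that signed the initial commit.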

We can put these signatures in DHTs, same as IPFS does for CIDs, but again, if we rely on that, performance can be bad.

Combining HTTP with signatures / CIDs

Why would I have to choose? If my server is online, I can get the performance benefits from HTTP. If it's offline, the client can resolve it in some other way.

Here's my concept:

{http url hint}#signature={cryptographic signature of initial commit}

E.g.

https://example.com/my-content#signature=asongoaiesniaesngoasn
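
Splitting such an identifier into its two parts is straightforward. The sketch below uses Python's standard `urllib.parse`; the `ResourceId` type and `parse_resource_id` function are names invented for this example. It assumes the fragment is written as query-style `key=value` pairs and that the signature uses a URL-safe encoding (hex or base64url), since `parse_qs` would turn a `+` from plain base64 into a space.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlsplit


class ResourceId(NamedTuple):
    hint: str                 # where to try first; only a hint
    signature: Optional[str]  # signature of the initial commit


def parse_resource_id(url: str) -> ResourceId:
    parts = urlsplit(url)
    # Read the fragment as a query-style key=value list.
    params = parse_qs(parts.fragment)
    signature = params.get("signature", [None])[0]
    # The hint is the same URL with the fragment stripped off.
    hint = parts._replace(fragment="").geturl()
    return ResourceId(hint, signature)


rid = parse_resource_id(
    "https://example.com/my-content#signature=asongoaiesniaesngoasn"
)
assert rid.hint == "https://example.com/my-content"
assert rid.signature == "asongoaiesniaesngoasn"
```

Because browsers never send the fragment to the server, an ordinary HTTP server sees a perfectly normal request for the hint URL.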

Pretty good, because:

  • The http part is just a hint, we can do without it
  • Just works in normal browsers, HTTP is still alive
  • Fast because DNS is very fast
  • Even faster if Atomic Data clients first check their local cache for the signature
  • Atomic Data clients use the fragment if the request fails
  • Domain can be overwritten if that server is offline.
  • Clients can create content locally, using localhost#signature=ngaiungau (see "Local-first resources - don't require a server when creating a resource", #998)
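
The lookup order in that list could be sketched roughly like this. It is not how any Atomic Data client actually works: `resolve`, `fetch_http` and `fetch_fallback` are hypothetical names, the fetchers are injected so that the order is easy to see, and verifying the fetched data against the signature is left out.

```python
from typing import Callable, Dict, Optional
from urllib.parse import parse_qs, urlsplit

Fetcher = Callable[[str], Optional[bytes]]


def resolve(
    url: str,
    cache: Dict[str, bytes],
    fetch_http: Fetcher,      # takes the HTTP hint, returns None on failure
    fetch_fallback: Fetcher,  # takes the signature (e.g. a DHT lookup)
) -> Optional[bytes]:
    parts = urlsplit(url)
    signature = parse_qs(parts.fragment).get("signature", [None])[0]
    hint = parts._replace(fragment="").geturl()

    # 1. Local cache, keyed by signature: no network needed at all.
    if signature is not None and signature in cache:
        return cache[signature]
    # 2. The HTTP hint: fast while the server is alive.
    data = fetch_http(hint)
    # 3. Server gone? Use the signature to look elsewhere.
    if data is None and signature is not None:
        data = fetch_fallback(signature)
    # A real client would verify the data against the signature here.
    if data is not None and signature is not None:
        cache[signature] = data
    return data


url = "https://example.com/my-content#signature=asongoaiesniaesngoasn"
cache: Dict[str, bytes] = {}
# The server is offline, but some other network still has the resource.
first = resolve(url, cache, lambda hint: None,
                lambda sig: b'{"title": "sup world"}')
# Second time around everything is offline; the cache still answers.
second = resolve(url, cache, lambda hint: None, lambda sig: None)
assert first == second == b'{"title": "sup world"}'
```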
