Hi all--
I know folks on this list are concerned about the future prospects of the ill-named "Web of Trust" [0]. I wanted to kick off discussion about one particular corner of it: how to evaluate the "validity" of OpenPGP certificates for particular names.
Thanks to Justus for coining the term ORCA: OpenPGP Relational Certificate Authentication. (maybe "Relational" should be "Recursive" or "Rational"?) I hope this message isn't "squatting" it too much. I'm happy to use a different term if there is an objection, but i like having a catchy shorthand, one that *isn't* "WoT".
This e-mail is an attempt to just frame ORCA as a specific problem. I'm hoping that future messages on this list will try to tackle deeper definitions, implementation details, and algorithms.
I'm proposing the following problem statement (which i drafted earlier today on IRC, but has evolved a bit in my mind since then):
- ORCA is interested in authenticating aspects of an OpenPGP certificate. Concretely, the goal is to determine whether it is appropriate to associate a given User ID with a particular primary key.
- ORCA models this problem as a function, where the inputs are:
   - set of OpenPGP public keys P,
   - set of OpenPGP certifications C,
   - a "root trust" mapping T from P → N where N ∈ [0.0,1.0],
   - user ID (UTF-8 string) U, and
   - primary key K
  and the output is a numerical score from 0.0 (meaning "K" is inappropriate for use in contexts where the User ID is expected to be "U") to 1.0 (meaning "K" is entirely appropriate for use in contexts where the user ID is expected to be "U")
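To make the shape of that function concrete, here is a minimal Python sketch of its inputs and output. The names PublicKey, Certification, check_inputs and naive_orca are hypothetical and exist only for illustration. The scoring rule (take the highest root trust among direct certifiers of <K, U>) is deliberately naive and is not a proposed algorithm; treating keys missing from T as 0.0 is also an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping


@dataclass(frozen=True)
class PublicKey:
    fingerprint: str  # hypothetical stand-in for a parsed OpenPGP public key


@dataclass(frozen=True)
class Certification:
    issuer: PublicKey   # key that made the certification
    subject: PublicKey  # primary key being certified
    user_id: str        # User ID (UTF-8 string) bound to `subject`


def check_inputs(P: FrozenSet[PublicKey], T: Mapping[PublicKey, float]) -> None:
    """Check the constraints the problem statement puts on T."""
    if any(key not in P for key in T):
        raise ValueError("T may only assign root trust to keys in P")
    if any(not 0.0 <= value <= 1.0 for value in T.values()):
        raise ValueError("root trust values must lie in [0.0, 1.0]")


def naive_orca(
    P: FrozenSet[PublicKey],
    C: FrozenSet[Certification],
    T: Mapping[PublicKey, float],
    U: str,
    K: PublicKey,
) -> float:
    """Return a score in [0.0, 1.0] for binding User ID U to primary key K.

    Illustration only: looks at direct certifications and nothing else.
    """
    check_inputs(P, T)
    scores = [
        T.get(c.issuer, 0.0)  # assumption: keys absent from T have trust 0.0
        for c in C
        if c.subject == K and c.user_id == U and c.issuer in P
    ]
    return max(scores, default=0.0)
```

Any real implementation would replace the body of naive_orca; the point here is only the five inputs and the single scalar output.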
I observe that T in the definition above is slightly more flexible than the "set of root certificates" framing that Neal has been developing. It's capable of representing that of course (when X is a "root cert", T[X] = 1.0, else T[X] = 0.0), but could also represent something more nuanced, like GnuPG's "marginal ownertrust" (whatever that means). Maybe we want to go with just a "simple set of root certs" definition for T, but given that GnuPG exists and I think we want to be able to model its behavior, i lean toward starting with the more flexible mapping above.
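As a small illustration of that equivalence, a "set of root certs" can be turned into the mapping T mechanically. This is a sketch only: keys are represented by fingerprint strings, trust_from_roots is a hypothetical helper, and the 0.5 used for the "marginal" case is an arbitrary placeholder, not a value GnuPG defines.

```python
from typing import Dict, Iterable, Set


def trust_from_roots(P: Set[str], roots: Iterable[str]) -> Dict[str, float]:
    """Build T from a plain set of root certs: 1.0 for roots, 0.0 otherwise."""
    root_set = set(roots)
    unknown = root_set - P
    if unknown:
        raise ValueError(f"roots not present in P: {sorted(unknown)}")
    return {key: (1.0 if key in root_set else 0.0) for key in P}


P = {"AAAA", "BBBB", "CCCC"}
T = trust_from_roots(P, roots={"AAAA"})

# The more flexible mapping can also carry intermediate values.
# 0.5 is an arbitrary placeholder for something like "marginal" ownertrust.
T["BBBB"] = 0.5
```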
I also observe that N (a float from 0 to 1 that is output by T) has remarkably distinct semantics from the output of the function itself. Maybe it'd be better to declare them with different ranges or different datatypes somehow to make it clear that you can't just map one to the other.
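One possible way to keep the two apart, sketched in Python with hypothetical type names RootTrust and Validity: wrap each scalar in its own type so that a value of one kind never compares equal to, or silently stands in for, a value of the other.

```python
from dataclasses import dataclass


def _check_unit_interval(value: float) -> None:
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{value!r} is outside [0.0, 1.0]")


@dataclass(frozen=True)
class RootTrust:
    """An output of T: willingness to rely on a key's certifications."""
    value: float

    def __post_init__(self) -> None:
        _check_unit_interval(self.value)


@dataclass(frozen=True)
class Validity:
    """The output of the ORCA function for a <K, U> pair."""
    value: float

    def __post_init__(self) -> None:
        _check_unit_interval(self.value)
```

With this shape, RootTrust(0.5) and Validity(0.5) are unequal even though they hold the same float, and code that needs one cannot be handed the other without an explicit conversion.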
Note also that this problem statement deliberately excludes several things. This is not to say that these things are ruled out or bad, just that they are *different problems* than the core of ORCA. Hopefully a clearer understanding of ORCA makes thinking about these other problems easier:
- From the ORCA primitive function, we can build higher-level functions like "select the best match for user ID U from this set of OpenPGP certificates". But this particular framing does not concern itself with how to build this higher-level functionality.
- ORCA doesn't concern itself with how to *discover* the various OpenPGP certifications that it uses. It accepts a set of certifications as input, and works from them.
- This framing doesn't care at all about peripheral information like encryption-, signing-, or authentication-capable subkeys, algorithm preference advertising packets, keyserver preferences, etc.
- This framing doesn't even contemplate OpenPGP "Certificates" (or "Keyblocks" or "TPKs" if you prefer). Rather, it assumes that some other layer is capable -- given the relevant OpenPGP packets -- of reassembling the appropriate packets into these higher-level objects.
- This framing treats the User ID as a raw UTF-8 string. This isn't always how people will want to match or discover User IDs. For example, many MUAs might deliberately ignore the non-e-mail address part of a User ID, and just try to evaluate it in that way. This framing assumes that some sort of translation can be done to map these "filtered" user IDs onto real User IDs and vice versa, but that concern is not part of the ORCA model.
- This framing ignores any attempt to evaluate User Attributes or other OpenPGP-style identity data.
- This framing deliberately rules out the use of any additional metadata, such as "TOFU" information, Autocrypt state, etc.
- The definition of T precludes applying "root trust" to anything other than a pubkey. In particular, you can't apply it to a <pubkey,User ID> combination. It also precludes offering any more detailed nuance beyond a single scalar per pubkey for this mapping. Given the flexibility offered by the OpenPGP regex and "trust signature" subpackets, and the ability to synthesize new, privately-held pubkeys which can make new tsigs, i think we don't need anything more, but please sing out if you think otherwise.
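To show how two of the excluded layers (best-match selection and e-mail-only matching) could sit on top of the primitive without being part of it, here is a rough Python sketch. Everything in it is hypothetical: `score` stands for an ORCA function with P, C and T already bound, keys are fingerprint strings, and comparing addresses case-insensitively is a simplifying assumption.

```python
from email.utils import parseaddr
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical: (user_id, key_fingerprint) -> ORCA score, with P, C, T bound.
Score = Callable[[str, str], float]


def user_ids_for_email(email: str, known_user_ids: Iterable[str]) -> List[str]:
    """Map a "filtered" e-mail address back onto the real User IDs containing it."""
    wanted = email.lower()  # assumption: case-insensitive address comparison
    return [u for u in known_user_ids if parseaddr(u)[1].lower() == wanted]


def best_match(
    user_id: str, candidates: Iterable[str], score: Score
) -> Optional[Tuple[str, float]]:
    """Pick the candidate key with the highest non-zero score for user_id."""
    scored = [(key, score(user_id, key)) for key in candidates]
    usable = [(key, s) for key, s in scored if s > 0.0]
    return max(usable, key=lambda pair: pair[1], default=None)
```

Both helpers only call the primitive or pre-process its inputs, which is why they can stay outside the ORCA problem statement.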
We have at least two known implementations of this function, which are GnuPG's "pgp" and "classic" trust models. In GnuPG terminology, these models calculate user ID "validity" based on GnuPG's ownertrust DB and the certifications stored in its pubring. These models are additionally parameterized by GnuPG's options --completes-needed, --marginals-needed, and --max-cert-depth.
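As a very rough illustration of how those three options parameterize the calculation, here is a simplified Python sketch. It is not GnuPG's actual code: TrustModelParams and enough_certifiers are hypothetical names, the defaults (1, 3, 5) are the ones given in GnuPG's documentation, and the rule only covers a single step of certification, so max_cert_depth is carried along but not used.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TrustModelParams:
    completes_needed: int = 1  # --completes-needed
    marginals_needed: int = 3  # --marginals-needed
    max_cert_depth: int = 5    # --max-cert-depth (unused in this one-step sketch)


def enough_certifiers(
    ownertrusts: Iterable[str],
    params: TrustModelParams = TrustModelParams(),
) -> bool:
    """Decide whether the certifiers of one <key, User ID> binding suffice.

    `ownertrusts` holds the ownertrust level ("full" or "marginal") of each
    already-valid key that certified the binding. Simplification: fully and
    marginally trusted certifiers are counted separately, never combined.
    """
    levels = list(ownertrusts)
    fulls = levels.count("full")
    marginals = levels.count("marginal")
    return fulls >= params.completes_needed or marginals >= params.marginals_needed
```

Changing the parameters changes the answer for the same P, C and T, which is one reason the same inputs can yield different results across configurations.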
Even given this fairly narrowly constrained problem space, there is room for distinct implementations, which could provide different results.
What do folks think about this problem statement?
--dkg
[0] "Web of Trust" is a disturbing term because it masks and conflates several of the underlying elements, for example willingness to rely on a particular key's certifications vs. willingness to use a particular key for a particular communications peer. It's also troubling because most people think they know what it means already, but haven't actually thought through the model in enough detail to understand their own confusion (i fall into this trap frequently myself). So using the term "ORCA" instead is a way to "reset" thinking about this, to allow a distinct framing of the problem.