ORCA problem statement: OpenPGP Relational Certificate Authentication - Devel - lists.sequoia-pgp.org

3 Mar 2021


Hi all--

I know folks on this list are concerned about the future prospects of
the ill-named "Web of Trust" [0].  I wanted to kick off discussion about
one particular corner of it: how to evaluate the "validity" of OpenPGP
certificates for particular names.

Thanks to Justus for coining the term ORCA: OpenPGP Relational
Certificate Authentication.  (Maybe "Relational" should be "Recursive"
or "Rational"?)  I hope this message isn't "squatting" it too much.  I'm
happy to use a different term if there is an objection, but I like
having a catchy shorthand, one that *isn't* "WoT".

This e-mail is an attempt just to frame ORCA as a specific problem.  I'm
hoping that future messages on this list will try to tackle deeper
definitions, implementation details, and algorithms.

I'm proposing the following problem statement (which I drafted earlier
today on IRC, and which has evolved a bit in my mind since then):

- ORCA is interested in authenticating aspects of an OpenPGP
   certificate.  Concretely, the goal is to determine whether it is
   appropriate to associate a given User ID with a particular primary
   key.
- ORCA models this problem as a function, where the inputs are:
    - a set of OpenPGP public keys P,
    - a set of OpenPGP certifications C,
    - a "root trust" mapping T from P → N, where N ∈ [0.0, 1.0],
    - a User ID (UTF-8 string) U, and
    - a primary key K
   and the output is a numerical score from 0.0 (meaning "K" is
   inappropriate for use in contexts where the User ID is expected to be
   "U") to 1.0 (meaning "K" is entirely appropriate for use in contexts
   where the User ID is expected to be "U").
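
A minimal Python sketch of the function's shape, under a few assumptions that are not part of the problem statement: keys are identified by fingerprint strings, and a certification is modeled as a hypothetical `Certification(issuer, target_key, user_id)` triple.  The scoring rule inside `orca` is deliberately naive (only direct certifications count, and the score is the highest T among their issuers); it is only there to make the inputs and output concrete, not a proposal for how ORCA should actually score anything.

```python
from dataclasses import dataclass
from typing import AbstractSet, Mapping


@dataclass(frozen=True)
class Certification:
    """Hypothetical model of a certification: `issuer` binds `user_id` to `target_key`."""
    issuer: str      # fingerprint of the certifying key
    target_key: str  # fingerprint of the certified primary key
    user_id: str     # the certified User ID, as a raw UTF-8 string


def orca(
    P: AbstractSet[str],
    C: AbstractSet[Certification],
    T: Mapping[str, float],
    U: str,
    K: str,
) -> float:
    """Score in [0.0, 1.0] for using primary key K where User ID U is expected.

    Toy rule for illustration only: look at direct certifications of (K, U)
    made by keys in P, and return the highest root trust T among their issuers.
    """
    if K not in P:
        return 0.0
    scores = [
        T.get(c.issuer, 0.0)
        for c in C
        if c.target_key == K and c.user_id == U and c.issuer in P
    ]
    return max(scores, default=0.0)
```

For instance, with P = {"A", "B"}, T = {"A": 1.0}, and a single certification by A of "Bob <bob@example.org>" on B, `orca(P, C, T, "Bob <bob@example.org>", "B")` returns 1.0, while any other User ID on B scores 0.0.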

I observe that T in the definition above is slightly more flexible than
the "set of root certificates" framing that Neal has been developing.
It can of course represent that framing (when X is a "root cert",
T[X] = 1.0, else T[X] = 0.0), but it could also represent something
more nuanced, like GnuPG's "marginal ownertrust" (whatever that means).

Maybe we want to go with just a "simple set of root certs" definition
for T, but given that GnuPG exists and I think we want to be able to
model its behavior, I lean toward starting with the more flexible
mapping above.
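
To illustrate how the mapping subsumes the root-set framing, here is a small Python sketch.  The function name `root_set_to_trust` is hypothetical, and keys are again assumed to be fingerprint strings.

```python
from typing import AbstractSet, Dict


def root_set_to_trust(P: AbstractSet[str], roots: AbstractSet[str]) -> Dict[str, float]:
    """Express a plain set of root certificates as an ORCA-style mapping T.

    Keys in `roots` get 1.0; every other key in P gets 0.0.
    """
    return {key: (1.0 if key in roots else 0.0) for key in P}


# A mapping with intermediate values has no equivalent as a plain set of
# roots.  (0.5 here is an arbitrary illustrative value, not a GnuPG constant.)
nuanced_T = {"A": 1.0, "B": 0.5, "C": 0.0}
```

Going the other way is lossy: any T whose values are not all 0.0 or 1.0 cannot be turned back into a root set without choosing some threshold.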

I also observe that N (a float from 0 to 1 that is output by T) has
semantics quite distinct from those of the function's own output.
Maybe it'd be better to declare them with different ranges or
different datatypes somehow, to make it clear that you can't just map
one to the other.
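
One way to get that separation, sketched in Python under the assumption that each value is wrapped in its own type: `RootTrust` and `Validity` are hypothetical names.  Both are still range-checked to [0.0, 1.0], but a static type checker such as mypy would reject passing one where the other is expected.

```python
from dataclasses import dataclass


def _check_unit_interval(name: str, value: float) -> None:
    """Reject values outside [0.0, 1.0] (NaN is rejected too)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")


@dataclass(frozen=True)
class RootTrust:
    """Hypothetical type for what T outputs: how far a key is relied on as a root."""
    value: float

    def __post_init__(self) -> None:
        _check_unit_interval("RootTrust", self.value)


@dataclass(frozen=True)
class Validity:
    """Hypothetical type for what ORCA outputs: how appropriate K is for User ID U."""
    value: float

    def __post_init__(self) -> None:
        _check_unit_interval("Validity", self.value)
```

Because the generated dataclass equality only compares instances of the same class, `RootTrust(0.5) == Validity(0.5)` is False, even though both wrap the same float.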

Note also that this problem statement deliberately excludes several
things.  This is not to say that these things are ruled out or bad, just
that they are *different problems* than the core of ORCA.  Hopefully
a clearer understanding of ORCA makes thinking about these other
problems easier:
- From the ORCA primitive function, we can build higher-level functions
   like "select the best match for user ID U from this set of OpenPGP
   certificates".  But this particular framing does not concern itself
   with how to build this higher-level functionality.
- ORCA doesn't concern itself with how to *discover* the various
   OpenPGP certifications that it uses.  It accepts a set of
   certifications as input, and works from them.
- This framing doesn't care at all about peripheral information like
   encryption-, signing-, or authentication-capable subkeys, algorithm
   preference advertising packets, keyserver preferences, etc.
- This framing doesn't even contemplate OpenPGP "Certificates" (or
   "Keyblocks" or "TPKs" if you prefer).  Rather, it assumes that some
   other layer is capable -- given the relevant OpenPGP packets -- of
   reassembling the appropriate packets into these higher-level objects.
- This framing treats the User ID as a raw UTF-8 string.  This isn't
   always how people will want to match or discover User IDs.  For
   example, many MUAs might deliberately ignore everything in a User ID
   except the e-mail address, and evaluate only that part.  This
   framing assumes that some sort of translation can be done to map
   these "filtered" User IDs onto real User IDs and vice versa, but that
   concern is not part of the ORCA model.
- This framing ignores any attempt to evaluate User Attributes or other
   OpenPGP-style identity data.
- This framing deliberately rules out the use of any additional
   metadata, such as "TOFU" information, Autocrypt state, etc.
- The definition of T precludes applying "root trust" to anything
   other than a pubkey.  In particular, you can't apply it to a
   <pubkey,User ID> combination.  It also precludes offering any more
   detailed nuance beyond a single scalar per pubkey for this mapping.
   Given the flexibility offered by the OpenPGP regex and "trust
   signature" subpackets, and the ability to synthesize new,
   privately-held pubkeys which can make new tsigs, I think we don't
   need anything more, but please sing out if you think otherwise.
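
To illustrate the first point and the User ID filtering point, here is a hypothetical higher-level layer built on top of an ORCA-style scorer, in Python.  The `Scorer` type, `best_match`, `address_of`, and the choice to match on the lower-cased address part via the standard library's `email.utils.parseaddr` are all assumptions for this sketch; discovering candidates and assembling them from packets are left to the caller, as the list above says.

```python
from email.utils import parseaddr
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical: an ORCA scorer with P, C and T already bound,
# mapping (User ID U, primary key K) to a score in [0.0, 1.0].
Scorer = Callable[[str, str], float]


def address_of(user_id: str) -> str:
    """Return the lower-cased address part of a User ID, as parsed by parseaddr."""
    return parseaddr(user_id)[1].lower()


def best_match(
    address: str,
    candidates: Iterable[Tuple[str, str]],
    score: Scorer,
) -> Optional[Tuple[str, str, float]]:
    """Among (key, User ID) candidates whose address matches, return the best-scoring one.

    Returns (key, user_id, score), or None if no candidate matches the address.
    """
    wanted = address.lower()
    best: Optional[Tuple[str, str, float]] = None
    for key, user_id in candidates:
        if address_of(user_id) != wanted:
            continue  # the "filtered" view ignores the rest of the User ID
        s = score(user_id, key)
        if best is None or s > best[2]:
            best = (key, user_id, s)
    return best
```

Passing the scorer in as a function keeps `best_match` independent of how the ORCA primitive itself is implemented.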

We have at least two known implementations of this function: GnuPG's
"pgp" and "classic" trust models.  In GnuPG terminology, these models
calculate User ID "validity" based on GnuPG's ownertrust DB and the
certifications stored in its pubring.  GnuPG allows the user to tweak
its settings: these models are additionally parameterized by GnuPG's
options --completes-needed, --max-cert-depth, and --marginals-needed.

Even within this fairly narrowly constrained problem space, there can
be distinct implementations, which could provide different results.
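
For comparison, here is a rough Python approximation of how a GnuPG-classic-style model might use those three parameters.  It is a simplified sketch, not GnuPG's actual trustdb code: it uses ownertrust labels instead of ORCA's numeric T, treats ultimately trusted keys as the roots and as fully trusted introducers, counts only certifications from keys that are already valid, returns just 1.0 or 0.0 (no "marginal validity"), and ignores the trust signatures that the "pgp" model additionally honors.  The keyword defaults mirror GnuPG's documented defaults (1, 3, and 5); the `Certification` triple and `classic_validity` name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import AbstractSet, Dict, Mapping, Set, Tuple


@dataclass(frozen=True)
class Certification:
    issuer: str      # fingerprint of the certifying key
    target_key: str  # fingerprint of the certified primary key
    user_id: str     # certified User ID


def classic_validity(
    ownertrust: Mapping[str, str],  # fingerprint -> "ultimate" | "full" | "marginal"
    certs: AbstractSet[Certification],
    user_id: str,
    key: str,
    *,
    completes_needed: int = 1,
    marginals_needed: int = 3,
    max_cert_depth: int = 5,
) -> float:
    """Rough approximation of a GnuPG "classic"-style validity check."""
    # Ultimately trusted keys are the roots and are valid from the start.
    valid_keys: Set[str] = {k for k, t in ownertrust.items() if t == "ultimate"}
    valid_bindings: Set[Tuple[str, str]] = set()
    for _ in range(max_cert_depth):
        # For each (key, User ID) binding, collect the distinct valid issuers
        # whose ownertrust is full (or ultimate) vs. marginal.
        full: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        marginal: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        for c in certs:
            if c.issuer not in valid_keys:
                continue  # certifications only count from already-valid keys
            binding = (c.target_key, c.user_id)
            level = ownertrust.get(c.issuer)
            if level in ("ultimate", "full"):
                full[binding].add(c.issuer)
            elif level == "marginal":
                marginal[binding].add(c.issuer)
        valid_bindings = {
            b
            for b in set(full) | set(marginal)
            if len(full[b]) >= completes_needed or len(marginal[b]) >= marginals_needed
        }
        # A key becomes valid (and may act as an introducer) once any of
        # its User IDs is valid; each round extends chains by one hop.
        new_keys = valid_keys | {k for k, _ in valid_bindings}
        if new_keys == valid_keys:
            break  # nothing new to propagate
        valid_keys = new_keys
    return 1.0 if (key, user_id) in valid_bindings else 0.0
```

In this sketch, a binding certified by three valid keys with marginal ownertrust passes under the defaults, while the same three certifications fail with marginals_needed=4, and a two-hop chain fails with max_cert_depth=1.  The same P and C can therefore yield different answers depending only on these parameters.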

What do folks think about this problem statement?

--dkg

[0] "Web of Trust" is a disturbing term because it masks and conflates
    several of the underlying elements, for example willingness to rely
    on a particular key's certifications vs. willingness to use a
    particular key for a particular communications peer.  It's also
    troubling because most people think they know what it means already,
    but haven't actually thought through the model in enough detail to
    understand their own confusion (I fall into this trap frequently
    myself).  So using the term "ORCA" instead is a way to "reset"
    thinking about this, to allow a distinct framing of the problem.