Identity Model

The terms used often depend on the subset of the English language that is used. In normal English usage the user would be the thing doing the action, or the subject. The User Object is the thing being acted upon. In the language of the DBA an entity is just a named object; any subjects (actors) are not part of the DB schema. The database EAR model defines entities, attributes and relationships. This Model uses the term linkages rather than relationships and adds the concept of behaviors as separate from attributes. In the IDESG a digital entity is a named object that deals with identifiers. Authorization happens at many locations among the digital entities described here. The only place it is used in this model is for the "authorization code" and the processes used to acquire and enforce the user's intent. In most places the usages will not conflict, but when these terms are used elsewhere, caution is urged to be precise about their meaning.
The IDESG Glossary is not currently maintained and is not in complete agreement with the Baseline Functional Requirements v1.0, which also has a glossary.
Other organizations, like NIST, refer to Identifier Providers as Registration Authorities, a term that also has other formal properties.

Revision as of 13:24, 24 July 2018


Identity is core to any object that we want to access, change or even discuss. Before we can even describe some thing, we need to be able to name that thing. The naming of objects has been a concern ever since the book of Genesis was written. One problem with the name of an object is that most objects that we see in our everyday life are changing even as we try to describe them. Even the ancient Greek philosopher Heraclitus understood that the name of a river was ephemeral: you can never step twice into the same river, as the contents of the river are constantly changing. The same statement can be made about any human being or any organization. But the reality is that all relationships are based around the idea of identity, as diffuse as that idea might be. Before we can start to make statements about an identity, it is best to create some model that will help us establish a context where we can intelligently discuss identity. Existing identity models have been organized around the identifier that a user has registered at an Identity Provider (IdP). This has led to a backlash against the use of the term identity in the technical community.

Note that other philosophers have established the Law of Identity to be a naming of one particular thing. In logic, the law of identity states that each thing is identical with itself. By this it is meant that each thing (be it a universal or a particular) is composed of its own unique set of characteristic qualities or features, which the ancient Greeks called its essence. It is the first of the three classical laws of thought. For this model the essence of an identity will be the identifiers, attributes, behaviors and inferences collected about that thing, and how to accommodate that data as it changes over time.

This model uses the term Identifier in places where other models use the term Identity for reasons that should be clear towards the end of the description of the way that User Private Information is held by sites on the Internet. Here there is a determined effort to separate the attributes known by a digital entity about the user (the User Object) from the real-world user (the subject). No user attributes should be required to authenticate the user. In this model the user of a service is identified in the real-world by the identifiers, attributes and behaviors, while the data in the User Object held in some internet service will never contain more than a small subset of those attributes. The model seeks to describe the relationship and the interaction of the real-world user with the User Objects that are contained in a wide range of networked computer systems.

This page is written for the architects of Identity Systems and so has lots of technical details. For a management presentation with minimal technical details see the Identity Model Overview.

For general concepts see the Identity Modeling Introduction.

The Problem

Designers and architects of real-world systems lack actionable guidance on the construction of internet connected applications that contribute to the realization of an Identity Ecosystem that is compliant with the National Strategy for Trusted Identities in Cyberspace (NSTIC) principles. The guidance that does exist is not aligned with the way that real-world systems are constructed. Generally "programmers necessarily rely on approximations of system behavior (called Mental Models) during development. The intent of a mental model is to allow programmers to reason accurately about the behavior of a system." [1]. To get good implementations of systems for an identity ecosystem, it is necessary to provide helpful models of handling of identity information.

The Context

  • The users, providers and relying parties all desire to be part of an IDESG compliant identity ecosystem as described by the Baseline Functional Requirements v1.0 (BFR).
  • This identity model is designed to apply to any User Object that collects and maintains User Private Information over time at any compliant internet digital entity.
  • Identifiers serve as a name to bind a collection of attributes together for transmission between digital entities. These identifiers and the data bound to them serve as the input data for User Objects maintained by those entities.
  • The model is oriented around real-world Natural Persons (users) and their experiences on a digital display device attached to the internet dealing with digital entities which maintain information about them.
  • For this model the User Agent that interfaces between the user and the other digital entities is considered to be an independent browser. Web apps supplied by service providers are not fully explored in this version.
  • The providers of user identifiers and attributes have responsibility that identifiers are consistently authenticated over time and that the attributes provided belong to the identified party.
  • As this page is designed for the US, it addresses only private data and not public data. The EU GDPR, which went into full force on May 25, 2018, seems to regulate all user data, both public and private, for EU residents and EU citizens, regardless of location. In the US, contracts can be used to force employees and others to "opt-out" of privacy regulation. With the GDPR no "opt-out" is permitted.
  • This is not designed to be an authoritative model of people that work for or within an organization, an HR data model.

Identity from the Relying Party Perspective

The Relying Party provides a collection of resources or assets to users. The RP uses identifiers to collect attributes into a User Object that allows them to control access to some subset of those resources. Their primary concern is the risk of losing any economic and reputational ownership of those resources. It is very likely that an RP will have more than one identifier in each User Object with one (the key) used to identify that User Object. At the very least they have their own identifier for the user and an identifier from any IdP hosting that user's identifier and attributes. Users can authenticate to the User Object in the computer as a Principal in that system for the duration of the authentication lifetime, which should be coincident with the interchange lifetime as used by the BFR.

Identity from the Identity Provider's Perspective

The various Identity Providers look at the collection of user identifiers and attributes as a resource that helps them to establish a relationship with the users. They typically also offer some service to the user like email or financial services. A prudent IdP will provide an opaque identifier to every user-RP pair, as the user's identifier with the IdP should be considered to be User Private Information. This model breaks the identity functions into three providers: Identifier, Authorization and Attribute.

Identity from the User's Perspective

The user would not even bother with identifiers if they didn't need them on the Internet. Identifiers do serve to bind a collection of attributes for disclosure to a class of relying parties. This collection of attributes creates a persona that is defined by the attributes and by the behavior of that user at web sites that have access to that identifier. Most users have more than one persona on the internet. It is important to users that they maintain control of their internet identities, as identity theft happens when that control is lost. But if the user's behaviors are the same from one persona to the next, the user's true identity will leak out.

Identity from the Internet's Perspective

The internet has defined a set of schemes (like http: or mailto:) that are used to create unique Uniform Resource Identifiers (URI) for every object known to the Internet. There are other naming schemes in other venues, but the only other two that have relevance to this model are the Globally Unique Identifier (GUID) and the X.500 Distinguished Name (DN). The GUID is randomly generated with sufficient entropy to be statistically extremely unlikely to be repeated for at least a human lifetime. While these identifiers are not "known to the internet" they are still often used to create names for things like User Objects and in other places where digital entities are required to have unique identities that are not possible for attackers to guess. The DN was defined by the ITU, representing the telephone companies of the world. It is now only seen in X.509 certificates and internal directory systems because of its adoption by the IETF protocol LDAP.
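As a rough illustration of the three identifier forms named above, the following Python sketch generates a random GUID with the standard uuid module and extracts the scheme from a URI. The example URIs and the DN string are invented for illustration and do not come from any real system; the helper names are likewise hypothetical.

```python
import uuid
from urllib.parse import urlsplit


def new_object_guid() -> str:
    """Return a random (version 4) GUID in its canonical text form."""
    return str(uuid.uuid4())


def uri_scheme(uri: str) -> str:
    """Return the scheme part of a URI, for example 'mailto' or 'https'."""
    return urlsplit(uri).scheme


# Hypothetical example values used only to show the three identifier shapes.
EXAMPLE_URIS = ["https://rp.example/users", "mailto:alice@example.com"]
EXAMPLE_DN = "CN=Alice Example,OU=People,O=Example Corp,C=US"

if __name__ == "__main__":
    print("GUID:", new_object_guid())
    for uri in EXAMPLE_URIS:
        print("URI scheme:", uri_scheme(uri), "for", uri)
    print("DN:", EXAMPLE_DN)
```

A version 4 GUID carries 122 random bits, which is why it can serve as a name that attackers are unlikely to guess, provided the generator draws from a good source of randomness.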

Identity in the Real-World

While this model focuses exclusively on attributes and identifiers held by Digital Entities, one attribute commonly included in User Objects will be a Legal Name, which is assigned in the real-world and required in many situations where digital transactions occur. For example in health care it is very important that the same real-world individual is not permitted to see two prescribing physicians and get two prescriptions for drugs that are either controlled substances, or antagonists of each other. One point about legal names that is important to consider in User Objects is that they are NOT IMMUTABLE and often NOT UNIQUE. Because of these limitations, digital entities should expect that legal names will change and cannot be used as keys in databases. So, while the legal name is just an attribute, it is one that many enterprises will need to take special care to proof and protect from disclosure, just as email addresses and mobile phone numbers often go through some sort of proofing process. That is all beyond the scope of this document.

Brief History of Digital Identifiers

The first common digital identifier was the user@host identifier defined in the 1960s on the ARPANET with a password secret that was required to gain access. Sometimes it seems like nothing has changed in the years since that innovation. Web sites still depend on user name and password for access control even though the systems below describe some of the alternate methods.

Kerberos from MIT's Project Athena allowed single sign-on across MIT in 1983. Kerberos later surfaced in many commercial systems like FreeBSD, Microsoft Windows and Solaris. As a general rule Kerberos works well in closed, controlled environments where it continues to provide good service today.

The first X.400 email Recommendations were published in 1984 (Red Book). They were designed to become a standard for identifying users, but that role has been assumed by internet email addresses. Still the X.400 names survive in email servers like Microsoft Exchange and in the X.509 certificate described below. The "Distinguished Name" that appears in a variety of contexts originated in the companion X.500 directory standards. With the failure of the monopoly telephone or postal authorities to control identities and email, the focus of the computer industry switched to Public Key Infrastructure (PKI), which took the format of the X.509 certificate and built chains of trust back to root certificates which were implicitly trusted. The final blow to PKI occurred during the 1990s as Privacy Enhanced Mail (PEM) was found to have very limited privacy since it required an individual to be fully described in an X.509 certificate which was, by definition, public. The PKI continues to provide good functionality where privacy is not permitted by law or necessity.

Banyan VINES introduced directory services to many enterprise clients starting about 1985. Their early success was later eclipsed by Novell and Microsoft. One interesting aspect of these early ID servers is that they were then and continue now to be focused on users that have legal agreements (like employment contracts) with the owner of the ID server.

In 1999 Microsoft acquired a metadirectory service from ZoomIt along with Kim Cameron who later developed the 7 laws of identity which have been adapted to the list below. Microsoft later released this product as their Microsoft Identity Integration Server.

The Security Assertion Markup Language (SAML) based on XML was introduced in 2003 and was adopted by many identity systems over the years. Federation systems like ADFS were deployed to translate a user's identifier on one system to one that could be used on some other system using SAML among other protocols. This method required a deliberate process of creating legal agreements one by one between the source of the identifiers and attributes and the party that relied on the resultant identity claims. The example of Get Abstract shows how a web page can convert the user's email address into sign-on with a federation server using hints provided during the registration process.

OpenID Connect was created in 2014 on top of the authorization protocol OAuth 2.0, which gave the user some control over which attributes in the various identity servers should be released to relying parties. Later dynamic registration was introduced which allowed the user to enter an email address to a relying party that was not previously registered to the Identity Provider. The dynamic feature has not been deployed on commercial systems, like Azure, which use identity models based on prior registration of the relying party with the identifier provider. This technique was used on the original IDESG web site to permit use of social site sign-ins for access.

Originally the above deployments were to associate identity with some computer system (or telephone system) that hosted that identifier. The identity models of that time only supported user identifiers in that format. The rest of this paper moves beyond that model to an identity model that better supports the need for privacy in the evolving identity ecosystem.

Identity Laws

With advance apologies to Kim Cameron, the 7 laws of Identity have been adapted to the needs of the current identity model. In many cases the words are taken from Kim's paper. These are laws in the same sense that the Law of Gravitation is one. These laws are brief and easy to understand, which should be an indication that they bear little relationship to any legislation.

Here the model considers the laws as they apply to the users in an IDESG compliant ecosystem. That means that all parties to the interchange are attempting to follow the Baseline Functional Requirements v1.0 (BFR).

  1. User Control and Consent - for this model no data is released to the RP by the IdP without explicit consent of the user. That said there are still many cases where the relying party will ask for attributes that are not necessary to their purpose in communicating with the user. It is easy to understand why an RP would ask for the email address of the user both for verification and for communications if something later requires that, including notification of some attack against the user's identity.
  2. Minimal Disclosure for a Constrained Use - ask for the least amount of identifying information for a stable solution. Compliant web sites must follow this requirement, but other web sites will ask for whatever they can get away with.
  3. Justifiable Parties - The identity system must make its user aware of the party or parties with whom she is interacting while sharing information. The justification requirements apply both to the user who is disclosing information and the relying party who depends on it.
  4. Directed Identity - For the IDESG this means that a relying party should use as many sources of identifiers as is consistent with their objectives. The goal must be user convenience.
  5. Pluralism of Operators and Technologies - Much of the work of this committee focuses on the OpenID Connect standards because they can be made fully compliant with IDESG principles. It is hoped that other technologies will enable the IDESG principles as well.
  6. Human Integration - These are covered in the usability requirements of the IDESG BFR.
  7. Consistent Experience Across Contexts - The IDESG Best Practices and Example for RP System has been created to help RPs follow the IDESG BFR in a consistent manner.

Solution: A New Identity Model

For this new model, each party to the identity transaction has their own view of the identity. The unique feature that was introduced by OpenID Connect is that the OpenID Provider (OP) provides no information to the relying party other than an identifier of the user that should be unique to that relying party, thus blocking any linkage to use of the identifier with another relying party. This is not meant to imply that the relying party cannot get the user attributes, but only that the user must consent to any attribute release and still be authenticated by the OP.

While the views are related to OpenID Connect and other standards in the sections that follow, that is merely as a means to relate the requirements to an existing technology and should not preclude other technologies that provide the functionality. That said, it is the capability to distinguish between authentication and attribute provisioning in OpenID Connect that makes the old models inadequate.

Identity Taxonomy

For the purposes of this document, these terms are used in the following way. This taxonomy is forward looking to the day when reliable identity ecosystems have been deployed. See the section below on word usage.

  1. In a digital world all objects are assigned digital identifiers which are either URIs, DNs or GUIDs.
    1. Digital identifiers are (statistically) unique and should be immutable.
  2. A Digital Entity is a named service operating on the internet. They include providers, relying parties and user agents.
  3. User Objects are collections of digital identifiers, attributes and behaviors in a digital entity and have digital identifiers of their own.
    1. The User Object corresponds to some human user, legal user, or pseudonym that has control of the User Private Information in that object.
    2. The internal format of a User Object within a digital entity is opaque to any other digital entity.
  4. A real-world name is just an attribute in the digital world. It is not a unique digital identifier. It is not immutable.
  5. An assertion is a real-world stipulation by a real-world subject. For example user acceptance of terms-of-use is an assertion.
  6. The subject is a real-world object which might be a natural or legal person.
    1. Subject is an OpenID name for the identifier used between the IdP (which they call an OP) and the RP (which they call a client).
    2. For the purposes of User Experience documents the user is always a natural person.
  7. Linkages bind one digital identifier to another, to a real-world name or to an assertion.
  8. Authentication provides a real-time, short-term linkage between a user and a User Object.
    1. No User Private Information is provided to the RP until after a successful authentication and user consent process has completed.
    2. A suitably authenticated user must have the ability to control the User Private Information in a User Object in any compliant entity.
    3. The control of user public information, such as behaviors, in a User Object is evolving and not addressed in the model.
  9. User Consent is required for an IdP to create an authorization to release attributes from one User Object to another.
    1. It is expected that user consent messages will be standardized and identifiable.
  10. Users are represented on the internet by a User Agent. In this context a user agent is expected to fairly represent the user's intent.
    1. The user agent string in the HTTP header is an insecure way for the user agent to present an identifier to a digital entity.
  11. Claims are a digital identifier and a collection of attributes and assertions.
    1. Claims and displays of User Private Information must be understandable and have the same meaning to all parties to a transaction.
  12. Claims, authentications, user consent and user agents can be validated and have a digital metric of assurance applied to them.
    1. Digital Certificates are but one example of a validated claim.
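One way to read the taxonomy above is as a small data model. The sketch below is only an illustration under stated assumptions: the class names, field names, the integer assurance scale and the apply_claim helper are all hypothetical and are not defined by the IDESG or by this model.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Claim:
    """A digital identifier plus attributes and assertions (item 11)."""
    identifier: str
    attributes: Tuple[Tuple[str, str], ...] = ()
    assertions: Tuple[str, ...] = ()
    assurance: int = 0  # hypothetical scale, 0 means not validated (item 12)


@dataclass
class UserObject:
    """Identifiers, attributes and behaviors held by one digital entity (item 3)."""
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    identifiers: Set[str] = field(default_factory=set)
    attributes: Dict[str, str] = field(default_factory=dict)
    behaviors: List[str] = field(default_factory=list)

    def apply_claim(self, claim: Claim, min_assurance: int = 1) -> bool:
        """Merge a claim into this object if it meets the assurance threshold."""
        if claim.assurance < min_assurance:
            return False
        self.identifiers.add(claim.identifier)
        self.attributes.update(dict(claim.attributes))
        return True
```

Note that the real-world name would live in attributes and can change freely, while object_id stays fixed, which matches items 1 and 4 of the list.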

User's View

The best practice example web page shows the types of IdP that a user should be able to pick at a relying party. The user is trying to find the least intrusive method to gain access to the relying party. To get access the user will typically need to provide some attributes that are able to assure the RP that the access is authorized. To prevent spoofing, the required attributes are tied to some identifier which is, in some sense, tied to something in the real-world. For the purposes of this user-experience model, only human beings are considered although other named identities, including security connection identifiers may be appropriate in other contexts.

Traditionally users have acquired a username and password from the web sites where they want to establish a continuing identity. The web sites liked to have control of their users and the user could pick a username that was unique for each web site to prevent linking between web sites. In practice the users always tried to pick a name that they could remember without having to look it up in some list. Name collision at sites prevented anyone with a common name from using the same name on every site. For a better user experience and for access to a separate communications channel, web sites began to request the user's email address, which is loosely guaranteed to be unique by the rules of the internet. At this point these web sites became relying parties on email providers, which became identifier providers somewhat by default.

There is an internet standard that requires the user select, or be assigned, a unique identifier called a Uniform Resource Identifier or URI. The email address is now the most common URI for individuals. It is already true that most people that can operate computers have acquired several of these identifiers which are used in specific contexts, including internet purchases and work-related web accesses.

Certificate Service Provider (CSP) View

There are two types of CSP with differing views of the role of the user in determining their own list of attributes (aka identity claim) to be contained within the certificate:

  1. The user has little control over User Private Information as described in NIST SP 800-63 as a Credential Service Provider where the governmental or corporate enterprise provides a source of credentials, such as a private key embedded in a smart card, and also determines the certificate profile, that is, determines how much User Private Information is included in the certificate that is bound to the credential. This is also the view that applies to credentials that are bound to non-human users.
  2. The user can, in theory, generate their own private keys to serve as a source of credentials and create their own certificate profile with the use of virtual smart cards where the user generates the private key themselves and requests a credential from a Certificate Service Provider like Verisign or Global Trust. Attempts to deploy individual user certificates outside of an enterprise environment have not been successful.

Originally conceived by the monopoly telephone companies as an extension of the monetization of their telephone directories, X.509 certificates were defined to bind User Private Information to an authenticator, like a private key on a smart card. The view of the CSP in this case depends on the type of use to which the certificate is applied.

  • In the case of users that need to prove their authority to act, a formal certificate of the X.509 type can be critical. For those performing the most critical functions of society (first responders, physicians, professional engineers, child care providers and others given authority), the certificate can be the critical document that provides the proof needed for that individual to act on behalf of another individual, or on behalf of legally constituted organizations in society. Typically the registration authority associated with the use of the credential will determine the certificate profile.
  • In the case of organizations which host web sites, the X.509 certificate associated with Transport Layer Security (TLS aka SSL) provides an excellent means to assure the user that the communications between the user and any provider remains private and corresponds to the identity of the web site. The IDESG best practice recommendation is that the web site acquire an Extended Validation Certificate which provides the equivalent of in-person proofing of the organization identity. When the EV cert is used the best practice of user experience is that a strong indication is provided to the user of the strong level of assurance provided.
  • In the case of devices several certifications are desirable. The end goal is to be sure that the device accurately reflects the intentions of the user or the owner of the device. Since many user devices are already infected there are two types of assurance that are available from devices in constrained environments. The first is a certificate of device health which can be obtained from a variety of services now available on the internet. The second is a distinct piece of hardware on the device that can be proven secure. The device could hold private keys for signing or decryption of protected content. These hardware devices can also bind keys to health so that protected content is only displayed if the device has been analyzed and proven to be secure. In these cases the CSP is typically called a Device Attestation Provider. One downside for users of devices that have strong identities, which includes the IP address of the device, is that unknown actors acquire the ability to associate the device to a user and to track the user's behavior on the internet.
  • In the case of consumers, the privacy and usability downside of certificates typically excludes them from consideration.

Identifier Provider View

The original source of internet identifiers was the email provider. Many of the larger email providers have realized the monetary benefits to be gained by collecting User Private Information and have leveraged that by offering to authenticate users for other web sites. They correctly described the user experience and security benefits for users created by the reduction in the number of usernames and passwords that the user needed to remember. These sites became Identity providers in the same sense that CSPs were Identity Providers. This raised many privacy concerns as the user was not often even asked if they wanted information released by the Identity Providers. This section is called Identifier Provider rather than Identity Provider to emphasize that the IdP provides an authentication service which does not include User Private Information. A later authorization step can provide any User Private Information which is permitted by the user. This is somewhat different from the legacy identity model where the user's email address was the identifier for the user both on the IdP and on the Relying Party.

NIST documentation allocates some of the Identifier Provider functionality to the Registration Authority or RA. Others have called this function a namespace provider.

In the OpenID Connect protocol, the IdP sends a packet to the RP with an identifier of the user that is not their email address. While not specifically required by the specification, a privacy conscious IdP would use a different id for the user at every single RP. In OpenID terms an identifier is acquired by the authentication protocol, which is strictly constrained to occur between the OpenID provider (OP or IdP) and the user.
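A different id per RP can be derived rather than stored. The sketch below shows one possible derivation, a keyed hash (HMAC-SHA256) over the RP identifier and the IdP's local user id. This is an assumption chosen for illustration, not a method prescribed by this model; the function name, the parameter names and the separator byte are hypothetical.

```python
import hashlib
import hmac


def pairwise_subject(idp_secret: bytes, local_user_id: str, rp_id: str) -> str:
    """Derive an opaque, stable identifier for one user at one relying party.

    idp_secret is a key known only to the IdP (hypothetical parameter).
    The same user yields unrelated-looking values at different RPs, so two
    RPs cannot link their User Objects by comparing identifiers.
    """
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    message = rp_id.encode("utf-8") + b"\x00" + local_user_id.encode("utf-8")
    return hmac.new(idp_secret, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    secret = b"example-only-secret"  # a real IdP would use a protected random key
    print(pairwise_subject(secret, "alice@idp.example", "https://rp-one.example"))
    print(pairwise_subject(secret, "alice@idp.example", "https://rp-two.example"))
```

Because the value is deterministic for a given user and RP, the RP sees a consistent identifier across sessions while the user's identifier at the IdP, which this model treats as User Private Information, is never disclosed.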

Secure Token Service View

Nearly all identity federation services operate a secure token service (STS) that issues access tokens authorizing the relying party to obtain User Private Information after consent has been obtained from the user.

In existing federation servers, like ADFS, the secure token service can register some sites to receive user tokens with a prior legal agreement for a captive audience that has already given their consent, although they may not realize that is the case.

In OpenID terms, the Secure Token Service hosts the Token Endpoint (URL). It is typically hosted by the Identifier Provider which acquires user consent in real time.

Attribute Provider View

An Attribute Provider needs to get authorization from the user before it can release any user attribute to the relying party. Typically the IdP has User Private Information that can be provided as attributes once released by the user. But other attribute providers can supply User Private Information such as the certificates provided to first responders that permit them to perform medical procedures which would otherwise require a physician and user informed consent to administer.

In OpenID terms the Attribute Provider contains the User Info Endpoint (URL) which provides claims (attributes) to the Client (Relying Party) as authorized by the token.
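The division of labor between the token service and the attribute provider can be sketched as an in-memory simulation. This is not the OpenID Connect wire protocol: the class names, method names and the dictionary-based store are hypothetical, and a real deployment would exchange tokens over HTTPS at the Token and User Info endpoints.

```python
import secrets
from typing import Dict, FrozenSet, Optional, Set, Tuple


class TokenService:
    """Hypothetical stand-in for a secure token service."""

    def __init__(self) -> None:
        self._grants: Dict[str, Tuple[str, FrozenSet[str]]] = {}

    def issue(self, subject: str, requested: Set[str], consented: Set[str]) -> str:
        """Issue a token covering only attributes the user consented to."""
        allowed = frozenset(requested & consented)
        token = secrets.token_urlsafe(32)
        self._grants[token] = (subject, allowed)
        return token

    def lookup(self, token: str) -> Optional[Tuple[str, FrozenSet[str]]]:
        return self._grants.get(token)


class AttributeProvider:
    """Hypothetical stand-in for the holder of the user's attributes."""

    def __init__(self, sts: TokenService, store: Dict[str, Dict[str, str]]) -> None:
        self._sts = sts
        self._store = store

    def user_info(self, token: str) -> Dict[str, str]:
        """Release only the attributes authorized by the token."""
        grant = self._sts.lookup(token)
        if grant is None:
            raise PermissionError("unknown or revoked token")
        subject, allowed = grant
        record = self._store.get(subject, {})
        return {name: value for name, value in record.items() if name in allowed}
```

In this sketch the relying party never sees an attribute that the user did not consent to, even if it asked for it, because the intersection is taken when the token is issued.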

Relying Party View

The Relying Party (RP) is any web site that uses identifier and attribute information from another internet site. At the beginning of the web every site wanted to "own" the customer relationship and tried to get users to register a user name and password with them so that they would have a continuing relationship with that user that was not shared with any other site. This eventually created huge user experience and security problems for users who demanded something better. Around 2000 Microsoft Passport tried to market identity as a service. Their costs and conditions were too large for wide adoption. More recently the social networking sites like Google and Facebook realized it was to their advantage to provide identity service to others, making them the one site that most users always visited. With the publication of standards from the OpenID Foundation many more identifier providers have entered the market and web sites have moved to supporting multiple identifier providers. This identity model only describes web based relying parties, but similar functionality could be provided by relying party apps running on the user device. In OpenID terms the RP is a confidential Web application, where confidential means that the RP is capable of keeping secrets secure.

The page at Best Practices and Example for RP System describes an identity model within a relying party that corresponds to Customer Relationship Management. In this example a User record is created with a GUID generated to create statistical uniqueness for the User Object that it names. Control of the contents of this User Object is given to the user that first created it. The wiki page User Private Information describes a model for information provided by the user, and the Best Practices and Example for RP System shows how the various identifiers for the real world user that controls the User Object at the relying party can be classified. After the User Object is created, the original sign-in identifier can be augmented with additional sign-in identifiers as the user might desire. For example the user could create a User Object on the relying party using only a username and password that applied only to that relying party. Then the user could add other User Private Information including email addresses and cell phone numbers. Alternatively the user could add additional sign-in accounts based on email addresses or phone numbers. Since the user that can modify the User Object record at the relying party must be able to be authenticated to the relying party, at least one sign-in record must be created. However, it is possible for a user to create a record with a wholly fictitious username, and then add other federated sign-ins from social sites like Google or Facebook which were never identified to the relying party. In that case there would be no linkage from the User Object in the relying party to the real world user, even though a real world user was in control of the User Object. This happy outcome is a result of the distinction made above between identity and identifier. In this case the relying party has a User Object with a variety of identifiers. The user has complete control over whether the relying party can link back to the real world user.
In the case where a real world identity was required for legal reasons, a legal name can be added to the User Object with very stringent access controls on who is able to see that particular piece of User Private Information.
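As a minimal sketch of the User Object described above, the following models a record named by a GUID that carries one or more sign-in identifiers and an optional bag of User Private Information. The names UserObject, SignIn and create_user_object are hypothetical and are not taken from the Best Practices and Example for RP System, which uses SQL tables. The only rule enforced is the one stated in the text: at least one sign-in record must always exist.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SignIn:
    provider: str   # e.g. "local" or the name of a federated IdP
    subject: str    # identifier that provider asserts for this user


@dataclass
class UserObject:
    # The GUID names the User Object; it says nothing about the real world user.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    sign_ins: list = field(default_factory=list)
    private_info: dict = field(default_factory=dict)  # filled only by the user

    def add_sign_in(self, provider, subject):
        self.sign_ins.append(SignIn(provider, subject))

    def remove_sign_in(self, provider, subject):
        remaining = [s for s in self.sign_ins
                     if (s.provider, s.subject) != (provider, subject)]
        if not remaining:
            raise ValueError("a User Object must keep at least one sign-in")
        self.sign_ins = remaining


def create_user_object(provider, subject):
    """Create a User Object controlled by whoever holds the first sign-in."""
    user = UserObject()
    user.add_sign_in(provider, subject)
    return user
```

A user could start with a fictitious local username, add a federated sign-in, and never supply a legal name; private_info then stays empty and nothing in the record links back to the real world user.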

In OpenID terms, the relying party is a "client" of the user that has been registered with (and trusted by) the OpenID Provider (OP aka IdP). That quaint term goes back to the OAuth days when a service could request access to a resource because it was performing some service for the user. In the usage here the relying party is a client of the user in the same sense that the wolf was a client of the grandmother when Little Red Riding Hood came calling. Although the wolf adopted the grandmother's clothing attributes, it failed to acquire a valid ID token.

Architecture to implement this Model

The following diagram shows one way to implement this Identity Model. It is based on one version of the OpenID Connect protocol that is implemented in the Best Practices and Example for RP System. This model shows that the authentication process is completely separated from the provisioning of attributes. The Authorization Provider that is shown in the diagram is required for authorization code flow as it ensures that no access token to the User Private Information is ever accessible on the open internet or on the user device. This token is never transported except in a secure channel between providers. The Authorization Provider is also known as an Authorization Server or a Secure Token Server (STS). The complete separation between the user known to the IdP and the user sign-in at the RP even extends to the user identifier (email address or phone number) used by the identifier provider. The identifier provided to the RP by the IdP is specified here to be an identifier that is unique to that particular pair of parties and is never used elsewhere. This authenticated identifier is used to get user consent for an authorization code from the IdP, which is used in turn to get an ID token to present for access to user information at the attribute provider.
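The exchange at the heart of the authorization code flow can be illustrated with a toy, in-memory Authorization Provider. This is a sketch under stated assumptions, not the implementation used in the example RP system: the class ToyAuthorizationProvider and its method names are hypothetical, there is no network or signing, and the token response is reduced to an access token plus the pairwise subject (a real OpenID Connect response carries a signed ID token). It shows the two properties the paragraph relies on: a code is issued only after user consent, and tokens are handed out only on the back channel to an RP that proves knowledge of its secret.

```python
import secrets


class ToyAuthorizationProvider:
    """Hypothetical in-memory stand-in for an Authorization Provider (STS)."""

    def __init__(self):
        self._clients = {}  # client_id -> client_secret
        self._codes = {}    # one-time code -> (client_id, pairwise_sub)
        self._tokens = {}   # access_token -> pairwise_sub

    def register_client(self, client_id):
        """Register an RP and return the secret it must keep confidential."""
        secret = secrets.token_urlsafe(16)
        self._clients[client_id] = secret
        return secret

    def authorize(self, client_id, pairwise_sub, user_consented):
        """Front channel: return a one-time code only if the user consented."""
        if client_id not in self._clients or not user_consented:
            return None
        code = secrets.token_urlsafe(16)
        self._codes[code] = (client_id, pairwise_sub)
        return code

    def token(self, client_id, client_secret, code):
        """Back channel: the RP authenticates and redeems the code once."""
        expected = self._clients.get(client_id)
        if expected is None or not secrets.compare_digest(expected, client_secret):
            raise PermissionError("unknown client or bad secret")
        # pop() makes the code single use, even when redemption fails
        issued_to, sub = self._codes.pop(code, (None, None))
        if issued_to != client_id:
            raise PermissionError("invalid or already used code")
        access_token = secrets.token_urlsafe(16)
        self._tokens[access_token] = sub
        return {"access_token": access_token, "sub": sub}
```

Only the short-lived code ever passes through the user device; the access token exists solely in the response to the authenticated token call.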


Takeaways and Future Work

This model is a departure from existing models where Identity Providers can send any sort of attributes to federated providers. In this model a relying party that wishes to allow the user to authenticate with identifiers from a federation server will get no User Private Information, not even an email address, from any provider until that provider has received explicit consent from the user for release of that attribute. In existing identity models, the user identity is strongly bound to the user email address as the canonical identifier. In this model the identifier of the User Object is known only to the provider of that identifier and is never shared on the internet, except with the user, who is free to share it at will.

The User as Subject

As a general rule, the user is a species of carbon-based life identified as a human being. It would seem that all human beings should have an identity, but this is not the case today. Some human beings do not have legal status and some organizations do have legal status. Many web sites are very interested in whether the user is a real human being, and others are interested in the legal status of the user. This model is focused on the user as a human being with a set of attributes and behaviors, an identity. The United Nations has agreed that all human beings deserve to have an identity. Most nations' laws assert that only human beings can make legal commitments. That said, no human beings have yet been directly attached to the internet and so we are dependent on user agents to capture the intent of the user and pass that intent on to the relying party. The relying party has two legitimate concerns with the level of assurance for the expressed intent: 1) how strong is the binding between the human being and the message received, and 2) what sort of attestation has been made about the reliability of the device and software user agent in reporting that intent. Both of these issues are addressed in other venues and are not further addressed in this model. Another issue not addressed here is whether the user or user agent is legitimately committing some other legal entity by the transaction. It must be noted that the right to have a digital identifier has been shown empirically to be at odds with relying parties' desire for strong assurance. In every case possible relying parties should not ask for more user attributes or assurance than is required by law.

The User Object

The physical user should never be an object in a digital entity. Only the data about that user, known as the User Object, can be treated by a digital entity. Once a transaction takes real-world form, such as a shipment, then the user can be treated as an object in the real world. One instantiation of the User Object in a relying party is described in the Best Practices and Example for RP System. That example includes the User Object as a collection of tables in a SQL data schema. Note that the User Object consists of User Private Information plus linkages and status information that are internal to the relying party. Other User Objects will contain behaviors, such as search terms, legal actions and commercial transactions, that are not shown there. It is important that the user can understand what User Private Information is held by any digital entity and what redress the user has to change that information if the user desires. The collection of behaviors should be forgotten at some point, which is a current topic of debate in various legal jurisdictions.

While there will also be a User Object in the IdP and other providers, it is likely to be less complex than the one in an RP, which is the reason that the RP User Object is given the bulk of the attention in this document. The user will also have more than one persona (identifier with attributes and behaviors). While several of those personas can be included in User Objects, they are not addressed in this document.

Sources of Identity Information (Identifiers and Attributes)

This is an early attempt at a section that describes sources of claims about users, meaning identifiers with attributes.

By this model all User Private Information is supplied by attribute providers, some of which are attached to authentication providers.

After a user is authenticated, that user can authorize a token for the relying party to acquire attribute information. In the current model only User Private Information, and not user behaviors or public information, is addressed.

  1. Social Media
  2. Related Entities (where one commercial site authenticates for others)
  3. Legal records (birth, marriage, death, incarceration, partnerships, corporate charters)
  4. Commercial Certificate providers (Verisign, etc.)
  5. Government IDs (including agents like AAMVA)
  6. Regulated Entities (Financial, Health, etc.)
  7. Employers (schools, clubs, etc.)
  8. Credentials of Authority - Both permanent (like that issued to Professionals) and situational (like that issued to First Responders)
  9. Credit (financial as well as reputational)
  10. Local sign ins - meaning those that cannot be used for federated signin - should they be deprecated as they cannot lead to acquisition of more attributes?

Open Issues

  1. The basis for the Trust of the Relying Party by the Identifier or Attribute Providers (e.g. Identifier, User Attributes and Behaviours) has not been examined yet and deserves better definition. One possibility is membership in the same trust framework. Another basis for trust might be the use of EV certs by the relying party and by the Token Provider.
  2. There is an assumption in the model that the user has control of their responses to all internet hosted entities and that they are using a protocol at least as robust as the "authorization code" flow in OpenID Connect. Any vulnerabilities introduced by having one of those entities host an application on the user device have not been analyzed.
  3. The inclusion of intermediaries, like Privacy Enhancing Technologies, has not been examined yet and also deserves some consideration.
  4. This paper describes the use case where the Identifier Provider acquires user consent to supply some subset of their attributes. There are other use cases where the Attribute Provider needs to be involved in that path for liability reasons. Another use case paper would be needed for that case. At that point the trust between the various components of the OpenID Connect Provider needs to be clarified.
  5. User consent is implied but currently there is no protocol for providing proof that the user did supply consent. The Kantara Initiative is working on a consent receipt that provides to the user the relying party's understanding of consent. That could be captured and reviewed by the user, but it is more likely that the user would just want to make changes to the relying party's set of consent possibilities. InCommon has done some work in the education area and FHIR has a JSON object that they are developing for the health ecosystem. Further work is required there.
  6. The OpenID Foundation is working on a protocol for two IdPs to exchange information. For example one bank may provide another with the "Know Your Customer" information for a new account to be opened, or one phone company may be required to give another the cell phone number of the user when that user is changing phone service providers. A use case for IDESG may be required to ensure that case is well covered.
  7. Should digital entities be required to show proof of the user's intent to hold each and every user attribute contained in their User Object?
  8. In other documents the tendency of Relying Parties to request more information than required is addressed. Perhaps that needs to be included here as well?
  9. Do we need special consideration of users acting as agents for other users or organizations?
  10. There are valid reasons for a relying party to know the user's email and phone addresses (aka identifiers) under some circumstances related to the user's secure ownership of the User Private Information in the relying party's User Object. It is not clear if those circumstances and the user's access to the information held at the relying party need to be addressed here.
  11. Do we need to address the collection of user behaviors and the right to be forgotten in this model?
  12. Should each attribute in a User Object contain a date at which it expires and needs to be refreshed or deleted?
  13. This model does not cover people within an organization, but it could apply to the federation of those identities outward from the organization. No use case describing such a federation has yet been created.
  14. Dual use devices like cell phones can hold user private information as well as enterprise data. Do we want to address this?
  15. How does a user with challenges or reduced capability express their vulnerability to a digital entity?

References and Coordination

Word usage

The page at Digital Identity defines Identity to be a collection of identity attributes. It should be understood that the concept of Knowledge Based Authentication (KBA) can turn any User Private Information into an identity attribute. One favorite of financial institutions is to list a series of cities and ask the user to pick the one where they lived prior to their current address.

The terms used often depend on the subset of the English language that is used. In normal English usage the user would be the thing doing the action, or the subject. The User Object is the thing being acted upon. In the language of the DBA an entity is just a named object, any subjects (actors) are not part of the DB schema. In the database EAR model there are defined to be entities, attributes and relationships. This Model uses the term linkages rather than relationships and adds the concept of behaviors as separate from attributes. In the IDESG a digital entity is a named object that deals with identifiers. Authorization happens at many locations among the digital entities described here. The only place it is used in this model is for the "authorization code" and the processes used to acquire and enforce the user's intent. In most places the usages will not conflict, but caution is urged when these terms are used elsewhere to be precise about meaning.

Other organizations, like NIST, refer to Identifier Providers as Registration Authorities, which term also has other formal properties.

Attribute is a generic term used to refer to any User Private Information. Claims are data structures that include one or more attributes and an identifier.

Authentication is often used to include the acquisition of the validated claims as well as establishing the link between the user and the User Object. In this model attribute acquisition is not included as a part of authentication.

The term OpenID Provider (OP) is used by the OpenID Foundation and others as a label for the various providers shown in the architecture above (Identifier, Authorization and Attribute).

In part because the IDEF registry is working on aligning the IDEF with Kantara, and in part because of the various parties to an identity transaction, the taxonomy here has been compared to the Kantara glossary.

  • Assertion = statement from a verifier to a relying party that contains identity or other information about a subscriber. This is called a claim in IDEF and OpenID documentation. In this document any real-world entity can make an assertion, which can have a digital identifier of its own (which might be nothing more than a cryptographic hash).
  • Attribute = property associated with an individual. In this document it is not always clear that an identifier or the associated attributes are associated to an individual human being.
  • Identity is associated with a single person, but it is not clear if that means a natural person, i.e. a human being. This document uses identity only in an abstract sense. It has no formal meaning.
  • Token = Something that a claimant possesses and controls (typically a key or password) that is used to authenticate the claimant’s identity. This is a common usage of the term so readers should be aware that this model uses token as a signed bucket of bits that authorizes some action.

The current UNCITRAL documents offer some definitions which are basically in alignment with the terms used here. The ones relating to attributes are most informative:

  • 11. “Attribute” means an item of information or data associated with a subject. Examples of attributes include information such as name, address, age, gender, title, salary, net worth, driver’s license number, social security number, e-mail address, mobile number, and data such as the subject’s network presence, the device used by the subject, the subject’s usual home location as known by a network, etc. (for a human being); corporate name, principal office address, registration name, jurisdiction of registration, etc. (for a legal entity); make and model, serial number, location, capacity, device type, etc. (for a device). Synonym: identity attribute.
  • 12. “Attribute provider” means a business or government entity that acts as a source of one or more attributes of a subject’s identity. An attribute provider is often the entity responsible for assigning, collecting, or maintaining such attributes. Examples of attribute providers include a government agency that maintains a birth registry or title registry, a national credit bureau, a business that maintains a commercial marketing database or a corporate registry, and entities such as mobile operators, banks, utilities and healthcare providers that hold verified user data and that either verify or provide these attributes to third parties (possibly, subject to user consent)

Privacy Concerns

Privacy requires that digital entities do not ask for information that is not required, but some information is only required to give the user a fallback method to access the User Object. If no fallback access method were enabled, the user might not be able to make changes to the User Object if they lost their account at the Identifier Provider.

OpenID Connect

The OpenID Connect protocol allows for most of the requirements of the IDESG, but it allows non-conformant deployments as well. A simplified description of the protocol can be read at this site. Note that in the OpenID document the RP is called a client and the IdP is called an OpenID Provider (OP). It is a long-term goal for the IDESG to create an OpenID Connect profile that shows which options to use. In the meantime the Compliant Implementation of RP on ASP.NET provides a working model of a compliant OpenID Connect implementation. The following items might be included in an IDESG profile:

  1. The RP, and not the user, determines whether access tokens travel over unprotected flows. The IDESG could require the use of the "Authorization Code Flow", as shown above, which places all access tokens in protected flows between the RP and the IdP. That presumes that the RP is able to protect secrets like the User Private Information from disclosure.
  2. The RP determines which User Private Information it wishes to receive and leaves it up to the IdP to determine whether the user has any choice as to which information to disclose. Some IdPs are very diligent and allow the user to select based on the claims profiles provided by the RP. It is more common for IdPs to offer the list of requested claims profiles to the user on a "take-it-or-leave-it" basis. The IDESG could require IdPs to give the user more granular control.
  3. The Subject Identifier (sub in OpenID) is provided from the IdP to the RP during authentication. Unless "pairwise" is specified, there is no requirement that the IdP avoid reusing the same "sub" between two RPs, so nothing prevents linking between the User Objects in those two RPs. Any IDESG profile should require pairwise subject IDs as it is not in the interest of any digital entity to enable it.
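One way an IdP could derive pairwise subject identifiers, as item 3 recommends, is sketched below. OpenID Connect Core describes hashing the RP's sector identifier, the local account ID and a secret salt as one possible method; this sketch uses a keyed HMAC-SHA256 variant of that idea. The function name pairwise_sub, the separator byte and the example host names are assumptions made for illustration only.

```python
import hashlib
import hmac


def pairwise_sub(sector_identifier: str, local_account_id: str, salt: bytes) -> str:
    """Derive a stable "sub" that differs for every RP (sector identifier).

    `salt` is a secret known only to the IdP, so an RP cannot recompute
    the value another RP received for the same user.
    """
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    message = f"{sector_identifier}\x00{local_account_id}".encode("utf-8")
    return hmac.new(salt, message, hashlib.sha256).hexdigest()
```

The same user presented to two RPs yields two unrelated values, so the RPs cannot join their User Objects on "sub", while each RP sees a stable identifier across visits.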

Features that are not available in OpenID Connect that would improve the user experience include:

  1. The ability to mark scope values as mandatory or optional, so that the consent form could let the user know which were optional. Please see this paper for more details on this issue. This might be solved by eliminating all values from the scope other than "openid". The use of the claims parameter might solve this problem as it does have an appropriate modifier.
  2. There is not a complete protocol for passing information between the various identity providers in the OpenID Connect specifications. The Token Introspection RFC does provide data exchange, but does not describe the endpoint discovery mechanism. That would be needed at the point where the use case (or implementation guide) for the separation of the Attribute Provider from the other OpenID Providers (OP) was described.
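The workaround mentioned in item 1, a scope of only "openid" combined with the claims request parameter, can be sketched as follows. The "claims" parameter, its "userinfo" member and the "essential" modifier are defined by OpenID Connect Core; the helper names, endpoint URL and client values are hypothetical, and a real request would also carry "state" and "nonce".

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs


def build_claims_request(essential, voluntary):
    """Mark some claims as essential; the rest are requested as voluntary."""
    userinfo = {name: {"essential": True} for name in essential}
    userinfo.update({name: None for name in voluntary})
    return {"userinfo": userinfo}


def authorization_url(endpoint, client_id, redirect_uri, claims):
    """Build a code-flow request whose scope is only "openid"."""
    query = urlencode({
        "response_type": "code",
        "scope": "openid",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "claims": json.dumps(claims, separators=(",", ":")),
    })
    return f"{endpoint}?{query}"


def optional_claims(claims):
    """Claims a consent form could present to the user as optional."""
    return sorted(name for name, spec in claims["userinfo"].items()
                  if not (spec or {}).get("essential"))
```

An IdP that parses the request this way has what it needs to show the user which requested attributes can be declined without breaking the sign-in.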

General References

One of the NIST NSTIC pilot recipients was the American Association of Motor Vehicle Administrators (AAMVA), and the pilot included the Commonwealth of Virginia Department of Motor Vehicles (DMV). Like most DMVs the one in Virginia was blocked from offering identity or attribute services except to other governmental organizations. On March 24, 2017 Virginia passed a regulation which expands the areas where the DMV ID can be used. Virginia has a My DMV sign-in ID that can be used for the widest variety of governmental services of any state in the union. This regulation mandates a $10 per year fee for the electronic credential, which could hinder the continued use of this by many of the citizens of Virginia who could most benefit from the capability, not because of the cost, but because of the inconvenience. A standard form is already available to request access to DMV data. Virginia also offers ID cards that are equal to the driver's license for access to governmental spaces and a veteran's card, which is not.

  1. The page at Compliant Implementation of RP on ASP.NET gives a more technical description of the process for building a compliant Relying Party web site from scratch.
  2. A good source of material on the EU GDPR is available at this site.
    1. Devon H. O'Dell "The Debugging Mind-Set", CACM, 06 Vol 60 (June 2017)