Resilience

From MgmtWiki
Revision as of 14:24, 30 September 2020 by Tom

Full Title or Meme

The resilience of a complex ecosystem is its capacity to respond to a perturbation or disturbance by resisting damage and recovering quickly.

Goal

  • In Identifier and Access Management systems, resilience must be framed as the ability of any user to access their resources safely, with minimal complexity, at a reliability level higher than some specified minimum.
  • Safely means that records are accessed without being exposed to unauthorized users and without corruption of the data.
  • Minimum complexity is determined entirely by the user and may differ from one user to another.
    • For a healthcare patient, the complexity must be manageable by at least some minimum fraction of the population at a given level of education, say completion of 8th grade.
    • For a healthcare physician, the cost of a false positive may also be considered in determining the acceptable level of complexity.
  • Reliability is calculated as the probability of obtaining access at any given time and must fall in the range of 99.9% to 99.999%, depending on how critical a failure to authenticate in time would be to the preservation of life and property.
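The reliability range above can be made concrete by converting an availability target into the amount of lost access it permits per year. The following is a minimal Python sketch, assuming a 365-day year; the function name `downtime_minutes_per_year` is hypothetical and not part of any IAM product.

```python
# Minimal sketch: translate an availability target (the probability that a
# user can obtain access at any given time) into an annual downtime budget.
# Assumes a 365-day year; downtime_minutes_per_year is a hypothetical helper.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Return the maximum minutes per year of lost access allowed by `availability`."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a probability between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    # The two ends of the range in the Goal section, plus one midpoint.
    for target in (0.999, 0.9999, 0.99999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target:.3%} available -> {minutes:8.1f} min/yr ({minutes / 60:.2f} h)")
```

At 99.9% the budget is about 525.6 minutes (roughly 8.8 hours) per year, while at 99.999% it is only about 5.3 minutes, which is why the choice within the range should track how critical a failed authentication would be.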

Context

In April 2011, the White House introduced the National Strategy for Trusted Identities in Cyberspace (NSTIC), an initiative that brought together the private sector, advocacy groups, public sector agencies and other organizations to improve the privacy, security and convenience of online transactions. The Identity Ecosystem envisioned in the NSTIC is an online environment where individuals and organizations are able to trust each other because they follow agreed-upon standards to obtain and authenticate their digital identities, as well as the digital identities of devices.

To achieve this objective, the NSTIC established guiding principles for creating an Identity Ecosystem built from identity solutions that are:

  1. Privacy-enhancing and voluntary,
  2. Secure and Resilient,
  3. Interoperable and
  4. Cost-effective and easy to use.

Problems

  • It seems to be a feature of every component of a Living System (which includes all of society's imposed structures) that the most successful systems migrate toward solutions that make the most efficient use of the resources at their disposal. For the system as a whole to be Resilient, the inevitable failure of any highly leveraged subsystem must not imperil the whole system, or the system will not survive change.
  • The sizes of changes most likely follow a power law: small changes are far more frequent than large ones. If a system is resilient only to small changes, then the large changes will imperil it.[1]
  • The COVID-19 virus in 2020 brought an example of a big change. Its impact was amplified by two moves by United States capitalists: off-shoring manufacturing that involved significant amounts of manual labor, and adopting the just-in-time logistics theory, under which any inventory was just unused capital. One example was the manufacture of the face masks that were critical to the health of the workers combating the virus. In the meantime, the Trump White House had eliminated the disease experts in the National Security Council. The result was "A very American story about capitalism consuming our resiliency."[2] Both of these efficiencies left the country susceptible to shortages of many clinical components, as there was no planning for, or control over, the recovery of that capability. Note that there was a strategic inventory of medical supplies, like masks, but it was depleted in the 2009 H1N1 virus emergency and was never replenished.
  • During Jack Welch's reign at General Electric, the company prospered wildly by applying vulture-capitalism principles at every level. Welch retired a hero. The company's subsequent near-total collapse seems not to have been his fault, but any student of planning and control knows that optimizing only for short-term effects will eventually lead to a situation that was not planned for and cannot be controlled.
  • In identifier and access management, problems can be introduced by attacks that cause loss of access, so both the likelihood of losing access and the time needed to recover it must be determined as far as possible.
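Once the likelihood of losing access and the time to recover it have been estimated, they can be combined with the standard steady-state availability formula A = MTBF / (MTBF + MTTR). The sketch below is a hedged illustration in Python: it assumes incidents and recoveries can be summarized by their means, the helper names are hypothetical, and the one-incident-per-month figure is an invented example, not data.

```python
# Minimal sketch: steady-state availability from the two quantities named
# above, assuming loss-of-access incidents occur with a mean time between
# failures (MTBF) and are repaired with a mean time to recover (MTTR).
# The incident rate below is an illustrative assumption, not a measurement.


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time access is available: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def max_mttr_for_target(mtbf_hours: float, target: float) -> float:
    """Longest mean recovery time (hours) that still meets `target` availability."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    # Solve MTBF / (MTBF + MTTR) >= target for MTTR.
    return mtbf_hours * (1.0 - target) / target


if __name__ == "__main__":
    mtbf = 30 * 24.0  # assume one loss-of-access incident per month
    for mttr in (4.0, 0.5, 0.05):
        print(f"MTTR {mttr:5.2f} h -> availability {availability(mtbf, mttr):.5f}")
    print(f"MTTR needed for 99.99%: {max_mttr_for_target(mtbf, 0.9999) * 60:.1f} min")
```

Under that assumed incident rate, the sketch reports that a mean recovery of about 4.3 minutes is needed to reach 99.99%, which illustrates why, in identifier and access management, shortening recovery can matter as much as preventing the attack.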

Solutions

  • In the end, each system must decide the balance of efficiency and resilience that it wants. Too much caution will miss out on the many small changes that occur every day. Too much recklessness will result in inevitable failure in the long term.
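The trade-off can be made more concrete under the power-law assumption from the Problems section. This Python sketch assumes change sizes follow a Pareto distribution with tail P(size > x) = (x_min / x)^alpha and that one independent change arrives per day; alpha = 1.5, x_min = 1 and the two design thresholds are illustrative choices, not values taken from any source, and the function names are hypothetical.

```python
# Minimal sketch, assuming change sizes follow a Pareto (power-law) distribution
# with P(size > x) = (x_min / x) ** alpha for x >= x_min, and that one
# independent change arrives per period. alpha, x_min and the thresholds
# are illustrative assumptions.


def exceedance_probability(threshold: float, x_min: float = 1.0, alpha: float = 1.5) -> float:
    """Probability that a single change is larger than `threshold`."""
    if threshold <= x_min:
        return 1.0
    return (x_min / threshold) ** alpha


def prob_failure_over_horizon(threshold: float, periods: int, **kwargs: float) -> float:
    """Probability that at least one change in `periods` exceeds `threshold`."""
    p = exceedance_probability(threshold, **kwargs)
    return 1.0 - (1.0 - p) ** periods


if __name__ == "__main__":
    # A system designed to absorb changes up to 100x the smallest change,
    # versus one designed for 10,000x, observed daily for 1 year and 30 years.
    for threshold in (100.0, 10_000.0):
        for years in (1, 30):
            risk = prob_failure_over_horizon(threshold, periods=365 * years)
            print(f"threshold {threshold:>8.0f}, {years:>2} yr: P(failure) = {risk:.3f}")
```

Under these assumptions, the 100x design fails with roughly 30% probability within one year and almost certainly within 30 years, while the 10,000x design stays near 1% over 30 years; the cost of that extra margin is the efficiency each system has to weigh.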

Gartner IAM Six Principles of Resilience

  1. RISK CULTURE - Stop focusing on checkbox compliance, and shift to risk-based decision making.
  2. OUTCOME FOCUS - Stop solely protecting infrastructure, and begin supporting business outcomes.
  3. BETTER FACILITATE - Move from defender to facilitator, balancing protection with delivering business outcomes.
  4. MAKE WORKFLOW - Move from trying to control information flow to understanding how information flows and what risks that flow carries.
  5. PEOPLE-CENTRIC - Accept the Limits of Technology and Become People-Centric.
  6. DETECT RESPOND - Stop striving for 100% protection, and invest in detection and response.

Gartner’s researchers predicted that by 2017, 50% of IT spending would occur outside of traditional IT department control. They noted that we are at the intersection of two extraordinary digital trends: the ongoing transformation of digital business and the ever-growing capacity and capability of adversaries.


Design and Test

  • Most design looks only at the common use cases.
  • Design for Resilience requires use cases at the edges of performance and use cases covering attack or misuse by malicious or untrained users.
  • Testing for Resilience requires overloading the system from both a load perspective and an attack perspective.
  • Resilience cannot depend on others working as expected. Maersk's network was devastated in 2017 when all of the backups of its domain controllers were destroyed by a single piece of malware (NotPetya). The company was saved only by the chance that one of its domain controllers was offline due to a power failure.
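The Maersk example can be expressed as a simple availability model. The Python sketch below assumes each replica (for instance, a directory or name server) is up with some probability, that ordinary failures are independent, and that a separate common-mode event, such as a single piece of malware, can take every connected replica down at once. The function names and all numbers are illustrative assumptions.

```python
# Minimal sketch contrasting redundancy with independent failures against
# redundancy exposed to a common-mode failure (one event that takes out every
# connected replica at once). Availability figures and p_common are
# illustrative assumptions; the function names are hypothetical.


def parallel_availability(replica_availability: float, replicas: int) -> float:
    """Service is up if at least one replica is up; replicas fail independently."""
    return 1.0 - (1.0 - replica_availability) ** replicas


def with_common_mode(replica_availability: float, replicas: int, p_common: float) -> float:
    """As above, but a shared event with probability p_common downs all replicas."""
    return (1.0 - p_common) * parallel_availability(replica_availability, replicas)


if __name__ == "__main__":
    a, p_common = 0.99, 0.001  # assumed per-replica availability and shared-event risk
    for n in range(1, 5):
        print(
            f"{n} replicas: independent {parallel_availability(a, n):.6f}, "
            f"with common mode {with_common_mode(a, n, p_common):.6f}"
        )
```

However many replicas are added, the second column never exceeds 1 - p_common, so extra copies that share the same exposure do not buy resilience. What saved Maersk corresponds to a copy outside the common-mode event, which is why isolated or offline backups belong in a resilient design.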
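Overloading the system from a load perspective and an attack perspective can be rehearsed on a toy model before it is tried against a real deployment. This Python sketch is a hypothetical discrete-time model, not a load-testing tool: each tick the service can answer `capacity` requests, legitimate and attacker requests arrive in random order, and excess requests are dropped. The optional `attacker_cap` stands in for a per-client rate limit, assuming the attacker can be identified as a single client; all names and numbers are illustrative.

```python
import random
from typing import Optional


def legit_success_rate(legit: int, attack: int, capacity: int, ticks: int = 1_000,
                       attacker_cap: Optional[int] = None, seed: int = 0) -> float:
    """Fraction of legitimate requests served in a toy overload model.

    Each tick, `legit` legitimate and `attack` attacker requests arrive in random
    order and the service answers the first `capacity` of them. If `attacker_cap`
    is set, a (hypothetical) per-client rate limit drops attacker requests beyond
    that cap before they reach the service.
    """
    rng = random.Random(seed)  # seeded so a test run is repeatable
    served = 0
    for _ in range(ticks):
        admitted_attack = attack if attacker_cap is None else min(attack, attacker_cap)
        queue = ["legit"] * legit + ["attack"] * admitted_attack
        rng.shuffle(queue)  # arrival order within a tick is random
        served += queue[:capacity].count("legit")
    return served / (legit * ticks) if legit else 1.0


if __name__ == "__main__":
    print("load only, 2x capacity:", legit_success_rate(200, 0, 100))
    print("flood, no rate limit:  ", round(legit_success_rate(50, 450, 100), 3))
    print("flood, attacker capped:", legit_success_rate(50, 450, 100, attacker_cap=20))
```

In this model, doubling legitimate load halves the success rate, a flood without rate limiting leaves legitimate users with roughly a 10% success rate, and capping the attacker restores full service. A real test would replace the model with traffic against the actual authentication endpoint.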

See page on Intelligent Design

References

  1. Nassim Nicholas Taleb, The Black Swan: The Impact of the Highly Improbable. (2007) Random House ISBN 978-1-4000-6351-2
  2. Farhad Manjoo, How the World's Richest Country Ran Out of a 75-Cent Face Mask. (2020-03-26) The New York Times p. A22

Other material

  • Wikipedia has a great entry on Ecological resilience that explains many of the interactions to be aware of.