Cookies

From MgmtWiki

Revision as of 10:56, 30 October 2019

Full Title and Meme

Cookies are chunks of data placed in a User Agent (typically a browser) that allow a web site to maintain the context of the user experience, more commonly called state data. See the page User Object for the relationship between the various instantiations of User Information during even a single transaction with a User Agent.

The problem is the capability to track the user that cookies give to the web site, or a widget hosted on the web site. This capability has been targeted by privacy regulations that are not effective.

Context

Cookies have been targeted as evil primarily due to tracking of the user (the second party) by the web site (the first party) and by other widgets hosted by the web page (the third party).

But as we will see below, third party cookies have other security vulnerabilities that are potentially more severe.

Third Party Cookies

Third party cookie is the current term of art for what RFC 2109 described, more precisely, as an unverifiable transaction; to quote from the RFC:

  A transaction is verifiable if the user has the option to review the request-URI prior to its use in the transaction.
  A transaction is unverifiable if the user does not have that option.
  Unverifiable transactions typically arise when a user agent automatically requests
  inlined or embedded entities or when it resolves redirection (3xx) responses from an
  origin server.  Typically the origin transaction, the transaction
  that the user initiates, is verifiable, and that transaction may
  directly or indirectly induce the user agent to make unverifiable transactions.

History

Starting from the entry on HTTP Cookie in Wikipedia we find that Lou Montulli of Netscape ported cookies from Unix to the Mosaic browser in 1994 to enable an e-commerce application that was requested by Vint Cerf, inter alia. The point was to save state on the client computer rather than on the server. While this was not the only solution for creating session state between the user (as a client) and the web site (as a server), it proved to be flexible and the most reliable. David Kristol at Bell Labs started the standardization process in April 1995 [1], at the same time that Netscape applied for a patent. The IETF issued RFC 2109 "HTTP State Management Mechanism" in February 1997. By then advertising companies were already using third-party cookies. The recommendation of RFC 2109 about third-party cookies was not followed by Netscape and Internet Explorer. RFC 2109 was superseded by RFC 2965 in October 2000, and it remained a listed standard as of May 2018.

RFC 2965 added a new definition of cookie that was deprecated by RFC 6265 in April 2011; the latter was written as a definitive specification for cookies as used in the real world. [2] This RFC is also a standards track document, but it is not listed in IETF STD 1 (May 2018), which leaves open the question of what a standard cookie is right now. But that is merely a technical issue, as the world of cookies seems to have been standardized by fiat.

Problems

Security

The first of the Laws of Security says that if an attacker can run code on your computer, it is not just your computer any longer. Ever since JavaScript became necessary to render most web pages, the page that you see in your browser is almost certainly running code that is not under your control. If it's not your computer, an attacker can do anything that the code allows it to do, including just hijacking the computing power for its own purposes. In particular it can save cookies and communicate with any site on the web that is not blocked by a firewall. That is why the term "Unverified Transactions" is better than "Third Party Cookie": it names the real threat posed by these chunks of code. So the origin server (as defined in RFC 2109) is the one with the URL in the browser header, and the URL (source) for the "unverified transaction" is not easily discovered by even an expert user.

It is not clear that blocking third party cookies will solve the privacy problem if the third party ad is still allowed to run code on your computer. Consider a recent report CSS Is So Overpowered It Can Deanonymize Facebook Users, which makes the point that running untrusted code in the same process as a trusted app is not a good security plan no matter what the threat might be.

A large amount of effort has been expended to block vulnerabilities, like cross-site scripting attacks, that could have been avoided by blocking third party cookies. The monetary benefit of advertising, and user unwillingness to pay directly for the content that they consume, has resulted in the current payment structure for the internet. It is not likely that attempts to block third party cookies will make any headway until some other payment mechanism is created. It was true when the browser wars were fought that users paid nothing for the browser while the web servers were expensive and quite profitable. It does not seem that anything has changed with that imbalance in the intervening years. If anything, the market value of the user device software has continued to erode while the advertising revenue continues to climb.

Privacy

Recall that the original Warren and Brandeis definition of the right to privacy was the right to be let alone. In the web that would mean the right not to be tracked.

One of the things that web sites are permitted to do by the HTTP protocol is store cookies within your browser that are normally returned to the origin web site whenever the user sends any request or posts any data to that site. RFC 2109 recommends that users be aware of the data stored on their computer and be given the power to accept or reject that action. Technically this is true of modern browsers, but as a practical matter normal users are not expected to know how to find the controls for the cookie, and certainly not the cookie itself, which is stored in hidden folders and is typically encrypted or at least encoded in a non-legible format. In the case of first party cookies it can be argued that the user consented to the user experience of the origin site, which includes cookies. In the case of advertisement space sold to the highest bidder, neither the user nor the web site has any knowledge of the identity of the code that is running in the iFrame hosting that advertisement. Ad blockers that impact all cookies will likely have an adverse impact on the user experience. There is no such justification for code running on the user's computer from some unknown site, especially since the security boundary between an iFrame and the main web page is known to be porous. Blocking cross-site scripting attacks is an art that has been mastered by few web masters. That does not impact the money that web sites can get from hosting advertisements that are sold to the highest bidder; it is just too profitable to host them.

Solutions

Even though this page is written about cookies, it should be clear that cookies function as state variables which is their raison d'etre. Given the RESTful nature of most web sites, maintaining state will continue as their major function. Like any object associated with a user, it will have some sort of identifier that can be used to correlate user behaviors. The solutions range from the original opaque blob whose very existence is unknown to the user, to a highly structured message where the user is expected to understand and act on the information. Before trying to select an appropriate solution, the user needs and expectations should be determined. No evidence has been found that the users' reaction to any of this has been researched. Any information of that sort would be valuable if it can be found or generated.

  • The original RFC 2109 proposed that users be given control of what cookies are stored on their computer. This was never tried by a browser in wide deployment, and the projected and actual volume of cookies, together with their opacity to the browser, makes that proposal unlikely to succeed no matter how much work is thrown at it.
  • Vendor Relationship Management (VRM) was proposed in the book The Intention Economy[3] as a way to let the customer take charge. VRM is a worked-out version of the solution proposed in RFC 2109. Neither of these resonates with customers. Customers just want to complete their transactions, not to make multiple trust decisions first. Not that trust is unimportant. As Amazon has shown, the customer wants to make trust decisions only very seldom, and then stick with that decision until they are disappointed.
  • Consent Receipts are headed in the same direction of giving the customer more control. Like the solutions before this, they will cause Cognitive Overload for the customer at a point where their attention is focused on completing the transaction. While the Consent Receipt is a transaction record, rather than a state record, receipts will need to be aggregated to report state if they are to have any value to the user. In fact it has been proposed that the consumer should be able to ask for current state in the form of a Consent Receipt. It is hard to imagine how there will be any different response from the user than has been shown in the past. All three of these solutions assume that the problem is that the customer does not have the data they need, even though customers have not asked for more data. They also suffer from the diversity of consumer devices, which is probably solved with a trust framework or other third party as discussed below.
  • Block Chain will solve all problems, provided you have the energy of a small sun available to power it. There are some efforts to reduce the power consumption. If those work, it looks like the solution will just be some other trust framework, but with a different name. For example the proposal of Hardjono et al.[4] goes to a resilient system like the internet, which is, of necessity, single rooted even though nearly all components can work independently for a time, just as the internet itself functions today.
  • Providing the customer with a trust framework has worked for Amazon. It had started to work for Yahoo and eBay before, but they have been slow to adapt to new competition. It has the advantage of being cloud based instead of device based, and so the user experience varies little from device to device. This result flies in the face of the current received wisdom of the privacy gurus, as the consumer is not in control. Time will tell how granularity of control is accepted in the future. It has not been overwhelmingly successful in the past. Perhaps a trust framework that is based in a non-profit would open up competition to more players.
  • Ad Blockers for third party populated iFrames, especially intelligent ones that consider the reliability of the source, do work for those consumers that are willing to set them up, mostly just geeks. One problem is that ad blockers are deployed per device. As consumers acquire more devices, the behavior on each device is inevitably different and the results of an action on the part of the consumer vary from time to time. Recently Microsoft released Windows code that aims to unify the user experience across devices. It will be interesting to track the success of that effort.
  • Working with the advertisers and their agencies has been a challenge as they only see their lucrative business model under threat from bureaucrats and geeks that do not understand business. It is up to the geeks to create a solution that overcomes their justifiable fears. After all, the advertisers are the only source of funding for a significant fraction of today's web.
  • Consumer commons had started working on a first person cookie, which uses first person in a different sense than that above. The goal appears to be a replacement for third person cookies from advertisers. It is hard to see these two organizations coming to any common understanding. The commons are also addressing ad blockers. I have been to a couple of their sessions, but haven't seen any evidence that they appreciate the large user experience challenges or why session state is needed. Perhaps this will evolve into a User Stipulation as to the behavior the user expects. In that case it will no longer be an expression of state and probably should have a different name than "cookie".
  • Mike West's GitHub Explainer: Tightening HTTP State Management is a collection of interesting ideas for discussion, nothing more, nothing less. The site includes telemetry data from Chrome that is very helpful. The proposal seems to be a bit like VRM but based on the origin site, rather than the vendor. The value cannot be manipulated by vendor JavaScript. (2018-08-27)
 No solution will work if the consumers don't use it.
 Let's take the Steve Jobs approach of building an elegant solution for consumers.
 Then let's test it with our parents and children rather than other geeks.

References

  1. David Kristol; HTTP Cookies: Standards, privacy, and politics, ACM Transactions on Internet Technology, 1(2), 151–198, 2001 (arXiv:cs/0105018v1 [cs.SE])
  2. Jeff Hodges and Bill Corry HTTP State Management Mechanism to Proposed Standard [1]
  3. Doc Searls The Intention Economy, Harvard Business Review Books 2012 ISBN 9781422158524
  4. Thos Hardjono, +2 Towards a Design Philosophy for Inter-operable Blockchain Systems MIT May 16, 2018 [2]

Other Material

The following was posted on stack-overflow.


The original Netscape cookie_spec leaves several behaviours unstated (Netscape was terrible at writing specs), but the following seem to be consistently supported by browsers:

either the NAME or the VALUE may be empty strings

if there is no = symbol in the string at all, browsers treat it as the cookie with the empty-string name, ie Set-Cookie: foo is the same as Set-Cookie: =foo.

when browsers output a cookie with an empty name, they omit the equals sign. So Set-Cookie: =bar begets Cookie: bar.

commas and spaces in names and values do actually seem to work, though spaces around the equals sign are trimmed

control characters (\x00 to \x1F plus \x7F) aren't allowed
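The splitting rules listed above can be illustrated with a minimal Python sketch. It is not taken from any browser's source; the function names `split_cookie_pair` and `cookie_header_fragment` are hypothetical, and the code only mirrors the behaviours described in this section (missing `=`, empty names, trimming of spaces around the equals sign).

```python
from typing import Tuple


def split_cookie_pair(set_cookie_value: str) -> Tuple[str, str]:
    """Split the leading name=value part of a Set-Cookie header value.

    Hypothetical helper: it follows the browser behaviour described in
    the text, not any formal grammar.
    """
    # Everything after the first ';' is attributes (Path, Expires, ...).
    pair = set_cookie_value.split(";", 1)[0]
    if "=" in pair:
        name, value = pair.split("=", 1)
    else:
        # No '=' at all: treated as a cookie with the empty-string name.
        name, value = "", pair
    # Spaces around the equals sign (and at the ends) are trimmed.
    return name.strip(" \t"), value.strip(" \t")


def cookie_header_fragment(name: str, value: str) -> str:
    """Render one cookie as it would be echoed in a Cookie header."""
    # A cookie with an empty name is sent back without the equals sign.
    return value if name == "" else f"{name}={value}"
```

With this sketch, `Set-Cookie: foo` and `Set-Cookie: =foo` both yield the pair `("", "foo")`, and that pair is rendered back as plain `foo`.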

What isn't mentioned, and what browsers are totally inconsistent about, is non-ASCII (Unicode) characters:

  • in Opera and Google Chrome, they are encoded to Cookie headers with UTF-8;
  • in IE, the machine's default code page is used (locale-specific and never UTF-8);
  • Firefox (and other Mozilla-based browsers) use the low byte of each UTF-16 code point on its own (so ISO-8859-1 is OK but anything else is mangled);
  • Safari simply refuses to send any cookie containing non-ASCII characters.

So in practice you cannot use non-ASCII characters in cookies at all. If you want to use Unicode, control codes or other arbitrary byte sequences, the cookie_spec demands you use an ad-hoc encoding scheme of your own choosing and suggests URL-encoding (as produced by JavaScript's encodeURIComponent) as a reasonable choice.
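As a rough server-side counterpart to encodeURIComponent, the following Python sketch percent-encodes a value as UTF-8 before it is placed in a cookie and decodes it on the way back. The helper names are hypothetical, and the `safe` set is an assumption chosen to approximate the characters encodeURIComponent leaves alone; it is one possible encoding scheme, not one that any spec mandates.

```python
from urllib.parse import quote, unquote


def encode_cookie_value(text: str) -> str:
    """Percent-encode arbitrary text (as UTF-8) for use as a cookie value."""
    # Letters, digits and "-_.~" are left alone by quote(); the extra
    # safe characters approximate what encodeURIComponent keeps unescaped.
    return quote(text, safe="!*'()")


def decode_cookie_value(raw: str) -> str:
    """Reverse encode_cookie_value()."""
    return unquote(raw)
```

Spaces, commas, semicolons and non-ASCII characters all come out as `%XX` escapes, so the encoded value contains only ASCII characters that browsers handle consistently.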

In terms of actual standards, there have been a few attempts to codify cookie behaviour but none thus far actually reflect the real world.

RFC 2109 was an attempt to codify and fix the original Netscape cookie_spec. In this standard many more special characters are disallowed, as it uses RFC 2616 tokens (a - is still allowed there), and only the value may be specified in a quoted-string with other characters. No browser ever implemented the limitations, the special handling of quoted strings and escaping, or the new features in this spec.

RFC 2965 was another go at it, tidying up 2109 and adding more features under a ‘version 2 cookies’ scheme. Nobody ever implemented any of that either. This spec has the same token-and-quoted-string limitations as the earlier version and it's just as much a load of nonsense.

RFC 6265 is an HTML5-era attempt to clear up the historical mess. It still doesn't match reality exactly but it's much better than the earlier attempts: it is at least a proper subset of what browsers support, not introducing any syntax that is supposed to work but doesn't (like the previous quoted-string).

In RFC 6265 the cookie name is still specified as an RFC 2616 token, which means you can pick from the alphanums plus:

!#$%&'*+-.^_`|~
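A check for this name rule can be sketched in a few lines of Python. The name `is_valid_cookie_name` is hypothetical; the character set is simply the alphanumerics plus the punctuation listed above, and the sketch assumes that a token must contain at least one character (so the empty names that browsers tolerate are rejected).

```python
import string

# Alphanumerics plus the token punctuation listed above.
TOKEN_CHARS = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")


def is_valid_cookie_name(name: str) -> bool:
    """Return True if name is a non-empty run of token characters."""
    return len(name) > 0 and all(ch in TOKEN_CHARS for ch in name)
```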

In the cookie value it formally bans the (filtered by browsers) control characters and (inconsistently-implemented) non-ASCII characters. It retains cookie_spec's prohibition on space, comma and semicolon, plus for compatibility with any poor idiots who actually implemented the earlier RFCs it also banned backslash and quotes, other than quotes wrapping the whole value (but in that case the quotes are still considered part of the value, not an encoding scheme). So that leaves you with the alphanums plus:

!#$%&'()*+-./:<=>?@[]^_`{|}~
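The value rule can be sketched the same way. The regular expression below is an illustrative reading of the description above (printable ASCII minus space, double quote, comma, semicolon and backslash, optionally wrapped in one pair of double quotes); `is_valid_cookie_value` is a hypothetical name, not part of any library.

```python
import re

# Printable ASCII except space, double quote, comma, semicolon, backslash.
_COOKIE_OCTETS = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*"

# Either a bare run of allowed characters, or the same run wrapped in
# one pair of double quotes (the quotes stay part of the value).
_COOKIE_VALUE = re.compile(rf'(?:{_COOKIE_OCTETS}|"{_COOKIE_OCTETS}")')


def is_valid_cookie_value(value: str) -> bool:
    """Return True if value fits the restricted RFC 6265 value syntax."""
    return _COOKIE_VALUE.fullmatch(value) is not None
```

Code that produces cookies could run such a check before emitting a Set-Cookie header, and fall back to an encoding such as URL-encoding when the check fails.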

In the real world we are still using the original-and-worst Netscape cookie_spec, so code that consumes cookies should be prepared to encounter pretty much anything, but for code that produces cookies it is advisable to stick with the subset in RFC 6265.