Anonymous

From MgmtWiki
==Full Title or Meme==
 
Literally [[Anonymous]] means no name.
 
==Context==
 
* HTTP (used on [[Web Site]]s) was designed to operate without any identification and the [[REST]] protocol appears to enforce that paradigm. [[Cookies]] were invented to overcome that feature without the user even noticing.
* Literally the term [[Anonymous]] means no name, but many users now take it to mean that the name cannot be traced back to them. One would think that [[Anonymous|Pseudonymous]] is the term they actually mean, but somehow anonymous sounds more attractive. The confusion remains irreconcilable.
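The cookie workaround in the first bullet can be made concrete. Below is a minimal sketch, using only Python's standard library, of the round trip that links two otherwise-anonymous HTTP requests; the <code>visitor_id</code> cookie name is invented for this example.

```python
# Minimal sketch of how cookies defeat HTTP statelessness: the server
# plants an identifier in its first response, and the browser echoes it
# back on every later request, linking otherwise-anonymous requests
# into one tracked session. The "visitor_id" name is hypothetical.
from http.cookies import SimpleCookie
import uuid

# Server side, first response: issue an identifier the user never asked for.
set_cookie = SimpleCookie()
set_cookie["visitor_id"] = uuid.uuid4().hex
header_out = set_cookie["visitor_id"].OutputString()  # "visitor_id=<hex>"

# Browser side, every later request: the header comes back automatically.
echoed = SimpleCookie()
echoed.load(header_out)
assert echoed["visitor_id"].value == set_cookie["visitor_id"].value
# Any two requests sharing this value belong to the same trackable visitor.
```

The user takes no action at any point; the linkage happens entirely in headers, which is exactly the "without the user even noticing" feature described above.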
  
 
==Problem==
 
* By 2023 [[Artificial Intelligence]] had become powerful enough, and broadly available enough, to be used to infer user attributes from anonymous postings by any real person.<ref>Mack DeGeurin ''ChatGPT Can 'Infer' Personal Details From Anonymous Text'' Gizmodo (2023-10-27) https://gizmodo.com/chatgpt-llm-infers-identifying-traits-in-anonymous-text-1850934318</ref> This is a reminder that [[Identity]] is not an [[Identifier]], but a collection of personal [[Attribute]]s and [[Behavior]]s.<blockquote>A study by computer scientists at Switzerland's ETH Zurich found that large language models (LLMs) from OpenAI, Meta, Google, and Anthropic can infer a user's race, occupation, location, and other personal information from anonymous text. The findings raise concerns that scammers, hackers, and law enforcement agencies, among others, could use LLMs to identify background information about users from the phrases and types of words they use. The LLM tests involved samples of text from a database of comments from more than 500 Reddit profiles. OpenAI's GPT-4 had an accuracy rate of 85% to 95% in identifying private information from the texts.</blockquote>
 
* While some users may think that their attributes on distinct [[Web Site]]s cannot be correlated, research has shown that preventing such correlation is not possible.<ref>Gina Kolata, ''Can Data be Fully Anonymous? New Algorithms can still identify you'' New York Times (2019-07-24) p. A8.</ref> That reality does not stop users from trying [[Pseudonym]]s to remain [[Anonymous]], but it will never work against a determined adversary.
 
#All HTTP connections come with an [[IP address]] which is often unique to the location of the computer.
#All HTTPS (secure) connections come with a session [[Identifier]] which is needed to maintain the secure connection.
#Most [[Web Site]]s record all HTTP connections for security purposes.
#If the user supplies some sort of credential to allow access to a site, the [[REST]] protocol effectively requires the use of cookies installed on the user's machine to carry the sign-in data from one HTTP request to the next.
#If the user expects any continuity from one sign in session to the next, some sort of user [[Identifier]] is required.
 
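The correlation risk described above can be sketched concretely: two record sets that contain no names at all can still be joined on shared quasi-identifiers. All of the records, site names, and field names below are invented for illustration.

```python
# Sketch of cross-site correlation: neither dataset contains a name,
# yet shared quasi-identifiers (birth year, 3-digit ZIP, gender here)
# act as a join key linking the two profiles. All data is invented.
site_a = [  # e.g. profiles scraped from a health forum
    {"user": "moonchild", "birth_year": 1971, "zip3": "028", "gender": "F"},
    {"user": "rex99",     "birth_year": 1985, "zip3": "100", "gender": "M"},
]
site_b = [  # e.g. accounts from a shopping site
    {"account": "a-113", "birth_year": 1971, "zip3": "028", "gender": "F"},
    {"account": "a-245", "birth_year": 1990, "zip3": "606", "gender": "F"},
]

def link(a_rows, b_rows, keys=("birth_year", "zip3", "gender")):
    """Join two nameless datasets on their shared quasi-identifiers."""
    index = {tuple(row[k] for k in keys): row for row in b_rows}
    out = []
    for row in a_rows:
        key = tuple(row[k] for k in keys)
        if key in index:
            out.append((row, index[key]))
    return out

matches = link(site_a, site_b)
# "moonchild" on site A and account "a-113" on site B are now one person.
```

A real adversary would use fuzzier keys (writing style, posting times, location traces), but the mechanism is the same join.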
* Perhaps the most clueless example of the false hope of anonymity is the sequencing company Nebula, which offers to perform sequencing through the blockchain for complete anonymity.<ref>Megan Molteni, ''You Can Soon Get Your DNA Sequenced Anonymously'' Wired (2019-09-19) https://www.wired.com/story/you-can-soon-get-your-dna-sequenced-anonymously</ref> The problem, of course, is that there is no surer indicator of your identity than your genome. In fact, any police department could immediately try to find you in a huge existing database. This is indicative of the utter cluelessness of blockchain anonymity claims in general; in fact, they make searching for personal data easier than it has ever been before.
* AI can identify people even in anonymized datasets: weekly social interactions form unique signatures that make people stand out.<ref>Nikk Ogasa, ''How AI can identify people even in anonymized datasets'' Science News (2022-01-25) https://www.sciencenews.org/article/ai-identify-anonymous-data-phone-neural-network</ref>

===De-identification===
Does HIPAA '''really''' allow PHI to be de-identified? (well - yes - sort of)
So long as information exists as [[PHI]], its use and disclosure are both limited by the Privacy Rule. HIPAA safe harbor de-identification is the process of the removal of specified identifiers of the patient, and of the patient’s relatives, household members, and employers.  The requirements of the HIPAA safe harbor de-identification process become fully satisfied if, and only if, after the removal of the specific identifiers, the covered entity has no actual knowledge that the remaining information could be used to identify the patient. Once protected health information has been de-identified, it is no longer considered to be PHI; as such, there are no longer restrictions on its use or disclosure. By definition, de-identified health information neither identifies nor provides a reasonable basis to identify a patient. Specific pieces of data (data elements) can, individually or in combination, be used to uniquely identify an individual. The following data elements can be used to uniquely identify, and, as such, must be de-identified under the safe harbor rule:
* Names
* Geographic locators: in the case of zip codes, covered entities are generally permitted to use the first three digits, provided the geographic unit formed by combining those first three digits contains more than 20,000 individuals.
* All elements of dates (except the year) that are related to an individual, including: admission and discharge dates, birth date, date of death, all ages over 89 years old, and elements of dates (including year) that are indicative of age.
* Telephone, cellphone, and fax numbers
* Email addresses
* IP addresses (IP addresses can be used to identify physical addresses)
* Social Security Numbers
* Medical record numbers
* Health plan beneficiary numbers (i.e. the member ID on a patient’s health insurance card)
* Device identifiers and serial numbers (medical devices are assigned unique serial numbers)
* Certificate/license numbers (e.g., driver license numbers and birth certificate numbers)
* Account numbers (e.g., bank account numbers)
* Vehicle identifiers and serial numbers, including license plates
* [[Web Site]] URLs (if a URL is logged within a specific application, the URL can be used to uniquely identify an individual)
* Full face photos and comparable images
* [[Biometric Identifier]]s (including fingerprints, voice prints, and retinal images)
* Any unique identifying numbers, characteristics, or codes
Once these specific identifiers have been removed, the covered entity must have no actual knowledge that the remaining information could be used to identify the patient. If this “no actual knowledge” requirement is satisfied, the PHI has been successfully de-identified under the safe harbor method. There is the catch: as AI technology advances, this bar keeps moving down, to the point where the surviving data is soon of very limited value. In other words, this safe harbor can only be used until the tide goes out, and that continues to be the direction the tide is taking.
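Two of the safe-harbor generalizations above are mechanical enough to sketch in code: ZIP codes cut to their first three digits (and suppressed when the three-digit area is too small), and ages over 89 collapsed into one bucket. This is a toy illustration, not a compliant implementation: the <code>ZIP3_POPULATION</code> table is invented, and real safe-harbor work requires current Census data plus all of the identifier classes listed above.

```python
# Toy sketch of two HIPAA safe-harbor generalizations. The population
# table is invented; real use requires current Census figures.
ZIP3_POPULATION = {"028": 150_000, "036": 12_000}  # hypothetical counts

def deidentify_zip(zip_code: str) -> str:
    zip3 = zip_code[:3]
    # Safe harbor: the combined 3-digit area must exceed 20,000 people;
    # otherwise the whole prefix is suppressed.
    return zip3 if ZIP3_POPULATION.get(zip3, 0) > 20_000 else "000"

def deidentify_age(age: int) -> str:
    # Exact ages over 89 are identifying because so few people reach them,
    # so they are aggregated into a single "90+" category.
    return str(age) if age <= 89 else "90+"

assert deidentify_zip("02881") == "028"  # populous area: prefix allowed
assert deidentify_zip("03601") == "000"  # sparse area: suppressed
assert deidentify_age(93) == "90+"
```

Note how the rules are generalizations, not deletions: they widen the pool of people each record could describe, which is exactly what the re-identification attacks described above keep narrowing back down.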
* Before you accept the idea that it is possible for someone to collect data about you that does not identify you, consider this: the more attributes that are linked together, the larger the pool of [people that are not you] gets. Eventually it gets to the point where you are the only one left in the pool of people that the attributes describe.
* As [[Artificial Intelligence]] gets better, it can extract more information from data. Consider that in 2022 [https://www.uc3m.es/ss/Satellite/UC3MInstitucional/en/Detalle/Comunicacion_C/1371331085966/1371215537949/An_algorithm_makes_it_possible_to_identify_people_by_their_heartbeat an algorithm made it possible to identify people by their heartbeat], so that value is now just another biometric identifier.
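The shrinking-pool argument above can be sketched numerically. Every person, attribute, and value below is invented for illustration:

```python
# Numeric sketch of the shrinking pool: each attribute linked to "you"
# throws more [people that are not you] out of contention, until the
# candidate pool holds exactly one person. All data is invented.
people = [
    {"gender": "F", "birth_year": 1971, "zip3": "028", "dog_breed": "corgi"},
    {"gender": "F", "birth_year": 1971, "zip3": "028", "dog_breed": "lab"},
    {"gender": "F", "birth_year": 1971, "zip3": "100", "dog_breed": "corgi"},
    {"gender": "M", "birth_year": 1985, "zip3": "028", "dog_breed": "corgi"},
]
you = {"gender": "F", "birth_year": 1971, "zip3": "028", "dog_breed": "corgi"}

pool = people
for attr, value in you.items():
    pool = [p for p in pool if p[attr] == value]
    print(f"after matching {attr!r}: {len(pool)} candidate(s) remain")
# gender leaves 3, birth_year 3, zip3 2, dog_breed exactly 1: identified.
```

With realistic populations the same collapse happens; famously, a handful of quasi-identifiers such as birth date, ZIP code, and gender already single out most people.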
===Anonymous does not mean Private===
Just because there is no subject name associated with a collection of attributes does not mean that the person to whom those attributes apply cannot be identified.
*[https://www.iit.edu/news/anonymous-data-doesnt-mean-private-researchers-say Anonymous Data Doesn’t Mean Private, Researchers Say at the Illinois Institute of Technology] A research team at Illinois Institute of Technology has extracted personal information, specifically protected characteristics like age and gender, from anonymous cell phone data using machine learning and artificial intelligence algorithms, raising questions about data security.
  
 
==Solution==
 
*The most trustworthy [[Web Site]]s will tell you when they identify you, but it has not historically been necessary that they do so.
* Legislation from the EU and California now requires [[Web Site]]s to be more forthcoming about how they collect and use data.
*Well-meaning [[Technology Solution]]s are proclaimed every few months; none really works for any extended period of time if there is any [[Credential Aggregation]] needed by an individual. For example, the State of Rhode Island cooperated with Brown University<ref> Justine S. Hastings +4, ''Unlocking Data to Improve Public Policy.'' '''CACM 62''' (2019-10) p. 48ff</ref> to show how that state could overcome identification. Given the information in the first reference, it is only a matter of time before other academics recover individual identities.
  
 
==References==
 
 
[[Category:Glossary]]
[[Category:Privacy]]
[[Category:Health]]

Latest revision as of 13:45, 28 October 2023
