Article Id: WHEBN0034057814

Title: De-identification  
Author: World Heritage Encyclopedia
Language: English
Subject: Personal genomics, Pseudonymization, Protected health information, Honest broker, Right to withdraw
Collection: Data Protection, Electronic Health Records, Information Privacy, Research Ethics
Publisher: World Heritage Encyclopedia


De-identification is the process used to prevent a person's identity from being connected with information. A common use of de-identification is in human subjects research, where it protects the privacy of research participants. Common strategies for de-identifying data sets include deleting or masking personal identifiers, such as name and Social Security number, and suppressing or generalizing quasi-identifiers, such as date of birth and ZIP code. The reverse process, defeating de-identification to identify individuals, is known as re-identification.
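The strategies above can be illustrated with a minimal Python sketch. The record layout (`Record`), the field names, and the specific rules (masking the Social Security number entirely, keeping only the birth year and the first three ZIP code digits) are hypothetical choices for illustration, not a standard that guarantees de-identification.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout used only for this illustration.
    name: str
    ssn: str
    birth_date: date
    zip_code: str
    diagnosis: str


def deidentify(record: Record) -> dict:
    """Return a de-identified view of one record (illustrative rules only)."""
    return {
        # Direct identifiers: the name is deleted (no key is emitted for it),
        # and the Social Security number is masked with a fixed placeholder.
        "ssn": "XXX-XX-XXXX",
        # Quasi-identifiers: the birth date is generalized to its year and
        # the ZIP code to its first three digits.
        "birth_year": record.birth_date.year,
        "zip3": record.zip_code[:3],
        # The attribute of research interest passes through unchanged.
        "diagnosis": record.diagnosis,
    }


if __name__ == "__main__":
    patient = Record("Jane Doe", "123-45-6789", date(1980, 5, 17), "02139", "asthma")
    print(deidentify(patient))
```

Generalization keeps some analytic value (birth cohorts, broad regions) while making each record match more people; whether a birth year and three ZIP digits are coarse enough depends on the population and on what other data could be linked against it.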


  • 1 Example
  • 2 Anonymization and de-identification
  • 3 Applications
  • 4 Limits
  • 5 De-identification laws in the United States of America
    • 5.1 Safe harbor
    • 5.2 Research on decedents
  • 6 See also
  • 7 References
  • 8 External links


Example

Suppose a survey, such as a census, is conducted to collect information about a group of people. To encourage participation and protect respondents' privacy, the researchers design the survey so that, when the results are published, no participant's individual response can be matched to any of the published data.
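One common way to pursue this goal is to publish only aggregate counts and to withhold any count small enough to single out a respondent, a technique known as small-cell suppression. The Python sketch below assumes a hypothetical threshold of 5; the article itself does not prescribe this technique or any particular threshold.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def publish_counts(responses: list[str], threshold: int = 5) -> dict[str, Optional[int]]:
    """Tabulate answers, replacing any count below `threshold` with None."""
    counts = Counter(responses)
    # Small cells are suppressed because a count of 1 or 2 can point to a
    # specific, recognizable respondent.
    return {
        answer: (n if n >= threshold else None)
        for answer, n in sorted(counts.items())
    }


if __name__ == "__main__":
    answers = ["yes"] * 7 + ["no"] * 2 + ["unsure"] * 5
    print(publish_counts(answers))
```

If totals are published alongside the table, a single suppressed cell can be recovered by subtraction, so practical disclosure control usually suppresses additional (complementary) cells as well.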

Anonymization and de-identification

Anonymization refers to irreversibly severing a data set from the identity of the data contributor in a study to prevent any future re-identification, even by the study organizers under any condition.[1][2] De-identification also severs a data set from the identity of the data contributor, but it may preserve identifying information that only a trusted party can use to re-link the data in certain situations.[1][2][3] There is debate in the technology community about whether data that can be re-linked, even by a trusted party, should ever be considered de-identified.
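The distinction can be sketched in Python using coded identifiers. Here a hypothetical trusted party, the `HonestBroker` class (named after the honest-broker role in research data management), keeps the only table linking random codes to identities, so the researcher's copy is de-identified but re-linkable, while `anonymize` simply discards the identifier and keeps no key anywhere. The class, function names, and field names are illustrative assumptions.

```python
import secrets


class HonestBroker:
    """Hypothetical trusted party holding the only code-to-identity key."""

    def __init__(self) -> None:
        self._key: dict[str, str] = {}

    def assign_code(self, identity: str) -> str:
        # A random code carries no information about the identity itself.
        code = secrets.token_hex(8)
        while code in self._key:  # guard against an (unlikely) collision
            code = secrets.token_hex(8)
        self._key[code] = identity
        return code

    def relink(self, code: str) -> str:
        # Only the broker, which holds the key, can reverse the coding.
        return self._key[code]


def deidentify(record: dict, broker: HonestBroker) -> dict:
    """Replace the name with a code; the broker can still re-link it."""
    coded = {k: v for k, v in record.items() if k != "name"}
    coded["code"] = broker.assign_code(record["name"])
    return coded


def anonymize(record: dict) -> dict:
    """Drop the name outright; no key exists, so nobody can re-link it."""
    return {k: v for k, v in record.items() if k != "name"}
```

The debate described above concerns the first function: whether data that the broker can still re-link deserves to be called de-identified at all.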


Applications

Research into de-identification is driven mostly by the need to protect health information.[4] Some libraries have adopted methods used in the healthcare industry to preserve their readers' privacy.[4]


Limits

Whenever a person participates in genetics research, the donation of a biological specimen often results in the creation of a large amount of personalized data. Such data is uniquely difficult to de-identify.[5]

Anonymization of genetic data is particularly difficult because of the huge amount of genotypic information in biospecimens,[5] the ties that specimens often have to medical history,[6] and the advent of modern bioinformatics tools for data mining.[6] Researchers have demonstrated that even aggregate collections of genotypic data can reveal whether a specific, identified person contributed a specimen.[7]

Some researchers have argued that participants in genetics research should never be promised anonymity; instead, they should be taught the limits of using coded identifiers in a de-identification process.[2]
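The limit of coded identifiers can be illustrated with a deliberately simple linkage sketch in Python: even when names are replaced by codes, the genotype data itself can act as an identifier if someone holds identified genotypes from another source. The toy data, the function name, and the exact-match rule are assumptions for illustration; the demonstration cited above[7] used a statistical test against aggregate data, not exact matching on individual records.

```python
from collections import defaultdict

Genotype = tuple[str, ...]


def link_by_genotype(
    coded: dict[str, Genotype], identified: dict[str, Genotype]
) -> dict[str, str]:
    """Map research codes to names whose genotype profile matches uniquely."""
    # Index the identified profiles so each coded record is looked up once.
    by_profile: dict[Genotype, list[str]] = defaultdict(list)
    for name, profile in identified.items():
        by_profile[profile].append(name)

    matches = {}
    for code, profile in coded.items():
        names = by_profile.get(profile, [])
        # Exactly one candidate re-identifies the coded record; zero or
        # several candidates leave it ambiguous in this simple model.
        if len(names) == 1:
            matches[code] = names[0]
    return matches


if __name__ == "__main__":
    # Toy SNP calls at three hypothetical loci.
    research = {"c1": ("AA", "AG", "GG"), "c2": ("AG", "AG", "AA")}
    reference = {"Alice": ("AA", "AG", "GG"), "Bob": ("GG", "AA", "AG")}
    print(link_by_genotype(research, reference))
```

With only three loci, unrelated people often share a profile; with enough loci, profiles become effectively unique (identical twins aside), which is why replacing names with codes offers limited protection for genetic data.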

De-identification laws in the United States of America

Safe harbor

Researchers sometimes hold data about human subjects that would be valuable to other researchers and want to share it. A common case is a hospital that collects large amounts of medical data on its patients, which other organizations could usefully review for medical research. Revealing the identities of the people whose data would be shared would be unethical, because those people have a right to privacy. Before the data can be shared, it must first be de-identified so that no one who sees it can associate a particular person with their data.

The difficulty lies in determining which data can identify a person. One model for deciding what data cannot be shared is the United States' policy on protected health information under the HIPAA Privacy Rule, which lists specific identifiers. A researcher who removes these identifiers from a data set is said to be in a "safe harbor", having taken reasonable action to protect the identities of the people whose data was collected.[8]
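The Safe Harbor method enumerates 18 categories of identifiers. The Python sketch below applies only a few of its rules as they are commonly summarized: direct identifiers are removed, dates related to the individual are reduced to the year, ZIP codes are truncated to three digits (or replaced by "000" when the prefix covers 20,000 or fewer people), and ages over 89 are grouped into a single "90+" category. The field names are hypothetical, the caller must supply the restricted ZIP prefixes from current Census data (the prefix used below is only an example input), and because the sketch covers only part of the rule it is not a compliance tool.

```python
from __future__ import annotations

from datetime import date

# Hypothetical field names standing in for a few of the 18 identifier types.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "medical_record_number"}


def safe_harbor_subset(record: dict, restricted_zip3: set[str]) -> dict:
    """Apply a partial, illustrative subset of the Safe Harbor rules."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # ZIP codes keep only their first three digits; prefixes whose area
    # holds 20,000 or fewer people (supplied by the caller) become "000".
    if "zip_code" in out:
        zip3 = out.pop("zip_code")[:3]
        out["zip3"] = "000" if zip3 in restricted_zip3 else zip3

    # Dates directly related to the individual are reduced to the year.
    if "admission_date" in out:
        out["admission_year"] = out.pop("admission_date").year

    # Ages over 89 are aggregated into a single category.
    if "age" in out and out["age"] > 89:
        out["age"] = "90+"

    return out


if __name__ == "__main__":
    patient = {
        "name": "John Roe",
        "ssn": "123-45-6789",
        "zip_code": "03601",
        "admission_date": date(2010, 3, 4),
        "age": 93,
        "diagnosis": "influenza",
    }
    # Example restricted-prefix set; a real run needs the current list.
    print(safe_harbor_subset(patient, restricted_zip3={"036"}))
```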

Research on decedents

The key law governing research on electronic health record data is the HIPAA Privacy Rule, which permits the use of electronic health records of deceased subjects for research (HIPAA Privacy Rule, section 164.512(i)(1)(iii)).[9]

See also

Data anonymization

References

  1. Godard, B. A.; Schmidtke, J. R.; Cassiman, J. J.; Aymé, S. G. N. (2003). "Data storage and DNA banking for biomedical research: Informed consent, confidentiality, quality issues, ownership, return of benefits. A professional perspective". European Journal of Human Genetics 11: S88–122.
  2. Fullerton, S. M.; Anderson, N. R.; Guzauskas, G.; Freeman, D.; Fryer-Edwards, K. (2010). "Meeting the Governance Challenges of Next-Generation Biorepository Research". Science Translational Medicine 2 (15): 15cm3.
  3.
  4. Nicholson, S.; Smith, C. A. (2006). "Using lessons from health care to protect the privacy of library users: Guidelines for the de-identification of library data based on HIPAA". Proceedings of the American Society for Information Science and Technology 42.
  5. McGuire, A. L.; Gibbs, R. A. (2006). "GENETICS: No Longer De-Identified". Science 312 (5772): 370–371.
  6. Thorisson, G. A.; Muilu, J.; Brookes, A. J. (2009). "Genotype–phenotype databases: Challenges and solutions for the post-genomic era". Nature Reviews Genetics 10 (1): 9–18.
  7. Homer, N.; Szelinger, S.; Redman, M.; Duggan, D.; Tembe, W.; Muehling, J.; Pearson, J. V.; Stephan, D. A.; Nelson, S. F.; Craig, D. W. (2008). Visscher, Peter M., ed. "Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays". PLoS Genetics 4 (8): e1000167.
  8. "HIPAA Privacy Rule and Its Impacts on Research". 2011. Retrieved 9 December 2011.
  9. 45 C.F.R. § 164.512.

External links

  • A training series on United States government de-identification standards
  • Guidance Regarding Methods for De-identification of Protected Health Information
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.