The Federal Trade Commission routinely evaluates the privacy representations a company makes against its data handling practices.[1] When discrepancies arise between claim and reality, incorrect assertions about data identification are often to blame. Companies often claim and act as if data that lacks clearly identifying information is anonymous, but data is only anonymous when it can never be associated back to a person. If data can be used to uniquely identify or target a user, it can still cause that person harm.
One way that companies obscure personal data is through “hashing.”[2] Hashing involves taking a piece of data (like an email address, a phone number, or a user ID) and using math to turn it into a number (called a hash) in a consistent way: the same input data will always create the same hash. For example, hashing the fictional phone number “123-456-7890” transforms it into the hash “2813448ce6316cb70b38fa29c8c64130”, a hexadecimal number that might appear random, but is always what someone gets when they hash that phone number.
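The consistency property described above can be illustrated with a minimal Python sketch. It assumes SHA-256 from the standard library as the hash function; the post does not say which function produced its example value, so the digest computed here will not match it. The helper name hash_identifier is hypothetical.

```python
import hashlib


def hash_identifier(value: str) -> str:
    """Return the SHA-256 hex digest of a string (assumed hash function)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hashing the same fictional phone number twice gives the same digest.
first = hash_identifier("123-456-7890")
second = hash_identifier("123-456-7890")

# Changing a single digit gives a different digest.
other = hash_identifier("123-456-7891")

print(first == second)  # True
print(first == other)   # False
```

Because the mapping is deterministic, anyone who hashes the same input with the same function gets the same value, which is what makes a hash usable for matching records later.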
Hashing has a nice potential benefit: a hash by itself cannot easily be used to guess what the original data was. For this reason, companies often use hashing in cases where they are uncomfortable writing down or sharing the directly identifying data, but they still want to be able to store the data for matching against later. Since the hash “2813448ce6316cb70b38fa29c8c64130” appears meaningless and seemingly can’t be used to find the original phone number, companies often claim that hashing allows them to preserve user privacy.
This logic is as old as it is flawed – hashes aren’t “anonymous” and can still be used to identify users, and their misuse can lead to harm. Companies should not act or claim as if hashing personal information renders it anonymized. FTC staff will remain vigilant to ensure companies are following the law and take action when the privacy claims they make are deceptive.
In 2012, former Chief Technologist Ed Felten wrote a tech blog post titled “Does Hashing Make Data ‘Anonymous’?”[3] To save a click, the answer is no, it does not.[4] While hashing might obscure how a user identifier appears, it still creates a unique signature that can track a person or device over time. The warning was (and continues to be) clear: do not rely on hashing to reduce data sensitivity.
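The guess-and-check reversal that Felten described for common identifiers can be sketched as follows. This is an illustration under stated assumptions, not his code: it assumes SHA-256, and it assumes the attacker already knows the first six digits of the fictional phone number, which leaves only 10,000 candidates to try. The names reverse_hash and phone_candidates are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, Optional


def hash_identifier(value: str) -> str:
    """Return the SHA-256 hex digest of a string (assumed hash function)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def phone_candidates(prefix: str = "123-456-") -> Iterator[str]:
    """Yield every phone number sharing an assumed known prefix."""
    for last_four in range(10_000):
        yield f"{prefix}{last_four:04d}"


def reverse_hash(target: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the one matching the target, if any."""
    for candidate in candidates:
        if hash_identifier(candidate) == target:
            return candidate
    return None


# A hash that was stored or shared instead of the raw phone number.
leaked_hash = hash_identifier("123-456-7890")

recovered = reverse_hash(leaked_hash, phone_candidates())
print(recovered)  # 123-456-7890
```

The same loop works over the full space of ten-digit phone numbers; it only takes longer, because the set of possible inputs is still small enough to enumerate.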
Unfortunately, some companies failed to heed this warning. In 2015, the FTC brought a case against Nomi,[5] alleging that the company had surveilled consumers within stores using their devices’ MAC addresses – a MAC address is a number that identifies a device when it connects to a network. The complaint explained, “Nomi cryptographically hashes the MAC addresses it observes prior to storing them on its servers. Hashing obfuscates the MAC address, but the result is still a persistent unique identifier.”[6]
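To see why a hashed MAC address remains a persistent unique identifier, consider the sketch below. The sightings, store names, and MAC addresses are invented for illustration, SHA-256 is an assumed hash function, and nothing here is meant to describe how any company's system actually worked.

```python
import hashlib
from collections import defaultdict


def hash_mac(mac: str) -> str:
    """Return the SHA-256 hex digest of a normalized MAC address."""
    return hashlib.sha256(mac.lower().encode("utf-8")).hexdigest()


# Hypothetical sightings: (location, date, observed MAC address).
sightings = [
    ("store_a", "2024-01-01", "AA:BB:CC:DD:EE:01"),
    ("store_b", "2024-01-03", "AA:BB:CC:DD:EE:01"),
    ("store_a", "2024-01-03", "AA:BB:CC:DD:EE:02"),
]

# Only the hash is stored, yet every sighting of one device still
# lands under the same key, so its movements can be linked over time.
visits_by_hash = defaultdict(list)
for location, date, mac in sightings:
    visits_by_hash[hash_mac(mac)].append((location, date))

for hashed, visits in visits_by_hash.items():
    print(hashed[:12], visits)
```

The raw MAC address never appears in the stored data, but the first device's two visits are still grouped together, which is the tracking capability the complaint describes.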
Nomi wasn’t the only company the Commission alleged incorrectly relied on hashing to make data less sensitive. In 2022, the FTC brought a case against the online counseling service BetterHelp,[7] alleging it had shared consumers’ sensitive health data, including hashed email addresses, with Facebook. The complaint laid out that BetterHelp knew that Facebook would “undo the hashing and reveal the email addresses of those Visitors and Users.” Though BetterHelp sent hashes to Facebook, rather than email addresses, the outcome was the same: Facebook allegedly learned who was seeking counseling for mental health and used that sensitive information to target ads to them.
The privacy harms in both cases originate from the fact that the companies could identify users, not the way that they did so. Hashing is just one tool used in persistent user identification, and the FTC has recently called out other mechanisms of user tracking that rely on pseudonymous identifiers.
In 2023, the FTC brought a complaint[8] against Premom,[9] alleging the company had collected and shared users’ unique advertising and device identifiers with third parties, contrary to Premom’s “represent[ation] that it would share only ‘non-identifiable data’ with third parties.” In the complaint, the FTC laid out how Premom’s collection and sharing of these identifiers enabled “third parties to circumvent operating system privacy controls, track individuals, infer the identity of an individual user, and ultimately associate the use of a fertility app to that user.”[10] In this case, persistent user tracking was done using a unique advertising ID, which didn’t provide the user any anonymity.
Similarly, in January of 2024, the FTC announced a complaint[11] against InMarket,[12] alleging that they had unlawfully collected data associated with a unique mobile device identifier. The Commission alleged that this unique identifier was used to track individuals over time and across apps without their informed consent.
The Federal Trade Commission is continually working to safeguard the privacy of Americans – and that often means paying close attention to the identifiers used to recognize users online: email addresses, phone numbers, MAC addresses, hashed email addresses, device identifiers, and advertising identifiers, to recap a few. Regardless of what they look like, all user identifiers have the powerful capability to identify and track people over time; therefore, the opacity of an identifier cannot be an excuse for improper use or disclosure.
Thank you to the staff across the agency in OT and BCP who contributed to this blog post: Grady Ward, Ben Swartz, Aaron Alva, Michael Sherling, Alex Gaynor, Stephanie T. Nguyen, and Ben Wiseman.
[1] The Commission frequently examines the privacy practices themselves, in addition to the evaluation of claim versus practice.
[2] This blog post focuses on privacy representations and practices related to the purported anonymity of hashed personal data. Secure cryptographic hash functions have valid uses for security purposes.
[4] Felten pointed out that hashing can be reversed when performed over common identifiers (like email addresses, phone numbers, IP addresses, or Social Security Numbers). Because the sets of possible values for these identifiers are small, their hashes are trivially reversible through guess and check – the approach he describes can reverse the hash of a Social Security Number in “less time than it takes you to get a cup of coffee”. Given advances in computer speeds and parallel computing, the problem he describes can now be solved in a matter of seconds, not minutes.
[5] /news-events/news/press-releases/2015/04/retail-tracking-firm-settles-ftc-charges-it-misled-consumers-about-opt-out-choices
[7] /news-events/news/press-releases/2023/07/ftc-gives-final-approval-order-banning-betterhelp-sharing-sensitive-health-data-advertising
[8] /system/files/ftc_gov/pdf/goodrx_complaint_for_permanent_injunction_civil_penalties_and_other_relief.pdf