[33]Tech by VICE
Researchers Find 'Anonymized' Data Is Even Less Anonymous Than We Thought
Corporations love to pretend that 'anonymization' of the data they collect
protects consumers. Studies keep showing that’s not really true.
by [34]Karl Bode
Feb 3 2020, 3:24pm
Image: Cathryn Virginia
Last fall, AdBlock Plus creator Wladimir Palant revealed that Avast was
using its popular antivirus software to [38]collect and sell user data.
While the effort was eventually [39]shuttered, Avast CEO Ondrej Vlcek
first downplayed the scandal, assuring the public the collected data
had been “anonymized”—or stripped of any obvious identifiers like names
or phone numbers.
“We absolutely do not allow any advertisers or any third party...to get
any access through Avast or any data that would allow the third party
to target that specific individual,” [40]Vlcek said.
But analysis from students at Harvard University shows that
anonymization isn’t the magic bullet companies like to pretend it is.
Dasha Metropolitansky and Kian Attari, two students at the [41]Harvard
John A. Paulson School of Engineering and Applied Sciences, recently
built a tool, for a class paper they’ve yet to publish, that combs
through vast troves of consumer datasets exposed in breaches.
“The program takes in a list of personally identifiable information,
such as a list of emails or usernames, and searches across the leaks
for all the credential data it can find for each person,” [42]Attari
said in a press release.
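
As a rough illustration of the kind of lookup Attari describes, the
sketch below searches a few made-up leaks for a list of identifiers and
gathers every record found for each one. The leak names, field names,
and records are all hypothetical; the students' actual program has not
been published, so this is not their implementation.

```python
from collections import defaultdict

# Hypothetical leaks: leak name -> list of records (all data invented).
LEAKS = {
    "forum_dump": [
        {"email": "alice@example.com", "username": "alice99", "password": "hunter2"},
        {"email": "bob@example.com", "username": "bobby", "password": "qwerty"},
    ],
    "shop_dump": [
        {"email": "alice@example.com", "password": "hunter2", "zip": "02138"},
    ],
}

# Assumed set of fields that can identify a person.
ID_FIELDS = ("email", "username")


def search_leaks(identifiers, leaks=LEAKS):
    """Return {identifier: [(leak_name, record), ...]} for every match."""
    wanted = {i.lower() for i in identifiers}
    hits = defaultdict(list)
    for leak_name, records in leaks.items():
        for record in records:
            for field in ID_FIELDS:
                value = record.get(field, "").lower()
                if value in wanted:
                    hits[value].append((leak_name, record))
    return dict(hits)


results = search_leaks(["alice@example.com", "bobby"])
for identifier, matches in results.items():
    print(identifier, "appears in", [leak for leak, _ in matches])
```

Even in this toy case, the same email address surfaces in two separate
leaks, which is the starting point for combining them.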
They told Motherboard their tool analyzed [43]thousands of datasets
from data scandals ranging from the [44]2015 hack of Experian to the
hacks and breaches that have plagued services from [45]MyHeritage to
[46]porn websites. Despite many of these datasets containing
“anonymized” data, the students say that identifying actual users
wasn’t all that difficult.
“An individual leak is like a puzzle piece,” Harvard researcher Dasha
Metropolitansky told Motherboard. “On its own, it isn’t particularly
powerful, but when multiple leaks are brought together, they form a
surprisingly clear picture of our identities. People may move on from
these leaks, but hackers have long memories.”
For example, while one company might only store usernames, passwords,
email addresses, and other basic account information, another company
may have stored information on your browsing or location data.
Independently they may not identify you, but collectively they reveal
numerous intimate details even your closest friends and family may not
know.
“We showed that an ‘anonymized’ dataset from one place can easily be
linked to a non-anonymized dataset from somewhere else via a column
that appears in both datasets,” Metropolitansky said. “So we shouldn’t
assume that our personal information is safe just because a company
claims to limit how much they collect and store.”
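
The linkage Metropolitansky describes amounts to a join on a shared
column. Here is a minimal sketch, assuming two invented datasets that
both happen to contain a "username" column; the column name, the
records, and the link() helper are hypothetical and only illustrate the
idea.

```python
# "Anonymized" browsing records: no names, but a username remains.
anonymized_visits = [
    {"username": "alice99", "site": "health-forum.example"},
    {"username": "night_owl", "site": "news.example"},
]

# A separate, non-anonymized dataset that shares the username column.
customer_list = [
    {"username": "alice99", "name": "Alice Example"},
    {"username": "bobby", "name": "Bob Example"},
]


def link(anon_rows, named_rows, key):
    """Attach a name to each anonymous row whose key matches exactly one person."""
    index = {}
    for row in named_rows:
        index.setdefault(row[key], []).append(row["name"])
    linked = []
    for row in anon_rows:
        names = index.get(row[key], [])
        if len(names) == 1:  # only an unambiguous match re-identifies someone
            linked.append({**row, "name": names[0]})
    return linked


reidentified = link(anonymized_visits, customer_list, key="username")
print(reidentified)
```

Neither dataset links a name to a browsing record on its own; the
shared column is what connects them.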
The students told Motherboard they were “astonished” by the sheer
volume of total data now available online and on the dark web.
Metropolitansky and Attari said that even with privacy scandals now a
weekly occurrence, the public is dramatically underestimating the
impact on privacy and security these leaks, hacks, and breaches have in
total.
Previous studies have shown that even within independent individual
anonymized datasets, identifying users isn’t all that difficult.
In one [47]2019 UK study, researchers were able to develop a machine
learning model capable of correctly identifying 99.98 percent of
Americans in any anonymized dataset using just 15 characteristics. A
different [48]MIT study of anonymized credit card data found that users
could be identified 90 percent of the time using just four relatively
vague points of information.
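
To see why a handful of attributes can be enough, it helps to count how
many records become one of a kind as attributes are combined. The
sketch below does that on five invented records. It is a plain
uniqueness count under assumed attribute names, not the statistical
models used in the studies cited above.

```python
from collections import Counter

# Invented records with no names, only "vague" attributes.
records = [
    {"zip": "02138", "birth_year": 1990, "gender": "F"},
    {"zip": "02138", "birth_year": 1990, "gender": "M"},
    {"zip": "02138", "birth_year": 1985, "gender": "F"},
    {"zip": "10001", "birth_year": 1990, "gender": "F"},
    {"zip": "10001", "birth_year": 1990, "gender": "F"},
]


def unique_fraction(rows, attributes):
    """Share of rows whose combination of attribute values occurs only once."""
    def combo(row):
        return tuple(row[a] for a in attributes)

    counts = Counter(combo(r) for r in rows)
    unique = sum(1 for r in rows if counts[combo(r)] == 1)
    return unique / len(rows)


by_zip = unique_fraction(records, ["zip"])
by_all = unique_fraction(records, ["zip", "birth_year", "gender"])
print(by_zip, by_all)
```

With ZIP code alone nobody in this toy set stands out; with three
attributes combined, three of the five records are unique.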
Another [49]German study looking at anonymized user vehicle data found
that 15 minutes’ worth of data from brake pedal use could let them
identify the right driver, out of 15 options, roughly 90 percent of the
time. Another [50]2017 Stanford and Princeton study showed that
deanonymizing user social networking data was also relatively simple.
Individually these data breaches are problematic—cumulatively they’re a
bit of a nightmare.
Metropolitansky and Attari also found that despite repeated warnings,
the public still isn’t using unique passwords or password managers. Of
the 96,000 passwords contained in one of the program’s output datasets,
just 26,000 were unique.
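
For a sense of scale, the arithmetic on those two figures can be
written out. The sketch assumes "unique" means distinct password values
among all entries, which the article does not spell out; the sample
list and the reuse_summary() helper are invented for illustration.

```python
from collections import Counter


def reuse_summary(passwords):
    """Count total entries, distinct values, and the distinct share."""
    counts = Counter(passwords)
    total = len(passwords)
    distinct = len(counts)
    return {"total": total, "distinct": distinct, "distinct_share": distinct / total}


# Invented sample to show the helper on raw data.
sample = ["123456", "123456", "password", "hunter2", "123456"]
summary = reuse_summary(sample)

# The article's figures: 26,000 unique out of 96,000 passwords.
article_share = 26_000 / 96_000
print(summary, f"{article_share:.1%}")
```

Under that reading, roughly 27 percent of the entries are distinct
values and the remaining 70,000 repeat a password already in the set.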
The problem is compounded by the fact that the United States still
doesn’t have even a basic privacy law for the internet era, thanks in
part to relentless lobbying from a [51]cross-industry coalition of
corporations eager to keep this profitable status quo intact. As a
result, penalties for data breaches and lax security are often [52]too
pathetic to drive meaningful change.
Harvard’s researchers told Motherboard there are several restrictions a
meaningful U.S. privacy law could implement to potentially mitigate the
harm, including barring unauthorized employees from accessing data,
maintaining better records on data collection and retention, and
decentralizing data storage (not keeping corporate and consumer data on
the same server).
Until then, we’re left relying on the promises of corporations who’ve
repeatedly proven their privacy promises aren’t worth all that much.
Tagged:
[53]data
References
Visible links
1.
https://www.vice.com/en_us/rss
2.
https://www.vice.com/en_us/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought
3.
https://www.vice.com/en_ca/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought
4.
https://www.vice.com/en_asia/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought
5.
https://www.vice.com/en_us/article/3a8k79/do-ring-cameras-violate-wiretapping-laws-new-hampshire-is-about-to-find-out
6.
https://www.vice.com/en_us/article/7kzxzy/senator-mark-warner-ftc-not-doing-enough-on-browsing-data-avast-antivirus
7.
https://www.googletagmanager.com/ns.html?id=GTM-MSM4HQ4
8.
https://www.vice.com/en_us/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought#main-content
9.
https://www.vice.com/en_us
10.
https://www.viceland.com/en_us?_ga=2.122564107.1244600859.1568037773-1243485207.1550599999
11.
https://i-d.vice.com/en_us
12.
https://impact.vice.com/en_us
13.
https://www.refinery29.com/
14.
https://video.vice.com/en_us/
15.
https://vice.com/en_us/page/podcasts
16.
https://news.vice.com/en_us
17.
https://www.vice.com/en_us/section/tech
18.
https://www.vice.com/en_us/section/music
19.
https://www.vice.com/en_us/section/food
20.
https://www.vice.com/en_us/section/health
21.
https://www.vice.com/en_us/section/money
22.
https://www.vice.com/en_us/section/drugs
23.
https://www.vice.com/en_us/topic/uncommitted-iowa-2020
24.
https://www.vice.com/en_us/topic/2020
25.
https://www.vice.com/en_us/section/identity
26.
https://www.vice.com/en_us/section/games
27.
https://www.vice.com/en_us/section/entertainment
28.
https://www.vice.com/en_us/section/environment
29.
https://www.vice.com/en_us/section/travel
30.
https://www.vice.com/en_us/astroguide
31.
https://www.vice.com/en_us/section/sex
32.
https://www.vice.com/en_us/topic/vice-magazine
33.
https://www.vice.com/en_us/section/tech
34.
https://www.vice.com/en_us/contributor/karl-bode
35.
https://www.vice.com/en_us/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought#javascript
36.
https://www.vice.com/en_us/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought#javascript
37.
https://www.vice.com/en_us/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought#javascript
38.
https://palant.de/2019/10/28/avast-online-security-and-avast-secure-browser-are-spying-on-you/
39.
https://www.vice.com/en_us/article/wxejbb/avast-antivirus-is-shutting-down-jumpshot-data-collection-arm-effective-immediately
40.
https://www.forbes.com/sites/thomasbrewster/2019/12/09/are-you-one-of-avasts-400-million-users-this-is-why-it-collects-and-sells-your-web-habits/
41.
https://www.seas.harvard.edu/
42.
https://www.seas.harvard.edu/news/2020/01/imperiled-information
43.
https://docs.google.com/spreadsheets/d/1A7y6Y5cgObJvoq3sIK-6K9PJ-XAaZ8QR99cD_Og-0RY/edit#gid=1989660935
44.
https://www.theguardian.com/business/2015/oct/01/experian-hack-t-mobile-credit-checks-personal-information
45.
https://www.vice.com/en_us/article/vbqyvx/myheritage-hacked-data-breach-92-million
46.
https://www.vice.com/en_us/article/78k849/hacker-breaches-porn-network-advertises-user-data-on-dark-web
47.
https://www.nature.com/articles/s41467-019-10933-3
48.
http://news.mit.edu/2018/privacy-risks-mobility-data-1207
49.
http://www.autosec.org/pubs/fingerprint.pdf
50.
https://www.cs.princeton.edu/~arvindn/publications/browsing-history-deanonymization.pdf
51.
https://www.eff.org/deeplinks/2017/10/how-silicon-valleys-dirty-tricks-helped-stall-broadband-privacy-california
52.
https://www.vice.com/en_us/article/d3agv7/the-equifax-settlement-is-a-cruel-joke
53.
https://www.vice.com/en_us/topic/data