[Med-privacy] Google Flu Trends
peter marshall
pwm@comcast.net
Mon, 8 Dec 2008 17:03:50 -0800
EPIC Urges Federal Health Officials to Reveal Flu Trends Deal
=======================================================================
On December 3, 2008, EPIC filed a Freedom of Information Act request to
force federal officials to reveal how much user search data Google has
transmitted to the Centers for Disease Control and Prevention. In
November, Google announced Google Flu Trends, a web tool that analyzes
internet users' search queries to predict flu outbreaks. Google has
provided Flu Trends data to the federal government, but has refused to
publish any information about the search queries. "No clear legal or
technological privacy safeguards prevent the disclosure of individual
search histories concerning the flu, or related medical concerns. The
public should be informed of the CDC's ongoing role in Google Flu
Trends," EPIC and Patient Privacy Rights wrote in a letter to Google
CEO Eric Schmidt.
EPIC's request comes on the heels of acknowledgements from Google and
the CDC that the search engine company has provided data to the federal
agency. Google stated that it "shared our preliminary results with the
Epidemiology and Prevention Branch of the Influenza Division at CDC
throughout the 2007-2008 flu season." On November 19, 2008, Google and
the CDC jointly published an academic paper concerning Flu Trends.
Furthermore, Google stated that Flu Trends uses current user search
data, as well as years of historic user data, including data for "all
weeks between September 28, 2003 and March 11, 2007." The search data
is used to generate estimates of flu activity on a state-by-state
basis. But Google says that Flu Trends could be used to provide data on
smaller groups of users, which could increase the likelihood that
individuals will be identified and linked to medical searches. Flu
Trends "may be capable of providing [flu] estimates for large cities
and metropolitan areas with high internet penetration, providing even
more local influenza surveillance. We hope to explore this topic as
well," Google said.
Google Flu Trends relies on individual search terms, such as "flu
symptoms," provided by Internet users. Google has said that it will
only reveal aggregate data, but there are no clear privacy safeguards
that prevent disclosure of individual search histories concerning the
flu. Privacy and medical groups have urged Google to be more
transparent and publish the algorithm on which Flu Trends data is based
so that the public can determine whether the privacy safeguards are
adequate.
Questions have been raised about the adequacy of Google's
"anonymization" techniques. Google Flu Trends analyzes search queries
submitted by Google users. User search data is stored on Google's
servers and retained by the search engine company. This information
includes the Internet Protocol (IP) address, the date and time of
the query, and a unique cookie ID assigned to the browser.
Google has stated that it will anonymize search data after a period
of nine months, but technical experts have questioned the efficacy of
the technique. Google obfuscates the fourth octet but retains the rest
of the IP address, so the redacted address still narrows the user to
one of at most 254 hosts. Moreover, the unique cookie that Google
assigns to the browser remains unchanged over time, and Google (or any
entity with the power to subpoena Google) can easily use it to trace a
search query back to a specific user. Linking search terms to a
specific user in this way can re-identify an individual whose queries
Google had previously "de-identified."
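The concern about fourth-octet redaction and persistent cookies can be
illustrated with a short Python sketch. It is not Google's actual
procedure, which has not been published. It assumes that "obfuscating"
the fourth octet means setting it to zero, and the helper names
(mask_last_octet, candidate_hosts, group_by_cookie) and the sample log
entries are hypothetical, invented only for this illustration.

```python
import ipaddress
from collections import defaultdict


def mask_last_octet(ip: str) -> str:
    """Zero the fourth octet of an IPv4 address.

    Assumption: "obfuscation" is modeled here as zeroing the last octet.
    """
    addr = ipaddress.IPv4Address(ip)
    masked = int(addr) & 0xFFFFFF00  # keep the first three octets only
    return str(ipaddress.IPv4Address(masked))


def candidate_hosts(masked_ip: str) -> int:
    """Count the usable host addresses that share the retained three octets."""
    network = ipaddress.IPv4Network(f"{masked_ip}/24")
    return sum(1 for _ in network.hosts())


def group_by_cookie(entries):
    """Group queries by cookie ID, ignoring the (already masked) IP address.

    Each entry is a (masked_ip, cookie_id, query) tuple. A cookie ID that
    never changes ties all of a browser's queries together, whatever has
    been done to the IP address.
    """
    by_cookie = defaultdict(list)
    for _masked_ip, cookie_id, query in entries:
        by_cookie[cookie_id].append(query)
    return dict(by_cookie)


if __name__ == "__main__":
    # Hypothetical log entries; 192.0.2.0/24 is a documentation-only range.
    raw_log = [
        ("192.0.2.77", "cookie-A", "flu symptoms"),
        ("192.0.2.77", "cookie-A", "pharmacy near me"),
        ("192.0.2.140", "cookie-B", "weather"),
    ]
    redacted_log = [(mask_last_octet(ip), c, q) for ip, c, q in raw_log]

    print(redacted_log[0][0])                  # 192.0.2.0
    print(candidate_hosts(redacted_log[0][0]))  # size of the candidate pool
    print(group_by_cookie(redacted_log))
```

In this sketch the masked address alone leaves a pool of at most 254
candidate hosts, while grouping by the unchanged cookie ID reassembles
each browser's query history regardless of the redaction.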
EPIC's Freedom of Information Act Request to the Centers for Disease
Control and Prevention:
http://epic.org/privacy/flutrends/foia120308.pdf
EPIC's page on Google Flu Trends and Privacy:
http://epic.org/privacy/flutrends/
EPIC's page on Search Engine Privacy:
http://epic.org/privacy/search_engine/
EPIC and Patient Privacy Rights' November 12, 2008 Letter to Google:
http://www.epic.org/privacy/flutrends/EPIC_ltr_FluTrends_11-08.pdf
Google's Response to EPIC and Patient Privacy Rights:
http://epic.org/redirect/120808_GOOGLE_reply_epicppr.html
Server Information Google Retains:
http://www.google.com/intl/en/privacy_faq.html#serverlogs