[nfais-l] NFAIS Enotes, #3

jilloneill at nfais.org jilloneill at nfais.org
Thu Jun 14 11:32:49 EDT 2012



NFAIS Enotes 2012 (#3)
Written and compiled by Jill O'Neill
 
Online Activity Data: Reports, Interpretation and Safeguards. 
 
In late March 2012, Google announced a new service: monthly activity reports, viewable online, that show a user the breadth and depth of their Google-related tasks (see: http://googleblog.blogspot.com/2012/03/giving-you-more-insight-into-your.html). I was game, so I immediately requested a report, and in less than two hours Google invited me to access the finished report, covering my activities from February 27 through March 25, 2012. Was it creepy? Was it intrusive? Was it even particularly informative?
 
Well, it wasn't particularly revelatory to me, but over time each of the major entities tracking a similar data activity stream (Google, Amazon, Facebook, Apple) might believe that there is value in the content. 
 
According to the initial report received, I was using three browsers (Chrome, Firefox and Android) and two operating systems (Windows, Android). That was easily explained. I had Windows XP, but generally alternated between the Chrome and Firefox browsers. The Android browser and operating system were tied to my account access via the sole Android device I own, the Kindle Fire tablet with its tweaked Android browser, Silk. In an attempt to be helpful, the report showed me that I had received something in excess of 1,500 emails during the month, but sent fewer responses. It noted who my top email contacts were. It revealed my top ten most frequent search queries and the number of web queries run across a 30-day period (409, as a matter of fact). That last caught me by surprise, so I decided to probe a bit. 
 
There are two ways of examining a user’s web history as far as Google is concerned. If using the Chrome browser, pressing Ctrl+H brings up a full account of Web-based navigation and tasks as Chrome sees them. I spent twenty minutes reviewing thirty days of my activities using this method and then returned to the activity report. The Account Activity report is divided up into sections. If the user holds the mouse over the bottom of a particular section, live links to more in-depth data appear. Clicking on the link to my Web Search Settings brought me to a familiar page that documented specific iterations of search queries as well as the volume of searches performed on specific days of the month. 
 
Looking at the month of March, I could tell that on my least active day, I’d run only two searches, while on the most active I had run more than sixty-five different queries. I could also see which hits in a result set I had actually clicked through on, as well as occasionally seeing the phrase, “Viewed results for ----- (paused for at least three seconds with no click)” – that interval when I would be scanning a page of the result set to see if it contained any useful hits. For the record, this view of activity data has been around for several years, but it used to be part of the drop-down menu accessible upon signing into a personalized Google page. Only with the introduction of the black navigation toolbar has this view been buried. 
 
The introduction of the Google Account Activity Report was ostensibly made to allow users the opportunity to identify security breaches as well as to self-assess actual usage. Given that the actual scope of the data collection is unchanged, the report offering might also have been added to bolster Google’s case in recent policy changes pertaining to privacy – and by extension, consumer perception of what might be termed surveillance. 
 
Google took an enormous amount of flak for five weeks during January and February when it announced that it would be aggregating all data collected from a single user’s activity across all Google products and services (see: http://googleblog.blogspot.com/2012/01/updating-our-privacy-policies-and-terms.html). Furthermore, Google stated that it would be unifying the majority of its products and services under a single privacy statement, rather than continuing with the seventy assorted policies it previously had in place. The idea, according to Google, was (1) to make it simpler for users to understand what kind of personal data was gathered and used and (2) to create a better user experience based on a fuller, more detailed picture of what the user was actively doing on the Google platform. The letter sent to Congress (shared with the world through Google Docs) outlined in even more detail why Google felt there was no need for public alarm (see: https://docs.google.com/file/d/0BwxyRPFduTN2NTZhNDlkZDgtMmM3MC00Yjc0LTg4YTMtYTM3NDkxZTE2OWRi/edit?hl=en_US).
 
Google does do a lot to indicate how users can control their privacy while using the platform, as one blogger demonstrated (see: http://www.marketingpilgrim.com/2012/03/googles-very-public-list-of-privacy-management-options-and-tools.html). Security experts were still quick to supply information as to how to manage privacy settings in the Google environment (see: http://nakedsecurity.sophos.com/2012/01/31/how-to-navigate-googles-privacy-options/). The Atlantic Wire offered The Beginner’s Guide to Quitting Google, accessible at: http://www.theatlanticwire.com/technology/2012/03/beginners-guide-quitting-google/49356/
 
The Electronic Privacy Information Center (EPIC), however, was having none of it and sued the Federal Trade Commission for failing to move properly against Google for what EPIC felt was a violation of the FTC’s own order to Google regarding privacy following the Google Buzz debacle (see: http://www.latimes.com/business/technology/la-fi-tn-google-privacy-20120208,0,1152181.story). Ultimately, EPIC lost its case. 
 
According to at least one law firm, there were benefits derived from the thwarted legal action: 
 
The reaction to Google's announcement suggests that the society's level of awareness of privacy issues continues to increase. The result of this awareness is the pressure on businesses to maintain fair and transparent privacy practices. This pressure can take various forms, such as "shaming" by the media and consumer advocates, hearings and negative statements by legislators, new guidance or enforcement by regulators, or, as is the case here, private efforts to compel the FTC to act. 
 
Despite these developments, in-house data protection counsel continue to face challenges convincing their internal clients that privacy matters.  More and more, however, they are able to point to the enforcement actions, negative publicity avalanches, and unwelcome attention from legislators and regulators to bring home the risks associated with mismanaging privacy (see:
http://www.infolawgroup.com/2012/02/articles/enforcement/epic-alleges-epic-ftc-fail-in-google-saga-we-review-the-complaint/).
 
That noted, experts such as Bruce Schneier remain alarmed about the potential tracking of users across multiple devices and in pursuit of a variety of information-oriented tasks (see:
http://www.readwriteweb.com/enterprise/2012/02/rsa-2012-bruce-schneier-on-the.php). Schneier points out that there is a desire to discredit general-purpose computing devices in favor of more proprietary approaches toward the user’s access and behaviors. 
 
Researchers are looking into this and their concern has given rise to some interesting projects, such as the User-Centric Integration of Activity Data (UCIAD, http://uciad.info/ub/) and DATAMI (a user interface for viewing and manipulating individual UCIAD data, http://www.datami.co.uk/?p=82). Quoting from the UCIAD “About” page, “Specifically, the objective of UCIAD is to provide the conceptual and computational foundations to support user-centric analyses of activity data, with the aim of producing results which can be customized for and deployed in different organizations. Ontologies represent semantic models of a particular domain, and can be used to annotate and integrate data from heterogeneous sources. The project will therefore investigate ontological models for the integration of user activity data, how such models can be used as a basis for a pluggable data framework aggregating user activity data, and how such an infrastructure can be used for the benefit of the users, providing meaningful (and exportable) overviews of their interaction with the organization.” You can learn more about UCIAD from this PowerPoint presentation (PDF file) regarding the project at: http://sdow.semanticweb.org/2011/pub/sdow2011_paper_8_slides.pdf, and about similar initiatives via the papers in this proceedings volume from the October 2011 workshop, Social Data on the Web, at: http://ceur-ws.org/Vol-830/SDoW2011-proceedings.pdf. 
 
UCIAD makes it clearer why Google’s user activity report appears to have been dumbed-down to such an extent. Google is extracting and repackaging user data from a number of different logs and systems. It is still a cumbersome enough process that the system cannot immediately generate the information “on the fly,” but rather, it must still take in a request for the report to be run across a particular set of dates. It’s antithetical to Google values to approach a problem this way due to the overall inefficiency, but the engineering challenges require further investigation. From a user perspective, Google’s focus on unifying individual user data into discrete dossiers likely has both positives and negatives, so the ability to manipulate and extract user data for purposes of keeping Google’s knowledge fragmented (to at least a degree) makes just as much sense. Both parties to this social contract have legitimate views; UCIAD is simply working towards more of a joint solution than currently exists.
 
As both the Federal Trade Commission and the White House released materials oriented towards the creation of greater privacy protections in March, the venture capital groups took some level of notice. In particular, Fred Wilson of Union Square Ventures offered his thoughts about online privacy protections (see: http://www.avc.com/a_vc/2012/03/some-thoughts-on-online-privacy.html):
 
Our clickstreams, search history, likes, tweets, photos, and so on and so forth is our data and we should have the ability to control it, delete it, and limit how it is used. That seems like a basic right that should be available to everyone who uses the Internet.
 
By and large, that is a statement on which the major entities can all come to an agreement. But further on, Wilson also notes the business interest in protecting profiling and tracking activities, because those approaches fuel online advertising:
 
We should be careful not to undermine the economic underpinning of the Internet in our attempts to regulate online privacy. 
 
We're in the midst of negotiating the social contract regarding the tracking of users across various platforms and the retention of associated data and records. As much as this is about privacy, it's about the historical written record of human lives, key to understanding ourselves. It's about finding some way to fuel further economic investment.
 
There's an interesting (if not particularly deep) book that appeared in 2009 entitled Your Life Uploaded: The Digital Way To Better Memory, Health and Productivity. Written by Gordon Bell and Jim Gemmell, the book is not just about the benefits of individual data trails. It also notes some of the stumbling blocks to achieving a balance between conflicting personal and business objectives in handling personal data. In one chapter, the authors note the difficulties of handling materials left on a laptop after the death of a Microsoft executive. Microsoft felt that there was a possibility that sensitive business material was held on the device and didn't want to relinquish the laptop until it had reviewed all of the data housed there. The widow of the executive, noting the blurring of work and personal activities on the device, didn't want Microsoft to view potentially sensitive personal information, conversations, photos, etc. left by her spouse on the device. The authors refer to this difficulty as data entanglement, and it is quite an apt phrase.
 
The dumbed-down activity report that Google has been mailing me on a monthly basis is indicative of just how entangled my life is across corporate platforms and services. I'm using browsers from Google, Apple, and Amazon on a variety of devices. Each entity believes it knows me based on patterns of tracked activity. And they do know something, however partial a picture is actually captured of who I am and what I do. Therein lies the danger. My search queries, stored on the Google servers, are a benign mix of work-related queries (patron driven acquisition inter-library loan cost comparison) and personal ones (laidly worm Child Wynde), but they surely could be misinterpreted if held under scrutiny. Even the volume of such queries (408 this month, 340 next month, more than 600 in another) might be employed in gauging productivity. That's the real crux of anxiety in data entanglement - the fear of another party taking advantage of us, based on partially understood patterns captured in fragmented data. 
 
NFAIS member organizations probably have some level of familiarity with this, situated as they are in the digital information community. Libraries are vulnerable to discussions of return on investment based on data viewed through COUNTER statistics, just as content providers are vulnerable to cancellation viewed through the same lens. Platform providers worry about being vulnerable to those demanding easy data extraction. The good news is that as long as each entity understands the vulnerability of others in capturing and interpreting patterns in user data, we as a community are less likely to abuse the trust of the scholars, students and researchers we serve. 
 
*****************************
 
2012 NFAIS Supporters
 
Access Innovations, Inc.
 
Accessible Archives, Inc.
 
American Psychological Association/PsycINFO
 
CAS
 
CrossRef
 
Data Conversion Laboratory, Inc.
 
Defense Technical Information Center
 
EBSCO Publishing
 
Getty Research Institute
 
The H. W. Wilson Foundation
 
Information Today, Inc.
 
IFIS
 
OCLC
 
Philosopher’s Information Center
 
ProQuest
 
RSI Content Solutions
 
Silverchair Information Systems
 
TEMIS, Inc.
 
Thomson Reuters IP & Science
 
Thomson Reuters IP Solutions
 
Unlimited Priorities LLC
 
 