[nfais-l] NFAIS Enotes, GIGO

Fri Apr 6 08:40:36 EDT 2012

NFAIS Enotes, January 2012
Written and Compiled by Jill O’Neill

GIGO (Garbage In, Garbage Out)

Back in August of 2009, I wrote an Enotes column about a laborious search experience I had while trying to track down a somewhat obscure artist whose work I’d seen in a museum that summer. The piece focused on what I perceived as Google’s failures to support my information seeking. Some of the sample search queries were these: 

● Rhead illustration “Arthurian myth”
● Rhead illustration “King Arthur”
● Golden Age “American Illustration”

One pull quote sums up my disappointment: “...Google (even in its ‘personalized’ version) did not (and perhaps truly cannot) piece together two hours of consecutive search queries and understand the nature of my information need. On August 28, 2009, Jill O’Neill is searching for 19th century American book illustrator and graphic artist, Louis John Rhead, and related concepts. Based on queries and click-through behavior during the current 30 minute search session, she appears to have an interest in viewing examples of his artistic output. Based on her previous Web history with us (Google), result sets should be predominantly from high-quality, high-ranking informational sites (.edu or .org) with preferred reading level at or above 8th grade...”

Recalling this, I was interested in (if somewhat skeptical of) Google’s January 10, 2012 announcement of its “social search” release. Would the announced enhancements improve my overall search experience? (see: http://googleblog.blogspot.com/2012/01/search-plus-your-world.html). 

Google made it clear that it would offer me results influenced by the sharing habits of my personal networks (Google+ and Twitter), thereby directing me to content that friends and colleagues had found interesting enough to share on the Web. Looking for a broad search query that would transcend preconceived expectations, I decided to see what might pop up if I searched Google for the wildly popular BBC period television series, Downton Abbey. 

Those two words were the only elements of the query. Immediately Google offered me the option of viewing 120 items gathered via my social network on Google+ or the millions of results from the broader Web. To draw my attention to the personal results, Google displayed one or two photo avatars of people it recognized as my contacts (Peter Scott, John Blossom, etc.). 

Clicking on a button on the right-hand side of the screen generated a new page of results. At the top of the personal results were items from PBS, Newsweek and the Daily Beast, shared specifically from those organizations’ brand pages on Google+. Just below the top result was another item with the caption, “Tim O’Reilly shared this on oreilly.com,” along with the date and time of that posting. Further down the page were items I’d seen via bloggers whose RSS feeds I follow in Google Reader. Not all blogged references to Downton Abbey from my Google Reader subscriptions were displayed, only those items that friends had shared in some specifically social environment. Only when I clicked through on a link was I shown the context in which it had been shared.

Oddly enough, Google did display on that first page of results one of the items that I myself had shared with a limited group on Google+, a gleeful note from last September that the first season of Downton Abbey had won an Emmy (see: https://plus.google.com/100676333436662376518/posts/6JDn5AdFrsm). Perhaps a little more baffling were items that reached me by a somewhat convoluted path. Friend X, from one of my Circles on Google+, had shared a link to a Blogspot item via her own blog on WordPress. The peculiarity was that I couldn’t locate that link when revisiting her WordPress blog. Somewhere the citation chain had been broken, and I could only hope that Google was honestly delivering something that my acquaintance had at some point shared online.

Clicking on a globe icon brought back the 108 million Web items that were also available to me, but, as might have been anticipated, those were cluttered with retail sites hawking the DVDs and with duplicative hits from newspapers running syndicated content. But what if I had run the same query within the social environments Google was aggregating for me? What did the query retrieve on Google+ if that was all I was searching? 

Just as with the unfiltered results, the issue here was the sheer number of duplicative hits. I was directed to the same video of Dame Maggie Smith over and over and over because so many people had shared an identical YouTube link. Results were not restricted to items shared by people within my circles. Running the query in Google Reader surfaced some items I hadn’t paid attention to upon their initial publication, such as one from the Oxford University Press (OUP) blog at http://blog.oup.com/2011/11/remembrance/. This is one of the cracks in the Google platform. Even though I subscribe to the OUP RSS feed, social search did not include that post in my personal results, because no one in my social network had actually shared that individual item. Serendipity, it would seem, is still part of discovery, but you have to rummage about a bit.

To be fair, I spent time running the same query in other search tools to see what I would get. Librarian Phil Bradley maintains an excellent resource list at http://www.philb.com/webse.htm. Wolfram Alpha, certainly a dark horse for this kind of query, surprised me with data from the Internet Movie Database (IMDb). Using the relatively new search engine DuckDuckGo.com, I saw results primarily from big media and well-known Internet entities (MSNBC, Salon, and the like) but nothing on the first page from individual content creators. More amusing was the visit to Exalead, where the system asked anxiously whether I was sure I didn’t mean Downtown Abbey. But I did notice something about Microsoft’s Bing result set: the screen was heavily weighted toward advertising. 

This was an interesting wrinkle. Google has recently been criticized for abandoning search. Writing on CNET, journalist Peter Yared argued that this implementation of social search by Google was a “tacit acknowledgement that its stalwart search links are largely irrelevant and might as well be replaced with social results. Google search results are essentially gamed results produced by search optimizers.” (see: http://news.cnet.com/8301-1023_3-57358850-93/why-google-is-ditching-search/). He went on to note that the algorithmically generated ads and “answers” appearing on the user’s initial screen of results are pushing actual search results lower and lower on Google’s page. The screenshot Yared used to illustrate this point was picked up by Kent Anderson of the Journal of Bone and Joint Surgery in a post on The Scholarly Kitchen (see: http://scholarlykitchen.sspnet.org/2012/01/25/the-end-of-the-salad-days-where-is-google-headed-next/). 

I was actually somewhat puzzled by both of the pieces linked in the previous paragraph, because in my experience the ads on Google rarely overwhelm the results page. Even when I ran the Downton Abbey search on Google, I was not inundated with advertisements the way I was on Bing. I scrutinized the search query used in the original CNET article. It was a lazy but not implausible one: “flights from ny to sf.” There was no use of the airport codes LGA or SFO to specify the actual points of origin and destination, and two of the query terms (from, to) might easily be dropped by the system. In such an instance of garbage in, garbage out, the user is naturally inundated with inane results.

In my own experience, the levels of advertising being touted as indicators of Google’s abandonment of search actually appear on results pages when (a) the user types in an ill-considered, natural-language search query or (b) the user enters a brand name. Enter more specific queries and the noise on the page is reduced. With a little specificity (such as airport codes), the top search result is a list of flights between LaGuardia (LGA) and San Francisco (SFO) over the next 48 hours from name-brand airlines such as Delta, American, AirTran and US Airways. Flight times and fares are also included. There are still ads, but not to the extent that Yared documented. 

So what happens today if I enter the first query I used in August of 2009, noted at the beginning of this piece? 

● Rhead illustration “Arthurian myth”

Perhaps unsurprisingly, running the same query today produces neither ads on my page nor any social results from my network. In fact, I had to tweak the query (Rhead illustration idylls) to generate any social results, and even then Google offered fewer than six. Yes, my social network may not have the same level of interest in a 19th-century illustrator as in a currently popular television program, but my point is that the Google search experience hasn’t been dramatically altered over the intervening two and a half years. Google still has not delivered personalized results that meet users where they are when they are searching for niche content.

As an article on ReadWriteWeb suggested, the real enhancement in Google’s offering of personal results lies in the fact that users can turn them on and off, depending upon the particular need (see: http://www.readwriteweb.com/archives/they_did_it_google_personalizes_search_it_is_not_e.php). That is the real value, if only because some information-seeking tasks are more casual than others.

Using a more industry-specific query such as “patron driven acquisition ILL” gave me a set of 50 very specific personal results via Google, but none that I wasn’t already aware of. Clicking over to the more general results revealed far more useful (and unfamiliar) sources outside my networks. 

The enhancements introduced by Google are primarily useful for rapid scanning. If anything, the value of those personal results lies in what they reveal about one’s social network and where it may fail, rather than in any “Eureka” moment of new content discovery. At the risk of sounding smug, in this context the traditional content provider wins. But before you become complacent, take a look at Mendeley or some of the other emerging forms of social networking and discovery, and take heed. We still have work to do!

2012 NFAIS Supporters

Access Innovations, Inc.
Accessible Archives, Inc.
American Psychological Association/PsycINFO
CAS
CrossRef
Data Conversion Laboratory, Inc.
Defense Technical Information Center
EBSCO Publishing
Getty Research Institute
The H. W. Wilson Foundation
Information Today, Inc.
IFIS
OCLC
Philosopher’s Information Center
ProQuest
RSI Content Solutions, Inc.
Silverchair Information Systems
TEMIS, Inc.
Thomson Reuters IP & Science
Thomson Reuters IP Solutions
Unlimited Priorities LLC
