[nfais-l] NFAIS Enotes, October 2010

Jill O'Neill jilloneill at nfais.org
Mon Jan 31 13:36:15 EST 2011


NFAIS Enotes, October 2010

Written and compiled by Jill O'Neill

 

Smart Content

 

In late 2010, there was a small gathering in New York of professionals
interested in the topic of analytics. Seth Grimes of AltaPlana had organized
this Smart Content event. Laying the groundwork early on, he had solicited
input from a variety of experts on the definition of "smart content," how an
enterprise might benefit from access to such content and the technologies
that content providers might choose to integrate into existing products in
order to make them more attractive to those enterprises (see
http://www.informationweek.com/news/software/bi/showArticle.jhtml?articleID=228901459&queryText=smart%20content ).
Among those whose expertise was sought were an Elsevier technologist, a
researcher from Xerox PARC, and an analytics analyst. The summation of their
input (from my perspective) was that smart content had characteristics of
mark-up and structure that allow it to be flexibly manipulated through
automated means. In conjunction with a variety of technologies, patterns and
relationships associated with that content could be exposed and enhanced to
improve discovery of relevant material in a broader range of contexts and
work flows. There is a growing belief that systems of this sort will reduce
the cognitive strain on users of uncovering the right piece of material,
without their necessarily having to know in advance the exact set of query
terms or the most appropriate search approach. Instead the onus is on the
system to "recognize" user behavior in a particular context and match
exactly the right chunk of content to that user's current task or concern. 
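A minimal illustration of what that mark-up and structure buy you in
practice: semantic tags embedded in the content let automated tools pull out
entities without re-analyzing the prose. The element names and entities
below are invented for illustration, not from any real publisher schema.

```python
# Parse a semantically enriched article fragment and extract its
# tagged entities. The schema here is a made-up example.
import xml.etree.ElementTree as ET

article = """
<article id="a1">
  <title>Kinase inhibitors in oncology</title>
  <body>Recent trials of <entity type="drug">imatinib</entity>
  target <entity type="protein">BCR-ABL</entity> in
  <entity type="disease">chronic myeloid leukemia</entity>.</body>
</article>
"""

root = ET.fromstring(article)
# Each entity carries a machine-readable type alongside the prose.
entities = [(e.get("type"), e.text) for e in root.iter("entity")]
print(entities)
# [('drug', 'imatinib'), ('protein', 'BCR-ABL'),
#  ('disease', 'chronic myeloid leukemia')]
```

The same document still reads as ordinary text to a person, which is the
point: one artifact serves both the human reader and the machine.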

 

Mark Stefik of Xerox PARC commented, "This shift potentially employs more
resources, more knowledge and more points of view in matching people to
content. It opens the door to a much richer process of intermediation
between people and content." And a Gilbane Group study referenced by Grimes
noted that "smart content is a natural evolution of XML structured content,
delivering richer, value-added functionality." 

 

It is in that context that I quote futurist Arthur C. Clarke, "Any
sufficiently advanced technology is indistinguishable from magic." Workers
want to have their needs anticipated and there can be real benefits to
detection of and better understanding of hidden usage patterns and behavior.
Are the technologies that support smart content sufficiently advanced to
pass as magic? 

 

Jeff Fried, Chief Technology Officer at BA Insight, offered a little
reality to the Smart Content attendees when he noted that, at present, there
is no one-size-fits-all solution for the creation of smart content. Instead,
providers might need to assemble different components from the current
grab-bag of technologies in order to resolve specific user difficulties.
That single point might have been the most significant take-away that any
attendee might have gathered from this event - the recognition that no
single offering would necessarily suffice in satisfying the needs of a
particular user population. In order to achieve that moment of seeming
"magic" for users, savvy publishers would have to examine a variety of
options. "Pearl-growing" was his descriptive phrase for the kinds of
combinations that content providers would need to develop context-specific
solutions. And Fried cautioned his audience that it was going to be
necessary to manage expectations during this phase of development. (The
video of Fried's presentation may be found at http://vimeo.com/16349851 and
his slides may be viewed on Slideshare at
http://www.slideshare.net/SmartContent/what-business-innovators-need-to-know-about-content-analytics ).

 

An overly rapid series of Lightning Talks from vendors throughout the day
served to illuminate discrete possibilities that might go into diverse
solutions. There was, as just one example, the Rosette Linguistics platform
offered by Basis Technology, aimed at "extracting meaningful intelligence
from unstructured text in Asian, European and Middle Eastern languages,"
quoted from their home page at http://www.basistech.com. Basis Technology
currently partners with NFAIS member organization TEMIS, another presenter
at the Smart Content event. TEMIS presented its Luxid suite of content
enrichment products, successfully deployed by such organizations as
Elsevier, Thomson Reuters and AAAS. Other Luxid modules include semantic
technologies in support of scientific discovery, sentiment analysis, and
competitive intelligence. 

 

In the case of search and analytics technology, such as that provided by
FirstRain (http://www.firstrain.com ), the system crawls the Web for
specific, factual, well-structured documents (organizational charts, product
lines, etc.), which it then analyzes to derive, distill and organize models
that can be dynamically adjusted based on the rate of change within a
specific market or industry. FirstRain's technology has been leveraged
primarily in the investment and banking industries, fueling such companies
as Fidelity and information services such as those offered by Standard &
Poor's.

 

Another company present at Smart Content, but more focused on the field of
sentiment analysis, was Linguamatics (http://www.linguamatics.com/ ). This
UK firm was referenced by The New York Times because its tool's analysis of
Twitter postings for the UK election accurately predicted the outcome of
that election (see "Nation's Political Pulse Taken Using Net Chatter," The
New York Times, Oct 31, 2010,
http://www.nytimes.com/2010/11/01/technology/01sentiment.html ).
Linguamatics' product is also favored by some pharmaceutical firms (Pfizer,
Merck, and Amgen among others) on the basis of its usefulness in agile
text-mining. 
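To make the idea of sentiment analysis concrete, here is a deliberately
crude, lexicon-based sketch of scoring short postings as positive or
negative. The word lists are invented; production systems such as
Linguamatics' rely on far richer linguistic analysis than simple word
counting.

```python
# Toy lexicon-based sentiment scoring: count positive words,
# subtract negative words. Lexicons here are made-up examples.
POSITIVE = {"great", "win", "support", "strong"}
NEGATIVE = {"bad", "lose", "cuts", "weak"}

def score(posting: str) -> int:
    """Return (positive hits) - (negative hits) for one posting."""
    words = posting.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = [
    "great debate, strong showing",
    "bad night, big cuts ahead",
]
print([score(t) for t in tweets])
# [2, -2]
```

Aggregating such scores over millions of postings is what lets a system
estimate the "pulse" of an electorate or a market segment.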

 

During the presentation by Darrell Gunter (Elsevier veteran, recently moved
to AIP), reference was made in passing to the Semantic Wave report by Mills
Davis, founder and director of Project10X
(http://www.project10x.com/about.php ).  The executive summary of that
report is useful for its positioning of various technologies intended to
leverage "the...Web of connected intelligences." It notes several technology
trends as driving this next phase of Web development:

 

*        intelligent user interfaces enhancing user productivity and
satisfaction

*        collective knowledge systems as "killer" apps

*        semantic applications including, but not limited to,
ontology-driven discovery in a range of professional fields (law, medicine,
defense, etc.)

*        semantic infrastructures in support of integration and
interoperability

*        semantic modeling and solution development

 

 In other words, these technologies are in support of increasingly complex
information systems - what the Semantic Wave report characterizes as those
representing "meanings and knowledge...separately from content or behavior
artifacts," rendering both understandable by people and machines. Such
technologies are still in a relatively nascent stage of development in the
sense that even those technologies referenced above that have been
introduced into the market have yet to reach a point of adoption where they
are considered entirely mainstream. They are certainly being implemented in
a variety of contexts (pharmaceutical, legal, financial, business, etc.),
but the average user sees only a new tweak to an interface, a dashboard or a
result set without understanding what's going on in the hidden black box
behind. That's their claim to creating "magic" for the user. 
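The Semantic Wave notion of holding "meanings and knowledge...separately
from content" can be sketched as a tiny store of subject-predicate-object
triples that both software and people can inspect. The facts and the query
helper below are invented illustrations, not drawn from the report.

```python
# Knowledge held apart from any document, as simple triples.
# The facts here are made-up examples.
triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "isA", "NSAID"),
    ("ibuprofen", "isA", "NSAID"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern (None acts as a wildcard)."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

print(query(predicate="isA", obj="NSAID"))
# [('aspirin', 'isA', 'NSAID'), ('ibuprofen', 'isA', 'NSAID')]
```

Because the knowledge is explicit rather than buried in prose, a machine can
answer "what is an NSAID?" without parsing a single sentence, which is
exactly the separation the report describes.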

 

The leveraging of these technologies will be the next step in
highly-specialized information environments. Most of us have been in an
environment where a research professional has stated in a matter-of-fact
manner that he/she knew everyone who was working in a particular space
surrounding a scientific question or challenge. A success story frequently
put forward is that of Collexis, acquired by Elsevier in mid-2010. Its
technology leveraged the relationships between researchers, enabling
institutions to better capture and recognize researcher productivity while
helping individual researchers identify new entrants into a given field and
build new collaborations. Speaking very generally, Collexis is a
sophisticated mixture of entity extraction, pattern detection and
data-mining.  
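Collexis' actual methods are proprietary; purely as a generic sketch of one
ingredient named above, the snippet below mines co-authorship links to
suggest potential new collaborators, namely researchers two steps away in
the co-authorship graph. The names and papers are invented.

```python
# Build a co-authorship graph from paper author lists, then suggest
# collaborators-of-collaborators. All data here is made up.
from collections import defaultdict
from itertools import combinations

papers = [
    {"Ada", "Ben"},
    {"Ben", "Cora"},
    {"Cora", "Dev"},
]

coauthors = defaultdict(set)
for authors in papers:
    for a, b in combinations(sorted(authors), 2):
        coauthors[a].add(b)
        coauthors[b].add(a)

def suggest(researcher):
    """People who co-author with my co-authors, but not yet with me."""
    direct = coauthors[researcher]
    return {c for d in direct for c in coauthors[d]} - direct - {researcher}

print(sorted(suggest("Ada")))
# ['Cora']
```

Scaled up to millions of papers, the same pattern-detection idea is one way
a system can surface "new entrants into a given field" for a researcher.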

 

This was the point made by Richard Stanton at the Smart Content conference.
As many within this community are aware, a huge challenge is the
disambiguation of a specific vocabulary term or phrase when it is extracted
from its context (Madonna the singer vs. Madonna the religious figure). It's
the use
of the language surrounding a term or phrase that a system must be capable
of analyzing in order to be statistically confident that something is
related or relevant to a particular query. Taxonomies and ontologies
continue to play a role, but not perhaps a stand-alone role; they offer the
greatest value-add in conjunction with semantic technologies. 
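A bare-bones illustration of using surrounding language to disambiguate: pick
the sense whose profile overlaps most with the words around the ambiguous
term. The hand-built sense profiles below are invented; real systems use
statistical models trained over much larger contexts.

```python
# Disambiguate "Madonna" by overlap between the surrounding words
# and per-sense context profiles. Profiles are made-up examples.
SENSES = {
    "singer":    {"album", "tour", "pop", "concert"},
    "religious": {"church", "icon", "renaissance", "painting"},
}

def disambiguate(context: str) -> str:
    """Return the sense whose profile shares the most words with the context."""
    words = set(context.lower().split())
    return max(SENSES, key=lambda s: len(SENSES[s] & words))

print(disambiguate("the madonna painting in the church"))
# religious
```

The overlap count is a stand-in for the statistical confidence the paragraph
above describes: the more contextual evidence a sense accumulates, the more
confidently the system can assert relevance to a query.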

 

All of this said, it was an impressively attractive presentation by two
sharp young women from IQ Content, an Irish user experience design
consultancy, that crystallized for me the problem in the room on that
October day. Randall Snare and Katie McGuane were present to discuss
interface design and the creation of a seamless flow in smart content
environments. Their approach was that design had to achieve a balance
between analytics data and content, and that the design team should be made
up of individuals in three roles:

 

*        a user experience designer,

*        a content strategist, and

*        an analytics expert

 

Their case study involved a problem from an insurance provider who wanted to
make it simpler for customers to select a policy, but their discussion of
why the final solution contained a three-column design rather than a
five-column design was (for me) the issue that few in this discussion of
smart content had addressed. Three options are less confusing than five,
and in most instances the user will choose the policy that appears in the
middle. The presenters somewhat uncomfortably acknowledged during the course
of their program segment that design solutions could manipulate the ultimate
choice of the user on that site. It's just too easy to drive the user's
choice. Even worse, an unscrupulous provider could easily ensure that the
system consistently displayed a middle option most profitable to the
organization rather than the "right" choice for the buyer. Of course, buying
an insurance policy is not the same as identifying an answer in a legal or
investment information product, but the dashboards and interfaces found in
these smart information environments can all too readily slant a user's
perception of relevance or importance. 

 

Seth Grimes' Smart Content one-day event wasn't the correct venue in which
to raise the issues of information bias and objectivity or the privacy
pitfalls associated with tracking users' information-seeking behaviors. It
was a day intended to offer publishers a glimpse of available options in
constructing the best information environment for their users and it was an
interesting array. The semantic technologies that exist now can assist NFAIS
members in resolving linguistic and translation issues, thereby making
content more discoverable and, yes, in the right combination, exposing
patterns and relationships that help researchers approach problems with
agility and with a better grasp of those aspects that may previously have
hidden the solutions. 

 

There are undoubtedly benefits that will accrue from content housed in these
smart environments and, as always, the needs of the professional (legal,
financial, scientific, medical, etc.) will drive the immediate
implementations. For the enterprise, development of smart content platforms
may well have to be a priority in remaining competitive (see for example
this piece by Gilbane about the smart content landscape as it applies to the
enterprise:
http://gilbane.com/xml/2010/11/understanding-the-smart-content-technology-landscape.html).
In that context, smart content is characterized by enriched
content and metadata, component discovery and assembly, collaboration, and
federated content management (useful in minimizing duplication of material
within the networked environment). 

The gap between what is offered to the professional market and what is
offered to the library market is dramatic. The Gilbane piece referenced
earlier closes with a plug for Gilbane's willingness to consult with
businesses on identification of the following:

 

*         The business drivers where smart content will ensure competitive
advantage when distributing business information to customers and
stakeholders

*         The technologies, tools, and skills required to component-ize
content, and target distribution to various audiences using multiple devices

*         The operational roles and governance needed to support smart
content development and deployment across an organization

*         The implementation planning strategies and challenges to upgrade
content creation and delivery environments

 

Any buzz about integrated library systems and where those fail is fairly
remote from those types of buzzword bullet points, and yet the concerns for
libraries in the delivery of smart content aren't very far removed. Remove
the word "business" from the initial statement and substitute the word
"institution" for "organization" in the third point, and it's essentially
what must be done to sell "smart content" to any Carnegie I research
institution. But it does require thinking about the role of a content
provider in new ways. 

 

Are you building that kind of an advanced information service? How long
before one of your competitors is ready to offer it? Three to five years is
not an unrealistic time frame. 

 

NOT REGISTERED YET FOR THE ANNUAL CONFERENCE? GO TO:
http://nfais.brightegg.com/page/295-register-for-2011-annual-conference
The cut-off for discounted hotel rooms is February 7, 2011.

 

2011 SPONSORS

 

Access Innovations, Inc.

Accessible Archives, Inc.

American Psychological Association/PsycINFO

American Theological Library Association

CAS

CrossRef

Data Conversion Laboratory

Defense Technical Information Center (DTIC)

Elsevier

Getty Conservation Institute

H. W. Wilson

Information Today, Inc.

International Food Information Service

Philosopher's Information Center

ProQuest

Really Strategies, Inc.

Temis, Inc.

Thomson Reuters Healthcare & Science

Thomson Reuters IP Solutions

Unlimited Priorities Corporation

 

 

 

 

Jill O'Neill

Director, Planning & Communication

NFAIS

(v) 215-893-1561

(email) jilloneill at nfais.org

 
