Information Management in 2008

As 2008 begins, TFPL consultants reflect on the projects we have worked on over the past 12 months and outline the key information challenges that our clients have asked us to address as a pointer to the trends for the coming year.

Content management
Our consultants were involved in several intranet redesign projects as well as taxonomy strategy and development projects. The common theme was intelligent information architecture,  making content better organised and described to bring it and users together in a more natural and efficient way – to the benefit of the business AND the user.

Intranets are key resources for organisations, and TFPL helped a number of national organisations review and redesign theirs to work both as a communications channel for the business and as an efficient business application, serving up everyday information.  This process needs skills to look at both the user-focused elements (i.e. the layout of pages) and the content-focused elements (i.e. the metadata profiles and attendant controlled vocabularies).

Taxonomies continue to be important: their effective design allows users to manage and navigate content systems and aids retrieval using search engines.  By extending these classification aspects with resource discovery and dynamic publishing of content, we are beginning to allow Information Discovery, in which content is related to other similar content through well created and managed metadata.

Aligned to information architecture work has been a growing need from organisations to manage the complex task of migrating unstructured web content from disparate sites into centrally managed content management systems.  We have managed content migrations for a large government department and a global law firm.  TFPL have developed a methodology to assist our clients through a content migration, covering:

  1. content audit
  2. migration planning
  3. user review
  4. automatic, rule-based migration
  5. quality assurance

Knowledge management
There has been much interest this year in the development of information and knowledge strategies to support the business objectives resulting from changing trends in external economic, social and technical advances. These drivers have led to:

  • Downsizing the workforce and workspace
  • Flexible work patterns, with more and more staff working from home
  • New technology (web 2.0) and more robust communication networks to improve knowledge sharing and learning across organisations

We have conducted information and knowledge audits and strategy development projects for government agencies, local councils, and the not-for-profit sector.  Organisations are reviewing how they manage and deliver information to ensure that K&IM strategies and services are aligned with business objectives.  They are looking to rationalise the procurement of published material and working to deliver internal information effectively, as well as seeking to avoid silo working.

Our consultants spend time talking to staff across the organisation, using a variety of methods to better understand:

  • What information they require to carry out their jobs
  • How they want to work
  • How they would prefer to access and use information
  • What the key issues and barriers are that prevent them from doing so.

In nearly all of our client organisations we found that people were spending too much time trying to find the information ‘they knew was there somewhere’.  Clients were also interested in how other organisations have addressed these issues so that good practice methodology can be adopted straight away.

The range of our IM/KM work across all the sectors has enabled TFPL to share experience and know-how with our clients and to work with them to build a vision for information and knowledge management.  In many cases TFPL has gone on to support clients through the implementation and evaluation of the projects.

Information service reviews
We have worked with in-house information services to ensure that the services and products offered are fresh and relevant.  Challenges facing information services include:

  • Detachment from target audiences
  • Remoteness from senior management
  • Hesitation over service development
  • Subjective spending decisions

Records management

The demand for records management consultancy during 2007 remained very strong across the government sector and clearly re-emerged in the private sector as organisations realign their information and records management programmes to meet changing external demands and set out to realise the benefits of technologies. Building good RM practices into the electronic records management arena still poses a challenge for many and TFPL is supporting a number of EDRM designs and implementations.

Small to medium sized organisations are attracted by the Microsoft SharePoint offering, which is considered an attractive alternative to traditional RM applications, and are showing increased interest in using collaborative and social media tools.

Bringing sense to the e-records environment still requires the understanding of the connectivity between the governance frameworks, information architecture, user friendly corporate fileplans with appropriate metadata frameworks and controlled vocabularies. TFPL is meeting the growing demand for making sense of and integrating legacy records into the new e-environment through migration and rationalisation of applications that hold records.  Developing and applying retention schedules for legal and regulatory compliance across all organisations has also featured in this year's consultancy projects.

2008 is widely expected to see a tightening of belts across all sectors.  In this climate, efficiency in business is essential and good decisions can only be taken with the right information at hand.  TFPL consultants can help your business put its Information Management strategies and practices in order.

Psycho information architecture!

TFPL regularly runs training events for its temporary recruitment candidates. 

Last night, over 30 of them heard Alan Flett describe how psychological theories and methodologies, coupled with computer-based analysis, can help us understand users' mental models and build better information architectures.  This is done by using psychological interview approaches with users and analysing the outputs via computer aided clustering techniques.

The outputs can be used to inform information architecture design.  The results are evidence based, user focused taxonomies and wireframes etc that are capable of evolution and refinement.
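As a rough illustration of what computer aided clustering of user outputs can look like, here is a minimal Python sketch. It is not the method described in the talk: it assumes the interview output is a set of card sorts (each participant groups the same cards), and it uses a simple co-occurrence measure with single-linkage grouping. The function names, the sample cards and the 0.6 threshold are all hypothetical.

```python
from itertools import combinations


def co_occurrence(sorts):
    """Map each pair of cards to the share of participants who grouped them together."""
    counts = {}
    for groups in sorts:
        for group in groups:
            for a, b in combinations(sorted(group), 2):
                key = frozenset((a, b))
                counts[key] = counts.get(key, 0) + 1
    return {pair: n / len(sorts) for pair, n in counts.items()}


def cluster(cards, similarity, threshold=0.5):
    """Single-linkage grouping: join cards that enough participants put together."""
    parent = {card: card for card in cards}

    def find(card):
        # Follow parent links until we reach the representative of the group.
        while parent[card] != card:
            card = parent[card]
        return card

    for pair, share in similarity.items():
        if share >= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    clusters = {}
    for card in cards:
        clusters.setdefault(find(card), []).append(card)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical card sorts from three participants.
sorts = [
    [{"news", "updates"}, {"property", "environment"}, {"events"}],
    [{"news", "updates", "events"}, {"property"}, {"environment"}],
    [{"news", "updates"}, {"property", "environment"}, {"events"}],
]
cards = ["news", "updates", "events", "property", "environment"]
groups = cluster(cards, co_occurrence(sorts), threshold=0.6)
print(groups)
```

The resulting groups are only candidate categories; raising or lowering the threshold shows how stable they are before anything goes into a taxonomy or wireframe.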

For more information on 'psycho' information architecture contact alan.flett@tfpl.com.   For more information about temps training events, contact katy.crosse@tfpl.com.

Thinking about a spring clean?

A client, embarking on the roll out of a new Content Management System in 2007, asked me for an opinion on the potentially thorny problem of classifying a large set of existing content.

The answer is, “well, there are many ways to attack it.”  So I thought I’d share my points of view on it:

Manual classification

  • requesting the author classifies their content gives you a high degree of accuracy, but often it is a subjective set of tags (the author knows what they were thinking when they wrote the document, but might not consider wider tags which are equally applicable to the content).
  • employing Information Scientists (directly or outsourcing to a group like TFPL’s Information Service) to read, appraise and tag the document – This could be a useful approach if you want other metadata to be created, for example a summary, abstract or headline, where there is some skill in creating those new meta items.
  • employing a team of classifiers to train on a specific taxonomy and to apply this to content. If the volumes are not huge and this is a one-off task, providing some temporary contractors to plough through the documents might be low tech, but could be the best option.

For all of the above approaches the size of the taxonomy can be a limiting factor.  In any structure over 200 nodes, the lower level nodes are unlikely to be used.

Automatic classification

For large content sets or for multiple and/or large taxonomy structures, an automatic classification system might be the best approach. However, the trade-off is typically the effort required “up front” to develop the tagging rules.

There are a number of software applications which in one form or another build up rule sets to apply to the language of a document and return one or more tags (dependent on the setting of a threshold value).  This process can include:

  • Training sets: compare a positive set (documents you know are about the subject) to a negative set (a random control set) and generate the linguistic rules.  The effort involved in creating the initial document sets is non-trivial.
  • Manual rule bases – require expertise in the rule language and application.
  • Machine learning systems – need monitoring and tailoring over time to improve accuracy.
  • Thesaurus driven systems – use keyword relationships to preferred terms with an algorithm to create a rule base. Can be set up reasonably easily, but will need tailoring for complex and ambiguous language (which English seems to be littered with when you get into this topic!)
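To make the training set and threshold ideas above concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: it assumes a "rule" is simply a set of weighted words that appear far more often in the positive set than in the control set, and every name, sample document and cut-off value is hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-case a text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def doc_frequency(docs):
    """Share of documents in which each word appears at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(tokens(doc)))
    return {word: n / len(docs) for word, n in counts.items()}


def learn_rule(positive, negative, min_lift=0.5):
    """Keep words that are much more common in the positive set than the control set."""
    pos, neg = doc_frequency(positive), doc_frequency(negative)
    lifts = {word: freq - neg.get(word, 0.0) for word, freq in pos.items()}
    return {word: lift for word, lift in lifts.items() if lift >= min_lift}


def classify(text, rules, threshold=1.0):
    """Return every tag whose summed rule weights reach the threshold."""
    words = set(tokens(text))
    scores = {
        tag: sum(weight for word, weight in rule.items() if word in words)
        for tag, rule in rules.items()
    }
    return sorted(tag for tag, score in scores.items() if score >= threshold)


# Hypothetical training documents.
positive = [
    "Retention schedules for records",
    "Records retention and disposal policy",
    "Disposal of records under the retention schedule",
]
negative = [
    "Intranet home page redesign",
    "Staff training events and policy",
    "Taxonomy for the intranet",
]
rules = {"records management": learn_rule(positive, negative)}

print(classify("Draft retention policy for project records", rules, threshold=1.5))
print(classify("New intranet taxonomy", rules, threshold=1.5))
```

Raising the threshold returns fewer, safer tags; lowering it returns more tags at the cost of accuracy, which is the trade-off the threshold setting controls.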

However, for some of the known taxonomies, like the UK government sponsored Integrated Public Sector Vocabulary (IPSV)  there are off-the-shelf classification systems that can be employed directly.

The type and format of the content is also a major factor:

  • If your reports follow a standard template then you have a much improved chance of getting a successful outcome with rule based classifications (knowing the first section is always an Executive Summary means that more weighting can be applied to the content there than the content in the Appendices, for example).
  • Large complex documents can have many subjects and while a classifier could apply many tags it might be better to create distinct sections that have a more focussed set of tags, depending on the desired application of the content.
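The section weighting idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the section names, the weights and the sample report are hypothetical, and a real classifier would use far richer rules than counting single keywords.

```python
# Hypothetical weights: the Executive Summary counts most, the Appendices least.
SECTION_WEIGHTS = {"executive summary": 3.0, "body": 1.0, "appendices": 0.25}


def weighted_score(sections, keywords, weights=SECTION_WEIGHTS):
    """Count keyword hits in each section, scaled by that section's weight."""
    score = 0.0
    for heading, text in sections.items():
        words = text.lower().split()
        hits = sum(words.count(keyword) for keyword in keywords)
        # Sections not named in the template fall back to a neutral weight of 1.0.
        score += weights.get(heading.lower(), 1.0) * hits
    return score


# Hypothetical report that follows the standard template.
report = {
    "Executive Summary": "flood risk assessment for the region",
    "Body": "survey methods and flood data",
    "Appendices": "pension tables pension rates pension notes",
}

flood = weighted_score(report, ["flood"])
pension = weighted_score(report, ["pension"])
print(flood, pension)
```

Here "pension" appears more often than "flood", but only in the Appendices, so it scores lower. The same per-section scores could also be kept separate to tag each section of a large document on its own.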

It's certainly a challenge, but there are options out there and the improvement in search and retrieval with a well classified content set is a worthwhile benefit.

Where do you want to go?

Following the focus group we ran last week we have been using the results to prepare a preliminary draft information architecture for the site. Our client was disappointed with the turnout at the focus group. However, from our perspective it was a success; we were able to gain many insights into how the audience uses the site and what they really want when they visit the site. These nuggets are invaluable in adding meaning to the survey and interview research we have already completed. The interaction between participants gives a perspective that doesn’t come from speaking with individuals.

It takes the emphasis away from the “clever” information architecture and taxonomy ideas and puts it firmly on the content itself.  There are some recent posts exploring this theme.

In his post on goal based retrieval, Joe Lamantia discusses common goals of information retrieval (e.g. reviewing summaries of items, understanding contexts and situations) and the different modes of retrieval or interaction that would meet these goals.

Maish Nichani has written about the idea of focussing on the users’ target content in his excellent post “Taming your target content”.

This approach is helping to define the vague content types mentioned by the audience e.g. “news”, “updates” to something more specific, with the surrounding context to add deeper meaning to a single piece of information.

Tell us how you really feel

Working as a consultant, one of my favourite activities is running workshops and focus groups. Listening to people discussing their work, their likes and dislikes and ideas is always fascinating.

Yesterday we ran a focus group for a client which involved a card sorting exercise. The objective was to obtain feedback from the organisation's members about their information expectations and use of the website. Before the exercise we had an in depth discussion about the current home page, talked about the navigation and content. When we came to the card sorting the participants were completely floored by four of the terms.  "environment".... what does that mean? "property" ...huh??!!

Interestingly, these four words were smack bang in the middle of the main navigation that we had just looked at in detail. They had said the navigation made sense to them and they used it regularly, yet the words out of context meant absolutely nothing to them.

User-centered design - 1: Guesstimating - 0

Best of both worlds

A folksonomy is an information retrieval tool created by allowing users to 'tag' information resources with whatever words they want to remember them by.

The folksonomy approach is very different from the controlled vocabulary approach (taxonomies, classifications, thesauri, ontologies).  The debate between advocates of the two approaches has, unsurprisingly, been polarised.

For the views of people on both sides of the debate, see Clay Shirky's “Ontology is overrated” and the robust response of Peter Merholz, “Clay Shirky's viewpoints are overrated”.

The Librarians Guide to Etiquette has put a humorous spin on this polarisation.  It told librarians that they could induce panic in their library committees by making the following suggestion:

''What if we eliminated the use of costly Library of Congress Subject headings in favour of patron-initiated tagging and social bookmarking in our catalogue?''

The Penn Tags project at the Library of the University of Pennsylvania (Penn) gives an interesting way out of this divide.   The library is allowing their users to add their own tags to catalogue entries, and thus generate their own reading lists.

The result is that searchers have two new ways into the catalogue:  as well as the classification and search routes offered by the catalogue itself you now can search on:

  • the words that fellow readers have tagged catalogue entries with
  • the reading lists of fellow readers, students or lecturers.

This combination of classification and folksonomy could have powerful applications within organisations.   We do need classification structures in our intranets and record systems.  But folksonomies are good at two things that classifications are not good at, namely:

  • coping with the diversity of language used by different people in the organisation
  • incorporating emerging subjects and new areas of interest

At Penn library the tags that users apply to catalogue entries will have a similar relationship to classification terms in the catalogue as do non-preferred terms to preferred terms in a thesaurus.  They will link the language(s) that people actually use in the University to the necessarily artificial language of the classification.

And when new topics emerge that are of interest to people in the University, these topics will be reflected in the words that students and lecturers use when they tag those resources in the library that are relevant to that topic.
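The relationship between user tags and classification terms can be sketched as a simple lookup, in the way a thesaurus maps non-preferred terms to preferred terms. The Python below is an illustration only: the mapping, the tag data and the function names are hypothetical, and nothing here describes how Penn Tags itself is built.

```python
from collections import Counter

# Hypothetical mapping from user tags (non-preferred) to classification terms (preferred).
PREFERRED = {
    "web 2.0": "Social software",
    "blogs": "Weblogs",
    "blogging": "Weblogs",
    "km": "Knowledge management",
}


def resolve(tags, preferred=PREFERRED):
    """Split tags into the classification terms they map to and the tags left unmapped."""
    mapped, unmapped = set(), []
    for tag in tags:
        key = tag.strip().lower()
        if key in preferred:
            mapped.add(preferred[key])
        else:
            unmapped.append(key)
    return mapped, unmapped


def emerging(all_tags, preferred=PREFERRED, min_uses=2):
    """Unmapped tags used at least min_uses times: candidates for new classification terms."""
    counts = Counter(tag.strip().lower() for tag in all_tags)
    return sorted(tag for tag, n in counts.items() if tag not in preferred and n >= min_uses)


terms, leftovers = resolve(["Blogs", "KM", "folksonomy"])
print(terms, leftovers)
print(emerging(["folksonomy", "Folksonomy", "blogs", "wikis"]))
```

Tags that resolve give searchers a route from their own language into the classification; tags that do not resolve but keep recurring are a signal of an emerging topic.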

Usability reading

Three books arrived on my desk this morning, all in the area of web usability and information architecture.


After flicking through them all, I dived straight into Steve Krug’s new second edition of “Don’t Make Me Think”. I found myself nodding and smiling at the common sense advice.  This book is concisely written, with often humorous illustration of the key points. Chapters such as “How we really use the web (Scanning, satisficing, and muddling through)” and “The first step in recovery is admitting that the Home page is beyond your control” get straight to the heart of usability and navigation.  Make it easy and make it clear. There is a very useful section on web navigation conventions, including the use of taglines, tabs and breadcrumbs and the placement of these elements on the page. The illustrations of users “thinking” and “not thinking” really give food for thought, as does the “Reservoir of Goodwill”. This shows users’ attitudes as they travel through the site and what will increase and decrease their goodwill.


I scanned a chapter of the second book, “Prioritizing Web Usability” by Jakob Nielsen and Hoa Loranger. Initial impressions are that there are a LOT of statistics and it seems a little repetitive.


According to the data gathered by the authors, users spend an average of 1 minute and 49 seconds visiting a site, with the visit to the final site used to complete a task lasting an average of 3 minutes and 49 seconds. A site has only a 12 percent chance of being revisited; once lost, those visitors are gone for good.


Both authors clearly agree that the key to a successful website is to provide users with the information they need, and FAST.


I’m also looking forward to reading Peter Morville’s book “Ambient Findability” examining how people find their way through an age of information overload, filtering and making sense of information as they go. This one looks a little more theoretical than the other two and I’ll need to find a quiet corner to do it justice.

What can our organisations learn from the web?

The two questions that most interest me in information management are:

  • To what extent can the information management models that we have used successfully within organisations be transferred across to the vastly larger scales of the world wide web?
  • To what extent can the methods that have been successfully developed to link people to information on the world wide web be transferred back to the smaller and more intimate scales of a single organisation?

The standard information management model within an organisation has been for information professionals to persuade/cajole/coerce their colleagues to describe and classify the information that they create. 

Information professionals spend a lot of time developing metadata schema,  controlled vocabularies and classification structures of one sort or another, and even more time trying to get people to use them.

Most information professionals would argue that the effort is worth it - it is vital to capture the knowledge that document creators have about the context and content of that information.

It is interesting to note that it has not proved possible to extend this model to the world wide web.

It is true that the world has agreed on a metadata schema: Dublin Core fixes 15 fields to describe the resource you are contributing to the web.  And a further standard, the Resource Description Framework (RDF), builds on Dublin Core by allowing you to identify which controlled vocabulary you are using to provide the keywords for any particular Dublin Core field.

Many webpage owners provide Dublin Core metadata in the 'header' of their webpage.  But this data is largely ignored by the big search engines, and by people looking for information.
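For readers who have not seen it, Dublin Core metadata in a page header is usually written as meta elements whose names start with "DC.". The Python sketch below, using only the standard library's html.parser module, shows what such a header looks like and how a tool could read it. The sample page and the class name are hypothetical.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> elements from a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            # A field such as DC.subject may be repeated, so keep a list per field.
            field = name[3:].lower()
            self.fields.setdefault(field, []).append(attributes.get("content") or "")


# Hypothetical page header carrying Dublin Core metadata.
page = """<html><head>
<title>Retention schedules</title>
<meta name="DC.title" content="Retention schedules">
<meta name="DC.creator" content="A. Author">
<meta name="DC.subject" content="Records management">
<meta name="DC.subject" content="Compliance">
<meta name="keywords" content="ignored">
</head><body></body></html>"""

parser = DublinCoreParser()
parser.feed(page)
print(parser.fields)
```

Inside an organisation, where the people supplying the metadata can be trusted, an intranet search tool could index these fields directly.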

Google makes negligible use of Dublin Core metadata. It does not offer you an interface to search Dublin Core metadata on websites. It makes little or no use of the metadata in its PageRank algorithm. It's not interested in what creators of information say about their information, and with so many unscrupulous website promoters (think of gaming, porn, Viagra sites) who can blame it?

So if the web won't take the models that have (largely) worked in organisations, what can organisations learn from the web?

The two trends most apparent on the web that aren't yet replicated in organisations are:

  • the attention that the web pays to the knowledge that users have of the information that they have found useful  (everything from Google's PageRank to all the social bookmarking tools on the web)
  • the extent to which the web allows people to simply tag resources with words that are meaningful to the tagger, without constraining people with metadata schema and controlled vocabularies (see Technorati tags which lets bloggers tag their blogposts, Flickr which lets photographers tag their photos)

Here is a link to a case study detailing how IBM implemented a social bookmarking tool in their organisation, and the benefits it has brought in terms of:

  • alerting people to the existence of colleagues with similar interests to themselves
  • alerting people to the existence of information resources that those colleagues with similar interests to themselves have found useful.

Social bookmarking is by no means an alternative to classifications and controlled vocabularies.  When you need to get your hands on all the records arising from a particular project the social bookmarking tool won't help you much.   But it provides something that a classification could not offer: personal recommendations to particular documents or document collections.

Google?

When discussing information architecture, taxonomies and finding things on websites with clients, inevitably the question comes up: “Why can’t we just have Google?”.  It seems like a fair question. After all, I use Google several times or more a day to find stuff.

So Gerry McGovern’s article, “Do you really need search on your website?”, seems a little radical but this comment rang true:

“Having a search engine can also be used as an excuse not to organize content professionally. It can be seen as a magic wand that gets people to where they want to go no matter how badly organized the content is. Of course, that's not the case.”

A project to create a taxonomy or navigation structure for a website can be fraught with misunderstandings, politics or plain lack of interest. But Google isn’t going to do it for us.

Using a folksonomy to find out about folksonomy

I am researching a training course on folksonomy and its application inside organisations. 

The best source of material has been Del.icio.us -  which is itself the best example of a folksonomy on the web.

The Del.icio.us tag/folksonomy page shows every website/article/blogpost that anyone has seen, liked, and tagged with the word folksonomy so that they can go back to it at a later date.

By changing the word after tag I can use Del.icio.us to find material on taxonomy, cricket, records management, the future, or anything else that takes my interest.

Del.icio.us lacks Google's universal coverage (it only knows about the things its users have alerted it to) but it beats Google for variety, novelty, currency and serendipity.

The Del.icio.us tag/folksonomy page brings up new references hourly.  Other pages will bring up new references by the minute or by the day depending only on how many other users are interested in that topic and use that word to tag. 

In contrast a Google search on folksonomy would bring back results that change little from day to day:  because the rankings (based on the extent to which a site is linked to) take a long time to change.

Del.icio.us is quicker to reflect new material than Google.  Only one person has to tag a post or a site with the word 'folksonomy' and I see it on the tag/folksonomy page.  It appears at the top of the page which, like a blog, is in reverse chronological order by time of posting.

The serendipity effect comes in when I find a web resource that is useful.  I can then find out:

  • what else has the person who tagged the resource bookmarked?
  • who else has bookmarked the resource that I found useful (and what other resources have they bookmarked)?
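The tag page and the two serendipity questions above come down to three simple lookups over a list of bookmarks. The Python sketch below illustrates the idea with in-memory data; it makes no claim about how Del.icio.us is actually implemented, and the users, URLs, dates and function names are all hypothetical.

```python
from datetime import datetime

# Hypothetical bookmark records: (user, url, tags, time saved).
BOOKMARKS = [
    ("ann", "example.org/folksonomy-overview", {"folksonomy", "tagging"}, datetime(2006, 3, 1, 9, 0)),
    ("bob", "example.org/folksonomy-overview", {"folksonomy"}, datetime(2006, 3, 1, 10, 0)),
    ("bob", "example.org/records-blog", {"records management"}, datetime(2006, 3, 2, 8, 0)),
    ("ann", "example.org/taxonomy-guide", {"taxonomy"}, datetime(2006, 3, 3, 12, 0)),
]


def tag_page(tag, bookmarks=BOOKMARKS):
    """Everything saved with this tag, newest first, as (url, user) pairs."""
    hits = [b for b in bookmarks if tag in b[2]]
    return [(b[1], b[0]) for b in sorted(hits, key=lambda b: b[3], reverse=True)]


def who_bookmarked(url, bookmarks=BOOKMARKS):
    """Who else has bookmarked this resource?"""
    return sorted({user for user, saved_url, _, _ in bookmarks if saved_url == url})


def also_bookmarked(user, bookmarks=BOOKMARKS):
    """What else has this person bookmarked?"""
    return sorted({saved_url for who, saved_url, _, _ in bookmarks if who == user})


print(tag_page("folksonomy"))
print(who_bookmarked("example.org/folksonomy-overview"))
print(also_bookmarked("bob"))
```

Chaining the last two lookups is the serendipity effect: starting from one useful resource, you reach the people who saved it and then everything else they saved.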

For the record, the best overview of the topic that I found was Philosophies of folksonomies, saved by Juerg Hagmann.  This has serendipitously led me to Juerg's blog about records management.