Graeme: Books and Libraries

Blog


May 24

Gee whiz! Library visits are up

The American Library Association has started a new web site for the "public" called I Love Libraries. I believe the site is still under development, so for the moment it would be unfair to criticize it.  For example, the site doesn't appear to have an RSS feed.  But something else caught my attention.

A long, long time ago, when I was quite young, I read a great book called How to Lie With Statistics, by Darrell Huff.   One of the enduring things I've retained from the book is the idea of a "gee whiz graph".  I don't have to bother generating an example, since the ALA has provided a couple of them on this page (in the "Conclusions" section at the bottom).

The basic idea is that if you have a series of numbers that go from, say, 152 to 165, you can show them on a graph which has a scale from 0 to 200, and you'll see that the values change a bit, but not dramatically.  On the other hand, if you scale the graph from 150 to 170, the values will visually appear to leap from 2 to 15, an apparent dramatic increase of 650%.  And that's just what the ALA has done with the number of annual visits to libraries.  Library visits increased from 1.24 billion in 2002 to 1.38 billion in 2006.  This turns out to be a rate of increase of a smidgen more than 2.7% per year.  In order to make this increase look more dramatic, the ALA has scaled the graph from 1.15 billion, giving an apparent increase from 0.09 (1.24 - 1.15) in 2002 to 0.23 (1.38 - 1.15) in 2006, a visual increase of more than 26% a year, apparently more than doubling in four years.  What makes this completely inexcusable is that the graph is small and blurry, so it's quite hard to see what's really going on.
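For anyone who wants to check the arithmetic, here is a minimal sketch in Python.  The visit counts and the 1.15 billion baseline are the ones quoted above; the function names are my own, and the "visual" rate is simply the same compound-growth formula applied to the bar heights as drawn.

```python
def annual_growth(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


def apparent(value, baseline):
    """Height of a bar when the axis starts at `baseline` instead of zero."""
    return value - baseline


visits_2002, visits_2006 = 1.24, 1.38  # billions of visits, as quoted above
baseline = 1.15                        # where the ALA graph's axis starts
years = 4

# Growth in the actual numbers.
real = annual_growth(visits_2002, visits_2006, years)

# Growth in the heights of the bars as the eye sees them.
visual = annual_growth(
    apparent(visits_2002, baseline),
    apparent(visits_2006, baseline),
    years,
)

print(f"real: {real:.1%} per year, visual: {visual:.1%} per year")
```

The real rate comes out a little over 2.7% a year and the visual rate a little over 26% a year, which is the whole trick of the gee whiz graph.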

The other graph, showing the percent of adults with library cards, is even worse.  The scale is 60% - 63% - 65% - 68% - 70%, evenly-spaced!

The ALA has clearly demonstrated two ideas that are important to its mission:  you can learn a lot from a book, even one that's 55 years old, and you shouldn't take everything on the Internet at face value.

April 29

DRM-ridden proprietary databases

Like many other libraries, the local library subscribes to proprietary information services, which are confusingly referred to as "databases". These services are protected by DRM — digital rights management.  DRM hurts normal users without affecting pirates.  It's just a bad idea.  But the companies that provide DRM-protected information services to libraries have brought bad to a whole new level.

I wanted to get an online copy of Martha Yee's article in the April issue of Library Resources & Technical Services.  It's useless to check the library catalog because the catalog doesn't cover "databases", which is where the online journals are hidden.  It's also useless to look up the journal in WorldCat because WorldCat incorrectly implies that the local library doesn't have an online copy.  This is just a consequence of the fact that "databases" live in an alternate world to you and me.

If you were really familiar with our library's "databases", you'd know that the library has something called an "Electronic Journal Finder".  It finds the journal Library Resources & Technical Services in Gale's Academic One File all right, but what it gives you is this URL:

http://qy6hy9uz5b.search.serialssolutions.com.wlmproxy.minlib.net/log?L=QY6HY9UZ5B&D=IAO&J=LIBRRESTE&U=http%3A%2F%2Fwlmproxy.minlib.net%2Flogin%3Furl%3Dhttp%3A%2F%2Finfotrac.galegroup.com%2Fitw%2Finfomark%2F1%2F1%2F1%2Fpurl%3Drc18%255fAONE%255F0%255F%255Fjn%2B%2522Library%2BResources%2B%2526%2BTechnical%2BServices%2522%3Fsw_aep%3Dwal

Sometimes this horrible thing works, and sometimes it doesn't.  The reason I'm getting this nonsense is that everything about Academic One File is wrapped in DRM.  As I said, this hurts users more than it slows down pirates.

You might have guessed that Academic One File has the journal anyway, except that the description of the "database" says:

Academic OneFile provides access to full-text articles from peer-reviewed English language journals from around the globe. Academic OneFile covers the physical sciences, technology, medicine, social sciences, the arts, theology, literature, and countless other subjects. Users can search both full text articles and abstracts. Click here for a list of available journal title.

This doesn't give you much guidance about whether a library sciences journal would be covered, but look, you can click on the link to get a list of the journals that Academic One File covers.  Well, if you click on the link, it will take almost a minute to display the list of journals because it's all on one page.  Not searchable, not indexed, on a single page.

But I'm not that organized.  I just checked Academic One File first because it's the first "database" in the list.  If you're already logged in to the catalog with your user-unfriendly bar code and PIN, you still have to log in again to Academic One File because of that alternate reality thing.  Once you're logged in, the search interface is sort of clunky, because the default search doesn't really handle names of journals.  In our reality, you could do a "Title Search", but here you switch to "Publication Search", and then you'll find Library Resources & Technical Services, where you can click through to the April issue and the article I was looking for.

I wanted to download the article, and indeed the article has a download link.  I don't know whether you ever have this feeling interacting with online services, but when I clicked the download link I knew it wasn't going to work.  I just didn't know how.

A short digression:  when you look at a single web page, that single page is normally made up of several different files, each of which has to be downloaded into your browser.  Some of the files are obvious, like the images on the page, and some are programming that affects the look of the page.  In order to keep all these files organized, it's common for a web page to have a BASE tag, which indicates where to start looking for the files.  One alternative to using a BASE tag is to give the files on the page a complete web address (i.e., URL).

The article I wanted was downloaded as a web page.  When I opened it in my browser the text was the whole width of the page, which makes it very hard to read.  I took a look at the page's HTML source using TextPad and the problem was obvious.  Some of the files the page needed weren't referenced using a complete URL, and the page as a whole had no BASE tag.  This is a bug, but I figured perhaps I could hack around it by looking at the HTML of the online copy.
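Here is a rough sketch of why the saved copy loses its styling, using Python's standard urljoin to mimic how a browser resolves a relative file reference.  The example.com address, the local download path and the stylesheet name are all made up for illustration; I'm not reproducing Gale's actual file layout.

```python
from urllib.parse import urljoin


def resolve(href, page_url, base_href=None):
    """Resolve href roughly as a browser would: against the BASE tag
    if the page has one, otherwise against the page's own address."""
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


# Hypothetical relative reference to a stylesheet inside the article page.
css = "styles/article.css"

# Viewed online, the reference resolves against the site.
online = resolve(css, "http://example.com/articles/view.html")

# Saved to disk with no BASE tag, it resolves against the download folder,
# where no such file exists.
saved = resolve(css, "file:///home/me/Downloads/article.html")

# A BASE tag in the saved page would point the browser back at the site.
fixed = resolve(
    css,
    "file:///home/me/Downloads/article.html",
    base_href="http://example.com/articles/",
)

print(online)
print(saved)
print(fixed)
```

With a BASE tag (or complete URLs) the saved page would at least know where to ask for its stylesheet.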

Well, the HTML source of the online version of the article was enlightening, but not exactly in the way I expected:  the files that define the style of the page (specifically the CSS files) are protected by the same DRM that is used to protect the content.  Let me say that again:  they've protected the file which specifies the width of the page using the same unhappy mechanism they use to protect their intellectual content.  Yuck!


April 25

Awesome UI prototype from U Michigan LIS students

This year, the School of Information at the University of Michigan held a Library 2.0 student design competition.  The winner was Team Awesome, and their entry really is awesome.

Of course the animation is neat, with blocks opening and closing or appearing and disappearing, but I particularly liked the way tag selection was handled.  Clicking on a tag selects and highlights it; clicking on it again deselects it and removes the highlighting, but doesn't delete the tag from the page.  That makes it easy to quickly try different combinations of tags.  The same idea could be applied to search terms.

When I saw the way the student prototype handled tag selection, a light bulb went off in my head and I said to myself, "Yep, that's the way it should work".

April 23

Data quality in catalogs

OCLC just issued a report on "Online Catalogs:  What Users and Librarians Want".  One of the things it discusses is the idea of data quality.  The report as a whole, and data quality in particular, triggered a very interesting discussion on the Next Generation Catalogs for Libraries mailing list, which you can probably find here under the title of the report, "Online Catalogs:  What Users and Librarians Want".

I happen to have a particular view of quality from having hung out with ISO 9000 and TS 16949 people for a few years, but that's not the topic of this post.  Interestingly, OCLC is ISO 9001 registered (i.e., compliant).

In my previous post, referring to Kalpa imperial : the greatest empire that never was, I said:

It's a shame that only eight libraries in Massachusetts own the book, and not any of the forty or so libraries in the library network I use.

That statement was based on the listing in WorldCat, but it turns out not to be true – the network has four copies – which raises the question of why the network's holdings didn't show up in WorldCat.

WorldCat's control number, 52743026, is shown in the URL for its entry.  The library network will display its MARC record for the book, and you can see in the 001 field the same control number as WorldCat.  With a little bit of digging, I verified that the 003 field identifies the control number as belonging to OCLC.  So the records in the two databases have the same control number, or to put it in database terms, the same primary key.
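To make the "same primary key" point concrete, here is a minimal sketch of how the two sets of records could be compared.  It assumes each MARC record has been reduced to a plain dictionary keyed by field tag, that OCLC's identifier in the 003 field is "OCoLC", and that holdings are a set of library symbols per record.  The symbols "LIB-A", "LIB-B" and "NET-X" are invented, and no real MARC library or WorldCat API is being used.

```python
def record_key(marc_fields):
    """Composite key: (control number identifier from 003,
    control number from 001)."""
    return (marc_fields["003"].strip(), marc_fields["001"].strip())


# What the library network says it owns (hypothetical, simplified record).
network_records = [
    {"001": "52743026", "003": "OCoLC", "copies": 4},
]

# What WorldCat lists as holding libraries for the same key
# (hypothetical symbols; the network's own symbol is absent).
worldcat_holdings = {
    ("OCoLC", "52743026"): {"LIB-A", "LIB-B"},
}

NETWORK_SYMBOL = "NET-X"  # hypothetical symbol for the library network


def missing_from_worldcat(records, holdings, symbol):
    """Keys of records the network owns but WorldCat doesn't credit to it."""
    return [
        record_key(r)
        for r in records
        if r["copies"] > 0 and symbol not in holdings.get(record_key(r), set())
    ]


missing = missing_from_worldcat(network_records, worldcat_holdings, NETWORK_SYMBOL)
print(missing)
```

Since both sides share the key, finding the gaps is a lookup, not a fuzzy-matching problem.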

If you wanted to improve data quality, getting these two databases in synch would be a good place to start.

April 20

Privacy Policy

How come my local library doesn't have a privacy policy?

April 19

Experiments in Library Funding

While I vacillate between reading books on paper and on a screen, I've definitely passed through the phase of owning and accumulating books in favor of using the public library.  It doesn't help that we recently moved to a smaller place with less space for books, but within walking distance of our town's fine library.

So I find myself reading more and paying less, which somehow doesn't seem right.  What I'd like is a way of getting money to authors, and books into libraries.  Without actually making a list, I'm sure there are at least four or five authors to whom I'd happily give $50 a year to ensure that their books made it into our local library.  It would certainly be worth that in the case of short stories, which seem a lot harder to find and keep track of.  What's lacking is a mechanism to make that happen.

Many authors have web sites, and some of them have PayPal donate buttons, which is certainly a way of getting money to authors, but unfortunately it cuts out the middleman.  Publishers need to get paid, and even if you dream of a different business model, editors need to get paid, too.

I suppose the obvious way of doing this is to buy the books and give them to the library, but I'm not sure the library is really set up to do this.

I'd like to know something about library funding, both in order to support the local library personally, and in order to understand how the community can best support it.  But the library is a black box.  (Although it's a black box containing a lot of books :-).  The standard technique for analyzing a black box is to perturb the inputs and watch what happens at the outputs, so that's more or less what I've done.

I've tried several things, including giving money, giving a book, offering to pay for a book, and suggesting a book without offering money.  The two things that appear to work are giving money and suggesting a book without offering money.  I deduce that the money pathway is completely separate from the book pathway.  That's not necessarily unexpected.  For example, you can't go into a soup kitchen and pay for the meal.

Some of these thoughts were triggered by Nancy Dowd's comments on a program at the Dallas Public Library where patrons could check out a selection of popular titles for $5 each.  Nancy Dowd thinks a premium service would help to fund the 'standard' service, but I'm not so sure.  In the comments, Emily Lloyd says:

I think I'd rather ring a bell in front of the library for donations than offer two levels of service, one paid and one unpaid.

And I think I agree with her.  Emily mentions that Hennepin County has a best-seller program which charges $3 for ten days.  Our local library has a small scale program which charges $1 per week, which is a level I'm pretty comfortable with.

Alison Circle gives the argument against premium services in a column at LibraryJournal.com, where she suggests that Dallas Public Library has jumped the shark.  I'm not sure that term is entirely fair, but who am I to complain about hyperbole?  Her point is that being free is an essential part of being a free library, and if some materials aren't free, you don't have a free library anymore.  She's uncomfortable with the possibility that people with more money will get better service than people with less money.  That's something that makes me intensely uncomfortable.

She who must be obeyed belongs to a book club, and occasionally wants to borrow a popular title with a backlog of hold requests.  What she wants to do in that case is to buy the book, read it and give it to the library.  This makes a bit more sense than the sort of book I'd like to push on the library, like books on queer/SF theory.

Suppose the library had a policy that it would accept as a donation any book published within (say) the last six months that has (say) at least ten holds.  In principle, this is pretty easy.  You can check the publication date and number of holds on a book in the online catalog, buy it from your local independent bookseller, and drop it off at the library, using whatever tagline the library has given the program ("buy a best-seller"?).

What I don't know is how much it costs the library to shelve a donated book.  A book has to get a plastic cover, and the sticker with the call number on it, and it has to be cataloged.  If you're talking about the twenty-first copy of a best-seller, it'll be weeded in the first year, and that takes staff time, too.  If someone has made a decision that the library should own exactly twenty copies of the latest Dan Brown, it's not clear to me that getting the twenty-first copy for free is worth what it costs.

So I'm back to where I think I started, separately giving money and suggesting books.

April 12

Queueing of requests

A couple of days ago, I said:

But that's not the lesson I draw from Netflix. What I see is that an important part of the service is that Netflix manages a user's queue, and you only get a new video when you return one. ... People don't want their entire queue delivered as soon as possible.

but there's no way (that I know of, at least) to do the same thing with my request list at the library. Yesterday I showed up at the library to find that three of my requests had arrived at once, in addition to another the day before.  In the twenty-first century, this doesn't make any sense!

On the other hand, I'm delighted to have all these interesting books to work through, so it's not all bad.  The first book I read was Walter Jon Williams's This is not a Game, which was fabulous and I finished it in about a day and a half.  I'll see if I can get a short review written.

The next book I read was Seth Godin's Tribes:  We need you to lead us, which I read overnight (it's only about 150 pages).  I recommend it as something to think about, although it has some flaws.  I'll also try to get a review of this written.

And now I'm working my way through David Weber's latest Honor Harrington book, Storm from the Shadows, all 728 pages of it. I'm leaving Elizabeth Bear's Seven for a Secret, sequel to the wonderful New Amsterdam, until I'm done with my taxes and can give it the attention it deserves.

But I still want a better way to manage my library requests.

April 6

The new availability

Over at the PLA Blog, Andrew Mangels made some perfectly reasonable comments about just-in-time versus just-in-case development policies. It triggered a bit of a reaction in me, since framing the choice that way seems to be based on the idea that people walk into the local library and look for a book. That's certainly not what I do — I almost invariably check the OPAC from home before I walk to the library — so I posted a comment on the item, which I'll reproduce in its entirety here, since I want to expand on what I was willing to say to the PLA.

I'd like to see individual public libraries take more advantage of the networks they belong to in deciding their acquisition strategy. I probably borrow a couple of science fiction books a month, but I'd have no particular problem if my local public library decided that it was going to focus on romances and murder mysteries, and I had to use ILL for science fiction. The same goes for the non-fiction I read.

What makes this feasible is the fact that the OPAC covers the whole network, so ILL is a click away. Availability used to mean what was sitting on a shelf in the library, but the OPAC+ILL has changed the equation. I can reserve a book in a second or two, and I get an email when it arrives. I can pick it up from the checkout desk in a minute or two. It would actually take more of my time to borrow something from the shelves.

In fact the OPAC has reduced the (relative) availability of books on shelves. If I do a search on, say, hydroponics, I'll get a list of books in the entire network, with no indication of what's available in the local library. If I'm IN the local library, it makes sense to go look, but if I'm at home it's easier to ignore shelf availability, click on the first entry I like, and request it.

The weakness of ILL is that I don't know when the book will arrive. If I'm going on vacation in a week, will it arrive by then? You could improve service by letting me know, when I request an item, when I should expect it, even if that wasn't a guarantee.

In Massachusetts, there's an ILL consortium with a web interface called the Virtual Catalog, but the user interface is so bad that I use it rarely and reluctantly. The weakness of the interface has the effect of making all the material less available.

When you're deciding what should be available, you need to decide what you mean by available. You can improve availability by improving the web interface, as well as by putting more books on shelves.

What I didn't say in my comment on the PLA blog is that once you decide you're providing a web service and not just a book service, there's a whole lot more you can do. Occasionally I see librarians wondering online whether they should provide a service like Netflix and deliver books directly to patrons' homes. But that's not the lesson I draw from Netflix. What I see is that an important part of the service is that Netflix manages a user's queue, and you only get a new video when you return one. If you have a queue of thirty-seven videos, the fact that you only get three videos at once and not all thirty-seven is a feature, not a bug. People don't want their entire queue delivered as soon as possible.
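Just to show how little machinery the queue idea needs, here is a minimal sketch.  The class and method names are my own invention, the limit of three comes from the Netflix example, and a real library system would obviously have to worry about availability, holds from other patrons and so on.

```python
from collections import deque


class RequestQueue:
    """A patron's request list that releases only a few items at a time."""

    def __init__(self, titles, max_out=3):
        self.waiting = deque(titles)  # requested, in the patron's order
        self.max_out = max_out        # how many items the patron wants at once
        self.out = []                 # items currently with the patron

    def fill(self):
        """Send titles until the patron holds max_out items.
        Returns the titles sent this time."""
        sent = []
        while self.waiting and len(self.out) < self.max_out:
            title = self.waiting.popleft()
            self.out.append(title)
            sent.append(title)
        return sent

    def give_back(self, title):
        """Record a return, then release the next title in the queue."""
        self.out.remove(title)
        return self.fill()


queue = RequestQueue(["A", "B", "C", "D", "E"], max_out=3)
first_batch = queue.fill()          # the first three titles go out
next_batch = queue.give_back("B")   # returning one releases exactly one more
print(first_batch, next_batch, list(queue.waiting))
```

The rest of the queue just sits there until the patron is ready for it, which is the point.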

There are many books that I'd like to read but I'm willing to wait for. Anything by Elmore Leonard, for example. That's information the library could use but has no good way of collecting. As soon as one library in the network orders a book, it's entered into the catalog, and every library can see how many requests the book gets, but there's an obvious "first mover disadvantage" for the first library. Also, patrons can only indicate their interest in a book by requesting it, which more or less means they want to read it immediately. There's no way to put yourself at the end of the queue.

Contrast this with Amazon's problem: Amazon would like to satisfy its customers as quickly as possible with as little extra stock as possible. Pre-orders provide a way for people to indicate their interest in a way which is exactly aligned with Amazon's problem. If Amazon receives a hundred pre-orders, it needs a hundred copies to satisfy the pre-orders.

A library doesn't need a hundred copies to satisfy a hundred requests. But the number of copies it does need to satisfy its patrons depends on how urgently those patrons would like copies, information that the library doesn't have.

I don't want to go into a lot of detail in this already overlong post about what sort of features I think would make sense in a patron request, voting and queue management system. My point here is that more cost-effective collection development won't come from just thinking harder about collection development. It will come from thinking harder about web services, and collecting more information from patrons about what books they want when.

"What we got here is ... failure to communicate."

February 16

Foxmarks adds suggested tags to Firefox

When you add a bookmark to Firefox, you can add one or more tags to the bookmark. You can find bookmarks by their tag using the bookmark manager, which Firefox 3 has unhelpfully renamed the Library, or you can just type one or more comma-separated tags into the address bar.

Foxmarks is a Firefox add-on which synchronizes your bookmarks, including their tags, between multiple computers.  To do this, there's a Foxmarks server which holds all your bookmarks, but it's normally invisible — Foxmarks isn't intended to be a web application of that sort.  However, the Foxmarks server has access to all of the tags on all of the bookmarks of all Foxmarks' users.

In its latest release, Foxmarks has added the ability to suggest tags when you bookmark a page.  The feature is described here, where they say:

So how does it work? As you may know, Foxmarks manages over half a billion bookmarks every day. We’re now putting this data to work for you by analyzing this giant collection of information to determine the best tags for your bookmark. As always, we are careful to protect your privacy and our algorithms will never expose any personally identifying information.

Which isn't very detailed.  I presume it works by suggesting the most common tags for a given page.  Ensuring privacy is an interesting issue, and not an easy one.  I guess Foxmarks doesn't show tags that don't occur (for the page in question) a certain minimum number of times.
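To be clear, the following is my guess at the idea and not Foxmarks' actual algorithm, which they haven't published.  It's a minimal sketch that counts how many users applied each tag to a page and only suggests tags used by at least some minimum number of people; the function name, the threshold of five and the sample tags are all assumptions.

```python
from collections import Counter


def suggest_tags(tag_lists, min_users=5, limit=3):
    """Suggest the most common tags for one page.

    tag_lists holds one list of tags per user who bookmarked the page.
    Tags used by fewer than min_users people are never suggested, so a
    rare (and possibly identifying) tag stays private.
    """
    # set() so that a user who repeats a tag is only counted once.
    counts = Counter(tag for tags in tag_lists for tag in set(tags))
    popular = [tag for tag, n in counts.most_common() if n >= min_users]
    return popular[:limit]


# Hypothetical bookmarks of one page by six users.
bookmarks = [["library", "catalog"]] * 5 + [["library", "02481"]]

suggestions = suggest_tags(bookmarks)
print(suggestions)
```

In this made-up data the lone zip-code tag never shows up in anyone's suggestions, but a threshold like this is only a first line of defense.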

One of the books I'm reading at the moment is Blown to Bits:  Your Life, Liberty, and Happiness after the Digital Explosion, by Abelson, Ledeen and Lewis.  It's an interesting book, and I'll have more to say about it when I'm done.  On page 34, they talk about how Governor Weld's medical data was extracted from blinded data using a combination of gender, zip code and birth date.  I doubt people tag their bookmarks with their birth date, but zip code is a possibility.  Anyway, it shows how hard it is to really blind data.

I basically don't believe in a bright line dividing controlled vocabularies (as in a library catalog) from uncontrolled vocabularies (such as tags and other folksonomies).  The suggested tags feature in Foxmarks is an interesting experiment in blurring the boundary further, with what amounts to a semi-controlled vocabulary.

February 11

How much do you give away?

One way public libraries measure themselves is by how much they give away: how many books loaned, how many people helped, how many talks attended.  I think this is such a good idea that it should apply to library web sites as well.

Tim Spalding, Thing One at LibraryThing, recently announced on his blog that their Common Knowledge program had just reached one million data items.  What makes this particularly neat is that all this information is freely available via their API as well as on web pages.

One thing that Tim particularly drew attention to was their data on series.  Elizabeth Bear is clearly courting some sort of bibliographic damnation, since her Promethean Age series switches between "and" and "&":  Blood and Iron; Whiskey & Water; Ink and Steel; Hell and Earth.  This is further complicated by the fact that Ink and Steel and Hell and Earth form a series within the series.  Bear describes them as, "the two halves of a really long novel, which is collectively known as The Stratford Man" (ibid.).  LibraryThing gets this right, since it doesn't limit the amount of series information that users can contribute.  You can get the same information from the API.

Justin Thorp has been commenting on the growth of APIs, what he calls "the decline of the web site", for some time, such as here:

Because of all the great Web platforms and APIs that are being made available, the Web is no longer being constrained by the notion of a Web page. For example, there are many people like Michael Arrington who are using Web applications like Twitter with out ever actually going to the Twitter Web site.

Justin points out that this effectively asks the question, "you mean you want me to encourage people to not use my Web site?", to which his (and my) answer is, "Yep".

And Karen Coombs discusses alternative APIs, concluding with:

Libraries also need to think about building an OpenSearch interface to their collections.

I'd like my local library to discourage use of their web site by giving away information on which books are available (as opposed to out, etc.).  I'd like an API.

February 9

How good does your web site have to be?

Over at P'unk Avenue Window, Geoff DiMasi is thinking about library web sites.  I believe that people's expectations for a library web site (or any web site) are set by the best web sites they encounter.  Here's what Geoff has to say:

I envision a library website that has an Ebay reputation system, a Digg voting component, a room reservation system, a Google Books repository, a WorldCat list and notes feature, Amazon reviews and Facebook profiles.

I believe that whether you're thinking about layout and design, navigation and ease of use, or features and functions, when people come to your web site they'll instinctively compare your site to the best sites they've seen.  Geoff wants the library web site to be as good as Ebay, Digg, Google Books, WorldCat, Amazon and Facebook combined.

The comments on Geoff's post are also very interesting. Laura from the Free Library of Philadelphia has this to say:

Libraries have long developed their own ways of doing things that work well for Librarians but when exposed to the larger culture are quite limited. The Web has only accelerated that process.

One of "their own ways of doing things" on library web sites that drives me nutty is jargon. If I'm looking for something, should I look in a "catalog" or a "database"?  The Free Library of Philadelphia has made a good start, dividing its front page into three areas for Find, Explore and Ask.  I noticed this same layout recently at Harvard College Libraries, whose choice of terms is "Research", "Request Forms", "Instruction Resources" and "General Info".  This is pretty good but not as good as FLP's.  "Request Forms" in particular has three different meanings:  "request a form", "forms for requests" and "a request is forming".  But FLP still requires the user to be able to predict in advance whether what they want will be found in the catalog or the databases. At least they make the dilemma obvious.

LibraryThing allows users to organize lists of books. One of the problems they have to solve is to distinguish different authors with the same name.  Over at Libology, Rick Mason says:

I like that LibraryThing has found a simple, elegant solution that matches what people think and say when distinguishing between two authors with the same name.

LibraryThing's proposed solution is to identify both Steve Martins as "Steve Martin", and let users identify the correct author of a book by looking at the other books each Steve has written.  This explanation might sound silly, but it's just what IMDB does with, for example, the nine different Paul Newmans.  And as Rick says, it has the advantage that it matches what people think and say.  That's what makes an effective web site.

February 5

Is the catalog keeping up?

Things are changing fast!

Here's a post from Tame The Web with a short video that shows just how fast: Right Here, Right Now: Ready for the Unexpected/Future.  You're probably aware of much of this material, but it's very illuminating to see it all in a five-minute video.  The things that are changing fast aren't just gadgets — they're the tools that people use, so people are changing, too.  And kids are changing faster than adults.  Here's a quick check: Does your phone have a keyboard?  All three of my kids have phones with keyboards.

Over at The Unquiet Librarian, Buffy Hamilton posts her presentation from the Georgia Council of Teachers of English 2009 Annual Conference: "YA Lit 2.0: How YA Authors and Publishers Are Using Web 2.0 Tools to Reach Teen Readers".  In the supporting material on her blog, Hamilton gives a dozen or more YA authors who use Twitter or Facebook.  I've been struck by how fast Facebook has been expanding into the general population over the past six months, including my colleagues from a certain religious organization playing Mafia on Facebook.  Here's another quick example of how fast things are changing: Hamilton uses four Web 2.0 tools in a single post: WordPress, SlideShare, WikiSpaces and Kwout.

It makes sense that libraries should stay well behind the crumbling edge of web startups, but if you're not moving ahead at the same speed as the crumbling edge, you're falling behind, and if you're not moving ahead at the same speed as your patrons, you're falling behind.

One of my New Year's resolutions is to be more constructive, specifically about libraries and the web.  But not yet, God.  Recently, I noticed that some of the entries in our local library network catalog included a link to more information about the author.

I'm sorry that I can't let this pass, but the link is labeled "Contributor biographical information".  If you were looking at this page and wanted more information about John Scalzi, would you click on "Contributor biographical information" or "Scalzi, John, 1969-"?  Don't use jargon!  Scalzi, by the way, has been given a bibliographic distinction befitting his stature as a popular author: some of his books are entered under "Scalzi, John" and some under "Scalzi, John, 1969-".

So the user interface could be better, but there are other shortcomings.  The link pulls information from the Library of Congress, which uses information provided by the book's publisher.  By the publisher of that book.  Now Scalzi has a blog which happens to be wildly popular, as author blogs go.  Some of Scalzi's books have no "contributor biographical information" at all, some have one which mentions the blog, and some have one which does not.  I didn't go through every book, but the ones I checked didn't mention Scalzi's Twitter account, or his forum.

I also checked Elizabeth Bear (because I'm a Bear fanboy), and most of the entries don't have links, and the ones that do don't mention her web site, her blog on Live Journal or her Twitter account.

The good news is that the catalog is adding new features and trying new things. This is a good thing, and if I had any pull with the folks who run the catalog, I'd do what I could to encourage it.  It would be great if the catalog had a blog where these things were announced and where people could offer comments (and encouragement!).

I don't want the better to be the enemy of the good, but I don't want the good to be the enemy of the better, either.  To be really useful to people who want more information about their favorite author, the catalog needs links to Wikipedia, to blogs, RSS feeds, Twitter, Facebook and on and on.  The best thing to follow a good first step is another step.  And another step.  And another step!  FASTER!

January 28

"Designing the Digital Experience"

This post is less than a review of Designing the Digital Experience: how to use experience design tools and techniques to build Websites customers love, by David Lee King.  The book is divided into three parts: Structural Focus, Community Focus and Customer Focus.

Early on, in Chapter 3, King focuses on the process of designing and building a web site.  Not what the web site will look like when you get there, but the process of getting there.  He discusses three process models:  Jesse James Garrett's User Experience model, David Armano's Experience Map model, and 37signals' Getting Real model. This is a good place to start.  I guess any moderately organized person would start by thinking of the goals they wanted a web site to achieve, but a process model helps to avoid overrunning the goals and requirements in your haste to get to implementation.

On the subject of usability, King points the reader to Steve Krug's Don't Make Me Think: A Common Sense Approach to Web Usability.  Even without reading the book, I think this is a great principle.  Not thinking has many of the same benefits as laziness, the first of Larry Wall's virtues of a programmer.  Libraries should design their web sites to be used by lazy, unthinking patrons.  No, really.

The weakest section of the book is chapter 6, "Emerging Tools for the Digital Community".  For example, King spends less than a page on wikis, and makes the same mistake that I've noted previously: conflating the authors of a wiki with the readers.  King discusses Flickr, but provides no guidance on whether a library should use Flickr or host its own photo gallery.

For me, the key point about customer interaction was made by Jan Carlzon, whom King quotes on page 133:

Each point or interaction with your organization can be referred to as a "moment of truth", a concept first introduced by Jan Carlzon, the former president of Scandinavian Airlines, in his 1986 book entitled Moments of Truth.  Carlzon defines the moment of truth in business as:  "Anytime a customer comes into contact with any aspect of a business, how ever remote, is an opportunity to form an impression."


Carlzon was making a point that maybe wasn't obvious in 1986: everything matters.  When someone even walks past an airline counter, they'll notice, perhaps only subconsciously, how long the lines are, how cheerful the customers and agents are, and how clean and new the counter looks.  In 2009, Carlzon's challenge also applies to an organization's web site.

King also quotes Tom Kelley of IDEO (page 132):

According to Kelley, your car-buying experience (and also your experience interacting with the car company) begins before you walk onto the lot and continues after you leave with your purchase.


Combining these ideas, it's logical to conclude that a person starts forming an impression of your web site before they interact with it.

This is a short book (182 well-written pages), and it seems thin in some areas, but for me the whole is greater than the sum of the parts.  Perhaps it's an unnecessarily mischievous thought, but occasionally it seems to me as though librarians write books as annotated bibliographies, as directions onward rather than as containers of content.  But building an effective web site touches so many disciplines that this may well be the most sensible approach.

January 26

Taxonomy versus Folksonomy

Over at the official PLA blog, Lorraine Squires discusses whether tags should be controlled (taxonomy) or uncontrolled (folksonomy).  For some reason, this triggered the following not-completely-serious riff on the bibliographic control of user-contributed content.  I literally wrote this post as I was falling asleep last night.  I guess I shouldn't read anything about cataloging just before bedtime.
 
I've slowly become convinced that user-contributed content (UCC), including tags, reviews and lists, should be subject to bibliographic control, but further study is needed to determine the right form.  For example, should (uncontrolled) tags be classified using (controlled) tags or by using the same subject headings as the underlying catalog?  Lorraine apparently sees controlled tagging and uncontrolled tagging as mutually exclusive, but I don't see why they can't co-exist.  In particular, if they are going to co-exist, controlled tags could be used to classify the uncontrolled tags, so that several or many uncontrolled tags would be associated with a single controlled tag.  But that's just the beginning of the issues to be resolved.
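
To make the idea of controlled tags classifying uncontrolled tags concrete, here is a minimal sketch in Python.  The mapping table, the tag names and the function name are all hypothetical; a real catalog would presumably keep this association in its own authority data rather than in a hard-coded dictionary.

```python
from collections import defaultdict

# Hypothetical association of uncontrolled (patron) tags with controlled tags.
# Keys are normalized to lower case; the values are invented for illustration.
CONTROLLED_FOR = {
    "sci-fi": "Science fiction",
    "scifi": "Science fiction",
    "sf": "Science fiction",
    "whodunit": "Detective and mystery stories",
}


def group_by_controlled(tags):
    """Group patron tags under the controlled tag that classifies them.

    Tags with no known controlled tag are collected under the key None.
    """
    groups = defaultdict(list)
    for tag in tags:
        controlled = CONTROLLED_FOR.get(tag.strip().lower())
        groups[controlled].append(tag)
    return dict(groups)
```

Several uncontrolled tags ("SciFi", "sf") end up under one controlled tag, while anything unrecognized is simply left unclassified rather than rejected, so the two kinds of tag can co-exist.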

For libraries with foreign language populations, which must be most public libraries in the United States, foreign language tags may be problematic.  One answer is to tag the tag in both its native language and in English, but it may not be so easy to determine the language from the one or two words of a tag.  You would also need to decide whether to use a foreign culture classification, or merely the (English, controlled) tag translated into the foreign language.  Particular care would also need to be taken with disadvantaged language populations, such as the Basques or Kurds.

Since tagging takes some time to reach critical mass within a given library or system, pooling of tags from multiple systems has been suggested.  In this case, the source system would be identified as the publisher in the record for the tag (either controlled or uncontrolled).  Identity control would also be required, both to distinguish different people with the same user id in different systems, and to identify the same person with different ids.

When combining tag sets from different systems, it seems inadvisable to simply conflate tags with the same name.  The right approach might be to use the FRBR model and treat the tag from each system as a separate expression of the same work.  This might be just the push your library needs to move to FRBR.

The OPAC would need to be strengthened to allow the user to proceed from a given (tagged) catalog record to other records along two dimensions: (i) records with the same uncontrolled tag or the same controlled tag, and (ii) tags applied by the user themselves, by local users, or by any user.  I suspect that the user interface would be simpler than that explanation of the options.

User-contributed reviews seem much easier to handle, since they could be entered into the catalog using whatever ruleset was used for electronic documents, again setting the library or library system itself as the publisher.  Since reviews already move from system to system (e.g., the OPAC for a public library pulls reviews from Amazon), here identity control is urgently required.

User lists are unfortunately more complicated.  Because entries may be copied from list to list, and each entry in a list may have an attached note, which naturally has an author, there seems to be no alternative to cataloging each list entry individually, but then you have to reassemble the list.  One possible direction is to follow FRBR and treat a list as a work consisting of an aggregate of works.  Copying an entry from one list to another would simply be a matter of copying the individual work (the list entry, not the underlying work the entry refers to) to the destination aggregate and adding an association (i.e., "copied from") between the source and destination works (again, the list entries).  The owner of the destination list could then author a note associated with the new entry.

Naturally, the library could provide its own controlled lists to supplement the uncontrolled lists provided by patrons.

When displaying a list, the OPAC would need to offer the user the option of displaying only their own notes, or the notes belonging to entries the entry was copied from (available by following the "copied from" association backwards), or any notes associated with the underlying work.
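
The "copied from" association is easier to see in code than in prose.  This is only a sketch under my own assumptions: the class name, field names and helper functions are invented for illustration and are not taken from FRBR or from any catalog system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ListEntry:
    """One entry in a patron list, treated as a small work of its own."""
    work_id: str                                  # the underlying catalog work
    owner: str                                    # who owns the list holding this entry
    note: str = ""                                # optional note authored by the owner
    copied_from: Optional["ListEntry"] = None     # the "copied from" association


def copy_entry(source: ListEntry, new_owner: str, note: str = "") -> ListEntry:
    """Copy an entry into another list, recording where it came from."""
    return ListEntry(source.work_id, new_owner, note, copied_from=source)


def notes_along_chain(entry: ListEntry) -> List[Tuple[str, str]]:
    """Collect (owner, note) pairs by following "copied from" backwards."""
    notes = []
    current = entry
    while current is not None:
        if current.note:
            notes.append((current.owner, current.note))
        current = current.copied_from
    return notes
```

Showing only the user's own notes is then a filter on the owner, and showing inherited notes is a walk back along the chain.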
 
Happy Australia Day!
January 25

How do you compete with free?

I've been a happy user of Library Elf for almost two years.  It's a web-based service which monitors your library account and sends you email reminders when a requested book arrives, or a book is coming due, or becomes overdue.  That's something that the local library also does.  So Library Elf is competing with free.

The library will send you a reminder two days before a book is due.  Library Elf will send you a reminder when a book is due, or up to seven days in advance, or on a particular day of the week, or every day.  The library will send you a reminder when a book is fourteen days overdue.  Library Elf will send you an overdue notice immediately, and will optionally keep doing it every day.  The library will send you a reminder when a requested book has arrived.  Library Elf will send you one reminder, or one every day.

If you have two books due, the library will send you two emails.  Library Elf will send you one.  In fact, Library Elf rolls up all your reminders into one email, so that when you're going to the library to pick up a request, you can see whether you have books to return.  The email from Library Elf includes a calendar, so you can see at a glance whether you can wait until the weekend to pick up or return a book.  You can associate more than one library card with a single Library Elf account, and one email will include the activity from all the library cards.

Your account on the Library Elf web site includes a single page which lists all of your loans and requests, and includes a calendar.  The library catalog lists loans and requests on separate pages, without a calendar.

Starting this month, Library Elf began charging for premium services, the services I've described here.  Some services are still available for free, but not support for multiple cards on one account, something I rely on, so I'm paying Library Elf twenty dollars a year for something the library does for free, just not quite as well.

Or the library could get a library subscription from Library Elf, for about a dollar a year for each patron that uses the service.

Or the library could improve its current email service.
January 23

"Library 2.0 and Beyond"

This post is not exactly a review of Library 2.0 and Beyond, which is a collection of eleven articles edited by Nancy Courtney.
 
Often, discussions of Library 2.0 collapse into a discussion of which Web 2.0 features can be applied to libraries.  Then, a catalog from the 1990's, or even the 1980's, is taken as a fixed point around which Web 2.0 features can revolve.  Michael Casey's article on "Looking Forward to Catalog 2.0"  is an excellent antidote to this reductionism, and sets the bar for next generation catalogs appropriately high.  I just want to nudge it up a little further.
 
Talking about the catalogs of the past (and probably those of the present as well), Casey says (page 17):
Moving between the catalog and library events and services was not possible and this lack of ability was painfully obvious to every user.
Which suggests a thought experiment of sorts.  Suppose an author was coming to give a reading at your library in two weeks.  This event would naturally appear in the calendar, and perhaps be announced in a blog, on your Facebook page, or on Twitter.  But what happens if a patron searches for the author in the catalog?  I suggest an X Prize for the first catalog to automatically display relevant information from the library's calendar.
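
For what it's worth, the lookup itself need not be complicated.  Here is a rough sketch, assuming the calendar can be exported as a list of events that each name an author; the event structure, the sample data and the function name are all made up for illustration.

```python
from datetime import date

# Hypothetical calendar export: each event names the author it features.
EVENTS = [
    {"title": "Author reading", "author": "Example, Ann", "date": date(2009, 2, 6)},
    {"title": "Book signing", "author": "Example, Ann", "date": date(2009, 1, 10)},
    {"title": "Story time", "author": "Sample, Bob", "date": date(2009, 2, 3)},
]


def upcoming_events_for(author, events, today):
    """Return events featuring this author on or after today, soonest first."""
    name = author.casefold()
    matches = [
        event for event in events
        if event["author"].casefold() == name and event["date"] >= today
    ]
    return sorted(matches, key=lambda event: event["date"])
```

The hard part is not this function but getting the calendar and the catalog to agree on how an author is named, which is an authority control problem the catalog is supposed to be good at.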
 
It's important to be clear in evaluating what a patron sees in response to a catalog search.  One error to be avoided is to suppose that the data to be displayed is limited to the results of the search.  As you can see from the example of the author event, there's really no limit to what might be helpful to the user.  Another is that the effectiveness of the display is determined by the data presented.  Casey says (page 20):
Design is almost as important as results because if it's not easy to use then no matter how powerful the search it will go unused.
with which I completely agree, except for the "almost".  Casey also has a great description (page 21) of how customizable RSS feeds can be used to deliver catalog updates in particular subject areas to interested patrons.
 
I'm going to look in some detail at Chad Boeninger's article, "The Wonderful World of Wikis:  Applications for Libraries", because wikis are something I've been thinking about, and trying out, recently.  Boeninger makes a very telling observation.  He has entered information about the Regional Encyclopedia of Business & Management into a wiki (page 30):
The catalog record gives the subject headings (business - encyclopedias and management, and management - encyclopedias), but that is a little too broad for this resource.  By adding information from the table of contents, as well as thoughts about how the item can be used, the article in the Biz Wiki can perhaps be a little more useful to the business researcher.
Here's my biased gloss on these two sentences:  catalog - less useful; thoughts - more useful.  Now the thoughts are those of a professional librarian and not patrons, but doesn't this feel like the camel's nose, the slippery slope, etc?
 
Well, yes and no.  Here's Boeninger's description of a wiki (page 25):
In its simplest terms, a wiki is basically a website in which the content can be created and edited by a community of users.
It's true that the pages of a wiki can be created and edited by a group of people, but that's not necessarily the same group as the readers of the wiki.  So we're left with the somewhat less impressive fact that a wiki is a web-based tool for one set of people to generate content for another (or possibly the same) set of people.  And it makes perfect sense that professional librarians should be generating content for patrons.
 
Boeninger gives three very strong examples from his own experience (although for some reason not in chronological order) but he doesn't go far enough in presenting other ways in which wikis can be used.  To give an example I've been thinking a little about, a wiki could be used so that several libraries could jointly develop and share homework resources for teens.  Homework resources are the sort of loosely structured content that fits the wiki model pretty well, and libraries in adjacent towns are likely to share much of the same material in common subjects such as American History.  The key point is that the "community of users" who create and edit the wiki doesn't have to be an existing community.  It can be a group created de novo simply by the existence of the wiki.  It's also an example of the truism that the Internet can erase distance.
 
Boeninger says (page 31), "Self-hosted wikis require knowledge of MySQL and PHP, as well as some experience with web server administration."  It happens I've just installed PmWiki on one of my Windows XP systems, and I can tell you exactly how it's done:
 
  • If you don't already have Apache and PHP, download XAMPP and run its standard Windows installer
  • Download PmWiki, unpack and copy it to ..\xampp\htdocs and rename the folder to pmwiki
  • Create ..\xampp\htdocs\pmwiki\index.php (instructions here)
  • In your browser, go to http://localhost/pmwiki

Congratulations, you're running a wiki!  Like several other simple but popular wikis, PmWiki doesn't even use MySQL, so that's not at all relevant.  Now it's true that some knowledge is required — like setting the administrator password (!) in various places — but Boeninger's characterization is a stretch.

Ellysa Kroski has an article on "Folksonomies and User-based Tagging".  I'm a tagging zealot, so perhaps I'm not the fairest reviewer of this material.  While I found the article a useful introduction, it misses one of the advantages of tagging blog posts, and is weak in its defense of tagging.  The omission is simply that if a blogger tags their posts, you can normally subscribe (in RSS) to posts with a particular tag.  This means that a blogger can post on completely unrelated topics (say, cataloging, world peace, and pictures of his cat) and people can choose to read a single topic or everything.  Or a library could post all its news to a single blog, with tags that would allow patrons to filter it if they chose.  Conversely, tags don't seem to be particularly useful for searching blogs, probably because the tags used for blog categories (like "food" or "writing") are too general for that purpose.
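
Filtering a feed by tag is simple enough to sketch.  The example below assumes the blog publishes an RSS 2.0 feed whose items carry their tags in category elements; the sample feed and the function name are invented for illustration, and a real reader would fetch the feed from the blog rather than use a string.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed with three posts on unrelated topics.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example library blog</title>
<item><title>New catalog features</title><category>cataloging</category></item>
<item><title>My cat</title><category>cats</category></item>
<item><title>Tagging and cats</title><category>cataloging</category><category>cats</category></item>
</channel></rss>"""


def titles_with_tag(feed_xml, tag):
    """Return the titles of feed items that carry the given tag."""
    root = ET.fromstring(feed_xml)
    wanted = tag.lower()
    titles = []
    for item in root.iter("item"):
        categories = [(c.text or "").strip().lower() for c in item.findall("category")]
        if wanted in categories:
            titles.append(item.findtext("title"))
    return titles
```

Calling titles_with_tag(SAMPLE_FEED, "cats") picks out the two cat posts and skips the one about the catalog, which is all a patron needs in order to read a single topic.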
 
(Before I post to this blog, I naturally submit each post to the same rigorous fact-checking as The New Republic.  I discovered that subscribing to a single category of blog posts is not something that's easy unless it's a feature provided by the blog owner.  However, it is possible.)
 
Kroski gives several examples of systems that use tags, such as (pages 92-93) Flickr, Delicious and Technorati, but doesn't have anything to say about the differences between the systems.  Try searching for "cocker spaniel" (no quotes) on each system, and on Google.  Not only are the photos of cocker spaniels cute, but so are the differences between the systems.  There's clearly something more to be said.
 
Here's Kroski on the advantage of controlled vocabularies over tagging (page 98):
A traditional taxonomy, such as the Library of Congress classification system[,] will allow users to locate relevant resources precisely because of the strength of its controlled vocabulary.  However, the user must know that the subject heading is "World War 1939-1945" in order to reap the rewards of this system.
Unfortunately for Kroski's argument, the Internet makes this pretty easy to check.  Let's say we're looking for books on Drupal.  We don't have to commit the LoC subject headings to memory because they're online here, where we find that the right heading is "Drupal (Computer file)".  We can use that term to search the library network catalog, where we find that this catalog uses the heading "Drupal Computer Program Language".  No matter; we've found three books.  Which is great, except a keyword search on "drupal" finds a fourth, which has been classified under other subject headings entirely.  Three out of four is not good.
 
I really don't understand how librarians can continue to talk about "the strength of its controlled vocabulary" with a straight face.
 
The Library of Congress Flickr Pilot Project put several thousand images from the LoC on Flickr where any registered user could add tags or comments.  The result was that "the project has been successful in achieving the objectives and desired outcomes of the Library’s strategic goals" (full report (pdf link), page iv).  And "less than 25 instances of user-generated content were removed as inappropriate" (ibid.), which seems insignificant, except that since all of the tags and comments had to be checked in order to find those twenty-five, it seems to have taken one full-time person (summary report (pdf link), page 4).
 
That seems to me to be the most important objection to tagging, and it's something that Kroski doesn't mention.  Unless you devote significant resources to checking tags, someone's going to add dirty words and someone else is going to be upset.
 
Eric Schnell provides a useful introduction to different web services in "Mashups and Web Services".  But not all his examples (e.g., page 70) are mashups, which strictly speaking are applications which combine data from different web sites.  He's also not particularly clear about the range of tools which can be used to create mashups, focusing on developer concepts like REST and SOAP (page 68) to the exclusion of user tools such as Yahoo! Pipes.
 
There's a lot more helpful content in the book than I've discussed here, including introductions to podcasting, online social networks, gaming and digital storytelling.  I can't recommend the book without reservation, but as long as it's not the only thing you read about Web 2.0, Library 2.0, or about the individual topics presented in the book, I encourage you to read it.
November 17

Patron Cooties

I guess I knew this would come up sooner or later.  Over at The D-Light of Digital Collections, Lyle says:

Strictly speaking, including ratings and tags on bibliographic records violates the Library Bill of Rights interpretation on Labels and Rating Systems.

Now there's definitely a problem here, even if the LBoRioLaRS is left breathless by the attempt to describe it.  LBoRioLaRS attempts to draw a bright line where there isn't one.  For example:

Is it appropriate to add movie, game or music ratings to the bibliographic record?  No. These rating systems are devised by private groups using subjective and changing criteria to advise people of suitability or content of materials. It is the library's responsibility to prevent the imposition or endorsement of private rating systems.

Movie ratings are useful even if they're not perfect.  And I'm highly amused by the phrase "private groups", as though a government agency could do a better job.  Or librarians!  I don't really think we need a film ratings board staffed by librarians.  But apparently when publishers inscribe a book's subject headings on stone tablets and ship them off for permanent storage in the basement of the Library of Congress, the taint of private enterprise is burnt off by the intensity of their moral purity.

LBoRioLaRS also objects to labeling books as Christian even though "Christianity" is an Authorized Library of Congress Subject Heading.  I don't get it.

On the other hand, I can see that kids are a real problem.  Adding age classifications to kids' books is more than just information, and I can see the possibility that some kids would be bothered or discouraged by them.  LBoRioLaRS is on much more solid ground when worrying about kids.  My default approach to this and similar problems, like dirty words, is just this:  opt-out for adults and opt-in for children.  That is, if adults don't want to see tags, they should be able to turn them off.  And if a child is mature enough to handle tags, one of their parents should be able to turn them on.

This is just the tip of the iceberg.  What if a parent wants their child to be able to see reading age tags but not every single tag in the catalog?  Do librarians need to classify or otherwise limit tags?  Would it even be legal for a public library to do so?  A sensible librarian might conclude that any solution is going to burn so much staff time that it's not worth it.

So ... perhaps no tags, for good reasons.  But there's one very bad reason for not implementing tags:  patron cooties.  This is the idea that if you gave patrons access to "the bibliographic record" they'd just mess it all up.  They'd make mistakes; they'd make up new subject headings (like, "Nineteenth Century -- Stuck in the"); they'd add notes to their girlfriend or pictures of their cat.

Even supposing for the moment that such a thing as "the bibliographic record" exists, there's no reason to suppose that patron-contributed information would be stored anywhere near it.  Go look at any book on Amazon!  You can see that any reviews or tags attached to the book are quite separate from the bibliographic information supplied by the publisher.  Who's going to be confused?

And over at hangingtogether.org, John MacColl has this to say about an image on Flickr:

The contributor’s metadata is full enough for an image like this one, and many of the others here, to be appraised by curators and archivists for addition to professionally assembled collections and exhibitions such as those discussed above.

MacColl has nothing but praise for the images on Flickr and their descriptions, apparently with a small exception:  as good as they are, however good they are, they need only one thing to be perfect:  to be appraised by a professional.  But Flickr already is a collection and exhibition.

How did we slide from the idea that catalogs (and librarians, for that matter) have unique value, to thinking that folksonomies are just wannabe catalogs?  Surely both have unique value.

In the current issue of Against The Grain (v.20 #5, November 2008, not online as of this posting), Rick Anderson talks about how few information professionals there are compared with the amount of information that's generated every year.  He states (p 52) that five billion gigabytes of information were created in 2002, and that on a liberal estimate of the number of information professionals in the world, each of them would need to categorize a Library of Congress every year in order to keep up.  Anderson uses the figures to make a slightly different point, but my conclusion is that attempting to catalog all the tags in the world is impossible as well as pointless.

November 12

Public Relations 2.0

Oh dear!  TLC (The Library Company) has just announced a new ILS, LS2, and a new OPAC, LS2 PAC.  But here's how they describe it:

    LS2 is a social-centric search experience.
    It exploits the best of the Web to enable
    points of interaction between users, enriching
    their experience of your collection.

While I might be willing to have my experience of your collection enriched, I'm pretty sure my wife would object if anyone enabled points of interaction with me.

    Engage your users. Promote your library.
    Make your users do a double-take and
    then invite them to settle in and hang out.

I hope you'll forgive me for saying so, but when I use your OPAC, I want to be done and out of there as soon as possible.

    Create a place to go and be (and search, find, and see).

No comment!  OK, one comment:  it doesn't even scan.

The silly thing about this is that the feature list is very impressive:
  • Tagging
  • List Sharing
  • User Reviews
  • RSS result feeds
  • RSS indexing from popular or local Web sites
  • Predictive Results, which completes search terms as you type
  • Faceted Results
  • Integrated searching of subscription databases
  • Item Mapping, which shows the item location on a map of your library.
  • Genre Browsing
  • and Book River, an eye-catching representation of book jackets in the library's collection
Wow!  That's quite a list.  I want one!
November 10

Metadata Quality

A little while ago, I received a Google Alert email, which "alerted" me to an interesting post at A Librarian's Life, and thence to K-State Libraries conference reports blog, where I found a report on Karen Calhoun's recent presentation to KU.  I did a Google search on the title, "Our space: the new world of metadata", and found the presentation slides on SlideShare.  This presentation actually comes from a different time and place, but it has the same title, and not being a cataloger, that's good enough for me.
 
There's a lot of good stuff in the presentation, but there was one sentence that caught my eye (slide 59):  "With respect to metadata quality, it is likely that librarians' and end users' definitions differ."  For my sins, I spent a few years selling manufacturing quality management software, and I've been thinking about how the notion of quality developed for that problem domain might be applied to libraries in general, and metadata in particular.  And I tend to over-react whenever I see the word 'quality'.
 
My first reaction was that the definition of quality BELONGS to users and librarians don't get a vote, thank you very much.  But that's not right.
 
Manufacturers have been thinking about quality management for more than fifty years, first in the auto industry but now throughout manufacturing.  Whatever your opinion of the final result, quality management is a well-understood discipline.  When widget manufacturers talk about quality, they mean that the process and product meet a succession of requirements from the design of the widget to its eventual use.  The production process should be safe and not generate unnecessary scrap.  The dimensions of the widget should fall within specified tolerances, and the widget should work in the hands of the eventual user.  There's a product designer that specifies the widget itself, and often a separate process engineer who specifies the process to be used in manufacturing, packaging and delivery.  Given those specifications, each person that touches the widget is indirectly responsible for the quality requirements of everyone after them in the production chain, through to the eventual user.
 
Now the metadata in a library catalog doesn't just sit there either.  For example, in a discovery application a search query uses the metadata to select the results of the query.  The metadata of the result set is passed to the application, formatted by the user interface and delivered to some user device.  Each step in this process requires a focus on quality.  In fact, quality is a property of the process, not (just) the end product delivered to the end-user.
 
While software is not the same as manufacturing, there are a couple of ideas from manufacturing quality that can be applied to metadata.  Widget quality is almost invariably checked at the end of the manufacturing process.  Inspired by Brian Herzog's Work Like a Patron Day, my suggestion for catalogers is that once they've added an entry to the catalog, they try to find it using the patron interface to the catalog.

For widgets, use by the end user is the last point at which quality can be observed and reported.  For metadata, that point is the patron's screen.  But this screen is more complicated than a single response to a single query.  For example, it may have links to, or lists of, other books by the same author, or other books on the same subject or with the same tags.  Or reviews or ratings, and on and on.  It's true but not relevant that the cataloger has no particular control over how the application extracts data from the catalog, interprets it and formats it for display.  The cataloger and the metadata s/he enters are logically the first step in the flow from catalog to application to user, but the application is fixed by the time metadata is entered, and the cataloger is chronologically the last step before the patron.
 
Manufacturing quality also includes a robust process for handling returns -- errors that are discovered after the product has left the factory.  What happens if there's an error in the catalog?  Does it make sense for users to report errors?  Coincidentally, Google Books includes just such a mechanism, with a commitment to continuous improvement of their metadata.
 
Today I came across a book in the catalog with no ISBN, something I would have bet wasn't possible.  What's to be done?