Database cleaning

The slightly pedantic side of my nature quite enjoys the idea of cleaning up the data in the catalogue at MPOW.  I didn’t even really mind the process of doing it – after all, there are only about 1200 records and I listened to an audio book for most of it.  It took up the better part of 2 days and I think it’s pretty much done.

The data was pretty messy and I found myself getting really frustrated during the process.  I know people who laugh at my insistence that things be spelled correctly, appear in the right order, or go in the right ‘box’.  While I don’t always practise what I preach, I’m usually pretty good: the reference lists in my assignments were always in perfect Harvard format, for example.  Having spent lots of time this year reading blogs and tweets from fellow inhabitants of libraryland, I know this is not unique to me.  As someone said at the ALIA conference dinner this year, surely only librarians can have a conversation about cataloguing and find it fascinating!

Actually, cataloguing drives me crazy.  I get it, and I like the ferreting around to find the right spot to put stuff, but all those commas and colons and other bits of punctuation (particularly in the item description) bring me out in a cold sweat.  Here at MPOW I usually ‘catalogue’ by copying and pasting the record from Trove, but every now and then I get a book in that isn’t in Trove and I have to do it from scratch.  Takes me forever.

I digress. Back to the messy data.  The DBText database was built from scratch very quickly by a part-time employee (my predecessor).  The criterion was obviously to get the data onto the system as quickly as possible, but as often happens with this approach, it has actually taken more time to fix it up afterwards than it would have taken to get it right in the first place.  Many items in our collection are held in multiple copies, so the quickest way to get them onto the system is to enter one record and then duplicate it until you have enough.  Great. As long as the information in the FIRST record is correct. And doesn’t need fixing. Because now all the duplicates are also wrong and they have to be fixed. Individually. Record by record (and yes, I already thought of batch fixing; it didn’t work with the method I was using to work through the database).

Just saying.

image: Database by Nanaki via flickr

