docs/admin_initial_setup/migrating_your_data.adoc

   1 Migrating from a legacy system
   2 ==============================
   3
   4 Introduction
   5 ------------
   6
   7 When you migrate to Evergreen, you generally want to migrate the bibliographic
   8 records and copy information that existed in your previous library system. For
   9 anything more than a few thousand records, you should import the data directly
  10 into the database rather than use the tools in the staff client. While the data
  11 that you can extract from your legacy system varies widely, this section
  12 assumes that you or members of your team have the ability to write scripts and
  13 are comfortable working with SQL to manipulate data within PostgreSQL. If so,
  14 then the following section will guide you towards a method of generating common
  15 data formats so that you can then load the data into the database in bulk.
  16
  17 Making electronic resources visible in the catalog
  18 --------------------------------------------------
  19 Electronic resources generally do not have any call number or copy information
  20 associated with them, and Evergreen enables you to easily make bibliographic
  21 records visible in the public catalog within sections of the organizational
  22 unit hierarchy. For example, you can make a set of bibliographic records
  23 visible only to specific branches that have purchased licenses for the
  24 corresponding resources, or you can make records representing publicly
  25 available electronic resources visible to the entire consortium.
  26
  27 Therefore, to make a record visible in the public catalog, modify the records
  28 using your preferred MARC editing approach to ensure the 856 field contains the
  29 following information before loading records for electronic resources into
  30 Evergreen:
  31
  32 .856 field for electronic resources: indicators and subfields
  33 [width="100%",options="header"]
  34 |=============================================================================
  35 |Attribute   | Value | Note
  36 |Indicator 1 |4      |
  37 |Indicator 2 |0 or 1 |
  38 |Subfield u  |URL for the electronic resource |
  39 |Subfield y  |Text content of the link |
  40 |Subfield z  |Public note | Normally displayed after the link
  41 |Subfield 9  |Organizational unit short name | The record will be visible when
  42   a search is performed specifying this organizational unit or one of its
  43   children. You can repeat this subfield as many times as you need.
  44 |=============================================================================
  45
  46 Once your electronic resource bibliographic records have the required
  47 indicators and subfields for each 856 field in the record, you can proceed to
  48 load the records using either the command-line bulk import method or the MARC
  49 Batch Importer in the staff client.
  50
  51 Migrating your bibliographic records
  52 ------------------------------------
  53 Convert your MARC21 binary records into the MARCXML format, with one record per
  54 line. You can use the following Python script to achieve this goal; just
  55 install the _pymarc_ library first, and adjust the values of the _input_ and
  56 _output_ variables as needed.
  57
  58 [source,python]
  59 ------------------------------------------------------------------------------
  60 #!/usr/bin/env python
  61 # -*- coding: utf-8 -*-
  62 import codecs
  63 import pymarc
  64
  65 input = 'records_in.mrc'
  66 output = 'records_out.xml'
  67
  68 reader = pymarc.MARCReader(open(input, 'rb'), to_unicode=True)
  69 writer = codecs.open(output, 'w', 'utf-8')
  70 for record in reader:
  71     record.leader = record.leader[:9] + 'a' + record.leader[10:]
  72     writer.write(pymarc.record_to_xml(record) + "\n")
  73 ------------------------------------------------------------------------------
  74
  75 Once you have a MARCXML file with one record per line, you can load the records
  76 into your Evergreen system via a staging table in your database.
  77
  78 . Connect to the PostgreSQL database using the _psql_ command. For example:
  79 +
  80 ------------------------------------------------------------------------------
  81 psql -U <user-name> -h <hostname> -d <database>
  82 ------------------------------------------------------------------------------
  83 +
  84 . Create a staging table in the database. The staging table is a temporary
  85   location for the raw data that you will load into the production table or
  86   tables. Issue the following SQL statement from the _psql_ command line,
  87   adjusting the name of the table from _staging_records_import_, if desired:
  88 +
  89 [source,sql]
  90 ------------------------------------------------------------------------------
  91 CREATE TABLE staging_records_import (id BIGSERIAL, dest BIGINT, marc TEXT);
  92 ------------------------------------------------------------------------------
  93 +
  94 . Create a function that will insert the new records into the production table
  95   and update the _dest_ column of the staging table. Adjust
  96   "staging_records_import" to match the name of the staging table that you plan
  97   to create when you issue the following SQL statement:
  98 +
  99 [source,sql]
 100 ------------------------------------------------------------------------------
 101 CREATE OR REPLACE FUNCTION staging_importer() RETURNS VOID AS $$
 102 DECLARE stage RECORD;
 103 BEGIN
 104 FOR stage IN SELECT * FROM staging_records_import ORDER BY id LOOP
 105       INSERT INTO biblio.record_entry (marc, last_xact_id) VALUES (stage.marc, 'IMPORT');
 106       UPDATE staging_records_import SET dest = currval('biblio.record_entry_id_seq')
 107        WHERE id = stage.id;
 108    END LOOP;
 109   END;
 110   $$ LANGUAGE plpgsql;
 111 ------------------------------------------------------------------------------
 112 +
 113 . Load the data from your MARCXML file into the staging table using the COPY
 114   statement, adjusting for the name of the staging table and the location of
 115   your MARCXML file:
 116 +
 117 [source,sql]
 118 ------------------------------------------------------------------------------
 119 COPY staging_records_import (marc) FROM '/tmp/records_out.xml';
 120 ------------------------------------------------------------------------------
 121 +
 122 . Load the data from your staging table into the production table by invoking
 123   your staging function:
 124 +
 125 [source,sql]
 126 ------------------------------------------------------------------------------
 127 SELECT staging_importer();
 128 ------------------------------------------------------------------------------
 129
 130 When you leave out the _id_ value for a _BIGSERIAL_ column, the value in the
 131 column automatically increments for each new record that you add to the table.
 132
 133 Once you have loaded the records into your Evergreen system, you can search for
 134 some known records using the staff client to confirm that the import was
 135 successful.
 136
 137 Migrating your call numbers, copies, and parts
 138 ----------------------------------------------
 139 'Holdings', comprised of call numbers, copies, and parts, are the set of
 140 objects that enable users to locate and potentially acquire materials from your
 141 library system.
 142
 143 'Call numbers' connect libraries to bibliographic records. Each call number has a
 144 'label' associated with a classification scheme such as a the Library of Congress
 145 or Dewey Decimal systems, and can optionally have either or both a label prefix
 146 and a label suffix. Label prefixes and suffixes do not affect the sort order of
 147 the label.
 148
 149 'Copies' connect call numbers to particular instances of that resource at a
 150 particular library. Each copy has a barcode and must exist in a particular copy
 151 location. Other optional attributes of copies include circulation modifier,
 152 which may affect whether that copy can circulate or for how long it can
 153 circulate, and OPAC visibility, which controls whether that particular copy
 154 should be visible in the public catalog.
 155
 156 'Parts' provide more granularity for copies, primarily to enable patrons to
 157 place holds on individual parts of a set of items. For example, an encyclopedia
 158 might be represented by a single bibliographic record, with a single call
 159 number representing the label for that encyclopedia at a given library, with 26
 160 copies representing each letter of the alphabet, with each copy mapped to a
 161 different part such as _A, B, C, ... Z_.
 162
 163 To migrate this data into your Evergreen system, you will create another
 164 staging table in the database to hold the raw data for your materials from
 165 which the actual call numbers, copies, and parts will be generated.
 166
 167 Begin by connecting to the PostgreSQL database using the _psql_ command. For
 168 example:
 169
 170 ------------------------------------------------------------------------------
 171 psql -U <user-name> -h <hostname> -d <database>
 172 ------------------------------------------------------------------------------
 173
 174 Create the staging materials table by issuing the following SQL statement:
 175
 176 [source,sql]
 177 ------------------------------------------------------------------------------
 178 CREATE TABLE staging_materials (
 179   bibkey BIGINT,  -- biblio.record_entry_id
 180   callnum TEXT, -- call number label
 181   callnum_prefix TEXT, -- call number prefix
 182   callnum_suffix TEXT, -- call number suffix
 183   callnum_class TEXT, -- classification scheme
 184   create_date DATE,
 185   location TEXT, -- shelving location code
 186   item_type TEXT, -- circulation modifier code
 187   owning_lib TEXT, -- org unit code
 188   barcode TEXT, -- copy barcode
 189   part TEXT
 190 );
 191 ------------------------------------------------------------------------------
 192
 193 For the purposes of this example migration of call numbers, copies, and parts,
 194 we assume that you are able to create a tab-delimited file containing values
 195 that map to the staging table properties, with one copy per line. For example,
 196 the following 5 lines demonstrate how the file could look for 5 different
 197 copies, with non-applicable attribute values represented by _\N_, and 3 of the
 198 copies connected to a single call number and bibliographic record via parts:
 199
 200 ------------------------------------------------------------------------------
 201 1   QA 76.76 A3 \N  \N  LC  2012-12-05  STACKS  BOOK    BR1 30007001122620  \N
 202 2   GV 161 V8   Ref.    Juv.    LC  2010-11-11  KIDS    DVD BR2 30007005197073  \N
 203 3   AE 5 E363 1984  \N  \N      LC  1984-01-10  REFERENCE   BOOK    BR1 30007006853385  A
 204 3   AE 5 E363 1984  \N  \N      LC  1984-01-10  REFERENCE   BOOK    BR1 30007006853393  B
 205 3   AE 5 E363 1984  \N  \N      LC  1984-01-10  REFERENCE   BOOK    BR1 30007006853344  C
 206 ------------------------------------------------------------------------------
 207
 208 Once your holdings are in a tab-delimited format--which, for the purposes of
 209 this example, we will name _holdings.tsv_--you can import the holdings file
 210 into your staging table. Copy the contents of the holdings file into the
 211 staging table using the _COPY_ SQL statement:
 212
 213 [source,sql]
 214 ------------------------------------------------------------------------------
 215 COPY staging_items (bibkey, callnum, callnum_prefix,
 216   callnum_suffix, callnum_class, create_date, location,
 217   item_type, owning_lib, barcode, part) FROM 'holdings.tsv';
 218 ------------------------------------------------------------------------------
 219
 220 Generate the copy locations you need to represent your holdings:
 221
 222 [source,sql]
 223 ------------------------------------------------------------------------------
 224 INSERT INTO asset.copy_location (name, owning_lib)
 225   SELECT DISTINCT location, 1 FROM staging_materials
 226   WHERE NOT EXISTS (
 227     SELECT 1 FROM asset.copy_location
 228     WHERE name = location
 229   );
 230 ------------------------------------------------------------------------------
 231
 232 Generate the circulation modifiers you need to represent your holdings:
 233
 234 [source,sql]
 235 ------------------------------------------------------------------------------
 236 INSERT INTO config.circ_modifier (code, name, description, sip2_media_type)
 237   SELECT DISTINCT circmod, circmod, circmod, '001'
 238   FROM staging_materials
 239   WHERE NOT EXISTS (
 240     SELECT 1 FROM config.circ_modifier
 241     WHERE circmod = code
 242   );
 243 ------------------------------------------------------------------------------
 244
 245 Generate the call number prefixes and suffixes you need to represent your
 246 holdings:
 247
 248 [source,sql]
 249 ------------------------------------------------------------------------------
 250 INSERT INTO asset.call_number_prefix (owning_lib, label)
 251   SELECT DISTINCT aou.id, callnum_prefix
 252   FROM staging_materials sm
 253     INNER JOIN actor.org_unit aou
 254       ON aou.shortname = sm.owning_lib
 255   WHERE NOT EXISTS (
 256     SELECT 1 FROM asset.call_number_prefix acnp
 257     WHERE callnum_prefix = acnp.label
 258       AND aou.id = acnp.owning_lib
 259   ) AND callnum_prefix IS NOT NULL;
 260
 261 INSERT INTO asset.call_number_suffix (owning_lib, label)
 262   SELECT DISTINCT aou.id, callnum_suffix
 263   FROM staging_materials sm
 264     INNER JOIN actor.org_unit aou
 265       ON aou.shortname = sm.owning_lib
 266   WHERE NOT EXISTS (
 267     SELECT 1 FROM asset.call_number_suffix acns
 268     WHERE callnum_suffix = acns.label
 269       AND aou.id = acns.owning_lib
 270   ) AND callnum_suffix IS NOT NULL;
 271 ------------------------------------------------------------------------------
 272
 273 Generate the call numbers for your holdings:
 274
 275 [source,sql]
 276 ------------------------------------------------------------------------------
 277 INSERT INTO asset.call_number (
 278   creator, editor, record, owning_lib, label, prefix, suffix, label_class
 279 )
 280   SELECT DISTINCT 1, 1, bibkey, aou.id, callnum, acnp.id, acns.id,
 281   CASE WHEN callnum_class = 'LC' THEN 1
 282              WHEN callnum_class = 'DEWEY' THEN 2
 283   END
 284   FROM staging_materials sm
 285     INNER JOIN actor.org_unit aou
 286       ON aou.shortname = owning_lib
 287     INNER JOIN asset.call_number_prefix acnp
 288       ON COALESCE(acnp.label, '') = COALESCE(callnum_prefix, '')
 289     INNER JOIN asset.call_number_suffix acns
 290       ON COALESCE(acns.label, '') = COALESCE(callnum_suffix, '')
 291 ;
 292 ------------------------------------------------------------------------------
 293
 294 Generate the copies for your holdings:
 295
 296 [source,sql]
 297 ------------------------------------------------------------------------------
 298 INSERT INTO asset.copy (
 299   circ_lib, creator, editor, call_number, location,
 300  loan_duration, fine_level, barcode
 301 )
 302   SELECT DISTINCT aou.id, 1, 1, acn.id, acl.id, 2, 2, barcode
 303   FROM staging_materials sm
 304     INNER JOIN actor.org_unit aou
 305       ON aou.shortname = sm.owning_lib
 306     INNER JOIN asset.copy_location acl
 307       ON acl.name = sm.location
 308     INNER JOIN asset.call_number acn
 309       ON acn.label = sm.callnum
 310   WHERE acn.deleted IS FALSE
 311 ;
 312 ------------------------------------------------------------------------------
 313
 314 Generate the parts for your holdings. First, create the set of parts that are
 315 required for each record based on your staging materials table:
 316
 317 [source,sql]
 318 ------------------------------------------------------------------------------
 319 INSERT INTO biblio.monograph_part (record, label)
 320   SELECT DISTINCT bibkey, part
 321   FROM staging_materials sm
 322   WHERE part IS NOT NULL AND NOT EXISTS (
 323     SELECT 1 FROM biblio.monograph_part bmp
 324     WHERE sm.part = bmp.label
 325       AND sm.bibkey = bmp.record
 326   );
 327 ------------------------------------------------------------------------------
 328
 329 Now map the parts for each record to the specific copies that you added:
 330
 331 [source,sql]
 332 ------------------------------------------------------------------------------
 333 INSERT INTO asset.copy_part_map (target_copy, part)
 334   SELECT DISTINCT acp.id, bmp.id
 335   FROM staging_materials sm
 336     INNER JOIN asset.copy acp
 337       ON acp.barcode = sm.barcode
 338     INNER JOIN biblio.monograph_part bmp
 339       ON bmp.record = sm.bibkey
 340   WHERE part IS NOT NULL
 341     AND part = bmp.label
 342     AND acp.deleted IS FALSE
 343     AND NOT EXISTS (
 344     SELECT 1 FROM asset.copy_part_map
 345     WHERE target_copy = acp.id
 346       AND part = bmp.id
 347   );
 348 ------------------------------------------------------------------------------
 349
 350 At this point, you have loaded your bibliographic records, call numbers, call
 351 number prefixes and suffixes, copies, and parts, and your records should be
 352 visible to searches in the public catalog within the appropriate organization
 353 unit scope.