docs/admin_initial_setup/migrating_your_data.txt

   1 Migrating from a legacy system
   2 ==============================
   3
   4 When you migrate to Evergreen, you generally want to migrate the bibliographic
   5 records and copy information that existed in your previous library system. For
   6 anything more than a few thousand records, you should import the data directly
   7 into the database rather than use the tools in the staff client. While the data
   8 that you can extract from your legacy system varies widely, this section
   9 assumes that you or members of your team have the ability to write scripts and
  10 are comfortable working with SQL to manipulate data within PostgreSQL. If so,
  11 then the following section will guide you towards a method of generating common
  12 data formats so that you can then load the data into the database in bulk.
  13
  14 Making electronic resources visible in the catalog
  15 --------------------------------------------------
  16 Electronic resources generally do not have any call number or copy information
  17 associated with them, and Evergreen enables you to easily make bibliographic
  18 records visible in the public catalog within sections of the organizational
  19 unit hierarchy. For example, you can make a set of bibliographic records
  20 visible only to specific branches that have purchased licenses for the
  21 corresponding resources, or you can make records representing publicly
  22 available electronic resources visible to the entire consortium.
  23
  24 Therefore, to make a record visible in the public catalog, modify the records
  25 using your preferred MARC editing approach to ensure the 856 field contains the
  26 following information before loading records for electronic resources into
  27 Evergreen:
  28
  29 .856 field for electronic resources: indicators and subfields
  30 [width="100%",options="header"]
  31 |=============================================================================
  32 |Attribute   | Value | Note
  33 |Indicator 1 |4      |
  34 |Indicator 2 |0 or 1 |
  35 |Subfield u  |URL for the electronic resource |
  36 |Subfield y  |Text content of the link |
  37 |Subfield z  |Public note | Normally displayed after the link
  38 |Subfield 9  |Organizational unit short name | The record will be visible when
  39   a search is performed specifying this organizational unit or one of its
  40   children. You can repeat this subfield as many times as you need.
  41 |=============================================================================
  42
  43 Once your electronic resource bibliographic records have the required
  44 indicators and subfields for each 856 field in the record, you can proceed to
  45 load the records using either the command-line bulk import method or the MARC
  46 Batch Importer in the staff client.
  47
  48 Migrating your bibliographic records
  49 ------------------------------------
  50 Convert your MARC21 binary records into the MARCXML format, with one record per
  51 line. You can use the following Python script to achieve this goal; just
  52 install the _pymarc_ library first, and adjust the values of the _input_ and
  53 _output_ variables as needed.
  54
  55 [source,python]
  56 ------------------------------------------------------------------------------
  57 #!/usr/bin/env python
  58 # -*- coding: utf-8 -*-
  59 import codecs
  60 import pymarc
  61
  62 input = 'records_in.mrc'
  63 output = 'records_out.xml'
  64
  65 reader = pymarc.MARCReader(open(input, 'rb'), to_unicode=True)
  66 writer = codecs.open(output, 'w', 'utf-8')
  67 for record in reader:
  68     record.leader = record.leader[:9] + 'a' + record.leader[10:]
  69     writer.write(pymarc.record_to_xml(record) + "\n")
  70 ------------------------------------------------------------------------------
  71
  72 Once you have a MARCXML file with one record per line, you can load the records
  73 into your Evergreen system via a staging table in your database.
  74
  75 . Connect to the PostgreSQL database using the _psql_ command. For example:
  76 +
  77 ------------------------------------------------------------------------------
  78 psql -U <user-name> -h <hostname> -d <database>
  79 ------------------------------------------------------------------------------
  80 +
  81 . Create a staging table in the database. The staging table is a temporary
  82   location for the raw data that you will load into the production table or
  83   tables. Issue the following SQL statement from the _psql_ command line,
  84   adjusting the name of the table from _staging_records_import_, if desired:
  85 +
  86 [source,sql]
  87 ------------------------------------------------------------------------------
  88 CREATE TABLE staging_records_import (id BIGSERIAL, dest BIGINT, marc TEXT);
  89 ------------------------------------------------------------------------------
  90 +
  91 . Create a function that will insert the new records into the production table
  92   and update the _dest_ column of the staging table. Adjust
  93   "staging_records_import" to match the name of the staging table that you plan
  94   to create when you issue the following SQL statement:
  95 +
  96 [source,sql]
  97 ------------------------------------------------------------------------------
  98 CREATE OR REPLACE FUNCTION staging_importer() RETURNS NULL AS $$
  99 DECLARE stage RECORD;
 100 BEGIN
 101 FOR stage IN SELECT * FROM staging_records_import ORDER BY id LOOP
 102       INSERT INTO biblio.record_entry (marc, last_xact_id) VALUES (stage.marc, 'IMPORT');
 103       UPDATE staging_records_import SET dest = currval('biblio.record_entry_id_seq');
 104    END LOOP;
 105   END;
 106   $$ LANGUAGE plpgsql;
 107 ------------------------------------------------------------------------------
 108 +
 109 . Load the data from your MARCXML file into the staging table using the COPY
 110   statement, adjusting for the name of the staging table and the location of
 111   your MARCXML file:
 112 +
 113 [source,sql]
 114 ------------------------------------------------------------------------------
 115 COPY staging_records_import (marc) FROM '/tmp/records_out.xml';
 116 ------------------------------------------------------------------------------
 117 +
 118 . Load the data from your staging table into the production table by invoking
 119   your staging function:
 120 +
 121 [source,sql]
 122 ------------------------------------------------------------------------------
 123 SELECT staging_importer();
 124 ------------------------------------------------------------------------------
 125
 126 When you leave out the _id_ value for a _BIGSERIAL_ column, the value in the
 127 column automatically increments for each new record that you add to the table.
 128
 129 Once you have loaded the records into your Evergreen system, you can search for
 130 some known records using the staff client to confirm that the import was
 131 successful.
 132
 133 Migrating your call numbers, copies, and parts
 134 ----------------------------------------------
 135 'Holdings', comprised of call numbers, copies, and parts, are the set of
 136 objects that enable users to locate and potentially acquire materials from your
 137 library system.
 138
 139 'Call numbers' connect libraries to bibliographic records. Each call number has a
 140 'label' associated with a classification scheme such as a the Library of Congress
 141 or Dewey Decimal systems, and can optionally have either or both a label prefix
 142 and a label suffix. Label prefixes and suffixes do not affect the sort order of
 143 the label.
 144
 145 'Copies' connect call numbers to particular instances of that resource at a
 146 particular library. Each copy has a barcode and must exist in a particular copy
 147 location. Other optional attributes of copies include circulation modifier,
 148 which may affect whether that copy can circulate or for how long it can
 149 circulate, and OPAC visibility, which controls whether that particular copy
 150 should be visible in the public catalog.
 151
 152 'Parts' provide more granularity for copies, primarily to enable patrons to
 153 place holds on individual parts of a set of items. For example, an encyclopedia
 154 might be represented by a single bibliographic record, with a single call
 155 number representing the label for that encyclopedia at a given library, with 26
 156 copies representing each letter of the alphabet, with each copy mapped to a
 157 different part such as _A, B, C, ... Z_.
 158
 159 To migrate this data into your Evergreen system, you will create another
 160 staging table in the database to hold the raw data for your materials from
 161 which the actual call numbers, copies, and parts will be generated.
 162
 163 Begin by connecting to the PostgreSQL database using the _psql_ command. For
 164 example:
 165
 166 ------------------------------------------------------------------------------
 167 psql -U <user-name> -h <hostname> -d <database>
 168 ------------------------------------------------------------------------------
 169
 170 Create the staging materials table by issuing the following SQL statement:
 171
 172 [source,sql]
 173 ------------------------------------------------------------------------------
 174 CREATE TABLE staging_materials (
 175   bibkey BIGINT,  -- biblio.record_entry_id
 176   callnum TEXT, -- call number label
 177   callnum_prefix TEXT, -- call number prefix
 178   callnum_suffix TEXT, -- call number suffix
 179   callnum_class TEXT, -- classification scheme
 180   create_date DATE,
 181   location TEXT, -- shelving location code
 182   item_type TEXT, -- circulation modifier code
 183   owning_lib TEXT, -- org unit code
 184   barcode TEXT, -- copy barcode
 185   part TEXT
 186 );
 187 ------------------------------------------------------------------------------
 188
 189 For the purposes of this example migration of call numbers, copies, and parts,
 190 we assume that you are able to create a tab-delimited file containing values
 191 that map to the staging table properties, with one copy per line. For example,
 192 the following 5 lines demonstrate how the file could look for 5 different
 193 copies, with non-applicable attribute values represented by _\N_, and 3 of the
 194 copies connected to a single call number and bibliographic record via parts:
 195
 196 ------------------------------------------------------------------------------
 197 1   QA 76.76 A3 \N  \N  LC  2012-12-05  STACKS  BOOK    BR1 30007001122620  \N
 198 2   GV 161 V8   Ref.    Juv.    LC  2010-11-11  KIDS    DVD BR2 30007005197073  \N
 199 3   AE 5 E363 1984  \N  \N      LC  1984-01-10  REFERENCE   BOOK    BR1 30007006853385  A
 200 3   AE 5 E363 1984  \N  \N      LC  1984-01-10  REFERENCE   BOOK    BR1 30007006853393  B
 201 3   AE 5 E363 1984  \N  \N      LC  1984-01-10  REFERENCE   BOOK    BR1 30007006853344  C
 202 ------------------------------------------------------------------------------
 203
 204 Once your holdings are in a tab-delimited format--which, for the purposes of
 205 this example, we will name _holdings.tsv_--you can import the holdings file
 206 into your staging table. Copy the contents of the holdings file into the
 207 staging table using the _COPY_ SQL statement:
 208
 209 [source,sql]
 210 ------------------------------------------------------------------------------
 211 COPY staging_items (bibkey, callnum, callnum_prefix,
 212   callnum_suffix, callnum_class, create_date, location,
 213   item_type, owning_lib, barcode, part) FROM 'holdings.tsv';
 214 ------------------------------------------------------------------------------
 215
 216 Generate the copy locations you need to represent your holdings:
 217
 218 [source,sql]
 219 ------------------------------------------------------------------------------
 220 INSERT INTO asset.copy_location (name, owning_lib)
 221   SELECT DISTINCT location, 1 FROM staging_materials
 222   WHERE NOT EXISTS (
 223     SELECT 1 FROM asset.copy_location
 224     WHERE name = location
 225   );
 226 ------------------------------------------------------------------------------
 227
 228 Generate the circulation modifiers you need to represent your holdings:
 229
 230 [source,sql]
 231 ------------------------------------------------------------------------------
 232 INSERT INTO config.circ_modifier (code, name, description, sip2_media_type)
 233   SELECT DISTINCT circmod, circmod, circmod, '001'
 234   FROM staging_materials
 235   WHERE NOT EXISTS (
 236     SELECT 1 FROM config.circ_modifier
 237     WHERE circmod = code
 238   );
 239 ------------------------------------------------------------------------------
 240
 241 Generate the call number prefixes and suffixes you need to represent your
 242 holdings:
 243
 244 [source,sql]
 245 ------------------------------------------------------------------------------
 246 INSERT INTO asset.call_number_prefix (owning_lib, label)
 247   SELECT DISTINCT aou.id, callnum_prefix
 248   FROM staging_materials sm
 249     INNER JOIN actor.org_unit aou
 250       ON aou.shortname = sm.owning_lib
 251   WHERE NOT EXISTS (
 252     SELECT 1 FROM asset.call_number_prefix acnp
 253     WHERE callnum_prefix = acnp.label
 254       AND aou.id = acnp.owning_lib
 255   ) AND callnum_prefix IS NOT NULL;
 256
 257 INSERT INTO asset.call_number_suffix (owning_lib, label)
 258   SELECT DISTINCT aou.id, callnum_suffix
 259   FROM staging_materials sm
 260     INNER JOIN actor.org_unit aou
 261       ON aou.shortname = sm.owning_lib
 262   WHERE NOT EXISTS (
 263     SELECT 1 FROM asset.call_number_suffix acns
 264     WHERE callnum_suffix = acns.label
 265       AND aou.id = acns.owning_lib
 266   ) AND callnum_suffix IS NOT NULL;
 267 ------------------------------------------------------------------------------
 268
 269 Generate the call numbers for your holdings:
 270
 271 [source,sql]
 272 ------------------------------------------------------------------------------
 273 INSERT INTO asset.call_number (
 274   creator, editor, record, owning_lib, label, prefix, suffix, label_class
 275 )
 276   SELECT DISTINCT 1, 1, bibkey, aou.id, callnum, acnp.id, acns.id,
 277   CASE WHEN callnum_class = 'LC' THEN 1
 278              WHEN callnum_class = 'DEWEY' THEN 2
 279   END
 280   FROM staging_materials sm
 281     INNER JOIN actor.org_unit aou
 282       ON aou.shortname = owning_lib
 283     INNER JOIN asset.call_number_prefix acnp
 284       ON COALESCE(acnp.label, '') = COALESCE(callnum_prefix, '')
 285     INNER JOIN asset.call_number_suffix acns
 286       ON COALESCE(acns.label, '') = COALESCE(callnum_suffix, '')
 287 ;
 288 ------------------------------------------------------------------------------
 289
 290 Generate the copies for your holdings:
 291
 292 [source,sql]
 293 ------------------------------------------------------------------------------
 294 INSERT INTO asset.copy (
 295   circ_lib, creator, editor, call_number, location,
 296  loan_duration, fine_level, barcode
 297 )
 298   SELECT DISTINCT aou.id, 1, 1, acn.id, acl.id, 2, 2, barcode
 299   FROM staging_materials sm
 300     INNER JOIN actor.org_unit aou
 301       ON aou.shortname = sm.owning_lib
 302     INNER JOIN asset.copy_location acl
 303       ON acl.name = sm.location
 304     INNER JOIN asset.call_number acn
 305       ON acn.label = sm.callnum
 306   WHERE acn.deleted IS FALSE
 307 ;
 308 ------------------------------------------------------------------------------
 309
 310 Generate the parts for your holdings. First, create the set of parts that are
 311 required for each record based on your staging materials table:
 312
 313 [source,sql]
 314 ------------------------------------------------------------------------------
 315 INSERT INTO biblio.monograph_part (record, label)
 316   SELECT DISTINCT bibkey, part
 317   FROM staging_materials sm
 318   WHERE part IS NOT NULL AND NOT EXISTS (
 319     SELECT 1 FROM biblio.monograph_part bmp
 320     WHERE sm.part = bmp.label
 321       AND sm.bibkey = bmp.record
 322   );
 323 ------------------------------------------------------------------------------
 324
 325 Now map the parts for each record to the specific copies that you added:
 326
 327 [source,sql]
 328 ------------------------------------------------------------------------------
 329 INSERT INTO asset.copy_part_map (target_copy, part)
 330   SELECT DISTINCT acp.id, bmp.id
 331   FROM staging_materials sm
 332     INNER JOIN asset.copy acp
 333       ON acp.barcode = sm.barcode
 334     INNER JOIN biblio.monograph_part bmp
 335       ON bmp.record = sm.bibkey
 336   WHERE part IS NOT NULL
 337     AND part = bmp.label
 338     AND acp.deleted IS FALSE
 339     AND NOT EXISTS (
 340     SELECT 1 FROM asset.copy_part_map
 341     WHERE target_copy = acp.id
 342       AND part = bmp.id
 343   );
 344 ------------------------------------------------------------------------------
 345
 346 At this point, you have loaded your bibliographic records, call numbers, call
 347 number prefixes and suffixes, copies, and parts, and your records should be
 348 visible to searches in the public catalog within the appropriate organization
 349 unit scope.