// docs/development/support_scripts.adoc

Support Scripts
---------------

Various scripts are included with Evergreen in the `/openils/bin/` directory
(and in the source code in `Open-ILS/src/support-scripts` and
`Open-ILS/src/extras`). Some of them are used during
the installation process, such as `eg_db_config`, while others are usually
run as cron jobs for routine maintenance, such as `fine_generator.pl` and
`hold_targeter.pl`. Others are useful for less frequent needs, such as the
scripts for importing/exporting MARC records. You may explore these scripts
and adapt them for your local needs. You are also welcome to share your
improvements or ask any questions on the
http://evergreen-ils.org/communicate/[Evergreen IRC channel or email lists].

Here is a summary of the most commonly used scripts. The script name links
to more thorough documentation, if available.

 * <<_processing_action_triggers,action_trigger_runner.pl>>
   -- Useful for creating events for specified hooks and running pending events
 * authority_authority_linker.pl
   -- Links reference headings in authority records to main entry headings
      in other authority records. Should be run at least once a day (only for
      changed records).
 * <<_authority_control_fields,authority_control_fields.pl>>
   -- Links bibliographic records to the best matching authority record.
      Should be run at least once a day (only for changed records).
      You can accomplish this by running _authority_control_fields.pl --days-back=1_.
 * autogen.sh
   -- Generates web files used by the OPAC, especially files related to
      organization unit hierarchy, fieldmapper IDL, locales selection,
      facet definitions, compressed JS files and the related cache key
 * clark-kent.pl
   -- Used to start and stop the reporter (which runs scheduled reports)
 * <<_creating_the_evergreen_database,eg_db_config>>
   -- Creates database and schema, updates config files, sets Evergreen
      administrator username and password
 * fine_generator.pl
 * hold_targeter.pl
 * <<_importing_authority_records_from_command_line,marc2are.pl>>
   -- Converts authority records from MARC format to Evergreen objects
      suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)
 * marc2bre.pl
   -- Converts bibliographic records from MARC format to Evergreen objects
      suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)
 * marc2sre.pl
   -- Converts serial records from MARC format to Evergreen objects
      suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)
 * <<_marc_export,marc_export>>
   -- Exports authority, bibliographic, and serial holdings records into
      any of these formats: USMARC, UNIMARC, XML, BRE, ARE
 * osrf_control
   -- Used to start, stop and send signals to OpenSRF services
 * parallel_pg_loader.pl
   -- Uses the output of marc2bre.pl (or similar tools) to generate the SQL
      for importing records into Evergreen in a parallel fashion

anchor:_authority_control_fields[]

authority_control_fields: Connecting Bibliographic and Authority records
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

indexterm:[authority control]

This script matches headings in bibliographic records to the appropriate
authority records. When it finds a match, it will add a subfield 0 to the
matching bibliographic field.

Here is how the matching works:

[options="header",cols="1,1,3"]
|=========================================================
|Bibliographic field|Authority field it matches|Subfields that it examines

|100|100|a,b,c,d,f,g,j,k,l,n,p,q,t,u
|110|110|a,b,c,d,f,g,k,l,n,p,t,u
|111|111|a,c,d,e,f,g,j,k,l,n,p,q,t,u
|130|130|a,d,f,g,h,k,l,m,n,o,p,r,s,t
|600|100|a,b,c,d,f,g,h,j,k,l,m,n,o,p,q,r,s,t,v,x,y,z
|610|110|a,b,c,d,f,g,h,k,l,m,n,o,p,r,s,t,v,w,x,y,z
|611|111|a,c,d,e,f,g,h,j,k,l,n,p,q,s,t,v,x,y,z
|630|130|a,d,f,g,h,k,l,m,n,o,p,r,s,t,v,x,y,z
|648|148|a,v,x,y,z
|650|150|a,b,v,x,y,z
|651|151|a,v,x,y,z
|655|155|a,v,x,y,z
|700|100|a,b,c,d,f,g,j,k,l,n,p,q,t,u
|710|110|a,b,c,d,f,g,k,l,n,p,t,u
|711|111|a,c,d,e,f,g,j,k,l,n,p,q,t,u
|730|130|a,d,f,g,h,j,k,m,n,o,p,r,s,t
|751|151|a,v,x,y,z
|800|100|a,b,c,d,e,f,g,j,k,l,n,p,q,t,u,4
|830|130|a,d,f,g,h,k,l,m,n,o,p,r,s,t
|=========================================================


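The tag pairs in the table follow a regular pattern: the authority tag is `1`
followed by the last two digits of the bibliographic tag. The following shell
sketch restates that pattern for the tags listed above. The function name
`authority_tag_for` is hypothetical, and this is an illustration of the table,
not a description of how the script itself is implemented.

```shell
# authority_tag_for BIB_TAG
# Prints the authority tag that the table above pairs with BIB_TAG.
# Returns non-zero for tags that are not listed in the table.
authority_tag_for() {
    case "$1" in
        100|110|111|130|600|610|611|630|648|650|651|655|700|710|711|730|751|800|830)
            # Drop the first digit and put "1" in its place (650 -> 150).
            printf '1%s\n' "${1#?}"
            ;;
        *)
            return 1
            ;;
    esac
}
```

For example, `authority_tag_for 650` prints `150`, while `authority_tag_for 245`
prints nothing and returns a non-zero status.
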
anchor:_marc_export[]

marc_export: Exporting Bibliographic Records into MARC files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

indexterm:[marc_export]
indexterm:[MARC records,exporting,using the command line]

The following procedure explains how to export Evergreen bibliographic
records into MARC files using the *marc_export* support script. All steps
should be performed by the `opensrf` user from your Evergreen server.

[NOTE]
Processing time for exporting records depends on several factors, such as
the number of records you are exporting. If you are exporting a large
number of records, consider dividing the export ID file (`records.txt`)
into several smaller files and exporting each one separately.

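One way to divide a large ID file is the standard `split` utility. The sketch
below is only an illustration: the function name `split_ids`, the chunk size,
and the output prefix are assumptions, not part of Evergreen.

```shell
# split_ids ID_FILE LINES_PER_CHUNK OUTPUT_PREFIX
# Writes OUTPUT_PREFIXaa, OUTPUT_PREFIXab, ... with at most
# LINES_PER_CHUNK record IDs in each file. The input file is left untouched.
split_ids() {
    split -l "$2" "$1" "$3"
}
```

For example, `split_ids /home/opensrf/records.txt 10000 /home/opensrf/records_part_`
would produce files of up to 10,000 IDs each, and each file can then be
exported on its own.
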
 . Create a text file listing the bibliographic record IDs you would like
to export from Evergreen. One way to do this is using SQL:
+
[source,sql]
----
SELECT DISTINCT bre.id FROM biblio.record_entry AS bre
    JOIN asset.call_number AS acn ON acn.record = bre.id
    WHERE bre.deleted = 'false' AND acn.owning_lib = 101 \g /home/opensrf/records.txt
----
+
This query creates a file called `records.txt` containing a column of
distinct IDs of bibliographic records that have call numbers owned by the
organizational unit with the ID 101.

 . Navigate to the support-scripts folder:
+
----
cd /home/opensrf/Evergreen-ILS*/Open-ILS/src/support-scripts/
----

 . Run *marc_export*, using the ID file you created in step 1 to define which
   records to export. The following example exports the records into MARCXML format.
+
----
cat /home/opensrf/records.txt | ./marc_export --store -i -c /openils/conf/opensrf_core.xml \
    -x /openils/conf/fm_IDL.xml -f XML --timeout 5 > exported_files.xml
----

[NOTE]
====================
`marc_export` does not output progress as it executes.
====================

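By default, psql writes query results in an aligned table layout, so the file
from step 1 may contain a column header, a separator line, and a row-count
footer in addition to the IDs. The following filter is a minimal sketch for
keeping only the numeric lines before the file is passed to `marc_export`.
The function name `clean_ids` is hypothetical.

```shell
# clean_ids FILE
# Prints only the lines of FILE that consist of a single integer,
# with surrounding whitespace removed. Header, separator, and
# "(N rows)" footer lines from psql are dropped.
clean_ids() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1"
}
```

The cleaned list can be written to a new file, for example
`clean_ids /home/opensrf/records.txt > /home/opensrf/record_ids.txt`.
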
Options
^^^^^^^

The *marc_export* support script includes several options.  You can find a complete list
by running `./marc_export -h`.  A few key options are also listed below:

--descendants and --library
+++++++++++++++++++++++++++

The `marc_export` script has two related options, `--descendants` and
`--library`.  Both options take one argument: the shortname of an
organizational unit.

The `--library` option will export records with holdings at the specified
organizational unit only.  By default, this only includes physical holdings,
not electronic ones (also known as located URIs).

The `--descendants` option works much like the `--library` option
except that it is aware of the org. tree and will export records with
holdings at the specified organizational unit and all of its descendants.
This is handy if you want to export the records for all of the branches
of a system.  You can do that by specifying this option and the system's
shortname, instead of specifying multiple `--library` options for each branch.

Both the `--library` and `--descendants` options can be repeated.
All of the specified org. units and their descendants will be included
in the output.  You can also combine `--library` and `--descendants`
options when necessary.

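Because both options can be repeated, a small wrapper can build the option
list from a set of shortnames. The following is a sketch under assumptions:
the function name `export_systems`, the `MARC_EXPORT` variable, and the
shortnames in the usage note are hypothetical, and `-f XML` is carried over
from the example in the procedure above.

```shell
# Path to the script; override MARC_EXPORT if it lives elsewhere.
MARC_EXPORT=${MARC_EXPORT:-./marc_export}

# export_systems OUTFILE SHORTNAME...
# Exports records held at each named org. unit or any of its descendants.
export_systems() {
    outfile=$1
    shift
    # Replace each shortname with a "--descendants SHORTNAME" pair.
    for sys in "$@"; do
        set -- "$@" --descendants "$sys"
        shift
    done
    "$MARC_EXPORT" -f XML "$@" > "$outfile"
}
```

With these assumptions, `export_systems systems.xml SYS1 SYS2` runs
`./marc_export -f XML --descendants SYS1 --descendants SYS2` and writes the
result to `systems.xml`.
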
--items
+++++++

The `--items` option will add an 852 field for every relevant item to the MARC
record.  This 852 field includes the following information:

[options="header",cols="2,3"]
|===================================
|Subfield          |Contents
|$b (occurrence 1) |Call number owning library shortname
|$b (occurrence 2) |Item circulating library shortname
|$c                |Shelving location
|$g                |Circulation modifier
|$j                |Call number
|$k                |Call number prefix
|$m                |Call number suffix
|$p                |Barcode
|$t                |Copy number
|$x                |Miscellaneous item information
|$y                |Price
|===================================


--since
+++++++

You can use the `--since` option to export records modified after a certain date and time.

--store
+++++++

By default, `marc_export` will use the reporter storage service, which should
work in most cases. But if you have a separate reporter database and you
know you want to talk directly to your main production database, then you
can set the `--store` option to `cstore` or `storage`.

--uris
++++++
The `--uris` option (short form: `-u`) allows you to export records with
located URIs (i.e. electronic resources).  When used by itself, it will export
only records that have located URIs.  When used in conjunction with `--items`,
it will add records with located URIs but no items/copies to the output.
If combined with a `--library` or `--descendants` option, this option will
limit its output to those records with URIs at the designated libraries.  The
best way to use this option is in combination with `--items` and one of the
`--library` or `--descendants` options to export *all* of a library's
holdings, both physical and electronic.

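To export all holdings for several libraries, one output file per library,
the combination described above could be wrapped in a loop. This is a sketch
only: the function name `export_each_library`, the `MARC_EXPORT` variable, and
the output file naming are assumptions, and `-f XML` is carried over from the
earlier example.

```shell
# Path to the script; override MARC_EXPORT if it lives elsewhere.
MARC_EXPORT=${MARC_EXPORT:-./marc_export}

# export_each_library OUTDIR SHORTNAME...
# Writes OUTDIR/SHORTNAME.xml for each library, containing records with
# physical items (--items) or located URIs (--uris) at that library.
# Stops at the first export that fails.
export_each_library() {
    outdir=$1
    shift
    for lib in "$@"; do
        "$MARC_EXPORT" --library "$lib" --items --uris -f XML \
            > "$outdir/$lib.xml" || return 1
    done
}
```

For example, `export_each_library /home/opensrf/exports BR1 BR2` would create
`BR1.xml` and `BR2.xml` in an existing `/home/opensrf/exports` directory, where
`BR1` and `BR2` stand in for your own library shortnames.
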
anchor:_pingest_pl[]

Parallel Ingest with pingest.pl
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

indexterm:[pingest.pl]
indexterm:[MARC records,importing,using the command line]

The `pingest.pl` script allows fast bibliographic record
ingest.  It performs ingest in parallel so that multiple batches can
be done simultaneously.  It operates by splitting the records to be
ingested into batches and running all of the ingest methods on each
batch.  You may pass in options to control how many batches are run at
the same time, how many records there are per batch, and which ingest
operations to skip.

NOTE: The browse ingest is presently done in a single process over all
of the input records as it cannot run in parallel with itself.  It
does, however, run in parallel with the other ingests.

Command Line Options
^^^^^^^^^^^^^^^^^^^^

pingest.pl accepts the following command line options:

--host::
    The server where PostgreSQL runs (either host name or IP address).
    The default is the value of the PGHOST environment variable or,
    if it is unset, `localhost`.

--port::
    The port that PostgreSQL listens to on host.  The default is the
    value of the PGPORT environment variable or, if it is unset, 5432.

--db::
    The database to connect to on the host.  The default is the value of
    the PGDATABASE environment variable or, if it is unset, `evergreen`.

--user::
    The username for database connections.  The default is the value of
    the PGUSER environment variable or, if it is unset, `evergreen`.

--password::
    The password for database connections.  The default is the value of
    the PGPASSWORD environment variable or, if it is unset, `evergreen`.

--batch-size::
    Number of records to process per batch.  The default is 10,000.

--max-child::
    Max number of worker processes (i.e. the number of batches to
    process simultaneously).  The default is 8.

--skip-browse::
--skip-attrs::
--skip-search::
--skip-facets::
--skip-display::
    Skip the selected reingest component.



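As an illustration of how these options combine, the following sketch wraps a
run that skips the browse ingest. The function name `reingest_without_browse`
and the `PINGEST` variable are hypothetical, the install path is assumed from
the `/openils/bin/` directory mentioned at the top of this document, and
passing each value as a separate argument is an assumption about how the
script parses its options. Connection settings are left to the defaults
described above.

```shell
# Path to the script; override PINGEST if it lives elsewhere.
PINGEST=${PINGEST:-/openils/bin/pingest.pl}

# reingest_without_browse BATCH_SIZE MAX_CHILD
# Runs every ingest except the browse ingest, using the given
# number of records per batch and number of worker processes.
reingest_without_browse() {
    "$PINGEST" --batch-size "$1" --max-child "$2" --skip-browse
}
```

For example, `reingest_without_browse 5000 4` would process batches of 5,000
records with at most four worker processes.
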
Importing Authority Records from Command Line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

indexterm:[marc2are.pl]
indexterm:[pg_loader.pl]
indexterm:[MARC records,importing,using the command line]

The major advantages of the command line approach are its speed and its
convenience for system administrators who can perform bulk loads of
authority records in a controlled environment. For alternate instructions,
see the cataloging manual.

 . Run *marc2are.pl* against the authority records, specifying the user
name, password, and MARC type (USMARC or XML). Redirect `STDOUT` either
into the next command through a pipe or into an output file for
inspection. For example, to process a file with authority records
in MARCXML format named `auth_small.xml` using the default user name and
password, and directing the output into a file named `auth.are`:
+
----
cd Open-ILS/src/extras/import/
perl marc2are.pl --user admin --pass open-ils --marctype XML auth_small.xml > auth.are
----
+
[NOTE]
The MARC type will default to USMARC if the `--marctype` option is not specified.

 . Run *parallel_pg_loader.pl* to generate the SQL necessary for importing the
authority records into your system. This script will create files in your
current directory with filenames like `pg_loader-output.are.sql` and
`pg_loader-output.sql` (which runs the previous SQL file). To continue with the
previous example by processing our new `auth.are` file:
+
----
cd Open-ILS/src/extras/import/
perl parallel_pg_loader.pl --auto are --order are auth.are
----
+
[TIP]
To save time for very large batches of records, you could simply pipe the
output of *marc2are.pl* directly into *parallel_pg_loader.pl*.

 . Load the authority records from the SQL file that you generated in the
last step into your Evergreen database using the psql tool. Assuming the
default user name, host name, and database name for an Evergreen instance,
that command looks like:
+
----
psql -U evergreen -h localhost -d evergreen -f pg_loader-output.sql
----

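The three steps can be chained so that a failure in one step stops the rest.
The following is a minimal sketch that reuses the exact commands shown above,
including the default credentials and file names. The function name
`load_authorities` is hypothetical, and the function assumes it is run from
`Open-ILS/src/extras/import/` in the source tree.

```shell
# load_authorities MARCXML_FILE
# Converts the authority records, generates the loader SQL, and loads it.
# Each step runs only if the previous one succeeded.
load_authorities() {
    perl marc2are.pl --user admin --pass open-ils --marctype XML "$1" > auth.are &&
    perl parallel_pg_loader.pl --auto are --order are auth.are &&
    psql -U evergreen -h localhost -d evergreen -f pg_loader-output.sql
}
```

For example, `load_authorities auth_small.xml` would process the sample file
from step 1. Adjust the user names, password, host, and database to match
your own installation.
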
Juvenile-to-adult batch script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The batch `juv_to_adult.srfsh` script is responsible for toggling a patron
from juvenile to adult. It should be set up as a cron job.

This script changes patrons to adult when they reach the age value set in the
library setting named "Juvenile Age Threshold" (`global.juvenile_age_threshold`).
When no library setting value is present at a given patron's home library, the
value passed in to the script will be used as a default.

MARC Stream Importer
~~~~~~~~~~~~~~~~~~~~

indexterm:[MARC records,importing,using the command line]

The MARC Stream Importer can import authority records or bibliographic records.
A single running instance of the script can import either type of record, based
on the record leader.

This support script has its own configuration file, _marc_stream_importer.conf_,
which includes settings related to logs, ports, users, and access control.

The importer offers more flexibility than the staff client import, including
the following options:

 * _--bib-auto-overlay-exact_ and _--auth-auto-overlay-exact_: overlay/merge on
exact 901c matches
 * _--bib-auto-overlay-1match_ and _--auth-auto-overlay-1match_: overlay/merge
when exactly one match is found
 * _--bib-auto-overlay-best-match_ and _--auth-auto-overlay-best-match_:
overlay/merge on best match
 * _--bib-import-no-match_ and _--auth-import-no-match_: import when no match
is found

One advantage of using this tool instead of the staff client Import interface
is that the MARC Stream Importer can load a group of files at once.

 371 is that the MARC Stream Importer can load a group of files at once.
 372