<?xml version="1.0" encoding="UTF-8"?>
<chapter xml:id="intro_to_sql" xmlns="http://docbook.org/ns/docbook" version="5.0" xml:lang="EN"
    xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xlink="http://www.w3.org/1999/xlink">
<title>Introduction to SQL for Evergreen Administrators</title>
<section id="intro_to_databases">
<title>Introduction to SQL Databases</title>
<title>Introduction</title>
<simpara>Over time, the SQL database has become the standard method of storing,
retrieving, and processing raw data for applications. Ranging from embedded
databases such as SQLite and Apache Derby to enterprise databases such as
Oracle and IBM DB2, SQL databases offer application developers several basic
advantages: standard interfaces (Structured Query Language (SQL), Java
Database Connectivity (JDBC), Open Database Connectivity (ODBC), Perl Database
Independent Interface (DBI)); a standard conceptual model of data (tables,
fields, relationships, constraints, and so on); good performance in storing and
retrieving data; and concurrent access.</simpara>
<simpara>Evergreen is built on PostgreSQL, an open source SQL database that began in
1986 as <literal>POSTGRES</literal>, a research project led by Professor Michael Stonebraker
at the University of California, Berkeley. A SQL interface was added to a
fork of the original Berkeley POSTGRES code in 1994, and in 1996 the project
was renamed PostgreSQL.</simpara>
<simplesect id="_tables">
<title>Tables</title>
<simpara>The table is the cornerstone of a SQL database. Conceptually, a database table
is similar to a single sheet in a spreadsheet: every table has one or more
columns, with each row in the table containing values for each column. Each
column in a table defines an attribute corresponding to a particular data type.</simpara>
<simpara>We’ll insert a row into a table, then display the resulting contents. Don’t
worry if the <literal>INSERT</literal> statement is completely unfamiliar; we’ll talk more about
its syntax later.</simpara>
<formalpara><title><literal>actor.usr_note</literal> database table</title><para>
<programlisting language="sql" linenumbering="unnumbered">evergreen=# INSERT INTO actor.usr_note (usr, creator, pub, title, value)
    VALUES (1, 1, TRUE, 'Who is this guy?', 'He''s the administrator!');

evergreen=# select id, usr, creator, pub, title, value from actor.usr_note;
 id | usr | creator | pub |      title       |          value
----+-----+---------+-----+------------------+-------------------------
  1 |   1 |       1 | t   | Who is this guy? | He's the administrator!
(1 row)</programlisting>
</para></formalpara>
<simpara>PostgreSQL supports table inheritance, which lets you define tables that
inherit the column definitions of a given parent table. A search of the data in
the parent table includes the data in the child tables. Evergreen uses table
inheritance: for example, the <literal>action.circulation</literal> table is a child of the
<literal>money.billable_xact</literal> table, and the <literal>money.*_payment</literal> tables all inherit from
the <literal>money.payment</literal> parent table.</simpara>
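<simpara>As a rough illustration of table inheritance, the following sketch uses two
hypothetical tables, <literal>payment</literal> and <literal>cash_payment</literal>, which are not part of the
Evergreen schema and are created in whatever schema is current. It assumes a
scratch database where you are free to create tables.</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Hypothetical tables, for illustration only (not part of Evergreen)
CREATE TABLE payment (
    id      SERIAL PRIMARY KEY,
    amount  NUMERIC(6,2) NOT NULL
);

-- cash_payment inherits the id and amount columns and adds one of its own
CREATE TABLE cash_payment (
    cash_drawer INT
) INHERITS (payment);

INSERT INTO payment (amount) VALUES (5.00);
INSERT INTO cash_payment (amount, cash_drawer) VALUES (2.50, 1);

-- Returns both rows: the parent's row and the child's row
SELECT id, amount FROM payment;

-- Returns only the row stored directly in the parent table
SELECT id, amount FROM ONLY payment;</programlisting>
<simpara>The <literal>ONLY</literal> keyword is how PostgreSQL lets a query exclude rows that live in
child tables.</simpara>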
<simplesect id="_schemas">
<title>Schemas</title>
<simpara>PostgreSQL, like most SQL databases, supports the use of schema names to group
collections of tables and other database objects together. You might think of
schemas as namespaces if you’re a programmer; or you might think of the schema
/ table / column relationship like the area code / exchange / local number
structure of a telephone number.</simpara>
<table rowsep="1" colsep="1">
<title>Examples: database object names</title>
<?dbhtml table-width="80%"?>
<?dbfo table-width="80%"?>
<tgroup cols="4">
<colspec colname="col_1" colwidth="85*"/>
<colspec colname="col_2" colwidth="85*"/>
<colspec colname="col_3" colwidth="85*"/>
<colspec colname="col_4" colwidth="85*"/>
<thead>
<row>
<entry align="left" valign="top">Full name</entry>
<entry align="left" valign="top">Schema name</entry>
<entry align="left" valign="top">Table name</entry>
<entry align="left" valign="top">Field name</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara>actor.usr_note.title</simpara></entry>
<entry align="left" valign="top"><simpara>actor</simpara></entry>
<entry align="left" valign="top"><simpara>usr_note</simpara></entry>
<entry align="left" valign="top"><simpara>title</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara>biblio.record_entry.marc</simpara></entry>
<entry align="left" valign="top"><simpara>biblio</simpara></entry>
<entry align="left" valign="top"><simpara>record_entry</simpara></entry>
<entry align="left" valign="top"><simpara>marc</simpara></entry>
</row>
</tbody>
</tgroup>
</table>
<simpara>The default schema name in PostgreSQL is <literal>public</literal>, so if you do not specify a
schema name when creating or accessing a database object, PostgreSQL will use
the <literal>public</literal> schema. As a result, you might not find the object that you’re
looking for if you don’t use the appropriate schema.</simpara>
<formalpara><title>Example: Creating a table without a specific schema</title><para>
<programlisting language="sql" linenumbering="unnumbered">evergreen=# CREATE TABLE foobar (foo TEXT, bar TEXT);

evergreen=# \d foobar
   Table "public.foobar"
 Column | Type | Modifiers
--------+------+-----------
 foo    | text |
 bar    | text |</programlisting>
</para></formalpara>
<formalpara><title>Example: Trying to access an unqualified table outside of the public schema</title><para>
<programlisting language="sql" linenumbering="unnumbered">evergreen=# SELECT * FROM usr_note;
ERROR:  relation "usr_note" does not exist
LINE 1: SELECT * FROM usr_note;
                      ^</programlisting>
</para></formalpara>
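<simpara>There are two usual ways around this error, sketched below. The first
qualifies the table with its schema name. The second relies on PostgreSQL’s
<literal>search_path</literal> setting, which lists the schemas searched for unqualified names.
The <literal>SET</literal> shown here affects only the current session, and the order of
schemas is just an example.</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Option 1: qualify the table with its schema name
SELECT id, title FROM actor.usr_note;

-- Option 2: add the schema to the search path for the current session,
-- after which the unqualified name resolves to actor.usr_note
SET search_path TO actor, public;
SELECT id, title FROM usr_note;</programlisting>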
<simpara>Evergreen uses schemas to organize all of its tables with mostly intuitive,
if short, schema names. Here’s the current (as of 2010-01-03) list of schemas
used by Evergreen:</simpara>
<table rowsep="1" colsep="1">
<title>Evergreen schema names</title>
<?dbhtml table-width="80%"?>
<?dbfo table-width="80%"?>
<tgroup cols="2">
<colspec colname="col_1" colwidth="170*"/>
<colspec colname="col_2" colwidth="170*"/>
<thead>
<row>
<entry align="left" valign="top">Schema name</entry>
<entry align="left" valign="top">Description</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara><literal>acq</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Acquisitions</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>action</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Circulation actions</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>action_trigger</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Event mechanisms</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>actor</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Evergreen users and organization units</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>asset</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Call numbers and copies</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>auditor</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Track history of changes to selected tables</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>authority</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Authority records</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>biblio</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Bibliographic records</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>booking</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Resource bookings</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>config</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Evergreen configurable options</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>container</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Buckets for records, call numbers, copies, and users</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>extend_reporter</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Extra views for report definitions</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>metabib</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Metadata about bibliographic records</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>money</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Fines and bills</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>offline</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Offline transactions</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>permission</literal></simpara></entry>
<entry align="left" valign="top"><simpara>User permissions</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>query</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Stored SQL statements</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>reporter</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Report definitions</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>search</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Search functions</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>serial</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Serial MFHD records</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>stats</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Convenient views of circulation and asset statistics</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>vandelay</literal></simpara></entry>
<entry align="left" valign="top"><simpara>MARC batch importer and exporter</simpara></entry>
</row>
</tbody>
</tgroup>
</table>
<note><simpara>The term <emphasis>schema</emphasis> has two meanings in the world of SQL databases. We have
discussed the schema as a conceptual grouping of tables and other database
objects within a given namespace; for example, "the <emphasis role="strong">actor</emphasis> schema contains the
tables and functions related to users and organizational units". Another common
usage of <emphasis>schema</emphasis> is to refer to the entire data model for a given database;
for example, "the Evergreen database schema".</simpara></note>
<simplesect id="_columns">
<title>Columns</title>
<simpara>Each column definition consists of:</simpara>
<itemizedlist>
<listitem>
<simpara>
a data type
</simpara>
</listitem>
<listitem>
<simpara>
(optionally) a default value to be used whenever a row is inserted that
does not contain a specific value
</simpara>
</listitem>
<listitem>
<simpara>
(optionally) one or more constraints on the values beyond data type
</simpara>
</listitem>
</itemizedlist>
<simpara>Although PostgreSQL supports dozens of data types, Evergreen makes our lives
easier by using only a handful.</simpara>
<table rowsep="1" colsep="1">
<title>PostgreSQL data types used by Evergreen</title>
<?dbhtml table-width="90%"?>
<?dbfo table-width="90%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="77*"/>
<colspec colname="col_2" colwidth="77*"/>
<colspec colname="col_3" colwidth="230*"/>
<thead>
<row>
<entry align="left" valign="top">Type name</entry>
<entry align="left" valign="top">Description</entry>
<entry align="left" valign="top">Limits</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara><literal>INTEGER</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Medium integer</simpara></entry>
<entry align="left" valign="top"><simpara>-2147483648 to +2147483647</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>BIGINT</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Large integer</simpara></entry>
<entry align="left" valign="top"><simpara>-9223372036854775808 to +9223372036854775807</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>SERIAL</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Sequential integer</simpara></entry>
<entry align="left" valign="top"><simpara>1 to 2147483647</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>BIGSERIAL</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Large sequential integer</simpara></entry>
<entry align="left" valign="top"><simpara>1 to 9223372036854775807</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>TEXT</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Variable length character data</simpara></entry>
<entry align="left" valign="top"><simpara>Unlimited length</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>BOOL</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Boolean</simpara></entry>
<entry align="left" valign="top"><simpara>TRUE or FALSE</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>TIMESTAMP WITH TIME ZONE</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Timestamp</simpara></entry>
<entry align="left" valign="top"><simpara>4713 BC to 294276 AD</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>TIME</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Time</simpara></entry>
<entry align="left" valign="top"><simpara>Expressed in HH:MM:SS</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>NUMERIC</literal>(precision, scale)</simpara></entry>
<entry align="left" valign="top"><simpara>Decimal</simpara></entry>
<entry align="left" valign="top"><simpara>Up to 1000 digits of precision. In Evergreen mostly used for money
values, with a precision of 6 and a scale of 2 (<literal>####.##</literal>).</simpara></entry>
</row>
</tbody>
</tgroup>
</table>
<simpara>Full details about these data types are available from the
<ulink url="http://www.postgresql.org/docs/8.4/static/datatype.html">data types section of
the PostgreSQL manual</ulink>.</simpara>
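<simpara>To get a feel for the limits listed in the table above, you can cast literal values at the
<literal>psql</literal> prompt. The statements below are only a sketch; they use PostgreSQL’s
<literal>::</literal> cast syntax and do not touch any Evergreen tables.</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Rounds to the declared scale of two decimal places: returns 1234.57
SELECT 1234.567::NUMERIC(6,2);

-- Fails with a "numeric field overflow" error, because NUMERIC(6,2)
-- leaves room for only four digits before the decimal point
SELECT 12345.67::NUMERIC(6,2);

-- TEXT and BOOL values
SELECT 'Hartford'::TEXT AS city, TRUE AS valid;</programlisting>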
<simplesect id="_constraints">
<title>Constraints</title>
<simplesect id="_prevent_null_values">
<title>Prevent NULL values</title>
<simpara>A column definition may include the constraint <literal>NOT NULL</literal> to prevent NULL
values. In PostgreSQL, a NULL value is not the equivalent of zero or false or
an empty string; it is an explicit non-value with special properties. We’ll
talk more about how to work with NULL values when we get to queries.</simpara>
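<simpara>A minimal sketch of the <literal>NOT NULL</literal> constraint in action follows. The
<literal>note_demo</literal> table is hypothetical and exists only for this illustration.</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Hypothetical table, for illustration only (not part of Evergreen)
CREATE TABLE note_demo (
    title TEXT NOT NULL,
    body  TEXT
);

-- Succeeds: an empty string is not NULL, and body is allowed to be NULL
INSERT INTO note_demo (title, body) VALUES ('', NULL);

-- Fails: a NULL title violates the NOT NULL constraint
INSERT INTO note_demo (title, body) VALUES (NULL, 'some text');</programlisting>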
<simplesect id="_primary_key">
<title>Primary key</title>
<simpara>Every table can have at most one primary key. A primary key consists of one or
more columns which together uniquely identify each row in a table. If you
attempt to insert a row into a table that would create a duplicate or NULL
primary key entry, the database rejects the row and returns an error.</simpara>
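<simpara>The behaviour described above can be tried out with a small, hypothetical table
that has a two-column primary key. The table and column names below are made up
for this sketch.</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Hypothetical table with a composite (two-column) primary key
CREATE TABLE shelf_demo (
    library TEXT,
    shelf   INT,
    PRIMARY KEY (library, shelf)
);

INSERT INTO shelf_demo (library, shelf) VALUES ('Main', 1);  -- succeeds
INSERT INTO shelf_demo (library, shelf) VALUES ('Main', 2);  -- succeeds
INSERT INTO shelf_demo (library, shelf) VALUES ('Main', 1);  -- fails: duplicate key
INSERT INTO shelf_demo (library, shelf) VALUES (NULL, 3);    -- fails: NULL in a key column</programlisting>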
<simpara>Natural primary keys are drawn from the intrinsic properties of the data being
modelled. For example, some potential natural primary keys for a table that
contains people would be:</simpara>
<table rowsep="1" colsep="1">
<title>Example: Some potential natural primary keys for a table of people</title>
<?dbhtml table-width="90%"?>
<?dbfo table-width="90%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="77*"/>
<colspec colname="col_2" colwidth="153*"/>
<colspec colname="col_3" colwidth="153*"/>
<thead>
<row>
<entry align="left" valign="top">Natural key</entry>
<entry align="left" valign="top">Pros</entry>
<entry align="left" valign="top">Cons</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara>First name, last name, address</simpara></entry>
<entry align="left" valign="top"><simpara>No two people with the same name would ever live at the same address, right?</simpara></entry>
<entry align="left" valign="top"><simpara>Lots of columns force data duplication in referencing tables</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara>SSN or driver’s license</simpara></entry>
<entry align="left" valign="top"><simpara>These are guaranteed to be unique</simpara></entry>
<entry align="left" valign="top"><simpara>Lots of people don’t have an SSN or a driver’s license</simpara></entry>
</row>
</tbody>
</tgroup>
</table>
<simpara>To avoid problems with natural keys, many applications instead define surrogate
primary keys. A surrogate primary key is a column with an autoincrementing
integer value, added to a table definition to ensure uniqueness.</simpara>
<simpara>Evergreen uses surrogate keys (a column named <literal>id</literal> with a <literal>SERIAL</literal> data type)
for most of its tables.</simpara>
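<simpara>A surrogate key can be sketched with a hypothetical <literal>person_demo</literal> table (not
an Evergreen table). Two people with identical names can coexist because each
row receives its own generated <literal>id</literal>.</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Hypothetical table, for illustration only
CREATE TABLE person_demo (
    id         SERIAL PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);

-- No id is supplied, so the database generates one for each row
INSERT INTO person_demo (first_name, last_name) VALUES ('Jane', 'Doe');
INSERT INTO person_demo (first_name, last_name) VALUES ('Jane', 'Doe');

-- Both rows are kept, distinguished only by their id values
SELECT id, first_name, last_name FROM person_demo ORDER BY id;</programlisting>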
<simplesect id="_foreign_keys">
<title>Foreign keys</title>
<simpara>Every table can contain zero or more foreign keys: one or more columns that
refer to the primary key of another table.</simpara>
<simpara>For example, let’s consider Evergreen’s modelling of the basic relationship
between copies, call numbers, and bibliographic records. Bibliographic records
contained in the <literal>biblio.record_entry</literal> table can have call numbers attached to
them. Call numbers are contained in the <literal>asset.call_number</literal> table, and they can
have copies attached to them. Copies are contained in the <literal>asset.copy</literal> table.</simpara>
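<simpara>The chain of relationships just described could be declared roughly as follows.
This is a simplified sketch: the <literal>*_demo</literal> tables and their <literal>label</literal> and
<literal>barcode</literal> columns are hypothetical stand-ins, not the real Evergreen
definitions.</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Simplified, hypothetical stand-ins for the Evergreen tables
CREATE TABLE record_demo (
    id   BIGSERIAL PRIMARY KEY,
    marc TEXT NOT NULL
);

CREATE TABLE call_number_demo (
    id     BIGSERIAL PRIMARY KEY,
    record BIGINT NOT NULL REFERENCES record_demo (id),
    label  TEXT NOT NULL
);

CREATE TABLE copy_demo (
    id          BIGSERIAL PRIMARY KEY,
    call_number BIGINT NOT NULL REFERENCES call_number_demo (id),
    barcode     TEXT NOT NULL
);

-- Fails: no row in call_number_demo has an id of 999
INSERT INTO copy_demo (call_number, barcode) VALUES (999, '30000000000001');</programlisting>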
<table rowsep="1" colsep="1">
<title>Example: Evergreen’s copy / call number / bibliographic record relationships</title>
<?dbhtml table-width="100%"?>
<?dbfo table-width="100%"?>
<tgroup cols="4">
<colspec colname="col_1" colwidth="106*"/>
<colspec colname="col_2" colwidth="106*"/>
<colspec colname="col_3" colwidth="106*"/>
<colspec colname="col_4" colwidth="106*"/>
<thead>
<row>
<entry align="left" valign="top">Table</entry>
<entry align="left" valign="top">Primary key</entry>
<entry align="left" valign="top">Column with a foreign key</entry>
<entry align="left" valign="top">Points to</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara>asset.copy</simpara></entry>
<entry align="left" valign="top"><simpara>asset.copy.id</simpara></entry>
<entry align="left" valign="top"><simpara>asset.copy.call_number</simpara></entry>
<entry align="left" valign="top"><simpara>asset.call_number.id</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara>asset.call_number</simpara></entry>
<entry align="left" valign="top"><simpara>asset.call_number.id</simpara></entry>
<entry align="left" valign="top"><simpara>asset.call_number.record</simpara></entry>
<entry align="left" valign="top"><simpara>biblio.record_entry.id</simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara>biblio.record_entry</simpara></entry>
<entry align="left" valign="top"><simpara>biblio.record_entry.id</simpara></entry>
<entry align="left" valign="top"><simpara></simpara></entry>
<entry align="left" valign="top"><simpara></simpara></entry>
</row>
</tbody>
</tgroup>
</table>
<simplesect id="_check_constraints">
<title>Check constraints</title>
<simpara>PostgreSQL enables you to define rules to ensure that the value to be inserted
or updated meets certain conditions. For example, you can ensure that an
incoming integer value is within a specific range, or that a ZIP code matches a
particular pattern.</simpara>
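<simpara>Both kinds of rule can be sketched with a hypothetical <literal>check_demo</literal> table. The
range of 1 to 5 and the five-digit ZIP code pattern are arbitrary choices for
this example; the <literal>~</literal> operator is PostgreSQL’s regular expression match.</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Hypothetical table, for illustration only
CREATE TABLE check_demo (
    rating   INT  CHECK (rating BETWEEN 1 AND 5),
    zip_code TEXT CHECK (zip_code ~ '^[0-9]{5}$')
);

INSERT INTO check_demo (rating, zip_code) VALUES (3, '06101');  -- succeeds
INSERT INTO check_demo (rating, zip_code) VALUES (9, '06101');  -- fails: rating out of range
INSERT INTO check_demo (rating, zip_code) VALUES (3, '6101');   -- fails: not five digits</programlisting>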
<simplesect id="_deconstructing_a_table_definition_statement">
<title>Deconstructing a table definition statement</title>
<simpara>The <literal>actor.org_address</literal> table is a simple table in the Evergreen schema that
we can use as a concrete example of many of the properties of databases that
we have discussed so far.</simpara>
<programlisting language="sql" linenumbering="unnumbered">CREATE TABLE actor.org_address (
  id            SERIAL  PRIMARY KEY, <co id="sqlCO1-1"/>
  valid         BOOL    NOT NULL DEFAULT TRUE, <co id="sqlCO1-2"/>
  address_type  TEXT    NOT NULL DEFAULT 'MAILING', <co id="sqlCO1-3"/>
  org_unit      INT     NOT NULL REFERENCES actor.org_unit (id) <co id="sqlCO1-4"/>
                          DEFERRABLE INITIALLY DEFERRED,
  street1       TEXT    NOT NULL,
  street2       TEXT, <co id="sqlCO1-5"/>
  city          TEXT    NOT NULL,
  state         TEXT    NOT NULL,
  country       TEXT    NOT NULL,
  post_code     TEXT    NOT NULL
);</programlisting>
<calloutlist>
<callout arearefs="sqlCO1-1">
<para>
The column named <literal>id</literal> is defined with a special data type of <literal>SERIAL</literal>; if
given no value when a row is inserted into a table, the database automatically
generates the next sequential integer value for the column. <literal>SERIAL</literal> is a
popular data type for a primary key because the generated values do not repeat,
and indeed the constraint for this column identifies it as the <literal>PRIMARY KEY</literal>.
</para>
</callout>
<callout arearefs="sqlCO1-2">
<para>
The data type <literal>BOOL</literal> defines a boolean value: <literal>TRUE</literal> or <literal>FALSE</literal> are the only
acceptable values for the column. The constraint <literal>NOT NULL</literal> instructs the
database to prevent the column from ever containing a NULL value. The column
property <literal>DEFAULT TRUE</literal> instructs the database to automatically set the value
of the column to <literal>TRUE</literal> if no value is provided.
</para>
</callout>
<callout arearefs="sqlCO1-3">
<para>
The data type <literal>TEXT</literal> defines a text column of practically unlimited length.
As with the previous column, there is a <literal>NOT NULL</literal> constraint, and a default
value of <literal>'MAILING'</literal> will result if no other value is supplied.
</para>
</callout>
<callout arearefs="sqlCO1-4">
<para>
The <literal>REFERENCES actor.org_unit (id)</literal> clause indicates that this column has a
foreign key relationship to the <literal>actor.org_unit</literal> table, and that the value of
this column in every row in this table must have a corresponding value in the
<literal>id</literal> column in the referenced table (<literal>actor.org_unit</literal>).
</para>
</callout>
<callout arearefs="sqlCO1-5">
<para>
The column named <literal>street2</literal> demonstrates that not all columns have constraints
beyond data type. In this case, the column is allowed to be NULL or to contain a
<literal>TEXT</literal> value.
</para>
</callout>
</calloutlist>
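<simpara>To see the defaults from this definition at work, you could insert a row that
supplies only the required columns. The sketch below assumes the table is
defined exactly as shown above and that you are working on a test system; the
address values and the <literal>org_unit</literal> value of 1 are made up. The transaction is
rolled back so that nothing is saved.</simpara>
<programlisting language="sql" linenumbering="unnumbered">BEGIN;

-- id, valid, address_type, and street2 are deliberately left out
INSERT INTO actor.org_address (org_unit, street1, city, state, country, post_code)
    VALUES (1, '123 Main Street', 'Hartford', 'Connecticut', 'US', '06101');

-- id is generated, valid defaults to TRUE, address_type defaults to
-- 'MAILING', and street2 is NULL
SELECT id, valid, address_type, street2
  FROM actor.org_address
  WHERE street1 = '123 Main Street';

-- Discard the example row
ROLLBACK;</programlisting>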
<simplesect id="_displaying_a_table_definition_using_literal_psql_literal">
<title>Displaying a table definition using <literal>psql</literal></title>
<simpara>The <literal>psql</literal> command-line interface is the preferred method for accessing
PostgreSQL databases. It offers features like tab-completion, readline support
for recalling previous commands, flexible input and output formats, and
is accessible via a standard SSH session.</simpara>
<simpara>If you press the <literal>Tab</literal> key once after typing one or more characters of the
database object name, <literal>psql</literal> automatically completes the name if there are no
other matches. If there are other matches for your current input, nothing
happens until you press the <literal>Tab</literal> key a second time, at which point <literal>psql</literal>
displays all of the matches for your current input.</simpara>
<simpara>To display the definition of a database object such as a table, issue the
command <literal>\d</literal> <emphasis>object-name</emphasis>. For example, to display the definition of the
<literal>actor.usr_note</literal> table:</simpara>
<programlisting language="sh" linenumbering="unnumbered">$ psql evergreen <co id="sqlCO2-1"/>
Type "help" for help.

evergreen=# \d actor.usr_note <co id="sqlCO2-2"/>
                                     Table "actor.usr_note"
   Column    |           Type           |                          Modifiers
-------------+--------------------------+-------------------------------------------------------------
 id          | bigint                   | not null default nextval('actor.usr_note_id_seq'::regclass)
 usr         | bigint                   | not null
 creator     | bigint                   | not null
 create_date | timestamp with time zone | default now()
 pub         | boolean                  | not null default false
 title       | text                     | not null
 value       | text                     | not null
Indexes:
    "usr_note_pkey" PRIMARY KEY, btree (id)
    "actor_usr_note_creator_idx" btree (creator)
    "actor_usr_note_usr_idx" btree (usr)
Foreign-key constraints:
    "usr_note_creator_fkey" FOREIGN KEY (creator) REFERENCES actor.usr(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
    "usr_note_usr_fkey" FOREIGN KEY (usr) REFERENCES actor.usr(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED

evergreen=# \q <co id="sqlCO2-3"/></programlisting>
<calloutlist>
<callout arearefs="sqlCO2-1">
<para>
This is the most basic connection to a PostgreSQL database. You can use a
number of other flags to specify user name, hostname, port, and other options.
</para>
</callout>
<callout arearefs="sqlCO2-2">
<para>
The <literal>\d</literal> command displays the definition of a database object.
</para>
</callout>
<callout arearefs="sqlCO2-3">
<para>
The <literal>\q</literal> command quits the <literal>psql</literal> session and returns you to the shell prompt.
</para>
</callout>
</calloutlist>
<section id="basic_sql_queries">
<title>Basic SQL queries</title>
<simplesect id="_the_select_statement">
<title>The SELECT statement</title>
<simpara>The SELECT statement is the basic tool for retrieving information from a
database. The syntax for most SELECT statements is:</simpara>
<literallayout><literal>SELECT</literal> [<emphasis>column(s)</emphasis>]
  <literal>FROM</literal> [<emphasis>table(s)</emphasis>]
  [<literal>WHERE</literal> <emphasis>condition(s)</emphasis>]
  [<literal>GROUP BY</literal> <emphasis>column(s)</emphasis>]
  [<literal>HAVING</literal> <emphasis>grouping-condition(s)</emphasis>]
  [<literal>ORDER BY</literal> <emphasis>column(s)</emphasis>]
  [<literal>LIMIT</literal> <emphasis>maximum-results</emphasis>]
  [<literal>OFFSET</literal> <emphasis>start-at-result-#</emphasis>]</literallayout>
<simpara>For example, to select all of the columns for each row in the
<literal>actor.usr_address</literal> table, issue the following query:</simpara>
<programlisting language="sql" linenumbering="unnumbered">SELECT *
  FROM actor.usr_address
;</programlisting>
<simplesect id="_selecting_particular_columns_from_a_table">
<title>Selecting particular columns from a table</title>
<simpara><literal>SELECT *</literal> returns all columns from all of the tables included in your query.
However, quite often you will want to return only a subset of the possible
columns. You can retrieve specific columns by listing the names of the columns
you want after the <literal>SELECT</literal> keyword. Separate each column name with a comma.</simpara>
<simpara>For example, to select just the city, county, and state from the
<literal>actor.usr_address</literal> table, issue the following query:</simpara>
<programlisting language="sql" linenumbering="unnumbered">SELECT city, county, state
  FROM actor.usr_address
;</programlisting>
<simplesect id="_sorting_results_with_the_order_by_clause">
<title>Sorting results with the ORDER BY clause</title>
<simpara>By default, a SELECT statement returns rows matching your query with no
guarantee of any particular order in which they are returned. To force
the rows to be returned in a particular order, use the ORDER BY clause
to specify one or more columns to determine the sorting priority of the
rows.</simpara>
<simpara>For example, to sort the rows returned from your <literal>actor.usr_address</literal> query by
city, with county and then zip code as the tie breakers, issue the
following query:</simpara>
<programlisting language="sql" linenumbering="unnumbered">SELECT city, county, state
  FROM actor.usr_address
  ORDER BY city, county, post_code
;</programlisting>
<simplesect id="_filtering_results_with_the_where_clause">
<title>Filtering results with the WHERE clause</title>
<simpara>Thus far, your results have been returning all of the rows in the table.
Normally, however, you would want to restrict the rows that are returned to the
subset of rows that match one or more conditions of your search. The <literal>WHERE</literal>
clause enables you to specify a set of conditions that filter your query
results. Each condition in the <literal>WHERE</literal> clause is an SQL expression that returns
a boolean (true or false) value.</simpara>
<simpara>For example, to restrict the results returned from your <literal>actor.usr_address</literal>
query to only those rows containing a state value of <emphasis>Connecticut</emphasis>, issue the
following query:</simpara>
<programlisting language="sql" linenumbering="unnumbered">SELECT city, county, state
  FROM actor.usr_address
  WHERE state = 'Connecticut'
  ORDER BY city, county, post_code
;</programlisting>
<simpara>You can include more conditions in the <literal>WHERE</literal> clause with the <literal>OR</literal> and <literal>AND</literal>
operators. For example, to further restrict the results returned from your
<literal>actor.usr_address</literal> query to only those rows where the state column contains a
value of <emphasis>Connecticut</emphasis> and the city column contains a value of <emphasis>Hartford</emphasis>,
issue the following query:</simpara>
<programlisting language="sql" linenumbering="unnumbered">SELECT city, county, state
  FROM actor.usr_address
  WHERE state = 'Connecticut'
    AND city = 'Hartford'
  ORDER BY city, county, post_code
;</programlisting>
<note><simpara>To return rows where the state is <emphasis>Connecticut</emphasis> and the city is <emphasis>Hartford</emphasis> or
<emphasis>New Haven</emphasis>, you must use parentheses to explicitly group the city value
conditions together. <literal>AND</literal> binds more tightly than <literal>OR</literal>, so without the
parentheses the database evaluates the <literal>OR city = 'New Haven'</literal> clause entirely
on its own and matches all rows where the city column is
<emphasis>New Haven</emphasis>, even though the state might not be <emphasis>Connecticut</emphasis>.</simpara></note>
647 <formalpara><title>Trouble with OR</title><para>
\r
648 <programlisting language="sql" linenumbering="unnumbered">SELECT city, county, state
\r
649 FROM actor.usr_address
\r
650 WHERE state = 'Connecticut'
\r
651 AND city = 'Hartford' OR city = 'New Haven'
\r
652 ORDER BY city, county, post_code
\r
655 -- Can return unwanted rows because the OR is not grouped!</programlisting>
\r
656 </para></formalpara>
\r
657 <formalpara><title>Grouped OR’ed conditions</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT city, county, state
  FROM actor.usr_address
  WHERE state = 'Connecticut'
    AND (city = 'Hartford' OR city = 'New Haven')
  ORDER BY city, county, post_code;

-- The parentheses ensure that the OR is applied to the cities, and the
-- state in either case must be 'Connecticut'</programlisting>
\r
667 </para></formalpara>
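<simpara>As a sketch of an equivalent formulation, the grouped <literal>OR</literal> conditions can also be
written with the <literal>IN</literal> operator, which is listed among the comparison operators in
this chapter. With no <literal>OR</literal> left to group, the precedence problem does not arise:</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Equivalent to:
--   state = 'Connecticut' AND (city = 'Hartford' OR city = 'New Haven')
SELECT city, county, state
  FROM actor.usr_address
  WHERE state = 'Connecticut'
    AND city IN ('Hartford', 'New Haven')
  ORDER BY city, county, post_code;</programlisting>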
\r
668 <simplesect id="_comparison_operators">
\r
669 <title>Comparison operators</title>
\r
670 <simpara>Here is a partial list of comparison operators that are commonly used in
\r
671 <literal>WHERE</literal> clauses:</simpara>
\r
672 <simplesect id="_comparing_two_scalar_values">
\r
673 <title>Comparing two scalar values</title>
\r
<itemizedlist>
<listitem>
<simpara>
<literal>x = y</literal> (equal to)
</simpara>
</listitem>
<listitem>
<simpara>
<literal>x != y</literal> (not equal to)
</simpara>
</listitem>
<listitem>
<simpara>
<literal>x &lt; y</literal> (less than)
</simpara>
</listitem>
<listitem>
<simpara>
<literal>x &gt; y</literal> (greater than)
</simpara>
</listitem>
<listitem>
<simpara>
<literal>x LIKE y</literal> (TEXT value x matches the pattern y, where y is a string that
can contain <emphasis>%</emphasis> as a wildcard for zero or more characters and <emphasis>_</emphasis> as a wildcard
for a single character; for example, <literal>WHERE 'all you can eat fish and chips and a big stick' LIKE '%fish%stick'</literal> would return TRUE)
</simpara>
</listitem>
<listitem>
<simpara>
<literal>x ILIKE y</literal> (like LIKE, but the comparison ignores upper-case / lower-case)
</simpara>
</listitem>
<listitem>
<simpara>
<literal>x IN y</literal> (x is in the list of values y, where y can be a list or a SELECT
statement that returns a list)
</simpara>
</listitem>
</itemizedlist>
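<simpara>The following sketch exercises several of these operators against literal values, so it
does not depend on any Evergreen table; the column aliases are illustrative names only.
Each comparison returns a boolean, which <literal>psql</literal> displays as <literal>t</literal> or <literal>f</literal>:</simpara>
<programlisting language="sql" linenumbering="unnumbered">SELECT 'Hartford' LIKE 'Hart%'    AS like_match,   -- t: % matches 'ford'
       'Hartford' LIKE 'hart%'    AS like_case,    -- f: LIKE is case-sensitive
       'Hartford' ILIKE 'hart%'   AS ilike_match,  -- t: ILIKE ignores case
       'Hartford' LIKE 'Hartfor_' AS single_char,  -- t: _ matches exactly one character
       'CT' IN ('CT', 'NY')       AS in_list,      -- t: 'CT' is in the list
       3 != 4                     AS not_equal;    -- t</programlisting>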
\r
718 <simplesect id="_null_values">
\r
719 <title>NULL values</title>
\r
720 <simpara>SQL databases have a special way of representing the value of a column that has
\r
721 no value: <literal>NULL</literal>. A <literal>NULL</literal> value is not equal to zero, and is not an empty
\r
722 string; it is equal to nothing, not even another <literal>NULL</literal>, because it has no value
\r
723 that can be compared.</simpara>
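<simpara>A minimal sketch of why the ordinary <literal>=</literal> operator cannot be used to find <literal>NULL</literal>
values: comparing anything to <literal>NULL</literal> yields <literal>NULL</literal> rather than TRUE or FALSE,
whereas the <literal>IS NULL</literal> test yields a real boolean. The column aliases are
illustrative only:</simpara>
<programlisting language="sql" linenumbering="unnumbered">SELECT NULL = NULL  AS equals_null,    -- NULL, not TRUE
       NULL IS NULL AS is_null,        -- t
       '' IS NULL   AS empty_is_null,  -- f: an empty string is not NULL
       0 IS NULL    AS zero_is_null;   -- f: zero is not NULL</programlisting>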
\r
724 <simpara>To return rows from a table where a given column is not <literal>NULL</literal>, use the
\r
725 <literal>IS NOT NULL</literal> comparison operator.</simpara>
\r
726 <formalpara><title>Retrieving rows where a column is not <literal>NULL</literal></title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT id, first_given_name, family_name
  FROM actor.usr
  WHERE second_given_name IS NOT NULL;</programlisting>
\r
731 </para></formalpara>
\r
732 <simpara>Similarly, to return rows from a table where a given column is <literal>NULL</literal>, use
\r
733 the <literal>IS NULL</literal> comparison operator.</simpara>
\r
734 <formalpara><title>Retrieving rows where a column is <literal>NULL</literal></title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT id, first_given_name, second_given_name, family_name
  FROM actor.usr
  WHERE second_given_name IS NULL;

 id | first_given_name | second_given_name |  family_name
----+------------------+-------------------+----------------
  1 | Administrator    |                   | System Account
(1 row)</programlisting>
\r
744 </para></formalpara>
\r
<simpara>Notice that the <literal>NULL</literal> value in the output is displayed as empty space,
indistinguishable from an empty string; this is the default display method in
<literal>psql</literal>. You can change the behaviour of <literal>psql</literal> using the <literal>\pset</literal> command:</simpara>
\r
748 <formalpara><title>Changing the way <literal>NULL</literal> values are displayed in <literal>psql</literal></title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">evergreen=# \pset null '(null)'
Null display is '(null)'.

SELECT id, first_given_name, second_given_name, family_name
  FROM actor.usr
  WHERE second_given_name IS NULL;

 id | first_given_name | second_given_name |  family_name
----+------------------+-------------------+----------------
  1 | Administrator    | (null)            | System Account
(1 row)</programlisting>
\r
761 </para></formalpara>
\r
762 <simpara>Database queries within programming languages such as Perl and C have
\r
763 special methods of checking for <literal>NULL</literal> values in returned results.</simpara>
\r
765 <simplesect id="_text_delimiter">
\r
766 <title>Text delimiter: '</title>
\r
<simpara>You might have noticed that we have been using the <literal>'</literal> character to delimit
TEXT values and values such as dates and times that are entered as TEXT values. Sometimes,
however, your TEXT value itself contains a <literal>'</literal> character, such as the word
<literal>you're</literal>. To prevent the database from prematurely ending the TEXT value at the
first <literal>'</literal> character and returning a syntax error, use another <literal>'</literal> character to
escape the following <literal>'</literal> character.</simpara>
<simpara>For example, to change the last name of a user in the <literal>actor.usr</literal> table to
<literal>L'estat</literal>, issue the following SQL:</simpara>
\r
775 <formalpara><title>Escaping <literal>'</literal> in TEXT values</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">UPDATE actor.usr
  SET family_name = 'L''estat'
  WHERE profile IN (
    SELECT id
      FROM permission.grp_tree
      WHERE name = 'Vampire'
  );</programlisting>
\r
784 </para></formalpara>
\r
785 <simpara>When you retrieve the row from the database, the value is displayed with just
\r
786 a single <literal>'</literal> character:</simpara>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT id, family_name
  FROM actor.usr
  WHERE family_name = 'L''estat';

(1 row)</programlisting>
\r
797 <simplesect id="_grouping_and_eliminating_results_with_the_group_by_and_having_clauses">
\r
798 <title>Grouping and eliminating results with the GROUP BY and HAVING clauses</title>
\r
799 <simpara>The GROUP BY clause returns a unique set of results for the desired columns.
\r
800 This is most often used in conjunction with an aggregate function to present
\r
801 results for a range of values in a single query, rather than requiring you to
\r
802 issue one query per target value.</simpara>
\r
803 <formalpara><title>Returning unique results of a single column with <literal>GROUP BY</literal></title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT grp
  FROM permission.grp_perm_map
  GROUP BY grp;

(8 rows)</programlisting>
\r
820 </para></formalpara>
\r
821 <simpara>While <literal>GROUP BY</literal> can be useful for a single column, it is more often used
\r
822 to return the distinct results across multiple columns. For example, the
\r
823 following query shows us which groups have permissions at each depth in
\r
824 the library hierarchy:</simpara>
\r
825 <formalpara><title>Returning unique results of multiple columns with <literal>GROUP BY</literal></title><para>
\r
826 <programlisting language="sql" linenumbering="unnumbered">SELECT grp, depth
\r
827 FROM permission.grp_perm_map
\r
828 GROUP BY grp, depth
\r
829 ORDER BY depth, grp;
\r
848 (15 rows)</programlisting>
\r
849 </para></formalpara>
\r
850 <simpara>Extending this further, you can use the <literal>COUNT()</literal> aggregate function to
\r
851 also return the number of times each unique combination of <literal>grp</literal> and <literal>depth</literal>
\r
852 appears in the table. <emphasis>Yes, this is a sneak peek at the use of aggregate
\r
853 functions! Keeners.</emphasis></simpara>
\r
854 <formalpara><title>Counting unique column combinations with <literal>GROUP BY</literal></title><para>
\r
855 <programlisting language="sql" linenumbering="unnumbered">SELECT grp, depth, COUNT(grp)
\r
856 FROM permission.grp_perm_map
\r
857 GROUP BY grp, depth
\r
858 ORDER BY depth, grp;
\r
860 grp | depth | count
\r
861 -----+-------+-------
\r
877 (15 rows)</programlisting>
\r
878 </para></formalpara>
\r
879 <simpara>You can use the <literal>WHERE</literal> clause to restrict the returned results before grouping
\r
880 is applied to the results. The following query restricts the results to those
\r
881 rows that have a depth of 0.</simpara>
\r
882 <formalpara><title>Using the <literal>WHERE</literal> clause with <literal>GROUP BY</literal></title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT grp, COUNT(grp)
  FROM permission.grp_perm_map
  WHERE depth = 0
  GROUP BY grp;

(6 rows)</programlisting>
\r
899 </para></formalpara>
\r
900 <simpara>To restrict results after grouping has been applied to the rows, use the
\r
901 <literal>HAVING</literal> clause; this is typically used to restrict results based on
\r
902 a comparison to the value returned by an aggregate function. For example,
\r
903 the following query restricts the returned rows to those that have more than
\r
904 5 occurrences of the same value for <literal>grp</literal> in the table.</simpara>
\r
905 <formalpara><title><literal>GROUP BY</literal> restricted by a <literal>HAVING</literal> clause</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT grp, COUNT(grp)
  FROM permission.grp_perm_map
  GROUP BY grp
  HAVING COUNT(grp) > 5;

(6 rows)</programlisting>
\r
921 </para></formalpara>
\r
923 <simplesect id="_eliminating_duplicate_results_with_the_distinct_keyword">
\r
924 <title>Eliminating duplicate results with the DISTINCT keyword</title>
\r
<simpara><literal>GROUP BY</literal> is one way of eliminating duplicate results from the rows returned
by your query. The <literal>DISTINCT</literal> keyword has a narrower purpose: it only removes
duplicate rows from the results of your query and does not form groups for
aggregate functions to summarize. However, it works, and it is easy, so if
you just want a quick list of the unique set of values for a column or set of
columns, the <literal>DISTINCT</literal> keyword might be appropriate.</simpara>
\r
930 <simpara>On the other hand, if you are getting duplicate rows back when you don’t expect
\r
931 them, then applying the <literal>DISTINCT</literal> keyword might be a sign that you are
\r
932 papering over a real problem.</simpara>
\r
933 <formalpara><title>Returning unique results of multiple columns with <literal>DISTINCT</literal></title><para>
\r
934 <programlisting language="sql" linenumbering="unnumbered">SELECT DISTINCT grp, depth
\r
935 FROM permission.grp_perm_map
\r
936 ORDER BY depth, grp
\r
956 (15 rows)</programlisting>
\r
957 </para></formalpara>
\r
959 <simplesect id="_paging_through_results_with_the_limit_and_offset_clauses">
\r
960 <title>Paging through results with the LIMIT and OFFSET clauses</title>
\r
961 <simpara>The <literal>LIMIT</literal> clause restricts the total number of rows returned from your query
\r
962 and is useful if you just want to list a subset of a large number of rows. For
\r
963 example, in the following query we list the five most frequently used
\r
964 circulation modifiers:</simpara>
\r
965 <formalpara><title>Using the <literal>LIMIT</literal> clause to restrict results</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT circ_modifier, COUNT(circ_modifier)
  FROM asset.copy
  GROUP BY circ_modifier
  ORDER BY COUNT(circ_modifier) DESC
  LIMIT 5;

 circ_modifier | count
---------------+--------
(5 rows)</programlisting>
\r
981 </para></formalpara>
\r
982 <simpara>When you use the <literal>LIMIT</literal> clause to restrict the total number of rows returned
\r
983 by your query, you can also use the <literal>OFFSET</literal> clause to determine which subset
\r
984 of the rows will be returned. The use of the <literal>OFFSET</literal> clause assumes that
\r
985 you’ve used the <literal>ORDER BY</literal> clause to impose order on the results.</simpara>
\r
<simpara>In the following example, we use the <literal>OFFSET</literal> clause to get results 6 through
10 from the same query that we previously executed.</simpara>
\r
988 <formalpara><title>Using the <literal>OFFSET</literal> clause to return a specific subset of rows</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT circ_modifier, COUNT(circ_modifier)
  FROM asset.copy
  GROUP BY circ_modifier
  ORDER BY COUNT(circ_modifier) DESC
  LIMIT 5
  OFFSET 5;

 circ_modifier | count
---------------+--------
 LAW SERIAL    | 102758
(5 rows)</programlisting>
\r
1005 </para></formalpara>
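<simpara>To see the paging arithmetic in isolation, here is a small sketch that pages through the
numbers 1 to 20 produced by the built-in PostgreSQL <literal>generate_series()</literal> function
rather than an Evergreen table. With a page size of 5, page <emphasis>n</emphasis> starts at an offset
of (n - 1) * 5, so the second page uses <literal>OFFSET 5</literal>:</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Page 2 of the ordered results: skip the first 5 rows, return the next 5
SELECT n
  FROM generate_series(1, 20) AS n
  ORDER BY n
  LIMIT 5
  OFFSET 5;

 n
----
  6
  7
  8
  9
 10
(5 rows)</programlisting>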
\r
1008 <section id="advanced_sql_queries">
\r
1009 <title>Advanced SQL queries</title>
\r
1010 <simplesect id="_transforming_column_values_with_functions">
\r
1011 <title>Transforming column values with functions</title>
\r
1012 <simpara>PostgreSQL includes many built-in functions for manipulating column data.
\r
1013 You can also create your own functions (and Evergreen does make use of
\r
1014 many custom functions). There are two types of functions used in
\r
1015 databases: scalar functions and aggregate functions.</simpara>
\r
1016 <simplesect id="_scalar_functions">
\r
1017 <title>Scalar functions</title>
\r
1018 <simpara>Scalar functions transform each value of the target column. If your query
\r
1019 would return 50 values for a column in a given query, and you modify your
\r
1020 query to apply a scalar function to the values returned for that column,
\r
1021 it will still return 50 values. For example, the UPPER() function,
\r
1022 used to convert text values to upper-case, modifies the results in the
\r
1023 following set of queries:</simpara>
\r
1024 <formalpara><title>Using the UPPER() scalar function to convert text values to upper-case</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">-- First, without the UPPER() function for comparison
SELECT shortname, name
  FROM actor.org_unit;

 shortname |         name
-----------+-----------------------
 CONS      | Example Consortium
 SYS1      | Example System 1
 SYS2      | Example System 2
(3 rows)

-- Now apply the UPPER() function to the name column
SELECT shortname, UPPER(name)
  FROM actor.org_unit;

 shortname |       upper
-----------+--------------------
 CONS      | EXAMPLE CONSORTIUM
 SYS1      | EXAMPLE SYSTEM 1
 SYS2      | EXAMPLE SYSTEM 2
(3 rows)</programlisting>
\r
1050 </para></formalpara>
\r
1051 <simpara>There are so many scalar functions in PostgreSQL that we cannot cover them
\r
1052 all here, but we can list some of the most commonly used functions:</simpara>
\r
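<itemizedlist>
<listitem>
<simpara>
|| - concatenates two text values together
</simpara>
</listitem>
<listitem>
<simpara>
COALESCE() - returns the first non-NULL value from the list of arguments
</simpara>
</listitem>
<listitem>
<simpara>
LOWER() - returns a text value converted to lower-case
</simpara>
</listitem>
<listitem>
<simpara>
REPLACE() - returns a text value after replacing all occurrences of a given text value with a different text value
</simpara>
</listitem>
<listitem>
<simpara>
REGEXP_REPLACE() - returns a text value after being transformed by a regular expression
</simpara>
</listitem>
<listitem>
<simpara>
UPPER() - returns a text value converted to upper-case
</simpara>
</listitem>
</itemizedlist>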
\r
1085 <simpara>For a complete list of scalar functions, see
\r
1086 <ulink url="http://www.postgresql.org/docs/8.3/interactive/functions.html">the PostgreSQL function documentation</ulink>.</simpara>
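<simpara>As a rough illustration of the functions listed above, the following sketch applies each
one to literal values so that it can be run in any PostgreSQL database. The input strings
and column aliases are made up for the example:</simpara>
<programlisting language="sql" linenumbering="unnumbered">SELECT 'CONS' || '-' || 'SYS1'      AS concatenated,   -- CONS-SYS1
       COALESCE(NULL, 'n/a')        AS first_non_null, -- n/a
       LOWER('CONS')                AS lowered,        -- cons
       UPPER('cons')                AS uppered,        -- CONS
       REPLACE('SYS1', 'SYS', 'BR') AS replaced,       -- BR1
       -- Replace the trailing run of digits with an X
       REGEXP_REPLACE('Example System 1', '[0-9]+$', 'X') AS regexped; -- Example System X</programlisting>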
\r
1088 <simplesect id="_aggregate_functions">
\r
1089 <title>Aggregate functions</title>
\r
<simpara>Aggregate functions return a single value computed from the complete set of
values returned for the specified column.</simpara>
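<simpara>A minimal sketch of the idea, using the built-in <literal>generate_series()</literal> function to
supply the ten values 1 through 10 instead of an Evergreen table: each aggregate function
collapses the ten input values into a single result. The column aliases are illustrative only:</simpara>
<programlisting language="sql" linenumbering="unnumbered">SELECT COUNT(n) AS how_many,  -- 10
       MIN(n)   AS smallest,  -- 1
       MAX(n)   AS largest,   -- 10
       SUM(n)   AS total      -- 55
  FROM generate_series(1, 10) AS n;</programlisting>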
\r
1121 <simplesect id="_sub_selects">
\r
1122 <title>Sub-selects</title>
\r
1123 <simpara>A sub-select is the technique of using the results of one query to feed
\r
1124 into another query. You can, for example, return a set of values from
\r
1125 one column in a SELECT statement to be used to satisfy the IN() condition
\r
1126 of another SELECT statement; or you could return the MAX() value of a
\r
1127 column in a SELECT statement to match the = condition of another SELECT
\r
1128 statement.</simpara>
\r
1129 <simpara>For example, in the following query we use a sub-select to restrict the copies
\r
1130 returned by the main SELECT statement to only those locations that have an
\r
1131 <literal>opac_visible</literal> value of <literal>TRUE</literal>:</simpara>
\r
1132 <formalpara><title>Sub-select example</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT call_number
  FROM asset.copy
  WHERE deleted IS FALSE
    AND location IN (
      SELECT id
        FROM asset.copy_location
        WHERE opac_visible IS TRUE
    )
;</programlisting>
\r
1142 </para></formalpara>
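<simpara>The other form mentioned above, matching the <literal>=</literal> condition against the single value
returned by a sub-select, might look like the following sketch. It assumes that the
<literal>asset.copy</literal> table has <literal>id</literal>, <literal>barcode</literal>, and <literal>create_date</literal> columns; check these
names against your own schema before relying on it:</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Return the most recently created copy (or copies, if several share
-- the same creation timestamp); column names are assumptions
SELECT id, barcode, create_date
  FROM asset.copy
  WHERE deleted IS FALSE
    AND create_date = (
      SELECT MAX(create_date)
        FROM asset.copy
        WHERE deleted IS FALSE
    )
;</programlisting>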
\r
<simpara>Sub-selects can be an approachable way of breaking down a problem that
requires matching values between different tables, and often result in
a clearly expressed solution to a problem. However, if you start writing
sub-selects within sub-selects, you should consider tackling the problem
with joins instead.</simpara>
\r
1149 <simplesect id="_joins">
\r
1150 <title>Joins</title>
\r
1151 <simpara>Joins enable you to access the values from multiple tables in your query
\r
1152 results and comparison operators. For example, joins are what enable you to
\r
1153 relate a bibliographic record to a barcoded copy via the <literal>biblio.record_entry</literal>,
\r
1154 <literal>asset.call_number</literal>, and <literal>asset.copy</literal> tables. In this section, we discuss the
\r
1155 most common kind of join—the inner join—as well as the less common outer join
\r
1156 and some set operations which can compare and contrast the values returned by
\r
1157 separate queries.</simpara>
\r
1158 <simpara>When we talk about joins, we are going to talk about the left-hand table and
\r
1159 the right-hand table that participate in the join. Every join brings together
\r
1160 just two tables - but you can use an unlimited (for our purposes) number
\r
1161 of joins in a single SQL statement. Each time you use a join, you effectively
\r
1162 create a new table, so when you add a second join clause to a statement,
\r
1163 table 1 and table 2 (which were the left-hand table and the right-hand table
\r
1164 for the first join) now act as a merged left-hand table and the new table
\r
1165 in the second join clause is the right-hand table.</simpara>
\r
1166 <simpara>Clear as mud? Okay, let’s look at some examples.</simpara>
\r
1167 <simplesect id="_inner_joins">
\r
1168 <title>Inner joins</title>
\r
<simpara>An inner join returns only the rows for which the condition in the ON clause
matches, combining all of the columns from the left-hand table in the join
with all of the columns from the right-hand table. Typically, you use the <literal>=</literal> operator to match the
foreign key of the left-hand table with the primary key of the right-hand
table to follow the natural relationship between the tables.</simpara>
\r
<simpara>In the following example, we return all of the columns from the <literal>actor.usr</literal> and
<literal>actor.org_unit</literal> tables, joined on the relationship between the user’s home
library and the library’s ID. Notice in the results that some columns, like
<literal>id</literal> and <literal>mailing_address</literal>, appear twice; this is because both the <literal>actor.usr</literal>
and <literal>actor.org_unit</literal> tables include columns with these names. This is also why
we have to fully qualify the column names in our queries with the schema and
table names.</simpara>
\r
1181 <formalpara><title>A simple inner join</title><para>
\r
1182 <programlisting language="sql" linenumbering="unnumbered">SELECT *
  FROM actor.usr
\r
1184 INNER JOIN actor.org_unit ON actor.usr.home_ou = actor.org_unit.id
\r
1185 WHERE actor.org_unit.shortname = 'CONS'
\r
1188 -[ RECORD 1 ]------------------+---------------------------------
\r
1199 claims_never_checked_out_count | 0
\r
1205 mailing_address | 1
\r
1206 billing_address | 1
\r
1208 name | Example Consortium
\r
1212 fiscal_calendar | 1</programlisting>
\r
1213 </para></formalpara>
\r
1214 <simpara>Of course, you do not have to return every column from the joined tables;
\r
1215 you can (and should) continue to specify only the columns that you want to
\r
1216 return. In the following example, we count the number of borrowers for
\r
1217 every user profile in a given library by joining the <literal>permission.grp_tree</literal>
\r
1218 table where profiles are defined against the <literal>actor.usr</literal> table, and then
\r
1219 joining the <literal>actor.org_unit</literal> table to give us access to the user’s home
\r
1220 library:</simpara>
\r
1221 <formalpara><title>Borrower Count by Profile (Adult, Child, etc)/Library</title><para>
\r
1222 <programlisting language="sql" linenumbering="unnumbered">SELECT permission.grp_tree.name, actor.org_unit.name, COUNT(permission.grp_tree.name)
  FROM actor.usr
\r
1224 INNER JOIN permission.grp_tree
\r
1225 ON actor.usr.profile = permission.grp_tree.id
\r
1226 INNER JOIN actor.org_unit
\r
1227 ON actor.org_unit.id = actor.usr.home_ou
\r
1228 WHERE actor.usr.deleted IS FALSE
\r
1229 GROUP BY permission.grp_tree.name, actor.org_unit.name
\r
1230 ORDER BY actor.org_unit.name, permission.grp_tree.name
\r
1233 name | name | count
\r
1234 -------+--------------------+-------
\r
1235 Users | Example Consortium | 1
\r
1236 (1 row)</programlisting>
\r
1237 </para></formalpara>
\r
1239 <simplesect id="_aliases">
\r
1240 <title>Aliases</title>
\r
<simpara>So far we have been fully qualifying all of our table names and column names to
prevent any confusion. This quickly gets tiring with lengthy qualified
table names like <literal>permission.grp_tree</literal>, so the SQL syntax enables us to assign
aliases to table names and column names. When you define an alias for a table
name, you can access its columns throughout the rest of the statement by simply
appending the column name to the alias with a period; for example, if you assign
the alias <literal>au</literal> to the <literal>actor.usr</literal> table, you can access the <literal>actor.usr.id</literal>
column through the alias as <literal>au.id</literal>.</simpara>
\r
1249 <simpara>The formal syntax for declaring an alias for a column is to follow the column
\r
1250 name in the result columns clause with <literal>AS</literal> <emphasis>alias</emphasis>. To declare an alias for a table name,
\r
1251 follow the table name in the FROM clause (including any JOIN statements) with
\r
1252 <literal>AS</literal> <emphasis>alias</emphasis>. However, the <literal>AS</literal> keyword is optional for tables (and columns as
\r
1253 of PostgreSQL 8.4), and in practice most SQL statements leave it out. For
\r
1254 example, we can write the previous INNER JOIN statement example using aliases
\r
1255 instead of fully-qualified identifiers:</simpara>
\r
1256 <formalpara><title>Borrower Count by Profile (using aliases)</title><para>
\r
1257 <programlisting language="sql" linenumbering="unnumbered">SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count"
  FROM actor.usr au
\r
1259 INNER JOIN permission.grp_tree pgt
\r
1260 ON au.profile = pgt.id
\r
1261 INNER JOIN actor.org_unit aou
\r
1262 ON aou.id = au.home_ou
\r
1263 WHERE au.deleted IS FALSE
\r
1264 GROUP BY pgt.name, aou.name
\r
1265 ORDER BY aou.name, pgt.name
\r
1268 Profile | Library | Count
\r
1269 ---------+--------------------+-------
\r
1270 Users | Example Consortium | 1
\r
1271 (1 row)</programlisting>
\r
1272 </para></formalpara>
\r
1273 <simpara>A nice side effect of declaring an alias for your columns is that the alias
\r
1274 is used as the column header in the results table. The previous version of
\r
1275 the query, which didn’t use aliased column names, had two columns named
\r
1276 <literal>name</literal>; this version of the query with aliases results in a clearer
\r
1277 categorization.</simpara>
\r
1279 <simplesect id="_outer_joins">
\r
1280 <title>Outer joins</title>
\r
1281 <simpara>An outer join returns all of the rows from one or both of the tables
\r
1282 participating in the join.</simpara>
\r
<itemizedlist>
<listitem>
<simpara>
For a LEFT OUTER JOIN, the join returns all of the rows from the left-hand
table and the rows matching the join condition from the right-hand table, with
NULL values for the rows with no match in the right-hand table.
</simpara>
</listitem>
<listitem>
<simpara>
A RIGHT OUTER JOIN behaves in the same way as a LEFT OUTER JOIN, with the
exception that all rows are returned from the right-hand table participating in
the join.
</simpara>
</listitem>
<listitem>
<simpara>
For a FULL OUTER JOIN, the join returns all the rows from both the left-hand
and right-hand tables, with NULL values for the rows with no match in either
the left-hand or right-hand table.
</simpara>
</listitem>
</itemizedlist>
\r
1306 <formalpara><title>Base tables for the OUTER JOIN examples</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT * FROM aaa;

SELECT * FROM bbb;

 id | stuff |   foo
----+-------+----------
  1 | one   | oneone
  2 | two   | twotwo
  5 | five  | fivefive
  6 | six   | sixsix
(4 rows)</programlisting>
\r
1327 </para></formalpara>
\r
1328 <formalpara><title>Example of a LEFT OUTER JOIN</title><para>
\r
1329 <programlisting language="sql" linenumbering="unnumbered">SELECT * FROM aaa
\r
1330 LEFT OUTER JOIN bbb ON aaa.id = bbb.id
\r
1332 id | stuff | id | stuff | foo
\r
1333 ----+-------+----+-------+----------
\r
1334 1 | one | 1 | one | oneone
\r
1335 2 | two | 2 | two | twotwo
\r
1338 5 | five | 5 | five | fivefive
\r
1339 (5 rows)</programlisting>
\r
1340 </para></formalpara>
\r
1341 <formalpara><title>Example of a RIGHT OUTER JOIN</title><para>
\r
1342 <programlisting language="sql" linenumbering="unnumbered">SELECT * FROM aaa
\r
1343 RIGHT OUTER JOIN bbb ON aaa.id = bbb.id
\r
1345 id | stuff | id | stuff | foo
\r
1346 ----+-------+----+-------+----------
\r
1347 1 | one | 1 | one | oneone
\r
1348 2 | two | 2 | two | twotwo
\r
1349 5 | five | 5 | five | fivefive
\r
1350 | | 6 | six | sixsix
\r
1351 (4 rows)</programlisting>
\r
1352 </para></formalpara>
\r
1353 <formalpara><title>Example of a FULL OUTER JOIN</title><para>
\r
1354 <programlisting language="sql" linenumbering="unnumbered">SELECT * FROM aaa
\r
1355 FULL OUTER JOIN bbb ON aaa.id = bbb.id
\r
1357 id | stuff | id | stuff | foo
\r
1358 ----+-------+----+-------+----------
\r
1359 1 | one | 1 | one | oneone
\r
1360 2 | two | 2 | two | twotwo
\r
1363 5 | five | 5 | five | fivefive
\r
1364 | | 6 | six | sixsix
\r
1365 (6 rows)</programlisting>
\r
1366 </para></formalpara>
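<simpara>A common use of an outer join is to find the rows in one table that have no match in the
other. As a sketch using the same <literal>aaa</literal> and <literal>bbb</literal> example tables, filtering a
<literal>LEFT OUTER JOIN</literal> on a <literal>NULL</literal> right-hand key keeps only the unmatched left-hand
rows:</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Rows in aaa whose id does not appear in bbb: the outer join fills the
-- bbb columns with NULL for those rows, and the WHERE clause keeps only them
SELECT aaa.id, aaa.stuff
  FROM aaa
    LEFT OUTER JOIN bbb ON aaa.id = bbb.id
  WHERE bbb.id IS NULL
  ORDER BY aaa.id;</programlisting>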
\r
1368 <simplesect id="_self_joins">
\r
1369 <title>Self joins</title>
\r
1370 <simpara>It is possible to join a table to itself. You can, in fact you must, use
\r
1371 aliases to disambiguate the references to the table.</simpara>
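<simpara>As a hedged sketch of a self join, the following query lists each library alongside its
parent in the organizational hierarchy. It assumes that the <literal>actor.org_unit</literal> table
records each unit's parent in a <literal>parent_ou</literal> column that refers back to
<literal>actor.org_unit.id</literal>; the aliases <literal>child</literal> and <literal>parent</literal> are arbitrary names chosen for
the example:</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- The same table appears twice, so each reference needs its own alias.
-- A LEFT OUTER JOIN keeps the top-level unit, which has no parent.
SELECT child.shortname AS "Library", parent.shortname AS "Parent"
  FROM actor.org_unit child
    LEFT OUTER JOIN actor.org_unit parent
      ON child.parent_ou = parent.id
  ORDER BY child.shortname;</programlisting>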
\r
1374 <simplesect id="_set_operations">
\r
1375 <title>Set operations</title>
\r
1376 <simpara>Relational databases are effectively just an efficient mechanism for
\r
1377 manipulating sets of values; they are implementations of set theory. There are
\r
1378 three operators for sets (tables) in which each set must have the same number
\r
1379 of columns with compatible data types: the union, intersection, and difference
\r
1380 operators.</simpara>
\r
1381 <formalpara><title>Base tables for the set operation examples</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT * FROM aaa;

SELECT * FROM bbb;

 id | stuff |   foo
----+-------+----------
  1 | one   | oneone
  2 | two   | twotwo
  5 | five  | fivefive
  6 | six   | sixsix
(4 rows)</programlisting>
\r
1402 </para></formalpara>
\r
1403 <simplesect id="_union">
\r
1404 <title>Union</title>
\r
1405 <simpara>The <literal>UNION</literal> operator returns the distinct set of rows that are members of
\r
1406 either or both of the left-hand and right-hand tables. The <literal>UNION</literal> operator
\r
1407 does not return any duplicate rows. To return duplicate rows, use the
\r
1408 <literal>UNION ALL</literal> operator.</simpara>
\r
1409 <formalpara><title>Example of a UNION set operation</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">-- The parentheses are not required, but are intended to help
-- illustrate the sets participating in the set operation
(SELECT id, stuff FROM aaa)
UNION
(SELECT id, stuff FROM bbb);

(6 rows)</programlisting>
\r
1433 </para></formalpara>
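<simpara>To see the difference between <literal>UNION</literal> and <literal>UNION ALL</literal> without relying on the
example tables, here is a small sketch that combines two single-row queries returning
the same value:</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- UNION removes the duplicate: returns a single row containing 1
SELECT 1 AS n
UNION
SELECT 1;

-- UNION ALL keeps duplicates: returns two rows, each containing 1
SELECT 1 AS n
UNION ALL
SELECT 1;</programlisting>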
\r
1435 <simplesect id="_intersection">
\r
1436 <title>Intersection</title>
\r
1437 <simpara>The <literal>INTERSECT</literal> operator returns the distinct set of rows that are common to
\r
1438 both the left-hand and right-hand tables. To return duplicate rows, use the
\r
1439 <literal>INTERSECT ALL</literal> operator.</simpara>
\r
1440 <formalpara><title>Example of an INTERSECT set operation</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">(SELECT id, stuff FROM aaa)
INTERSECT
(SELECT id, stuff FROM bbb);

(3 rows)</programlisting>
\r
1459 </para></formalpara>
\r
1461 <simplesect id="_difference">
\r
1462 <title>Difference</title>
\r
1463 <simpara>The <literal>EXCEPT</literal> operator returns the rows in the left-hand table that do not
\r
1464 exist in the right-hand table. You are effectively subtracting the common
\r
1465 rows from the left-hand table.</simpara>
\r
1466 <formalpara><title>Example of an EXCEPT set operation</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">(SELECT id, stuff FROM aaa)
EXCEPT
(SELECT id, stuff FROM bbb);

-- Order matters: switch the left-hand and right-hand tables
-- and you get a different result
(SELECT id, stuff FROM bbb)
EXCEPT
(SELECT id, stuff FROM aaa);

(1 row)</programlisting>
\r
1503 </para></formalpara>
\r
1506 <simplesect id="_views">
\r
1507 <title>Views</title>
\r
1508 <simpara>A view is a persistent <literal>SELECT</literal> statement that acts like a read-only table.
\r
1509 To create a view, issue the <literal>CREATE VIEW</literal> statement, giving the view a name
\r
1510 and a <literal>SELECT</literal> statement on which the view is built.</simpara>
\r
1511 <simpara>The following example creates a view based on our borrower profile count:</simpara>
\r
1512 <formalpara><title>Creating a view</title><para>
\r
1513 <programlisting language="sql" linenumbering="unnumbered">CREATE VIEW actor.borrower_profile_count AS
\r
1514 SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count"
  FROM actor.usr au
\r
1516 INNER JOIN permission.grp_tree pgt
\r
1517 ON au.profile = pgt.id
\r
1518 INNER JOIN actor.org_unit aou
\r
1519 ON aou.id = au.home_ou
\r
1520 WHERE au.deleted IS FALSE
\r
1521 GROUP BY pgt.name, aou.name
\r
1522 ORDER BY aou.name, pgt.name
\r
1523 ;</programlisting>
\r
1524 </para></formalpara>
\r
<simpara>When you subsequently select results from the view, you can apply additional
<literal>WHERE</literal> clauses to filter the results, or <literal>ORDER BY</literal> clauses to change the
order of the returned rows. In the following examples, we issue a simple
<literal>SELECT *</literal> statement to show that, by default, the view returns its results in the
same order as the equivalent <literal>SELECT</literal> statement would.
Then we issue a <literal>SELECT</literal> statement with a <literal>WHERE</literal> clause to further filter the
results.</simpara>
\r
1532 <formalpara><title>Selecting results from a view</title><para>
\r
1533 <programlisting language="sql" linenumbering="unnumbered">SELECT * FROM actor.borrower_profile_count;
\r
1535 Profile | Library | Count
\r
1536 ----------------------------+----------------------------+-------
\r
1537 Faculty | University Library | 208
\r
1538 Graduate | University Library | 16
\r
1539 Patrons | University Library | 62
\r
1542 -- You can still filter your results with WHERE clauses
SELECT *
\r
1544 FROM actor.borrower_profile_count
\r
1545 WHERE "Profile" = 'Faculty';
\r
1547 Profile | Library | Count
\r
1548 ---------+----------------------------+-------
\r
1549 Faculty | University Library | 208
\r
1550 Faculty | College Library | 64
\r
1551 Faculty | College Library 2 | 102
\r
1552 Faculty | University Library 2 | 776
\r
1553 (4 rows)</programlisting>
\r
1554 </para></formalpara>
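<simpara>The <literal>ORDER BY</literal> case can be sketched in the same way. Because the view was created
with quoted column aliases, the column names are case-sensitive and must be quoted when
you refer to them. The library name used here is taken from the sample output above:</simpara>
<programlisting language="sql" linenumbering="unnumbered">-- Filter to one library, then override the view's default ordering
-- to list the largest profiles first
SELECT *
  FROM actor.borrower_profile_count
  WHERE "Library" = 'University Library'
  ORDER BY "Count" DESC;</programlisting>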
\r
1556 <simplesect id="_inheritance">
\r
1557 <title>Inheritance</title>
\r
1558 <simpara>PostgreSQL supports table inheritance: that is, a child table inherits its
\r
1559 base definition from a parent table, but can add additional columns to its
\r
1560 own definition. The data from any child tables is visible in queries against
\r
1561 the parent table.</simpara>
\r
<simpara>Evergreen uses table inheritance in several areas:</simpara>
<itemizedlist>
<listitem>
<simpara>In the Vandelay MARC batch importer / exporter, Evergreen defines base
tables for generic queues and queued records, with child tables for authority
records and bibliographic records.</simpara>
</listitem>
<listitem>
<simpara>Billable transactions are based on the <literal>money.billable_xact</literal> table;
child tables include <literal>action.circulation</literal> for circulation transactions
and <literal>money.grocery</literal> for general bills.</simpara>
</listitem>
<listitem>
<simpara>Payments are based on the <literal>money.payment</literal> table; its child table is
<literal>money.bnm_payment</literal> (for brick-and-mortar payments), which in turn has child
tables of <literal>money.forgive_payment</literal>, <literal>money.work_payment</literal>, <literal>money.credit_payment</literal>,
<literal>money.goods_payment</literal>, and <literal>money.bnm_desk_payment</literal>. The
<literal>money.bnm_desk_payment</literal> table in turn has child tables of <literal>money.cash_payment</literal>,
<literal>money.check_payment</literal>, and <literal>money.credit_card_payment</literal>.</simpara>
</listitem>
<listitem>
<simpara>Transits are based on the <literal>action.transit_copy</literal> table, which has a child
table of <literal>action.hold_transit_copy</literal> for transits initiated by holds.</simpara>
</listitem>
<listitem>
<simpara>Generic acquisition line item attributes are defined by the
<literal>acq.lineitem_attr_definition</literal> table, which in turn has a number of child
tables to define MARC attributes, generated attributes, user attributes, and
provider attributes.</simpara>
</listitem>
</itemizedlist>
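<simpara>The behaviour described above can be sketched with a pair of throwaway tables.
The <literal>inherit_demo</literal> schema and its two tables are hypothetical; they are only
loosely modelled on the payment tables and do not reproduce the Evergreen
definitions.</simpara>
<formalpara><title>Sketch: a parent table and an inheriting child table</title><para>
<programlisting language="sql" linenumbering="unnumbered">-- Hypothetical scratch schema; not part of Evergreen
CREATE SCHEMA inherit_demo;

CREATE TABLE inherit_demo.payment (
  id     SERIAL PRIMARY KEY,
  amount NUMERIC(6,2) NOT NULL
);

-- The child inherits id and amount, and adds a column of its own
CREATE TABLE inherit_demo.check_payment (
  check_number TEXT NOT NULL
) INHERITS (inherit_demo.payment);

INSERT INTO inherit_demo.payment (amount) VALUES (5.00);
INSERT INTO inherit_demo.check_payment (amount, check_number)
  VALUES (12.50, '1001');

-- Rows stored in the child are visible through the parent: two rows
SELECT id, amount FROM inherit_demo.payment;

-- ONLY restricts the query to rows stored in the parent itself: one row
SELECT id, amount FROM ONLY inherit_demo.payment;</programlisting>
</para></formalpara>
<simpara>Note that the child-specific <literal>check_number</literal> column is not visible when you
query the parent table; to see it, query <literal>inherit_demo.check_payment</literal>
directly.</simpara>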
\r
1583 <section id="understanding_query_performance_with_explain">
\r
1584 <title>Understanding query performance with EXPLAIN</title>
\r
<simpara>Some queries run for a very long time. This can be the result of a poorly
written query (for example, a query with a join condition that joins every
row in the <literal>biblio.record_entry</literal> table with every row in the <literal>metabib.full_rec</literal>
view would consume a massive amount of memory, disk space, and CPU time), or it
can be a symptom of a schema that needs some additional indexes. PostgreSQL provides
the <literal>EXPLAIN</literal> tool to show you the <emphasis>query plan</emphasis> (how it plans to retrieve the
results from the database) along with the planner’s estimated cost of running
the query.</simpara>
\r
<simpara>To generate the query plan without actually running the statement, simply
prepend the <literal>EXPLAIN</literal> keyword to your query. In the following example, we
generate the query plan for the poorly written query that would join every
row in the <literal>biblio.record_entry</literal> table with every row in the <literal>metabib.full_rec</literal>
view:</simpara>
\r
1598 <formalpara><title>Query plan for a terrible query</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">EXPLAIN SELECT *
  FROM biblio.record_entry
    FULL OUTER JOIN metabib.full_rec ON 1=1
;

                                   QUERY PLAN
-------------------------------------------------------------------------------//
 Merge Full Join  (cost=0.00..4959156437783.60 rows=132415734100864 width=1379)
   ->  Seq Scan on record_entry  (cost=0.00..400634.16 rows=2013416 width=1292)
   ->  Seq Scan on real_full_rec  (cost=0.00..1640972.04 rows=65766704 width=87)
(3 rows)</programlisting>
\r
1610 </para></formalpara>
\r
<simpara>This query plan shows that the planner estimates the query would return
132415734100864 rows, and that it plans to accomplish what you asked for by
sequentially scanning (<emphasis>Seq Scan</emphasis>) every row in each of the tables
participating in the join.</simpara>
\r
1614 <simpara>In the following example, we have realized our mistake in joining every row of
\r
1615 the left-hand table with every row in the right-hand table and take the saner
\r
1616 approach of using an <literal>INNER JOIN</literal> where the join condition is on the record ID.</simpara>
\r
1617 <formalpara><title>Query plan for a less terrible query</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">EXPLAIN SELECT *
  FROM biblio.record_entry bre
    INNER JOIN metabib.full_rec mfr ON mfr.record = bre.id;

                                        QUERY PLAN
----------------------------------------------------------------------------------------//
 Hash Join  (cost=750229.86..5829273.98 rows=65766704 width=1379)
   Hash Cond: (real_full_rec.record = bre.id)
   ->  Seq Scan on real_full_rec  (cost=0.00..1640972.04 rows=65766704 width=87)
   ->  Hash  (cost=400634.16..400634.16 rows=2013416 width=1292)
         ->  Seq Scan on record_entry bre  (cost=0.00..400634.16 rows=2013416 width=1292)
(5 rows)</programlisting>
\r
1629 </para></formalpara>
\r
1630 <simpara>This time, we will return 65766704 rows - still way too many rows. We forgot
\r
1631 to include a <literal>WHERE</literal> clause to limit the results to something meaningful. In
\r
1632 the following example, we will limit the results to deleted records that were
\r
1633 modified in the last month.</simpara>
\r
1634 <formalpara><title>Query plan for a realistic query</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">EXPLAIN SELECT *
  FROM biblio.record_entry bre
    INNER JOIN metabib.full_rec mfr ON mfr.record = bre.id
  WHERE bre.deleted IS TRUE
    AND DATE_TRUNC('MONTH', bre.edit_date) >
        DATE_TRUNC ('MONTH', NOW() - '1 MONTH'::INTERVAL)
;

                                        QUERY PLAN
----------------------------------------------------------------------------------------//
 Hash Join  (cost=5058.86..2306218.81 rows=201669 width=1379)
   Hash Cond: (real_full_rec.record = bre.id)
   ->  Seq Scan on real_full_rec  (cost=0.00..1640972.04 rows=65766704 width=87)
   ->  Hash  (cost=4981.69..4981.69 rows=6174 width=1292)
         ->  Index Scan using biblio_record_entry_deleted on record_entry bre
                 (cost=0.00..4981.69 rows=6174 width=1292)
               Index Cond: (deleted = true)
               Filter: ((deleted IS TRUE) AND (date_trunc('MONTH'::text, edit_date)
                 > date_trunc('MONTH'::text, (now() - '1 mon'::interval))))
(7 rows)</programlisting>
\r
1655 </para></formalpara>
\r
<simpara>We can see that the estimated number of rows returned is now only 201669; that’s
something we can work with. Also, the overall estimated cost of the query is 2306218,
compared to 4959156437783 in the original query. The <literal>Index Scan</literal> tells us
that the query planner will use the index that was defined on the <literal>deleted</literal>
column to avoid having to check every row in the <literal>biblio.record_entry</literal> table.</simpara>
\r
1661 <simpara>However, we are still running a sequential scan over the
\r
1662 <literal>metabib.real_full_rec</literal> table (the table on which the <literal>metabib.full_rec</literal>
\r
1663 view is based). Given that linking from the bibliographic records to the
\r
1664 flattened MARC subfields is a fairly common operation, we could create a
\r
1665 new index and see if that speeds up our query plan.</simpara>
\r
1666 <formalpara><title>Query plan with optimized access via a new index</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">-- This index will take a long time to create on a large database
-- of bibliographic records
CREATE INDEX bib_record_idx ON metabib.real_full_rec (record);

EXPLAIN SELECT *
  FROM biblio.record_entry bre
    INNER JOIN metabib.full_rec mfr ON mfr.record = bre.id
  WHERE bre.deleted IS TRUE
    AND DATE_TRUNC('MONTH', bre.edit_date) >
        DATE_TRUNC ('MONTH', NOW() - '1 MONTH'::INTERVAL)
;

                                        QUERY PLAN
----------------------------------------------------------------------------------------//
 Nested Loop  (cost=0.00..1558330.46 rows=201669 width=1379)
   ->  Index Scan using biblio_record_entry_deleted on record_entry bre
           (cost=0.00..4981.69 rows=6174 width=1292)
         Index Cond: (deleted = true)
         Filter: ((deleted IS TRUE) AND (date_trunc('MONTH'::text, edit_date) >
           date_trunc('MONTH'::text, (now() - '1 mon'::interval))))
   ->  Index Scan using bib_record_idx on real_full_rec
          (cost=0.00..240.89 rows=850 width=87)
         Index Cond: (real_full_rec.record = bre.id)
(6 rows)</programlisting>
\r
1691 </para></formalpara>
\r
1692 <simpara>We can see that the resulting number of rows is still the same (201669), but
\r
1693 the execution estimate has dropped to 1558330 because the query planner can
\r
1694 use the new index (<literal>bib_record_idx</literal>) rather than scanning the entire table.
\r
1695 Success!</simpara>
\r
1696 <note><simpara>While indexes can significantly speed up read access to tables for common
\r
1697 filtering conditions, every time a row is created or updated the corresponding
\r
1698 indexes also need to be maintained - which can decrease the performance of
\r
1699 writes to the database. Be careful to keep the balance of read performance
\r
1700 versus write performance in mind if you plan to create custom indexes in your
\r
1701 Evergreen database.</simpara></note>
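<simpara>Plain <literal>EXPLAIN</literal> reports only the planner’s estimates. PostgreSQL also
offers <literal>EXPLAIN ANALYZE</literal>, which actually executes the statement and reports
the real row counts and timings next to the estimates. The following is a minimal
sketch of comparing the two before and after adding an index; the
<literal>explain_demo</literal> schema, the <literal>rec</literal> table, and the <literal>rec_id_idx</literal> index are
hypothetical names invented for this illustration, and the plans you see will
depend on your PostgreSQL version and configuration.</simpara>
<formalpara><title>Sketch: comparing estimated and actual plans on a scratch table</title><para>
<programlisting language="sql" linenumbering="unnumbered">-- Hypothetical scratch objects; nothing here touches Evergreen tables
CREATE SCHEMA explain_demo;
CREATE TABLE explain_demo.rec (id INTEGER, deleted BOOLEAN);

-- Load 100000 rows; every hundredth row is flagged as deleted
INSERT INTO explain_demo.rec (id, deleted)
  SELECT n, (n % 100 = 0)
  FROM generate_series(1, 100000) AS n;

-- Refresh the planner's statistics for the new table
ANALYZE explain_demo.rec;

-- Estimates only: the statement is not run
EXPLAIN SELECT * FROM explain_demo.rec WHERE id = 42;

CREATE INDEX rec_id_idx ON explain_demo.rec (id);

-- EXPLAIN ANALYZE runs the statement and adds actual times and row counts
EXPLAIN ANALYZE SELECT * FROM explain_demo.rec WHERE id = 42;</programlisting>
</para></formalpara>
<simpara>Because <literal>EXPLAIN ANALYZE</literal> really executes the statement, wrap any
<literal>INSERT</literal>, <literal>UPDATE</literal>, or <literal>DELETE</literal> that you want to analyze in a transaction
(<literal>BEGIN</literal> ... <literal>ROLLBACK</literal>) if you do not want its changes to persist.</simpara>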
\r
1703 <section id="inserting_updating_and_deleting_data">
\r
1704 <title>Inserting, updating, and deleting data</title>
\r
1705 <simplesect id="_inserting_data">
\r
1706 <title>Inserting data</title>
\r
<simpara>To insert one or more rows into a table, use the <literal>INSERT</literal> statement to identify
the target table and list the columns in the table for which you are going to
provide values for each row. If you omit one or more of the table’s columns from
that list, the database automatically supplies each omitted column’s default
value, or <literal>NULL</literal> if the column has no default. The values for each row follow
the <literal>VALUES</literal> clause and are grouped in
parentheses and delimited by commas. Each row, in turn, is delimited by commas
(<emphasis>this multiple row syntax requires PostgreSQL 8.2 or higher</emphasis>).</simpara>
\r
1714 <simpara>For example, to insert two rows into the <literal>permission.usr_grp_map</literal> table:</simpara>
\r
1715 <formalpara><title>Inserting rows into the <literal>permission.usr_grp_map</literal> table</title><para>
\r
1716 <programlisting language="sql" linenumbering="unnumbered">INSERT INTO permission.usr_grp_map (usr, grp)
\r
1717 VALUES (2, 10), (2, 4)
\r
1718 ;</programlisting>
\r
1719 </para></formalpara>
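<simpara>In PostgreSQL, a column that is omitted from the column list receives its
declared default value, or <literal>NULL</literal> when no default is declared. The following
is a minimal sketch of that behaviour; the <literal>insert_demo.grp_map</literal> table and its
<literal>active</literal> and <literal>note</literal> columns are hypothetical and are not the definition of
<literal>permission.usr_grp_map</literal>.</simpara>
<formalpara><title>Sketch: how omitted columns are filled in</title><para>
<programlisting language="sql" linenumbering="unnumbered">-- Hypothetical scratch table; not part of Evergreen
CREATE SCHEMA insert_demo;
CREATE TABLE insert_demo.grp_map (
  id     SERIAL PRIMARY KEY,     -- default: next value from a sequence
  usr    INTEGER NOT NULL,
  grp    INTEGER NOT NULL,
  active BOOLEAN DEFAULT TRUE,   -- explicit default
  note   TEXT                    -- no default, so NULL when omitted
);

-- Only usr and grp are listed; id, active, and note are filled in for us
INSERT INTO insert_demo.grp_map (usr, grp)
  VALUES (2, 10), (2, 4)
;

SELECT id, usr, grp, active, note
  FROM insert_demo.grp_map
  ORDER BY id
;</programlisting>
</para></formalpara>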
\r
<simpara>Of course, as with the rest of SQL, you can replace individual column values
with sub-selects:</simpara>
\r
1722 <formalpara><title>Inserting rows using sub-selects instead of integers</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">INSERT INTO permission.usr_grp_map (usr, grp)
  VALUES (
    (SELECT id FROM actor.usr
       WHERE family_name = 'Scott' AND first_given_name = 'Daniel'),
    (SELECT id FROM permission.grp_tree
       WHERE name = 'Local System Administrator')
  ), (
    (SELECT id FROM actor.usr
       WHERE family_name = 'Scott' AND first_given_name = 'Daniel'),
    (SELECT id FROM permission.grp_tree
       WHERE name = 'Circulator')
  )
;</programlisting>
\r
1736 </para></formalpara>
\r
1738 <simplesect id="_inserting_data_using_a_select_statement">
\r
1739 <title>Inserting data using a SELECT statement</title>
\r
1740 <simpara>Sometimes you want to insert a bulk set of data into a new table based on
\r
1741 a query result. Rather than a <literal>VALUES</literal> clause, you can use a <literal>SELECT</literal>
\r
1742 statement to insert one or more rows matching the column definitions. This
\r
1743 is a good time to point out that you can include explicit values, instead
\r
1744 of just column identifiers, in the return columns of the <literal>SELECT</literal> statement.
\r
1745 The explicit values are returned in every row of the result set.</simpara>
\r
1746 <simpara>In the following example, we insert 6 rows into the <literal>permission.usr_grp_map</literal>
\r
1747 table; each row will have a <literal>usr</literal> column value of 1, with varying values for
\r
1748 the <literal>grp</literal> column value based on the <literal>id</literal> column values returned from
\r
1749 <literal>permission.grp_tree</literal>:</simpara>
\r
1750 <formalpara><title>Inserting rows via a <literal>SELECT</literal> statement</title><para>
\r
1751 <programlisting language="sql" linenumbering="unnumbered">INSERT INTO permission.usr_grp_map (usr, grp)
\r
1753 FROM permission.grp_tree
\r
1757 INSERT 0 6</programlisting>
\r
1758 </para></formalpara>
\r
1760 <simplesect id="_deleting_rows">
\r
1761 <title>Deleting rows</title>
\r
1762 <simpara>Deleting data from a table is normally fairly easy. To delete rows from a table,
\r
1763 issue a <literal>DELETE</literal> statement identifying the table from which you want to delete
\r
1764 rows and a <literal>WHERE</literal> clause identifying the row or rows that should be deleted.</simpara>
\r
1765 <simpara>In the following example, we delete all of the rows from the
\r
1766 <literal>permission.grp_perm_map</literal> table where the permission maps to
\r
1767 <literal>UPDATE_ORG_UNIT_CLOSING</literal> and the group is anything other than administrators:</simpara>
\r
1768 <formalpara><title>Deleting rows from a table</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">DELETE FROM permission.grp_perm_map
  WHERE grp IN (
    SELECT id
      FROM permission.grp_tree
      WHERE name != 'Local System Administrator'
  ) AND perm = (
    SELECT id
      FROM permission.perm_list
      WHERE code = 'UPDATE_ORG_UNIT_CLOSING'
  )
;</programlisting>
\r
1780 </para></formalpara>
\r
1781 <note><simpara>There are two main reasons that a <literal>DELETE</literal> statement may not actually
\r
1782 delete rows from a table, even when the rows meet the conditional clause.</simpara></note>
\r
<orderedlist numeration="arabic">
<listitem>
<simpara>If the row contains a value that is the target of a relational constraint,
for example, if another table has a foreign key pointing at your target
table, you will be prevented from deleting a row with a value corresponding
to a row in the dependent table.</simpara>
</listitem>
<listitem>
<simpara>If the table has a rule that substitutes a different action for a <literal>DELETE</literal>
statement, the deletion will not take place. In Evergreen it is common for a
table to have a rule that substitutes the action of setting a <literal>deleted</literal> column
to <literal>TRUE</literal>. For example, if a book is discarded, deleting the row representing
the copy from the <literal>asset.copy</literal> table would severely affect circulation statistics,
bills, borrowing histories, and their corresponding tables in the database that
have foreign keys pointing at the <literal>asset.copy</literal> table (<literal>action.circulation</literal> and
<literal>money.billing</literal> and its children respectively). Instead, the <literal>deleted</literal> column
value is set to <literal>TRUE</literal> and Evergreen’s application logic skips over these rows.</simpara>
</listitem>
</orderedlist>
\r
1809 <simplesect id="_updating_rows">
\r
1810 <title>Updating rows</title>
\r
1811 <simpara>To update rows in a table, issue an <literal>UPDATE</literal> statement identifying the table
\r
1812 you want to update, the column or columns that you want to set with their
\r
1813 respective new values, and (optionally) a <literal>WHERE</literal> clause identifying the row or
\r
1814 rows that should be updated.</simpara>
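<simpara>As a minimal sketch of the statement just described, the following example
updates two columns at once and restricts the change with a <literal>WHERE</literal> clause. The
<literal>update_demo.usr</literal> table is a hypothetical, cut-down stand-in for illustration;
it is not the definition of <literal>actor.usr</literal>.</simpara>
<formalpara><title>Sketch: updating selected rows in a scratch table</title><para>
<programlisting language="sql" linenumbering="unnumbered">-- Hypothetical scratch table; not part of Evergreen
CREATE SCHEMA update_demo;
CREATE TABLE update_demo.usr (
  id      INTEGER PRIMARY KEY,
  home_ou INTEGER NOT NULL,
  active  BOOLEAN NOT NULL DEFAULT TRUE
);
INSERT INTO update_demo.usr (id, home_ou)
  VALUES (1, 4), (2, 4), (3, 5);

-- Move every user at org unit 4 to org unit 6 and mark them inactive;
-- rows that do not match the WHERE clause are left untouched
UPDATE update_demo.usr
  SET home_ou = 6, active = FALSE
  WHERE home_ou = 4
;

SELECT id, home_ou, active
  FROM update_demo.usr
  ORDER BY id
;</programlisting>
</para></formalpara>
<simpara>If you leave out the <literal>WHERE</literal> clause, the <literal>UPDATE</literal> statement applies to every
row in the table, so it is worth running the equivalent <literal>SELECT</literal> first to check
which rows will be affected.</simpara>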
\r
1815 <simpara>Following is the syntax for the <literal>UPDATE</literal> statement:</simpara>
\r
<literallayout><literal>UPDATE</literal> [<emphasis>table-name</emphasis>]
  <literal>SET</literal> [<emphasis>column</emphasis>] <literal>=</literal> [<emphasis>new-value</emphasis>]
  <literal>WHERE</literal> [<emphasis>condition</emphasis>]</literallayout>
\r
1824 <section id="query_requests">
\r
1825 <title>Query requests</title>
\r
1826 <simpara>The following queries were requested by Bibliomation, but might be reusable
\r
1827 by other libraries.</simpara>
\r
1828 <simplesect id="_monthly_circulation_stats_by_collection_code_library">
\r
1829 <title>Monthly circulation stats by collection code / library</title>
\r
1830 <formalpara><title>Monthly Circulation Stats by Collection Code/Library</title><para>
\r
1831 <programlisting language="sql" linenumbering="unnumbered">SELECT COUNT(acirc.id) AS "COUNT", aou.name AS "Library", acl.name AS "Copy Location"
\r
1832 FROM asset.copy ac
\r
1833 INNER JOIN asset.copy_location acl ON ac.location = acl.id
\r
1834 INNER JOIN action.circulation acirc ON acirc.target_copy = ac.id
\r
1835 INNER JOIN actor.org_unit aou ON acirc.circ_lib = aou.id
\r
1836 WHERE DATE_TRUNC('MONTH', acirc.create_time) = DATE_TRUNC('MONTH', NOW() - INTERVAL '3 month')
\r
1837 AND acirc.desk_renewal IS FALSE
\r
1838 AND acirc.opac_renewal IS FALSE
\r
1839 AND acirc.phone_renewal IS FALSE
\r
1840 GROUP BY aou.name, acl.name
\r
1841 ORDER BY aou.name, acl.name, 1
\r
1842 ;</programlisting>
\r
1843 </para></formalpara>
\r
1845 <simplesect id="_monthly_circulation_stats_by_borrower_stat_library">
\r
1846 <title>Monthly circulation stats by borrower stat / library</title>
\r
1847 <formalpara><title>Monthly Circulation Stats by Borrower Stat/Library</title><para>
\r
1848 <programlisting language="sql" linenumbering="unnumbered">SELECT COUNT(acirc.id) AS "COUNT", aou.name AS "Library", asceum.stat_cat_entry AS "Borrower Stat"
\r
1849 FROM action.circulation acirc
\r
1850 INNER JOIN actor.org_unit aou ON acirc.circ_lib = aou.id
\r
1851 INNER JOIN actor.stat_cat_entry_usr_map asceum ON asceum.target_usr = acirc.usr
\r
1852 INNER JOIN actor.stat_cat astat ON asceum.stat_cat = astat.id
\r
1853 WHERE DATE_TRUNC('MONTH', acirc.create_time) = DATE_TRUNC('MONTH', NOW() - INTERVAL '3 month')
\r
1854 AND astat.name = 'Preferred language'
\r
1855 AND acirc.desk_renewal IS FALSE
\r
1856 AND acirc.opac_renewal IS FALSE
\r
1857 AND acirc.phone_renewal IS FALSE
\r
1858 GROUP BY aou.name, asceum.stat_cat_entry
\r
1859 ORDER BY aou.name, asceum.stat_cat_entry, 1
\r
1860 ;</programlisting>
\r
1861 </para></formalpara>
\r
1863 <simplesect id="_monthly_intralibrary_loan_stats_by_library">
\r
1864 <title>Monthly intralibrary loan stats by library</title>
\r
1865 <formalpara><title>Monthly Intralibrary Loan Stats by Library</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT aou.name AS "Library", COUNT(acirc.id)
  FROM action.circulation acirc
    INNER JOIN actor.org_unit aou ON acirc.circ_lib = aou.id
    INNER JOIN asset.copy ac ON acirc.target_copy = ac.id
    INNER JOIN asset.call_number acn ON ac.call_number = acn.id
  WHERE acirc.circ_lib != acn.owning_lib
    AND DATE_TRUNC('MONTH', acirc.create_time) = DATE_TRUNC('MONTH', NOW() - INTERVAL '3 month')
    AND acirc.desk_renewal IS FALSE
    AND acirc.opac_renewal IS FALSE
    AND acirc.phone_renewal IS FALSE
  GROUP BY aou.name
  ORDER BY aou.name, 2
;</programlisting>
\r
1879 </para></formalpara>
\r
1881 <simplesect id="_monthly_borrowers_added_by_profile_adult_child_etc_library">
\r
1882 <title>Monthly borrowers added by profile (adult, child, etc) / library</title>
\r
1883 <formalpara><title>Monthly Borrowers Added by Profile (Adult, Child, etc)/Library</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count"
  FROM actor.usr au
    INNER JOIN permission.grp_tree pgt
      ON au.profile = pgt.id
    INNER JOIN actor.org_unit aou
      ON aou.id = au.home_ou
  WHERE au.deleted IS FALSE
    AND DATE_TRUNC('MONTH', au.create_date) = DATE_TRUNC('MONTH', NOW() - '3 months'::interval)
  GROUP BY pgt.name, aou.name
  ORDER BY aou.name, pgt.name
;</programlisting>
\r
1895 </para></formalpara>
\r
1897 <simplesect id="_borrower_count_by_profile_adult_child_etc_library">
\r
1898 <title>Borrower count by profile (adult, child, etc) / library</title>
\r
1899 <formalpara><title>Borrower Count by Profile (Adult, Child, etc)/Library</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count"
  FROM actor.usr au
    INNER JOIN permission.grp_tree pgt
      ON au.profile = pgt.id
    INNER JOIN actor.org_unit aou
      ON aou.id = au.home_ou
  WHERE au.deleted IS FALSE
  GROUP BY pgt.name, aou.name
  ORDER BY aou.name, pgt.name
;</programlisting>
\r
1910 </para></formalpara>
\r
1912 <simplesect id="_monthly_items_added_by_collection_library">
\r
1913 <title>Monthly items added by collection / library</title>
\r
1914 <simpara>We define a "collection" as a shelving location in Evergreen.</simpara>
\r
1915 <formalpara><title>Monthly Items Added by Collection/Library</title><para>
\r
1916 <programlisting language="sql" linenumbering="unnumbered">SELECT aou.name AS "Library", acl.name, COUNT(ac.barcode)
\r
1917 FROM actor.org_unit aou
\r
1918 INNER JOIN asset.call_number acn ON acn.owning_lib = aou.id
\r
1919 INNER JOIN asset.copy ac ON ac.call_number = acn.id
\r
1920 INNER JOIN asset.copy_location acl ON ac.location = acl.id
\r
1921 WHERE ac.deleted IS FALSE
\r
1922 AND acn.deleted IS FALSE
\r
1923 AND DATE_TRUNC('MONTH', ac.create_date) = DATE_TRUNC('MONTH', NOW() - '1 month'::interval)
\r
1924 GROUP BY aou.name, acl.name
\r
1925 ORDER BY aou.name, acl.name
\r
1926 ;</programlisting>
\r
1927 </para></formalpara>
\r
1929 <simplesect id="_hold_purchase_alert_by_library">
\r
1930 <title>Hold purchase alert by library</title>
\r
<simpara>In the following query, we bring together the active title, volume,
and copy holds and display the titles that have more than a certain number of
holds. We <literal>UNION ALL</literal> the three hold queries, then group by the
bibliographic record ID and requesting library, and display the title and author
information for those records that exceed a given threshold of holds (more than
two in this example).</simpara>
\r
1936 <formalpara><title>Hold Purchase Alert by Library</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">SELECT all_holds.bib_id, aou.name, rmsr.title, rmsr.author, COUNT(all_holds.bib_id)
  FROM
    (
      -- Title holds
      (
        SELECT target, request_lib
          FROM action.hold_request
          WHERE hold_type = 'T'
            AND fulfillment_time IS NULL
            AND cancel_time IS NULL
      )
      UNION ALL
      -- Volume holds
      (
        SELECT bre.id, request_lib
          FROM action.hold_request ahr
            INNER JOIN asset.call_number acn ON ahr.target = acn.id
            INNER JOIN biblio.record_entry bre ON acn.record = bre.id
          WHERE ahr.hold_type = 'V'
            AND ahr.fulfillment_time IS NULL
            AND ahr.cancel_time IS NULL
      )
      UNION ALL
      -- Copy holds
      (
        SELECT bre.id, request_lib
          FROM action.hold_request ahr
            INNER JOIN asset.copy ac ON ahr.target = ac.id
            INNER JOIN asset.call_number acn ON ac.call_number = acn.id
            INNER JOIN biblio.record_entry bre ON acn.record = bre.id
          WHERE ahr.hold_type = 'C'
            AND ahr.fulfillment_time IS NULL
            AND ahr.cancel_time IS NULL
      )
    ) AS all_holds(bib_id, request_lib)
    -- Each ON clause directly follows its own join so that it can see all_holds
    INNER JOIN reporter.materialized_simple_record rmsr ON rmsr.id = all_holds.bib_id
    INNER JOIN actor.org_unit aou ON aou.id = all_holds.request_lib
  GROUP BY all_holds.bib_id, aou.name, rmsr.id, rmsr.title, rmsr.author
  HAVING COUNT(all_holds.bib_id) > 2
;</programlisting>
\r
1979 </para></formalpara>
\r
1981 <simplesect id="_update_borrower_records_with_a_different_home_library">
\r
1982 <title>Update borrower records with a different home library</title>
\r
1983 <simpara>In this example, the library has opened a new branch in a growing area,
\r
1984 and wants to reassign the home library for the patrons in the vicinity of
\r
1985 the new branch to the new branch. To accomplish this, we create a staging table
\r
1986 that holds a set of city names and the corresponding branch shortname for the home
\r
1987 library for each city.</simpara>
\r
1988 <simpara>Then we issue an <literal>UPDATE</literal> statement to set the home library for patrons with a
\r
1989 physical address with a city that matches the city names in our staging table.</simpara>
\r
1990 <formalpara><title>Update borrower records with a different home library</title><para>
\r
<programlisting language="sql" linenumbering="unnumbered">CREATE SCHEMA staging;
CREATE TABLE staging.city_home_ou_map (city TEXT, ou_shortname TEXT,
  FOREIGN KEY (ou_shortname) REFERENCES actor.org_unit (shortname));
INSERT INTO staging.city_home_ou_map (city, ou_shortname)
  VALUES ('Southbury', 'BR1'), ('Middlebury', 'BR2'), ('Hartford', 'BR3');

-- Set home_ou to the branch mapped to the city of the patron's address
UPDATE actor.usr au SET home_ou = COALESCE(
  (
    SELECT aou.id
      FROM actor.org_unit aou
        INNER JOIN staging.city_home_ou_map schom ON schom.ou_shortname = aou.shortname
        INNER JOIN actor.usr_address aua ON aua.city = schom.city
      WHERE au.id = aua.usr
      GROUP BY aou.id
  ), home_ou)
-- Only touch patrons whose address matches a city in the staging table
WHERE (
  SELECT aou.id
    FROM actor.org_unit aou
      INNER JOIN staging.city_home_ou_map schom ON schom.ou_shortname = aou.shortname
      INNER JOIN actor.usr_address aua ON aua.city = schom.city
    WHERE au.id = aua.usr
    GROUP BY aou.id
) IS NOT NULL;</programlisting>
\r
2015 </para></formalpara>
\r
2018 <section id="intor_to_sql_attribution">
\r
2019 <simpara>This chapter was taken from Dan Scott's <emphasis>Introduction to SQL for Evergreen Administrators</emphasis>, February 2010.</simpara>
\r