From e794b1179d9357623c74081d956ae462321b28ad Mon Sep 17 00:00:00 2001 From: Robert Soulliere Date: Thu, 12 Aug 2010 14:27:28 -0400 Subject: [PATCH] Add development chapters contributed by Dan Scott. --- 1.6/development/OpenSRF_intro.xml | 1703 +++++++++++++++++++ 1.6/development/datamodelsandaccess.xml | 760 +++++++++ 1.6/development/introduction_to_sql.xml | 2021 +++++++++++++++++++++++ 1.6/development/workshop.xml | 1532 +++++++++++++++++ 1.6/media/10.png | Bin 0 -> 361 bytes 1.6/media/11.png | Bin 0 -> 565 bytes 1.6/media/12.png | Bin 0 -> 617 bytes 1.6/media/13.png | Bin 0 -> 623 bytes 1.6/media/14.png | Bin 0 -> 411 bytes 1.6/media/15.png | Bin 0 -> 640 bytes 1.6/root.xml | 7 +- 11 files changed, 6021 insertions(+), 2 deletions(-) create mode 100644 1.6/development/OpenSRF_intro.xml create mode 100644 1.6/development/datamodelsandaccess.xml create mode 100644 1.6/development/introduction_to_sql.xml create mode 100644 1.6/development/workshop.xml create mode 100644 1.6/media/10.png create mode 100644 1.6/media/11.png create mode 100644 1.6/media/12.png create mode 100644 1.6/media/13.png create mode 100644 1.6/media/14.png create mode 100644 1.6/media/15.png diff --git a/1.6/development/OpenSRF_intro.xml b/1.6/development/OpenSRF_intro.xml new file mode 100644 index 0000000000..7567d07bad --- /dev/null +++ b/1.6/development/OpenSRF_intro.xml @@ -0,0 +1,1703 @@ + + + + OpenSRF + + + One of the claimed advantages of + Evergreen over alternative integrated library systems is the underlying Open + Service Request Framework (OpenSRF, pronounced "open surf") architecture. This + article introduces OpenSRF, demonstrates how to build OpenSRF services through + simple code examples, and explains the technical foundations on which OpenSRF + is built. + +
+
+ Introducing OpenSRF
+ OpenSRF is a message routing network that offers scalability and failover
+ support for individual services and entire servers with minimal development and
+ deployment overhead. You can use OpenSRF to build loosely-coupled applications
+ that can be deployed on a single server or on clusters of geographically
+ distributed servers using the same code and minimal configuration changes.
+ Although copyright statements on some of the OpenSRF code date back to Mike
+ Rylander’s original explorations in 2000, Evergreen was the first major
+ application to be developed with, and to take full advantage of, the OpenSRF
+ architecture starting in 2004. The first official release of OpenSRF was 0.1 in
+ February 2005 (http://evergreen-ils.org/blog/?p=21), but OpenSRF’s development
+ continues at a steady pace of enhancement and refinement, with the release of
+ 1.0.0 in October 2008 and the most recent release of 1.2.2 in February 2010.
+ OpenSRF is a distinct break from the architectural approach used by previous
+ library systems and has more in common with modern Web applications. The
+ traditional "scale-up" approach to serve more transactions is to purchase a
+ server with more CPUs and more RAM, possibly splitting the load between a Web
+ server, a database server, and a business logic server. Evergreen, however, is
+ built on the Open Service Request Framework (OpenSRF) architecture, which
+ firmly embraces the "scale-out" approach of spreading transaction load over
+ cheap commodity servers. The initial GPLS
+ PINES hardware cluster, while certainly impressive, may have offered the
+ misleading impression that Evergreen requires a lot of hardware to run.
+ However, Evergreen and OpenSRF easily scale down to a single server; many
+ Evergreen libraries run their entire library system on a single server, and
+ most OpenSRF and Evergreen development occurs on a virtual machine running on a
+ single laptop or desktop image. 
+ Another common concern is that the flexibility of OpenSRF’s distributed
+ architecture makes it complex to configure and to write new applications. This
+ article demonstrates that OpenSRF itself is an extremely simple architecture on
+ which one can easily build applications of many kinds – not just library
+ applications – and that you can use a number of different languages to call and
+ implement OpenSRF methods with a minimal learning curve. With an application
+ built on OpenSRF, when you identify a bottleneck in your application’s business
+ logic layer, you can adjust the number of processes serving that particular
+ bottleneck on each of your servers; or if the problem is that your service is
+ resource-hungry, you could add an inexpensive server to your cluster and
+ dedicate it to running that resource-hungry service.
+
+ Programming language support
+ If you need to develop an entirely new OpenSRF service, you can choose from a
+ number of different languages in which to implement that service. OpenSRF
+ client language bindings have been written for C, Java, JavaScript, Perl, and
+ Python, and service language bindings have been written for C, Perl, and Python.
+ This article uses Perl examples as a lowest common denominator programming
+ language. 
Writing an OpenSRF binding for another language is a relatively small + task if that language offers libraries that support the core technologies on + which OpenSRF depends: + + + + Extensible Messaging and Presence + Protocol (XMPP, sometimes referred to as Jabber) - provides the base messaging + infrastructure between OpenSRF clients and services + + + + + JavaScript Object Notation (JSON) - serializes the content + of each XMPP message in a standardized and concise format + + + + + memcached - provides the caching service + + + + + syslog - the standard UNIX logging + service + + + + Unfortunately, the + OpenSRF + reference documentation, although augmented by the + OpenSRF + glossary, blog posts like the description + of OpenSRF and Jabber, and even this article, is not a sufficient substitute + for a complete specification on which one could implement a language binding. + The recommended option for would-be developers of another language binding is + to use the Python implementation as the cleanest basis for a port to another + language. + +
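To make the JSON layer concrete, here is a small, self-contained Perl sketch of how a method call might be serialized for transport between an OpenSRF client and service. The envelope fields shown here (service, method, params) are illustrative only - they do not reproduce the exact OpenSRF wire format - but the round trip through the core JSON::PP module demonstrates how native Perl data types survive serialization:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical envelope for a call to opensrf.simple-text.reverse;
# the real OpenSRF message format is more elaborate than this sketch.
my $request = {
    service => 'opensrf.simple-text',
    method  => 'opensrf.simple-text.reverse',
    params  => [ 'foobar' ],
};

# Serialize to JSON text for transport inside an XMPP message body...
my $json = JSON::PP->new->canonical;
my $wire = $json->encode($request);
print "On the wire: $wire\n";

# ...and decode back into native Perl data types on the other side.
my $decoded = $json->decode($wire);
print "Calling method: $decoded->{method}\n";
print "First parameter: $decoded->{params}[0]\n";
```

Because JSON::PP has been a core Perl module since 5.14, this sketch runs without any OpenSRF installation at all.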
+
+ Writing an OpenSRF Service + Imagine an application architecture in which 10 lines of Perl or Python, using + the data types native to each language, are enough to implement a method that + can then be deployed and invoked seamlessly across hundreds of servers. You + have just imagined developing with OpenSRF – it is truly that simple. Under the + covers, of course, the OpenSRF language bindings do an incredible amount of + work on behalf of the developer. An OpenSRF application consists of one or more + OpenSRF services that expose methods: for example, the opensrf.simple-text + demonstration + service exposes the opensrf.simple-text.split() and + opensrf.simple-text.reverse() methods. Each method accepts zero or more + arguments and returns zero or one results. The data types supported by OpenSRF + arguments and results are typical core language data types: strings, numbers, + booleans, arrays, and hashes. + To implement a new OpenSRF service, perform the following steps: + + + + Include the base OpenSRF support libraries + + + + + Write the code for each of your OpenSRF methods as separate procedures + + + + + Register each method + + + + + Add the service definition to the OpenSRF configuration files + + + + For example, the following code implements an OpenSRF service. 
The service + includes one method named opensrf.simple-text.reverse() that accepts one + string as input and returns the reversed version of that string: + #!/usr/bin/perl + + package OpenSRF::Application::Demo::SimpleText; + + use strict; + + use OpenSRF::Application; + use parent qw/OpenSRF::Application/; + + sub text_reverse { + my ($self , $conn, $text) = @_; + my $reversed_text = scalar reverse($text); + return $reversed_text; + } + + __PACKAGE__->register_method( + method => 'text_reverse', + api_name => 'opensrf.simple-text.reverse' + ); + Ten lines of code, and we have a complete OpenSRF service that exposes a single + method and could be deployed quickly on a cluster of servers to meet your + application’s ravenous demand for reversed strings! If you’re unfamiliar with + Perl, the use OpenSRF::Application; use parent qw/OpenSRF::Application/; + lines tell this package to inherit methods and properties from the + OpenSRF::Application module. For example, the call to + __PACKAGE__->register_method() is defined in OpenSRF::Application but due to + inheritance is available in this package (named by the special Perl symbol + __PACKAGE__ that contains the current package name). The register_method() + procedure is how we introduce a method to the rest of the OpenSRF world. + + Registering a service with the OpenSRF configuration files + Two files control most of the configuration for OpenSRF: + + + + opensrf.xml contains the configuration for the service itself, as well as + a list of which application servers in your OpenSRF cluster should start + the service. + + + + + opensrf_core.xml (often referred to as the "bootstrap configuration" + file) contains the OpenSRF networking information, including the XMPP server + connection credentials for the public and private routers. You only need to touch + this for a new service if the new service needs to be accessible via the + public router. + + + + Begin by defining the service itself in opensrf.xml. 
To register the
+ opensrf.simple-text service, add the following section to the <apps>
+ element (corresponding to the XPath /opensrf/default/apps/):
+ <apps>
+ <opensrf.simple-text>
+ <keepalive>3</keepalive>
+ <stateless>1</stateless>
+ <language>perl</language>
+ <implementation>OpenSRF::Application::Demo::SimpleText</implementation>
+ <max_requests>100</max_requests>
+ <unix_config>
+ <max_requests>1000</max_requests>
+ <unix_log>opensrf.simple-text_unix.log</unix_log>
+ <unix_sock>opensrf.simple-text_unix.sock</unix_sock>
+ <unix_pid>opensrf.simple-text_unix.pid</unix_pid>
+ <min_children>5</min_children>
+ <max_children>15</max_children>
+ <min_spare_children>2</min_spare_children>
+ <max_spare_children>5</max_spare_children>
+ </unix_config>
+ </opensrf.simple-text>
+
+ <!-- other OpenSRF services registered here... -->
+ </apps>
+
+
+
+ The element name is the name that the OpenSRF control scripts use to refer
+ to the service.
+
+
+
+
+ The <keepalive> element specifies the interval (in seconds) between
+ checks to determine if the service is still running.
+
+
+
+
+ The <stateless> element specifies whether OpenSRF clients can call
+ methods from this service without first having to create a connection to a
+ specific service backend process for that service. If the value is 1, then
+ the client can simply issue a request and the router will forward the request
+ to an available service and the result will be returned directly to the client.
+
+
+
+
+ The <language> element specifies the programming language in which the
+ service is implemented.
+
+
+
+
+ The <implementation> element specifies the name of the library or module
+ in which the service is implemented.
+
+
+
+
+ (C implementations only): The <max_requests> element, as a direct child
+ of the service element name, specifies the maximum number of requests a process
+ serves before it is killed and replaced by a new process. 
+
+
+
+
+ (Perl implementations only): The <max_requests> element, as a direct
+ child of the <unix_config> element, specifies the maximum number of requests
+ a process serves before it is killed and replaced by a new process.
+
+
+
+
+ The <unix_log> element specifies the name of the log file for
+ language-specific log messages such as syntax warnings.
+
+
+
+
+ The <unix_sock> element specifies the name of the UNIX socket used for
+ inter-process communications.
+
+
+
+
+ The <unix_pid> element specifies the name of the PID file for the
+ master process for the service.
+
+
+
+
+ The <min_children> element specifies the minimum number of child
+ processes that should be running at any given time.
+
+
+
+
+ The <max_children> element specifies the maximum number of child
+ processes that should be running at any given time.
+
+
+
+
+ The <min_spare_children> element specifies the minimum number of idle
+ child processes that should be available to handle incoming requests. If there
+ are fewer than this number of spare child processes, new processes will be
+ spawned.
+
+
+
+
+ The <max_spare_children> element specifies the maximum number of idle
+ child processes that should be available to handle incoming requests. If there
+ are more than this number of spare child processes, the extra processes will be
+ killed.
+
+
+
+ To make the service accessible via the public router, you must also
+ edit the opensrf_core.xml configuration file to add the service to the list
+ of publicly accessible services:
+ Making a service publicly accessible in <literal>opensrf_core.xml</literal>
+ <router>
+ <!-- This is the public router. 
On this router, we only register applications
+ which should be accessible to everyone on the opensrf network -->
+ <name>router</name>
+ <domain>public.localhost</domain>
+ <services>
+ <service>opensrf.math</service>
+ <service>opensrf.simple-text</service>
+ </services>
+ </router>
+
+
+
+
+ This section of the opensrf_core.xml file is located at XPath
+ /config/opensrf/routers/.
+
+
+
+
+ public.localhost is the canonical public router domain in the OpenSRF
+ installation instructions.
+
+
+
+
+ Each <service> element contained in the <services> element
+ offers its services via the public router as well as the private router.
+
+
+
+ Once you have defined the new service, you must restart the OpenSRF Router
+ to retrieve the new configuration and start or restart the service itself.
+ Complete working examples of the opensrf_core.xml and
+ opensrf.xml configuration files are included with this article
+ for your reference.
+
+
+ Calling an OpenSRF method
+ OpenSRF clients in any supported language can invoke OpenSRF services in any
+ supported language. So let’s see a few examples of how we can call our fancy
+ new opensrf.simple-text.reverse() method:
+
+ Calling OpenSRF methods from the srfsh client
+ srfsh is a command-line tool installed with OpenSRF that you can use to call
+ OpenSRF methods. To call an OpenSRF method, issue the request command and
+ pass the OpenSRF service and method name as the first two arguments; then pass
+ one or more JSON objects delimited by commas as the arguments to the method
+ being invoked. 
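Because srfsh arguments are parsed as JSON, you can pass strings, numbers, arrays, and hashes directly on the command line. For instance, a hypothetical session calling the opensrf.simple-text.split method (defined later in this article) separates its two string arguments with a comma; the exact formatting of the streamed responses may vary between versions:

```
srfsh# request opensrf.simple-text opensrf.simple-text.split "This is a test", " "

Received Data: "This"

Received Data: "is"

Received Data: "a"

Received Data: "test"
```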
+ The following example calls the opensrf.simple-text.reverse method of the + opensrf.simple-text OpenSRF service, passing the string "foobar" as the + only method argument: + $ srfsh + srfsh # request opensrf.simple-text opensrf.simple-text.reverse "foobar" + + Received Data: "raboof" + + =------------------------------------ + Request Completed Successfully + Request Time in seconds: 0.016718 + =------------------------------------ + + + Getting documentation for OpenSRF methods from the srfsh client + The srfsh client also gives you command-line access to retrieving metadata + about OpenSRF services and methods. For a given OpenSRF method, for example, + you can retrieve information such as the minimum number of required arguments, + the data type and a description of each argument, the package or library in + which the method is implemented, and a description of the method. To retrieve + the documentation for an opensrf method from srfsh, issue the introspect + command, followed by the name of the OpenSRF service and (optionally) the + name of the OpenSRF method. If you do not pass a method name to the introspect + command, srfsh lists all of the methods offered by the service. If you pass + a partial method name, srfsh lists all of the methods that match that portion + of the method name. + The quality and availability of the descriptive information for each + method depends on the developer to register the method with complete and + accurate information. The quality varies across the set of OpenSRF and + Evergreen APIs, although some effort is being put towards improving the + state of the internal documentation. 
+ srfsh# introspect opensrf.simple-text "opensrf.simple-text.reverse" + --> opensrf.simple-text + + Received Data: { + "__c":"opensrf.simple-text", + "__p":{ + "api_level":1, + "stream":0, + "object_hint":"OpenSRF_Application_Demo_SimpleText", + "remote":0, + "package":"OpenSRF::Application::Demo::SimpleText", + "api_name":"opensrf.simple-text.reverse", + "server_class":"opensrf.simple-text", + "signature":{ + "params":[ + { + "desc":"The string to reverse", + "name":"text", + "type":"string" + } + ], + "desc":"Returns the input string in reverse order\n", + "return":{ + "desc":"Returns the input string in reverse order", + "type":"string" + } + }, + "method":"text_reverse", + "argc":1 + } + } + + + + stream denotes whether the method supports streaming responses or not. + + + + + package identifies which package or library implements the method. + + + + + api_name identifies the name of the OpenSRF method. + + + + + signature is a hash that describes the parameters for the method. + + + + + params is an array of hashes describing each parameter in the method; + each parameter has a description (desc), name (name), and type (type). + + + + + desc is a string that describes the method itself. + + + + + return is a hash that describes the return value for the method; it + contains a description of the return value (desc) and the type of the + returned value (type). + + + + + method identifies the name of the function or method in the source + implementation. + + + + + argc is an integer describing the minimum number of arguments that + must be passed to this method. + + + + + + Calling OpenSRF methods from Perl applications + To call an OpenSRF method from Perl, you must connect to the OpenSRF service, + issue the request to the method, and then retrieve the results. 
+ #!/usr/bin/perl
+ use strict;
+ use OpenSRF::AppSession;
+ use OpenSRF::System;
+
+ OpenSRF::System->bootstrap_client(config_file => '/openils/conf/opensrf_core.xml');
+
+ my $session = OpenSRF::AppSession->create("opensrf.simple-text");
+
+ print "substring: Accepts a string and a number as input, returns a string\n";
+ my $result = $session->request("opensrf.simple-text.substring", "foobar", 3);
+ my $request = $result->gather();
+ print "Substring: $request\n\n";
+
+ print "split: Accepts two strings as input, returns an array of strings\n";
+ $request = $session->request("opensrf.simple-text.split", "This is a test", " ");
+ my $output = "Split: [";
+ my $element;
+ while ($element = $request->recv()) {
+ $output .= $element->content . ", ";
+ }
+ $output =~ s/, $/]/;
+ print $output . "\n\n";
+
+ print "statistics: Accepts an array of strings as input, returns a hash\n";
+ my @many_strings = (
+ "First I think I'll have breakfast",
+ "Then I think that lunch would be nice",
+ "And then seventy desserts to finish off the day"
+ );
+
+ $result = $session->request("opensrf.simple-text.statistics", \@many_strings);
+ $request = $result->gather();
+ print "Length: " . $request->{'length'} . "\n";
+ print "Word count: " . $request->{'word_count'} . "\n";
+
+ $session->disconnect();
+
+
+
+ The OpenSRF::System->bootstrap_client() method reads the OpenSRF
+ configuration information from the indicated file and creates an XMPP client
+ connection based on that information.
+
+
+
+
+ The OpenSRF::AppSession->create() method accepts one argument - the name
+ of the OpenSRF service to which you want to make one or more requests -
+ and returns an object prepared to use the client connection to make those
+ requests.
+
+
+
+
+ The OpenSRF::AppSession->request() method accepts a minimum of one
+ argument - the name of the OpenSRF method to which you want to make a request -
+ followed by zero or more arguments to pass to the OpenSRF method as input
+ values. 
This example passes a string and an integer to the + opensrf.simple-text.substring method defined by the opensrf.simple-text + OpenSRF service. + + + + + The gather() method, called on the result object returned by the + request() method, iterates over all of the possible results from the result + object and returns a single variable. + + + + + This request() call passes two strings to the opensrf.simple-text.split + method defined by the opensrf.simple-text OpenSRF service and returns (via + gather()) a reference to an array of results. + + + + + The opensrf.simple-text.split() method is a streaming method that + returns an array of results with one element per recv() call on the + result object. We could use the gather() method to retrieve all of the + results in a single array reference, but instead we simply iterate over + the result variable until there are no more results to retrieve. + + + + + While the gather() convenience method returns only the content of the + complete set of results for a given request, the recv() method returns an + OpenSRF result object with status, statusCode, and content fields as + we saw in the HTTP results example. + + + + + This request() call passes an array to the + opensrf.simple-text.statistics method defined by the opensrf.simple-text + OpenSRF service. + + + + + The result object returns a hash reference via gather(). The hash + contains the length and word_count keys we defined in the method. + + + + + The OpenSRF::AppSession->disconnect() method closes the XMPP client + connection and cleans up resources associated with the session. + + + + + + + Accepting and returning more interesting data types + Of course, the example of accepting a single string and returning a single + string is not very interesting. In real life, our applications tend to pass + around multiple arguments, including arrays and hashes. 
Fortunately, OpenSRF
+ makes that easy to deal with; in Perl, for example, returning a reference to
+ the data type does the right thing. In the following example of a method that
+ returns a list, we accept two arguments of type string: the string to be split,
+ and the delimiter that should be used to split the string.
+ Basic text splitting method
+ sub text_split {
+ my $self = shift;
+ my $conn = shift;
+ my $text = shift;
+ my $delimiter = shift || ' ';
+
+ my @split_text = split $delimiter, $text;
+ return \@split_text;
+ }
+
+ __PACKAGE__->register_method(
+ method => 'text_split',
+ api_name => 'opensrf.simple-text.split'
+ );
+
+ We simply return a reference to the list, and OpenSRF does the rest of the work
+ for us to convert the data into the language-independent format that is then
+ returned to the caller. As a caller of a given method, you must rely on the
+ documentation provided when the method was registered to determine the
+ expected data structures - if the developer has added the appropriate
+ documentation.
+
+
+ Accepting and returning Evergreen objects
+ OpenSRF is agnostic about objects; its role is to pass JSON back and forth
+ between OpenSRF clients and services, and it allows the specific clients and
+ services to define their own semantics for the JSON structures. On top of that
+ infrastructure, Evergreen offers the fieldmapper: an object-relational mapper
+ that provides a complete definition of all objects, their properties, their
+ relationships to other objects, the permissions required to create, read,
+ update, or delete objects of that type, and the database table or view on which
+ they are based.
+ The Evergreen fieldmapper offers a great deal of convenience for working with
+ complex system objects beyond the basic mapping of classes to database
+ schemas. 
Although the result is passed over the wire as a JSON object + containing the indicated fields, fieldmapper-aware clients then turn those + JSON objects into native objects with setter / getter methods for each field. + All of this metadata about Evergreen objects is defined in the + fieldmapper configuration file (/openils/conf/fm_IDL.xml), and access to + these classes is provided by the open-ils.cstore, open-ils.pcrud, and + open-ils.reporter-store OpenSRF services which parse the fieldmapper + configuration file and dynamically register OpenSRF methods for creating, + reading, updating, and deleting all of the defined classes. + Example fieldmapper class definition for "Open User Summary" + <class id="mous" controller="open-ils.cstore open-ils.pcrud" + oils_obj:fieldmapper="money::open_user_summary" + oils_persist:tablename="money.open_usr_summary" + reporter:label="Open User Summary"> + <fields oils_persist:primary="usr" oils_persist:sequence=""> + <field name="balance_owed" reporter:datatype="money" /> + <field name="total_owed" reporter:datatype="money" /> + <field name="total_paid" reporter:datatype="money" /> + <field name="usr" reporter:datatype="link"/> + </fields> + <links> + <link field="usr" reltype="has_a" key="id" map="" class="au"/> + </links> + <permacrud xmlns="http://open-ils.org/spec/opensrf/IDL/permacrud/v1"> + <actions> + <retrieve permission="VIEW_USER"> + <context link="usr" field="home_ou"/> + </retrieve> + </actions> + </permacrud> + </class> + + + + + The <class> element defines the class: + + + + + The id attribute defines the class hint that identifies the class both + elsewhere in the fieldmapper configuration file, such as in the value of the + field attribute of the <link> element, and in the JSON object itself when + it is instantiated. For example, an "Open User Summary" JSON object would have + the top level property of "__c":"mous". 
+
+
+
+
+ The controller attribute identifies the services that have direct access
+ to this class. If open-ils.pcrud is not listed, for example, then there is
+ no means to directly access members of this class through a public service.
+
+
+
+
+ The oils_obj:fieldmapper attribute defines the name of the Perl
+ fieldmapper class that will be dynamically generated to provide setter and
+ getter methods for instances of the class.
+
+
+
+
+ The oils_persist:tablename attribute identifies the schema name and table
+ name of the database table that stores the data that represents the instances
+ of this class. In this case, the schema is money and the table is
+ open_usr_summary.
+
+
+
+
+ The reporter:label attribute defines a human-readable name for the class
+ used in the reporting interface to identify the class. These names are defined
+ in English in the fieldmapper configuration file; however, they are extracted
+ so that they can be translated and served in the user’s language of choice.
+
+
+
+
+
+
+ The <fields> element lists all of the fields that belong to the object.
+
+
+
+
+ The oils_persist:primary attribute identifies the field that acts as the
+ primary key for the object; in this case, the field with the name usr.
+
+
+
+
+ The oils_persist:sequence attribute identifies the sequence object
+ (if any) in this database that provides values for new instances of this class. In
+ this case, the primary key is defined by a field that is linked to a different
+ table, so no sequence is used to populate these instances.
+
+
+
+
+
+
+ Each <field> element defines a single field with the following attributes:
+
+
+
+
+ The name attribute identifies the column name of the field in the
+ underlying database table as well as providing a name for the setter / getter
+ method that can be invoked in the JSON or native version of the object. 
+ + + + + The reporter:datatype attribute defines how the reporter should treat + the contents of the field for the purposes of querying and display. + + + + + The reporter:label attribute can be used to provide a human-readable name + for each field; without it, the reporter falls back to the value of the name + attribute. + + + + + + + The <links> element contains a set of zero or more <link> elements, + each of which defines a relationship between the class being described and + another class. + + + + + The field attribute identifies the field named in this class that links + to the external class. + + + + + The reltype attribute identifies the kind of relationship between the + classes; in the case of has_a, each value in the usr field is guaranteed + to have a corresponding value in the external class. + + + + + The key attribute identifies the name of the field in the external + class to which this field links. + + + + + The rarely-used map attribute identifies a second class to which + the external class links; it enables this field to define a direct + relationship to an external class with one degree of separation, to + avoid having to retrieve all of the linked members of an intermediate + class just to retrieve the instances from the actual desired target class. + + + + + The class attribute identifies the external class to which this field + links. + + + + + + + The <permacrud> element defines the permissions that must have been + granted to a user to operate on instances of this class. + + + + + The <retrieve> element is one of four possible children of the + <actions> element that define the permissions required for each action: + create, retrieve, update, and delete. + + + + + The permission attribute identifies the name of the permission that must + have been granted to the user to perform the action. 
+
+
+
+
+ The contextfield attribute, if it exists, defines the field in this class
+ that identifies the library within the system for which the user must have
+ privileges to work. If a user has been granted a given permission, but has not been
+ granted privileges to work at a given library, they cannot perform the action
+ at that library.
+
+
+
+
+
+
+ The rarely-used <context> element identifies a linked field (link
+ attribute) in this class which links to an external class that holds the field
+ (field attribute) that identifies the library within the system for which the
+ user must have privileges to work.
+
+
+
+ When you retrieve an instance of a class, you can ask for the result to
+ flesh some or all of the linked fields of that class, so that the linked
+ instances are returned embedded directly in your requested instance. In that
+ same request you can ask for the fleshed instances to in turn have their linked
+ fields fleshed. By bundling all of this into a single request and result
+ sequence, you can avoid the network overhead of requiring the client to request
+ the base object, then request each linked object in turn.
+ You can also iterate over a collection of instances and set the automatically
+ generated isdeleted, isupdated, or isnew properties to indicate that
+ the given instance has been deleted, updated, or created respectively.
+ Evergreen can then act in batch mode over the collection to perform the
+ requested actions on any of the instances that have been flagged for action.
+
+
+ Returning streaming results
+ In the previous implementation of the opensrf.simple-text.split method, we
+ returned a reference to the complete array of results. For small values being
+ delivered over the network, this is perfectly acceptable, but for large sets of
+ values this can pose a number of problems for the requesting client. 
Consider a + service that returns a set of bibliographic records in response to a query like + "all records edited in the past month"; if the underlying database is + relatively active, that could result in thousands of records being returned as + a single network request. The client would be forced to block until all of the + results are returned, likely resulting in a significant delay, and depending on + the implementation, correspondingly large amounts of memory might be consumed + as all of the results are read from the network in a single block. + OpenSRF offers a solution to this problem. If the method returns results that + can be divided into separate meaningful units, you can register the OpenSRF + method as a streaming method and enable the client to loop over the results one + unit at a time until the method returns no further results. In addition to + registering the method with the provided name, OpenSRF also registers an additional + method with .atomic appended to the method name. The .atomic variant gathers + all of the results into a single block to return to the client, giving the caller + the ability to choose either streaming or atomic results from a single method + definition. + In the following example, the text splitting method has been reimplemented to + support streaming; very few changes are required: + Text splitting method - streaming mode + sub text_split { + my $self = shift; + my $conn = shift; + my $text = shift; + my $delimiter = shift || ' '; + + my @split_text = split $delimiter, $text; + foreach my $string (@split_text) { + $conn->respond($string); + } + return undef; + } + + __PACKAGE__->register_method( + method => 'text_split', + api_name => 'opensrf.simple-text.split', + stream => 1 + ); + + + + + Rather than returning a reference to the array, a streaming method loops + over the contents of the array and invokes the respond() method of the + connection object on each element of the array. 
+
+
+
+
+ Registering the method as a streaming method instructs OpenSRF to also
+ register an atomic variant (opensrf.simple-text.split.atomic).
+
+
+
+
+
+ Error! Warning! Info! Debug!
+ As hard as it may be to believe, it is true: applications sometimes do not
+ behave in the expected manner, particularly when they are still under
+ development. The service language bindings for OpenSRF include integrated
+ support for logging messages at the levels of ERROR, WARNING, INFO, DEBUG, and
+ the extremely verbose INTERNAL to either a local file or to a syslogger
+ service. The destination of the log files, and the level of verbosity to be
+ logged, is set in the opensrf_core.xml configuration file. To add logging to
+ our Perl example, we just have to add the OpenSRF::Utils::Logger package to our
+ list of used Perl modules, then invoke the logger at the desired logging level.
+ You can include many calls to the OpenSRF logger; only those at or above the
+ severity of your configured logging level will actually hit the log. The
+ following example exercises all of the available logging levels in OpenSRF:
+ use OpenSRF::Utils::Logger;
+ my $logger = OpenSRF::Utils::Logger;
+ # some code in some function
+ {
+ $logger->error("Hmm, something bad DEFINITELY happened!");
+ $logger->warn("Hmm, something bad might have happened.");
+ $logger->info("Something happened.");
+ $logger->debug("Something happened; here are some more details.");
+ $logger->internal("Something happened; here are all the gory details.");
+ }
+ If you call the mythical OpenSRF method containing the preceding OpenSRF logger
+ statements on a system running at the default logging level of INFO, you will
+ only see the ERR, WARN, and INFO messages, as follows:
+ Results of logging calls at the default level of INFO
+ [2010-03-17 22:27:30] opensrf.simple-text [ERR :5681:SimpleText.pm:277:] Hmm, something bad DEFINITELY happened! 
+ [2010-03-17 22:27:30] opensrf.simple-text [WARN:5681:SimpleText.pm:278:] Hmm, something bad might have happened.
+ [2010-03-17 22:27:30] opensrf.simple-text [INFO:5681:SimpleText.pm:279:] Something happened.
+
+ If you then increase the logging level to INTERNAL (5), the logs will
+ contain much more information, as follows:
+ Results of logging calls at the level of INTERNAL
+ [2010-03-17 22:48:11] opensrf.simple-text [ERR :5934:SimpleText.pm:277:] Hmm, something bad DEFINITELY happened!
+ [2010-03-17 22:48:11] opensrf.simple-text [WARN:5934:SimpleText.pm:278:] Hmm, something bad might have happened.
+ [2010-03-17 22:48:11] opensrf.simple-text [INFO:5934:SimpleText.pm:279:] Something happened.
+ [2010-03-17 22:48:11] opensrf.simple-text [DEBG:5934:SimpleText.pm:280:] Something happened; here are some more details.
+ [2010-03-17 22:48:11] opensrf.simple-text [INTL:5934:SimpleText.pm:281:] Something happened; here are all the gory details.
+ [2010-03-17 22:48:11] opensrf.simple-text [ERR :5934:SimpleText.pm:283:] Resolver did not find a cache hit
+ [2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:Cache.pm:125:] Stored opensrf.simple-text.test_cache.masaa => "here" in memcached server
+ [2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:Application.pm:579:] Coderef for [OpenSRF::Application::Demo::SimpleText::test_cache]... 
+ [2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:Application.pm:586:] A top level Request object is responding de nada + [2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:Application.pm:190:] Method duration for [opensrf.simple-text.test_cache]: 10.005 + [2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:AppSession.pm:780:] Calling queue_wait(0) + [2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:AppSession.pm:769:] Resending...0 + [2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:AppSession.pm:450:] In send + [2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:AppSession.pm:506:] AppSession sending RESULT to opensrf@private.localhost/... + [2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:AppSession.pm:506:] AppSession sending STATUS to opensrf@private.localhost/... + ... + + To see everything that is happening in OpenSRF, try leaving your logging level + set to INTERNAL for a few minutes - just ensure that you have a lot of free disk + space available if you have a moderately busy system! + + + Caching results: one secret of scalability + If you have ever used an application that depends on a remote Web service + outside of your control — say, if you need to retrieve results from a + microblogging service — you know the pain of latency and dependability (or the + lack thereof). To improve the response time for OpenSRF services, you can take + advantage of the support offered by the OpenSRF::Utils::Cache module for + communicating with a local instance or cluster of memcache daemons to store + and retrieve persistent values. 
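The logic in the Perl example that follows (check the cache for a key, fall back to the expensive operation on a miss, then store the result with a timeout) is the classic cache-aside pattern. As a minimal, language-neutral sketch in plain Python: the TtlCache class and the cached_lookup and slow_lookup names are invented for this illustration, and a real deployment would talk to memcached rather than an in-process dictionary.

```python
import time

class TtlCache:
    """Toy stand-in for a memcached client: values expire after a timeout."""
    def __init__(self):
        self._store = {}

    def put_cache(self, key, value, timeout):
        # Record the value along with its absolute expiry time.
        self._store[key] = (value, time.time() + timeout)

    def get_cache(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.time() >= expires:   # stale entry: treat it as a miss
            del self._store[key]
            return None
        return value

def cached_lookup(cache, key, slow_lookup, timeout=300):
    """Cache-aside: return the cached value if present, else compute and store it."""
    result = cache.get_cache(key)
    if result is not None:
        return result
    result = slow_lookup()           # the expensive call (the 10-second sleep below)
    cache.put_cache(key, result, timeout)
    return result
```

The OpenSRF version below follows exactly this shape, with get_cache() and put_cache() talking to a real memcached server that handles expiry itself.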
The following example demonstrates caching + by sleeping for 10 seconds the first time it receives a given cache key and + cannot retrieve a corresponding value from the cache: + Simple caching OpenSRF service + use OpenSRF::Utils::Cache; + sub test_cache { + my $self = shift; + my $conn = shift; + my $test_key = shift; + my $cache = OpenSRF::Utils::Cache->new('global'); + my $cache_key = "opensrf.simple-text.test_cache.$test_key"; + my $result = $cache->get_cache($cache_key) || undef; + if ($result) { + $logger->info("Resolver found a cache hit"); + return $result; + } + sleep 10; + my $cache_timeout = 300; + $cache->put_cache($cache_key, "here", $cache_timeout); + return "There was no cache hit."; + } + + + + + The OpenSRF::Utils::Cache module provides access to the built-in caching + support in OpenSRF. + + + + + The constructor for the cache object accepts a single argument to define + the cache type for the object. Each cache type can use a separate memcache + server to keep the caches separated. Most Evergreen services use the global + cache, while the anon cache is used for Web sessions. + + + + + The cache key is simply a string that uniquely identifies the value you + want to store or retrieve. This line creates a cache key based on the OpenSRF + method name and request input value. + + + + + The get_cache() method checks to see if the cache key already exists. If + a matching key is found, the service immediately returns the stored value. + + + + + If the cache key does not exist, the code sleeps for 10 seconds to + simulate a call to a slow remote Web service or an intensive process. + + + + + The $cache_timeout variable represents a value for the lifetime of the + cache key in seconds. + + + + + After the code retrieves its value (or, in the case of this example, + finishes sleeping), it creates the cache entry by calling the put_cache() + method. 
The method accepts three arguments: the cache key, the value to be + stored ("here"), and the timeout value in seconds to ensure that we do not + return stale data on subsequent calls. + + + + + + Initializing the service and its children: child labour + When an OpenSRF service is started, it looks for a procedure called + initialize() to set up any global variables shared by all of the children of + the service. The initialize() procedure is typically used to retrieve + configuration settings from the opensrf.xml file. + An OpenSRF service spawns one or more children to actually do the work + requested by callers of the service. For every child process an OpenSRF service + spawns, the child process clones the parent environment and then each child + process runs the child_init() process (if any) defined in the OpenSRF service + to initialize any child-specific settings. + When the OpenSRF service kills a child process, it invokes the child_exit() + procedure (if any) to clean up any resources associated with the child process. + Similarly, when the OpenSRF service is stopped, it calls the DESTROY() + procedure to clean up any remaining resources. + + + Retrieving configuration settings + The settings for OpenSRF services are maintained in the opensrf.xml XML + configuration file. The structure of the XML document consists of a root + element <opensrf> containing two child elements: + + + + The <default> element contains an <apps> element describing all + OpenSRF services running on this system — see --, as + well as any other arbitrary XML descriptions required for global configuration + purposes. For example, Evergreen uses this section for email notification and + inter-library patron privacy settings. + + + + + The <hosts> element contains one element per host that participates in + this OpenSRF system. Each host element must include an <activeapps> element + that lists all of the services to start on this host when the system starts + up. 
Each host element can optionally override any of the default settings. + + + + OpenSRF includes a service named opensrf.settings to provide distributed + cached access to the configuration settings with a simple API: + + + + opensrf.settings.default_config.get accepts zero arguments and returns + the complete set of default settings as a JSON document. + + + + + opensrf.settings.host_config.get accepts one argument (hostname) and + returns the complete set of settings, as customized for that hostname, as a + JSON document. + + + + + opensrf.settings.xpath.get accepts one argument (an + XPath expression) and returns the portion of + the configuration file that matches the expression as a JSON document. + + + + For example, to determine whether an Evergreen system uses the opt-in + support for sharing patron information between libraries, you could either + invoke the opensrf.settings.default_config.get method and parse the + JSON document to determine the value, or invoke the opensrf.settings.xpath.get + method with the XPath /opensrf/default/share/user/opt_in argument to + retrieve the value directly. + In practice, OpenSRF includes convenience libraries in all of its client + language bindings to simplify access to configuration values. C offers + osrfConfig.c, Perl offers OpenSRF::Utils::SettingsClient, Java offers + org.opensrf.util.SettingsClient, and Python offers osrf.set. These + libraries locally cache the configuration file to avoid network roundtrips for + every request and enable the developer to request specific values without + having to manually construct XPath expressions. + +
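To make the opt-in lookup concrete, the following sketch evaluates the same path against a skeletal configuration document using plain Python and xml.etree. The fragment is a minimal invention for this illustration, not a complete opensrf.xml; ElementTree supports only a subset of XPath, so the path is given relative to the <opensrf> root element.

```python
import xml.etree.ElementTree as ET

# Skeletal stand-in for opensrf.xml, showing only the path used in the example.
CONFIG = """
<opensrf>
  <default>
    <share>
      <user>
        <opt_in>true</opt_in>
      </user>
    </share>
  </default>
</opensrf>
"""

root = ET.fromstring(CONFIG)  # the root element is <opensrf>
# Equivalent of the /opensrf/default/share/user/opt_in lookup, relative to root:
node = root.find("default/share/user/opt_in")
opt_in = (node is not None and node.text == "true")
print(opt_in)  # True
```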
+
+
+ OpenSRF Communication Flows
+ Now that you have seen that it truly is easy to create an OpenSRF service, we
+ can take a look at what is going on under the covers to make all of this work
+ for you.
+
+ Get on the messaging bus - safely
+ One of the core innovations of OpenSRF was to use the Extensible Messaging and
+ Presence Protocol (XMPP, more colloquially known as Jabber) as the messaging
+ bus that ties OpenSRF services together across servers. XMPP is an "XML
+ protocol for near-real-time messaging, presence, and request-response services"
+ (http://www.ietf.org/rfc/rfc3920.txt) that OpenSRF relies on to handle most of
+ the complexity of networked communications. OpenSRF requires an XMPP server
+ that supports multiple domains, such as ejabberd.
+ Multiple domain support means that a single server can support XMPP virtual
+ hosts with separate sets of users and access privileges per domain. By
+ routing communications through separate public and private XMPP domains,
+ OpenSRF services gain an additional layer of security.
+ The OpenSRF
+ installation documentation instructs you to create two separate hostnames
+ (private.localhost and public.localhost) to use as XMPP domains. OpenSRF
+ can control access to its services based on the domain of the client and
+ whether a given service allows access from clients on the public domain. When
+ you start OpenSRF, the first XMPP clients that connect to the XMPP server are
+ the OpenSRF public and private routers. OpenSRF routers maintain a list of
+ available services and connect clients to available services. When an OpenSRF
+ service starts, it establishes a connection to the XMPP server and registers
+ itself with the private router. The OpenSRF configuration contains a list of
+ public OpenSRF services, each of which must also register with the public
+ router. 
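In ejabberd, for example, multiple-domain support amounts to listing both virtual hosts in the server configuration. Here is a sketch of the relevant line in a classic Erlang-syntax ejabberd.cfg; the exact file layout varies by ejabberd version, so treat this as illustrative only.

```
%% Serve both OpenSRF domains as XMPP virtual hosts
{hosts, ["private.localhost", "public.localhost"]}.
```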
+ + + OpenSRF communication flows over XMPP + In a minimal OpenSRF deployment, two XMPP users named "router" connect to the + XMPP server, with one connected to the private XMPP domain and one connected to + the public XMPP domain. Similarly, two XMPP users named "opensrf" connect to + the XMPP server via the private and public XMPP domains. When an OpenSRF + service is started, it uses the "opensrf" XMPP user to advertise its + availability with the corresponding router on that XMPP domain; the XMPP server + automatically assigns a Jabber ID (JID) based on the client hostname to each + service’s listener process and each connected drone process waiting to carry + out requests. When an OpenSRF router receives a request to invoke a method on a + given service, it connects the requester to the next available listener in the + list of registered listeners for that service. + Services and clients connect to the XMPP server using a single set of XMPP + client credentials (for example, opensrf@private.localhost), but use XMPP + resource identifiers to differentiate themselves in the JID for each + connection. For example, the JID for a copy of the opensrf.simple-text + service with process ID 6285 that has connected to the private.localhost + domain using the opensrf XMPP client credentials could be + opensrf@private.localhost/opensrf.simple-text_drone_at_localhost_6285. By + convention, the user name for OpenSRF clients is opensrf, and the user name + for OpenSRF routers is router, so the XMPP server for OpenSRF will have four + separate users registered: + * opensrf@private.localhost is an OpenSRF client that connects with these + credentials and which can access any OpenSRF service. + * opensrf@public.localhost is an OpenSRF client that connects with these + credentials and which can only access OpenSRF services that have registered + with the public router. + * router@private.localhost is the private OpenSRF router with which all + services register. 
+ * router@public.localhost is the public OpenSRF router with which only + services that must be publicly accessible register. + All OpenSRF services automatically register themselves with the private XMPP + domain, but only those services that register themselves with the public XMPP + domain can be invoked from public OpenSRF clients. The OpenSRF client and + router user names, passwords, and domain names, along with the list of services + that should be public, are contained in the opensrf_core.xml configuration + file. + + + OpenSRF communication flows over HTTP + In some contexts, access to a full XMPP client is not a practical option. For + example, while XMPP clients have been implemented in JavaScript, you might + be concerned about browser compatibility and processing overhead - or you might + want to issue OpenSRF requests from the command line with curl. Fortunately, + any OpenSRF service registered with the public router is accessible via the + OpenSRF HTTP Translator. The OpenSRF HTTP Translator implements the + OpenSRF-over-HTTP + proposed specification as an Apache module that translates HTTP requests into + OpenSRF requests and returns OpenSRF results as HTTP results to the initiating + HTTP client. + Issuing an HTTP POST request to an OpenSRF method via the OpenSRF HTTP Translator + # curl request broken up over multiple lines for legibility + curl -H "X-OpenSRF-service: opensrf.simple-text" + --data 'osrf-msg=[ \ + {"__c":"osrfMessage","__p":{"threadTrace":0,"locale":"en-CA", + "type":"REQUEST","payload": {"__c":"osrfMethod","__p": + {"method":"opensrf.simple-text.reverse","params":["foobar"]} + }} + }]' + http://localhost/osrf-http-translator + + + + + The X-OpenSRF-service header identifies the OpenSRF service of interest. + + + + + The POST request consists of a single parameter, the osrf-msg value, + which contains a JSON array. + + + + + The first object is an OpenSRF message ("__c":"osrfMessage") with a set of + parameters ("__p":{}). 
+ + + + + The identifier for the request ("threadTrace":0); this value is echoed + back in the result. + + + + + The message type ("type":"REQUEST"). + + + + + The locale for the message; if the OpenSRF method is locale-sensitive, it + can check the locale for each OpenSRF request and return different information + depending on the locale. + + + + + The payload of the message ("payload":{}) containing the OpenSRF method + request ("__c":"osrfMethod") and its parameters ("__p:"{}). + + + + + The method name for the request ("method":"opensrf.simple-text.reverse"). + + + + + A set of JSON parameters to pass to the method ("params":["foobar"]); in + this case, a single string "foobar". + + + + + + + + + The URL on which the OpenSRF HTTP translator is listening, + /osrf-http-translator is the default location in the Apache example + configuration files shipped with the OpenSRF source, but this is configurable. + + + + Results from an HTTP POST request to an OpenSRF method via the OpenSRF HTTP Translator + # HTTP response broken up over multiple lines for legibility + [{"__c":"osrfMessage","__p": + {"threadTrace":0, "payload": + {"__c":"osrfResult","__p": + {"status":"OK","content":"raboof","statusCode":200} + },"type":"RESULT","locale":"en-CA" + } + }, + {"__c":"osrfMessage","__p": + {"threadTrace":0,"payload": + {"__c":"osrfConnectStatus","__p": + {"status":"Request Complete","statusCode":205} + },"type":"STATUS","locale":"en-CA" + } + }] + + + + + The OpenSRF HTTP Translator returns an array of JSON objects in its + response. Each object in the response is an OpenSRF message + ("__c":"osrfMessage") with a collection of response parameters ("__p":). + + + + + The OpenSRF message identifier ("threadTrace":0) confirms that this + message is in response to the request matching the same identifier. + + + + + The message includes a payload JSON object ("payload":) with an OpenSRF + result for the request ("__c":"osrfResult"). 
+
+
+
+
+ The result includes a status indicator string ("status":"OK"), the content
+ of the result response - in this case, a single string "raboof"
+ ("content":"raboof") - and an integer status code for the request
+ ("statusCode":200).
+
+
+
+
+ The message also includes the message type ("type":"RESULT") and the
+ message locale ("locale":"en-CA").
+
+
+
+
+ The second message in the set of results from the response.
+
+
+
+
+ Again, the message identifier confirms that this message is in response to
+ a particular request.
+
+
+
+
+ The payload of the message denotes that this message is an
+ OpenSRF connection status message ("__c":"osrfConnectStatus"), with some
+ information about the particular OpenSRF connection that was used for this
+ request.
+
+
+
+
+ The response parameters for an OpenSRF connection status message include a
+ verbose status ("status":"Request Complete") and an integer status code for
+ the connection status ("statusCode":205).
+
+
+
+
+ The message also includes the message type ("type":"STATUS") and the
+ message locale ("locale":"en-CA").
+
+
+
+ Before adding a new public OpenSRF service, ensure that it does
+ not introduce privilege escalation or unchecked access to data. For example,
+ the Evergreen open-ils.cstore private service is an object-relational mapper
+ that provides read and write access to the entire Evergreen database, so it
+ would be catastrophic to expose that service publicly. In comparison, the
+ Evergreen open-ils.pcrud public service offers the same functionality as
+ open-ils.cstore to any connected HTTP client or OpenSRF client, but the
+ additional authentication and authorization layer in open-ils.pcrud prevents
+ unchecked access to Evergreen’s data.
+
+
+ Stateless and stateful connections
+ OpenSRF supports both stateless and stateful connections. 
When an OpenSRF
+ client issues a REQUEST message in a stateless connection, the router
+ forwards the request to the next available service and the service returns the
+ result directly to the client.
+ REQUEST flow in a stateless connection
+
+
+
+ REQUEST flow in a stateless connection
+
+ When an OpenSRF client issues a CONNECT message to create a stateful connection, the
+ router returns the Jabber ID of the next available service to the client so
+ that the client can issue one or more REQUEST messages directly to that
+ particular service and the service will return corresponding RESULT messages
+ directly to the client. Until the client issues a DISCONNECT message, that
+ particular service is only available to the requesting client. Stateful connections
+ are useful for clients that need to make many requests of a particular service,
+ as they avoid the intermediary step of contacting the router for each request, as
+ well as for operations that require a controlled sequence of commands, such as a
+ set of database INSERT, UPDATE, and DELETE statements within a transaction.
+ CONNECT, REQUEST, and DISCONNECT flow in a stateful connection
+
+
+
+ CONNECT
+
+
+
+ Message body format
+ OpenSRF was an early adopter of JavaScript Object Notation (JSON). While XMPP
+ is an XML protocol, the Evergreen developers recognized that the compactness of
+ the JSON format offered a significant reduction in bandwidth for the volume of
+ messages that would be generated in an application of that size. In addition,
+ the ability of languages such as JavaScript, Perl, and Python to generate
+ native objects with minimal parsing offered an attractive advantage over
+ invoking an XML parser for every message. Instead, the body of the XMPP message
+ is a simple JSON structure. 
For a simple request,
+ like the following example that simply reverses a string, it may look like
+ significant overhead, but we gain the advantages of locale support and of
+ tracing the request from the requester through the listener and responder
+ (drone).
+ A request for opensrf.simple-text.reverse("foobar"):
+ <message from='router@private.localhost/opensrf.simple-text'
+ to='opensrf@private.localhost/opensrf.simple-text_listener_at_localhost_6275'
+ router_from='opensrf@private.localhost/_karmic_126678.3719_6288'
+ router_to='' router_class='' router_command='' osrf_xid=''
+ >
+ <thread>1266781414.366573.12667814146288</thread>
+ <body>
+ [
+ {"__c":"osrfMessage","__p":
+ {"threadTrace":"1","locale":"en-US","type":"REQUEST","payload":
+ {"__c":"osrfMethod","__p":
+ {"method":"opensrf.simple-text.reverse","params":["foobar"]}
+ }
+ }
+ }
+ ]
+ </body>
+ </message>
+
+ A response from opensrf.simple-text.reverse("foobar")
+ <message from='opensrf@private.localhost/opensrf.simple-text_drone_at_localhost_6285'
+ to='opensrf@private.localhost/_karmic_126678.3719_6288'
+ router_command='' router_class='' osrf_xid=''
+ >
+ <thread>1266781414.366573.12667814146288</thread>
+ <body>
+ [
+ {"__c":"osrfMessage","__p":
+ {"threadTrace":"1","payload":
+ {"__c":"osrfResult","__p":
+ {"status":"OK","content":"raboof","statusCode":200}
+ } ,"type":"RESULT","locale":"en-US"}
+ },
+ {"__c":"osrfMessage","__p":
+ {"threadTrace":"1","payload":
+ {"__c":"osrfConnectStatus","__p":
+ {"status":"Request Complete","statusCode":205}
+ },"type":"STATUS","locale":"en-US"}
+ }
+ ]
+ </body>
+ </message>
+
+ The content of the <body> element of the OpenSRF request and result should
+ look familiar; they match the structure of the OpenSRF over HTTP examples that we previously dissected. 
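Because the body is plain JSON, any language with a JSON library can build or parse these messages without touching an XML parser. The following short Python sketch is standalone and purely illustrative: it mirrors the message bodies shown above but does not speak XMPP.

```python
import json

# Build the body of the request message shown above.
request_body = json.dumps([
    {"__c": "osrfMessage", "__p": {
        "threadTrace": "1", "locale": "en-US", "type": "REQUEST",
        "payload": {"__c": "osrfMethod", "__p": {
            "method": "opensrf.simple-text.reverse", "params": ["foobar"]
        }}
    }}
])

# Parse the body of a response message and pull out the result content.
response_body = """
[{"__c":"osrfMessage","__p":
   {"threadTrace":"1","payload":
     {"__c":"osrfResult","__p":
       {"status":"OK","content":"raboof","statusCode":200}
     },"type":"RESULT","locale":"en-US"}
}]
"""
messages = json.loads(response_body)
result = next(m["__p"]["payload"]["__p"]["content"]
              for m in messages if m["__p"]["type"] == "RESULT")
print(result)  # raboof
```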
+ + + Registering OpenSRF methods in depth + Let’s explore the call to __PACKAGE__->register_method(); most of the members + of the hash are optional, and for the sake of brevity we omitted them in the + previous example. As we have seen in the results of the introspection call, a + verbose registration method call is recommended to better enable the internal + documentation. Here is the complete set of members that you should pass to + __PACKAGE__->register_method(): + + + + The method member specifies the name of the procedure in this module that is being registered as an OpenSRF method. + + + + + The api_name member specifies the invocable name of the OpenSRF method; by convention, the OpenSRF service name is used as the prefix. + + + + + The optional api_level member can be used for versioning the methods to allow the use of a deprecated API, but in practical use is always 1. + + + + + The optional argc member specifies the minimal number of arguments that the method expects. + + + + + The optional stream member, if set to any value, specifies that the method supports returning multiple values from a single call to subsequent requests. OpenSRF automatically creates a corresponding method with ".atomic" appended to its name that returns the complete set of results in a single request. Streaming methods are useful if you are returning hundreds of records and want to act on the results as they return. + + + + + The optional signature member is a hash that describes the method’s purpose, arguments, and return value. + + + + + The desc member of the signature hash describes the method’s purpose. + + + + + The params member of the signature hash is an array of hashes in which each array element describes the corresponding method argument in order. + + + + + The name member of the argument hash specifies the name of the argument. + + + + + The desc member of the argument hash describes the argument’s purpose. 
+ + + + + The type member of the argument hash specifies the data type of the argument: for example, string, integer, boolean, number, array, or hash. + + + + + + + The return member of the signature hash is a hash that describes the return value of the method. + + + + + The desc member of the return hash describes the return value. + + + + + The type member of the return hash specifies the data type of the return value: for example, string, integer, boolean, number, array, or hash. + + + + + + + + +
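Putting the members together, a fully-documented registration can be visualized as one nested structure. The sketch below expresses it as plain Python data purely for illustration (in a real service this is a Perl hash passed to __PACKAGE__->register_method()); the descriptions are invented here to match the opensrf.simple-text.split example, and sanity_check is a hypothetical helper.

```python
# Illustrative mirror of a verbose register_method() call for the
# opensrf.simple-text.split method described in this section.
registration = {
    "method": "text_split",
    "api_name": "opensrf.simple-text.split",
    "api_level": 1,
    "argc": 1,       # only the text argument is required
    "stream": 1,     # also registers opensrf.simple-text.split.atomic
    "signature": {
        "desc": "Splits a string by a given delimiter",
        "params": [
            {"name": "text", "desc": "The string to split", "type": "string"},
            {"name": "delimiter",
             "desc": "The delimiter to split with; defaults to a space",
             "type": "string"},
        ],
        "return": {"desc": "Returns one response per split string",
                   "type": "string"},
    },
}

def sanity_check(reg):
    """The minimal argument count must not exceed the documented parameters."""
    return reg["argc"] <= len(reg["signature"]["params"])

print(sanity_check(registration))  # True
```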
+
+
+ Evergreen-specific OpenSRF services
+ Evergreen is currently the primary showcase for the use of OpenSRF as an
+ application architecture. Evergreen 1.6.1 includes the following
+ set of OpenSRF services:
+
+
+
+ The open-ils.actor service supports common tasks for working with user
+ accounts and libraries.
+
+
+
+
+ The open-ils.auth service supports authentication of Evergreen users.
+
+
+
+
+ The open-ils.booking service supports the management of reservations
+ for bookable items.
+
+
+
+
+ The open-ils.cat service supports common cataloging tasks, such as
+ creating, modifying, and merging bibliographic and authority records.
+
+
+
+
+ The open-ils.circ service supports circulation tasks such as checking
+ out items and calculating due dates.
+
+
+
+
+ The open-ils.collections service supports tasks that assist collections
+ agencies in contacting users with outstanding fines above a certain
+ threshold.
+
+
+
+
+ The open-ils.cstore private service supports unrestricted access to
+ Evergreen fieldmapper objects.
+
+
+
+
+ The open-ils.ingest private service supports tasks for importing
+ data such as bibliographic and authority records.
+
+
+
+
+ The open-ils.pcrud service supports permission-based access to Evergreen
+ fieldmapper objects.
+
+
+
+
+ The open-ils.penalty service supports the calculation of
+ penalties for users, such as being blocked from further borrowing, for
+ conditions such as having too many items checked out or too many unpaid
+ fines.
+
+
+
+
+ The open-ils.reporter service supports the creation and scheduling of
+ reports.
+
+
+
+
+ The open-ils.reporter-store private service supports access to Evergreen
+ fieldmapper objects for the reporting service.
+
+
+
+
+ The open-ils.search service supports searching across bibliographic
+ records, authority records, serial records, Z39.50 sources, and ZIP codes. 
+ + + + + The open-ils.storage private service supports a deprecated method of + providing access to Evergreen fieldmapper objects. Implemented in Perl, + this service has largely been replaced by the much faster C-based + open-ils.cstore service. + + + + + The open-ils.supercat service supports transforms of MARC records into + other formats, such as MODS, as well as providing Atom and RSS feeds and + SRU access. + + + + + The open-ils.trigger private service supports event-based triggers for + actions such as overdue and holds available notification emails. + + + + + The open-ils.vandelay service supports the import and export of batches of + bibliographic and authority records. + + + + Of some interest is that the open-ils.reporter-store and open-ils.cstore + services have identical implementations. Surfacing them as separate services + enables a deployer of Evergreen to ensure that the reporting service does not + interfere with the performance-critical open-ils.cstore service. One can also + direct the reporting service to a read-only database replica to, again, avoid + interference with open-ils.cstore which must write to the master database. + There are only a few significant services that are not built on OpenSRF in + Evergreen 1.6.0, such as the SIP and Z39.50 servers. These services implement + different protocols and build on existing daemon architectures (Simple2ZOOM + for Z39.50), but still rely on the other OpenSRF services to provide access + to the Evergreen data. The non-OpenSRF services are reasonably self-contained + and can be deployed on different servers to deliver the same sort of deployment + flexibility as OpenSRF services, but have the disadvantage of not being + integrated into the same configuration and control infrastructure as the + OpenSRF services. +
+
+ This chapter was taken from Dan Scott's Easing gently into OpenSRF article, June, 2010. +
+
diff --git a/1.6/development/datamodelsandaccess.xml b/1.6/development/datamodelsandaccess.xml new file mode 100644 index 0000000000..caadd3e9d1 --- /dev/null +++ b/1.6/development/datamodelsandaccess.xml @@ -0,0 +1,760 @@ + + + + Evergreen Data Models and Access + +
+ Database schema
+ The database schema is tied pretty tightly to PostgreSQL. Although PostgreSQL
+ adheres closely to ANSI SQL standards, the use of schemas, SQL functions
+ implemented in both plpgsql and plperl, and PostgreSQL’s native full-text
+ search would make it… challenging… to port to other database platforms.
+ A few common PostgreSQL interfaces for poking around the schema and
+ manipulating data are:
+
+
+
+ psql (the command line client)
+
+
+
+
+ pgAdmin III (a GUI client)
+
+
+
+ Or you can read through the source files in Open-ILS/src/sql/Pg.
+ Let’s take a quick tour through the schemas, pointing out some highlights
+ and some key interdependencies:
+
+
+
+ actor.org_unit → asset.copy_location
+
+
+
+
+ actor.usr → actor.card
+
+
+
+
+ biblio.record_entry → asset.call_number → asset.copy
+
+
+
+
+ config.metabib_field → metabib.*_field_entry
+
+
+
+
+ Database access methods
+ You could use direct access to the database via Perl DBI, JDBC, and so on,
+ but Evergreen offers several database CRUD services for
+ creating / retrieving / updating / deleting data. These avoid tying
+ you too tightly to the current database schema, and they funnel database
+ access through the same mechanism rather than tying up connections
+ with other interfaces.
+
+
+ Evergreen Interface Definition Language (IDL)
+ The IDL defines properties and required permissions for Evergreen classes.
+ To reduce network overhead, a given object is identified via a
+ class-hint and serialized as a JSON array of properties (no named properties).
+ As of 1.6, fields are serialized in the order in which they appear
+ in the IDL definition file, and the is_new / is_changed / is_deleted
+ properties are automatically added. This has greatly reduced the size of
+ the fm_IDL.xml file and makes DRY people happier :)
+
+
+
+ … oils_persist:readonly tells us, if true, that the data lives in the database, but is pulled from the SELECT statement defined in the <oils_persist:source_definition> child element
+
+
+
+
+ IDL basic example (config.language_map)
+ <class id="clm" controller="open-ils.cstore open-ils.pcrud"
+ oils_obj:fieldmapper="config::language_map"
+ oils_persist:tablename="config.language_map"
+ reporter:label="Language Map" oils_persist:field_safe="true">
+ <fields oils_persist:primary="code" oils_persist:sequence="">
+ <field reporter:label="Language Code" name="code"
+ reporter:selector="value" reporter:datatype="text"/>
+ <field reporter:label="Language" name="value"
+ reporter:datatype="text" oils_persist:i18n="true"/>
+ </fields>
+ <links/>
+ <permacrud xmlns="http://open-ils.org/spec/opensrf/IDL/permacrud/v1">
+ <actions>
+ <create global_required="true" permission="CREATE_MARC_CODE"/>
+ <retrieve global_required="true"
+ permission="CREATE_MARC_CODE UPDATE_MARC_CODE DELETE_MARC_CODE"/>
+ <update global_required="true" permission="UPDATE_MARC_CODE"/>
+ <delete global_required="true" permission="DELETE_MARC_CODE"/>
+ </actions>
+ </permacrud>
+ </class>
+
+
+ The class element defines the attributes and permissions for classes,
+ and relationships between classes.
+
+
+
+ The id attribute on the class element defines the class hint that is
+ used everywhere in Evergreen. 
+ + + + + The controller attribute defines the OpenSRF + services that provide access to the data for the class objects. + + + + + + + The oils_obj::fieldmapper attribute defines the name of the class that + is generated by OpenILS::Utils::Fieldmapper. + + + + + The oils_persist:tablename attribute defines the name of the table + that contains the data for the class objects. + + + + + The reporter interface uses reporter:label attribute values in + the source list to provide meaningful class and attribute names. The + open-ils.fielder service generates a set of methods that provide direct + access to the classes for which oils_persist:field_safe is true. For + example, + + srfsh# request open-ils.fielder open-ils.fielder.clm.atomic \ + {"query":{"code":{"=":"eng"}}} + + Received Data: [ + { + "value":"English", + "code":"eng" + } + ] + + + + The fields element defines the list of fields for the class. + + + + + The oils_persist:primary attribute defines the column that acts as + the primary key for the table. + + + + + The oils_persist:sequence attribute holds the name of the database + sequence. + + + + + + + Each field element defines one property of the class. + + + + + The name attribute defines the getter/setter method name for the field. + + + + + The reporter:label attribute defines the attribute name as used in + the reporter interface. + + + + + The reporter:selector attribute defines the field used in the reporter + filter interface to provide a selectable list. This gives the user a more + meaningful access point than the raw numeric ID or abstract code. + + + + + The reporter:datatype attribute defines the type of data held by + this property for the purposes of the reporter. + + + + + + + The oils_persist:i18n attribute, when true, means that + translated values for the field’s contents may be accessible in + different locales. 
+ + + + + The permacrud element defines the permissions (if any) required + to create, retrieve, update, and delete data for this + class. open-ils.permacrud must be defined as a controller for the class + for the permissions to be applied. + + + + + Each action requires one or more permission values that the + user must possess to perform the action. + + + + + If the global_required attribute is true, then the user must + have been granted that permission globally (depth = 0) to perform + the action. + + + + + The context_field attribute denotes the <field> that identifies + the org_unit at which the user must have the pertinent permission. + + + + + An action element may contain a <context_field> element that + defines the linked class (identified by the link attribute) and + the field in the linked class that identifies the org_unit where + the permission must be held. + + + + + If the <context_field> element contains a jump attribute, + then it defines a link to a link to a class with a field identifying + the org_unit where the permission must be held. + + + + + + + + + + Reporter data types and their possible values + + + + bool: Boolean true or false + + + + + id: ID of the row in the database + + + + + int: integer value + + + + + interval: PostgreSQL time interval + + + + + link: link to another class, as defined in the <links> + element of the class definition + + + + + money: currency amount + + + + + org_unit: list of org_units + + + + + text: text value + + + + + timestamp: PostgreSQL timestamp + + + + + + IDL example with linked fields (actor.workstation) + Just as tables often include columns with foreign keys that point + to values stored in the column of a different table, IDL classes + can contain fields that link to fields in other classes. 
The <links>
+ element defines which fields link to fields in other classes, and
+ the nature of the relationship:
+ <class id="aws" controller="open-ils.cstore"
+ oils_obj:fieldmapper="actor::workstation"
+ oils_persist:tablename="actor.workstation"
+ reporter:label="Workstation">
+ <fields oils_persist:primary="id"
+ oils_persist:sequence="actor.workstation_id_seq">
+ <field reporter:label="Workstation ID" name="id"
+ reporter:datatype="id"/>
+ <field reporter:label="Workstation Name" name="name"
+ reporter:datatype="text"/>
+ <field reporter:label="Owning Library" name="owning_lib"
+ reporter:datatype="org_unit"/>
+ <field reporter:label="Circulations" name="circulations"
+ oils_persist:virtual="true" reporter:datatype="link"/>
+ </fields>
+ <links>
+ <link field="owning_lib" reltype="has_a" key="id"
+ map="" class="aou"/>
+ <link field="circulations" reltype="has_many" key="workstation"
+ map="" class="circ"/>
+ <link field="circulation_checkins" reltype="has_many"
+ key="checkin_workstation" map="" class="circ"/>
+ </links>
+ </class>
+
+
+
+ This field includes an oils_persist:virtual attribute with the value of
+ true, meaning that the linked class circ is a virtual class.
+
+
+
+
+ The <links> element contains 0 or more <link> elements.
+
+
+
+
+ Each <link> element defines the field (field) that links to a different
+ class (class), the relationship (reltype) between this field and the target
+ field (key). If the field in this class links to a virtual class, the (map)
+ attribute defines the field in the target class that returns a list of matching
+ objects for each object in this class.
+
+
+
+
+
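The IDL fragments above are plain XML, so any XML library can inspect them. As a rough sketch (not Evergreen's own tooling, and with the namespace prefixes from fm_IDL.xml omitted for brevity), the field order that governs JSON array serialization can be read like this:

```python
# Illustrative sketch only: reads a trimmed, namespace-free copy of the
# clm class definition shown above and lists its fields in IDL order.
import xml.etree.ElementTree as ET

IDL_SNIPPET = """
<class id="clm">
  <fields primary="code">
    <field name="code"/>
    <field name="value"/>
  </fields>
</class>
"""

def field_names(idl_xml):
    """Return the field names for a class definition, in document order."""
    root = ET.fromstring(idl_xml)
    return [f.get("name") for f in root.find("fields")]

# The position of each name here is the position of that property in the
# serialized JSON array for a clm object.
print(field_names(IDL_SNIPPET))
```

Because objects are serialized as positional arrays, preserving this document order is what lets a client map array slots back to named properties.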
+ <literal>open-ils.cstore</literal> data access interfaces
+ For each class documented in the IDL, the open-ils.cstore service
+ automatically generates a set of data access methods, based on the
+ oils_persist:tablename class attribute.
+ For example, for the class hint clm, cstore generates the following
+ methods with the config.language_map qualifier:
+
+
+
+ open-ils.cstore.direct.config.language_map.id_list {"code": { "like": "e%" } }
+
+ Retrieves a list composed only of the IDs that match the query.
+
+
+
+ open-ils.cstore.direct.config.language_map.retrieve "eng"
+
+ Retrieves the object that matches a specific ID.
+
+
+
+ open-ils.cstore.direct.config.language_map.search {"code" : "eng"}
+
+ Retrieves a list of objects that match the query.
+
+
+
+ open-ils.cstore.direct.config.language_map.create <_object_>
+
+ Creates a new object from the passed in object.
+
+
+
+ open-ils.cstore.direct.config.language_map.update <_object_>
+
+ Updates the object that has been passed in.
+
+
+
+ open-ils.cstore.direct.config.language_map.delete "eng"
+
+ Deletes the object that matches the query.
+
+
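Because the method names follow a mechanical pattern, they can be derived from the table name alone. A small illustrative sketch (the helper function is hypothetical, not part of Evergreen):

```python
# Sketch: how the open-ils.cstore.direct.* method names documented above
# are formed from a class's oils_persist:tablename value.
def cstore_methods(tablename):
    """Return the six generated cstore method names for a table."""
    base = "open-ils.cstore.direct." + tablename
    return [base + "." + op
            for op in ("id_list", "retrieve", "search",
                       "create", "update", "delete")]

for method in cstore_methods("config.language_map"):
    print(method)
```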
+
+ open-ils.pcrud data access interfaces + For each class documented in the IDL, the open-ils.pcrud service + automatically generates a set of data access methods, based on the + oils_persist:tablename class attribute. + For example, for the class hint clm, open-ils.pcrud generates the following + methods that parallel the open-ils.cstore interface: + + + + open-ils.pcrud.id_list.clm <_authtoken_>, { "code": { "like": "e%" } } + + + + + open-ils.pcrud.retrieve.clm <_authtoken_>, "eng" + + + + + open-ils.pcrud.search.clm <_authtoken_>, { "code": "eng" } + + + + + open-ils.pcrud.create.clm <_authtoken_>, <_object_> + + + + + open-ils.pcrud.update.clm <_authtoken_>, <_object_> + + + + + open-ils.pcrud.delete.clm <_authtoken_>, "eng" + + + +
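The pcrud names are just as mechanical, but invert the cstore pattern: the verb comes first and the class hint last, and every call takes an authtoken as its first argument. A hypothetical sketch (not Evergreen code):

```python
# Sketch: the verb-first naming pattern of open-ils.pcrud, with the
# authtoken leading the argument list. The token value is a placeholder.
def pcrud_method(verb, class_hint):
    """Return the generated pcrud method name for a verb and class hint."""
    return "open-ils.pcrud.%s.%s" % (verb, class_hint)

# A search request would pair the method name with its arguments:
request = (pcrud_method("search", "clm"),
           ["<authtoken>", {"code": "eng"}])
print(request[0])
```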
+
+ Transaction and savepoint control + Both open-ils.cstore and open-ils.pcrud enable you to control database transactions + to ensure that a set of operations either all succeed, or all fail, + atomically: + + + + open-ils.cstore.transaction.begin + + + + + open-ils.cstore.transaction.commit + + + + + open-ils.cstore.transaction.rollback + + + + + open-ils.pcrud.transaction.begin + + + + + open-ils.pcrud.transaction.commit + + + + + open-ils.pcrud.transaction.rollback + + + + At a more granular level, open-ils.cstore and open-ils.pcrud enable you to set database + savepoints to ensure that a set of operations either all succeed, or all + fail, atomically, within a given transaction: + + + + open-ils.cstore.savepoint.begin + + + + + open-ils.cstore.savepoint.commit + + + + + open-ils.cstore.savepoint.rollback + + + + + open-ils.pcrud.savepoint.begin + + + + + open-ils.pcrud.savepoint.commit + + + + + open-ils.pcrud.savepoint.rollback + + + + Transactions and savepoints must be performed within a stateful + connection to the open-ils.cstore and open-ils.pcrud services. + In srfsh, you can open a stateful connection using the open + command, and then close the stateful connection using the close + command - for example: + srfsh# open open-ils.cstore + ... perform various transaction-related work + srfsh# close open-ils.cstore + + JSON Queries + Beyond simply retrieving objects by their ID using the \*.retrieve + methods, you can issue queries against the \*.delete and \*.search + methods using JSON to filter results with simple or complex search + conditions. 
+ For example, to generate a list of barcodes that are held in a + copy location that allows holds and is visible in the OPAC: + srfsh# request open-ils.cstore open-ils.cstore.json_query + {"select": {"acp":["barcode"], "acpl":["name"]}, + "from": {"acp":"acpl"}, + "where": [ + {"+acpl": "holdable"}, + {"+acpl": "opac_visible"} + ]} + + Received Data: { + "barcode":"BARCODE1", + "name":"Stacks" + } + + Received Data: { + "barcode":"BARCODE2", + "name":"Stacks" + } + + + + Invoke the json_query service. + + + + + Select the barcode field from the acp class and the name + field from the acpl class. + + + + + Join the acp class to the acpl class based on the linked field + defined in the IDL. + + + + + Add a where clause to filter the results. We have more than one + condition beginning with the same key, so we wrap the conditions inside + an array. + + + + + The first condition tests whether the boolean value of the holdable + field on the acpl class is true. + + + + + The second condition tests whether the boolean value of the + opac_visible field on the acpl class is true. + + + + For thorough coverage of the breadth of support offered by JSON + query syntax, see JSON Queries: A Tutorial. + + + Fleshing linked objects + A simplistic approach to retrieving a set of objects that are linked to + an object that you are retrieving - for example, a set of call numbers + linked to the barcodes that a given user has borrowed - would be to: + 1. Retrieve the list of circulation objects (circ class) + for a given user (usr class). + 2. For each circulation object, look up the target copy (target_copy + field, linked to the acp class). + 3. For each copy, look up the call number for that copy (call_number + field, linked to the acn class). + However, this would result in potentially hundreds of round-trip + queries from the client to the server. Even with low-latency connections, + the network overhead would be considerable. 
So, built into the open-ils.cstore and
+ open-ils.pcrud access methods is the ability to flesh linked fields -
+ that is, rather than return an identifier to a given linked field,
+ the method can return the entire object as part of the initial response.
+ Most of the interfaces that return class instances from the IDL offer the
+ ability to flesh returned fields. For example, the
+ open-ils.cstore.direct.*.retrieve methods allow you to specify a
+ JSON structure defining the fields you wish to flesh in the returned object.
+ Fleshing fields in objects returned by <literal>open-ils.cstore</literal>
+ srfsh# request open-ils.cstore open-ils.cstore.direct.asset.copy.retrieve 1, \
+ {
+ "flesh": 1,
+ "flesh_fields": {
+ "acp": ["location"]
+ }
+ }
+
+
+
+
+ The flesh argument is the depth at which objects should be fleshed.
+ For example, to flesh out a field that links to another object that includes
+ a field that links to another object, you would specify a depth of 2.
+
+
+
+
+ The flesh_fields argument contains a list of objects with the fields
+ to flesh for each object.
+
+
+
+ Let’s flesh things a little deeper. In addition to the copy location,
+ let’s also flesh the call number attached to the copy, and then flesh
+ the bibliographic record attached to the call number.
+ Fleshing fields in fields of objects returned by <literal>open-ils.cstore</literal>
+ request open-ils.cstore open-ils.cstore.direct.asset.copy.retrieve 1, \
+ {
+ "flesh": 2,
+ "flesh_fields": {
+ "acp": ["location", "call_number"],
+ "acn": ["record"]
+ }
+ }
+
+
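The flesh argument is ordinary JSON, so a client in any language can build it programmatically. A short sketch (field names taken from the srfsh example above; the request itself is not sent anywhere):

```python
# Sketch: constructing the depth-2 flesh specification shown above.
# Depth 2 allows acp -> call_number -> record to come back in one response.
import json

flesh_spec = {
    "flesh": 2,
    "flesh_fields": {
        "acp": ["location", "call_number"],
        "acn": ["record"],
    },
}

# This JSON string is what would accompany the retrieve call.
payload = json.dumps(flesh_spec)
print(payload)
```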
+
+ Adding an IDL entry for ResolverResolver + Most OpenSRF methods in Evergreen define their object interface in the + IDL. Without an entry in the IDL, the prospective caller of a given + method is forced to either call the method and inspect the returned + contents, or read the source to work out the structure of the JSON + payload. At this stage of the tutorial, we have not defined an entry + in the IDL to represent the object returned by the + open-ils.resolver.resolve_holdings method. It is time to complete + that task. + The open-ils.resolver service is unlike many of the other classes + defined in the IDL because its data is not stored in the Evergreen + database. Instead, the data is requested from an external Web service + and only temporarily cached in memcached. Fortunately, the IDL + enables us to represent this kind of class by setting the + oils_persist:virtual class attribute to true. + So, let’s add an entry to the IDL for the open-ils.resolver.resolve_holdings + service: + + And let’s make ResolverResolver.pm return an array composed of our new + rhr classes rather than raw JSON objects: + + Once we add the new entry to the IDL and copy the revised ResolverResolver.pm + Perl module to /openils/lib/perl5/OpenILS/Application/, we need to: + + + + Copy the updated IDL to both the /openils/conf/ and + /openils/var/web/reports/ directories. The Dojo approach to + parsing the IDL uses the IDL stored in the reports directory. + + + + + Restart the Perl services to make the new IDL visible to the services + and refresh the open-ils.resolver implementation + + + + + Rerun /openils/bin/autogen.sh to regenerate the JavaScript versions + of the IDL required by the HTTP translator and gateway. + + + + We also need to adjust our JavaScript client to use the nifty new + objects that open-ils.resolver.resolve_holdings now returns. 
+ The best approach is to use the support in Evergreen’s Dojo extensions + to generate the JavaScript classes directly from the IDL XML file. + Accessing classes defined in the IDL via Fieldmapper + + + + + + Load the Dojo core. + + + + + fieldmapper.AutoIDL reads /openils/var/reports/fm_IDL.xml to + generate a list of class properties. + + + + + fieldmapper.dojoData seems to provide a store for Evergreen data + accessed via Dojo. + + + + + fieldmapper.Fieldmapper converts the list of class properties into + actual classes. + + + + + fieldmapper.standardRequest invokes an OpenSRF method and returns + an array of objects. + + + + + The first argument to fieldmapper.standardRequest is an array + containing the OpenSRF service name and method name. + + + + + The second argument to fieldmapper.standardRequest is an array + containing the arguments to pass to the OpenSRF method. + + + + + As Fieldmapper has instantiated the returned objects based on their + class hints, we can invoke getter/setter methods on the objects. + + + +
+
+ This chapter was taken from Dan Scott's Developer Workshop, February 2010. +
+
diff --git a/1.6/development/introduction_to_sql.xml b/1.6/development/introduction_to_sql.xml new file mode 100644 index 0000000000..e6eceb3f49 --- /dev/null +++ b/1.6/development/introduction_to_sql.xml @@ -0,0 +1,2021 @@ + + + + Introduction to SQL for Evergreen Administrators + +
+ Introduction to SQL Databases
+
+ Introduction
+ Over time, the SQL database has become the standard method of storing,
+ retrieving, and processing raw data for applications. Ranging from embedded
+ databases such as SQLite and Apache Derby, to enterprise databases such as
+ Oracle and IBM DB2, any SQL database offers basic advantages to application
+ developers such as standard interfaces (Structured Query Language (SQL), Java
+ Database Connectivity (JDBC), Open Database Connectivity (ODBC), Perl Database
+ Independent Interface (DBI)), a standard conceptual model of data (tables,
+ fields, relationships, constraints, etc), performance in storing and retrieving
+ data, concurrent access, etc.
+ Evergreen is built on PostgreSQL, an open source SQL database that began as
+ POSTGRES at the University of California at Berkeley in 1986 as a research
+ project led by Professor Michael Stonebraker. A SQL interface was added to a
+ fork of the original POSTGRES Berkeley code in 1994, and in 1996 the project
+ was renamed PostgreSQL.
+
+
+ Tables
+ The table is the cornerstone of a SQL database. Conceptually, a database table
+ is similar to a single sheet in a spreadsheet: every table has one or more
+ columns, with each row in the table containing values for each column. Each
+ column in a table defines an attribute corresponding to a particular data type.
+ We’ll insert a row into a table, then display the resulting contents. Don’t
+ worry if the INSERT statement is completely unfamiliar; we’ll talk more about
+ the syntax of the insert statement later. 
+ <literal>actor.usr_note</literal> database table
+ evergreen=# INSERT INTO actor.usr_note (usr, creator, pub, title, value)
+ VALUES (1, 1, TRUE, 'Who is this guy?', 'He''s the administrator!');
+
+ evergreen=# select id, usr, creator, pub, title, value from actor.usr_note;
+ id | usr | creator | pub | title | value
+ ----+-----+---------+-----+------------------+-------------------------
+ 1 | 1 | 1 | t | Who is this guy? | He's the administrator!
+ (1 row)
+
+ PostgreSQL supports table inheritance, which lets you define tables that
+ inherit the column definitions of a given parent table. A search of the data in
+ the parent table includes the data in the child tables. Evergreen uses table
+ inheritance: for example, the action.circulation table is a child of the
+ money.billable_xact table, and the money.*_payment tables all inherit from
+ the money.payment parent table.
+
+
+ Schemas
+ PostgreSQL, like most SQL databases, supports the use of schema names to group
+ collections of tables and other database objects together. You might think of
+ schemas as namespaces if you’re a programmer; or you might think of the schema
+ / table / column relationship like the area code / exchange / local number
+ structure of a telephone number.
+
+ Examples: database object names
+
+
+
+
+
+
+
+
+
+ Full name
+ Schema name
+ Table name
+ Field name
+
+
+
+
+ actor.usr_note.title
+ actor
+ usr_note
+ title
+
+
+ biblio.record_entry.marc
+ biblio
+ record_entry
+ marc
+
+
+
+ The default schema name in PostgreSQL is public, so if you do not specify a
+ schema name when creating or accessing a database object, PostgreSQL will use
+ the public schema. As a result, you might not find the object that you’re
+ looking for if you don’t use the appropriate schema.
+ Example: Creating a table without a specific schema
+ evergreen=# CREATE TABLE foobar (foo TEXT, bar TEXT);
+ CREATE TABLE
+ evergreen=# \d foobar
+ Table "public.foobar"
+ Column | Type | Modifiers
+ --------+------+-----------
+ foo | text |
+ bar | text |
+
+ Example: Trying to access an unqualified table outside of the public schema
+ evergreen=# SELECT * FROM usr_note;
+ ERROR: relation "usr_note" does not exist
+ LINE 1: SELECT * FROM usr_note;
+ ^
+
+ Evergreen uses schemas to organize all of its tables with mostly intuitive,
+ if short, schema names. Here’s the current (as of 2010-01-03) list of schemas
+ used by Evergreen:
+
+ Evergreen schema names
+
+
+
+
+
+
+
+ Schema name
+ Description
+
+
+
+
+ acq
+ Acquisitions
+
+
+ action
+ Circulation actions
+
+
+ action_trigger
+ Event mechanisms
+
+
+ actor
+ Evergreen users and organization units
+
+
+ asset
+ Call numbers and copies
+
+
+ auditor
+ Track history of changes to selected tables
+
+
+ authority
+ Authority records
+
+
+ biblio
+ Bibliographic records
+
+
+ booking
+ Resource bookings
+
+
+ config
+ Evergreen configurable options
+
+
+ container
+ Buckets for records, call numbers, copies, and users
+
+
+ extend_reporter
+ Extra views for report definitions
+
+
+ metabib
+ Metadata about bibliographic records
+
+
+ money
+ Fines and bills
+
+
+ offline
+ Offline transactions
+
+
+ permission
+ User permissions
+
+
+ query
+ Stored SQL statements
+
+
+ reporter
+ Report definitions
+
+
+ search
+ Search functions
+
+
+ serial
+ Serial MFHD records
+
+
+ stats
+ Convenient views of circulation and asset statistics
+
+
+ vandelay
+ MARC batch importer and exporter
+
+
+
+ The term schema has two meanings in the world of SQL databases. We have + discussed the schema as a conceptual grouping of tables and other database + objects within a given namespace; for example, "the actor schema contains the + tables and functions related to users and organizational units". Another common + usage of schema is to refer to the entire data model for a given database; + for example, "the Evergreen database schema". +
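Schema-qualified names are easy to experiment with even without a PostgreSQL install. The stand-in sketch below uses Python's bundled sqlite3 module rather than PostgreSQL, with ATTACH imitating the actor schema qualifier (sqlite has no real schemas, so this is only an analogy):

```python
# Stand-in sketch, NOT PostgreSQL: sqlite3's ATTACH gives a
# database-qualified name comparable to PostgreSQL's schema.table.column.
import sqlite3

conn = sqlite3.connect(":memory:")
# "actor" here plays the role of the Evergreen schema name.
conn.execute("ATTACH DATABASE ':memory:' AS actor")
conn.execute("CREATE TABLE actor.usr_note (id INTEGER, title TEXT)")
conn.execute("INSERT INTO actor.usr_note VALUES (1, 'Who is this guy?')")

# The fully qualified name finds the table; a bare usr_note would not
# exist in the default (main) namespace.
row = conn.execute(
    "SELECT title FROM actor.usr_note WHERE id = 1").fetchone()
print(row[0])
```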
+ + Columns + Each column definition consists of: + + + + a data type + + + + + (optionally) a default value to be used whenever a row is inserted that + does not contain a specific value + + + + + (optionally) one or more constraints on the values beyond data type + + + + Although PostgreSQL supports dozens of data types, Evergreen makes our life + easier by only using a handful. + + PostgreSQL data types used by Evergreen + + + + + + + + + Type name + Description + Limits + + + + + INTEGER + Medium integer + -2147483648 to +2147483647 + + + BIGINT + Large integer + -9223372036854775808 to 9223372036854775807 + + + SERIAL + Sequential integer + 1 to 2147483647 + + + BIGSERIAL + Large sequential integer + 1 to 9223372036854775807 + + + TEXT + Variable length character data + Unlimited length + + + BOOL + Boolean + TRUE or FALSE + + + TIMESTAMP WITH TIME ZONE + Timestamp + 4713 BC to 294276 AD + + + TIME + Time + Expressed in HH:MM:SS + + + NUMERIC(precision, scale) + Decimal + Up to 1000 digits of precision. In Evergreen mostly used for money + values, with a precision of 6 and a scale of 2 (####.##). + + + +
+ Full details about these data types are available from the + data types section of + the PostgreSQL manual. +
+ + Constraints + + Prevent NULL values + A column definition may include the constraint NOT NULL to prevent NULL + values. In PostgreSQL, a NULL value is not the equivalent of zero or false or + an empty string; it is an explicit non-value with special properties. We’ll + talk more about how to work with NULL values when we get to queries. + + + Primary key + Every table can have at most one primary key. A primary key consists of one or + more columns which together uniquely identify each row in a table. If you + attempt to insert a row into a table that would create a duplicate or NULL + primary key entry, the database rejects the row and returns an error. + Natural primary keys are drawn from the intrinsic properties of the data being + modelled. For example, some potential natural primary keys for a table that + contains people would be: + + Example: Some potential natural primary keys for a table of people + + + + + + + + + Natural key + Pros + Cons + + + + + First name, last name, address + No two people with the same name would ever live at the same address, right? + Lots of columns force data duplication in referencing tables + + + SSN or driver’s license + These are guaranteed to be unique + Lots of people don’t have an SSN or a driver’s license + + + +
+ To avoid problems with natural keys, many applications instead define surrogate + primary keys. A surrogate primary keys is a column with an autoincrementing + integer value added to a table definition that ensures uniqueness. + Evergreen uses surrogate keys (a column named id with a SERIAL data type) + for most of its tables. +
+ + Foreign keys + Every table can contain zero or more foreign keys: one or more columns that + refer to the primary key of another table. + For example, let’s consider Evergreen’s modelling of the basic relationship + between copies, call numbers, and bibliographic records. Bibliographic records + contained in the biblio.record_entry table can have call numbers attached to + them. Call numbers are contained in the asset.call_number table, and they can + have copies attached to them. Copies are contained in the asset.copy table. + + Example: Evergreen’s copy / call number / bibliographic record relationships + + + + + + + + + + Table + Primary key + Column with a foreign key + Points to + + + + + asset.copy + asset.copy.id + asset.copy.call_number + asset.call_number.id + + + asset.call_number + asset.call_number.id + asset.call_number.record + biblio.record_entry.id + + + biblio.record_entry + biblio.record_entry.id + + + + + +
+
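The copy / call number relationship above can be reproduced in miniature. This stand-in sketch uses sqlite3 rather than PostgreSQL, with simplified hypothetical table names; with enforcement switched on, a copy pointing at a nonexistent call number is rejected:

```python
# Stand-in sketch, NOT PostgreSQL: a foreign key from copy.call_number
# to call_number.id, mirroring asset.copy -> asset.call_number above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite needs this opt-in
conn.execute("CREATE TABLE call_number (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE copy (
    id INTEGER PRIMARY KEY,
    call_number INTEGER NOT NULL REFERENCES call_number (id))""")

conn.execute("INSERT INTO call_number (id) VALUES (1)")
conn.execute("INSERT INTO copy VALUES (1, 1)")        # parent row exists: OK
try:
    conn.execute("INSERT INTO copy VALUES (2, 99)")   # no call number 99
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```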
+ + Check constraints + PostgreSQL enables you to define rules to ensure that the value to be inserted + or updated meets certain conditions. For example, you can ensure that an + incoming integer value is within a specific range, or that a ZIP code matches a + particular pattern. + +
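A minimal stand-in sketch of a check constraint in action (sqlite3 rather than PostgreSQL; the table and the rule are hypothetical examples, not Evergreen tables):

```python
# Stand-in sketch, NOT PostgreSQL: a CHECK constraint ensuring an
# incoming integer value is within range, as described above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    copies INTEGER CHECK (copies >= 0))""")

conn.execute("INSERT INTO item VALUES (1, 3)")       # satisfies the check
try:
    conn.execute("INSERT INTO item VALUES (2, -1)")  # violates the check
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```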
+ + Deconstructing a table definition statement + The actor.org_address table is a simple table in the Evergreen schema that + we can use as a concrete example of many of the properties of databases that + we have discussed so far. + CREATE TABLE actor.org_address ( + id SERIAL PRIMARY KEY, + valid BOOL NOT NULL DEFAULT TRUE, + address_type TEXT NOT NULL DEFAULT 'MAILING', + org_unit INT NOT NULL REFERENCES actor.org_unit (id) + DEFERRABLE INITIALLY DEFERRED, + street1 TEXT NOT NULL, + street2 TEXT, + city TEXT NOT NULL, + county TEXT, + state TEXT NOT NULL, + country TEXT NOT NULL, + post_code TEXT NOT NULL + ); + + + + The column named id is defined with a special data type of SERIAL; if + given no value when a row is inserted into a table, the database automatically + generates the next sequential integer value for the column. SERIAL is a + popular data type for a primary key because it is guaranteed to be unique - and + indeed, the constraint for this column identifies it as the PRIMARY KEY. + + + + + The data type BOOL defines a boolean value: TRUE or FALSE are the only + acceptable values for the column. The constraint NOT NULL instructs the + database to prevent the column from ever containing a NULL value. The column + property DEFAULT TRUE instructs the database to automatically set the value + of the column to TRUE if no value is provided. + + + + + The data type TEXT defines a text column of practically unlimited length. + As with the previous column, there is a NOT NULL constraint, and a default + value of 'MAILING' will result if no other value is supplied. + + + + + The REFERENCES actor.org_unit (id) clause indicates that this column has a + foreign key relationship to the actor.org_unit table, and that the value of + this column in every row in this table must have a corresponding value in the + id column in the referenced table (actor.org_unit). + + + + + The column named street2 demonstrates that not all columns have constraints + beyond data type. 
In this case, the column is allowed to be NULL or to contain a + TEXT value. + + + + + + Displaying a table definition using <literal>psql</literal> + The psql command-line interface is the preferred method for accessing + PostgreSQL databases. It offers features like tab-completion, readline support + for recalling previous commands, flexible input and output formats, and + is accessible via a standard SSH session. + If you press the Tab key once after typing one or more characters of the + database object name, psql automatically completes the name if there are no + other matches. If there are other matches for your current input, nothing + happens until you press the Tab key a second time, at which point psql + displays all of the matches for your current input. + To display the definition of a database object such as a table, issue the + command \d _object-name_. For example, to display the definition of the + actor.usr_note table: + $ psql evergreen + psql (8.4.1) + Type "help" for help. + + evergreen=# \d actor.usr_note + Table "actor.usr_note" + Column | Type | Modifiers + -------------+--------------------------+------------------------------------------------------------- + id | bigint | not null default nextval('actor.usr_note_id_seq'::regclass) + usr | bigint | not null + creator | bigint | not null + create_date | timestamp with time zone | default now() + pub | boolean | not null default false + title | text | not null + value | text | not null + Indexes: + "usr_note_pkey" PRIMARY KEY, btree (id) + "actor_usr_note_creator_idx" btree (creator) + "actor_usr_note_usr_idx" btree (usr) + Foreign-key constraints: + "usr_note_creator_fkey" FOREIGN KEY (creator) REFERENCES actor.usr(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED + "usr_note_usr_fkey" FOREIGN KEY (usr) REFERENCES actor.usr(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED + + evergreen=# \q + $ + + + + This is the most basic connection to a PostgreSQL database. 
You can use a + number of other flags to specify user name, hostname, port, and other options. + + + + + The \d command displays the definition of a database object. + + + + + The \q command quits the psql session and returns you to the shell prompt. + + + + +
+
+ Basic SQL queries + + The SELECT statement + The SELECT statement is the basic tool for retrieving information from a + database. The syntax for most SELECT statements is: +
+ SELECT [column(s)]
+ FROM [table(s)]
+ [WHERE condition(s)]
+ [GROUP BY column(s)]
+ [HAVING grouping-condition(s)]
+ [ORDER BY column(s)]
+ [LIMIT maximum-results]
+ [OFFSET start-at-result-#]
+ ;
+
+ For example, to select all of the columns for each row in the + actor.usr_address table, issue the following query: + SELECT * + FROM actor.usr_address + ; +
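The skeleton above can be exercised against any SQL database. A stand-in sketch using sqlite3 (not the Evergreen database) with a tiny hypothetical table standing in for actor.usr_address:

```python
# Stand-in sketch, NOT Evergreen: a three-column usr_address-like table
# so the SELECT skeleton above can be run end to end.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usr_address (city TEXT, county TEXT, state TEXT)")
conn.executemany("INSERT INTO usr_address VALUES (?, ?, ?)", [
    ("New Haven", "New Haven", "Connecticut"),
    ("Hartford", "Hartford", "Connecticut"),
])

# SELECT * returns every column of every matching row.
rows = conn.execute(
    "SELECT * FROM usr_address ORDER BY city").fetchall()
for row in rows:
    print(row)
```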
+ + Selecting particular columns from a table + SELECT * returns all columns from all of the tables included in your query. + However, quite often you will want to return only a subset of the possible + columns. You can retrieve specific columns by listing the names of the columns + you want after the SELECT keyword. Separate each column name with a comma. + For example, to select just the city, county, and state from the + actor.usr_address table, issue the following query: + SELECT city, county, state + FROM actor.usr_address + ; + + + Sorting results with the ORDER BY clause + By default, a SELECT statement returns rows matching your query with no + guarantee of any particular order in which they are returned. To force + the rows to be returned in a particular order, use the ORDER BY clause + to specify one or more columns to determine the sorting priority of the + rows. + For example, to sort the rows returned from your actor.usr_address query by + city, with county and then zip code as the tie breakers, issue the + following query: + SELECT city, county, state + FROM actor.usr_address + ORDER BY city, county, post_code + ; + + + Filtering results with the WHERE clause + Thus far, your results have been returning all of the rows in the table. + Normally, however, you would want to restrict the rows that are returned to the + subset of rows that match one or more conditions of your search. The WHERE + clause enables you to specify a set of conditions that filter your query + results. Each condition in the WHERE clause is an SQL expression that returns + a boolean (true or false) value. + For example, to restrict the results returned from your actor.usr_address + query to only those rows containing a state value of Connecticut, issue the + following query: + SELECT city, county, state + FROM actor.usr_address + WHERE state = 'Connecticut' + ORDER BY city, county, post_code + ; + You can include more conditions in the WHERE clause with the OR and AND + operators. 
For example, to further restrict the results returned from your + actor.usr_address query to only those rows where the state column contains a + value of Connecticut and the city column contains a value of Hartford, + issue the following query: + SELECT city, county, state + FROM actor.usr_address + WHERE state = 'Connecticut' + AND city = 'Hartford' + ORDER BY city, county, post_code + ; + To return rows where the state is Connecticut and the city is Hartford or + New Haven, you must use parentheses to explicitly group the city value + conditions together, or else the database will evaluate the OR city = 'New + Haven' clause entirely on its own and match all rows where the city column is + New Haven, even though the state might not be Connecticut. + Trouble with OR + SELECT city, county, state + FROM actor.usr_address + WHERE state = 'Connecticut' + AND city = 'Hartford' OR city = 'New Haven' + ORDER BY city, county, post_code + ; + + -- Can return unwanted rows because the OR is not grouped! + + Grouped OR’ed conditions + SELECT city, county, state + FROM actor.usr_address + WHERE state = 'Connecticut' + AND (city = 'Hartford' OR city = 'New Haven') + ORDER BY city, county, post_code + ; + + -- The parentheses ensure that the OR is applied to the cities, and the + -- state in either case must be 'Connecticut' + + + Comparison operators + Here is a partial list of comparison operators that are commonly used in + WHERE clauses: + + Comparing two scalar values + + + + x = y (equal to) + + + + + x != y (not equal to) + + + + + x < y (less than) + + + + + x > y (greater than) + + + + + x LIKE y (TEXT value x matches a subset of TEXT y, where y is a string that + can contain % as a wildcard for 0 or more characters, and _ as a wildcard + for a single character. 
For example, WHERE 'all you can eat fish and chips
+ and a big stick' LIKE '%fish%stick' would return TRUE)
+
+
+
+
+ x ILIKE y (like LIKE, but the comparison ignores upper-case / lower-case)
+
+
+
+
+ x IN y (x is in the list of values y, where y can be a list or a SELECT
+ statement that returns a list)
+
+
+
+
+
+
+
+ NULL values
+ SQL databases have a special way of representing the value of a column that has
+ no value: NULL. A NULL value is not equal to zero, and is not an empty
+ string; it is not equal to anything, not even another NULL, because it has
+ no value that can be compared.
+ To return rows from a table where a given column is not NULL, use the
+ IS NOT NULL comparison operator.
+ Retrieving rows where a column is not <literal>NULL</literal>
+ SELECT id, first_given_name, family_name
+ FROM actor.usr
+ WHERE second_given_name IS NOT NULL
+ ;
+
+ Similarly, to return rows from a table where a given column is NULL, use
+ the IS NULL comparison operator.
+ Retrieving rows where a column is <literal>NULL</literal>
+ SELECT id, first_given_name, second_given_name, family_name
+ FROM actor.usr
+ WHERE second_given_name IS NULL
+ ;
+
+ id | first_given_name | second_given_name | family_name
+ ----+------------------+-------------------+----------------
+ 1 | Administrator | | System Account
+ (1 row)
+
+ Notice that the NULL value in the output is displayed as empty space,
+ indistinguishable from an empty string; this is the default display method in
+ psql. You can change the behaviour of psql using the \pset command:
+ Changing the way <literal>NULL</literal> values are displayed in <literal>psql</literal>
+ evergreen=# \pset null '(null)'
+ Null display is '(null)'. 
+
+ SELECT id, first_given_name, second_given_name, family_name
+ FROM actor.usr
+ WHERE second_given_name IS NULL
+ ;
+
+ id | first_given_name | second_given_name | family_name
+ ----+------------------+-------------------+----------------
+ 1 | Administrator | (null) | System Account
+ (1 row)
+
+ Database queries within programming languages such as Perl and C have
+ special methods of checking for NULL values in returned results.
+
+
+ Text delimiter: '
+ You might have noticed that we have been using the ' character to delimit
+ TEXT values, as well as other values, such as dates and times, that are
+ represented as TEXT. Sometimes, however, your TEXT value itself contains a
+ ' character, such as the word you’re. To prevent the database from
+ prematurely ending the TEXT value at the first ' character and returning a
+ syntax error, escape the ' character by writing two ' characters in a row.
+ For example, to change the last name of a user in the actor.usr table to
+ L’estat, issue the following SQL:
+ Escaping <literal>'</literal> in TEXT values
+ UPDATE actor.usr
+ SET family_name = 'L''estat'
+ WHERE profile IN (
+ SELECT id
+ FROM permission.grp_tree
+ WHERE name = 'Vampire'
+ )
+ ;
+
+ When you retrieve the row from the database, the value is displayed with just
+ a single ' character:
+ SELECT id, family_name
+ FROM actor.usr
+ WHERE family_name = 'L''estat'
+ ;
+
+ id | family_name
+ ----+-------------
+ 1 | L'estat
+ (1 row)
+
+
+ Grouping and eliminating results with the GROUP BY and HAVING clauses
+ The GROUP BY clause returns a unique set of results for the desired columns.
+ This is most often used in conjunction with an aggregate function to present
+ results for a range of values in a single query, rather than requiring you to
+ issue one query per target value. 
+ Returning unique results of a single column with <literal>GROUP BY</literal> + SELECT grp + FROM permission.grp_perm_map + GROUP BY grp + ORDER BY grp; + + grp + -----+ + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 10 + (8 rows) + + While GROUP BY can be useful for a single column, it is more often used + to return the distinct results across multiple columns. For example, the + following query shows us which groups have permissions at each depth in + the library hierarchy: + Returning unique results of multiple columns with <literal>GROUP BY</literal> + SELECT grp, depth + FROM permission.grp_perm_map + GROUP BY grp, depth + ORDER BY depth, grp; + + grp | depth + -----+------- + 1 | 0 + 2 | 0 + 3 | 0 + 4 | 0 + 5 | 0 + 10 | 0 + 3 | 1 + 4 | 1 + 5 | 1 + 6 | 1 + 7 | 1 + 10 | 1 + 3 | 2 + 4 | 2 + 10 | 2 + (15 rows) + + Extending this further, you can use the COUNT() aggregate function to + also return the number of times each unique combination of grp and depth + appears in the table. Yes, this is a sneak peek at the use of aggregate + functions! Keeners. + Counting unique column combinations with <literal>GROUP BY</literal> + SELECT grp, depth, COUNT(grp) + FROM permission.grp_perm_map + GROUP BY grp, depth + ORDER BY depth, grp; + + grp | depth | count + -----+-------+------- + 1 | 0 | 6 + 2 | 0 | 2 + 3 | 0 | 45 + 4 | 0 | 3 + 5 | 0 | 5 + 10 | 0 | 1 + 3 | 1 | 3 + 4 | 1 | 4 + 5 | 1 | 1 + 6 | 1 | 9 + 7 | 1 | 5 + 10 | 1 | 10 + 3 | 2 | 24 + 4 | 2 | 8 + 10 | 2 | 7 + (15 rows) + + You can use the WHERE clause to restrict the returned results before grouping + is applied to the results. The following query restricts the results to those + rows that have a depth of 0. 
+
+ Using the <literal>WHERE</literal> clause with <literal>GROUP BY</literal>
+ SELECT grp, COUNT(grp)
+ FROM permission.grp_perm_map
+ WHERE depth = 0
+ GROUP BY grp
+ ORDER BY 2 DESC
+ ;
+
+ grp | count
+ -----+-------
+ 3 | 45
+ 1 | 6
+ 5 | 5
+ 4 | 3
+ 2 | 2
+ 10 | 1
+ (6 rows)
+
+ To restrict results after grouping has been applied to the rows, use the
+ HAVING clause; this is typically used to restrict results based on
+ a comparison to the value returned by an aggregate function. For example,
+ the following query restricts the returned rows to those that have more than
+ 5 occurrences of the same value for grp in the table.
+ <literal>GROUP BY</literal> restricted by a <literal>HAVING</literal> clause
+ SELECT grp, COUNT(grp)
+ FROM permission.grp_perm_map
+ GROUP BY grp
+ HAVING COUNT(grp) > 5
+ ;
+
+ grp | count
+ -----+-------
+ 6 | 9
+ 4 | 15
+ 5 | 6
+ 1 | 6
+ 3 | 72
+ 10 | 18
+ (6 rows)
+
+
+ Eliminating duplicate results with the DISTINCT keyword
+ GROUP BY is one way of eliminating duplicate results from the rows returned
+ by your query. The DISTINCT keyword serves the same narrow purpose: it
+ simply removes duplicate rows from the results of your query. It is a
+ blunter instrument than GROUP BY, but it works and it is easy - so if
+ you just want a quick list of the unique set of values for a column or set of
+ columns, the DISTINCT keyword might be appropriate.
+ On the other hand, if you are getting duplicate rows back when you don’t expect
+ them, then applying the DISTINCT keyword might be a sign that you are
+ papering over a real problem. 
+
+ Returning unique results of multiple columns with <literal>DISTINCT</literal>
+ SELECT DISTINCT grp, depth
+ FROM permission.grp_perm_map
+ ORDER BY depth, grp
+ ;
+
+ grp | depth
+ -----+-------
+ 1 | 0
+ 2 | 0
+ 3 | 0
+ 4 | 0
+ 5 | 0
+ 10 | 0
+ 3 | 1
+ 4 | 1
+ 5 | 1
+ 6 | 1
+ 7 | 1
+ 10 | 1
+ 3 | 2
+ 4 | 2
+ 10 | 2
+ (15 rows)
+
+
+ Paging through results with the LIMIT and OFFSET clauses
+ The LIMIT clause restricts the total number of rows returned from your query
+ and is useful if you just want to list a subset of a large number of rows. For
+ example, in the following query we list the five most frequently used
+ circulation modifiers:
+ Using the <literal>LIMIT</literal> clause to restrict results
+ SELECT circ_modifier, COUNT(circ_modifier)
+ FROM asset.copy
+ GROUP BY circ_modifier
+ ORDER BY 2 DESC
+ LIMIT 5
+ ;
+
+ circ_modifier | count
+ ---------------+--------
+ CIRC | 741995
+ BOOK | 636199
+ SER | 265906
+ DOC | 191598
+ LAW MONO | 126627
+ (5 rows)
+
+ When you use the LIMIT clause to restrict the total number of rows returned
+ by your query, you can also use the OFFSET clause to determine which subset
+ of the rows will be returned. The use of the OFFSET clause assumes that
+ you’ve used the ORDER BY clause to impose order on the results.
+ In the following example, we use the OFFSET clause to get results 6 through
+ 10 from the same query that we previously executed.
+ Using the <literal>OFFSET</literal> clause to return a specific subset of rows
+ SELECT circ_modifier, COUNT(circ_modifier)
+ FROM asset.copy
+ GROUP BY circ_modifier
+ ORDER BY 2 DESC
+ LIMIT 5
+ OFFSET 5
+ ;
+
+ circ_modifier | count
+ ---------------+--------
+ LAW SERIAL | 102758
+ DOCUMENTS | 86215
+ BOOK_WEB | 63786
+ MFORM SER | 39917
+ REF | 34380
+ (5 rows)
+
+ 
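Together, LIMIT and OFFSET give you a simple paging formula: for a page size of n, page p starts at OFFSET (p - 1) * n. The following sketch, reusing the circulation modifier query from above, retrieves the third page of five results:

```sql
-- Page 3 with a page size of 5: OFFSET = (3 - 1) * 5 = 10,
-- so the query returns results 11 through 15.
SELECT circ_modifier, COUNT(circ_modifier)
FROM asset.copy
GROUP BY circ_modifier
ORDER BY 2 DESC
LIMIT 5
OFFSET 10
;
```

Keep in mind that the database still computes and sorts the full result set before discarding the skipped rows, so very large OFFSET values can be slow.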
+
+ Advanced SQL queries + + Transforming column values with functions + PostgreSQL includes many built-in functions for manipulating column data. + You can also create your own functions (and Evergreen does make use of + many custom functions). There are two types of functions used in + databases: scalar functions and aggregate functions. + + Scalar functions + Scalar functions transform each value of the target column. If your query + would return 50 values for a column in a given query, and you modify your + query to apply a scalar function to the values returned for that column, + it will still return 50 values. For example, the UPPER() function, + used to convert text values to upper-case, modifies the results in the + following set of queries: + Using the UPPER() scalar function to convert text values to upper-case + -- First, without the UPPER() function for comparison + SELECT shortname, name + FROM actor.org_unit + WHERE id < 4 + ; + + shortname | name + -----------+----------------------- + CONS | Example Consortium + SYS1 | Example System 1 + SYS2 | Example System 2 + (3 rows) + + -- Now apply the UPPER() function to the name column + SELECT shortname, UPPER(name) + FROM actor.org_unit + WHERE id < 4 + ; + + shortname | upper + -----------+-------------------- + CONS | EXAMPLE CONSORTIUM + SYS1 | EXAMPLE SYSTEM 1 + SYS2 | EXAMPLE SYSTEM 2 + (3 rows) + + There are so many scalar functions in PostgreSQL that we cannot cover them + all here, but we can list some of the most commonly used functions: + + + + || - concatenates two text values together + + + + + COALESCE() - returns the first non-NULL value from the list of arguments + + + + + LOWER() - returns a text value converted to lower-case + + + + + REPLACE() - returns a text value after replacing all occurrences of a given text value with a different text value + + + + + REGEXP_REPLACE() - returns a text value after being transformed by a regular expression + + + + + UPPER() - returns a text value 
converted to upper-case
+
+
+
+ For a complete list of scalar functions, see
+ the PostgreSQL function documentation.
+
+
+ Aggregate functions
+ Aggregate functions return a single value computed from the complete set of
+ values returned for the specified column.
+
+
+
+ AVG()
+
+
+
+
+ COUNT()
+
+
+
+
+ MAX()
+
+
+
+
+ MIN()
+
+
+
+
+ SUM()
+
+
+
+
+
+
+ Sub-selects
+ A sub-select is the technique of using the results of one query to feed
+ into another query. You can, for example, return a set of values from
+ one column in a SELECT statement to be used to satisfy the IN() condition
+ of another SELECT statement; or you could return the MAX() value of a
+ column in a SELECT statement to match the = condition of another SELECT
+ statement.
+ For example, in the following query we use a sub-select to restrict the copies
+ returned by the main SELECT statement to only those locations that have an
+ opac_visible value of TRUE:
+ Sub-select example
+ SELECT call_number
+ FROM asset.copy
+ WHERE deleted IS FALSE
+ AND location IN (
+ SELECT id
+ FROM asset.copy_location
+ WHERE opac_visible IS TRUE
+ )
+ ;
+
+ Sub-selects can be an approachable way of breaking down a problem that
+ requires matching values between different tables, and often result in
+ a clearly expressed solution to a problem. However, if you start writing
+ sub-selects within sub-selects, you should consider tackling the problem
+ with joins instead.
+
+
+ Joins
+ Joins enable you to access the values from multiple tables in both your query
+ results and your comparison conditions. For example, joins are what enable you
+ to relate a bibliographic record to a barcoded copy via the biblio.record_entry,
+ asset.call_number, and asset.copy tables. In this section, we discuss the
+ most common kind of join—the inner join—as well as the less common outer join
+ and some set operations which can compare and contrast the values returned by
+ separate queries. 
+ When we talk about joins, we are going to talk about the left-hand table and
+ the right-hand table that participate in the join. Every join brings together
+ just two tables - but you can use an unlimited (for our purposes) number
+ of joins in a single SQL statement. Each time you use a join, you effectively
+ create a new table, so when you add a second join clause to a statement,
+ table 1 and table 2 (which were the left-hand table and the right-hand table
+ for the first join) now act as a merged left-hand table and the new table
+ in the second join clause is the right-hand table.
+ Clear as mud? Okay, let’s look at some examples.
+
+ Inner joins
+ An inner join returns all of the columns from the left-hand table in the join,
+ with all of the columns from the right-hand table, for the rows that match a
+ condition in the ON clause. Typically, you use the = operator to match the
+ foreign key of the left-hand table with the primary key of the right-hand
+ table to follow the natural relationship between the tables.
+ In the following example, we return all of the columns from the actor.usr and
+ actor.org_unit tables, joined on the relationship between the user’s home
+ library and the library’s ID. Notice in the results that some columns, like
+ id and mailing_address, appear twice; this is because both the actor.usr
+ and actor.org_unit tables include columns with these names. This is also why
+ we have to fully qualify the column names in our queries with the schema and
+ table names.
+ A simple inner join
+ SELECT *
+ FROM actor.usr
+ INNER JOIN actor.org_unit ON actor.usr.home_ou = actor.org_unit.id
+ WHERE actor.org_unit.shortname = 'CONS'
+ ;
+
+ -[ RECORD 1 ]------------------+---------------------------------
+ id | 1
+ card | 1
+ profile | 1
+ usrname | admin
+ email |
+ ...
+ mailing_address |
+ billing_address |
+ home_ou | 1
+ ... 
+ claims_never_checked_out_count | 0 + id | 1 + parent_ou | + ou_type | 1 + ill_address | 1 + holds_address | 1 + mailing_address | 1 + billing_address | 1 + shortname | CONS + name | Example Consortium + email | + phone | + opac_visible | t + fiscal_calendar | 1 + + Of course, you do not have to return every column from the joined tables; + you can (and should) continue to specify only the columns that you want to + return. In the following example, we count the number of borrowers for + every user profile in a given library by joining the permission.grp_tree + table where profiles are defined against the actor.usr table, and then + joining the actor.org_unit table to give us access to the user’s home + library: + Borrower Count by Profile (Adult, Child, etc)/Library + SELECT permission.grp_tree.name, actor.org_unit.name, COUNT(permission.grp_tree.name) + FROM actor.usr + INNER JOIN permission.grp_tree + ON actor.usr.profile = permission.grp_tree.id + INNER JOIN actor.org_unit + ON actor.org_unit.id = actor.usr.home_ou + WHERE actor.usr.deleted IS FALSE + GROUP BY permission.grp_tree.name, actor.org_unit.name + ORDER BY actor.org_unit.name, permission.grp_tree.name + ; + + name | name | count + -------+--------------------+------- + Users | Example Consortium | 1 + (1 row) + + + + Aliases + So far we have been fully-qualifying all of our table names and column names to + prevent any confusion. This quickly gets tiring with lengthy qualified + table names like permission.grp_tree, so the SQL syntax enables us to assign + aliases to table names and column names. When you define an alias for a table + name, you can access its column throughout the rest of the statement by simply + appending the column name to the alias with a period; for example, if you assign + the alias au to the actor.usr table, you can access the actor.usr.id + column through the alias as au.id. 
+ The formal syntax for declaring an alias for a column is to follow the column + name in the result columns clause with AS alias. To declare an alias for a table name, + follow the table name in the FROM clause (including any JOIN statements) with + AS alias. However, the AS keyword is optional for tables (and columns as + of PostgreSQL 8.4), and in practice most SQL statements leave it out. For + example, we can write the previous INNER JOIN statement example using aliases + instead of fully-qualified identifiers: + Borrower Count by Profile (using aliases) + SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count" + FROM actor.usr au + INNER JOIN permission.grp_tree pgt + ON au.profile = pgt.id + INNER JOIN actor.org_unit aou + ON aou.id = au.home_ou + WHERE au.deleted IS FALSE + GROUP BY pgt.name, aou.name + ORDER BY aou.name, pgt.name + ; + + Profile | Library | Count + ---------+--------------------+------- + Users | Example Consortium | 1 + (1 row) + + A nice side effect of declaring an alias for your columns is that the alias + is used as the column header in the results table. The previous version of + the query, which didn’t use aliased column names, had two columns named + name; this version of the query with aliases results in a clearer + categorization. + + + Outer joins + An outer join returns all of the rows from one or both of the tables + participating in the join. + + + + For a LEFT OUTER JOIN, the join returns all of the rows from the left-hand + table and the rows matching the join condition from the right-hand table, with + NULL values for the rows with no match in the right-hand table. + + + + + A RIGHT OUTER JOIN behaves in the same way as a LEFT OUTER JOIN, with the + exception that all rows are returned from the right-hand table participating in + the join. 
+ + + + + For a FULL OUTER JOIN, the join returns all the rows from both the left-hand + and right-hand tables, with NULL values for the rows with no match in either + the left-hand or right-hand table. + + + + Base tables for the OUTER JOIN examples + SELECT * FROM aaa; + + id | stuff + ----+------- + 1 | one + 2 | two + 3 | three + 4 | four + 5 | five + (5 rows) + + SELECT * FROM bbb; + + id | stuff | foo + ----+-------+---------- + 1 | one | oneone + 2 | two | twotwo + 5 | five | fivefive + 6 | six | sixsix + (4 rows) + + Example of a LEFT OUTER JOIN + SELECT * FROM aaa + LEFT OUTER JOIN bbb ON aaa.id = bbb.id + ; + id | stuff | id | stuff | foo + ----+-------+----+-------+---------- + 1 | one | 1 | one | oneone + 2 | two | 2 | two | twotwo + 3 | three | | | + 4 | four | | | + 5 | five | 5 | five | fivefive + (5 rows) + + Example of a RIGHT OUTER JOIN + SELECT * FROM aaa + RIGHT OUTER JOIN bbb ON aaa.id = bbb.id + ; + id | stuff | id | stuff | foo + ----+-------+----+-------+---------- + 1 | one | 1 | one | oneone + 2 | two | 2 | two | twotwo + 5 | five | 5 | five | fivefive + | | 6 | six | sixsix + (4 rows) + + Example of a FULL OUTER JOIN + SELECT * FROM aaa + FULL OUTER JOIN bbb ON aaa.id = bbb.id + ; + id | stuff | id | stuff | foo + ----+-------+----+-------+---------- + 1 | one | 1 | one | oneone + 2 | two | 2 | two | twotwo + 3 | three | | | + 4 | four | | | + 5 | five | 5 | five | fivefive + | | 6 | six | sixsix + (6 rows) + + + + Self joins + It is possible to join a table to itself. You can, in fact you must, use + aliases to disambiguate the references to the table. + + + + Set operations + Relational databases are effectively just an efficient mechanism for + manipulating sets of values; they are implementations of set theory. There are + three operators for sets (tables) in which each set must have the same number + of columns with compatible data types: the union, intersection, and difference + operators. 
+ Base tables for the set operation examples + SELECT * FROM aaa; + + id | stuff + ----+------- + 1 | one + 2 | two + 3 | three + 4 | four + 5 | five + (5 rows) + + SELECT * FROM bbb; + + id | stuff | foo + ----+-------+---------- + 1 | one | oneone + 2 | two | twotwo + 5 | five | fivefive + 6 | six | sixsix + (4 rows) + + + Union + The UNION operator returns the distinct set of rows that are members of + either or both of the left-hand and right-hand tables. The UNION operator + does not return any duplicate rows. To return duplicate rows, use the + UNION ALL operator. + Example of a UNION set operation + -- The parentheses are not required, but are intended to help + -- illustrate the sets participating in the set operation + ( + SELECT id, stuff + FROM aaa + ) + UNION + ( + SELECT id, stuff + FROM bbb + ) + ORDER BY 1 + ; + + id | stuff + ----+------- + 1 | one + 2 | two + 3 | three + 4 | four + 5 | five + 6 | six + (6 rows) + + + + Intersection + The INTERSECT operator returns the distinct set of rows that are common to + both the left-hand and right-hand tables. To return duplicate rows, use the + INTERSECT ALL operator. + Example of an INTERSECT set operation + ( + SELECT id, stuff + FROM aaa + ) + INTERSECT + ( + SELECT id, stuff + FROM bbb + ) + ORDER BY 1 + ; + + id | stuff + ----+------- + 1 | one + 2 | two + 5 | five + (3 rows) + + + + Difference + The EXCEPT operator returns the rows in the left-hand table that do not + exist in the right-hand table. You are effectively subtracting the common + rows from the left-hand table. 
+ Example of an EXCEPT set operation + ( + SELECT id, stuff + FROM aaa + ) + EXCEPT + ( + SELECT id, stuff + FROM bbb + ) + ORDER BY 1 + ; + + id | stuff + ----+------- + 3 | three + 4 | four + (2 rows) + + -- Order matters: switch the left-hand and right-hand tables + -- and you get a different result + ( + SELECT id, stuff + FROM bbb + ) + EXCEPT + ( + SELECT id, stuff + FROM aaa + ) + ORDER BY 1 + ; + + id | stuff + ----+------- + 6 | six + (1 row) + + + + + Views + A view is a persistent SELECT statement that acts like a read-only table. + To create a view, issue the CREATE VIEW statement, giving the view a name + and a SELECT statement on which the view is built. + The following example creates a view based on our borrower profile count: + Creating a view + CREATE VIEW actor.borrower_profile_count AS + SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count" + FROM actor.usr au + INNER JOIN permission.grp_tree pgt + ON au.profile = pgt.id + INNER JOIN actor.org_unit aou + ON aou.id = au.home_ou + WHERE au.deleted IS FALSE + GROUP BY pgt.name, aou.name + ORDER BY aou.name, pgt.name + ; + + When you subsequently select results from the view, you can apply additional + WHERE clauses to filter the results, or ORDER BY clauses to change the + order of the returned rows. In the following examples, we issue a simple + SELECT * statement to show that the default results are returned in the + same order from the view as the equivalent SELECT statement would be returned. + Then we issue a SELECT statement with a WHERE clause to further filter the + results. + Selecting results from a view + SELECT * FROM actor.borrower_profile_count; + + Profile | Library | Count + ----------------------------+----------------------------+------- + Faculty | University Library | 208 + Graduate | University Library | 16 + Patrons | University Library | 62 + ... 
+
+ -- You can still filter your results with WHERE clauses
+ SELECT *
+ FROM actor.borrower_profile_count
+ WHERE "Profile" = 'Faculty';
+
+ Profile | Library | Count
+ ---------+----------------------------+-------
+ Faculty | University Library | 208
+ Faculty | College Library | 64
+ Faculty | College Library 2 | 102
+ Faculty | University Library 2 | 776
+ (4 rows)
+
+
+ Inheritance
+ PostgreSQL supports table inheritance: that is, a child table inherits its
+ base definition from a parent table, but can add additional columns to its
+ own definition. The data from any child tables is visible in queries against
+ the parent table.
+ Evergreen uses table inheritance in several areas:
+ * In the Vandelay MARC batch importer / exporter, Evergreen defines base
+ tables for generic queues and queued records, from which the authority record
+ and bibliographic record child tables are derived.
+ * Billable transactions are based on the money.billable_xact table;
+ child tables include action.circulation for circulation transactions
+ and money.grocery for general bills.
+ * Payments are based on the money.payment table; its child table is
+ money.bnm_payment (for brick-and-mortar payments), which in turn has child
+ tables of money.forgive_payment, money.work_payment, money.credit_payment,
+ money.goods_payment, and money.bnm_desk_payment. The
+ money.bnm_desk_payment table in turn has child tables of money.cash_payment,
+ money.check_payment, and money.credit_card_payment.
+ * Transits are based on the action.transit_copy table, which has a child
+ table of action.hold_transit_copy for transits initiated by holds.
+ * Generic acquisition line item attributes are defined by the
+ acq.lineitem_attr_definition table, which in turn has a number of child
+ tables to define MARC attributes, generated attributes, user attributes, and
+ provider attributes.
+ 
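The effect of inheritance is easy to see with a few queries. The following sketch assumes the money.billable_xact hierarchy described above; ONLY and the tableoid system column are standard PostgreSQL features for working with inherited tables:

```sql
-- Querying the parent table includes rows from all child tables,
-- such as action.circulation and money.grocery:
SELECT COUNT(*) FROM money.billable_xact;

-- Add the ONLY keyword to count just the rows created directly
-- in the parent table, excluding every child table:
SELECT COUNT(*) FROM ONLY money.billable_xact;

-- The hidden tableoid system column reveals which table each
-- row actually belongs to:
SELECT tableoid::regclass AS source_table, COUNT(*)
FROM money.billable_xact
GROUP BY tableoid::regclass;
```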
+
+ Understanding query performance with EXPLAIN + Some queries run for a long, long time. This can be the result of a poorly + written query—a query with a join condition that joins every + row in the biblio.record_entry table with every row in the metabib.full_rec + view would consume a massive amount of memory and disk space and CPU time—or + a symptom of a schema that needs some additional indexes. PostgreSQL provides + the EXPLAIN tool to estimate how long it will take to run a given query and + show you the query plan (how it plans to retrieve the results from the + database). + To generate the query plan without actually running the statement, simply + prepend the EXPLAIN keyword to your query. In the following example, we + generate the query plan for the poorly written query that would join every + row in the biblio.record_entry table with every row in the metabib.full_rec + view: + Query plan for a terrible query + EXPLAIN SELECT * + FROM biblio.record_entry + FULL OUTER JOIN metabib.full_rec ON 1=1 + ; + + QUERY PLAN + -------------------------------------------------------------------------------// + Merge Full Join (cost=0.00..4959156437783.60 rows=132415734100864 width=1379) + -> Seq Scan on record_entry (cost=0.00..400634.16 rows=2013416 width=1292) + -> Seq Scan on real_full_rec (cost=0.00..1640972.04 rows=65766704 width=87) + (3 rows) + + This query plan shows that the query would return 132415734100864 rows, and it + plans to accomplish what you asked for by sequentially scanning (Seq Scan) + every row in each of the tables participating in the join. + In the following example, we have realized our mistake in joining every row of + the left-hand table with every row in the right-hand table and take the saner + approach of using an INNER JOIN where the join condition is on the record ID. 
+ Query plan for a less terrible query + EXPLAIN SELECT * + FROM biblio.record_entry bre + INNER JOIN metabib.full_rec mfr ON mfr.record = bre.id; + QUERY PLAN + ----------------------------------------------------------------------------------------// + Hash Join (cost=750229.86..5829273.98 rows=65766704 width=1379) + Hash Cond: (real_full_rec.record = bre.id) + -> Seq Scan on real_full_rec (cost=0.00..1640972.04 rows=65766704 width=87) + -> Hash (cost=400634.16..400634.16 rows=2013416 width=1292) + -> Seq Scan on record_entry bre (cost=0.00..400634.16 rows=2013416 width=1292) + (5 rows) + + This time, we will return 65766704 rows - still way too many rows. We forgot + to include a WHERE clause to limit the results to something meaningful. In + the following example, we will limit the results to deleted records that were + modified in the last month. + Query plan for a realistic query + EXPLAIN SELECT * + FROM biblio.record_entry bre + INNER JOIN metabib.full_rec mfr ON mfr.record = bre.id + WHERE bre.deleted IS TRUE + AND DATE_TRUNC('MONTH', bre.edit_date) > + DATE_TRUNC ('MONTH', NOW() - '1 MONTH'::INTERVAL) + ; + + QUERY PLAN + ----------------------------------------------------------------------------------------// + Hash Join (cost=5058.86..2306218.81 rows=201669 width=1379) + Hash Cond: (real_full_rec.record = bre.id) + -> Seq Scan on real_full_rec (cost=0.00..1640972.04 rows=65766704 width=87) + -> Hash (cost=4981.69..4981.69 rows=6174 width=1292) + -> Index Scan using biblio_record_entry_deleted on record_entry bre + (cost=0.00..4981.69 rows=6174 width=1292) + Index Cond: (deleted = true) + Filter: ((deleted IS TRUE) AND (date_trunc('MONTH'::text, edit_date) + > date_trunc('MONTH'::text, (now() - '1 mon'::interval)))) + (7 rows) + + We can see that the number of rows returned is now only 201669; that’s + something we can work with. Also, the overall cost of the query is 2306218, + compared to 4959156437783 in the original query. 
The Index Scan tells us + that the query planner will use the index that was defined on the deleted + column to avoid having to check every row in the biblio.record_entry table. + However, we are still running a sequential scan over the + metabib.real_full_rec table (the table on which the metabib.full_rec + view is based). Given that linking from the bibliographic records to the + flattened MARC subfields is a fairly common operation, we could create a + new index and see if that speeds up our query plan. + Query plan with optimized access via a new index + -- This index will take a long time to create on a large database + -- of bibliographic records + CREATE INDEX bib_record_idx ON metabib.real_full_rec (record); + + EXPLAIN SELECT * + FROM biblio.record_entry bre + INNER JOIN metabib.full_rec mfr ON mfr.record = bre.id + WHERE bre.deleted IS TRUE + AND DATE_TRUNC('MONTH', bre.edit_date) > + DATE_TRUNC ('MONTH', NOW() - '1 MONTH'::INTERVAL) + ; + + QUERY PLAN + ----------------------------------------------------------------------------------------// + Nested Loop (cost=0.00..1558330.46 rows=201669 width=1379) + -> Index Scan using biblio_record_entry_deleted on record_entry bre + (cost=0.00..4981.69 rows=6174 width=1292) + Index Cond: (deleted = true) + Filter: ((deleted IS TRUE) AND (date_trunc('MONTH'::text, edit_date) > + date_trunc('MONTH'::text, (now() - '1 mon'::interval)))) + -> Index Scan using bib_record_idx on real_full_rec + (cost=0.00..240.89 rows=850 width=87) + Index Cond: (real_full_rec.record = bre.id) + (6 rows) + + We can see that the resulting number of rows is still the same (201669), but + the execution estimate has dropped to 1558330 because the query planner can + use the new index (bib_record_idx) rather than scanning the entire table. + Success! 
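Note that EXPLAIN on its own reports only the planner's estimates. To compare those estimates against reality, PostgreSQL also offers EXPLAIN ANALYZE, which executes the statement and reports the actual row counts and timings alongside the plan; for example, against the same query:

```sql
-- EXPLAIN ANALYZE runs the query for real, so expect it to take
-- as long as the query itself.
EXPLAIN ANALYZE SELECT *
FROM biblio.record_entry bre
INNER JOIN metabib.full_rec mfr ON mfr.record = bre.id
WHERE bre.deleted IS TRUE
AND DATE_TRUNC('MONTH', bre.edit_date) >
    DATE_TRUNC('MONTH', NOW() - '1 MONTH'::INTERVAL)
;

-- Because the statement really executes, wrap data-modifying
-- statements in a transaction and roll it back afterwards
-- (a hypothetical example):
BEGIN;
EXPLAIN ANALYZE UPDATE actor.usr SET active = FALSE WHERE id = 1;
ROLLBACK;
```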
+ While indexes can significantly speed up read access to tables for common + filtering conditions, every time a row is created or updated the corresponding + indexes also need to be maintained - which can decrease the performance of + writes to the database. Be careful to keep the balance of read performance + versus write performance in mind if you plan to create custom indexes in your + Evergreen database. +
+
+
+ Inserting, updating, and deleting data
+
+ Inserting data
+ To insert one or more rows into a table, use the INSERT statement to identify
+ the target table and list the columns in the table for which you are going to
+ provide values for each row. If you do not list one or more columns contained
+ in the table, the database will automatically supply the default value for
+ those columns (NULL, unless a different default has been defined). The values
+ for each row follow the VALUES clause and are grouped in
+ parentheses and delimited by commas. Each row, in turn, is delimited by commas
+ (this multiple row syntax requires PostgreSQL 8.2 or higher).
+ For example, to insert two rows into the permission.usr_grp_map table:
+ Inserting rows into the <literal>permission.usr_grp_map</literal> table
+ INSERT INTO permission.usr_grp_map (usr, grp)
+ VALUES (2, 10), (2, 4)
+ ;
+
+ Of course, as with the rest of SQL, you can replace individual column values
+ with the results of sub-selects:
+ Inserting rows using sub-selects instead of integers
+ INSERT INTO permission.usr_grp_map (usr, grp)
+ VALUES (
+ (SELECT id FROM actor.usr
+ WHERE family_name = 'Scott' AND first_given_name = 'Daniel'),
+ (SELECT id FROM permission.grp_tree
+ WHERE name = 'Local System Administrator')
+ ), (
+ (SELECT id FROM actor.usr
+ WHERE family_name = 'Scott' AND first_given_name = 'Daniel'),
+ (SELECT id FROM permission.grp_tree
+ WHERE name = 'Circulator')
+ )
+ ;
+
+
+ Inserting data using a SELECT statement
+ Sometimes you want to insert a bulk set of data into a new table based on
+ a query result. Rather than a VALUES clause, you can use a SELECT
+ statement to insert one or more rows matching the column definitions. This
+ is a good time to point out that you can include explicit values, instead
+ of just column identifiers, in the return columns of the SELECT statement.
+ The explicit values are returned in every row of the result set. 
+ In the following example, we insert 6 rows into the permission.usr_grp_map + table; each row will have a usr column value of 1, with varying values for + the grp column value based on the id column values returned from + permission.grp_tree: + Inserting rows via a <literal>SELECT</literal> statement + INSERT INTO permission.usr_grp_map (usr, grp) + SELECT 1, id + FROM permission.grp_tree + WHERE id > 2 + ; + + INSERT 0 6 + + + + Deleting rows + Deleting data from a table is normally fairly easy. To delete rows from a table, + issue a DELETE statement identifying the table from which you want to delete + rows and a WHERE clause identifying the row or rows that should be deleted. + In the following example, we delete all of the rows from the + permission.grp_perm_map table where the permission maps to + UPDATE_ORG_UNIT_CLOSING and the group is anything other than administrators: + Deleting rows from a table + DELETE FROM permission.grp_perm_map + WHERE grp IN ( + SELECT id + FROM permission.grp_tree + WHERE name != 'Local System Administrator' + ) AND perm = ( + SELECT id + FROM permission.perm_list + WHERE code = 'UPDATE_ORG_UNIT_CLOSING' + ) + ; + + There are two main reasons that a DELETE statement may not actually + delete rows from a table, even when the rows meet the conditional clause. + + + + + If the row contains a value that is the target of a relational constraint, + for example, if another table has a foreign key pointing at your target + table, you will be prevented from deleting a row with a value corresponding + to a row in the dependent table. + + + + + If the table has a rule that substitutes a different action for a DELETE + statement, the deletion will not take place. In Evergreen it is common for a + table to have a rule that substitutes the action of setting a deleted column + to TRUE. 
For example, if a book is discarded, deleting the row representing + the copy from the asset.copy table would severely affect circulation statistics, + bills, borrowing histories, and their corresponding tables in the database that + have foreign keys pointing at the asset.copy table (action.circulation and + money.billing and its children respectively). Instead, the deleted column + value is set to TRUE and Evergreen’s application logic skips over these rows + in most cases. + + + + + + Updating rows + To update rows in a table, issue an UPDATE statement identifying the table + you want to update, the column or columns that you want to set with their + respective new values, and (optionally) a WHERE clause identifying the row or + rows that should be updated. + Following is the syntax for the UPDATE statement: +
+
+ UPDATE [table-name]
+ SET [column] = [new-value]
+ WHERE [condition]
+ ;
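For example, a minimal sketch (it assumes the stock actor.usr table, which includes an active column) that deactivates a single user record:

```sql
-- Deactivate one user record; the WHERE clause limits the change to a
-- single row. Without a WHERE clause, UPDATE modifies every row in the table.
UPDATE actor.usr
SET active = FALSE
WHERE id = 2;
```

As with DELETE, rules and constraints can intercept an UPDATE; in Evergreen, setting a deleted column to TRUE via UPDATE is the usual substitute for deleting rows outright.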
+
+
+
+ Query requests + The following queries were requested by Bibliomation, but might be reusable + by other libraries. + + Monthly circulation stats by collection code / library + Monthly Circulation Stats by Collection Code/Library + SELECT COUNT(acirc.id) AS "COUNT", aou.name AS "Library", acl.name AS "Copy Location" + FROM asset.copy ac + INNER JOIN asset.copy_location acl ON ac.location = acl.id + INNER JOIN action.circulation acirc ON acirc.target_copy = ac.id + INNER JOIN actor.org_unit aou ON acirc.circ_lib = aou.id + WHERE DATE_TRUNC('MONTH', acirc.create_time) = DATE_TRUNC('MONTH', NOW() - INTERVAL '3 month') + AND acirc.desk_renewal IS FALSE + AND acirc.opac_renewal IS FALSE + AND acirc.phone_renewal IS FALSE + GROUP BY aou.name, acl.name + ORDER BY aou.name, acl.name, 1 + ; + + + + Monthly circulation stats by borrower stat / library + Monthly Circulation Stats by Borrower Stat/Library + SELECT COUNT(acirc.id) AS "COUNT", aou.name AS "Library", asceum.stat_cat_entry AS "Borrower Stat" + FROM action.circulation acirc + INNER JOIN actor.org_unit aou ON acirc.circ_lib = aou.id + INNER JOIN actor.stat_cat_entry_usr_map asceum ON asceum.target_usr = acirc.usr + INNER JOIN actor.stat_cat astat ON asceum.stat_cat = astat.id + WHERE DATE_TRUNC('MONTH', acirc.create_time) = DATE_TRUNC('MONTH', NOW() - INTERVAL '3 month') + AND astat.name = 'Preferred language' + AND acirc.desk_renewal IS FALSE + AND acirc.opac_renewal IS FALSE + AND acirc.phone_renewal IS FALSE + GROUP BY aou.name, asceum.stat_cat_entry + ORDER BY aou.name, asceum.stat_cat_entry, 1 + ; + + + + Monthly intralibrary loan stats by library + Monthly Intralibrary Loan Stats by Library + SELECT aou.name AS "Library", COUNT(acirc.id) + FROM action.circulation acirc + INNER JOIN actor.org_unit aou ON acirc.circ_lib = aou.id + INNER JOIN asset.copy ac ON acirc.target_copy = ac.id + INNER JOIN asset.call_number acn ON ac.call_number = acn.id + WHERE acirc.circ_lib != acn.owning_lib + AND DATE_TRUNC('MONTH', 
acirc.create_time) = DATE_TRUNC('MONTH', NOW() - INTERVAL '3 month')
+ AND acirc.desk_renewal IS FALSE
+ AND acirc.opac_renewal IS FALSE
+ AND acirc.phone_renewal IS FALSE
+ GROUP BY aou.name
+ ORDER BY aou.name, 2
+ ;
+
+
+
+ Monthly borrowers added by profile (adult, child, etc) / library
+ Monthly Borrowers Added by Profile (Adult, Child, etc)/Library
+ SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count"
+ FROM actor.usr au
+ INNER JOIN permission.grp_tree pgt
+ ON au.profile = pgt.id
+ INNER JOIN actor.org_unit aou
+ ON aou.id = au.home_ou
+ WHERE au.deleted IS FALSE
+ AND DATE_TRUNC('MONTH', au.create_date) = DATE_TRUNC('MONTH', NOW() - '3 months'::interval)
+ GROUP BY pgt.name, aou.name
+ ORDER BY aou.name, pgt.name
+ ;
+
+
+ Borrower count by profile (adult, child, etc) / library
+ Borrower Count by Profile (Adult, Child, etc)/Library
+ SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count"
+ FROM actor.usr au
+ INNER JOIN permission.grp_tree pgt
+ ON au.profile = pgt.id
+ INNER JOIN actor.org_unit aou
+ ON aou.id = au.home_ou
+ WHERE au.deleted IS FALSE
+ GROUP BY pgt.name, aou.name
+ ORDER BY aou.name, pgt.name
+ ;
+
+
+ Monthly items added by collection / library
+ We define a "collection" as a shelving location in Evergreen.
+ Monthly Items Added by Collection/Library + SELECT aou.name AS "Library", acl.name, COUNT(ac.barcode) + FROM actor.org_unit aou + INNER JOIN asset.call_number acn ON acn.owning_lib = aou.id + INNER JOIN asset.copy ac ON ac.call_number = acn.id + INNER JOIN asset.copy_location acl ON ac.location = acl.id + WHERE ac.deleted IS FALSE + AND acn.deleted IS FALSE + AND DATE_TRUNC('MONTH', ac.create_date) = DATE_TRUNC('MONTH', NOW() - '1 month'::interval) + GROUP BY aou.name, acl.name + ORDER BY aou.name, acl.name + ; + + + + Hold purchase alert by library + in the following set of queries, we bring together the active title, volume, + and copy holds and display those that have more than a certain number of holds + per title. The goal is to UNION ALL the three queries, then group by the + bibliographic record ID and display the title / author information for those + records that have more than a given threshold of holds. + Hold Purchase Alert by Library + -- Title holds + SELECT all_holds.bib_id, aou.name, rmsr.title, rmsr.author, COUNT(all_holds.bib_id) + FROM + ( + ( + SELECT target, request_lib + FROM action.hold_request + WHERE hold_type = 'T' + AND fulfillment_time IS NULL + AND cancel_time IS NULL + ) + UNION ALL + -- Volume holds + ( + SELECT bre.id, request_lib + FROM action.hold_request ahr + INNER JOIN asset.call_number acn ON ahr.target = acn.id + INNER JOIN biblio.record_entry bre ON acn.record = bre.id + WHERE ahr.hold_type = 'V' + AND ahr.fulfillment_time IS NULL + AND ahr.cancel_time IS NULL + ) + UNION ALL + -- Copy holds + ( + SELECT bre.id, request_lib + FROM action.hold_request ahr + INNER JOIN asset.copy ac ON ahr.target = ac.id + INNER JOIN asset.call_number acn ON ac.call_number = acn.id + INNER JOIN biblio.record_entry bre ON acn.record = bre.id + WHERE ahr.hold_type = 'C' + AND ahr.fulfillment_time IS NULL + AND ahr.cancel_time IS NULL + ) + ) AS all_holds(bib_id, request_lib) + INNER JOIN reporter.materialized_simple_record rmsr + INNER JOIN 
actor.org_unit aou ON aou.id = all_holds.request_lib + ON rmsr.id = all_holds.bib_id + GROUP BY all_holds.bib_id, aou.name, rmsr.id, rmsr.title, rmsr.author + HAVING COUNT(all_holds.bib_id) > 2 + ORDER BY aou.name + ; + + + + Update borrower records with a different home library + In this example, the library has opened a new branch in a growing area, + and wants to reassign the home library for the patrons in the vicinity of + the new branch to the new branch. To accomplish this, we create a staging table + that holds a set of city names and the corresponding branch shortname for the home + library for each city. + Then we issue an UPDATE statement to set the home library for patrons with a + physical address with a city that matches the city names in our staging table. + Update borrower records with a different home library + CREATE SCHEMA staging; + CREATE TABLE staging.city_home_ou_map (city TEXT, ou_shortname TEXT, + FOREIGN KEY (ou_shortname) REFERENCES actor.org_unit (shortname)); + INSERT INTO staging.city_home_ou_map (city, ou_shortname) + VALUES ('Southbury', 'BR1'), ('Middlebury', 'BR2'), ('Hartford', 'BR3'); + BEGIN; + + UPDATE actor.usr au SET home_ou = COALESCE( + ( + SELECT aou.id + FROM actor.org_unit aou + INNER JOIN staging.city_home_ou_map schom ON schom.ou_shortname = aou.shortname + INNER JOIN actor.usr_address aua ON aua.city = schom.city + WHERE au.id = aua.usr + GROUP BY aou.id + ), home_ou) + WHERE ( + SELECT aou.id + FROM actor.org_unit aou + INNER JOIN staging.city_home_ou_map schom ON schom.ou_shortname = aou.shortname + INNER JOIN actor.usr_address aua ON aua.city = schom.city + WHERE au.id = aua.usr + GROUP BY aou.id + ) IS NOT NULL; + + +
+
+ This chapter was taken from Dan Scott's Introduction to SQL for Evergreen Administrators, February 2010. +
+
diff --git a/1.6/development/workshop.xml b/1.6/development/workshop.xml new file mode 100644 index 0000000000..0b11121382 --- /dev/null +++ b/1.6/development/workshop.xml @@ -0,0 +1,1532 @@ + + + + + Evergreen development + February 2010 + + Dan + Scott + dscott@laurentian.ca + + DS +1.0February 2010DS + +
+Part 1: OpenSRF applications +OpenSRF, pronounced "Open Surf", is the Open Service Request +Framework. It was designed as an architecture on which one could +easily build scalable applications. +
+
+Introduction to OpenSRF
+The framework is built on JSON-over-XMPP. XML can be used, but JSON
+is much less verbose. XMPP is a standard messaging protocol that
+has been used as the backbone of low-latency, high-volume
+applications including instant messaging and Google Wave.
+OpenSRF offers scalability via its clustering architecture; a service
+that is a bottleneck can be moved onto its own server, or multiple
+instances of the service can be run on many servers. Services can
+themselves be clients of other services.
+OpenSRF services listen at an XMPP address such as
+"opensrf@private.localhost/open-ils.fielder_drone_at_localhost_7652".
+The initial request from an OpenSRF client is directed to the
+OpenSRF router, which determines whether the requested service is
+accessible to the client (based on the public versus private domains),
+and then connects the client to the service for any subsequent
+communication that is required.
+To improve the speed at which services can respond to common requests,
+OpenSRF has integrated support for caching via the memcached daemon.
+For example, the contents of the configuration files are cached by the
+opensrf.settings service when that service starts, so that rather than
+having to parse the XML file every time a service checks a configuration
+setting, the value can be retrieved with much less overhead directly
+from the cache.
+If you change a setting in one of those configuration files, you
+must restart the opensrf.settings service to update its data. You must
+then restart any of the services that make use of that setting to make
+the change take effect.
+OpenSRF supports services and clients written in Perl, C, and Python,
+and clients written in Java. JavaScript can access services via the
+HTTP translator and gateway, and the JSON libraries convert messages
+to and from native data structures for ease of development.
+
+Configuring OpenSRF +Walk through the configuration files, explaining why we put the values +into the files that we do: + + + +opensrf_core.xml + + + + +Distinguish between public and private services for security of Web-based applications. + + + + +Deprecated HTTP gateway versus OpenSRF-over-HTTP + + + + + + +opensrf.xml + + + +In a clustered OpenSRF instance, these files are normally hosted on +a network share so that each member of the cluster can read them. +
+
+Starting OpenSRF services +I won’t go through this during a live session. Perhaps I can cut this +out entirely… +Issue the following commands as the opensrf user. If you are running OpenSRF +on a single-server machine, you can use the -l flag to force the hostname +to be treated as localhost. + + + +Start the OpenSRF router: + +osrf_ctl.sh -a start_router +The router must only run on a single machine in a given brick. + + + +Start all OpenSRF Perl services defined for this host: + +osrf_ctl.sh -a start_perl +You can start an individual Perl service using: +opensrf-perl.pl -s <service-name> -a start -p <PID-directory> + + + +Start all OpenSRF C services defined for this host: + +osrf_ctl.sh -a start_c + + +
+
+
+Stopping OpenSRF services
+Issue the following commands as the opensrf user. If you are running OpenSRF
+on a single-server machine, you can use the -l flag to force the hostname
+to be treated as localhost.
+
+
+
+Stop the OpenSRF router:
+
+osrf_ctl.sh -a stop_router
+
+
+
+Stop all OpenSRF Perl services defined for this host:
+
+osrf_ctl.sh -a stop_perl
+You can stop an individual Perl service using:
+opensrf-perl.pl -s <service-name> -a stop -p <PID-directory>
+
+
+
+Stop all OpenSRF C services defined for this host:
+
+osrf_ctl.sh -a stop_c
+
+
+PID files for OpenSRF services are stored and looked up
+in /openils/var/run by default with osrf_ctl.sh, and in
+/tmp/ with opensrf-perl.pl. For a clustered server instance
+of Evergreen, you must store the PID files in a directory that is
+local to each server; otherwise, one of your cluster servers may try
+to kill processes on itself that correspond to PIDs recorded by
+other servers.
+
+
+Examining sample code +Show internal documentation for methods. Do some stupid srfsh tricks +(introspect for one) and show docgen.xsl in action. +
+
+SRFSH stupid tricks
+srfsh# introspect open-ils.auth
+... returns documentation for all methods registered for open-ils.auth
+
+srfsh# introspect open-ils.auth "open-ils.auth.authenticate"
+... returns documentation for all methods with names beginning with
+    "open-ils.auth.authenticate" registered for open-ils.auth
+
+srfsh# open open-ils.cstore
+... begins a stateful connection with open-ils.cstore
+srfsh# request open-ils.cstore open-ils.cstore.transaction.begin
+... begins a transaction
+srfsh# request open-ils.cstore open-ils.cstore.direct.config.language_map.delete \
+    {"code": {"like":"a%"}}
+... deletes all of the entries from config.language_map that have a
+... code beginning with "a"
+srfsh# request open-ils.cstore open-ils.cstore.transaction.rollback
+... rolls back the transaction
+srfsh# close open-ils.cstore
+... closes the stateful connection with open-ils.cstore
+
+Perl +
+
+Services
+See OpenSRF/src/perl/lib/OpenSRF/UnixServer.pm to understand how the
+optional methods for initializing and cleaning up OpenSRF services
+are invoked:
+
+
+
+initialize()
+
+
+
+child_init()
+
+
+
+child_exit()
+
+
+
+Service methods are implemented as Perl functions. Each method needs to be
+registered with:
+__PACKAGE__->register_method(
+    method => 'method name', #
+    api_name => 'API name', #
+    api_level => 1, #
+    argc => # of args, #
+    signature => { #
+        desc => 'Description',
+        params => [
+            {
+                name => 'parameter name',
+                desc => 'parameter description',
+                type => '(array|hash|number|string)'
+            }
+        ],
+        return => {
+            desc => 'Description of return value',
+            type => '(array|hash|number|string)'
+        }
+    }
+);
+
+
+
+The method name is the name of the Perl method that is called when a
+client invokes the corresponding OpenSRF method.
+
+
+
+The API name is the OpenSRF method name. By convention, each API
+uses the OpenSRF service name for its root, and then appends one or more
+levels of names to the OpenSRF service name, depending on the complexity
+of the service and the number of methods exposed by a given service.
+
+
+
+The API level is always 1.
+
+
+
+The number of arguments that can be passed to the OpenSRF method is
+primarily for guidance purposes.
+
+
+
+The signature is consumed by the various utilities (srfsh, docgen.xsl)
+that generate documentation about the OpenSRF service.
+
+
+
+Note that arguments are converted between native data structures and JSON
+for us for free.
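As a filled-in sketch of the registration template above (the package, method, and API names here are illustrative, not part of stock Evergreen):

```perl
package OpenSRF::Application::Demo::Simple;
use base qw/OpenSRF::Application/;
use strict;
use warnings;

# The Perl function that backs the OpenSRF method
sub hello_world {
    my ($self, $conn, $name) = @_;
    return "Hello, $name!";    # OpenSRF serializes the return value to JSON
}

__PACKAGE__->register_method(
    method    => 'hello_world',
    api_name  => 'opensrf.simple.hello_world',
    api_level => 1,
    argc      => 1,
    signature => {
        desc   => 'Returns a greeting for the supplied name',
        params => [
            { name => 'name', desc => 'The name to greet', type => 'string' }
        ],
        return => { desc => 'A greeting', type => 'string' }
    }
);

1;
```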
+
+
+Client cheat sheet
+This is the simplest possible OpenSRF client written in Perl:
+
+
+
+
+The OpenSRF::System module gives our program access to the core OpenSRF
+client functionality.
+
+
+
+The bootstrap_client() method reads the opensrf_core.xml file and sets
+up communication with the OpenSRF router.
+
+
+
+The OpenSRF::AppSession->create() class method asks the router if it
+can connect to the named service. If the router determines that the service
+is accessible (either the opensrf credentials are on the private domain, which
+gives it access to all public and private services; or the service is on a
+public domain, which is accessible to both public and private opensrf
+credentials), it returns an OpenSRF session with a connection to the named service.
+
+
+
+The OpenSRF::AppSession->request() method invokes a method of the
+associated service to return a request object.
+
+
+
+The method name that you want to invoke is the first argument to request().
+
+
+
+The arguments to the method follow the method name.
+
+
+
+Invoking the gather() method on the returned request object returns a
+single result.
+
+If the service is expected to return multiple results, you should loop
+over it with recv() instead. But then, that wouldn’t be the simplest
+possible client anymore, would it?
+
+
+
+The OpenSRF::AppSession->disconnect() instance method disconnects from
+the service, enabling that child to go on and handle other requests.
+
+
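The callout list above refers to a client along these lines (a minimal sketch; the configuration path, service, and method names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use OpenSRF::System;
use OpenSRF::AppSession;

# Read opensrf_core.xml and set up communication with the OpenSRF router
OpenSRF::System->bootstrap_client(
    config_file => '/openils/conf/opensrf_core.xml'
);

# Ask the router for a connection to the named service
my $session = OpenSRF::AppSession->create('opensrf.settings');

# First argument is the method name; any remaining arguments are passed along
my $request = $session->request('opensrf.settings.host_config.get', 'localhost');

# gather() returns a single result; loop over recv() for multiple results
my $result = $request->gather(1);

# Free the service's child process to handle other requests
$session->disconnect();
```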
+
+
+JavaScript +Historically, JavaScript has had access to OpenSRF methods via the +OpenSRF HTTP gateway Apache module. You can still see this in heavy use +in the OPAC and staff client as of Evergreen 1.6, but the approach has been +deprecated as it has significant performance problems with large responses. +The successor for the OpenSRF gateway is the OpenSRF-over-HTTP translator +Apache module, which supports streaming responses for improved performance +and better support for the broad range of OpenSRF attributes. +
+Invoking methods via the HTTP Translator +The following example demonstrates the basic approach to invoking +OpenSRF methods via JavaScript. It uses just three OpenSRF JavaScript +libraries to simplify calls to the OpenSRF-over-HTTP translator, +which became available to developers as part of the OpenSRF 1.0 / +Evergreen 1.4 releases. + + + + +opensrf.js defines most of the objects and methods required for a bare +JavaScript call to the OpenSRF HTTP translator. + + + + +opensrf_xhr.js provides cross-browser XMLHttpRequest support for OpenSRF. + + + + +JSON_v1.js converts the requests and responses between JavaScript and the +JSON format that the OpenSRF translator expects. + + + + +Create a client session that connects to the open-ils.resolver service. + + + + +Create a request object that identifies the target method and passes the +required method arguments. + + + + +Define the function that will be called when the request is sent and +results are returned from the OpenSRF HTTP translator. + + + + +Loop over the returned results using the recv() method. + + + + +The content of each result is accessible via the content() method of +each returned result. + + + + +open-ils.resolver.resolve_holdings returns a hash of values, so +invoking one of the hash keys (coverage) gives us access to that value. + + + + +Actually send the request to the method; the function defined by +req.oncomplete is invoked as the results are returned. + + + +
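Reconstructed as a sketch, the callouts above correspond to code along these lines (it assumes opensrf.js, opensrf_xhr.js, and JSON_v1.js are already loaded in the page; the method arguments are illustrative):

```javascript
// Connect to the open-ils.resolver service via the HTTP translator
var session = new OpenSRF.ClientSession('open-ils.resolver');

// Identify the target method and pass its arguments
var request = session.request(
    'open-ils.resolver.resolve_holdings',
    'issn', '0028-0836'
);

// Invoked when results come back from the translator
request.oncomplete = function(r) {
    var response;
    while (response = r.recv()) {           // loop over the returned results
        var holdings = response.content();  // each result's payload
        alert(holdings.coverage);           // one key of the returned hash
    }
};

// Actually send the request
request.send();
```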
+
+
+
+Exercise +Build a new OpenSRF service. +
+
+Perl
+The challenge: implement a service that caches responses from some
+other Web service (potentially cutting down on client-side latency
+for something like OpenLibrary / Google Books / xISBN services, and
+avoiding timeouts if the target service is not dependable). Our
+example will be to build an SFX lookup service. This has the
+additional advantage of enabling XmlHttpRequest from JavaScript by
+hosting the services on the same domain.
+Let’s start with the simplest possible implementation: a CGI script.
+
+Hopefully you can follow what this CGI script is doing. It works,
+but it has all the disadvantages of CGI: the environment must be
+rebuilt on every request, nothing is remembered between invocations,
+and so on.
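The CGI script under discussion might look something like the following sketch (the resolver URL and parameter names here are placeholders, not the exact workshop code):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use LWP::UserAgent;

my $cgi = CGI->new();
my $id_type  = $cgi->param('id_type');   # 'issn' or 'isbn'
my $id_value = $cgi->param('id_value');

# Send an OpenURL request to the SFX resolver on every invocation;
# nothing is cached between requests
my $ua = LWP::UserAgent->new();
my $response = $ua->get(
    'http://sfx.example.com/instance?url_ver=Z39.88-2004'
    . "&rft.$id_type=$id_value"
);

# Hand the raw resolver response back to the caller
print $cgi->header(-type => 'application/xml');
print $response->content();
```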
+Turning the CGI script into an OpenSRF service +So now we want to turn this into an OpenSRF service. + + + +Start by ripping out the CGI stuff, as we won’t need that any more. + + + + +To turn this into an OpenSRF service, we create a new +Perl module (OpenILS::Application::ResolverResolver). We no +longer have to convert results between Perl and JSON values, as +OpenSRF will handle that for us. We now have to register the +method with OpenSRF. + + + + + +Copy the file into the /openils/lib/perl5/OpenILS/Application/ directory +so that OpenSRF can find it in the @INC search path. + + + + +Add the service to opensrf.xml so it gets started with the +other Perl services on our host of choice: + +... +<open-ils.resolver> + <keepalive>3</keepalive> + <stateless>1</stateless> + <language>perl</language> + <implementation>OpenILS::Application::ResolverResolver</implementation> + <max_requests>17</max_requests> + <unix_config> + <unix_sock>open-ils.resolver_unix.sock</unix_sock> + <unix_pid>open-ils.resolver_unix.pid</unix_pid> + <max_requests>1000</max_requests> + <unix_log>open-ils.resolver_unix.log</unix_log> + <min_children>5</min_children> + <max_children>15</max_children> + <min_spare_children>3</min_spare_children> + <max_spare_children>5</max_spare_children> + </unix_config> + <app_settings> + <cache_timeout>86400</cache_timeout> + <default_url_base>http://sfx.scholarsportal.info/laurentian</default_url_base> + </app_settings> +</open-ils.resolver> +... +<!-- In the <hosts> section --> +<localhost> + ... + <appname>open-ils.resolver</appname> +</localhost> + + + +Add the service to opensrf_core.xml as a publicly exposed +service via the HTTP gateway and translator: + +... +<!-- In the public router section --> +<services> + ... + <service>open-ils.resolver</service> +</services> +... +<!-- In the public gateway section --> +<services> +<gateway> + ... 
+    <services>
+        <service>open-ils.resolver</service>
+    </services>
+</gateway>
+
+
+
+Restart the OpenSRF Perl services to refresh the OpenSRF
+settings and start the service.
+
+
+
+Restart Apache to enable the gateway and translator to pick up
+the new service.
+
+
+
+
+Add caching
+To really make this service useful, we can take advantage of OpenSRF’s
+built-in support for caching via memcached. Caching the values returned
+by the resolver for up to one week offers a reasonable balance between
+freshness and performance for relatively stable holdings data.
+We will also take advantage of the opensrf.settings service that
+holds the values defined in the opensrf.xml configuration file to
+supply some of our default arguments.
+Caching OpenSRF Resolver Service
+
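The caching pattern can be sketched as follows (OpenSRF::Utils::Cache is the standard OpenSRF interface to memcached; the method body and helper function are illustrative):

```perl
use OpenSRF::Utils::Cache;

my $cache;
my $cache_timeout = 604800;    # one week, in seconds

sub child_init {
    # One cache handle per child process
    $cache = OpenSRF::Utils::Cache->new('global');
}

sub resolve_holdings {
    my ($self, $conn, $id_type, $id_value) = @_;
    my $key = "open-ils.resolver_${id_type}_${id_value}";

    # Return the cached result if we looked this up recently
    my $cached = $cache->get_cache($key);
    return $cached if $cached;

    # look_up_holdings() is a hypothetical helper wrapping the SFX request
    my $result = look_up_holdings($id_type, $id_value);

    # Cache the fresh result for subsequent requests
    $cache->put_cache($key, $result, $cache_timeout);
    return $result;
}
```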
+
+Pulling application settings from <literal>opensrf.xml</literal> +In case you missed it in the previous diff, we also started +pulling some application-specific settings from opensrf.xml +during the initialize() phase for the service. +In the following diff, we enable the service to pull the default URL from +opensrf.xml rather than hard-coding it into the OpenSRF service… because +that’s just the right thing to do. +=== modified file 'ResolverResolver.pm' +--- ResolverResolver.pm 2009-10-22 21:00:15 +0000 ++++ ResolverResolver.pm 2009-10-24 03:00:30 +0000 +@@ -77,6 +77,7 @@ + my $prefix = "open-ils.resolver_"; # Prefix for caching values + my $cache; + my $cache_timeout; ++my $default_url_base; # Default resolver location + + our ($ua, $parser); + +@@ -86,6 +87,8 @@ + my $sclient = OpenSRF::Utils::SettingsClient->new(); + $cache_timeout = $sclient->config_value( + "apps", "open-ils.resolver", "app_settings", "cache_timeout" ) || 300; ++ $default_url_base = $sclient->config_value( ++ "apps", "open-ils.resolver", "app_settings", "default_url_base"); + } + + sub child_init { +@@ -102,14 +105,11 @@ + my $conn = shift; + my $id_type = shift; # keep it simple for now, either 'issn' or 'isbn' + my $id_value = shift; # the normalized ISSN or ISBN ++ my $url_base = shift || $default_url_base; + + # We'll use this in our cache key + my $method = "open-ils.resolver.resolve_holdings"; + +- # For now we'll pass the argument with a hard-coded default +- # Should pull these specifics from the database as part of initialize() +- my $url_base = shift || 'http://sfx.scholarsportal.info/laurentian'; +- + # Big ugly SFX OpenURL request + my $url_args = '?url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&' + . 
'ctx_enc=UTF-8&ctx_ver=Z39.88-2004&rfr_id=info:sid/conifer&'
+The opensrf.settings service caches the settings defined in opensrf.xml,
+so if you change a setting in the configuration files and want that change
+to take effect immediately, you have to:
+
+
+
+Restart the opensrf.settings service to refresh the cached settings.
+
+
+
+Restart the affected service to make the new settings take effect.
+
+
+
+Next step: add org_unit settings for resolver type and URL on a per-org_unit basis.
+OrgUnit settings can be retrieved via
+OpenILS::Application::AppUtils->ou_ancestor_setting_value($org_id, $setting_name).
+This is where we step beyond OpenSRF and start getting into the
+Evergreen database schema (config.org_unit_setting table).
+
+
+Further reading +OpenSRF terminology: http://open-ils.org/dokuwiki/doku.php?id=osrf-devel:terms +
+
+
+Part 2: Evergreen applications +
+Authentication +Although many services offer methods that can be invoked without +authentication, some methods require authentication in Evergreen. +Evergreen’s authentication framework returns an authentication token +when a user has successfully logged in to represent that user +session. You can then pass the authentication token to various +methods to ensure, for example, that the requesting user has permission +to access the circulation information attached to a particular account, +or has been granted the necessary permissions at a particular library +to perform the action that they are requesting. +Authentication in Evergreen is performed with the assistance of the +open-ils.auth service, which has been written in C for performance +reasons because it is invoked so frequently. A successful authentication +request requires two steps: + + + +Retrieve an authentication seed value by invoking the +open-ils.auth.authenticate.init method, passing the user name as +the only argument. As long as the user name contains no spaces, the +method returns a seed value calculated by the MD5 checksum of +a string composed of the concatenation of the time() system call, +process ID, and user name. + + + + +Retrieve an authentication token by invoking the +open-ils.auth.authenticate.complete method, passing +a JSON hash composed of a minimum of the following arguments +(where seed represents the value returned by the +open-ils.auth.authenticate.init method): + +{ + "username": username, // or "barcode": barcode, + "password": md5sum(seed + md5sum(password)), +} + + +open-ils.auth.authenticate.complete also accepts the following +additional arguments: + + + +type: one of "staff" (default), "opac", or "temp" + + + + +org: the numeric ID of the org_unit where the login is active + + + + +workstation: the registered workstation name + + + +
+
+Authentication in Perl
+The following example is adapted from OpenILS::WWW::Proxy (the original
+defaults the login type with the bitwise |= operator; ||= is what is
+intended):
+sub oils_login {
+    my( $username, $password, $type ) = @_;
+
+    $type ||= "staff";
+    my $nametype = 'username';
+    $nametype = 'barcode' if ($username =~ /^\d+$/o);
+
+    my $seed = OpenSRF::AppSession
+        ->create("open-ils.auth")
+        ->request( 'open-ils.auth.authenticate.init', $username )
+        ->gather(1);
+
+    return undef unless $seed;
+
+    my $response = OpenSRF::AppSession
+        ->create("open-ils.auth")
+        ->request( 'open-ils.auth.authenticate.complete', {
+            $nametype => $username,
+            password => md5_hex($seed . md5_hex($password)),
+            type => $type
+        })
+        ->gather(1);
+
+    return undef unless $response;
+
+    return $response->{payload}->{authtoken};
+}
+
+Authentication in JavaScript +The following example provides a minimal implementation of the authentication +method in JavaScript. For a more complete implementation, you would +differentiate between user names and barcodes, potentially accept the +org_unit and workstation name for more granular permissions, and provide +exception handling. + + + + +opensrf.js defines most of the objects and methods required for a bare +JavaScript call to the OpenSRF HTTP translator. + + + + +opensrf_xhr.js provides cross-browser XMLHttpRequest support for OpenSRF. + + + + +JSON_v1.js converts the requests and responses between JavaScript and the +JSON format that the OpenSRF translator expects. + + + + +md5.js provides the implementation of the md5sum algorithm in the +hex_md5 function + + + + +Create a client session that connects to the open-ils.auth service. + + + + +Create a request object that invokes the open-ils.auth.authenticate.init +method, providing the user name as the salt. + + + + +Set the timeout property on the request object to make it a +synchronous call. + + + + +Send the request. The method returns a seed value which is assigned to +the seed variable. + + + + +Create the hash of parameters that will be sent in the request to the +open-ils.auth.authenticate.complete method, including the password and +authentication type. + + + + +Assume that the credentials being sent are based on the user name rather +than the barcode. The Perl implementation tests the value of the user name +variable to determine whether it contains a digit; if it does contain a +digit, then it is considered a barcode rather than a user name. Ensure that +your implementations are consistent! + + + + +Create a request object that invokes the +open-ils.auth.authenticate.complete method, passing the entire hash of +parameters. Once again, set the timeout parameter to make the request +synchronous. + + + + +Assign the authtoken attribute of the returned payload to the +authtoken variable. + + + +
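Put together, the callouts above describe a function along these lines (a hedged sketch; it assumes opensrf.js, opensrf_xhr.js, JSON_v1.js, and md5.js are loaded in the page, and that the credential is a user name rather than a barcode):

```javascript
function oilsLogin(username, password) {
    var authtoken;
    var seed;
    var session = new OpenSRF.ClientSession('open-ils.auth');

    // Step 1: retrieve the seed; setting timeout makes the call synchronous
    var init = session.request('open-ils.auth.authenticate.init', username);
    init.timeout = 10;
    init.oncomplete = function(r) { seed = r.recv().content(); };
    init.send();

    // Step 2: send the hashed credentials (hex_md5 comes from md5.js)
    var args = {
        username: username,
        password: hex_md5(seed + hex_md5(password)),
        type: 'staff'
    };
    var complete = session.request('open-ils.auth.authenticate.complete', args);
    complete.timeout = 10;
    complete.oncomplete = function(r) {
        authtoken = r.recv().content().payload.authtoken;
    };
    complete.send();

    return authtoken;
}
```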
+
+
+
+Evergreen data models and access +
+
+Database schema
+The database schema is tied pretty tightly to PostgreSQL. Although PostgreSQL
+adheres closely to ANSI SQL standards, the use of schemas, SQL functions
+implemented in both plpgsql and plperl, and PostgreSQL’s native full-text
+search would make it… challenging… to port to other database platforms.
+A few common PostgreSQL interfaces for poking around the schema and
+manipulating data are:
+
+
+
+psql (the command line client)
+
+
+
+pgAdmin III (a GUI client)
+
+
+
+Or you can read through the source files in Open-ILS/src/sql/Pg.
+Let’s take a quick tour through the schemas, pointing out some highlights
+and some key interdependencies:
+
+
+
+actor.org_unit → asset.copy_location
+
+
+
+actor.usr → actor.card
+
+
+
+biblio.record_entry → asset.call_number → asset.copy
+
+
+
+config.metabib_field → metabib.*_field_entry
+
+
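For example, you can survey the schemas and tables interactively with psql (the connection parameters here are illustrative). `\dn` lists the schemas (actor, asset, biblio, config, and so on), `\dt actor.*` lists the tables in the actor schema, and `\d actor.usr` describes the columns, indexes, and foreign keys of a single table:

```
$ psql -U evergreen evergreen
evergreen=# \dn
evergreen=# \dt actor.*
evergreen=# \d actor.usr
evergreen=# \q
```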
+
+
+Database access methods
+You could use direct access to the database via Perl DBI, JDBC, etc.,
+but Evergreen offers several database CRUD services for
+creating / retrieving / updating / deleting data. These avoid coupling
+your code too tightly to the current database schema, and they funnel
+database access through a single mechanism rather than tying up
+connections with other interfaces.
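Following the srfsh conventions shown earlier, the cstore retrieve and search methods follow a predictable naming pattern based on the fieldmapper class (the record ID and search value here are illustrative):

```
srfsh# request open-ils.cstore open-ils.cstore.direct.actor.usr.retrieve 1
... returns the fieldmapper object for the actor.usr row with id = 1

srfsh# request open-ils.cstore open-ils.cstore.direct.actor.usr.search {"family_name":"Scott"}
... returns each actor.usr row with a matching family_name
```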
+
Evergreen Interface Definition Language (IDL)
The IDL defines properties and required permissions for Evergreen classes.
To reduce network overhead, a given object is identified via a class hint
and serialized as a JSON array of properties (no named properties).
As of 1.6, fields are serialized in the order in which they appear
in the IDL definition file, and the is_new / is_changed / is_deleted
properties are added automatically. This has greatly reduced the size of
the fm_IDL.xml file and makes DRY people happier :)

The oils_persist:readonly attribute, if true, tells us that the data lives
in the database, but is pulled from the SELECT statement defined in the
<oils_persist:source_definition> child element.
IDL basic example (config.language_map)
<class id="clm" controller="open-ils.cstore open-ils.pcrud"
    oils_obj:fieldmapper="config::language_map"
    oils_persist:tablename="config.language_map"
    reporter:label="Language Map" oils_persist:field_safe="true"> #
    <fields oils_persist:primary="code" oils_persist:sequence=""> #
        <field reporter:label="Language Code" name="code"
            reporter:selector="value" reporter:datatype="text"/> #
        <field reporter:label="Language" name="value"
            reporter:datatype="text" oils_persist:i18n="true"/> #
    </fields>
    <links/>
    <permacrud xmlns="http://open-ils.org/spec/opensrf/IDL/permacrud/v1"> #
        <actions>
            <create global_required="true" permission="CREATE_MARC_CODE"/> #
            <retrieve global_required="true"
                permission="CREATE_MARC_CODE UPDATE_MARC_CODE DELETE_MARC_CODE"/>
            <update global_required="true" permission="UPDATE_MARC_CODE"/>
            <delete global_required="true" permission="DELETE_MARC_CODE"/>
        </actions>
    </permacrud>
</class>

The class element defines the attributes and permissions for classes,
and relationships between classes.

The id attribute on the class element defines the class hint that is
used everywhere in Evergreen.

The controller attribute defines the OpenSRF
services that provide access to the data for the class objects.

The oils_obj:fieldmapper attribute defines the name of the class that
is generated by OpenILS::Utils::Fieldmapper.

The oils_persist:tablename attribute defines the name of the table
that contains the data for the class objects.

The reporter interface uses reporter:label attribute values in
the source list to provide meaningful class and attribute names. The
open-ils.fielder service generates a set of methods that provide direct
access to the classes for which oils_persist:field_safe is true.
For example,

srfsh# request open-ils.fielder open-ils.fielder.clm.atomic \
    {"query":{"code":{"=":"eng"}}}

Received Data: [
    {
        "value":"English",
        "code":"eng"
    }
]

The fields element defines the list of fields for the class.

The oils_persist:primary attribute defines the column that acts as
the primary key for the table.

The oils_persist:sequence attribute holds the name of the database
sequence.

Each field element defines one property of the class.

The name attribute defines the getter/setter method name for the field.

The reporter:label attribute defines the attribute name as used in
the reporter interface.

The reporter:selector attribute defines the field used in the reporter
filter interface to provide a selectable list. This gives the user a more
meaningful access point than the raw numeric ID or abstract code.

The reporter:datatype attribute defines the type of data held by
this property for the purposes of the reporter.

The oils_persist:i18n attribute, when true, means that
translated values for the field’s contents may be accessible in
different locales.

The permacrud element defines the permissions (if any) required
to create, retrieve, update, and delete data for this
class. open-ils.permacrud must be defined as a controller for the class
for the permissions to be applied.

Each action requires one or more permission values that the
user must possess to perform the action.

If the global_required attribute is true, then the user must
have been granted that permission globally (depth = 0) to perform
the action.

The context_field attribute denotes the <field> that identifies
the org_unit at which the user must have the pertinent permission.
An action element may contain a <context_field> element that
defines the linked class (identified by the link attribute) and
the field in the linked class that identifies the org_unit where
the permission must be held.

If the <context_field> element contains a jump attribute,
then it defines a chain of links: a link to a class that in turn
links to the class with the field identifying the org_unit where
the permission must be held.
+
+Reporter data types and their possible values + + + +bool: Boolean true or false + + + + +id: ID of the row in the database + + + + +int: integer value + + + + +interval: PostgreSQL time interval + + + + + +link: link to another class, as defined in the <links> +element of the class definition + + + + +money: currency amount + + + + +org_unit: list of org_units + + + + +text: text value + + + + +timestamp: PostgreSQL timestamp + + + +
+
IDL example with linked fields (actor.workstation)
Just as tables often include columns with foreign keys that point
to values stored in a column of a different table, IDL classes
can contain fields that link to fields in other classes. The <links>
element defines which fields link to fields in other classes, and
the nature of the relationship:
<class id="aws" controller="open-ils.cstore"
    oils_obj:fieldmapper="actor::workstation"
    oils_persist:tablename="actor.workstation"
    reporter:label="Workstation">
    <fields oils_persist:primary="id"
        oils_persist:sequence="actor.workstation_id_seq">
        <field reporter:label="Workstation ID" name="id"
            reporter:datatype="id"/>
        <field reporter:label="Workstation Name" name="name"
            reporter:datatype="text"/>
        <field reporter:label="Owning Library" name="owning_lib"
            reporter:datatype="org_unit"/>
        <field reporter:label="Circulations" name="circulations"
            oils_persist:virtual="true" reporter:datatype="link"/> #
    </fields>
    <links> #
        <link field="owning_lib" reltype="has_a" key="id"
            map="" class="aou"/> #
        <link field="circulations" reltype="has_many" key="workstation"
            map="" class="circ"/>
        <link field="circulation_checkins" reltype="has_many"
            key="checkin_workstation" map="" class="circ"/>
    </links>
</class>

This field includes an oils_persist:virtual attribute with the value of
true, meaning that the field is not stored in the actor.workstation table
itself; its contents are retrieved via the link to the circ class.

The <links> element contains zero or more <link> elements.

Each <link> element defines the field (field) that links to a different
class (class), the relationship (reltype) between this field and the target
field (key). If the field in this class links to a virtual class, the (map)
attribute defines the field in the target class that returns a list of matching
objects for each object in this class.
+
+
<literal>open-ils.cstore</literal> data access interfaces
For each class documented in the IDL, the open-ils.cstore service
automatically generates a set of data access methods, based on the
oils_persist:tablename class attribute.
For example, for the class hint clm, cstore generates the following
methods with the config.language_map qualifier:

open-ils.cstore.direct.config.language_map.id_list {"code" : { "like": "e%" } }
Retrieves a list composed only of the IDs that match the query.

open-ils.cstore.direct.config.language_map.retrieve "eng"
Retrieves the object that matches a specific ID.

open-ils.cstore.direct.config.language_map.search {"code" : "eng"}
Retrieves a list of objects that match the query.

open-ils.cstore.direct.config.language_map.create <_object_>
Creates a new object from the passed-in object.

open-ils.cstore.direct.config.language_map.update <_object_>
Updates the object that has been passed in.

open-ils.cstore.direct.config.language_map.delete "eng"
Deletes the object that matches the specified ID.
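The naming pattern above is regular enough to capture in a few lines of client code. This is an illustrative sketch only: the cstoreMethods helper is invented here, not part of Evergreen, and it assumes nothing beyond the method names listed above.

```javascript
// Illustrative helper: open-ils.cstore composes its direct method names
// from the oils_persist:tablename attribute of the IDL class.
function cstoreMethods(tablename) {
  const base = 'open-ils.cstore.direct.' + tablename;
  const ops = ['id_list', 'retrieve', 'search', 'create', 'update', 'delete'];
  const methods = {};
  for (const op of ops) {
    // e.g. open-ils.cstore.direct.config.language_map.retrieve
    methods[op] = base + '.' + op;
  }
  return methods;
}
```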
+
open-ils.pcrud data access interfaces
For each class documented in the IDL, the open-ils.pcrud service
automatically generates a set of data access methods. In contrast to
open-ils.cstore, the method names incorporate the class hint rather than
the oils_persist:tablename class attribute, and every call requires an
authentication token.
For example, for the class hint clm, open-ils.pcrud generates the following
methods that parallel the open-ils.cstore interface:

open-ils.pcrud.id_list.clm <_authtoken_>, { "code": { "like": "e%" } }

open-ils.pcrud.retrieve.clm <_authtoken_>, "eng"

open-ils.pcrud.search.clm <_authtoken_>, { "code": "eng" }

open-ils.pcrud.create.clm <_authtoken_>, <_object_>

open-ils.pcrud.update.clm <_authtoken_>, <_object_>

open-ils.pcrud.delete.clm <_authtoken_>, "eng"
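Expressed programmatically, the call shape can be captured in a small helper. The pcrudCall function below is hypothetical, not an Evergreen API; it only restates the pattern visible in the method list above: operation and class hint in the method name, authtoken as the first parameter.

```javascript
// Hypothetical helper that assembles an open-ils.pcrud call: the operation
// and class hint form the method name, and the authtoken always leads the
// parameter list.
function pcrudCall(op, classHint, authtoken, ...args) {
  return {
    method: 'open-ils.pcrud.' + op + '.' + classHint,
    params: [authtoken, ...args]
  };
}
```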
+
+Transaction and savepoint control +Both open-ils.cstore and open-ils.pcrud enable you to control database transactions +to ensure that a set of operations either all succeed, or all fail, +atomically: + + + +open-ils.cstore.transaction.begin + + + + +open-ils.cstore.transaction.commit + + + + +open-ils.cstore.transaction.rollback + + + + +open-ils.pcrud.transaction.begin + + + + +open-ils.pcrud.transaction.commit + + + + +open-ils.pcrud.transaction.rollback + + + +At a more granular level, open-ils.cstore and open-ils.pcrud enable you to set database +savepoints to ensure that a set of operations either all succeed, or all +fail, atomically, within a given transaction: + + + +open-ils.cstore.savepoint.begin + + + + +open-ils.cstore.savepoint.commit + + + + +open-ils.cstore.savepoint.rollback + + + + +open-ils.pcrud.savepoint.begin + + + + +open-ils.pcrud.savepoint.commit + + + + +open-ils.pcrud.savepoint.rollback + + + +Transactions and savepoints must be performed within a stateful +connection to the open-ils.cstore and open-ils.pcrud services. +In srfsh, you can open a stateful connection using the open +command, and then close the stateful connection using the close +command - for example: +srfsh# open open-ils.cstore +... perform various transaction-related work +srfsh# close open-ils.cstore +
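The begin/commit/rollback pairing lends itself to a wrapper so a unit of work either commits or rolls back as a whole. This is a sketch, not Evergreen client code: the session argument stands in for a stateful OpenSRF client session, and only a generic request(methodName) method returning a promise is assumed on it.

```javascript
// Sketch: run `work` inside an open-ils.cstore transaction, committing on
// success and rolling back on any thrown error. `session` is assumed to be
// a stateful OpenSRF session exposing request(methodName) -> Promise.
async function withTransaction(session, work) {
  await session.request('open-ils.cstore.transaction.begin');
  try {
    const result = await work(session);
    await session.request('open-ils.cstore.transaction.commit');
    return result;
  } catch (err) {
    // Any failure inside `work` undoes the whole transaction.
    await session.request('open-ils.cstore.transaction.rollback');
    throw err;
  }
}
```

The same shape applies to open-ils.pcrud, and to savepoints within an open transaction.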
JSON Queries
Beyond simply retrieving objects by their ID using the *.retrieve
methods, you can issue queries against the *.delete and *.search
methods using JSON to filter results with simple or complex search
conditions.
For example, to generate a list of barcodes that are held in a
copy location that allows holds and is visible in the OPAC:
srfsh# request open-ils.cstore open-ils.cstore.json_query #\
    {"select": {"acp":["barcode"], "acpl":["name"]}, #\
    "from": {"acp":"acpl"}, #\
    "where": [ #\
        {"+acpl": "holdable"}, #\
        {"+acpl": "opac_visible"} #\
    ]}

Received Data: {
    "barcode":"BARCODE1",
    "name":"Stacks"
}

Received Data: {
    "barcode":"BARCODE2",
    "name":"Stacks"
}

Invoke the json_query service.

Select the barcode field from the acp class and the name
field from the acpl class.

Join the acp class to the acpl class based on the linked field
defined in the IDL.

Add a where clause to filter the results. We have more than one
condition beginning with the same key, so we wrap the conditions inside
an array.

The first condition tests whether the boolean value of the holdable
field on the acpl class is true.

The second condition tests whether the boolean value of the
opac_visible field on the acpl class is true.

For thorough coverage of the breadth of support offered by JSON
query syntax, see JSON Queries: A Tutorial.
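In a JavaScript client, the same query is just a nested object handed to JSON.stringify() before being sent to json_query. A sketch mirroring the srfsh example above; only the class hints and field names from that example are assumed.

```javascript
// The holdable/OPAC-visible barcode query as a plain JavaScript object.
const query = {
  select: { acp: ['barcode'], acpl: ['name'] }, // fields to return
  from:   { acp: 'acpl' }, // join acp to acpl via the link defined in the IDL
  where: [
    { '+acpl': 'holdable' },     // boolean field on the acpl class must be true
    { '+acpl': 'opac_visible' }  // likewise for OPAC visibility
  ]
};

// Serialized form, ready to pass as the argument to
// open-ils.cstore.json_query:
const payload = JSON.stringify(query);
```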
+
Fleshing linked objects
A simplistic approach to retrieving a set of objects that are linked to
an object that you are retrieving - for example, a set of call numbers
linked to the barcodes that a given user has borrowed - would be to:
 1. Retrieve the list of circulation objects (circ class)
for a given user (usr class).
 2. For each circulation object, look up the target copy (target_copy
field, linked to the acp class).
 3. For each copy, look up the call number for that copy (call_number
field, linked to the acn class).
However, this would result in potentially hundreds of round-trip
queries from the client to the server. Even with low-latency connections,
the network overhead would be considerable. So, built into the open-ils.cstore and
open-ils.pcrud access methods is the ability to flesh linked fields -
that is, rather than return an identifier to a given linked field,
the method can return the entire object as part of the initial response.
Most of the interfaces that return class instances from the IDL offer the
ability to flesh returned fields. For example, the
open-ils.cstore.direct.*.retrieve methods allow you to specify a
JSON structure defining the fields you wish to flesh in the returned object.
Fleshing fields in objects returned by <literal>open-ils.cstore</literal>
srfsh# request open-ils.cstore open-ils.cstore.direct.asset.copy.retrieve 1, \
    {
        "flesh": 1, #\
        "flesh_fields": { #\
            "acp": ["location"]
        }
    }

The flesh argument is the depth at which objects should be fleshed.
For example, to flesh out a field that links to another object that includes
a field that links to another object, you would specify a depth of 2.

The flesh_fields argument contains a list of objects with the fields
to flesh for each object.

Let’s flesh things a little deeper.
In addition to the copy location,
let’s also flesh the call number attached to the copy, and then flesh
the bibliographic record attached to the call number.
Fleshing fields in fields of objects returned by <literal>open-ils.cstore</literal>
request open-ils.cstore open-ils.cstore.direct.asset.copy.retrieve 1, \
    {
        "flesh": 2,
        "flesh_fields": {
            "acp": ["location", "call_number"],
            "acn": ["record"]
        }
    }
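Expressed as a data structure, the two-level flesh specification above is just an object literal. A sketch using only the flesh and flesh_fields keys shown in the example; a JavaScript client would pass it as the second argument to the retrieve call.

```javascript
// Flesh parameters for the deeper example: depth 2 allows the call number
// fleshed onto the copy (acp -> acn) to have its own record field fleshed
// in turn.
const fleshParams = {
  flesh: 2,
  flesh_fields: {
    acp: ['location', 'call_number'],
    acn: ['record']
  }
};
```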
+
+
Adding an IDL entry for ResolverResolver
Most OpenSRF methods in Evergreen define their object interface in the
IDL. Without an entry in the IDL, the prospective caller of a given
method is forced to either call the method and inspect the returned
contents, or read the source to work out the structure of the JSON
payload. At this stage of the tutorial, we have not defined an entry
in the IDL to represent the object returned by the
open-ils.resolver.resolve_holdings method. It is time to complete
that task.
The open-ils.resolver service is unlike many of the other classes
defined in the IDL because its data is not stored in the Evergreen
database. Instead, the data is requested from an external Web service
and only temporarily cached in memcached. Fortunately, the IDL
enables us to represent this kind of class by setting the
oils_persist:virtual class attribute to true.
So, let’s add an entry to the IDL for the open-ils.resolver.resolve_holdings
service:

And let’s make ResolverResolver.pm return an array composed of our new
rhr classes rather than raw JSON objects:

Once we add the new entry to the IDL and copy the revised ResolverResolver.pm
Perl module to /openils/lib/perl5/OpenILS/Application/, we need to:

Copy the updated IDL to both the /openils/conf/ and
/openils/var/web/reports/ directories. The Dojo approach to
parsing the IDL uses the IDL stored in the reports directory.

Restart the Perl services to make the new IDL visible to the services
and to refresh the open-ils.resolver implementation.

Rerun /openils/bin/autogen.sh to regenerate the JavaScript versions
of the IDL required by the HTTP translator and gateway.

We also need to adjust our JavaScript client to use the nifty new
objects that open-ils.resolver.resolve_holdings now returns.
The best approach is to use the support in Evergreen’s Dojo extensions
to generate the JavaScript classes directly from the IDL XML file.
+Accessing classes defined in the IDL via Fieldmapper + + + + + +Load the Dojo core. + + + + +fieldmapper.AutoIDL reads /openils/var/reports/fm_IDL.xml to +generate a list of class properties. + + + + +fieldmapper.dojoData seems to provide a store for Evergreen data +accessed via Dojo. + + + + +fieldmapper.Fieldmapper converts the list of class properties into +actual classes. + + + + +fieldmapper.standardRequest invokes an OpenSRF method and returns +an array of objects. + + + + +The first argument to fieldmapper.standardRequest is an array +containing the OpenSRF service name and method name. + + + + +The second argument to fieldmapper.standardRequest is an array +containing the arguments to pass to the OpenSRF method. + + + + +As Fieldmapper has instantiated the returned objects based on their +class hints, we can invoke getter/setter methods on the objects. + + + +
+
+
+License +This work is licensed under a Creative Commons Attribution-Share Alike 2.5 Canada License. +
+
diff --git a/1.6/media/10.png b/1.6/media/10.png new file mode 100644 index 0000000000000000000000000000000000000000..997bbc8246a316e040e0804174ba260e219d7d33 GIT binary patch literal 361 zcmeAS@N?(olHy`uVBq!ia0vp^JRr;gBp8b2n5}^nQWtZ~+OvdJMW|Y+^UT?O-M{rKJsmzxdayJ{ zDCQA!%%@7Jj$q%-wf8e0_jRx8Dqi$}^?K=?6FriQFLv>>oc^CE+aVHhW3=nZ+fQ4!M=ZC7H>3sl|FJr3LwU zC3?yExf6FO?f@F61vV}-Juk7O6lk8Yg;}bFaZ-|HQc7Azopr01?u8M*si- literal 0 HcmV?d00001 diff --git a/1.6/media/11.png b/1.6/media/11.png new file mode 100644 index 0000000000000000000000000000000000000000..ce47dac3f52ac49017749a3fea53db57d006993c GIT binary patch literal 565 zcmeAS@N?(olHy`uVBq!ia0vp^JRr=$1SD^YpWXnZI14-?iy0V%N{XE z)7O>#600DeuDZ?5tOl@ql94%{~0TwC?8m~C^ZqJRG}m@H-L1 z5L@scq?{XUcxG{OP9jig5ySQaTl#^*93bKF#G<^+ymW>G($Cs~V(bw8rA5i93}62@ zzlJGu&d<$F%`0K}c4pdspcorSSx9C{PAbEScbC)|7#JBmT^vIy=9KoYUDZ+`aP)jU z&ny=ErrK^#Gw!AcR}pdfMERuV^@&0$@(#^6b8c@rn^6RWX3pUb z4*6@PZ+H0#u=rjsXzS?6n6*sBGbHqGTU%mCsH?n#%j;eD^2}qe=iX*J@VQ3BRpz+u z{PX#N(^9X${`$90+;!pWs>o@z_n8G)7Uo7PJz`jrS+)QE@=PWHmc~UIw=WmUe73o7 z>^bR(M752aYoNg~ozu7U7&{(U>{s!;bn#f?ItjL^o`e{*EOQHqO;ccnz9hLK5@2cAyw@AaPFL~Cp#02|E|4xeQteNtB7waMs QVCXP-y85}Sb4q9e0GRUFb^rhX literal 0 HcmV?d00001 diff --git a/1.6/media/12.png b/1.6/media/12.png new file mode 100644 index 0000000000000000000000000000000000000000..31daf4e2f25b6712499ee32de9c2e3b050b691ca GIT binary patch literal 617 zcmeAS@N?(olHy`uVBq!ia0vp^JRr=$1SD^YpWXnZI14-?iy0V%N{XE z)7O>#600De9$%>2LVd81Yeb1-X-P(Y5yQ%LXFPyHJS9LOm(=3qqRfJl%=|nCVNOM5 zpg0#u+&RCXvM4h>ql94%{~0TwC?8m~C^ZqJRG}m@H-L1 z5L@scq?{XUcxG{OP9jig5ySQaTl#^*93bKF#G<^+ymW>G($Cs~V(bw8rA5i93}62@ zzlJGu&d<$F%`0K}c4pdspcorSSx9C{PAbEScbC)|7#JBmT^vIy=Cn>wTzx1(qV@bS z0hYvspf(--lM>otrqbK$7p{3DzJ|+KN8%5ows)AI?zWk_n>jwEHXrTJecpEW_0xL= z?}N`*R`T~d2{AN${y8T#GEn4hUb&52^}Op@TW4{oc)A6)%$5=G}h# z?O{QLj@aRcAIf&y&OiUN=H2gq=_}V|pWfuReDV|{jwXw~>#w)I|9${XE z)7O>#600Dep5bGK9wD%hYeb1-X-P(Y5yQ%LXFPyHJS9LOm(=3qqRfJl%=|nCVNOM5 zpg0#u+&RCXvM4h>ql94%{~0TwC?8m~C^ZqJRG}m@H-L1 
z5L@scq?{XUcxG{OP9jig5ySQaTl#^*93bKF#G<^+ymW>G($Cs~V(bw8rA5i93}62@ zzlJGu&d<$F%`0K}c4pdspcorSSx9C{PAbEScbC)|7#JBmT^vIy=Cn>w>~AWNX^a2R zbkveVY|45D7UnZ&JtjPwvdCCscZp0EA*0()#GOw)UH4-^&)y^E*4%UC)*|J}q_Ss;tN`nd8$>x9$_Xb^O2EpX&@C ZI46EzbLxq-voTO7gQu&X%Q~loCIF_C`w;*D literal 0 HcmV?d00001 diff --git a/1.6/media/14.png b/1.6/media/14.png new file mode 100644 index 0000000000000000000000000000000000000000..64014b75fe2e84d45ed861974c72462727979360 GIT binary patch literal 411 zcmV;M0c8G(P)!ax*-PXaQ9e~6^e1gu=a6a&KSz}bR`+prYG9ayB$BDjWGfIE;t#wl!+ zR3S(jA%y#i_@eOOedXoc%RQe%L;wH~k+s%ZI~)!<=dD%?4MaplaU9QPGski2q3`>r z(}{j@0a$CLl+)={2vLWml*i-oa5#J}DW$gCZB~Z!(!M#)2St|1_V^0qpmCrBof=Y&NUas@LmfSw=)4B4f;8Fu)(eFsv24 zJzXxBrayquXcR?J{XE z)7O>#600De0j~t#c`vY#Yeb1-X-P(Y5yQ%LXFPyHJS9LOm(=3qqRfJl%=|nCVNOM5 zpg0#u+&RCXvM4h>ql94%{~0TwC?8m~C^ZqJRG}m@H-L1 z5L@scq?{XUcxG{OP9jig5ySQaTl#^*93bKF#G<^+ymW>G($Cs~V(bw8rA5i93}62@ zzlJGu&d<$F%`0K}c4pdspcorSSx9C{PAbEScbC)|7#JBmT^vIy=9Eq_Jl&Ka(%QdX zh{H8O%#_7)Tc@t$mM`p4(Ne7omR*~(>gd8_8AZH{=3ms$Fmzm^yL@_+(#aQQ5>7QW z>3g2fIsH(ugM)!V$x4Rr_+!J_XU%4xbz0aE;^N{m@42Z|@0S@TQ=WbP`TMV5Ok;<| z^Ihv+@6tQ{sciRF9dD7Nr=KobwJJ68zJK$<1Pd9rz%4O)*;}Jzj&~nTGMecz>B%lV zK|`fmIc8mp-h8iSXiGFW=C(L+XH4DRxZQX87^-dLuD>odo6YLT@Sw)dfBEIG)v2@6 zR)%mL7GRj1x-&v&+2q@A%a&h0`Lw7|#(w_!tgT!PoJ|+re`l - Administration + Reports @@ -85,12 +85,15 @@ Development - + + + Appendices + -- 2.43.2