Scrub smartquotes / emdashes from JSONGrammar.xml as well
[Evergreen.git] / docs / TechRef / JSONGrammar.xml
1 <?xml version="1.0" encoding="utf-8"?>
2
3 <sect1 version="5.0" xmlns="http://docbook.org/ns/docbook"
4         xmlns:xi="http://www.w3.org/2003/XInclude" 
5         xmlns:xlink="http://www.w3.org/1999/xlink">
6
7         <title>Grammar of JSON Queries</title>
8
9         <para>
10                 <author>
11                         <personname>
12                                 <firstname>Scott</firstname>
13                                 <surname>McKellar</surname>
14                         </personname>
15                         <affiliation>
16                                 <orgname>Equinox Software, Inc.</orgname>
17                         </affiliation>
18                 </author>
19         </para>
20
21
22         <sect2>
23                 <title>Introduction</title>
24                 <para> The format of this grammar approximates Extended Backus-Naur notation. However it is
25                         intended as input to human beings, not to parser generators such as Lex or Yacc. Do not
26                         expect formal rigor. Sometimes narrative text will explain things that are clumsy to
27                         express in formal notation. More often, the text will restate or summarize the formal
28                         productions. </para>
29                 <para> Conventions: </para>
30                 <orderedlist>
31                         <listitem>
32                                 <para>The grammar is a series of productions.</para>
33                         </listitem>
34                         <listitem>
35                                 <para>A production consists of a name, followed by "::=", followed by a definition
36                                         for the name. The name identifies a grammatical construct that can appear on the
37                                         right side of another production.</para>
38                         </listitem>
39                         <listitem>
40                                 <para>Literals (including punctuation) are enclosed in 'single quotes', or in
41                                         "double quotes" if case is not significant.</para>
42                         </listitem>
43                         <listitem>
44                                 <para>A single quotation mark within a literal is escaped with a preceding
45                                         backslash: 'dog\'s tail'.</para>
46                         </listitem>
47                         <listitem>
48                                 <para>If a construct can be defined more than one way, then the alternatives may
49                                         appear in separate productions; or, they may appear in the same production,
50                                         separated by pipe symbols. The choice between these representations is of only
51                                         cosmetic significance.</para>
52                         </listitem>
53                         <listitem>
54                                 <para>A construct enclosed within square brackets is optional.</para>
55                         </listitem>
56                         <listitem>
57                                 <para>A construct enclosed within curly braces may be repeated zero or more
58                                         times.</para>
59                         </listitem>
60                         <listitem>
61                                 <para>JSON allows arbitrary white space between tokens. To avoid ugly clutter, this
62                                         grammar ignores the optional white space. </para>
63                         </listitem>
64                         <listitem>
65                                 <para>In many cases a production defines a JSON object, i.e. a list of name-value
66                                         pairs, separated by commas. Since the order of these name/value pairs is not
67                                         significant, the grammar will not try to show all the possible sequences. In
68                                         general it will present the required pairs first, if any, followed by any
69                                         optional elements.</para>
70                         </listitem>
71                 </orderedlist>
72
73                 <para> Since both EBNF and JSON use curly braces and square brackets, pay close attention to
74                         whether these characters are in single quotes. If they're in single quotes, they are
75                         literal elements of the JSON notation. Otherwise they are elements of the EBNF notation.
76                 </para>
77         </sect2>
78
79         <sect2>
80                 <title>Primitives</title>
81                 <para> We'll start by defining some primitives, to get them out of the way. They're mostly
82                         just what you would expect. </para>
83
84                 <productionset>
85                         <production xml:id="ebnf.string">
86                                 <lhs> string </lhs>
87                                 <rhs> '"' chars '"' </rhs>
88                         </production>
89
90                         <production xml:id="ebnf.chars">
91                                 <lhs> chars </lhs>
92                                 <rhs> any valid sequence of UTF-8 characters, with certain special characters
93                                         escaped according to JSON rules </rhs>
94                         </production>
95
96                         <production xml:id="ebnf.integer_literal">
97                                 <lhs> integer_literal </lhs>
98                                 <rhs> [ sign ] digit { digit } </rhs>
99                         </production>
100
101                         <production xml:id="ebnf.sign">
102                                 <lhs> sign </lhs>
103                                 <rhs> '+' | '-' </rhs>
104                         </production>
105
106                         <production xml:id="ebnf.digit">
107                                 <lhs> digit </lhs>
108                                 <rhs>digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'</rhs>
109                         </production>
110
111                         <production xml:id="ebnf.integer_string">
112                                 <lhs> integer_string </lhs>
113                                 <rhs> '"' integer_literal '"' </rhs>
114                         </production>
115
116                         <production xml:id="ebnf.integer">
117                                 <lhs> integer </lhs>
118                                 <rhs> integer_literal | integer_string </rhs>
119                         </production>
120
121                         <production xml:id="ebnf.number">
122                                 <lhs> number </lhs>
123                                 <rhs> any valid character sequence that is numeric according to JSON rules </rhs>
124                         </production>
125
126                 </productionset>
127
128                 <para> When json_query requires an integral value, it will usually accept a quoted string
129                         and convert it to an integer by brute force -- to zero if necessary. Likewise it may
130                         truncate a floating point number to an integral value. Scientific notation will be
131                         accepted but may not give the intended results. </para>
132
133                 <productionset>
134
135                         <production xml:id="ebnf.boolean">
136                                 <lhs> boolean </lhs>
137                                 <rhs> 'true' | 'false' | string | number </rhs>
138                         </production>
139
140                 </productionset>
141
142                 <para> The preferred way to encode a boolean is with the JSON reserved word true or false,
143                         in lower case without quotation marks. The string <literal>true</literal>, in upper,
144                         lower, or mixed case, is another way to encode true. Any other string evaluates to
145                         false. </para>
146                 <para> As an accommodation to perl, numbers may be used as booleans. A numeric value of 1
147                         means true, and any other numeric value means false. </para>
148                 <para> Any other valid JSON value, such as an array, will be accepted as a boolean but
149                         interpreted as false. </para>
150                 <para> The last couple of primitives aren't really very primitive, but we introduce them
151                         here for convenience: </para>
152
153                 <productionset>
154
155                         <production xml:id="ebnf.class_name">
156                                 <lhs> class_name </lhs>
157                                 <rhs> string </rhs>
158                         </production>
159
160                 </productionset>
161
162                 <para> A class_name is a special case of a string: the name of a class as defined by the
163                         IDL. The class may refer either to a database table or to a source_definition, which is
164                         a subquery. </para>
165
166                 <productionset>
167
168                         <production xml:id="ebnf.field_name">
169                                 <lhs> field_name </lhs>
170                                 <rhs> string </rhs>
171                         </production>
172
173                 </productionset>
174
175                 <para> A field_name is another special case of a string: the name of a non-virtual field as
176                         defined by the IDL. A field_name is also a column name for the table corresponding to
177                         the relevant class. </para>
178
179         </sect2>
180
181         <sect2>
182                 <title>Query</title>
183
184                 <para> The following production applies not only to the main query but also to most
185                         subqueries. </para>
186
187                 <productionset>
188
189                         <production xml:id="ebnf.query">
190                                 <lhs> query </lhs>
191                                 <rhs> '{'<sbr/> '"from"' ':' from_list<sbr/> [ ',' '"select"' ':' select_list
192                                         ]<sbr/> [ ',' '"where"' ':' where_condition ]<sbr/> [ ',' '"having"' ':'
193                                         where_condition ]<sbr/> [ ',' '"order_by"' ':' order_by_list ]<sbr/> [ ','
194                                         '"limit"' ':' integer ]<sbr/> [ ',' '"offset"' ':' integer ]<sbr/> [ ','
195                                         '"distinct"' ':' boolean ]<sbr/> [ ',' '"no_i18n"' ':' boolean ]<sbr/> '}'
196                                 </rhs>
197                         </production>
198
199                 </productionset>
200
201                 <para> Except for the <literal>"distinct"</literal> and <literal>no_i18n</literal> entries,
202                         each name/value pair represents a major clause of the SELECT statement. The name/value
203                         pairs may appear in any order. </para>
204                 <para> There is no name/value pair for the GROUP BY clause, because json_query generates it
205                         automatically according to information encoded elsewhere. </para>
206                 <para> The <literal>"distinct"</literal> entry, if present and true, tells json_query that
207                         it may have to create a GROUP BY clause. If not present, it defaults to false. </para>
208                 <para> The <literal>"no_i18n"</literal> entry, if present and true, tells json_query to
209                         suppress internationalization. If not present, it defaults to false. (Note that
210                                 <literal>"no_i18n"</literal> contains the digit one, not the letter ell.) </para>
211                 <para> The values for <literal>limit</literal> and <literal>offset</literal> provide the
212                         arguments of the LIMIT and OFFSET clauses, respectively, of the SQL statement. Each
213                         value should be non-negative, if present, or else the SQL won't work. </para>
214
215         </sect2>
216
217         <sect2><title>FROM Clause</title>
218                 <para>
219                         The object identified by <literal>"from"</literal> encodes the FROM clause of
220                         the SQL.  The associated value may be a string, an array, or a JSON object.
221                 </para>
222
223                 <productionset>
224
225                         <production xml:id="ebnf.from_list_0">
226                                 <lhs> from_list </lhs>
227                                 <rhs> class_name </rhs>
228                         </production>
229
230                 </productionset>
231
232                 <para>
233                         If <literal>from_list</literal> is a <literal>class_name</literal>, the
234                         json_query inserts the corresponding table name or subquery into the FROM
235                         clause, using the <literal>class_name</literal> as an alias for the table
236                         or subquery.  The class must be defined as non-virtual in the IDL.
237                 </para>
238
239                 <productionset>
240
241                         <production xml:id="ebnf.from_list_1">
242                                 <lhs> from_list </lhs>
243                                 <rhs> '['  string  {  ','  parameter  }  ']' </rhs>
244                         </production>
245
246                         <production xml:id="ebnf.parameter">
247                                 <lhs> parameter </lhs>
248                                 <rhs> string  |  number  |  'null' </rhs>
249                         </production>
250
251                 </productionset>
252
253                 <para>
254                         If from_list is a JSON array, then it represents a table-like function from
255                         which the SQL statement will select rows, using a SELECT clause consisting
256                         of "SELECT *" (regardless of the select_list supplied by the method parameter).
257                 </para>
258                 <para>
259                         The first entry in the array is the name of the function.  It must be a string
260                         naming a stored function.  Each subsequent entry is a function parameter.  If
261                         it is a string or a number, json_query will insert it into a comma-separated
262                         parameter list, enclosed in quotes, with any special characters escaped as needed.
263                         If it is the JSON reserved word <literal>null</literal>, json_query will insert
264                         it into the parameter list as a null value.
265                 </para>
266                 <para>
267                         If <literal>from_list</literal> is a JSON object, it must contain exactly one entry.
268                         The key of this entry must be the name of a non-virtual class defined in the IDL.
269                         This class will be the top-level class of the FROM clause, the only one named
270                         outside of a JOIN clause.
271                 </para>
272
273                 <productionset>
274
275                         <production xml:id="ebnf.from_list_2">
276                                 <lhs> from_list </lhs>
277                                 <rhs> '{' class_name ':' join_list '}' </rhs>
278                         </production>
279
280                         <production xml:id="ebnf.join_list_0">
281                                 <lhs> join_list </lhs>
282                                 <rhs> class_name </rhs>
283                         </production>
284
285                         <production xml:id="ebnf.join_list_1">
286                                 <lhs> join_list </lhs>
287                                 <rhs> '{' join_def { ',' join_def } '}' </rhs>
288                         </production>
289
290                 </productionset>
291
292                 <para>
293                         If the associated data is a <literal>class_name</literal>, json_query will
294                         construct an INNER JOIN clause joining the class to the top-level clause,
295                         using the columns specified by the IDL for such a join.
296                 </para>
297                 <para>
298                         Otherwise, the associated data must be a JSON object with one or more entries,
299                         each entry defining a join:
300                 </para>
301
302                 <productionset>
303
304                         <production xml:id="ebnf.join_def">
305                                 <lhs> join_def </lhs>
306                                 <rhs>
307                                         class_name  ':'<sbr/>
308                                         '{'<sbr/>
309                                         [  '"type"'      ':'  string      ]<sbr/>
310                                         [  '"field"'     ':'  field_name  ]<sbr/>
311                                         [  '"fkey"'      ':'  field_name  ]<sbr/>
312                                         [  '"filter"'    ':'  where_condition  ]<sbr/>
313                                         [  '"filter_op"' ':'  string      ]<sbr/>
314                                         [  '"join"'      ':'  join_list   ]<sbr/>
315                                         '}'
316                                 </rhs>
317                         </production>
318
319                 </productionset>
320
321                 <para>
322                         The data portion of the <literal>"join_type"</literal> entry tells json_query
323                         whether to use a left join, right join, full join, or inner join.  The values
324                         <literal>"left"</literal>, <literal>"right"</literal>, and <literal>"full"</literal>,
325                         in upper, lower, or mixed case, have the obvious meanings.  If the
326                         <literal>"join_type"</literal> entry has any other value, or is not present,
327                         json_query constructs an inner join.
328                 </para>
329                 <para>
330                         The <literal>"field"</literal> and <literal>"fkey"</literal> attributes specify the
331                         columns to be equated in the join condition.  The <literal>"field"</literal>
332                         attribute refers to the column in the joined table, i.e. the one named by the
333                         <literal>join_def</literal>.  The <literal>"fkey"</literal> attribute refers to the
334                         corresponding column in the other table, i.e. the one named outside the
335                         <literal>join_def</literal> -- either the top-level table or a table named by some
336                         other <literal>join_def</literal>.
337                 </para>
338                 <para>
339                         It may be tempting to suppose that <literal>"fkey"</literal> stands for "foreign key",
340                         and therefore refers to a column in the child table that points to the key of a
341                         parent table.  Resist the temptation; the labels are arbitrary.  The json_query
342                         method doesn't care which table is the parent and which is the child.
343                 </para>
344                 <para>
345                         These relationships are best explained with an example.  The following
346                         <literal>from_list</literal>:
347                 </para>
348
349                 <informalexample>
350                         <programlisting language="JSON">
351         {
352             "aou": {
353                 "asv": {
354                     "type" : "left",
355                     "fkey" : "id",
356                     "field" : "owner"
357                 }
358             }
359         }
360                         </programlisting>
361                 </informalexample>
362
363                 <para>
364                         ...turns into the following FROM clause:
365                 </para>
366
367                 <informalexample>
368                         <programlisting language="SQL">
369         FROM
370             actor.org_unit AS "aou"
371                 LEFT JOIN action.survey AS "asv"
372                     ON ( "asv".owner = "aou".id )
373                         </programlisting>
374                 </informalexample>
375
376                 <para>
377                         Note in this example that <literal>"fkey"</literal> refers to a column of the
378                         class <literal>"aou"</literal>, and <literal>"field"</literal> refers to a
379                         column of the class <literal>"asv"</literal>.
380                 </para>
381                 <para>
382                         If you specify only one of the two columns, json_query will try to identify the
383                         other one from the IDL. However, if you specify only the column from the parent
384                         table, this attempt will probably fail.
385                 </para>
386                 <para>
387                         If you specify both columns, json_query will use the column names you specify,
388                         without verifying them with a lookup in the IDL.  By this means you can perform
389                         a join using a linkage that the IDL doesn't define.  Of course, if the columns
390                         don't exist in the database, the query will fail when json_query tries to execute it.
391                 </para>
392                 <para>
393                         Using the columns specified, either explicitly or implicitly, the json_query
394                         method constructs a join condition.  With raw SQL it is possible (though
395                         rarely useful) to join two tables by an inequality.  However the json_query
396                         method always uses a simple equality condition.
397                 </para>
398                 <para>
399                         Using a <literal>"filter"</literal> entry in the join_def, you can apply one
400                         or more additional conditions to the JOIN clause, typically to restrict the
401                         join to certain rows of the joined table.  The data associated with the
402                         <literal>"filter"</literal> key is the same sort of
403                         <literal>where_condition</literal> that you use for a WHERE clause
404                         (discussed below).
405                 </para>
406                 <para>
407                         If the string associated with the <literal>"filter_op"</literal> entry is
408                         <literal>"OR"</literal> in upper, lower, or mixed case, then the json_query
409                         method uses OR to connect the standard join condition to any additional
410                         conditions supplied by a <literal>"filter"</literal> entry.
411                 </para>
412                 <para>
413                         (Note that if the <literal>where_condition</literal> supplies multiple
414                         conditions, they will be connected by AND.  You will probably want to move
415                         them down a layer -- enclose them in parentheses, in effect -- to avoid a
416                         confusing mixture of ANDs and ORs.)
417                 </para>
418                 <para>
419                         If the <literal>"filter_op"</literal> entry carries any other value, or if
420                         it is absent, then the json_query method uses AND.  In the absence of a
421                         <literal>"filter"</literal> entry, <literal>"filter_op"</literal> has no effect.
422                 </para>
423                 <para>
424                         A <literal>"join"</literal> entry in a <literal>join_def</literal> specifies
425                         another layer of join.  The class named in the subjoin is joined to the class
426                         named by the <literal>join_def</literal> to which it is subordinate.  By this
427                         means you can encode multiple joins in a hierarchy.
428                 </para>
429         </sect2>
430
431         <sect2><title>SELECT Clause</title>
432                 <para>
433                         If a query does not contain an entry for <literal>"select"</literal>, json_query
434                         will construct a default SELECT clause.  The default includes every non-virtual
435                         field from the top-level class of the FROM clause, as defined by the IDL.  The
436                         result is similar to SELECT *, except:
437                 </para>
438
439                 <itemizedlist>
440                         <listitem>
441                                 <para>The default includes only the fields defined in the IDL.</para>
442                         </listitem>
443                         <listitem>
444                                 <para>The columns will appear in the same order in which they appear in the IDL,
445                                 regardless of the order in which the database defines them.</para>
446                         </listitem>
447                 </itemizedlist>
448                 
449                 <para>
450                         There are other ways to specify a default SELECT list, as shown below.
451                 </para>
452                 <para>
453                         If a <literal>"select"</literal> entry is present, the associated value must
454                         be a JSON object, keyed on class names:
455                 </para>
456
457                 <productionset>
458
459                         <production xml:id="ebnf.select_list">
460                                 <lhs> select_list </lhs>
461                                 <rhs> '{' class_name ':' field_list { ',' class_name ':' field_list } '}' </rhs>
462                         </production>
463
464                 </productionset>
465
466                 <para>
467                         The <literal>class_name</literal> must identify either the top-level class or
468                         a class belonging to one of the joins.  Otherwise json_query will silently
469                         ignore the <literal>select_list</literal>.
470                 </para>
471
472                 <productionset>
473
474                         <production xml:id="ebnf.field_list_0">
475                                 <lhs> field_list </lhs>
476                                 <rhs> 'null'  |  '"*"' </rhs>
477                         </production>
478
479                 </productionset>
480
481                 <para>
482                         If a field_list is either the JSON reserved word <literal>null</literal>
483                         (in lower case) or an asterisk in double quotes, json_query constructs a
484                         default SELECT list -- provided that the class is the top-level class of the
485                         query.  If the class belongs to a join somewhere, json_query ignores the
486                         <literal>field_list</literal>.
487                 </para>
488                 <para>
489                         More commonly, the <literal>field_list</literal> is a JSON array of zero or
490                         more field specifications:
491                 </para>
492
493                 <productionset>
494
495                         <production xml:id="ebnf.field_list_1">
496                                 <lhs> field_list </lhs>
497                                 <rhs> '['  [  field_spec  {  ','  field_spec  }  ]  ']' </rhs>
498                         </production>
499
500                 </productionset>
501
502                 <para>
503                         If the array is empty, json_query will construct a default SELECT list for
504                         the class -- again, provided that the class is the top-level class in the query.
505                 </para>
506                 <para>
507                         In the simplest case, a field specification may name a non-virtual field
508                         defined in the IDL:
509                 </para>
510
511                 <productionset>
512
513                         <production xml:id="ebnf.field_spec_0">
514                                 <lhs> field_spec </lhs>
515                                 <rhs> field_name </rhs>
516                         </production>
517
518                 </productionset>
519
520                 <para>
521                         In some cases json_query constructs a call to the
522                         <literal>oils_i18n_xlate</literal> function to internationalize the value of the
523                         selected column.  Specifically, it does so if all the following are true:
524                 </para>
525
526                 <itemizedlist>
527                         <listitem>
528                                 <para>the settings file defines a locale;</para>
529                         </listitem>
530                         <listitem>
531                                 <para>in the field definition for the field in the IDL, the tag
532                                 <literal>"il8n"</literal> is present and true;</para>
533                         </listitem>
534                         <listitem>
535                                 <para>the query does <emphasis>not</emphasis> include the
536                                 <literal>"no_il8n"</literal> tag (or includes it with a value of false).</para>
537                         </listitem>
538                 </itemizedlist>
539                 
540                 <para>
541                         A field specification may be a JSON object:
542                 </para>
543                 
544                 <productionset>
545                         
546                         <production xml:id="ebnf.field_spec_1">
547                                 <lhs> field_spec </lhs>
548                                 <rhs>
549                                         '{'<sbr/>
550                                         '"column"'  ':'  <sbr/>
551                                         [ ',' '"alias"'  ':'  string  ]<sbr/>
552                                         [ ',' '"aggregate"'  ':'  boolean  ]<sbr/>
553                                         [ ',' transform_spec  ]<sbr/>
554                                         '}'
555                                 </rhs>
556                         </production>
557
558                 </productionset>
559
560                 <para>
561                         The <literal>"column"</literal> entry provides the column name, which must
562                         be defined as non-virtual in the IDL.
563                 </para>
564                 <para>
565                         The <literal>"alias"</literal> entry provides a column alias.  If no alias
566                         is specified, json_query uses the column name as its own alias.
567                 </para>
568                 <para>
569                         The <literal>"aggregate"</literal> entry has no effect on the SELECT clause
570                         itself.  Rather, it affects the construction of a GROUP BY class.  If there
571                         is an <literal>"aggregate"</literal> entry for any field, then json_query builds
572                         a GROUP BY clause listing every column that is <emphasis>not</emphasis> tagged
573                         for aggregation (or that carries an <literal>"aggregate"</literal> entry with
574                         a value of false).  If <emphasis>all</emphasis> columns are tagged for
575                         aggregation, then json_query omits the GROUP BY clause.
576                 </para>
577
578                 <productionset>
579
580                         <production xml:id="ebnf.transform_spec_0">
581                                 <lhs> transform_spec </lhs>
582                                 <rhs>
583                                         '"transform"'  ':'  string  ]<sbr/>
584                                         [ ',' '"result_field"  ':'  string  ]<sbr/>
585                                         [ ',' '"params"  ':' param_list  ]
586                                 </rhs>
587                         </production>
588
589                 </productionset>
590
591                 <para>
592                         When a <literal>transform_spec</literal> is present, json_query selects the
593                         return value of a function instead of selecting the column directly.  The entry
594                         for <literal>"transform"</literal> provides the name of the function, and the
595                         column name (as specified by the <literal>"column"</literal> tag), qualified by
596                         the class name, is the argument to the function.  For example, you might use such
597                         a function to format a date or time, or otherwise transform a column value.
598                         You might also use an aggregate function such as SUM, COUNT, or MAX (possibly
599                         together with the <literal>"aggregate"</literal> tag).
600                 </para>
601                 <para>
602                         The <literal>"result_field"</literal> entry, when present, specifies a subcolumn
603                         of the function's return value.  The resulting SQL encloses the function call
604                         in parentheses, and follows it with a period and the subcolumn name.
605                 </para>
606                 <para>
607                         The <literal>"params"</literal> entry, if present, provides a possibly empty
608                         array of additional parameter values, either strings, numbers, or nulls:
609                 </para>
610
611                 <productionset>
612
613                         <production xml:id="ebnf.param_list">
614                                 <lhs> param_list </lhs>
615                                 <rhs> '['  [  parameter  {  ','  parameter  }  ]  ']' </rhs>
616                         </production>
617
618                 </productionset>
619
620                 <para>
621                         Such parameter values are enclosed in single quotes, with any special characters
622                         escaped as needed, and inserted after the column name as additional parameters
623                         to the function.  You might, for example, use an additional parameter to provide
624                         a format string for a reformatting function.
625                 </para>
626         </sect2>
627
628         <sect2><title>WHERE Clause</title>
629                 <para> There are two types of <literal>where_condition</literal>: objects and arrays.
630                         Of these, the object type is the more fundamental, and occurs at some level in every
631                         <literal>where_condition</literal>.  The array type is mainly a way of circumventing
632                         a limitation of the object type. </para>
633                 <para> The object type of <literal>where_condition</literal> is a comma-separated list
634                         of one or more <literal>conditions</literal>: </para>
635
636                 <productionset>
637
638                         <production xml:id="ebnf.where_condition_0">
639                                 <lhs> where_condition </lhs>
640                                 <rhs> '{' condition { ',' condition } '}' </rhs>
641                         </production>
642
643                 </productionset>
644
645                 <para> The generated SQL will include a code fragment for each <literal>condition</literal>,
646                         joined by AND (or in some cases by OR, as described below). </para>
647                 <para> As usual for entries in a JSON object, each <literal>condition</literal> consists
648                         of a unique string to serve as a key, a colon, and an associated value. </para>
649                 <para> The key string may be the name of a column belonging to the relevant table, or
650                         it may be an operator string.  In order to distinguish it from any possible column
651                         name, an operator string always begins with a plus sign or minus sign. </para>
652                 <para> JSON requires that every key string be unique within an object.  This requirement
653                         imposes some awkward limitations on a JSON query.  For example, you might want to
654                         express two conditions for the same column: id &gt; 10 and id != 25.  Since each of
655                         those conditions would have the same key string, namely "id", you can't put them
656                         into the same JSON object. </para>
657                 <para> The solution is to put such conflicting conditions in separate JSON objects, and
658                         put the objects into an array: </para>
659
660                 <productionset>
661
662                         <production xml:id="ebnf.where_condition_1">
663                                 <lhs> where_condition </lhs>
664                                 <rhs> '[' where_condition { ',' where_condition } ']' </rhs>
665                         </production>
666
667                 </productionset>
668
669                 <para> The resulting SQL encloses each subordinate set of <literal>conditions</literal>
670                         in parentheses, and connects the sets with AND (or in some cases OR, as described
671                         below).  It's possible to put only a single <literal>where_condition</literal> in
672                         the array; the result is to add a layer of parentheses around the condition. </para>
673                 <para> There are two kinds of <literal>condition</literal> where the operator begins
674                         with a plus sign.  In the simpler case, the associated data is simply a column name:
675                 </para>
676
677                 <productionset>
678
679                         <production xml:id="ebnf.condition_0">
680                                 <lhs> condition </lhs>
681                                 <rhs> plus_class ':' field_name </rhs>
682                         </production>
683
684                 </productionset>
685
686                 <para> A <literal>plus_class</literal> is a string that begins with a plus sign.
687                         The rest of the string, after the plus sign, must be the class name for the table
688                         to which the column belongs. </para>
689                 <para> If the column is a boolean, then the resulting SQL uses it (as qualified by the
690                         class name) as a stand-alone condition. </para>
691                 <para> Otherwise, this kind of syntax provides a way to place a column on the right side
692                         of a comparison operator.  For example: </para>
693
694                 <informalexample>
695                         <programlisting language="JSON">
696         {
697             "from":"aou",
698             "select": { "aou":[ "id", "name" ] },
699             "where": {
700                 "id": {
701                     "&gt;": { "+aou":"parent_ou" }
702                 }
703             }
704         }
705                         </programlisting>
706                 </informalexample>
707
708                 <para> The resulting SQL: </para>
709
710                 <informalexample>
711                         <programlisting language="SQL">
712         SELECT
713             "aou".id AS "id",
714             "aou".name AS "name"
715         FROM
716             actor.org_unit AS "aou"
717         WHERE
718             (
719                 "aou".id &gt; (  "aou".parent_ou  )
720             );
721                         </programlisting>
722                 </informalexample>
723
724                 <para> The other type of <literal>condition</literal> that uses a
725                         <literal>plus_class</literal> applies a specified class name to a
726                         <literal>where_condition</literal>: </para>
727
728                 <productionset>
729
730                         <production xml:id="ebnf.condition_1">
731                                 <lhs> condition </lhs>
732                                 <rhs> plus_class ':' where_condition </rhs>
733                         </production>
734
735                 </productionset>
736
737                 <para> The resulting SQL is enclosed in parentheses, and qualifies the columns with
738                         the specified class name.  This syntax provides a mechanism to shift the class
739                         context -- i.e. to refer to one class in a context that would otherwise refer to
740                         a different class. </para>
741                 <para> Ordinarily the class name must be a valid non-virtual class defined in the IDL,
742                         and applicable to the associated <literal>where_condition</literal>.  There is at
743                         least one peculiar exception.  The JSON fragment: </para>
744
745                 <informalexample>
746                         <programlisting language="JSON">
747         "+abc": { "+xyz":"frobozz" }
748                         </programlisting>
749                 </informalexample>
750
751                 <para> ...is rendered as: </para>
752
753                 <informalexample>
754                         <programlisting language="SQL">
755         (  "xyz".frobozz  )
756                         </programlisting>
757                 </informalexample>
758
759                 <para> ...even though neither <literal>"abc"</literal>, nor <literal>"xyz"</literal>,
760                         nor <literal>"frobozz"</literal> is defined in the IDL.  The class name
761                         <literal>"abc"</literal> isn't used at all because the <literal>"+xyz"</literal>
762                         operator overrides it.  Such a query won't fail until json_query tries
763                         to execute it in the database. </para>
764                 <para> The other operators that may occur at this level all begin with a minus sign,
765                         and they all represent familiar SQL operators.  For example, the
766                         <literal>"-or"</literal> operator joins the conditions within a
767                         <literal>where_condition</literal> by OR (instead of the default AND), and
768                         encloses them all in parentheses: </para>
769
770                 <productionset>
771
772                         <production xml:id="ebnf.condition_2">
773                                 <lhs> condition </lhs>
774                                 <rhs> '"-or"' ':' where_condition </rhs>
775                         </production>
776
777                 </productionset>
778
779                 <para> In fact the <literal>"-or"</literal> operator is the only way to get OR into
780                         the WHERE clause. </para>
781                 <para> The <literal>"-and"</literal> operator is similar, except that it uses AND: </para>
782
783                 <productionset>
784
785                         <production xml:id="ebnf.condition_3">
786                                 <lhs> condition </lhs>
787                                 <rhs> '"-and"' ':'  where_condition </rhs>
788                         </production>
789
790                 </productionset>
791
792                 <para> Arguably the <literal>"-and"</literal> operator is redundant, because you can
793                         get the same effect by wrapping the subordinate <literal>where_condition</literal>
794                         in a JSON array.  Either technique merely adds a layer of parentheses, since AND
795                         connects successive conditions by default. </para>
796                 <para> The <literal>"-not"</literal> operator expands the subordinate
797                         <literal>where_condition</literal> within parentheses, and prefaces the result
798                         with NOT: </para>
799
800                 <productionset>
801
802                         <production xml:id="ebnf.condition_4">
803                                 <lhs> condition </lhs>
804                                 <rhs> '"-not"' ':'  where_condition </rhs>
805                         </production>
806
807                 </productionset>
808
809                 <para> The <literal>"-exists"</literal> or <literal>"-not-exists"</literal> operator
810                         constructs a subquery within an EXISTS  or NOT EXISTS clause, respectively: </para>
811
812                 <productionset>
813
814                         <production xml:id="ebnf.condition_5">
815                                 <lhs> condition </lhs>
816                                 <rhs> '"-exists"' ':' query </rhs>
817                         </production>
818
819                         <production xml:id="ebnf.condition_6">
820                                 <lhs> condition </lhs>
821                                 <rhs> '"-not-exists"' ':' query </rhs>
822                         </production>
823
824                 </productionset>
825
826                 <para> The remaining kinds of <literal>condition</literal> all have a
827                         <literal>field_name</literal> on the left and some kind of <literal>predicate</literal>
828                         on the right.  A <literal>predicate</literal> places a constraint on the value of
829                         the column -- or, in some cases, on the value of the column as transformed by some
830                         function call: </para>
831
832                 <productionset>
833
834                         <production xml:id="ebnf.condition_7">
835                                 <lhs> condition </lhs>
836                                 <rhs> field_name ':' predicate </rhs>
837                         </production>
838
839                 </productionset>
840
841                 <para> The simplest such constraint is to require that the column have a specific value,
842                         or be null: </para>
843
844                 <productionset>
845
846                         <production xml:id="ebnf.predicate_0">
847                                 <lhs> predicate </lhs>
848                                 <rhs> lit_value | 'null' </rhs>
849                         </production>
850
851                         <production xml:id="ebnf.lit_value">
852                                 <lhs> lit_value </lhs>
853                                 <rhs> string | number </rhs>
854                         </production>
855
856                 </productionset>
857
858                 <para> You can also compare a column to a literal value using some kind of inequality.
859                         However it's a bit more complicated because you have to specify what kind of comparison
860                         to make: </para>
861
862                 <productionset>
863
864                         <production xml:id="ebnf.compare_op_0">
865                                 <lhs> predicate </lhs>
866                                 <rhs> '{' compare_op ':' lit_value '}' </rhs>
867                         </production>
868
869                         <production xml:id="ebnf.compare_op_1">
870                                 <lhs> compare_op </lhs>
871                                 <rhs> string </rhs>
872                         </production>
873
874                 </productionset>
875
876                 <para> A <literal>compare_op</literal> is a string that defines a comparison operator.
877                         Valid values include the following: </para>
878
879                 <programlisting language="SQL">
880         =    &lt;&gt;   !=
881         &lt;    &gt;    &lt;=   &gt;=
882         ~    ~*   !~   !~*
883         like      ilike
884         similar to
885                 </programlisting>
886
887                 <para> Strictly speaking, json_query accepts any <literal>compare_op</literal>
888                         that doesn't contain semicolons or white space (or
889                         <literal>"similar to"</literal> as a special exception).  As a result, it
890                         is possible -- and potentially useful -- to use a custom operator like
891                         <literal>"&gt;100*"</literal> in order to insert an expression that would
892                         otherwise be difficult or impossible to create through a JSON query.  The ban
893                         on semicolons and white space prevents certain kinds of SQL injection. </para>
894                 <para> Note that json_query does <emphasis>not</emphasis> accept two operators that
895                         PostgreSQL <emphasis>does</emphasis> accept: <literal>"is distinct from"</literal>
896                         and <literal>"is not distinct from"</literal>. </para>
897                 <para> You can also compare a column to a null value: </para>
898
899                 <productionset>
900
901                         <production xml:id="ebnf.predicate_1">
902                                 <lhs> predicate </lhs>
903                                 <rhs> '{' compare_op ':' 'null' '}' </rhs>
904                         </production>
905
906                 </productionset>
907
908                 <para> The equality operator <literal>"="</literal> turns into IS NULL.  Any other
909                         operator turns into IS NOT NULL. </para>
910                 <para> When a <literal>compare_op</literal> is paired with an array, it defines a
911                         function call: </para>
912
913                 <productionset>
914
915                         <production xml:id="ebnf.predicate_2">
916                                 <lhs> predicate </lhs>
917                                 <rhs> '{' compare_op ':' '[' string { ',' parameter } ']' '}' </rhs>
918                         </production>
919
920                 </productionset>
921
922                 <para> The first entry in the array is the function's name.  Subsequent entries in
923                         the array, if any, represent the parameters of the function call.  They may be
924                         strings, numbers, or nulls.  In the generated SQL, the function call appears on
925                         the right of the comparison. </para>
926                 <para> The <literal>"between"</literal> operator creates a BETWEEN clause: </para>
927
928                 <productionset>
929
930                         <production xml:id="ebnf.predicate_3">
931                                 <lhs> predicate </lhs>
932                                 <rhs> '{'  "between"  ':'  '['  lit_value  ','  lit_value  ']'  '}' </rhs>
933                         </production>
934
935                 </productionset>
936
937                 <para> Although PostgreSQL allows a null value in a BETWEEN clause, json_query
938                         requires literal non-null values.  It isn't sensible to use null values in a
939                         BETWEEN clause.  A few experiments show that the results of the comparison are
940                         peculiar and erratic. </para>
941                 <para> There are two ways to create an IN list of allowed values.  The simplest is
942                         to put literal values into a JSON array: </para>
943
944                 <productionset>
945
946                         <production xml:id="ebnf.predicate_4">
947                                 <lhs> predicate </lhs>
948                                 <rhs> '[' lit_value { ',' lit_value }  ']' </rhs>
949                         </production>
950
951                 </productionset>
952
953                 <para> As with BETWEEN clauses, json_query does not accept null values in an IN list,
954                         even though PostgreSQL does allow them.  Nulls are not sensible in this context
955                         because they never match anything. </para>
956                 <para>  </para>
957         </sect2>
958
959         <sect2><title>Having Clause</title>
960                 <para>For the HAVING clause, json_query accepts exactly the same syntax as it accepts for
961                         the WHERE clause.</para>
962                 <para> The other way to create an IN list is to use an explicit
963                         <literal>"in"</literal> operator with an array of literal values.  This format
964                         also works for the <literal>"not in"</literal> operator: </para>
965
966                 <productionset>
967
968                         <production xml:id="ebnf.predicate_5">
969                                 <lhs> predicate </lhs>
970                                 <rhs> '{' in_operator ';'  '[' lit_value [ ',' lit_value ]  ']'  '}' </rhs>
971                         </production>
972
973                         <production xml:id="ebnf.in_operator">
974                                 <lhs> in_operator </lhs>
975                                 <rhs> "in"  |  "not in" </rhs>
976                         </production>
977
978                 </productionset>
979
980                 <para> Another kind of IN or NOT IN clause uses a subquery instead of a list of
981                         values: </para>
982
983                 <productionset>
984
985                         <production xml:id="ebnf.predicate_6">
986                                 <lhs> predicate </lhs>
987                                 <rhs> '{' in_operator ':'  query  '}' </rhs>
988                         </production>
989
990                 </productionset>
991
992                 <para> The remaining types of <literal>predicate</literal> can put a function call on
993                         the left of the comparison, by using a <literal>transform_spec</literal> together
994                         with a <literal>"value"</literal> tag.   The <literal>transform_spec</literal> is
995                         optional, and if you don't need it, the same SQL would in many cases be easier to
996                         express by other means. </para>
997                 <para> The <literal>transform_spec</literal> construct was described earlier in
998                         connection with the SELECT clause, but here it is again: </para>
999
1000                 <productionset>
1001
1002                         <production xml:id="ebnf.transform_spec_1">
1003                                 <lhs> transform_spec </lhs>
1004                                 <rhs>
1005                                         '"transform"'  ':'  string  ]<sbr/>
1006                                         [ ',' '"result_field"  ':'  string  ]<sbr/>
1007                                         [ ',' '"params"  ':' param_list  ]
1008                                 </rhs>
1009                         </production>
1010
1011                 </productionset>
1012
1013                 <para> As in the SELECT clause, the <literal>"transform"</literal> string names the
1014                         function.  The first parameter is always the column identified by the field_name.
1015                         Additional parameters, if any, appear in the <literal>param_list</literal>.  The
1016                         <literal>"result_field"</literal> string, if present, identifies one column of a
1017                         multicolumn return value. </para>
1018                 <para> Here's a second way to compare a value to a literal value (but not to a null
1019                         value): </para>
1020
1021                 <productionset>
1022
1023                         <production xml:id="ebnf.predicate_7">
1024                                 <lhs> predicate </lhs>
1025                                 <rhs> '{' compare_op ':' '{' '"value"' ':' lit_value<sbr/>
1026                                         [ transform_spec ] '}' '}' </rhs>
1027                         </production>
1028
1029                 </productionset>
1030
1031                 <para> ...and a way to compare a value to a boolean expression: </para>
1032
1033                 <productionset>
1034
1035                         <production xml:id="ebnf.predicate_8">
1036                                 <lhs> predicate </lhs>
1037                                 <rhs> '{' compare_op ':' '{' '"value"' ':' '{'<sbr/>
1038                                         condition { ',' condition } [ transform_spec ] '}' '}' </rhs>
1039                         </production>
1040
1041                 </productionset>
1042
1043                 <para> The final predicate is another way to put a function call on the right side
1044                         of the comparison: </para>
1045
1046                 <productionset>
1047
1048                         <production xml:id="ebnf.predicate_9">
1049                                 <lhs> predicate </lhs>
1050                                 <rhs> '{' compare_op ':' '{' '"value"' ':' '['<sbr/>
1051                                         string { ',' parameter } ']' [ transform_spec ] '}' '}' </rhs>
1052                         </production>
1053
1054                 </productionset>
1055
1056                 <para> This format is available for the sake of consistency, but offers no advantage
1057                         over the simpler version. </para>
1058         </sect2>
1059
1060         <sect2><title>ORDER BY Clause</title>
1061                 <para> There are two ways to encode an ORDER BY clause: as an array, or as a list.
1062                         Either may be empty, in which case the generated SQL will not include an ORDER BY
1063                         clause: </para>
1064
1065                 <productionset>
1066
1067                         <production xml:id="ebnf.order_by_list_0">
1068                                 <lhs> order_by_list </lhs>
1069                                 <rhs> '['  ']'  |  '{'  '}' </rhs>
1070                         </production>
1071
1072                 </productionset>
1073
1074                 <para> If not empty, the array contains one or more objects, each defining a sort
1075                         field: </para>
1076
1077                 <productionset>
1078
1079                         <production xml:id="ebnf.order_by_list_1">
1080                                 <lhs> order_by_list </lhs>
1081                                 <rhs> '{' sort_field_def  {  ','  sort_field_def }  '}' </rhs>
1082                         </production>
1083
1084                         <production xml:id="ebnf.sort_field_def">
1085                                 <lhs> sort_field_def </lhs>
1086                                 <rhs> '{'<sbr/>
1087                                         '"class"'  ':'  class_name<sbr/>
1088                                         ','  '"field"'  ':'  field_name<sbr/>
1089                                         [  ','  '"direction"'  ':'  lit_value  ]<sbr/>
1090                                         [  ','  transform_spec  ]<sbr/>
1091                                         '}' </rhs>
1092                         </production>
1093
1094                 </productionset>
1095
1096                 <para> The <literal>"class"</literal> and <literal>"field"</literal> entries are
1097                         required, and of course the field must belong to the class.  Furthermore, at
1098                         least one field from the class must appear in the SELECT clause. </para>
1099                 <para> The <literal>"direction"</literal> entry, if present, specifies whether the
1100                         sort will be ascending or descending for the column in question.  If the associated
1101                         value begins with "D" or "d", the sort will be descending; otherwise the sort will
1102                         be ascending.  If the value is a number, it will be treated as a string that does not
1103                         start with "D" or "d", resulting in an ascending sort. </para>
1104                 <para> In the absence of a <literal>"direction"</literal> entry, the sort will be
1105                         ascending. </para>
1106                 <para> The <literal>transform_spec</literal> works here the same way it works in the
1107                         SELECT clause and the WHERE clause, enabling you to pass the column through a
1108                         transforming function before the sort: </para>
1109
1110                 <productionset>
1111
1112                         <production xml:id="ebnf.transform_spec_2">
1113                                 <lhs> transform_spec </lhs>
1114                                 <rhs>
1115                                         '"transform"'  ':'  string  ]<sbr/>
1116                                         [ ',' '"result_field"  ':'  string  ]<sbr/>
1117                                         [ ',' '"params"  ':' param_list  ]
1118                                 </rhs>
1119                         </production>
1120
1121                 </productionset>
1122
1123                 <para> When the <literal>order_by_list</literal> is an object instead of an array,
1124                         the syntax is less verbose, but also less flexible.  The keys for the object are
1125                         class names: </para>
1126
1127                 <productionset>
1128
1129                         <production xml:id="ebnf.order_by_list_2">
1130                                 <lhs> order_by_list </lhs>
1131                                 <rhs> '{' class_name ':' sort_class_def<sbr/>
1132                                         { ',' class_name ':' sort_class_def } '}' </rhs>
1133                         </production>
1134
1135                 </productionset>
1136
1137                 <para> Each class must be referenced in the SELECT clause. </para>
1138                 <para> As in the SELECT clause, all the fields for a given class must be grouped
1139                         together.  You can't sort by a column from one table, then a column from a second
1140                         table, then another column from the first table. If you need this kind of sort,
1141                         you must encode the ORDER BY clause as an array instead of an object. </para>
1142                 <para> The data associated with a <literal>class_name</literal> may be either an array
1143                         or an object.  If an array, it's simply a list of field names, and each field must
1144                         belong to the class: </para>
1145
1146                 <productionset>
1147
1148                         <production xml:id="ebnf.sort_class_def_0">
1149                                 <lhs> sort_class_def </lhs>
1150                                 <rhs> '['  field_name  { ','  field_name }  ']' </rhs>
1151                         </production>
1152
1153                 </productionset>
1154
1155                 <para> With this syntax, the direction of sorting will always be ascending. </para>
1156                 <para> If the data is an object, the keys are field names, and as usual the fields
1157                         must belong to the class: </para>
1158
1159                 <productionset>
1160
1161                         <production xml:id="ebnf.sort_class_def_1">
1162                                 <lhs> sort_class_def </lhs>
1163                                 <rhs> '{'  field_name  ':' sort_class_subdef<sbr/>
1164                                         { ','  field_name  ':' sort_class_subdef  }  '}' </rhs>
1165                         </production>
1166
1167                 </productionset>
1168
1169                 <para> Since the <literal>field_name</literal> is the key for the object, it cannot
1170                         appear more than once.  As a result, some kinds of sorts are not possible with this
1171                         syntax.  For example, one might want to sort by UPPER( family_name ), and then by
1172                         family_name with case unchanged, to make sure that "diBona" comes before "Dibona".
1173                         For situations like this, you must encode the ORDER BY clause as an array rather
1174                         than an object. </para>
1175                 <para> The data associated with each <literal>field_name</literal> may take either of
1176                         two forms.  In the simplest case, it's a literal value to specify the direction
1177                         of sorting: </para>
1178
1179                 <productionset>
1180
1181                         <production xml:id="ebnf.sort_class_subdef_0">
1182                                 <lhs> sort_class_subdef </lhs>
1183                                 <rhs> lit_value </rhs>
1184                         </production>
1185
1186                 </productionset>
1187
1188                 <para> If the literal is a string starting with "D" or "d", json_query sorts the field
1189                         in descending order.  Otherwise it sorts the field in ascending order. </para>
1190                 <para> In other cases, the <literal>field_name</literal> may be paired with an object
1191                         to specify more details: </para>
1192
1193                 <productionset>
1194
1195                         <production xml:id="ebnf.sort_class_subdef_1">
1196                                 <lhs> sort_class_subdef </lhs>
1197                                 <rhs> '{'<sbr/>
1198                                         [  '"direction"'  ':'  lit_value ]<sbr/>
1199                                         [  transform_spec  ]<sbr/>
1200                                         '}' </rhs>
1201                         </production>
1202
1203                 </productionset>
1204
1205                 <para> As before, the value tagged as <literal>"direction"</literal> specifies the
1206                         direction of the sort, depending on the first character.  If not otherwise
1207                         specified, the sort direction defaults to ascending. </para>
1208                 <para> Also as before, the <literal>transform_spec</literal> may specify a function
1209                         through which to pass the column. </para>
1210                 <para> Since both the <literal>"direction"</literal> tag and the
1211                         <literal>transform_spec</literal> are optional, the object may be empty: </para>
1212
1213                 <productionset>
1214
1215                         <production xml:id="ebnf.sort_class_subdef_2">
1216                                 <lhs> sort_class_subdef </lhs>
1217                                 <rhs> '{'  '}' </rhs>
1218                         </production>
1219
1220                 </productionset>
1221         </sect2>
1222
1223 </sect1>