* Move to the in-core FTS functions instead of the compat wrapper provided by the tsearch2 contrib module
* Provide default cover density tuning (config file)
* Move default preferred-language settings from storage to search, where they make more sense
More on the CD tuning:
Evergreen uses a cover density algorithm for calculating relative ranking of matches. There
are several tuning parameters and options available. By default, no document length normalization
is applied. From the Postgres documentation on ts_rank_cd() (the function used by Evergreen):
Since a longer document has a greater chance of containing a query term it is reasonable
to take into account document size, e.g., a hundred-word document with five instances of
a search word is probably more relevant than a thousand-word document with five instances.
Both ranking functions take an integer normalization option that specifies whether and how
a document's length should impact its rank. The integer option controls several behaviors,
so it is a bit mask: you can specify one or more behaviors using | (for example, 2|4).
* 0 (the default) ignores the document length
* 1 divides the rank by 1 + the logarithm of the document length
* 2 divides the rank by the document length
* 4 divides the rank by the mean harmonic distance between extents (this is implemented only by ts_rank_cd)
* 8 divides the rank by the number of unique words in document
* 16 divides the rank by 1 + the logarithm of the number of unique words in document
* 32 divides the rank by itself + 1
If more than one flag bit is specified, the transformations are applied in the order listed.
It is important to note that the ranking functions do not use any global information, so it
is impossible to produce a fair normalization to 1% or 100% as sometimes desired. Normalization
option 32 (rank/(rank+1)) can be applied to scale all ranks into the range zero to one, but of
course this is just a cosmetic change; it will not affect the ordering of the search results.
In Evergreen, these options are set via search modifiers. The modifiers are mapped in the
following way:
* #CD_logDocumentLength => 1 :: rank / (1 + LOG(total_word_count)) :: Longer documents slightly less relevant
* #CD_documentLength => 2 :: rank / total_word_count :: Longer documents much less relevant
* #CD_meanHarmonic => 4 :: Word Proximity :: Greater matched-word distance is less relevant
* #CD_uniqueWords => 8 :: rank / unique_word_count :: Documents with repeated words much less relevant
* #CD_logUniqueWords => 16 :: rank / (1 + LOG(unique_word_count)) :: Documents with repeated words slightly less relevant
* #CD_selfPlusOne => 32 :: rank / (1 + rank) :: Cosmetic normalization of rank value between 0 and 1
Adding one or more of these to the default_CD_modifiers list will cause all searches that use QueryParser to apply them.
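The flag arithmetic above can be sketched as follows. This is illustrative only; the real calculation happens inside Postgres's ts_rank_cd(), and the document statistics passed in here are hypothetical inputs:

```python
import math

def apply_cd_normalization(rank, doc_len, uniq_words, mean_harmonic, flags):
    """Apply the ts_rank_cd() normalization flags in the documented order."""
    if flags & 1:    # CD_logDocumentLength
        rank /= 1 + math.log(doc_len)
    if flags & 2:    # CD_documentLength
        rank /= doc_len
    if flags & 4:    # CD_meanHarmonic
        rank /= mean_harmonic
    if flags & 8:    # CD_uniqueWords
        rank /= uniq_words
    if flags & 16:   # CD_logUniqueWords
        rank /= 1 + math.log(uniq_words)
    if flags & 32:   # CD_selfPlusOne
        rank /= rank + 1
    return rank

# Combine flags with |, just as the Postgres docs describe (e.g. 2|4):
apply_cd_normalization(1.0, doc_len=100, uniq_words=80,
                       mean_harmonic=2.0, flags=2 | 4)
```

Note that because the transformations are applied in flag order, combining flags is not commutative in effect: the rank/(rank+1) cosmetic scaling (32) always runs last.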
The helper script grab-db-comment.pl is what actually parses out
the comment statements.
To avoid repetition, the list of default SQL scripts to use when
initializing an Evergreen database has been moved to a new file
called sql_file_manifest.
* remove copyright, license verbiage, and C-style comment marking
from the comments; these can live in the SQL scripts
* updated several copyright headers
* minor improvements to documentation of a couple tables
We were 98% of the way there; now we no longer need to
cd into the same directory as the i18n testing scripts
to run them with meaningful output. Should be useful
for adding these to the CI server.
Must have asked this script to check JS files for valid entities
for a reason at some point in the dark past, but it couldn't have
been a very good reason; we're getting a false positive that needs
to be hushed now. Better to just stop looking for XML entities in
JavaScript.
Empty strings in oils_i18n_gettext() throw i18n errors
When you run 'make newpot', if you have an empty string in an
oils_i18n_gettext() function, you'll see errors like:
Error in line 1712 of SQL source file: 'NoneType' object has no attribute 'group'
This satisfies the i18n build process and also serves as a
more evident placeholder for expanded descriptions if someone
feels so inclined in the future.
%SUBSTR(#)%...%SUBSTR_END%
Take the substring starting at position # through the end of the string.
If # is negative, count backwards from the end of the string.
%SUBSTR(#,#)%...%SUBSTR_END%
Same as above, but limit the result to the number of characters given by the second argument, counted from the start point.
If the second number is negative, count backwards instead of forwards.
TRIM macros inside of SUBSTR will be replaced first, then SUBSTR, then TRIM outside of SUBSTR.
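A rough Python sketch of the substring rules described above. Positions are assumed to be 1-based, and the macro's exact edge-case behavior may differ; the barcode value is made up:

```python
def macro_substr(text, start, length=None):
    """Sketch of %SUBSTR(#)% / %SUBSTR(#,#)% semantics (1-based positions assumed)."""
    # A negative start counts backwards from the end of the string.
    idx = len(text) + start if start < 0 else start - 1
    if length is None:
        return text[idx:]              # %SUBSTR(#)%: from start point to end
    if length >= 0:
        return text[idx:idx + length]  # %SUBSTR(#,#)%: forward from start point
    # Negative second number: count backwards from the start point instead.
    return text[max(idx + length, 0):idx]

macro_substr("3120001234567", -4)      # → "4567" (last four of a barcode)
```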
Author: Thomas Berezansky <tsbere@mvlc.org> Signed-off-by: Thomas Berezansky <tsbere@mvlc.org> Signed-off-by: Jason Etheridge <jason@esilibrary.com>
git-svn-id: svn://svn.open-ils.org/ILS/trunk@20137 dcc99617-32d9-48b4-a31d-7c20da2025e4
Allow NULL "use restriction" fields for located URIs
The asset.uri.use_restriction field, which is really a sort of public notes
field for 856 fields, was grabbing the $u subfield (URL) as a sort of last-gasp
effort to give it some data. However, the effect was rather odd and led to
workarounds, such as Conifer's skin suppressing the use restriction field
whenever its value was identical to the URL.
Instead, stop grabbing $u and handle the case where use_restriction column is
NULL gracefully, just like the schema intended.
Delete ##URI## call numbers and uri_call_number_map entries on bib reingest
This approach will lead to some acn/auricnm ID inflation, but it works.
Addresses LP# 761130 (immortal ##URI## entries in asset.call_number) reported
by Ben Shum and LP# 761085 (cannot delete bib with ##URI## volumes) reported
by Jason Etheridge.
Protect dumb JavaScript engines from having to deal with actual Unicode
The holdings_xml format did not include an XML declaration, but adding that
as we do here still does not make the Firefox and Chromium JS engines capable
of consuming XML that contains Unicode content outside of the base ASCII
range.
So, we invoke entityize() to convert anything outside of the realm of
ASCII to XML entities. An alternative would be to invoke entityize() in
OpenILS::Application::SuperCat::unAPI::acn but it's not clear if that
would interfere with any other uses.
With this change, library names / copy location names with Unicode content
can be displayed correctly on the search results page.
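The transformation entityize() performs can be sketched like this. This is a Python stand-in for the existing OpenILS JS helper, which may emit a different entity form (e.g. hex references); the library name is hypothetical:

```python
def entityize(text):
    """Replace every character outside the ASCII range with a numeric XML
    character reference, so even entity-challenged JS engines can parse it."""
    return ''.join(c if ord(c) < 128 else '&#%d;' % ord(c) for c in text)

entityize("Bibliothèque Centrale")  # → "Biblioth&#232;que Centrale"
```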
Use the org unit shortname for the site param; capture site/depth in the search builder to pass on to unAPI retrieval. There's still some template work to do to display the correct data there (for copy counts).
At some point (r16750) we started doing a numeric comparison of
$flesh instead of just checking to see if $flesh was defined; this
returned false when $flesh == 'uris', preventing URIs from being
included in the marcxml-uris unAPI format.
This restores URIs to marcxml-uris, so we can revert the extra
BibTemplate call in rdetail_summary.xml.
Specify the holdings_xml unAPI format for URI calls
The unAPI marcxml-uris format is not returning URIs at the moment.
While we're getting that fixed, use the holdings_xml format to
get the URI job done; requires an extra JS call, but that's
better than not working at all.
Escape rather than filter SIMILAR TO metacharacters in patron crazy search
The filtering I introduced in r19983 was overly aggressive, and included
characters that weren't actually SIMILAR TO metacharacters. Instead, escape
each character, carefully going through the list of metacharacters listed at
http://www.postgresql.org/docs/8.4/interactive/functions-matching.html
Works for email addresses like "foo.bar+baz@example.com".
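The approach can be sketched like so (a Python stand-in for the actual Perl; the metacharacter list follows the Postgres pattern-matching docs linked above):

```python
# SIMILAR TO metacharacters: LIKE's % and _, the regex-derived operators
# | * + ? { } ( ) [ ], and the escape character (backslash) itself.
SIMILAR_TO_META = set('%_|*+?{}()[]\\')

def escape_similar_to(value):
    """Backslash-escape each metacharacter so it matches literally,
    instead of stripping it from the search term entirely."""
    return ''.join('\\' + c if c in SIMILAR_TO_META else c for c in value)

escape_similar_to('foo.bar+baz@example.com')  # '.' is literal in SIMILAR TO
```

Escaping rather than filtering means the search still matches the characters the patron actually typed.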
* used version from wiki, which provides same results as the
previous version but performs better on large databases
* now works without editing (a vacuum cannot run inside of a transaction)
* don't do vacuum full, just a regular vacuum analyze
Particularly when running the catalog embedded in the staff client, which gives no visual indication of page progress, it's good to let the caller know that something is happening with a search. After a one-second search delay, show a small progress spinner icon.
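The timing logic can be sketched as follows. This is a Python stand-in for what is really catalog JavaScript; the timer functions are injected so the idea is clear without a browser, and all names are hypothetical:

```python
def make_spinner_gate(show, hide, set_timer, cancel_timer, delay_ms=1000):
    """Show a busy spinner only for searches that outlive delay_ms."""
    state = {'handle': None}

    def search_started():
        # Schedule the spinner; a fast search cancels it before it fires.
        state['handle'] = set_timer(show, delay_ms)

    def search_finished():
        if state['handle'] is not None:
            cancel_timer(state['handle'])
            state['handle'] = None
        hide()  # make sure the spinner is gone either way

    return search_started, search_finished
```

A fast search never shows the spinner at all; only a search still running when the timer fires does.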
Initial staff client integration in the record details page, with a new staff JS file; move footer and other JS loading into their own templates; hide the top-nav pane (My Account summary) in embedded mode; load a slim version of the MARC HTML (no external CSS, no print button).
Move toward SVF for mattype extraction, with much media/material-type icon cleanup; icons are now accessed directly by code instead of via inconsistent, map-requiring human names.
Add support for Multi-Homed Items (aka Foreign Bibs, aka Linked Items)
Evergreen needs to support the ability to attach a barcoded item to more than one bibliographic record. Use cases include:
1. Barcoded E-Readers with preloaded content
* Readers would all be items attached to a single "master" bib record in the traditional way, through call numbers that define their ownership
* Each reader, as a barcoded item, can be attached through Multi-homed Items to records describing the list of preloaded content
* These attached Multi-homed Items can be added and removed as content is swapped out on each reader
2. Dual-language items
* Cataloger decides which of several alternate languages is the primary, and attaches the barcoded item to that record in the traditional way
* Alternate language records are attached to this item through Multi-homed Items
3. "Back-to-back" books -- two books printed upside down relative to one another, with two "front" covers
* Cataloger decides which of the two titles is the primary, and attaches the barcoded item to that record in the traditional way
* Alternate title record is attached to this item through Multi-homed Items
4. Bound Volumes -- Sets of individual works collected into a single barcoded package
* Cataloger decides which of the titles is the primary (or creates a record for the collection as a whole), and attaches the barcoded item to that record in the traditional way
* Remaining title records for the collected pieces are attached to this item through Multi-homed Items
Functionality funded by Natural Resources Canada -- http://www.nrcan-rncan.gc.ca/com/
Please see http://git.esilibrary.com/?p=evergreen-equinox.git;a=shortlog;h=refs/heads/multi_home for the full commit history.
Patch from Ben Ostrowsky (with input) adding support to the Apache redirect module to optionally read the redirect skin and domain from the library IP configuration file.