git.evergreen-ils.org Git - Evergreen.git/commit
backport naco_normalize revisions to rel_2_0
author gmc <gmc@dcc99617-32d9-48b4-a31d-7c20da2025e4>
Wed, 19 Jan 2011 16:07:14 +0000 (16:07 +0000)
committer gmc <gmc@dcc99617-32d9-48b4-a31d-7c20da2025e4>
Wed, 19 Jan 2011 16:07:14 +0000 (16:07 +0000)
commit 4d39dd908d6c7abd202cc48e7ae472b0700c32ca
tree e1a3baeb0030edb7978e74334cf73855c9f59aa1
parent fa5dc0ecbde1f7a8c845fa784968c01257b67dbd

This implements the latest version of the NACO
normalization specification found at

http://www.loc.gov/catdir/pcc/naco/SCA_PccNormalization_Final_revised.pdf

This version of the algorithm is more general -- for example,
all combining characters are removed -- so there should be
fewer fiddly edge cases to worry about for most European
languages.
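
To illustrate the "all combining characters are removed" step, here is a
minimal Python sketch. It is not the naco_normalize code from this commit
and covers only this one step, not the rest of the NACO specification; the
function name strip_combining is hypothetical.

```python
import unicodedata


def strip_combining(text: str) -> str:
    """Decompose text to NFKD, then drop every combining character.

    Illustrative only: this shows the single "remove combining
    characters" step, not the full NACO normalization.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a non-zero combining class
    # for combining marks such as accents and diacritics.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ("Dvořák", "café", "Ångström"):
        print(word, "->", strip_combining(word))
```

Because the rule is applied to every combining mark rather than to a
hand-picked list of accented letters, new diacritics need no special casing.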

Rebuilding the metabib.*_field_entry tables (e.g., by using
reingest-1.6-2.0.pl) is recommended if any bibs contain
non-ASCII characters.

Normalized text is now left in the NFKD form, so while this should
be transparent to the search system after reindexing, it does mean
that (for example) Korean text in metabib.*_field_entry may not
be in the same Unicode normalization form as that found in
biblio.record_entry.
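
The effect on Korean text can be sketched in Python as follows. This is an
illustration under the assumption that the stored bibliographic text is in
composed (NFC) form; the helper name same_text is hypothetical and is not
part of Evergreen.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to NFKD.

    Hypothetical helper: a byte-for-byte comparison between composed
    and decomposed text would fail, so normalize both sides first.
    """
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


if __name__ == "__main__":
    composed = "한국"  # two precomposed Hangul syllables
    decomposed = unicodedata.normalize("NFKD", composed)

    # Each syllable decomposes into three conjoining jamo, so the
    # code point sequences differ even though the text is the same.
    print(len(composed), len(decomposed))
    print(composed == decomposed)
    print(same_text(composed, decomposed))
```

Any code that compares indexed values with the original record text directly
would need a normalization step of this kind on both sides.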

Also includes a fix for bug #684467: more bulletproofing of naco_normalize

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
git-svn-id: svn://svn.open-ils.org/ILS/branches/rel_2_0@19205 dcc99617-32d9-48b4-a31d-7c20da2025e4
Open-ILS/src/perlmods/OpenILS/Application/Storage/FTS.pm
Open-ILS/src/sql/Pg/002.schema.config.sql
Open-ILS/src/sql/Pg/020.schema.functions.sql
Open-ILS/src/sql/Pg/1.6.1-2.0-upgrade-db.sql
Open-ILS/src/sql/Pg/upgrade/0478.schema.naco_normalize_tweak.sql [new file with mode: 0644]
Open-ILS/tests/naco_normalize.t [new file with mode: 0644]