git.evergreen-ils.org Git - Evergreen.git/commit
revised version of naco_normalize
author gmc <gmc@dcc99617-32d9-48b4-a31d-7c20da2025e4>
Mon, 29 Nov 2010 21:44:34 +0000 (21:44 +0000)
committer gmc <gmc@dcc99617-32d9-48b4-a31d-7c20da2025e4>
Mon, 29 Nov 2010 21:44:34 +0000 (21:44 +0000)
commit 0d883cd743c1fb7f76f172796f5bb8e31be01c2c
tree 10789a44cbf992f0c5c8dfea992412810758828f
parent 72220fc0e5cb7f9a5d19fbbac6dc53a882d1ae4f

This implements the latest version of the NACO
normalization specification found at

http://www.loc.gov/catdir/pcc/naco/SCA_PccNormalization_Final_revised.pdf

This version of the algorithm is more general -- for example,
all combining characters are removed -- so there should be
fewer fiddly edge cases to worry about for most European
languages.
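
To illustrate the "all combining characters are removed" step only, here is
a minimal Python sketch. It is not the naco_normalize function from this
commit; the helper name strip_combining is hypothetical, and it assumes
"combining character" means any code point in a Unicode mark category
(Mn, Mc, Me) after NFKD decomposition.

```python
import unicodedata


def strip_combining(text: str) -> str:
    """Decompose to NFKD, then drop every code point in a mark category.

    Hypothetical helper for illustration; the real NACO rules cover
    much more (punctuation, case, special letters, and so on).
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


if __name__ == "__main__":
    print(strip_combining("Dvo\u0159\u00e1k"))    # Dvorak
    print(strip_combining("S\u00e3o Tom\u00e9"))  # Sao Tome
```

Because the rule is stated in terms of a character class rather than a list
of specific diacritics, new accented letters need no special handling.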

Rebuilding the metabib.*_field_entry tables (e.g., by using
reingest-1.6-2.0.pl) is recommended if any bib records contain
non-ASCII characters.

Normalized text is now left in the NFKD form, so while this should
be transparent to the search system after reindexing, it does mean
that (for example) Korean text in metabib.*_field_entry may not
be in the same Unicode normalization form as that found in
biblio.record_entry.
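
The following Python sketch shows why the two tables can disagree for
Korean text. It assumes the record text is stored in precomposed (NFC)
form, which this commit message does not state; the variable names and
the helper same_text are hypothetical.

```python
import unicodedata

# Two precomposed Hangul syllables (U+D55C, U+AD6D), as a record might
# store them. The NFC storage form is an assumption for this example.
stored = "\ud55c\uad6d"

# NFKD splits each syllable into its conjoining jamo.
indexed = unicodedata.normalize("NFKD", stored)


def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to the same form (NFKD)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


if __name__ == "__main__":
    print(len(stored), len(indexed))   # 2 6
    print(stored == indexed)           # False
    print(same_text(stored, indexed))  # True
```

Any code that compares index entries directly against record text would
need to normalize both sides to one form first, as same_text does.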

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
git-svn-id: svn://svn.open-ils.org/ILS/trunk@18864 dcc99617-32d9-48b4-a31d-7c20da2025e4
Open-ILS/src/sql/Pg/002.schema.config.sql
Open-ILS/src/sql/Pg/020.schema.functions.sql
Open-ILS/src/sql/Pg/upgrade/0467.schema.updated_naco_normalize.sql [new file with mode: 0644]