git.evergreen-ils.org Git - Evergreen.git/commit
Fix Unicode mangling in clean_marc function
author Dan Scott <dscott@laurentian.ca>
Sun, 4 Mar 2012 07:41:11 +0000 (02:41 -0500)
committer Dan Scott <dscott@laurentian.ca>
Wed, 7 Mar 2012 21:03:04 +0000 (16:03 -0500)
commit 2788298ec23d1caff3755f9c151d03510420651d
tree 122dabbc2e719c9f3d9502c1fa1f0877a8c5f555
parent d939d7d09f231319a59f7bc309b7e40c451f273e

Calling s/\p{Cc}//go; before entityize() resulted in nothing but xFFFD
entities being returned for the upper case diacritic characters, which
in turn caused the new unit test to fail (yay unit tests). I added a
corresponding unit test for entityize() to ensure that the problem
wasn't coming from that function. Switching the order of the \p{Cc}
regex and entityize() calls resolved the corruption in the unit test.
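
The ordering problem can be reproduced in a minimal sketch. It is written in Python rather than Perl, and it rests on an assumption the commit does not state: that the record reaches the regex as undecoded UTF-8 bytes, so each byte is treated as one character. Under that assumption the second byte of an upper case accented letter (0x89 for "É") falls in the C1 control range and is removed, leaving a broken sequence that later decodes to U+FFFD. The strip_controls() and entityize() functions below are hypothetical stand-ins, not the code in Normalize.pm.

```python
import unicodedata


def strip_controls(text):
    # Remove every character in Unicode general category Cc,
    # which is roughly what the Perl substitution in this commit does.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


def entityize(text):
    # Hypothetical stand-in: write non-ASCII characters as hex entities.
    return "".join(
        ch if ord(ch) < 128 else "&#x%04X;" % ord(ch) for ch in text
    )


raw = "É".encode("utf-8")            # two bytes: 0xC3 0x89

# Assumed failure mode: bytes treated as characters, stripped, then decoded.
byte_chars = raw.decode("latin-1")   # one character per byte
damaged = strip_controls(byte_chars).encode("latin-1")
bad = entityize(damaged.decode("utf-8", errors="replace"))

# Reordered: decode and entityize first, strip control characters afterwards.
good = strip_controls(entityize(raw.decode("utf-8")))

print(bad)   # &#xFFFD;
print(good)  # &#x00C9;
```

A lower case "é" encodes as 0xC3 0xA9, and 0xA9 is not a control character, which would be consistent with only the upper case diacritics being affected.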

This suggests that Vandelay may be introducing significant corruption to
imported records and that backporting of this commit to the inline
Vandelay variants from previous releases may be warranted.

Signed-off-by: Dan Scott <dscott@laurentian.ca>
Signed-off-by: Jason Stephenson <jstephenson@mvlc.org>
Open-ILS/src/perlmods/lib/OpenILS/Utils/Normalize.pm
Open-ILS/src/perlmods/t/01-OpenILS-Application.t
Open-ILS/src/perlmods/t/14-OpenILS-Utils.t