Sorting Chinese characters
December 28, 2013
Recently we decided to localize the country selection list at work, and there was some confusion about how to sort Chinese characters. I asked my wife, and she told me that sorting by pinyin seems the most reasonable to her. So here's how to do it in Perl:
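use 5.010;
use strict;
use warnings;
use utf8::all;    # UTF-8 source code and standard filehandles
use Encode qw(decode_utf8);
use Unicode::Collate::Locale;
use Unicode::Unihan;
use Locale::Country::Multilingual;

# Country names in Chinese, decoded from UTF-8 bytes into Perl strings
my $lcm = Locale::Country::Multilingual->new;
$lcm->set_lang('zh');
my @names = map { decode_utf8($_) } $lcm->all_country_names;

my $uh  = Unicode::Unihan->new;    # Unihan database, used to print readings
my $ucl = Unicode::Collate::Locale->new( locale => 'zh__pinyin' );

# Print each name in pinyin order, followed by its Mandarin reading:
# the first reading of every character, with tone numbers removed
for my $name ( $ucl->sort(@names) ) {
    my @syllables = map {
        my $reading = $_ // '';    # no reading in Unihan: use ''
        $reading =~ s/[0-9]//g;    # strip tone numbers
        $reading =~ s/ .*//;       # keep only the first reading
        $reading;
    } $uh->Mandarin($name);
    say $name, " ", join "", @syllables;
}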
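If you only want to see the collation step, here is a minimal sketch that needs nothing beyond Unicode::Collate::Locale, which ships with Perl. The hand-picked list of country names is an assumption for illustration, standing in for the Locale::Country::Multilingual output.

```perl
use 5.010;
use strict;
use warnings;
use utf8;    # the source below contains Chinese string literals
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample: a few country names in Simplified Chinese
my @names = qw(中国 美国 德国 日本 法国 英国 澳大利亚);

# 'zh__pinyin' orders Han characters by their pinyin reading
my $collator = Unicode::Collate::Locale->new( locale => 'zh__pinyin' );
my @sorted   = $collator->sort(@names);

say join ' ', @sorted;
```

With this sample, 中国 (zhōngguó) again comes last, after 英国 (yīngguó).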
The problem with this method is that 中国 (China itself) becomes the last item in the list, because its pinyin, zhōngguó, starts with Z. If you replace zh__pinyin with zh__stroke, the names will be sorted by the number of strokes instead.
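One possible way around China landing at the end is to pull it out of the list and put it on top, then sort the remaining names. This is a sketch under stated assumptions, not what the script above does: sort_with_pinned is a hypothetical helper and the sample names are hand-picked. It takes the collation variant as a parameter so you can switch between pinyin and stroke order; in Unicode::Collate::Locale the stroke variant is spelled zh__stroke, with two underscores like zh__pinyin.

```perl
use 5.010;
use strict;
use warnings;
use utf8;    # the source below contains Chinese string literals
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: sort @names with the given Chinese collation
# variant, but keep the names listed in $pinned at the top, in that order.
sub sort_with_pinned {
    my ( $locale, $pinned, @names ) = @_;
    my $collator = Unicode::Collate::Locale->new( locale => $locale );

    my %is_pinned = map { $_ => 1 } @$pinned;
    my %present   = map { $_ => 1 } @names;

    my @top  = grep { $present{$_} } @$pinned;     # pinned names that occur in @names
    my @rest = grep { !$is_pinned{$_} } @names;    # everything else gets collated
    return ( @top, $collator->sort(@rest) );
}

# Hypothetical sample of country names in Simplified Chinese
my @names = qw(美国 德国 中国 日本 法国);

say join ' ', sort_with_pinned( 'zh__pinyin', ['中国'], @names );
say join ' ', sort_with_pinned( 'zh__stroke', ['中国'], @names );
```

With this sample, the pinyin call gives 中国 德国 法国 美国 日本. Pinning keeps the home country in a predictable place no matter which collation variant you choose.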