Sorting Chinese characters

December 28th, 2013

Recently we decided to localize the country selection list at work, and there was some confusion about how to sort Chinese characters. I asked my wife and she told me that sorting by pinyin seems most reasonable to her. So here’s how to do it in Perl:

use 5.010;
use strict;
use warnings;
use utf8::all;

use Encode;
use Unicode::Collate::Locale;
use Unicode::Unihan;
use Locale::Country::Multilingual;

my $lcm = Locale::Country::Multilingual->new;
$lcm->set_lang('zh');
my @names = map { decode_utf8($_) } $lcm->all_country_names;
my $uh = Unicode::Unihan->new;

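# Collator that orders Chinese characters by their pinyin reading
my $ucl = Unicode::Collate::Locale->new( locale => 'zh__pinyin' );
for ( $ucl->sort(@names) ) {
    # Print each name followed by its Mandarin reading:
    # tone digits are stripped and only the first reading of each character is kept
    say $_, "   ", join "", map { $_ //= ''; s/[0-9]//g; s/ .*//; $_; } $uh->Mandarin($_);
}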

The problem with this method is that 中国 (China itself) becomes the last item in the list, because its pinyin, Zhōngguó, sorts at the very end of the alphabet. If you replace zh__pinyin with zh__stroke, it will sort by the number of strokes instead.

View on Putrajaya

August 12th, 2013


ZeroMQ Rant

April 3rd, 2013

Today I deployed a new service at work that is based on zeromq. I never liked zeromq because it does not provide any feedback about what is going on, and this is exactly what caused problems. After deployment the service worked as expected on all servers except one. The daemon was starting, creating a zeromq socket and waiting for messages, but zeromq did not establish a connection and of course did not report any errors. It is supposed to simply work, and if it does not… well, it will pretend that everything is fine. The zeromq developers assume that it is better to do nothing than to return an error or at least print a warning.

Cameron Highlands. Sunset.

January 14th, 2013

We spent four days in Cameron Highlands in December. Here are photos of the sunset that I took from the balcony of our hotel room.


Packing timeval

December 5th, 2012

Recently I got a lot of failures from CPAN Testers for RedisDB on NetBSD i386. After investigating a bit I found that NetBSD 6.0 now comes with a 64-bit time_t on all architectures. This means that the way I used to pack a struct timeval value to set a timeout on a socket didn’t work anymore. Previously the struct was the same as two longs on all systems, and the pack call looked like this:

my $timeval = pack "L!L!", $sec, $usec;

Now I had a special case for NetBSD, where the first part is always a 64-bit number. And pack on a 32-bit perl doesn’t even support “Q” unless it was compiled with 64-bit integers.
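
One possible way around the missing “Q” is to write the 64-bit seconds field as two 32-bit words. The sketch below is only an illustration under stated assumptions and not necessarily what RedisDB does: the helper name pack_timeval_64bit_sec is hypothetical, the timeout is assumed to fit into 32 bits, tv_usec is assumed to be a 32-bit int, and the struct is assumed to be 12 bytes long with no padding.

```perl
use strict;
use warnings;

# Hypothetical helper (not RedisDB's actual code): pack a struct timeval
# whose tv_sec is a 64-bit time_t without using the "Q"/"q" templates.
# Assumptions: 0 <= $sec < 2**32, tv_usec is a 32-bit int, and the struct
# has no padding, i.e. it is 12 bytes long.
sub pack_timeval_64bit_sec {
    my ( $sec, $usec ) = @_;

    # Detect byte order: on little-endian machines the low byte comes first.
    my $little_endian = unpack( "C", pack( "S", 1 ) ) == 1;

    # Split the 64-bit seconds field into two 32-bit words; the high word
    # is zero for any timeout that fits into 32 bits.
    my @sec_words = $little_endian ? ( $sec, 0 ) : ( 0, $sec );

    # "L" is always exactly 32 bits, "l" is a signed 32-bit value.
    return pack "L L l", @sec_words, $usec;
}

my $timeval = pack_timeval_64bit_sec( 5, 250_000 );
printf "%d bytes: %s\n", length($timeval), unpack( "H*", $timeval );
```

The resulting string could then be passed to setsockopt in place of the "L!L!" version on platforms that match these assumptions; everywhere else the original two-long layout still applies.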

Xubuntu countdown

October 11th, 2012

Countdown to Xubuntu 12.04

Memory leaks

October 8th, 2012

Spent the whole day looking for sources of memory leaks. One of them appeared because I decided to use named captures. It turns out that perl leaks some memory if a named capture doesn’t match. Here’s an example:

use 5.010;
use strict;
use warnings;

say `ps vp $$`;
for (1..1_000_000) {
    "08-Oct-2012" =~ /^(?'date'\d\d-\w\w\w-\d\d\d\d)(?: (?'time'\d\d:\d\d))?/;
    say "Time: $+{time}" if $+{time} and 0;
}
say `ps vp $$`;

Funnily, if you replace “08-Oct-2012” with “08-Oct-2012 22:17” it will stop leaking. I can reproduce the problem with 5.10.1 and 5.14.2. It was fixed in 5.16.0, but at work we are using Debian, so we have to wait around 2.5 years before the fix makes it into stable and we are able to start using named captures.

Another one was in the ZMQ::LibZMQ2 library: https://github.com/lestrrat/p5-ZMQ/issues/15, so I had some fun fixing XS code. The fixed version is already available from CPAN.