How to truncate a UTF-8 string
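July 16, 2008 - Suppose you need to put a UTF-8 string into a fixed-length buffer (I actually needed to do this). The problem is that the copy may cut the last character in the middle of its multi-byte sequence, leaving an incomplete character at the end of the buffer. Here is an example of how to detect and remove it: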
#include <string.h>
#include <stdio.h>
#include <err.h>
int main (int argc, char *argv[])
{
  char buf[64];
  int i;

  if (argc != 2)
    errx (1, "Usage: %s string", argv[0]);

  /* Copy at most 63 bytes; the memset guarantees NUL termination. */
  memset (buf, '\0', sizeof buf);
  strncpy (buf, argv[1], sizeof buf - 1);
  /*
   * This printf may output garbage at the end of the string
   * if the copy cut a multi-byte character in half.
   */
  printf ("Before: `%s'\n", buf);
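  /*
   * Check whether the string ends with a truncated UTF-8 character.
   * buf[62] is the last byte strncpy could have written. A byte with
   * the high bit set belongs to a multi-byte character: 11xxxxxx is
   * a lead byte, 10xxxxxx is a continuation byte. This assumes the
   * input is valid UTF-8.
   */
  i = 62;
  if (buf[i] & 128)
    {
      if (buf[i] & 64)                      /* lone lead byte */
        buf[i] = '\0';
      else if ((buf[i - 1] & 224) == 224)   /* 2 of 3 or 4 bytes */
        buf[i - 1] = '\0';
      else if ((buf[i - 2] & 240) == 240)   /* 3 of 4 bytes */
        buf[i - 2] = '\0';
    }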
  /*
   * Now the output is clean.
   */
  printf ("After: `%s'\n", buf);
  return 0;
}
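
The check above is tied to the 64-byte buffer. As a rough sketch, the same idea can be written as a loop that works for any length: cut at the limit and, while the first dropped byte is a continuation byte (10xxxxxx), step back to the lead byte of that character. The name utf8_truncate and its parameters are made up for illustration, and the function assumes the input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the program above): shorten the
 * NUL-terminated string s in place to at most max_bytes bytes
 * without splitting a UTF-8 character. Assumes s is valid UTF-8.
 * Returns the new length in bytes.
 */
size_t
utf8_truncate (char *s, size_t max_bytes)
{
  size_t len = strlen (s);
  size_t cut;

  if (len <= max_bytes)
    return len;                 /* already fits, nothing to do */

  /*
   * s[max_bytes] is the first byte that would be dropped. If it is
   * a continuation byte, the cut falls inside a character, so move
   * back until the cut sits on a lead byte or an ASCII byte.
   */
  cut = max_bytes;
  while (cut > 0 && ((unsigned char) s[cut] & 0xC0) == 0x80)
    cut--;

  s[cut] = '\0';
  return cut;
}
```

Unlike the program above, this works on the full string before it is copied, so the caller would truncate first and then copy the result into the fixed-length buffer (with max_bytes one less than the buffer size, to leave room for the terminating NUL).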