How to get the exact number of characters in a string with a multibyte encoding
In my PHP/QCubed project I need to take a substring of an exact length from a string, to get a title from the whole text. The standard way is to use the PHP function substr.
<?php echo substr(QApplication::Translate('_NEWS_TEXT1_'), 0, 45) . ' ...'; ?>
Here QApplication::Translate('_NEWS_TEXT1_') returns the whole string from QCubed's i18n implementation. The string is UTF-8 encoded, and substr counts bytes, not characters, so the same call returns substrings of different visible lengths in different languages. This happens because in UTF-8 different characters take different numbers of bytes.
For example:
echo strlen('здрасти');
This code returns 14 instead of the expected 7. By contrast:
echo strlen('zdrasti');
returns exactly 7.
This is because the first string, 'здрасти', is Bulgarian text in UTF-8, where each Cyrillic letter takes two bytes, while the second string, 'zdrasti', uses one byte per character.
To make the first code example show 45 characters, multiply 45 by the average number of bytes per character in the string. The result is the number of bytes that corresponds to 45 displayed characters.
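The difference between bytes and characters can be checked directly. The sketch below uses a hypothetical helper, describeLength (not part of PHP or QCubed), and assumes the mbstring extension is loaded and the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper: report the byte count and the character count
// of a UTF-8 string side by side.
function describeLength($s)
{
    return [
        'bytes' => strlen($s),             // strlen counts bytes
        'chars' => mb_strlen($s, 'UTF-8'), // mb_strlen counts characters
    ];
}

print_r(describeLength('здрасти')); // bytes => 14, chars => 7
print_r(describeLength('zdrasti')); // bytes => 7,  chars => 7
```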
$kUtfString = strlen(QApplication::Translate('_NEWS_TEXT1_')) / mb_strlen(QApplication::Translate('_NEWS_TEXT1_'), 'UTF-8');
The function mb_strlen returns the real number of characters in the string.
The problem is then solved by changing the first call as follows:
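To see what this ratio looks like for different inputs, here is a small sketch. The helper name bytesPerChar is hypothetical, and the guard for an empty string is an added assumption, since dividing by mb_strlen of '' would otherwise fail.

```php
<?php
// Hypothetical helper: average number of bytes per character in a UTF-8 string.
function bytesPerChar($s)
{
    $chars = mb_strlen($s, 'UTF-8');
    // Guard against division by zero for an empty string (assumed fallback: 1).
    return $chars > 0 ? strlen($s) / $chars : 1;
}

var_dump(bytesPerChar('здрасти'));     // 14 bytes / 7 characters
var_dump(bytesPerChar('zdrasti'));     // 7 bytes / 7 characters
var_dump(bytesPerChar('abc здрасти')); // 18 bytes / 11 characters
```

For text that mixes one-byte and two-byte characters the ratio is only an average, so a byte offset computed from it may not fall exactly on a character boundary.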
<?php echo substr(QApplication::Translate('_NEWS_TEXT1_'), 0, (int) (45 * $kUtfString)) . ' ...'; ?>
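Since mb_strlen is already available, the same mbstring extension also provides mb_substr, which cuts by characters instead of bytes and needs no ratio. The following is only a sketch of that alternative, not the approach used above. The helper name truncateTitle and its default arguments are hypothetical.

```php
<?php
// Hypothetical helper: cut a UTF-8 string to at most $limit characters.
function truncateTitle($text, $limit = 45, $suffix = ' ...')
{
    // Short strings are returned unchanged, without the suffix.
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }
    // mb_substr takes its offset and length in characters, not bytes.
    return mb_substr($text, 0, $limit, 'UTF-8') . $suffix;
}

echo truncateTitle('здрасти, това е новина', 7); // prints "здрасти ..."
```

In the page template this could be called as truncateTitle(QApplication::Translate('_NEWS_TEXT1_')). Unlike the original snippet, this sketch appends ' ...' only when the text was actually shortened.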