I found a good code snippet while looking for a fallback implementation of mb_strlen:
/**
 * Fallback implementation of mb_strlen, hardcoded to UTF-8.
 * @param string $str
 * @param string $enc optional encoding; ignored
 * @return int
 */
function new_mb_strlen( $str, $enc = "" ) {
    $counts = count_chars( $str );
    $total = 0;
    // Count ASCII bytes
    for ( $i = 0; $i < 0x80; $i++ ) {
        $total += $counts[$i];
    }
    // Count multibyte sequence heads
    for ( $i = 0xc0; $i < 0xff; $i++ ) {
        $total += $counts[$i];
    }
    return $total;
}
I thought I could improve this function, so I tried a few approaches. A few minutes later, I had the following code.
/**
 * Fallback implementation of mb_strlen, hardcoded to UTF-8.
 * @param string $str
 * @return int
 */
function newer_mb_strlen( $str ) {
    $counts = count_chars( $str );
    // Drop the counts for UTF-8 continuation bytes (0x80 to 0xbf)
    for ( $i = 0x80; $i < 0xc0; $i++ ) {
        unset( $counts[$i] );
    }
    // Every remaining byte is either ASCII or the head of a multibyte sequence
    return array_sum( $counts );
}
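This version drops the counts for the continuation-byte range (0x80 to 0xBF) and uses the built-in function array_sum to total the remaining counts, rather than adding them up one by one in two loops. Unlike new_mb_strlen, it also counts the byte 0xFF, but that byte never appears in valid UTF-8, so the two functions agree on well-formed input.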
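To see why skipping continuation bytes yields the character count, here is a minimal sketch. The helper name count_utf8_code_points is hypothetical and not part of the original snippet, and the sample strings use byte escapes so the example does not depend on the file's encoding. It assumes the input is well-formed UTF-8.

```php
<?php
// Hypothetical helper for illustration: counts UTF-8 code points by
// counting every byte that is NOT a continuation byte (0x80 to 0xBF).
function count_utf8_code_points( $str ) {
    // Mode 0 (the default) returns the frequency of all 256 byte values.
    $counts = count_chars( $str );
    $total = 0;
    foreach ( $counts as $byte => $n ) {
        if ( $byte < 0x80 || $byte >= 0xc0 ) {
            $total += $n;
        }
    }
    return $total;
}

$samples = array(
    "hello",             // ASCII only
    "h\xC3\xA9llo",      // "e" with acute accent is a 2-byte sequence
    "\xE2\x82\xAC",      // euro sign, a 3-byte sequence
    "\xF0\x9F\x98\x80",  // an emoji, a 4-byte sequence
);
foreach ( $samples as $s ) {
    printf( "%d bytes, %d code points\n", strlen( $s ), count_utf8_code_points( $s ) );
}
// Prints:
// 5 bytes, 5 code points
// 6 bytes, 5 code points
// 3 bytes, 1 code points
// 4 bytes, 1 code points
```

Each multibyte character contributes exactly one head byte (0xC0 or above) and one or more continuation bytes, so counting only ASCII and head bytes counts each character once.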
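The benchmark results are as follows. The text files used by the original benchmark were missing, so I wrote young.txt myself. Across these four runs, newer_mb_strlen takes roughly 50 to 60 percent of the time of new_mb_strlen, although both are still slower than the native mb_strlen.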
Testing young.txt:
  strlen           2582 chars  0.010ms
  mb_strlen        2462 chars  0.025ms
  old_mb_strlen    2462 chars  2.698ms
  new_mb_strlen    2462 chars  0.197ms
  newer_mb_strlen  2462 chars  0.117ms

Testing young.txt:
  strlen           2582 chars  0.012ms
  mb_strlen        2462 chars  0.024ms
  old_mb_strlen    2462 chars  2.908ms
  new_mb_strlen    2462 chars  0.190ms
  newer_mb_strlen  2462 chars  0.096ms

Testing young.txt:
  strlen           2582 chars  0.009ms
  mb_strlen        2462 chars  0.023ms
  old_mb_strlen    2462 chars  2.681ms
  new_mb_strlen    2462 chars  0.200ms
  newer_mb_strlen  2462 chars  0.116ms

Testing young.txt:
  strlen           2582 chars  0.010ms
  mb_strlen        2462 chars  0.025ms
  old_mb_strlen    2462 chars  2.685ms
  new_mb_strlen    2462 chars  0.205ms
  newer_mb_strlen  2462 chars  0.106ms
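The benchmark script itself is not shown here, so the following is only a rough sketch of how timings like these could be collected. The helper time_call and the stand-in input string are assumptions for illustration. The real benchmark read young.txt and also timed the fallback functions.

```php
<?php
// Hypothetical timing helper: runs $fn once on $text and returns
// array( result, elapsed time in milliseconds ).
function time_call( $fn, $text ) {
    $start = microtime( true );
    $result = call_user_func( $fn, $text );
    $elapsed_ms = ( microtime( true ) - $start ) * 1000;
    return array( $result, $elapsed_ms );
}

// Stand-in input (assumption); the real benchmark used young.txt.
$text = str_repeat( "na\xC3\xAFve text ", 200 );

// Only time mb_strlen when the mbstring extension is available.
$candidates = array( 'strlen' );
if ( function_exists( 'mb_strlen' ) ) {
    $candidates[] = 'mb_strlen';
}

foreach ( $candidates as $fn ) {
    list( $len, $ms ) = time_call( $fn, $text );
    printf( "%-16s %d chars %.3fms\n", $fn, $len, $ms );
}
```

A single call is noisy at this time scale, which is why repeating the whole run several times, as in the results above, gives a better picture than any one measurement.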