A slightly faster mb_strlen implementation

I came across a nice code snippet while looking for a fallback implementation of mb_strlen.

/**
 * Fallback implementation of mb_strlen, hardcoded to UTF-8.
 * @param string $str
 * @param string $enc optional encoding; ignored
 * @return int
 */
function new_mb_strlen( $str, $enc="" ) {
	$counts = count_chars( $str );
	$total = 0;

	// Count ASCII bytes
	for( $i = 0; $i < 0x80; $i++ ) {
		$total += $counts[$i];
	}

	// Count multibyte sequence heads
	for( $i = 0xc0; $i < 0xff; $i++ ) {
		$total += $counts[$i];
	}
	return $total;
}
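Why counting these two ranges works: in UTF-8, every character starts with exactly one byte in 0x00 to 0x7f (ASCII) or 0xc0 to 0xff (a multibyte head), and any bytes that follow it fall in 0x80 to 0xbf. So the character count equals the number of bytes that are not continuation bytes. Here is a small worked example I put together to illustrate that rule; the sample string and variable names are my own, not part of the snippet above.

```php
<?php
// "h\xC3\xA9llo" is "héllo" in UTF-8: 5 characters stored in 6 bytes.
$str = "h\xC3\xA9llo";

// count_chars() in its default mode returns an array indexed by byte
// value (0..255), holding how often each byte occurs in the string.
$counts = count_chars($str);

// Tally the continuation bytes (0x80..0xbf); these never start a character.
$continuation = 0;
for ($i = 0x80; $i < 0xc0; $i++) {
    $continuation += $counts[$i];
}

$bytes = strlen($str);            // raw byte length
$chars = $bytes - $continuation;  // bytes that start a character

echo "$bytes bytes, $continuation continuation byte(s), $chars characters\n";
```

The only continuation byte here is 0xA9, the second byte of "é", so 6 bytes minus 1 leaves 5 characters.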

I thought I could improve this function, so I tried a few approaches. A few minutes later, I had the following code.

/**
 * Fallback implementation of mb_strlen, hardcoded to UTF-8.
 * @param string $str
 * @param string $enc optional encoding; ignored
 * @return int
 */
function newer_mb_strlen( $str, $enc = "" ) {
	$counts = count_chars($str);
	for($i = 0x80; $i < 0xc0; $i++) {
		unset($counts[$i]);
	}
	return array_sum($counts);
}

This version drops the two explicit summing loops. It removes the counts for the continuation-byte range (0x80 to 0xbf) and lets the built-in function array_sum add up what is left, rather than summing the counts one by one in PHP code.
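To convince myself that the two approaches give the same answer, a quick side-by-side check like the following helps. It is only a sketch: the sample string and the variable names are made up for illustration, and both approaches are inlined so the snippet runs on its own.

```php
<?php
// "naïve €5" in UTF-8: 8 characters, 11 bytes
// (ï takes 2 bytes, € takes 3 bytes).
$str = "na\xC3\xAFve \xE2\x82\xAC5";

// Approach 1: add up ASCII bytes and multibyte heads with explicit loops.
$counts = count_chars($str);
$loopTotal = 0;
for ($i = 0; $i < 0x80; $i++) {
    $loopTotal += $counts[$i];
}
for ($i = 0xc0; $i < 0xff; $i++) {
    $loopTotal += $counts[$i];
}

// Approach 2: drop the continuation-byte counts, then let array_sum do the rest.
$counts = count_chars($str);
for ($i = 0x80; $i < 0xc0; $i++) {
    unset($counts[$i]);
}
$sumTotal = array_sum($counts);

echo "loops: $loopTotal, array_sum: $sumTotal\n";

// Compare against the real mb_strlen when the mbstring extension is loaded.
if (function_exists('mb_strlen')) {
    echo "mb_strlen: " . mb_strlen($str, 'UTF-8') . "\n";
}
```

One small difference: the loop version stops before 0xff, while the array_sum version also counts 0xff bytes. That byte never appears in valid UTF-8, so the two only disagree on malformed input.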
The benchmark results from four runs are below. The original text files were missing, so I had to write young.txt myself.

Testing young.txt:
              strlen       2582 chars    0.010ms
           mb_strlen       2462 chars    0.025ms
       old_mb_strlen       2462 chars    2.698ms
       new_mb_strlen       2462 chars    0.197ms
     newer_mb_strlen       2462 chars    0.117ms

Testing young.txt:
              strlen       2582 chars    0.012ms
           mb_strlen       2462 chars    0.024ms
       old_mb_strlen       2462 chars    2.908ms
       new_mb_strlen       2462 chars    0.190ms
     newer_mb_strlen       2462 chars    0.096ms

Testing young.txt:
              strlen       2582 chars    0.009ms
           mb_strlen       2462 chars    0.023ms
       old_mb_strlen       2462 chars    2.681ms
       new_mb_strlen       2462 chars    0.200ms
     newer_mb_strlen       2462 chars    0.116ms

Testing young.txt:
              strlen       2582 chars    0.010ms
           mb_strlen       2462 chars    0.025ms
       old_mb_strlen       2462 chars    2.685ms
       new_mb_strlen       2462 chars    0.205ms
     newer_mb_strlen       2462 chars    0.106ms
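For anyone who wants to collect similar timings, here is a minimal sketch of a timing loop. It is not the benchmark script used for the numbers above: the helper bench_strlen, the generated sample text, and the output format are all assumptions of mine, and a single timed call like this is noisy, which is why several runs are worth comparing.

```php
<?php
// Hypothetical helper (not from the original benchmark): calls $fn once
// on $text and returns [result, elapsed time in milliseconds].
function bench_strlen(callable $fn, string $text): array {
    $start = microtime(true);
    $chars = $fn($text);
    $elapsedMs = (microtime(true) - $start) * 1000;
    return [$chars, $elapsedMs];
}

// Made-up input: 200 copies of "naïve €5 " (9 characters, 12 bytes each).
$text = str_repeat("na\xC3\xAFve \xE2\x82\xAC5 ", 200);

$candidates = ['strlen' => 'strlen'];
if (function_exists('mb_strlen')) {
    // Only benchmark the real mb_strlen when mbstring is available.
    $candidates['mb_strlen'] = function ($s) {
        return mb_strlen($s, 'UTF-8');
    };
}
// The array_sum approach, inlined so the sketch is self-contained.
$candidates['array_sum variant'] = function ($s) {
    $counts = count_chars($s);
    for ($i = 0x80; $i < 0xc0; $i++) {
        unset($counts[$i]);
    }
    return array_sum($counts);
};

foreach ($candidates as $name => $fn) {
    [$chars, $ms] = bench_strlen($fn, $text);
    printf("%20s %10d chars %8.3fms\n", $name, $chars, $ms);
}
```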
