How can strip whitespaces in PHP's variable?


Question

I know this comment PHP.net. I would like to have a similar tool like tr for PHP such that I can run simply

tr -d " " ""

I run unsuccessfully the function php_strip_whitespace by

$tags_trimmed = php_strip_whitespace($tags);

I run the regex function also unsuccessfully

$tags_trimmed = preg_replace(" ", "", $tags);
1
73
11/4/2018 10:15:49 AM

Accepted Answer

A regular expression does not account for UTF-8 characters by default. The \s meta-character only accounts for the original latin set. Therefore, the following command only removes tabs, spaces, carriage returns and new lines

// http://stackoverflow.com/a/1279798/54964
$str=preg_replace('/\s+/', '', $str);

With UTF-8 becoming mainstream this expression will more frequently fail/halt when it reaches the new utf-8 characters, leaving white spaces behind that the \s cannot account for.

To deal with the new types of white spaces introduced in unicode/utf-8, a more extensive string is required to match and removed modern white space.

Because regular expressions by default do not recognize multi-byte characters, only a delimited meta string can be used to identify them, to prevent the byte segments from being alters in other utf-8 characters (\x80 in the quad set could replace all \x80 sub-bytes in smart quotes)

$cleanedstr = preg_replace(
    "/(\t|\n|\v|\f|\r| |\xC2\x85|\xc2\xa0|\xe1\xa0\x8e|\xe2\x80[\x80-\x8D]|\xe2\x80\xa8|\xe2\x80\xa9|\xe2\x80\xaF|\xe2\x81\x9f|\xe2\x81\xa0|\xe3\x80\x80|\xef\xbb\xbf)+/",
    "_",
    $str
);

This accounts for and removes tabs, newlines, vertical tabs, formfeeds, carriage returns, spaces, and additionally from here:

nextline, non-breaking spaces, mongolian vowel separator, [en quad, em quad, en space, em space, three-per-em space, four-per-em space, six-per-em space, figure space, punctuation space, thin space, hair space, zero width space, zero width non-joiner, zero width joiner], line separator, paragraph separator, narrow no-break space, medium mathematical space, word joiner, ideographical space, and the zero width non-breaking space.

Many of these wreak havoc in xml files when exported from automated tools or sites which foul up text searches, recognition, and can be pasted invisibly into PHP source code which causes the parser to jump to next command (paragraph and line separators) which causes lines of code to be skipped resulting in intermittent, unexplained errors that we have begun referring to as "textually transmitted diseases"

[Its not safe to copy and paste from the web anymore. Use a character scanner to protect your code. lol]

40
10/14/2016 9:43:44 PM

To strip any whitespace, you can use a regular expression

$str=preg_replace('/\s+/', '', $str);

See also this answer for something which can handle whitespace in UTF-8 strings.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon