How do I remove blank lines from text in PHP?


Question

I need to remove blank lines (either completely empty or containing only whitespace) in PHP. I tried these regular expressions, but neither works:

$str = ereg_replace('^[ \t]*$\r?\n', '', $str);
$str = preg_replace('^[ \t]*$\r?\n', '', $str);

I want this text:

blahblah

blahblah

   adsa 


sad asdasd

to become:

blahblah
blahblah
   adsa 
sad asdasd
7/4/2019 2:19:09 AM

Accepted Answer

// New line is required to split non-blank lines
$string = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);

The above regular expression says:

/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/
    1st Capturing group (^[\r\n]*|[\r\n]+)
        1st Alternative: ^[\r\n]*
            ^ assert position at start of the string
            [\r\n]* match a single character present in the list below
                Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
                \r matches a carriage return (ASCII 13)
                \n matches a line-feed (newline) character (ASCII 10)
        2nd Alternative: [\r\n]+
            [\r\n]+ match a single character present in the list below
                Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
                \r matches a carriage return (ASCII 13)
                \n matches a line-feed (newline) character (ASCII 10)
    [\s\t]* match a single character present in the list below
        Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
        \s match any white space character [\r\n\t\f ]
        \t matches a tab character (ASCII 9)
    [\r\n]+ match a single character present in the list below
        Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        \r matches a carriage return (ASCII 13)
        \n matches a line-feed (newline) character (ASCII 10)
4/28/2014 12:03:09 PM

Your ereg_replace() solution is wrong because the ereg/eregi functions are deprecated (and were removed entirely in PHP 7). Your preg_replace() won't even compile because the pattern has no delimiters, but if you add delimiters and set multiline mode, it will work fine:

$str = preg_replace('/^[ \t]*[\r\n]+/m', '', $str);

The m modifier allows ^ to match the beginning of a logical line rather than just the beginning of the whole string. The start-of-line anchor is necessary because without it the regex would match the newline at the end of every line, not just the blank ones. You don't need the end-of-line anchor ($) because you're actively matching the newline characters, but it doesn't hurt.
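
To see that pattern in action, here is a minimal sketch run against input modeled on the sample in the question. The variable names and the exact sample string are my own assumptions, not something from the original post.

```php
<?php
// Sample input modeled on the question: empty lines plus one line
// that contains only spaces and a tab.
$str = "blahblah\n\nblahblah\n \t \n   adsa \n\n\nsad asdasd\n";

// ^ (with /m) anchors at the start of each line, [ \t]* consumes any
// horizontal whitespace, and [\r\n]+ consumes the line break(s) that
// terminate the blank line(s).
$clean = preg_replace('/^[ \t]*[\r\n]+/m', '', $str);

echo $clean;
```

Lines that contain text keep their own leading and trailing whitespace (note "   adsa "); only lines made up entirely of spaces, tabs, and line breaks are dropped.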

The accepted answer gets the job done, but it's more complicated than it needs to be. The regex has to match either the beginning of the string (^[\r\n]*, multiline mode not set) or at least one newline ([\r\n]+), then optional whitespace, followed by at least one more newline ([\r\n]+). So, in the special case of a string that starts with one or more blank lines, they'll be replaced with one blank line. I'm pretty sure that's not the desired outcome.
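
The leading-blank-lines case is easy to check. The sketch below feeds the same made-up input to both patterns; the variable names are assumptions for illustration, and json_encode() is used only to make the line breaks visible.

```php
<?php
// Input that starts with three blank lines (an illustrative sample).
$input = "\n\n\nfirst\n\nsecond\n";

// The accepted answer's pattern, replacing each match with one linefeed.
$withAccepted = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $input);

// The multiline pattern, replacing each match with nothing.
$withMultiline = preg_replace('/^[ \t]*[\r\n]+/m', '', $input);

echo json_encode($withAccepted), "\n";   // "\nfirst\nsecond\n"
echo json_encode($withMultiline), "\n";  // "first\nsecond\n"
```

The first result still begins with an empty line, while the second does not.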

But most of the time it replaces two or more consecutive newlines, along with any horizontal whitespace (spaces or tabs) that lies between them, with one linefeed. That's the intent, anyway. The author seems to expect \s to match just the space character (\x20), when in fact it matches any whitespace character. That's a very common mistake. The actual list varies from one regex flavor to the next, but at minimum you can expect \s to match whatever [ \t\f\r\n] matches.
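
If you want to confirm what \s matches in your own PHP build, a quick probe like the following should do. The labels and array keys are made up for this sketch; it compares \s against the explicit class [ \t] for the five characters listed above.

```php
<?php
// Characters to probe; the labels are only for this illustration.
$probes = [
    'space'           => ' ',
    'tab'             => "\t",
    'line feed'       => "\n",
    'carriage return' => "\r",
    'form feed'       => "\f",
];

$results = [];
foreach ($probes as $label => $char) {
    $results[$label] = [
        // \A and \z anchor at the absolute start and end of the subject,
        // so each pattern must match the single character exactly.
        's'          => preg_match('/\A\s\z/', $char) === 1,
        'spaceOrTab' => preg_match('/\A[ \t]\z/', $char) === 1,
    ];
}

foreach ($results as $label => $r) {
    printf(
        "%-16s \\s: %-3s  [ \\t]: %s\n",
        $label,
        $r['s'] ? 'yes' : 'no',
        $r['spaceOrTab'] ? 'yes' : 'no'
    );
}
```

All five characters match \s, but only the space and the tab match [ \t], which is why [\s\t] in the accepted answer is redundant rather than restricted to horizontal whitespace.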

Actually, in PHP you have a better option:

$str = preg_replace('/^\h*\v+/m', '', $str);

\h matches any horizontal whitespace character, and \v matches vertical whitespace.
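
As a rough sketch, the \h/\v version can be wrapped in a small helper. The function name removeBlankLines() is hypothetical, and falling back to the original text when preg_replace() returns null is just a choice made for this example.

```php
<?php
// Hypothetical helper: strips lines that are empty or contain only
// horizontal whitespace.
function removeBlankLines(string $text): string
{
    // preg_replace() returns null on error; in that case this sketch
    // simply hands back the input unchanged (an assumption, not a rule).
    return preg_replace('/^\h*\v+/m', '', $text) ?? $text;
}

// Mixed line endings: \r\n on the first lines, \n on the rest.
$input  = "one\r\n\t \r\n\r\ntwo\n \n\nthree\n";
$output = removeBlankLines($input);

echo json_encode($output), "\n";  // "one\r\ntwo\nthree\n"
```

Because \v covers both \r and \n, the same pattern handles Windows and Unix line endings. One thing to watch for: a whitespace-only last line with no trailing line break is not removed, since \v+ needs at least one line-break character to match.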


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow