PHP regex to remove HTML


Before we start, strip_tags() doesn't work.


I've got some data that needs to be parsed. The problem is that I need to get rid of all the HTML, which has been formatted very strangely. The tags look like this (notice the spaces):

< p > blah blah blah < / p > < a href= " link.html " > blah blah blah < /a >

None of the regexes I've tried work, and I don't know enough about regex syntax to fix them. I don't care about preserving anything inside the tags, and I would prefer to get rid of the text inside a link if I could.

Anyone have any idea?

(I really need to just sit down and learn regular expressions one day)

4/17/2009 2:53:10 AM

Accepted Answer


$content = preg_replace('/<[^>]*>/', '', $content);
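The pattern matches a `<`, then any run of characters that are not `>`, then the closing `>`, so the odd spacing inside the tags doesn't matter. If you also want the link text gone, one possible approach is to remove whole anchor elements first and then strip whatever tags remain. This is only a rough sketch: the function name `strip_spaced_html` is made up for illustration, and it assumes anchors are not nested and that no attribute value contains a literal `>`.

```php
<?php
// Hypothetical helper (not a built-in): strips oddly spaced tags and
// drops anchor elements together with their inner text.
function strip_spaced_html(string $html): string
{
    // 1. Remove each anchor, from "< a ... >" through "< / a >", including its text.
    //    \s* tolerates the stray spaces; "s" lets "." span newlines, "i" ignores case.
    $html = preg_replace('~<\s*a\b[^>]*>.*?<\s*/\s*a\s*>~is', '', $html);

    // 2. Remove every remaining tag (same pattern as in the answer above).
    $html = preg_replace('/<[^>]*>/', '', $html);

    // 3. Collapse the leftover runs of whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $html));
}

$content = '< p > blah blah blah < / p > < a href= " link.html " > blah blah blah < /a >';
echo strip_spaced_html($content); // prints: blah blah blah
```

Removing the anchors before the generic tag pass matters: once the tags are gone there is nothing left to tell the link text apart from the rest.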


4/17/2009 2:55:58 AM

strip_tags() will work if you call html_entity_decode() on the variable before passing it to strip_tags():

$text = '< p > blah blah blah < / p > < a href= " link.html " > blah blah blah< /a >';
echo strip_tags(html_entity_decode($text));
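Decoding first matters most when the markup is stored as HTML entities (`&lt;p&gt;` rather than `<p>`), because strip_tags() only sees real angle brackets. A small sketch of that case, assuming entity-encoded input, which is not what the question shows; the variable names are just for illustration:

```php
<?php
// Assumed input: tags stored as entities, so strip_tags() alone finds nothing to strip.
$encoded = '&lt;p&gt;blah blah blah&lt;/p&gt;';

// Without decoding, the entities are left untouched.
$untouched = strip_tags($encoded);

// Decoding turns the entities back into real tags, which strip_tags() then removes.
$clean = strip_tags(html_entity_decode($encoded));

echo $clean; // prints: blah blah blah
```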

