PHP regex to remove HTML


Question

Before we start, strip_tags() doesn't work.

Now,

I've got some data that needs to be parsed. The problem is that I need to get rid of all the HTML, which has been formatted very strangely. The tags look like this (notice the spaces):

< p > blah blah blah < / p > < a href= " link.html " > blah blah blah < /a >

All the regexes I've been trying aren't working, and I don't know enough about regex formatting to make them work. I don't care about preserving anything inside of the tags, and would prefer to get rid of the text inside a link if I could.

Anyone have any idea?

(I really need to just sit down and learn regular expressions one day)

4/17/2009 2:53:10 AM

Accepted Answer

Does

preg_replace('/<[^>]*>/', '', $content)

work?
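For what it's worth, here is a rough sketch of how that pattern behaves on the sample from the question. The variable names, the optional link-removal pass, and the whitespace cleanup are illustrative assumptions added here, not part of the answer itself.

```php
<?php
// Sample input modelled on the question, with the odd spacing inside the tags.
$content = '< p > blah blah blah < / p > < a href= " link.html " > link text < /a >';

// Optional first pass (an assumption, for the "drop the link text too" wish):
// remove whole <a ...>...</a> elements, including the text between the tags.
// \s* tolerates the stray spaces; "i" ignores case, "s" lets "." span newlines.
$noLinks = preg_replace('/<\s*a\b[^>]*>.*?<\s*\/\s*a\s*>/is', '', $content);

// The pattern from the answer: remove everything from "<" up to the next ">".
$noTags = preg_replace('/<[^>]*>/', '', $noLinks);

// Collapse the leftover runs of whitespace and trim the ends.
$clean = trim(preg_replace('/\s+/', ' ', $noTags));

echo $clean, PHP_EOL;
```

One caveat with this approach: `<[^>]*>` has no idea what a tag is, so ordinary text such as "1 < 2 and 3 > 2" would also lose everything between the two angle brackets.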

4/17/2009 2:55:58 AM

strip_tags() will work if you run html_entity_decode() on the variable first. This helps when the tags arrive as HTML entities rather than as literal angle brackets.

<?php
$text = '< p > blah blah blah < / p > < a href= " link.html " > blah blah blah< /a >';
echo strip_tags(html_entity_decode($text));
?>
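A minimal sketch of the case where decoding first makes a difference, assuming the data was stored with its angle brackets encoded as entities. The sample string below is made up for illustration and is not from the question.

```php
<?php
// Hypothetical input: the tags are entity-encoded, so there are no real "<" or ">".
$encoded = '&lt;p&gt;blah blah blah&lt;/p&gt;';

// On its own, strip_tags() finds no tags here and returns the string unchanged.
$unchanged = strip_tags($encoded);

// Decoding first turns &lt; and &gt; back into angle brackets,
// which gives strip_tags() real tags to remove.
$stripped = strip_tags(html_entity_decode($encoded));

echo $unchanged, PHP_EOL;
echo $stripped, PHP_EOL;
```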

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow