How to remove HTML special chars?


Question

I am creating an RSS feed file for my application, and I want to remove HTML tags from the content, which I do with strip_tags. But strip_tags does not remove HTML special characters (entities) such as:

  &nbsp; &amp; &copy;

etc.

Is there a function I can use to remove these entities from my string?
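A minimal reproduction of the problem, using a made-up sample string (not from the question): strip_tags removes the markup but leaves the entity text exactly as it was.

```php
<?php
// Hypothetical sample input; any HTML fragment containing entities behaves the same way.
$html = '<p>Fish &amp; Chips &copy; 2009</p>';

// strip_tags() removes the <p> tags but does not touch entity text such as &amp;.
$text = strip_tags($html);

echo $text, PHP_EOL; // Fish &amp; Chips &copy; 2009
```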

7/21/2014 4:33:07 PM

Accepted Answer

Either decode them using html_entity_decode or remove them using preg_replace:

// Remove named (&copy;), decimal (&#169;) and hexadecimal (&#xA9;) entities
$Content = preg_replace("/&#?[a-z0-9]+;/i", "", $Content);

(From here)
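A small sketch of the preg_replace approach wrapped in a helper. The name removeEntities is hypothetical and the sample input is made up. Deleting entities outright can leave doubled spaces behind, which the assertions below make visible.

```php
<?php
// Hypothetical helper around the pattern from the answer.
// &#? allows numeric entities; the /i flag lets [a-z] also match A-Z,
// and the "x" of hexadecimal entities (&#xA9;) is covered by [a-z].
function removeEntities(string $content): string
{
    return preg_replace('/&#?[a-z0-9]+;/i', '', $content);
}

$input = strip_tags('<p>Fish &amp; Chips &copy; &#169; &#xA9; 2009</p>');
echo removeEntities($input), PHP_EOL;
```

If the leftover whitespace matters, a follow-up preg_replace('/\s+/', ' ', ...) plus trim() is one way to collapse it.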

EDIT: An alternative, based on Jacco's comment:

It might be nice to replace the '+' with {2,8} or something similar. This limits the chance of removing real text when an unencoded '&' is present.

// Same idea, but only accept entity names of 2 to 8 characters
$Content = preg_replace("/&#?[a-z0-9]{2,8};/i", "", $Content);
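A quick comparison of the two patterns on made-up input. Because [a-z0-9] never matches a space, neither pattern can run across words; the difference shows up on the token right after a bare '&'. The bounded version keeps "AT&T;" intact, but it also skips legitimate entity names longer than 8 characters, such as the HTML5 entity &CounterClockwiseContourIntegral;.

```php
<?php
// The two patterns from the answer: unbounded '+' versus the bounded {2,8}.
const UNBOUNDED = '/&#?[a-z0-9]+;/i';
const BOUNDED   = '/&#?[a-z0-9]{2,8};/i';

// Made-up text with an unencoded '&' that happens to be followed by ';'.
$text = 'We partner with AT&T; prices &copy; 2009';
// A real HTML5 entity whose name is longer than 8 characters.
$long = 'x &CounterClockwiseContourIntegral; y';

echo preg_replace(UNBOUNDED, '', $text), PHP_EOL; // "&T;" is wrongly removed
echo preg_replace(BOUNDED, '', $text), PHP_EOL;   // "AT&T;" survives
echo preg_replace(BOUNDED, '', $long), PHP_EOL;   // long entity is left in place
```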
12/14/2009 8:43:59 PM

Answer

Use html_entity_decode to convert HTML entities back into the characters they represent.

You'll need to set the charset (the $encoding argument, e.g. 'UTF-8') to match your output for it to work correctly.
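A sketch of the decoding approach, assuming the feed is UTF-8 and that HTML5 entity names should be recognized (ENT_HTML5); htmlToPlainText is a hypothetical helper name. One detail worth knowing: &nbsp; decodes to U+00A0 (no-break space), not to an ordinary space.

```php
<?php
// Hypothetical helper: strip tags, then decode entities instead of deleting them,
// so "&amp;" becomes "&" and "&copy;" becomes the copyright sign.
// ENT_QUOTES | ENT_HTML5 and 'UTF-8' are assumptions; match them to your feed.
function htmlToPlainText(string $html): string
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // &nbsp; decodes to U+00A0 (NO-BREAK SPACE); turn it into a normal space.
    return str_replace("\u{00A0}", ' ', $text);
}

echo htmlToPlainText('<p>Fish&nbsp;&amp; Chips &copy; 2009</p>'), PHP_EOL;
```

The decoded text can contain bare '&' or '<' characters, so when writing it into the RSS file by hand, escape it again (for example with htmlspecialchars) or let an XML library do it; otherwise the feed is not well-formed XML.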


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow