Part of my site lets users write comments in a text box, which are then stored in an SQL database. Because a lot of people copy/paste text in from Word or other places, I have to keep <p> and <br> tags to preserve formatting, and also <a> tags to let users create their own links. Everything else gets stripped out. I was accomplishing this like so:
$text = strip_tags( $text, '<br><a><p>' );
But today a user came to me and told me they lost a large portion of their text because they typed an arrow, <-, for visual effect. So now I know that strip_tags treats a bare < as the start of a tag and removes everything after it.
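To make the failure concrete, here is a minimal sketch that reproduces the lost text and shows one possible stopgap: escaping any < that does not look like the start of a tag before calling strip_tags. The helper name escape_stray_lt and the sample string are made up for illustration, and the check is only a heuristic (a < followed directly by a letter, as in "x<y", is still treated as a tag).

```php
<?php
// Hypothetical helper (not from the original post): escape "<" characters
// that cannot start a tag, i.e. those not followed by a letter, "/" or "!".
function escape_stray_lt(string $text): string
{
    // preg_replace() returns null on a regex error; fall back to an empty string.
    return preg_replace('~<(?![a-z/!])~i', '&lt;', $text) ?? '';
}

$text = '<p>Look here <- this part used to vanish</p>';

// Original approach: the "<-" is read as the start of a tag.
$naive = strip_tags($text, '<br><a><p>');

// Stopgap: neutralise the stray "<" first, then strip as before.
$safer = strip_tags(escape_stray_lt($text), '<br><a><p>');

var_dump($naive, $safer);
```

Note that strip_tags leaves attributes on the allowed tags untouched, so this does nothing about things like onclick handlers on a kept <a> tag.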
I can accomplish a similar effect with preg_replace, like so:
$text = preg_replace('/((?!<((\/)?p|br|a))<[^>]*>)/', "", $text);
But this still has the downside of only working if the tag spans one line (I think), leaving in HTML comments, and probably a few other things that I'm not aware of. What are my options? Is there a catch-all solution? A library I can use? I mostly work alone, so I'm not really aware of industry standards.
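For reference, a tightened version of the regex idea might look like the sketch below. The function name strip_disallowed_tags is hypothetical. Two observations: [^>]* is a negated character class and therefore already matches newlines, so tags that span several lines are handled; and the original alternation only allows the optional slash before p, so the sketch groups the tag names and adds a word boundary to stop <pre> or <abbr> from slipping through. It is still a regex over HTML, so it stays fragile: a stray < followed later by a > is removed along with everything in between, and attributes on kept tags are not checked.

```php
<?php
// Hypothetical helper: remove every tag except <p>, <br> and <a>
// (opening or closing), case-insensitively.
function strip_disallowed_tags(string $text): string
{
    // (?!/?(?:p|br|a)\b) refuses to match when the tag name is p, br or a;
    // \b keeps names such as "pre" or "abbr" from counting as "p" or "a".
    return preg_replace('~<(?!/?(?:p|br|a)\b)[^>]*>~i', '', $text) ?? '';
}

$input = "<p>ok</p><b>bold</b><pre>x</pre><a href=\"#\">link</a><!-- note -->\n"
       . "<span\nclass=\"x\">multi</span>";

$output = strip_disallowed_tags($input);
var_dump($output);
```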
Use HTML Purifier. It cleans the submitted HTML and removes unwanted code. For example, if a user adds a <script> tag that could harm your website (an XSS attack), HTML Purifier removes it before the text is stored. It also completes malformed HTML: if a user enters <strong>gamer ... without closing the tag, it closes the tag and outputs cleaner HTML.
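A minimal sketch of how that could look for the whitelist in the question, assuming HTML Purifier was installed through Composer (package ezyang/htmlpurifier) and that vendor/autoload.php sits next to the script. The function name purify_comment is made up for illustration; the HTML.Allowed value is one plausible way to express "only <p>, <br> and <a> with an href".

```php
<?php
// Assumption: Composer install; adjust the path if the library lives elsewhere.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical wrapper around HTML Purifier for comment text.
function purify_comment(string $dirty): string
{
    $config = HTMLPurifier_Config::createDefault();
    // Whitelist: only these elements (and the href attribute on <a>) survive.
    $config->set('HTML.Allowed', 'p,br,a[href]');

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty);
}

$clean = purify_comment('<p>Hello</p><script>alert(1)</script>');
var_dump($clean);
```

Building the purifier once and reusing it for every comment is likely cheaper than creating it per call. Run it on the text before it goes into the database, or on output, but do so consistently.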