Escape HTML Specials

From CodeCodex

In HTML, “&” is special because it starts entity references, and “<” is special because it starts tags. An unpaired “>” is not special, but it is escaped to be safe. The HTML 4 spec allows attribute values to be quoted with either “'” or “"”. The hand-written functions below escape only “"”, so their output is safe inside double-quoted attribute values but not inside single-quoted ones.
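If attribute values might be delimited by single quotes, the single quote needs escaping as well. The following is a minimal sketch of that variant in C++, not a definitive implementation. The function name EscapeHTMLAttr is hypothetical and does not come from this page, and the choice of the numeric reference “&#39;” is an assumption made because HTML 4 does not define “&apos;”.

```cpp
#include <string>

// Hypothetical helper (not part of the original page): escapes the four
// characters handled by the functions on this page, plus the single quote,
// so the result can sit inside either "..." or '...' attribute values.
std::string EscapeHTMLAttr(const std::string& str)
{
    std::string escaped;
    escaped.reserve(str.size()); // output is at least as long as the input
    for (char ch : str)
    {
        switch (ch)
        {
            case '&':  escaped += "&amp;";  break;
            case '<':  escaped += "&lt;";   break;
            case '>':  escaped += "&gt;";   break;
            case '"':  escaped += "&quot;"; break;
            case '\'': escaped += "&#39;";  break; // numeric reference; HTML 4 has no &apos;
            default:   escaped += ch;       break;
        }
    }
    return escaped;
}
```

For example, EscapeHTMLAttr("it's") returns it&#39;s, while text without any special characters is returned unchanged.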

C++

A quick-and-dirty port of the JavaScript version:

#include <string>
using std::string;

string EscapeHTML(const string & Str)
  /* returns Str with all characters with special HTML meanings converted to
    entity references. */
  {
    string Escaped;
    /* size_type avoids a signed/unsigned comparison with Str.size() */
    for (string::size_type i = 0; i < Str.size(); ++i)
      {
        const char ThisCh = Str[i];
        if (ThisCh == '&')
            Escaped += "&amp;";
        else if (ThisCh == '<')
            Escaped += "&lt;";
        else if (ThisCh == '"')
            Escaped += "&quot;";
        else if (ThisCh == '>')
            Escaped += "&gt;";
        else
            Escaped += ThisCh;
      } /*for*/
    return Escaped;
  } /*EscapeHTML*/

JavaScript

Surprisingly, there is no built-in JavaScript function for doing this.

function EscapeHTML(Str)
  /* returns Str with all characters with special HTML meanings converted to
    entity references. */
  {
    var Escaped = ""
    for (var i = 0; i < Str.length; ++i)
      {
        var ThisCh = Str.charAt(i)
        if (ThisCh == "&")
          {
            ThisCh = "&amp;"
          }
        else if (ThisCh == "<")
          {
            ThisCh = "&lt;"
          }
        else if (ThisCh == "\"")
          {
            ThisCh = "&quot;"
          }
        else if (ThisCh == ">")
          {
            ThisCh = "&gt;"
          } /*if*/
        Escaped += ThisCh
      } /*for*/
    return Escaped
  } /*EscapeHTML*/

In Java, you can instead use the Apache Commons Lang class org.apache.commons.lang.StringEscapeUtils:

  import org.apache.commons.lang.StringEscapeUtils;

  String test = StringEscapeUtils.escapeHtml("\"bread\" & \"butter\"");
  System.out.println(test);

In the above example, the output will be:

&quot;bread&quot; &amp; &quot;butter&quot; 

Perl


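use HTML::Entities qw(encode_entities);
# Called in void context, encode_entities escapes $s in place; the second
# argument restricts escaping to the listed characters.
encode_entities $s, q{<>&"};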