Friday 15 March 2013

php - Using Tidy to clean HTML: HTML content is being changed, encoding problem?


I'm taking HTML from a Smarty template and it needs to be cleaned (I just want to remove extra whitespace and format/indent the HTML nicely). I'm using Tidy to do this, with something like:

    // Render the Smarty template, then let Tidy clean and indent the markup
    $html = $smarty->fetch('foo.tmpl');
    $tidy = new tidy;
    $tidy->parseString($html, array(
        'hide-comments' => TRUE,
        'output-xhtml'  => TRUE,
        'indent'        => TRUE,
        'wrap'          => 0,
    ));
    $tidy->cleanRepair();
    return $tidy;

This works fine for English, but multilingual content seems to break it. For example, I have Arabic characters directly in $html, but after it goes through Tidy I get garbled entities like these:

&Ugrave;&Dagger;&Ugrave;&bdquo;&Oslash;&pound;&Ugrave;&dagger;&Oslash;&ordf;&Ugrave;&hellip;&Oslash;&ordf;&Oslash;&pound;&Ugrave;&fnof;&Oslash;&macr;&Oslash;&pound;&Ugrave;&dagger;&Ugrave;&fnof;&Oslash;&ordf;&Oslash;&plusmn;&Ugrave;&Scaron;&Oslash;&macr;
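To see where entities like these can come from, here is a minimal PHP sketch. It assumes (the question does not confirm this) that Tidy read the UTF-8 bytes of the Arabic text as single-byte Windows-1252/Latin-1 characters and then wrote each of them out as a named entity. PHP's htmlentities() with the 'cp1252' charset performs the same byte-by-byte reinterpretation, so it reproduces the pattern. Read that way, the entity run above decodes to the Arabic phrase هل أنت متأكد أنك تريد ("are you sure you want"), minus the spaces between words.

```php
<?php
// Sketch: reproduce the garbling by treating UTF-8 bytes as Windows-1252.
// Assumption: this is what Tidy did with the input before emitting entities.
$arabic = 'هل أنت متأكد أنك تريد'; // stored as UTF-8 in this source file

// With the 'cp1252' charset, every byte is read as one Windows-1252 character,
// e.g. byte 0xD9 -> Ù (&Ugrave;) and byte 0x87 -> ‡ (&Dagger;).
$garbled = htmlentities($arabic, ENT_QUOTES, 'cp1252');

echo $garbled, PHP_EOL;
```

Each Arabic letter here is two UTF-8 bytes, which is why every letter turns into a pair of entities: the lead byte 0xD8 or 0xD9 becomes &Oslash; or &Ugrave;, and the second byte becomes the entity that follows it.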

Is there a setting that will format the HTML but leave the content alone? I looked at this post, but it seems that it won't work for me because I'm getting my HTML from Smarty.

Any suggestions appreciated.

Try setting the encoding in the parseString call, e.g. pass 'utf8' as its encoding argument.
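A minimal sketch of that suggestion, assuming the Smarty output is UTF-8 and the PHP tidy extension is loaded. tidy::parseString() accepts the document encoding as an optional third argument; per the PHP manual it applies to both input and output, and 'utf8' is one of the accepted values. The helper name cleanHtml is hypothetical, introduced only for this example.

```php
<?php
// Sketch: format HTML with Tidy without turning non-ASCII text into entities.
// Assumes ext/tidy is available and $html is UTF-8. cleanHtml is a hypothetical helper.
function cleanHtml(string $html): string
{
    $config = array(
        'hide-comments' => true,
        'output-xhtml'  => true,
        'indent'        => true,
        'wrap'          => 0,
    );

    $tidy = new tidy;
    // Third argument: the character encoding Tidy should use for input and output.
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();

    return tidy_get_output($tidy);
}
```

Usage with the question's template would then be something like `echo cleanHtml($smarty->fetch('foo.tmpl'));`. Returning a string instead of the tidy object keeps the calling code independent of the extension's object API.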

