Sunday, 15 August 2010

utf 8 - PHP: remove special characters to make sure a string is UTF-8 encoded


I am stuck trying to remove special characters from a string so that only UTF-8 and French characters remain. Below are my sanitizing function and a base64-encoded string containing the special characters. The function fails to remove them, and as a result the text is not printed when using FPDF cells etc. If you decode the string, you will see the special characters.

Sanitizing function:

  static function remove_none_word_chars($string) {
      return preg_replace('/[^a-zA-Z0-9`_;.@#%~\'\'"+*\?\^\[\]\$\(\)\{\}\=\<\>\|\-:\s\/\\sàâçéèêëîïôûùüÿñæœ]/ui', '', $string);
  }

Base64 string:

  74KnIFN1cGVydmlzZXIgbGUgdHJhdmFpbCBkZSBs4oCZZW5zZW1ibGUgZHUgcGVyc29ubmVsIGRlIHByb2R1Y3Rpb24sIGRlIGzigJllbnRyZXRpZW4gZXQgZGUgbGEgbWFpbnRlbmFuY2Ugc3VyIGxlIHF1YXJ0IGRlIG51aXQgZW4gdGVuYW50IGNvbXB0ZSBkZSBsYSBjb252ZW50aW9uIGNvbGxlY3RpdmU7Cu+CpyBBc3N1cmVyIHVuZSBib25uZSBnZXN0aW9uIGRlIGzigJllbnNlbWJsZSBkZXMgb3DDqXJhdGlvbnMgZGUgbOKAmXVzaW5lOwrvgqcgUGxhbmlmaWVyIGRlcyBvcMOpcmF0aW9ucyBlbiBmb25jdGlvbiBkZXMgYm9ucyBkZSBjb21tYW5kZTsK74KnIEFwcG9ydGVyIGxlcyBtb2RpZmljYXRpb25zIGV4aWfDqWVzIGxvcnMgZGVzIGRpZmbDqXJlbnRzIGF1ZGl0cyAoR2VuZXJhbCBEeW5hbWljcywgSVNPOTAwMSwgT0hTQVMxODAwMSwgZXRjLik7Cu+CpyBSZW5kcmUgY29tcHRlIGR1IHN1aXZpIGRlcyBvcMOpcmF0aW9ucyDDoCBjaGFxdWUgZGlyZWN0ZXVyIGRlIGTDqXBhcnRlbWVudCBsb3JzIGR1IGNoYW5nZW1lbnQgZGUgcXVhcnQ7Cu+CpyBWb2lyIGF1IHN1aXZpIGRlcyBidWRnZXRzIGV0IGVuIGFzc3VyZXIgbGUgcmVzcGVjdC4=
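To see what "special characters" means here, it may help to decode just the start of the base64 string and look at the raw bytes. The sketch below uses only the first 20 characters of the string from the question; the variable names are illustrative. It suggests that the data is already valid UTF-8 and that the leading character is U+F0A7, a private-use code point (likely a bullet glyph carried over from a symbol font).

```php
<?php
// First 20 characters of the base64 string from the question.
$prefix = base64_decode('74KnIFN1cGVydmlzZXIg');

// The first three bytes form a single UTF-8 sequence: U+F0A7 (private use).
$firstCharHex = bin2hex(substr($prefix, 0, 3));

// An empty pattern with the u modifier only matches when the subject is valid UTF-8.
$isValidUtf8 = preg_match('//u', $prefix) === 1;

echo $firstCharHex, ' ', var_export($isValidUtf8, true), "\n"; // prints "ef82a7 true"
```

So the problem is less about the encoding itself and more about code points that the PDF font probably cannot render.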

Update: I would like to thank you all for the answers. The function above actually works; the problem was a conditional statement I had forgotten to change somewhere else :( That's embarrassing.

To remove non-printable characters, you can use a regular expression.

  $data = preg_replace('/[^\x0A\x20-\x7E\xC0-\xD6\xD8-\xF6\xF8-\xFF]/', '', $data);

  // To preserve extended characters, use the expression below.
  // Many of these can still be non-printable.
  $data = preg_replace('/(?!\n)[[:cntrl:]]+/', '', $data);

This is what I use by default to strip non-printable characters from a string destined for error_log.

This removes every character that is not in the list provided or, in the second example, every control character. The lists:

  \x0A      = [newline]
  \x20-\x7E = [space] ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
  \xC0-\xD6 = À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö
  \xD8-\xF6 = Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö
  \xF8-\xFF = ø ù ú û ü ý þ ÿ

As for encoding in UTF-8, this should not really be much of a problem; there are functions available that can help. I believe you would have to call such a function on the string before removing the non-printable characters. Note, however, that if the string is not in the format the function expects, or is already UTF-8, this can make the string unreadable.
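A related caveat applies to the byte ranges above: they describe single bytes, so on input that is already UTF-8 the first pattern can split multi-byte characters. The sketch below illustrates this and shows one possible UTF-8-aware alternative. It is not part of the original answer: the helper name `strip_unprintable_utf8` is hypothetical, the sample string is made up, and the `\p{C}` pattern with the `u` modifier is a swapped-in technique that assumes PHP 7 or later with PCRE Unicode support.

```php
<?php
// Hypothetical helper, not from the original answer.
// \p{C} matches control, format, private-use and unassigned code points;
// the negative lookahead keeps newlines. With the u modifier PCRE treats the
// subject as UTF-8, and preg_replace returns null if it is not valid UTF-8.
function strip_unprintable_utf8(string $text): ?string
{
    return preg_replace('/(?!\n)\p{C}/u', '', $text);
}

// Made-up sample: private-use bullet, accented letter, BEL control character.
$sample = "\u{F0A7} op\u{E9}rations\x07\n";

// UTF-8-aware cleanup: removes U+F0A7 and BEL, keeps the accented letter and the newline.
$clean = strip_unprintable_utf8($sample);

// Byte-oriented pattern from the answer: keeps lead bytes such as 0xC3 but
// drops continuation bytes such as 0xA9, so the result is no longer valid UTF-8.
$bytewise = preg_replace('/[^\x0A\x20-\x7E\xC0-\xD6\xD8-\xF6\xF8-\xFF]/', '', $sample);
$bytewiseIsValid = preg_match('//u', $bytewise) === 1;
```

The byte-oriented pattern remains a reasonable fit for single-byte data such as ISO-8859-1, which is what the ranges in the lists correspond to.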

