I am looking for a regular expression in Java that matches all whitespace characters in a String. "\s" matches only some of them; it does not match &nbsp; and similar non-ASCII whitespace. I am looking for a regular expression that matches all (common) whitespace characters that can occur in a Java String.
[edit]
 To clarify: I do not mean the string sequence "&nbsp;". I mean the single Unicode character U+00A0, which is often represented by "&nbsp;" in HTML, and all other Unicode characters with a similar whitespace meaning, e.g. "NARROW NO-BREAK SPACE" (U+202F), the word joiner encoded as U+2060 in Unicode 3.2 and above, "ZERO WIDTH NO-BREAK SPACE" (U+FEFF), and any other character that produces white space.
[Answer]
The following expression worked for my purpose, i.e. catching all whitespace characters, both Unicode and conventional:
  [\p{Z}\s]
The answer is in the comment below, but since it's a bit hidden I repeat it here.
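As a minimal sketch of how that character class might be used from Java (the class name WhitespaceDemo and the method stripSpaces are made up for illustration; only java.util.regex.Pattern comes from the JDK), note that the backslashes have to be doubled inside a Java string literal:

```java
import java.util.regex.Pattern;

public class WhitespaceDemo {
    // \p{Z} covers the Unicode separator categories (Zs, Zl, Zp);
    // \s adds the conventional ASCII whitespace (space, tab, newline, ...).
    private static final Pattern ANY_SPACE = Pattern.compile("[\\p{Z}\\s]");

    // Hypothetical helper: removes every character matched by the class above.
    static String stripSpaces(String input) {
        return ANY_SPACE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // U+00A0 (no-break space), U+202F (narrow no-break space), tab, plain space
        String sample = "a\u00A0b\u202Fc\td e";

        // With the default flags, \s alone leaves U+00A0 and U+202F in place.
        System.out.println(sample.replaceAll("\\s", "").length()); // prints 7

        System.out.println(stripSpaces(sample)); // prints abcde
    }
}
```

One caveat: U+2060 (word joiner) and U+FEFF (zero width no-break space) belong to the Unicode format category (Cf), not to a separator category, so this class does not match them. If they should be treated as whitespace too, they would have to be listed explicitly, e.g. [\p{Z}\s\u2060\uFEFF].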
 &nbsp; is only whitespace in HTML. Use an HTML parser to extract the plain text, and \s should work just fine.