I am looking for a regular expression in Java that matches all whitespace characters in a string. "\s" matches only some of them; it does not match &nbsp; and similar non-ASCII whitespace. I am looking for a regular expression that matches all (common) whitespace characters that can occur in a Java string.
[edit]
To clarify: I do not mean the string sequence "&nbsp;". I mean the single Unicode character U+00A0, which is often represented by "&nbsp;" in HTML, and all other Unicode characters with a similar whitespace meaning, e.g. "NARROW NO-BREAK SPACE" (U+202F), the word joiner encoded in Unicode 3.2 and above as U+2060, "ZERO WIDTH NO-BREAK SPACE" (U+FEFF), and any other character that produces white space.
[Answer]
The following expression works for my purpose, i.e. it captures all whitespace characters, both Unicode and conventional:
[\p{Z}\s]
The answer is in the comments below, but since it is a bit hidden I repeat it here.
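As a rough illustration of the character class above, here is a minimal sketch using java.util.regex. The class name UnicodeWhitespaceDemo and the helper normalizeSpaces are made up for this example and are not part of the original answer. In Java source the backslashes must be doubled, so the pattern is written as "[\\p{Z}\\s]+".

```java
import java.util.regex.Pattern;

public class UnicodeWhitespaceDemo {
    // \p{Z} covers the Unicode separator categories (Zs, Zl, Zp);
    // \s covers Java's default whitespace class [ \t\n\x0B\f\r].
    private static final Pattern ANY_SPACE = Pattern.compile("[\\p{Z}\\s]+");

    // Hypothetical helper: replaces each run of matched characters
    // with a single ASCII space.
    public static String normalizeSpaces(String input) {
        return ANY_SPACE.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        // U+00A0 (no-break space), U+202F (narrow no-break space) and a tab.
        String sample = "a\u00A0b\u202Fc\td";
        System.out.println(normalizeSpaces(sample)); // prints "a b c d"
    }
}
```

One caveat: U+2060 (word joiner) and U+FEFF (zero width no-break space) are classified as format characters (category Cf), not separators, so this class leaves them untouched. If they matter, they would have to be listed explicitly, e.g. [\p{Z}\s\u2060\uFEFF].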
&nbsp; is whitespace only in HTML. Extract the plain text from the HTML first (for example with an HTML parser), and \s should work properly.