Thursday, 15 September 2011

regex - Java regular expression to match _all_ whitespace characters


I am looking for a regular expression in Java that matches all whitespace characters in a String. "\s" matches only some of them; it does not match &nbsp; and similar non-ASCII whitespace. I am looking for a regular expression that matches all (common) whitespace characters that can occur in a Java String.

[edit]

To clarify: I do not mean the string sequence "&nbsp;". I mean the single Unicode character U+00A0, which is often represented by "&nbsp;", e.g. in HTML, and all other Unicode characters with a similar whitespace meaning, e.g. "NARROW NO-BREAK SPACE" (U+202F), the word joiner encoded in Unicode 3.2 and above as U+2060, "ZERO WIDTH NO-BREAK SPACE" (U+FEFF), and any other character that looks like whitespace.

[Answer]

The following expression works for my purpose, i.e. catching all whitespace characters, Unicode plus traditional:

[\p{Z}\s]

The answer is in the comments below, but since it is a bit hidden I repeat it here.
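
As a rough illustration of how that expression behaves, here is a minimal sketch. The class name WhitespaceDemo and the helper normalize are made up for this example and are not part of the original answer. It assumes the default behaviour of java.util.regex, where \s covers only the ASCII whitespace set unless Pattern.UNICODE_CHARACTER_CLASS is set.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class WhitespaceDemo {
    // Unicode separators (\p{Z}: categories Zs, Zl, Zp) plus the traditional \s set.
    static final Pattern ALL_WHITESPACE = Pattern.compile("[\\p{Z}\\s]");

    // Hypothetical helper: replace every matched character with a plain space.
    static String normalize(String input) {
        return ALL_WHITESPACE.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        // NO-BREAK SPACE (U+00A0), NARROW NO-BREAK SPACE (U+202F) and a tab.
        String sample = "a\u00A0b\u202Fc\td";
        System.out.println(normalize(sample));        // prints "a b c d"

        // By default, \s alone does not match U+00A0.
        System.out.println("\u00A0".matches("\\s"));  // prints false

        // U+2060 and U+FEFF are format characters (category Cf), not separators,
        // so this character class does not match them.
        System.out.println(ALL_WHITESPACE.matcher("\u2060").matches()); // prints false
        System.out.println(ALL_WHITESPACE.matcher("\uFEFF").matches()); // prints false
    }
}
```

Note that the word joiner (U+2060) and the zero width no-break space (U+FEFF) mentioned in the question fall outside \p{Z}. If those need to be caught as well, one option would be to list them explicitly, e.g. [\p{Z}\s\u2060\uFEFF].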

&nbsp; is only whitespace in HTML. Use an HTML parser to extract the plain text, and \s should work just fine.

