Tuesday, 15 March 2011

Removing punctuation from text using R


I need to remove punctuation from text. I am using the tm package, but there is a catch.

For example, the text is something like this:

  data <- '"I am a new comer","to r","please help","me:out","here"'

Now when I run

  library(tm)
  data <- removePunctuation(data)

In my code, the result is:

  I am a new comerto rplease helpmeouthere

But what I expect is:

  I am a new comer to r please help me out here

Here's how I read your question, and an answer that is very close to the one @David Arnberg gave in the comments.

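  data <- '"I am a new comer","to r","please help","me:out","here"'
  gsub('[[:punct:] ]+', ' ', data)
  [1] " I am a new comer to r please help me out here "

The extra space after [:punct:] inside the brackets makes spaces part of the match, so every match is replaced with a space in the string, and the + in the regular expression matches one or more sequential items.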

This has the side effect, desirable in some cases, of collapsing any sequence of spaces into a single space.
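
To make the role of the extra space and the + easier to see, here is a small sketch in base R that compares the pattern with and without them. The sample string and the variable names are made up for illustration, and the trimws() call at the end is an optional extra step that was not part of the answer above.

```r
# Hypothetical sample string, not taken from the original post.
x <- 'please help","me:out'

# Replace each punctuation character on its own: a run of punctuation
# turns into a run of spaces.
one_by_one <- gsub("[[:punct:]]", " ", x)

# Add a space to the bracket expression and a + quantifier: each run of
# punctuation and/or spaces becomes a single space.
collapsed <- gsub("[[:punct:] ]+", " ", x)

# trimws() (base R >= 3.2.0) drops any leading or trailing space left over.
cleaned <- trimws(collapsed)

print(one_by_one)
print(collapsed)
print(cleaned)
```

Without the + the three characters in `","` each become their own space, which is why the first result has a gap of three spaces between "help" and "me".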
