Sunday, 15 August 2010

python - How to find out whether a word exists in English using nltk


I am looking for a proper solution to this question. This question has been asked many times before, and I have not found any answer that fits.

I am using a corpus in NLTK to find out whether a word is an English word: wordnet.synsets(word)

It does not work for many common words. Using a list of English words and doing a lookup in a file is not an option. Using enchant is not an option either. If there is any other library that can do the same thing, please show how to use its API. If not, then please point me to a corpus in nltk which has all the words in English.
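For reference, here is a minimal sketch of the wordnet.synsets check mentioned above (the helper name in_wordnet is just illustrative, and it assumes nltk is installed and the wordnet corpus has been fetched with nltk.download('wordnet')). As noted, it misses many common words, because WordNet only lists synsets for open-class lemmas:

  from nltk.corpus import wordnet

  def in_wordnet(word):
      # True if WordNet has at least one synset for this word.
      # Function words such as 'the' or 'of' have no synsets,
      # which is why this check fails for many common English words.
      return len(wordnet.synsets(word)) > 0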

The Words Corpus in NLTK is nothing more than a wordlist: the /usr/share/dict/words file from Unix, which is used by some spell checkers. We can use it to find unusual or misspelled words in a text corpus, as shown in:

  import nltk

  def unusual_words(text):
      # Words in the text that do not appear in NLTK's Words Corpus.
      text_vocab = set(w.lower() for w in text if w.isalpha())
      english_vocab = set(w.lower() for w in nltk.corpus.words.words())
      unusual = text_vocab - english_vocab
      return sorted(unusual)
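For example, you could run it over one of the tokenised texts that ship with NLTK (a hypothetical run; the choice of 'austen-sense.txt' is just illustrative, and it assumes the gutenberg and words corpora have been downloaded):

  import nltk

  # Tokens of an NLTK sample text; any list of word tokens will do.
  tokens = nltk.corpus.gutenberg.words('austen-sense.txt')
  print(unusual_words(tokens)[:20])  # first 20 words not found in the wordlist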

And in your case, you can simply check the membership of your word in english_vocab.

  >>> import nltk
  >>> english_vocab = set(w.lower() for w in nltk.corpus.words.words())
  >>> 'this' in english_vocab
  True
  >>> 'nothing' in english_vocab
  True
  >>> 'nothingg' in english_vocab
  False
  >>> 'corpus' in english_vocab
  True
  >>> 'Terminology'.lower() in english_vocab
  True
  >>> 'sorted' in english_vocab
  True
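If you need this check in many places, one possible way to package it (a sketch, not part of the original answer; is_english_word is just an illustrative name, and it assumes the words corpus has been fetched with nltk.download('words')):

  import nltk

  # Build the lookup set once; each membership test is then O(1).
  english_vocab = set(w.lower() for w in nltk.corpus.words.words())

  def is_english_word(word):
      # Case-insensitive lookup against NLTK's Words Corpus.
      return word.lower() in english_vocab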
