Friday, 15 February 2013

python - Reading in TSV with unescaped character in Pandas


I have a TSV file where each line is a word token and its POS tag, separated by tabs.

  The	DET
  boy	NOUN
  said	VERB
  "	PUNCT
  Hi	INTJ
  Mum	NOUN
  "	PUNCT

It will later be used as the basis of a POS tagger. My problem is that whenever Pandas encounters the quotation marks, it treats everything between them as a single field and returns:

     word                                    tag
  0   The                                    DET
  1   boy                                   NOUN
  2  said                                   VERB
  3  \tPUNCT\r\nHi\tINTJ\r\nMum\tNOUN\r\n  PUNCT

I have tried to explicitly define the quotation marks as escape characters, but this does not work. The other thing I can think of is to remove them directly from the TSV files, but since I have a lot of files, and they are generated for me by an external source, that would be difficult and time-consuming.

> You need to tell Pandas to ignore quoting while reading the file. For this, Pandas uses the same configuration option as the built-in csv module, so you must pass the QUOTE_NONE constant from the csv module:

  import csv
  import pandas

  # fn is the path to the TSV file; QUOTE_NONE treats " as an ordinary character
  pandas.read_table(fn, quoting=csv.QUOTE_NONE, names=('word', 'tag'))
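Since the quoting option comes from the csv module, the effect can be seen without Pandas at all. The following is only a minimal sketch using the standard-library csv reader: the SAMPLE string is made-up data modelled on the question, and read_rows is a hypothetical helper name, not part of any library.

```python
import csv
import io

# Made-up sample modelled on the question: one token and its tag per line,
# tab-separated, with bare double quotes appearing as tokens.
SAMPLE = (
    'The\tDET\r\n'
    'boy\tNOUN\r\n'
    'said\tVERB\r\n'
    '"\tPUNCT\r\n'
    'Hi\tINTJ\r\n'
    'Mum\tNOUN\r\n'
    '"\tPUNCT\r\n'
)


def read_rows(text, quoting):
    """Parse tab-separated text with the given csv quoting mode (hypothetical helper)."""
    # newline='' leaves line endings untouched so the csv module handles them.
    buffer = io.StringIO(text, newline='')
    return list(csv.reader(buffer, delimiter='\t', quoting=quoting))


# Default quoting: the first " opens a quoted field that runs to the next ",
# so three lines are swallowed into one field.
default_rows = read_rows(SAMPLE, csv.QUOTE_MINIMAL)

# QUOTE_NONE: " is an ordinary character, so every line stays a separate row.
raw_rows = read_rows(SAMPLE, csv.QUOTE_NONE)

print(len(default_rows), len(raw_rows))
print(raw_rows[3])
```

With the default quoting the seven input lines collapse into four rows, which matches the merged row shown in the question; with csv.QUOTE_NONE each quotation mark is kept as its own token.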
