Monday, 15 February 2010

linux - Regex replace on specific column with SED/AWK


I have data that looks like this (Tab delimited):

  Organ   K      ClustNo   Analysis
  LN      K200   C12       Gene Ontology
  LN      K200   C116      Gene Ontology
  CN      K200   C2        Gene Ontology

What I want to do is remove the C from the 3rd column of every line, excluding the header row:

  Organ   K      ClustNo   Analysis
  LN      K200   12        Gene Ontology
  LN      K200   116       Gene Ontology
  CN      K200   2         Gene Ontology

This won't do, because it will affect other columns and the header row:

  sed 's/C//'
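To see the problem concretely, here is a minimal sketch that runs the naive substitution on a few rows. The sample values are an assumption modelled on the data in the question, and the variable names are made up for illustration.

```shell
# Sample rows (assumed values), tab-delimited, header first.
sample=$(printf 'Organ\tK\tClustNo\tAnalysis\nLN\tK200\tC12\tGene Ontology\nCN\tK200\tC2\tGene Ontology\n')

# s/C// removes the first C on every line, wherever it happens to be.
naive=$(printf '%s\n' "$sample" | sed 's/C//')
printf '%s\n' "$naive"
# The header loses the C of ClustNo, and the CN row loses the C of
# column 1 while its C2 in column 3 is left untouched.
```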

What is the right way to do it?

awk is a good tool for this:

  $ awk -F'\t' -v OFS='\t' 'NR>=2{sub(/^C/, "", $3)} 1' file
  Organ   K      ClustNo   Analysis
  LN      K200   12        Gene Ontology
  LN      K200   116       Gene Ontology
  CN      K200   2         Gene Ontology

How it works:

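  • -F'\t'

    Use tab as the field delimiter on input.

  • -v OFS='\t'

    Use tab as the field delimiter on output.

  • NR>=2 {sub(/^C/, "", $3)}

    Remove the leading C from field 3, but only on line 2 and later, so the header is left alone.

  • 1

    This is awk's cryptic shorthand for print: a pattern that is always true, with the default action of printing the line.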
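The pieces above can be tried end to end with a small self-contained sketch. The sample rows are assumed values modelled on the question, and the temporary file stands in for the real input file.

```shell
# Build a small tab-delimited sample (assumed values, header first).
sample=$(mktemp)
printf 'Organ\tK\tClustNo\tAnalysis\n'   >  "$sample"
printf 'LN\tK200\tC12\tGene Ontology\n'  >> "$sample"
printf 'LN\tK200\tC116\tGene Ontology\n' >> "$sample"
printf 'CN\tK200\tC2\tGene Ontology\n'   >> "$sample"

# Strip a leading C from field 3 on every line after the header;
# the trailing 1 prints every line, modified or not.
result=$(awk -F'\t' -v OFS='\t' 'NR>=2{sub(/^C/, "", $3)} 1' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Because the substitution is anchored with ^ and applied to $3 only, the C in CN (column 1) and the C in the header's ClustNo are both left alone.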

Using sed

  $ sed -r '2,$ s/(([^\t]+\t){2})C/\1/' file
  Organ   K      ClustNo   Analysis
  LN      K200   12        Gene Ontology
  LN      K200   116       Gene Ontology
  CN      K200   2         Gene Ontology
  • -r

    Use extended regular expressions. (On Mac OS X or another BSD platform, use -E instead.)

  • 2,$ s/(([^\t]+\t){2})C/\1/

    This substitution is applied only to lines from 2 through the end of the file.

    (([^\t]+\t){2}) matches the first two tab-separated columns. This assumes that exactly one tab separates each column. Because the regex is enclosed in parens, what it matches will be available later as \1.

    C matches the C that follows them.

    \1 replaces the matched text with just the first two columns, without the C.
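As a hedged variant of the same idea, the sketch below passes a literal tab through a shell variable and uses -E, so it does not rely on sed itself understanding \t. The sample rows are assumed values modelled on the question, and the tab variable is introduced here only for illustration.

```shell
# A literal tab character, so the regex does not depend on \t support.
tab=$(printf '\t')

# Small tab-delimited sample (assumed values, header first).
sample=$(mktemp)
printf 'Organ\tK\tClustNo\tAnalysis\n'  >  "$sample"
printf 'LN\tK200\tC12\tGene Ontology\n' >> "$sample"
printf 'CN\tK200\tC2\tGene Ontology\n'  >> "$sample"

# From line 2 to the end: capture the first two columns, drop the C after them.
result=$(sed -E "2,\$ s/(([^${tab}]+${tab}){2})C/\\1/" "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

The command is double-quoted so that the shell expands the tab variable, which is why the $ address and the \1 backreference are escaped.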

