Remove duplicates from each cell

I have a file like this and need to remove the duplicates in each cell without changing the order or the format:

Sl.no Name1 Name2  Dis  From  Type      item    Animal         Code
 2    qw     wsa   12    23   car,car   Case    CAT1,CAT1,Dog  p.12>a,p.12>a
23    as     swe   34    2,2  Bus,Bus   Case1,, Dog1,Dog1,,    N.12>a,N.12>a
23    ks     awe   35    .    Bike,Bike Case1,, rat4,rat4,,    5.16>b,5.16>b

The missing data are noted as . (dot).

So far I have tried this with awk:

 awk '{str="";c=0;split($0,arr,","); for (v in arr) c++; for (m=c;m >= 1;m--) for (n=1; n<m;n++) if (arr[m] == arr[n]) delete arr[m]; for (k=1;k<=c;k++) {if (k ==1 ) {s=arr[k] } else if (arr[k] != "") str=str" "arr[k] } print str}'

But it destroys the column layout. Is there another way to do this?

Expected output:

Sl.no Name1 Name2  Dis  From  Type      item    Animal      Code
 2    qw     wsa   12    23   car       Case    CAT1,Dog    p.12>a
23    as     swe   34    2    Bus       Case1   Dog1        N.12>a
23    ks     awe   35    .    Bike      Case1   rat4        5.16>b

Answer

With sed, assuming the file is tab-separated and GNU sed is used:

$ sed -E 's/\t(.*),\1/\t\1/g;s/,+\t/\t/g' file | column -ts$'\t'

Sl.no  Name1  Name2  Dis  From  Type  item   Animal    Code
 2     qw     wsa    12   23    car   Case   CAT1,Dog  p.12>a
23     as     swe    34   2     Bus   Case1  Dog1      N.12>a
23     ks     awe    35   .     Bike  Case1  rat4      5.16>b
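
As far as I can tell, the sed version only collapses a repeat that sits at the start of a cell directly next to its twin, and it skips the first column. If the duplicates might be non-adjacent (say a,b,a) or appear three or more times, a field-by-field awk pass is one possible alternative. This is a minimal sketch, not a drop-in solution: it assumes tab-separated input, and the function name dedupe_cells is made up for illustration.

```shell
# dedupe_cells: hypothetical helper name, not part of the original answer.
# Assumption: fields are separated by tabs and items inside a cell by commas.
# Keeps the first occurrence of each item, in order, and drops empty items.
dedupe_cells() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    for (i = 1; i <= NF; i++) {
      n = split($i, items, ",")
      split("", seen)                 # portable way to empty the array
      out = ""
      for (j = 1; j <= n; j++) {
        if (items[j] == "" || (items[j] in seen)) continue
        seen[items[j]] = 1
        out = (out == "" ? "" : out ",") items[j]
      }
      $i = out                        # write the cleaned cell back
    }
    print
  }' "$@"
}
```

It can be piped into column the same way as the sed command, for example dedupe_cells file | column -ts$'\t'. A lone . (missing data) passes through unchanged because it is a single non-empty item.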
User contributions licensed under: CC BY-SA