I have a file like this and need to remove the duplicate values inside each comma-separated cell, without changing the order of the values or the column layout:
Sl.no  Name1  Name2  Dis  From  Type       item     Animal         Code
2      qw     wsa    12   23    car,car    Case     CAT1,CAT1,Dog  p.12>a,p.12>a
23     as     swe    34   2,2   Bus,Bus    Case1,,  Dog1,Dog1,,    N.12>a,N.12>a
23     ks     awe    35   .     Bike,Bike  Case1,,  rat4,rat4,,    5.16>b,5.16>b
The missing data are noted as . (dot).
So far I have tried this with awk:
awk '{str="";c=0;split($0,arr,","); for (v in arr) c++; for (m=c;m >= 1;m--) for (n=1; n<m;n++) if (arr[m] == arr[n]) delete arr[m]; for (k=1;k<=c;k++) {if (k ==1 ) {s=arr[k] } else if (arr[k] != "") str=str" "arr[k] } print str}'
But it destroys the column formatting. Is there another way to do this?
Expected output
Sl.no  Name1  Name2  Dis  From  Type  item   Animal    Code
2      qw     wsa    12   23    car   Case   CAT1,Dog  p.12>a
23     as     swe    34   2     Bus   Case1  Dog1      N.12>a
23     ks     awe    35   .     Bike  Case1  rat4      5.16>b
Answer
With GNU sed, assuming the columns are tab-separated:
$ sed -E 's/\t(.*),\1/\t\1/g;s/,+\t/\t/g' file | column -ts$'\t'
Sl.no  Name1  Name2  Dis  From  Type  item   Animal    Code
2      qw     wsa    12   23    car   Case   CAT1,Dog  p.12>a
23     as     swe    34   2     Bus   Case1  Dog1      N.12>a
23     ks     awe    35   .     Bike  Case1  rat4      5.16>b
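If a cell can contain repeats that are not next to each other (for example a,b,a), a per-cell pass in awk may be easier to reason about. The following is only a sketch, not part of the sed answer: it assumes the file is tab-separated, and dedup_cells is a hypothetical helper name chosen for illustration. It splits every cell on commas, keeps the first occurrence of each value in its original order, and drops the empty pieces left by trailing commas.

```shell
# dedup_cells: hypothetical helper name, used only for this sketch.
# Assumes tab-separated input (from files given as arguments, or stdin).
dedup_cells() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    for (i = 1; i <= NF; i++) {
      n = split($i, parts, ",")      # parts[1..n] keeps the original order
      out = ""
      split("", seen)                # portable way to empty the array
      for (j = 1; j <= n; j++) {
        v = parts[j]
        if (v == "" || (v in seen))  # skip empty pieces and repeated values
          continue
        seen[v] = 1
        out = (out == "" ? v : out "," v)
      }
      $i = out                       # rebuilds the row using tabs (OFS)
    }
    print
  }' "$@"
}
```

It can be used in place of the sed command, for example dedup_cells file | column -ts$'\t' in bash. Cells holding only . are left as they are, since a single value has nothing to deduplicate.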