I have a three column of data as arranged below:
Input file:
>>>>> 1.0 2.0 3.0 2.0 2.0 4.0 3.0 4.5 8.0 >>>>> 1.0 2.5 6.8 2.0 3.5 6.8 3.0 1.2 1.9 >>>>> 1.0 1.2 1.3 2.0 2.7 1.8 3.0 4.5 8.5
In the above input file the first column values are repeated so I want to take only once that value and want to sum the third column values row wise and do not want to take any second column values.
I also want to append a third column with the fixed value 1.0
Finally want to save the result on another test file called output.txt.
Output:
1.0 11.1 1.0 2.0 12.6 1.0 3.0 18.4 1.0
In the output second column values resulted from is following:
3.0+6.8+1.3 4.0+6.8+1.8 8.0+1.9+8.5
I tried with numpy but getting error:
import numpy as np import pandas as pd import glob data=np.loadtxt("input.txt")
Advertisement
Answer
You need to read your input file using pandas.read_csv
, you need to set the delimiter to " "
, specify no header and ">"
as comment lines.
Then perform the groupby
/sum
operation, and export without header using pandas.to_csv
import pandas as pd # input df = pd.read_csv('filename.csv', delimiter=' ', header=None, comment='>') # output (df.groupby(0)[[2]].sum() .assign(col=1.0) .to_csv('output.txt', header=False, sep=' ', float_format='%.2f') )
output.txt:
1.00 11.10 1.00 2.00 12.60 1.00 3.00 18.40 1.00