I have a text file that needs to be converted into a CSV file using pandas. A sample of it is shown below:
time 00:15 min
cod,10,1=0,2=2,3=2,4=1,5=6,6=4,7=2,8=7,9=1,10=9,11=7
cod,18,1=27,2=18,3=19,4=20,5=47,6=2,7=2,8=0,9=33,10=61,11=13,12=2,13=3,14=0,15=0
The rows should be cod,10 and cod,18, and the columns should be 1, 2, 3, …, 15. Any ideas? Regards, Ali
Answer
I use pandas for the conversion, but vanilla Python to handle some aspects of the data. I hope that is alright.
One issue we need to deal with is that the rows have different numbers of columns, so I put NaN in the columns that a row is missing. For instance, row 1 is shorter than row 2, so the columns missing from row 1 are filled with “NaN”.
Here is my idea:
import pandas as pd

# Keep only the data lines; headers such as "time 00:15 min" contain no comma.
lines = []
with open('/path/to/test.txt', 'r') as infile:
    for line in infile:
        if "," not in line:
            continue
        lines.append(line.strip().split(","))

# Pad shorter rows with "<column>=NaN" so that every row has the same columns.
# The first two fields are the row label, so the next column number is len(line) - 1.
max_length = max(len(line) for line in lines)
for line in lines:
    while len(line) < max_length:
        line.append(f'{len(line) - 1}=NaN')

# Collect the row labels and the values for each column.
row_names = []
column_data = {}
for line in lines:
    row_names.append(" ".join(line[:2]))
    for info in line[2:]:
        k, v = info.split("=")
        column_data.setdefault(k, []).append(v)

df = pd.DataFrame(column_data)
df.index = row_names
print(df)
df.to_csv('/path/to/test.csv')
Output (the printed DataFrame):
         1   2   3   4   5  6  7  8   9  10  11   12   13   14   15
cod 10   0   2   2   1   6  4  2  7   1   9   7  NaN  NaN  NaN  NaN
cod 18  27  18  19  20  47  2  2  0  33  61  13    2    3    0    0
CSV File Output:
,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
cod 10,0,2,2,1,6,4,2,7,1,9,7,NaN,NaN,NaN,NaN
cod 18,27,18,19,20,47,2,2,0,33,61,13,2,3,0,0
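As a possible variation on the same idea, each row can be collected as a dictionary, and pandas can then fill in the missing columns itself instead of the rows being padded by hand. This is only a minimal sketch: the helper name `parse_line` and the in-memory `sample` text are made up for illustration, and it assumes every data line has the form `cod,<number>,<column>=<value>,...`.

```python
import io

import pandas as pd


def parse_line(line):
    """Hypothetical helper: split one data line into (row label, {column: value})."""
    parts = line.strip().split(",")
    label = " ".join(parts[:2])
    values = dict(item.split("=") for item in parts[2:])
    return label, values


# In-memory stand-in for the text file, using the sample from the question.
sample = io.StringIO(
    "time 00:15 min\n"
    "cod,10,1=0,2=2,3=2,4=1,5=6,6=4,7=2,8=7,9=1,10=9,11=7\n"
    "cod,18,1=27,2=18,3=19,4=20,5=47,6=2,7=2,8=0,9=33,10=61,11=13,12=2,13=3,14=0,15=0\n"
)

labels, rows = [], []
for line in sample:
    if "," not in line:  # skip header lines such as "time 00:15 min"
        continue
    label, values = parse_line(line)
    labels.append(label)
    rows.append(values)

# Keys that a row lacks become real missing values (NaN) in the DataFrame.
df = pd.DataFrame(rows, index=labels)

# na_rep controls how missing values are written to the CSV text.
csv_text = df.to_csv(na_rep="NaN")
print(csv_text)
```

The values stay as strings here, as in the code above. The difference is that the gaps are genuine missing values rather than the text "NaN", so checks such as `df.isna()` see them.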