I have a text file that needs to be converted into CSV file using pandas. A piece of it is presented in the following:
JavaScript
x
4
1
time 00:15 min
2
cod,10,1=0,2=2,3=2,4=1,5=6,6=4,7=2,8=7,9=1,10=9,11=7
3
cod,18,1=27,2=18,3=19,4=20,5=47,6=2,7=2,8=0,9=33,10=61,11=13,12=2,13=3,14=0,15=0
4
Rows are cod,10, and cod,18 and the columns are 1, 2, 3,…, 15. Any idea? Regards, Ali
Advertisement
Answer
I use pandas to deal with the conversion, but vanilla Python to deal with some of aspects of the data, I hope that is alright.
One issue we need to deal with is the fact that there are a different number of columns per row. So I just put NaN in columns that are missing for a row. For instance, row 1 is shorter than row 2, so the missing columns in row 1 are given values as “NaN”.
Here is my idea:
JavaScript
1
34
34
1
import pandas as pd
2
3
lines = []
4
with open('/path/to/test.txt', 'r') as infile:
5
for line in infile:
6
if "," not in line:
7
continue
8
else:
9
lines.append(line.strip().split(","))
10
11
row_names = []
12
column_data = {}
13
14
max_length = max(*[len(line) for line in lines])
15
16
for line in lines:
17
while(len(line) < max_length):
18
line.append(f'{len(line)-1}=NaN')
19
20
for line in lines:
21
row_names.append(" ".join(line[:2]))
22
for info in line[2:]:
23
(k,v) = info.split("=")
24
if k in column_data:
25
column_data[k].append(v)
26
else:
27
column_data[k] = [v]
28
29
df = pd.DataFrame(column_data)
30
df.index = row_names
31
print(df)
32
33
df.to_csv('/path/to/test.csv')
34
Output (the printed DataFrame):
JavaScript
1
4
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2
cod 10 0 2 2 1 6 4 2 7 1 9 7 NaN NaN NaN NaN
3
cod 18 27 18 19 20 47 2 2 0 33 61 13 2 3 0 0
4
CSV File Output:
JavaScript
1
4
1
,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
2
cod 10,0,2,2,1,6,4,2,7,1,9,7,NaN,NaN,NaN,NaN
3
cod 18,27,18,19,20,47,2,2,0,33,61,13,2,3,0,0
4