Reading software-specific text file data into pandas dataframe

Tags: , , , ,



A software I use outputs the results as text txt files in the following way Output Text File.

or like here for example:

------------------------------------
Joint 1
------------------------------------
Time    Parameter1  Parameter 2
16.000  1174    7.45
17.000  1174    7.87
18.000  1174    7.64
------------------------------------
Joint 2
------------------------------------
Time    Parameter1  Parameter 2
16.000  1174    7.45
17.000  1154    7.87
18.000  1124    7.64
------------------------------------
Joint 3
------------------------------------
Time    Parameter1  Parameter 2
16.000  1174    7.55
17.000  1174    7.67
18.000  1174    7.84

Now I want to analyse the results for each joint and dont know how to import the text file into pandas in a feasible way. Optimally I want something like this Wanted Format or a separate pandas dataframe or numpy array / list for each joint. Like here for example:

Joint 1                 Joint 2                 Joint 3
Time    Parameter1  Parameter 2 Time    Parameter1  Parameter 2 Time    Parameter1  Parameter 2
16.000  1174    7.45            16.000  1174    7.45            16.000  1174    7.55
17.000  1174    7.87            17.000  1154    7.87            17.000  1174    7.67
18.000  1174    7.64            18.000  1124    7.64            18.000  1174    7.84

Would be great if someone can help with how to deal with the hyphen line breaks of the output text file.

Answer

With a slightly modified Output Text File

------------------------------------
Joint 1
------------------------------------
Time    Parameter1  Parameter2
16.000  1174    7.45
17.000  1174    7.87
18.000  1174    7.64
...

pasted into a file named output.txt this

import pandas as pd
from io import StringIO

dfs = []
with open('ouput.txt', 'r') as file:
    next(file)
    for line in file:
        joint = line.strip()
        next(file)
        df_str = ''
        for line in file:
            if line.startswith('--'):
                break
            df_str += line
        df = pd.read_csv(StringIO(df_str), delim_whitespace=True)
        df.columns = pd.MultiIndex.from_product([[joint], df.columns])
        dfs.append(df)

df = pd.concat(dfs, axis=1)

produces the following output (print(df))

  Joint 1                        ... Joint 3                      
     Time Parameter1 Parameter2  ...    Time Parameter1 Parameter2
0    16.0       1174       7.45  ...    16.0       1174       7.55
1    17.0       1174       7.87  ...    17.0       1174       7.67
2    18.0       1174       7.64  ...    18.0       1174       7.84

[3 rows x 9 columns]

at least for me. (This stuff is a bit fickle, the exact formatting of the file is rather important.) The list dfs contains one DataFrames for each “joint”, if you rather want to deal with them individually.

If you are more interested in a vertical arrangement with Time as part of the index then the replacement

...
        df = pd.read_csv(StringIO(df_str), delim_whitespace=True, index_col=0)
        df.index = pd.MultiIndex.from_product([[joint], df.index])
...
df = pd.concat(dfs, axis=0)

produces

              Parameter1  Parameter2
        Time                        
Joint 1 16.0        1174        7.45
        17.0        1174        7.87
        18.0        1174        7.64
Joint 2 16.0        1174        7.45
        17.0        1154        7.87
        18.0        1124        7.64
Joint 3 16.0        1174        7.55
        17.0        1174        7.67
        18.0        1174        7.84

The difference in the file is that Parameter 2 is replaced with Parameter2. So the question is: Is whitespace in the header names possible? If so, things get more complicated.

Taking up your idea in the comment this should also work:

# Extracting the column names (could also be done manually)
with open('ouput.txt') as fin:
    for _ in range(3):
        next(fin)
    columns = ['Joint'] + next(fin).split()

# Transforming file into a csv-file (ignoring the header lines)
with open('ouput.txt', 'r') as fin, open('ouput-tidy.csv', 'w') as fout:
    next(fin)
    for line in fin:
        joint = line.strip()
        next(fin)
        next(fin)
        for line in fin:
            if line.startswith('--'):
                break
            fout.write(','.join([joint] + line.split()) + 'n')

# Reading csv-file in DataFrame
df = pd.read_csv('ouput-tidy.csv', header=None)
df.columns = columns
df.set_index(['Joint', 'Time'], drop=True, inplace=True)


Source: stackoverflow