Reading software-specific text file data into pandas dataframe

Question

A software I use outputs the results as text txt files in the following way Output Text File. or like here for example: Now I want to analyse the results for each joint and dont know how to import the text file into pandas in a feasible way. Optimally I want something like this Wanted Format or a separate pandas

Accepted Answer

With a slightly modified Output Text File------------------------------------Joint 1------------------------------------Time    Parameter1  Parameter216.000  1174    7.4517.000  1174    7.8718.000  1174    7.64...pasted into a file named output.txt thisimport pandas as pdfrom io import StringIOdfs = []with open('ouput.txt', 'r') as file:    next(file)    for line in file:        joint = line.strip()        next(file)        df_str = ''        for line in file:            if line.startswith('--'):                break            df_str += line        df = pd.read_csv(StringIO(df_str), delim_whitespace=True)        df.columns = pd.MultiIndex.from_product([[joint], df.columns])        dfs.append(df)df = pd.concat(dfs, axis=1)produces the following output (print(df))  Joint 1                        ... Joint 3                           Time Parameter1 Parameter2  ...    Time Parameter1 Parameter20    16.0       1174       7.45  ...    16.0       1174       7.551    17.0       1174       7.87  ...    17.0       1174       7.672    18.0       1174       7.64  ...    18.0       1174       7.84[3 rows x 9 columns]at least for me. (This stuff is a bit fickle, the exact formatting of the file is rather important.) The list dfs contains one DataFrames for each &#8220;joint&#8221;, if you rather want to deal with them individually.If you are more interested in a vertical arrangement with Time as part of the index then the replacement...        df = pd.read_csv(StringIO(df_str), delim_whitespace=True, index_col=0)        df.index = pd.MultiIndex.from_product([[joint], df.index])...df = pd.concat(dfs, axis=0)produces              Parameter1  Parameter2        Time                        Joint 1 16.0        1174        7.45        17.0        1174        7.87        18.0        1174        7.64Joint 2 16.0        1174        7.45        17.0        1154        7.87        18.0        1124        7.64Joint 3 16.0        1174        7.55        17.0        1174        7.67        18.0        1174        7.84The difference in the file is that Parameter 2 is replaced with Parameter2. So the question is: Is whitespace in the header names possible? If so, things get more complicated.Taking up your idea in the comment this should also work:# Extracting the column names (could also be done manually)with open('ouput.txt') as fin:    for _ in range(3):        next(fin)    columns = ['Joint'] + next(fin).split()# Transforming file into a csv-file (ignoring the header lines)with open('ouput.txt', 'r') as fin, open('ouput-tidy.csv', 'w') as fout:    next(fin)    for line in fin:        joint = line.strip()        next(fin)        next(fin)        for line in fin:            if line.startswith('--'):                break            fout.write(','.join([joint] + line.split()) + 'n')# Reading csv-file in DataFramedf = pd.read_csv('ouput-tidy.csv', header=None)df.columns = columnsdf.set_index(['Joint', 'Time'], drop=True, inplace=True)

Advertisement

Answer