Skip to content
Advertisement

Python project – Writing contents of .txt file to Pandas dataframe

I’m currently working on a Python project where I want to:

  1. Loop through subdirectories of a root directory
  2. Find .txt files with names starting with ‘memory_’. Txt files are: newline-separated, lines consist of: ‘colName: Value’ pairs. Like this.
Memory dump

Serialnr: 1412b23990

Date/time: 24-11-2016 08:10



mode: version

Hardware release: ic2kkit01*P131113*

Software release: V3.82

Rom test 1 checksum: e0251fda

Rom test 2 checksum: cae0351f



mode: statistics

Line power connected (hours): 360

Line power disconnected (number of times): 2

Ch function(hours): 54

Dhw function(hours): 4

Burnerstarts (number of times): 604

Ignition failed (number of times): 0

Flame lost (number of times): 0

Reset (number of times): 0



mode: status

T1: 17.42

T2: 17.4

T3: 16.38

T4: -35.0

T5: -35.0

T6: 17.4

Temp_set: 0.0

Fanspeed_set: 0.0

Fanspeed: 0.0

Fan_pwm: 0.0

Opentherm: 0

Roomtherm: 0

Tap_switch: 0

  1. Appending the contents of the .txt file to a Pandas data frame with predefined column names. I.e.: I would like to write each .txt file into a data frame row, using the colName:Value pairs. See attached image for how (a part of) the data frame should look like.

Current .py code:

import os
import pandas as pd

# Set rootdir for os.walk
rootdir = 'K:/Retouren' 

## Create empty Pandas dataframe with just column names
column_names = ["Memory dump", "Serialnr", "Date/time", "mode", "Hardware release", "Software release", "Rom test 1 checksum", "Rom test 2 checksum", 
    "mode", "Line power connected (hours)", "Line power disconnected (number of times)", "Ch function(hours)", "Dhw function(hours)", "Burnerstarts (number of times)", 
    "Ignition failed (number of times)", "Flame lost (number of times)", "Reset (number of times)", "Gasmeter_ch", "Gasmeter_dhw", "Watermeter", "Burnerstarts_dhw", 
    "mode", "T1", "T2", "T3", "T4", "T5", "T6", "Temp_set", "Fanspeed_set", "Fanspeed", "Fan_pwm", "Opentherm", "Roomtherm", "Tap_switch", "Gp_switch", "Pump", "Dwk", 
    "Gasvalve", "Io_signal", "Spark", "Io_curr", "Displ_code", "Ch_pressure", "Rf_rth_bound", "Rf_rth_communication", "Rf_rth_battery_info", "Rf_rth_battery_ok", 
    "Bc_tapflow", "Pump_pwm", "Room_override_zone1", "Room_set_zone1", "Room_temp_zone1", "Room_override_zone2", "Room_set_zone2", "Room_temp_zone2", "Outside_temp", 
    "Ot_master_member_id", "Ot_therm_prod_version", "Ot_therm_prod_type", "mode", "Node nr", "Cloud id0", "Cloud id1", "Cloud id2", "Rf cloud nr", "Rssi_lower", 
    "Rssi_upper", "Rssi_wait", "Attention_period", "Attention_number", "Info10", "Info11", "Info12", "Info13", "Info14", "Info15", "Info16", "Info17", "Info18", 
    "Ramses_thermostat_idh", "Ramses_thermostat_idm", "Ramses_thermostat_idl", "Ramses_boiler_idh", "Ramses_boiler_idm", "Ramses_boiler_idl", "Prod. token", 
    "Year", "Month", "Line number", "Serial1", "Serial2", "Serial3", "mode", "Id_dongle0", "Id_dongle1", "Id_dongle2", "Id_dongle3", "Id_lan0", "Id_lan1", 
    "Id_lan2", "Id_lan3", "Info2_7", "Info2_8", "Info2_9", "Info2_10", "Info2_11", "Info2_12", "Info2_13", "Info2_14", "mode", "Interrupt_time", 
    "Interrupt_load (%)", "Main_load (%)", "Net fequency (hz)", "Voltage ref. (v)", "Checksum1", "Checksum2", "nmode", "Fault 0", "Fault 1", "Fault 2", 
    "Fault 3", "Fault 4", "Fault 5", "Fault 6", "Fault 7", "Fault 8", "Fault 9", "Fault 10", "Fault 11", "Fault 12", "Fault 13", "Fault 14", "Fault 15", 
    "Fault 16", "Fault 17", "Fault 18", "Fault 19", "Fault 20", "Fault 21", "Fault 22", "Fault 23", "Fault 24", "Fault 25", "Fault 26", "Fault 27", "Fault 28", 
    "Fault 29", "Fault 30", "Fault 31", "mode", "Heater_on", "Comfort_mode", "Ch_set_max", "Dhw_set", "Eco_days", "Comfort_set", "Dhw_at_night", "Ch_at_night", 
    "Parameter 1", "Parameter 2", "Parameter 3", "Parameter 4", "Parameter 5", "Parameter 6", "Parameter 7", "Parameter 8", "Parameter 9", "Parameter a", 
    "Parameter b", "Parameter c", "Parameter c", "Parameter d", "Parameter e", "Parameter e.", "Parameter f", "Parameter h", "Parameter n", "Parameter o", 
    "Parameter p", "Parameter r", "Parameter f.", "mode", "Param31", "Param32", "Param33", "Param34", "Param35", "Param36", "Param37", "Param38", "Param39", 
    "Param40", "Param41", "Param42", "Param43", "Param44", "Param45", "Param46", "Param47", "Param48", "Param49", "Param50", "Param51", "Param52", "Param53", 
    "Param54", "Param55", "Param56", "Param57", "Param58", "Param59", "Param60", "Param61", "Param62", "Param63"]
data = pd.DataFrame(columns = column_names)

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file.startswith('memory_') and os.path.splitext(file)[1] == '.txt':
            filepath = os.path.join(subdir, file)
            with open (filepath, "r") as curfile:
                data.append()  
                ## Here is where I would like to append the .txt data as a row in the data frame
 

I have the first two steps down, but the third is exceeding my programming knowledge. Any tips would be greatly appreciated.

Example of the desired dataframe:

Example of the desired dataframe

Advertisement

Answer

I suggest reading the file with readlines(), which will return a list of lines. Then loop over the lines and process only those that contain : in the string. Split by the colon (and trailing whitespace) while wrapping everything in dict() will create a dictionary with the strings before the colon as keys and the strings after the colons as values:

dict(i.split(': ',1) for i in curfile.readlines() if ':' in i)

for your sample data this would make:

{'Serialnr': '1412b23990', 'Date/time': '24-11-2016 08:10', 'mode': 'status', 'Hardware release': 'ic2kkit01*P131113*', 'Software release': 'V3.82', 'Rom test 1 checksum': 'e0251fda', 'Rom test 2 checksum': 'cae0351f', 'Line power connected (hours)': '360', 'Line power disconnected (number of times)': '2', 'Ch function(hours)': '54', 'Dhw function(hours)': '4', 'Burnerstarts (number of times)': '604', 'Ignition failed (number of times)': '0', 'Flame lost (number of times)': '0', 'Reset (number of times)': '0', 'T1': '17.42', 'T2': '17.4', 'T3': '16.38', 'T4': '-35.0', 'T5': '-35.0', 'T6': '17.4', 'Temp_set': '0.0', 'Fanspeed_set': '0.0', 'Fanspeed': '0.0', 'Fan_pwm': '0.0', 'Opentherm': '0', 'Roomtherm': '0', 'Tap_switch': '0'}

If you create an empty list before the loop, and append the dictionaries to that list within the loop, you’ll end up with a list of dicts that you can load with pandas in one go:

import os
import pandas as pd

# Set rootdir for os.walk
rootdir = 'K:/Retouren' 

## Create empty list
data = []

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file.startswith('memory_') and os.path.splitext(file)[1] == '.txt':
            filepath = os.path.join(subdir, file)
            with open (filepath, "r") as curfile:
                data.append(dict(i.split(': ',1) for i in curfile.readlines() if ':' in i))  
            
df = pd.DataFrame(data)

An added advantage is that you don’t need to set the column names manually, because pandas will use the dict keys for that. DataFrame:

Serialnr Date/time mode Hardware release Software release Rom test 1 checksum Rom test 2 checksum Line power connected (hours) Line power disconnected (number of times) Ch function(hours) Dhw function(hours) Burnerstarts (number of times) Ignition failed (number of times) Flame lost (number of times) Reset (number of times) T1 T2 T3 T4 T5 T6 Temp_set Fanspeed_set Fanspeed Fan_pwm Opentherm Roomtherm Tap_switch
0 1412b23990 24-11-2016 08:10 status ic2kkit01P131113 V3.82 e0251fda cae0351f 360 2 54 4 604 0 0 0 17.42 17.4 16.38 -35 -35 17.4 0 0 0 0 0 0 0

There is one disadvantage: as a dict can only contain unique keys you will lose two mode values. I’ll leave it as they seem to be headers rather than containers of information. Otherwise it would require some additional renaming.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement