I’m currently working on a Python project where I want to:
- Loop through subdirectories of a root directory
- Find .txt files with names starting with ‘memory_’. Txt files are: newline-separated, lines consist of: ‘colName: Value’ pairs. Like this.
Memory dump Serialnr: 1412b23990 Date/time: 24-11-2016 08:10 mode: version Hardware release: ic2kkit01*P131113* Software release: V3.82 Rom test 1 checksum: e0251fda Rom test 2 checksum: cae0351f mode: statistics Line power connected (hours): 360 Line power disconnected (number of times): 2 Ch function(hours): 54 Dhw function(hours): 4 Burnerstarts (number of times): 604 Ignition failed (number of times): 0 Flame lost (number of times): 0 Reset (number of times): 0 mode: status T1: 17.42 T2: 17.4 T3: 16.38 T4: -35.0 T5: -35.0 T6: 17.4 Temp_set: 0.0 Fanspeed_set: 0.0 Fanspeed: 0.0 Fan_pwm: 0.0 Opentherm: 0 Roomtherm: 0 Tap_switch: 0
- Appending the contents of the .txt file to a Pandas data frame with predefined column names. I.e.: I would like to write each .txt file into a data frame row, using the colName:Value pairs. See attached image for how (a part of) the data frame should look like.
Current .py code:
import os import pandas as pd # Set rootdir for os.walk rootdir = 'K:/Retouren' ## Create empty Pandas dataframe with just column names column_names = ["Memory dump", "Serialnr", "Date/time", "mode", "Hardware release", "Software release", "Rom test 1 checksum", "Rom test 2 checksum", "mode", "Line power connected (hours)", "Line power disconnected (number of times)", "Ch function(hours)", "Dhw function(hours)", "Burnerstarts (number of times)", "Ignition failed (number of times)", "Flame lost (number of times)", "Reset (number of times)", "Gasmeter_ch", "Gasmeter_dhw", "Watermeter", "Burnerstarts_dhw", "mode", "T1", "T2", "T3", "T4", "T5", "T6", "Temp_set", "Fanspeed_set", "Fanspeed", "Fan_pwm", "Opentherm", "Roomtherm", "Tap_switch", "Gp_switch", "Pump", "Dwk", "Gasvalve", "Io_signal", "Spark", "Io_curr", "Displ_code", "Ch_pressure", "Rf_rth_bound", "Rf_rth_communication", "Rf_rth_battery_info", "Rf_rth_battery_ok", "Bc_tapflow", "Pump_pwm", "Room_override_zone1", "Room_set_zone1", "Room_temp_zone1", "Room_override_zone2", "Room_set_zone2", "Room_temp_zone2", "Outside_temp", "Ot_master_member_id", "Ot_therm_prod_version", "Ot_therm_prod_type", "mode", "Node nr", "Cloud id0", "Cloud id1", "Cloud id2", "Rf cloud nr", "Rssi_lower", "Rssi_upper", "Rssi_wait", "Attention_period", "Attention_number", "Info10", "Info11", "Info12", "Info13", "Info14", "Info15", "Info16", "Info17", "Info18", "Ramses_thermostat_idh", "Ramses_thermostat_idm", "Ramses_thermostat_idl", "Ramses_boiler_idh", "Ramses_boiler_idm", "Ramses_boiler_idl", "Prod. token", "Year", "Month", "Line number", "Serial1", "Serial2", "Serial3", "mode", "Id_dongle0", "Id_dongle1", "Id_dongle2", "Id_dongle3", "Id_lan0", "Id_lan1", "Id_lan2", "Id_lan3", "Info2_7", "Info2_8", "Info2_9", "Info2_10", "Info2_11", "Info2_12", "Info2_13", "Info2_14", "mode", "Interrupt_time", "Interrupt_load (%)", "Main_load (%)", "Net fequency (hz)", "Voltage ref. (v)", "Checksum1", "Checksum2", "nmode", "Fault 0", "Fault 1", "Fault 2", "Fault 3", "Fault 4", "Fault 5", "Fault 6", "Fault 7", "Fault 8", "Fault 9", "Fault 10", "Fault 11", "Fault 12", "Fault 13", "Fault 14", "Fault 15", "Fault 16", "Fault 17", "Fault 18", "Fault 19", "Fault 20", "Fault 21", "Fault 22", "Fault 23", "Fault 24", "Fault 25", "Fault 26", "Fault 27", "Fault 28", "Fault 29", "Fault 30", "Fault 31", "mode", "Heater_on", "Comfort_mode", "Ch_set_max", "Dhw_set", "Eco_days", "Comfort_set", "Dhw_at_night", "Ch_at_night", "Parameter 1", "Parameter 2", "Parameter 3", "Parameter 4", "Parameter 5", "Parameter 6", "Parameter 7", "Parameter 8", "Parameter 9", "Parameter a", "Parameter b", "Parameter c", "Parameter c", "Parameter d", "Parameter e", "Parameter e.", "Parameter f", "Parameter h", "Parameter n", "Parameter o", "Parameter p", "Parameter r", "Parameter f.", "mode", "Param31", "Param32", "Param33", "Param34", "Param35", "Param36", "Param37", "Param38", "Param39", "Param40", "Param41", "Param42", "Param43", "Param44", "Param45", "Param46", "Param47", "Param48", "Param49", "Param50", "Param51", "Param52", "Param53", "Param54", "Param55", "Param56", "Param57", "Param58", "Param59", "Param60", "Param61", "Param62", "Param63"] data = pd.DataFrame(columns = column_names) for subdir, dirs, files in os.walk(rootdir): for file in files: if file.startswith('memory_') and os.path.splitext(file)[1] == '.txt': filepath = os.path.join(subdir, file) with open (filepath, "r") as curfile: data.append() ## Here is where I would like to append the .txt data as a row in the data frame
I have the first two steps down, but the third is exceeding my programming knowledge. Any tips would be greatly appreciated.
Example of the desired dataframe:
Advertisement
Answer
I suggest reading the file with readlines()
, which will return a list of lines. Then loop over the lines and process only those that contain :
in the string. Split by the colon (and trailing whitespace) while wrapping everything in dict()
will create a dictionary with the strings before the colon as keys and the strings after the colons as values:
dict(i.split(': ',1) for i in curfile.readlines() if ':' in i)
for your sample data this would make:
{'Serialnr': '1412b23990', 'Date/time': '24-11-2016 08:10', 'mode': 'status', 'Hardware release': 'ic2kkit01*P131113*', 'Software release': 'V3.82', 'Rom test 1 checksum': 'e0251fda', 'Rom test 2 checksum': 'cae0351f', 'Line power connected (hours)': '360', 'Line power disconnected (number of times)': '2', 'Ch function(hours)': '54', 'Dhw function(hours)': '4', 'Burnerstarts (number of times)': '604', 'Ignition failed (number of times)': '0', 'Flame lost (number of times)': '0', 'Reset (number of times)': '0', 'T1': '17.42', 'T2': '17.4', 'T3': '16.38', 'T4': '-35.0', 'T5': '-35.0', 'T6': '17.4', 'Temp_set': '0.0', 'Fanspeed_set': '0.0', 'Fanspeed': '0.0', 'Fan_pwm': '0.0', 'Opentherm': '0', 'Roomtherm': '0', 'Tap_switch': '0'}
If you create an empty list before the loop, and append the dictionaries to that list within the loop, you’ll end up with a list of dicts that you can load with pandas in one go:
import os import pandas as pd # Set rootdir for os.walk rootdir = 'K:/Retouren' ## Create empty list data = [] for subdir, dirs, files in os.walk(rootdir): for file in files: if file.startswith('memory_') and os.path.splitext(file)[1] == '.txt': filepath = os.path.join(subdir, file) with open (filepath, "r") as curfile: data.append(dict(i.split(': ',1) for i in curfile.readlines() if ':' in i)) df = pd.DataFrame(data)
An added advantage is that you don’t need to set the column names manually, because pandas will use the dict keys for that. DataFrame:
Serialnr | Date/time | mode | Hardware release | Software release | Rom test 1 checksum | Rom test 2 checksum | Line power connected (hours) | Line power disconnected (number of times) | Ch function(hours) | Dhw function(hours) | Burnerstarts (number of times) | Ignition failed (number of times) | Flame lost (number of times) | Reset (number of times) | T1 | T2 | T3 | T4 | T5 | T6 | Temp_set | Fanspeed_set | Fanspeed | Fan_pwm | Opentherm | Roomtherm | Tap_switch | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1412b23990 | 24-11-2016 08:10 | status | ic2kkit01P131113 | V3.82 | e0251fda | cae0351f | 360 | 2 | 54 | 4 | 604 | 0 | 0 | 0 | 17.42 | 17.4 | 16.38 | -35 | -35 | 17.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
There is one disadvantage: as a dict can only contain unique keys you will lose two mode
values. I’ll leave it as they seem to be headers rather than containers of information. Otherwise it would require some additional renaming.