Python project – Writing contents of .txt file to Pandas dataframe

Question

I'm currently working on a Python project where I want to:
1. Loop through the subdirectories of a root directory.
2. Find .txt files whose names start with 'memory_'. The .txt files are newline-separated, and each line consists of a 'colName: Value' pair. Like this.
3. Append the contents of each .txt file to a Pandas DataFrame with predefined column names.
I.e.: I would like to write

Accepted Answer

I suggest reading the file with readlines(), which returns a list of lines. Then loop over the lines and process only those that contain ': ' (a colon followed by a space). Splitting each of those lines on the first ': ' and wrapping everything in dict() creates a dictionary whose keys are the strings before the colon and whose values are the strings after it. rstrip('\n') removes the line ending first so it does not end up in the values, and maxsplit=1 keeps any later colons (as in '08:10') inside the value:

```python
dict(line.rstrip('\n').split(': ', 1) for line in curfile.readlines() if ': ' in line)
```

For your sample data this produces:

```python
{'Serialnr': '1412b23990',
 'Date/time': '24-11-2016 08:10',
 'mode': 'status',
 'Hardware release': 'ic2kkit01*P131113*',
 'Software release': 'V3.82',
 'Rom test 1 checksum': 'e0251fda',
 'Rom test 2 checksum': 'cae0351f',
 'Line power connected (hours)': '360',
 'Line power disconnected (number of times)': '2',
 'Ch function(hours)': '54',
 'Dhw function(hours)': '4',
 'Burnerstarts (number of times)': '604',
 'Ignition failed (number of times)': '0',
 'Flame lost (number of times)': '0',
 'Reset (number of times)': '0',
 'T1': '17.42',
 'T2': '17.4',
 'T3': '16.38',
 'T4': '-35.0',
 'T5': '-35.0',
 'T6': '17.4',
 'Temp_set': '0.0',
 'Fanspeed_set': '0.0',
 'Fanspeed': '0.0',
 'Fan_pwm': '0.0',
 'Opentherm': '0',
 'Roomtherm': '0',
 'Tap_switch': '0'}
```

If you create an empty list before the loop and append each dictionary to it inside the loop, you end up with a list of dicts that pandas can load in one go:

```python
import os

import pandas as pd

# Root directory for os.walk
rootdir = 'K:/Retouren'

# Empty list that collects one dict per file
data = []

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file.startswith('memory_') and os.path.splitext(file)[1] == '.txt':
            filepath = os.path.join(subdir, file)
            with open(filepath, 'r') as curfile:
                # One dict per file: 'colName: Value' lines become key/value pairs
                data.append(dict(line.rstrip('\n').split(': ', 1)
                                 for line in curfile.readlines() if ': ' in line))

# Build the DataFrame once, after all files have been read
df = pd.DataFrame(data)
```

An added advantage is that you don't need to set the column names manually, because pandas uses the dict keys for that.
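For comparison, here is a minimal sketch of the same approach using pathlib's Path.rglob instead of os.walk. The helper names parse_memory_lines and load_memory_files and the optional columns argument are hypothetical, not part of the answer; columns is passed straight to pd.DataFrame, which covers the "predefined column names" case from the question (listed names missing from a file become NaN, unlisted keys are dropped). The file contents below reuse a few lines from the sample output above, and the notes.txt file is made up to show that non-matching files are skipped. Note that every value is read as a string, so numeric columns need an explicit conversion such as pd.to_numeric.

```python
import tempfile
from pathlib import Path

import pandas as pd


def parse_memory_lines(lines):
    """Turn 'colName: Value' lines into a dict; lines without ': ' are skipped."""
    record = {}
    for line in lines:
        line = line.rstrip("\n")
        if ": " in line:
            # maxsplit=1 keeps colons inside values such as '24-11-2016 08:10'
            key, value = line.split(": ", 1)
            record[key] = value
    return record


def load_memory_files(rootdir, columns=None):
    """Collect every memory_*.txt file below rootdir into one DataFrame.

    columns (optional, hypothetical parameter): a predefined list of column
    names; keys not listed are dropped, listed names a file lacks become NaN.
    """
    records = []
    # rglob searches rootdir and all of its subdirectories, like os.walk
    for path in sorted(Path(rootdir).rglob("memory_*.txt")):
        if path.is_file():
            with path.open("r") as f:
                records.append(parse_memory_lines(f))
    return pd.DataFrame(records, columns=columns)


# Usage on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp, "unit_a")
    sub.mkdir()
    (sub / "memory_1.txt").write_text(
        "Serialnr: 1412b23990\nDate/time: 24-11-2016 08:10\nmode: status\nT1: 17.42\n"
    )
    (sub / "notes.txt").write_text("Serialnr: not a memory file\n")
    df = load_memory_files(tmp)
    fixed = load_memory_files(tmp, columns=["Serialnr", "T1", "T2"])

# Every value is read as a string; convert numeric columns explicitly
df["T1"] = pd.to_numeric(df["T1"])
```

Sorting the rglob results is a design choice so that row order does not depend on the file system's listing order.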
The resulting DataFrame has one row (index 0) and one column per key. Transposed for readability:

```
Column                                      Row 0
Serialnr                                    1412b23990
Date/time                                   24-11-2016 08:10
mode                                        status
Hardware release                            ic2kkit01*P131113*
Software release                            V3.82
Rom test 1 checksum                         e0251fda
Rom test 2 checksum                         cae0351f
Line power connected (hours)                360
Line power disconnected (number of times)   2
Ch function(hours)                          54
Dhw function(hours)                         4
Burnerstarts (number of times)              604
Ignition failed (number of times)           0
Flame lost (number of times)                0
Reset (number of times)                     0
T1                                          17.42
T2                                          17.4
T3                                          16.38
T4                                          -35.0
T5                                          -35.0
T6                                          17.4
Temp_set                                    0.0
Fanspeed_set                                0.0
Fanspeed                                    0.0
Fan_pwm                                     0.0
Opentherm                                   0
Roomtherm                                   0
Tap_switch                                  0
```

There is one disadvantage: a dict can only contain unique keys, so when a file has three mode lines, each new value overwrites the previous one and only the last ('status') is kept. The other two mode values are lost. I've left it that way because those lines seem to be section headers rather than data; keeping them would require renaming the duplicate keys.
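If the repeated mode lines do matter, one way to do the renaming mentioned above is to count how often each key has appeared and add a suffix to the repeats. This is a sketch: the function name parse_with_duplicates and the _2/_3 suffix scheme are hypothetical choices, and since the original file is not shown, the first two mode values in the sample are invented; only the last one ('status') comes from the answer's output.

```python
def parse_with_duplicates(lines):
    """Parse 'colName: Value' lines, renaming repeated keys to key_2, key_3, ..."""
    record = {}
    seen = {}  # how many times each original key has appeared so far
    for line in lines:
        line = line.rstrip("\n")
        if ": " not in line:
            continue
        key, value = line.split(": ", 1)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > 1:
            # Second occurrence becomes key_2, third becomes key_3, and so on
            key = f"{key}_{seen[key]}"
        record[key] = value
    return record


# Hypothetical lines: only the last 'mode' value ('status') comes from the answer
sample = [
    "mode: first\n",
    "Serialnr: 1412b23990\n",
    "mode: second\n",
    "mode: status\n",
]

plain = dict(line.rstrip("\n").split(": ", 1) for line in sample if ": " in line)
renamed = parse_with_duplicates(sample)
print(plain)    # {'mode': 'status', 'Serialnr': '1412b23990'}
print(renamed)  # {'mode': 'first', 'Serialnr': '1412b23990', 'mode_2': 'second', 'mode_3': 'status'}
```

The function can replace the dict(...) expression inside the os.walk loop unchanged. One caveat of this scheme: if a file already contains a literal key such as mode_2, the generated name would collide with it.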


Answer