Python – Parse text file with no delimiter and dynamic width values

Question

My goal is to parse a text file in Python that has no headers (and therefore no column names) and no delimiters. A sample of the original file looks as follows:

I tried to import the file into Excel, but since it is neither delimited nor fixed-width, each row ends up entirely in a single cell (column A).

Accepted Answer

You can read the file line by line and use str.split() to parse it:

    import dateutil.parser
    import dateutil.tz
    import pandas as pd

    data = []
    with open("your_file.txt", "r") as f_in:
        for line in map(str.strip, f_in):
            # skip empty lines
            if not line:
                continue
            # the first six whitespace-separated tokens form the date;
            # everything after them stays together in the last element
            line = line.split(maxsplit=6)
            date = " ".join(line[:6])
            # the first word of the remainder is the status, the rest is the message
            status = line[-1].split(maxsplit=1)[0]
            rest = line[-1].split(maxsplit=1)[-1]
            data.append({"date": date, "status": status, "rest": rest})

    # map the timezone abbreviations to a real timezone
    tzmapping = {
        "CET": dateutil.tz.gettz("Europe/Berlin"),
        "CEST": dateutil.tz.gettz("Europe/Berlin"),
    }

    df = pd.DataFrame(data)
    df["date"] = df["date"].apply(dateutil.parser.parse, tzinfos=tzmapping)
    print(df)

Prints:

                           date status                                               rest
    0 2021-04-14 00:40:00+02:00   INFO  [purge.PurgeManager run] PURGE: Atom purge all...
    1 2021-04-14 01:40:00+02:00   INFO  [purge.PurgeManager run] PURGE: Atom purge all...
    2 2021-04-14 02:40:00+02:00   INFO  [purge.PurgeManager run] PURGE: Atom purge all...
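To see what the two split() calls do to a single line, here is a minimal standard-library sketch. The sample line is hypothetical: the question's sample is not reproduced here, so it is only assumed that each line starts with six whitespace-separated date tokens, followed by a status word and a free-text message. The helper name parse_line and its date_tokens parameter are likewise made up for illustration.

```python
def parse_line(line, date_tokens=6):
    """Split one log line into date, status and the remaining message.

    Assumes the date occupies the first `date_tokens` whitespace-separated
    tokens (an assumption about the file layout, not a known fact).
    """
    # Split off the date tokens; the remainder stays in the last element.
    parts = line.split(maxsplit=date_tokens)
    if len(parts) <= date_tokens:
        raise ValueError(f"line has too few fields: {line!r}")

    date = " ".join(parts[:date_tokens])

    # The first word of the remainder is the status; whatever follows is the message.
    tail = parts[-1].split(maxsplit=1)
    status = tail[0]
    rest = tail[1] if len(tail) > 1 else ""

    return {"date": date, "status": status, "rest": rest}


# Hypothetical sample line, invented to match the assumed layout.
sample = "Apr 14, 2021 12:40:00 AM CEST INFO [purge.PurgeManager run] PURGE: Atom purge all"
record = parse_line(sample)
print(record)
```

Unlike the loop in the answer, this sketch returns an empty message when nothing follows the status word, and raises a ValueError for lines with too few fields instead of silently producing a wrong split.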

Answer