Skip to content
Advertisement

Python – Parse text file with no delimiter and dynamic width values

My goal is to parse a text file in Python that has no headers therefore no columns names and no delimiters. A sample of the original file looks as follows:

JavaScript

I tried to import the file into an Excel file but since it has no delimiter nor fixed-width, each value of the row is wrapped within one cell (cell A).

Now, since the file is not fixed width or delimited, how can I extract the date from within each row (which is dynamic in width as you can see – row 1 length Apr 14, 2021 12:40:00 AM CEST INFO > row 2 length Apr 14, 2021 1:40:00 AM CEST INFO)? I have no interest in manipulating other columns’ values, except for this.

I’ve tried using panda library with both read_csv() and read_fwf() and:

  1. using read_csv() it returns a data frame with two columns: column[0] – the index and column[1] – the value (date and the rest) wrapped into one column cell;
  2. using read_fwf(): cannot quite use it since the width of the date is dynamic.

Is there any way of achieving this using Python? Thanks

Advertisement

Answer

You can read the file line by line and use str.split() to parse it:

JavaScript

Prints:

JavaScript
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement