Use list items as column separators in pd.read_fwf

Question

I have text files containing tables that I want to load into a DataFrame. The column headers are the same in every file, but the column widths differ depending on the content (for example, because the columns contain names of different lengths). So far I have managed to get the index of the first character of each header, so I know where the

Accepted Answer

So what you can do is standardize the spacing with a regex:

```python
import re

text = "something    something  something  more"
# Replace every run of non-word characters (here, the spaces) with a single "|"
result = re.sub(r"\W+", "|", text)
print(result)
```

That returns `'something|something|something|more'`. Once the delimiters are standardized, you can load the data with `read_fwf` or just `read_csv` (with `sep="|"`).

EDIT

To derive the span of each header field delimited by an exclamation mark `!`, you can use the `re` library too. The logic of the pattern is that each match has to start with `!` and is followed by any number of non-`!` characters, so the next match inherently starts at the next `!`. In code it would look something like this:

```python
import re

# The header line is padded with trailing spaces to the full table width
example_txt = """!Column1        !Column2     !Column3       !Column4     !Column5                  
Company with a  $1,000,000   Yes            Jack, Hank   Xname
Company with.   $2,000       No             Rita, Hannah Xanother name"""

first_line = example_txt.split("\n")[0]

indexes = []
p = re.compile(r"![^!]*")  # a "!" followed by everything up to the next "!"
for m in p.finditer(first_line):
    indexes.append(m.span())  # (start, end) of each header field
print(indexes)
```

Which returns `[(0, 16), (16, 29), (29, 44), (44, 57), (57, 83)]`.

This should bring you close to what you need for the `read_fwf` method of pandas. Note that indexing in Python starts at 0 and that the end index is exclusive: if you slice `[0:16]`, you get the 0th to the 15th element (not including the 16th), i.e. 16 elements in total. The spans can therefore be applied directly.
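Putting the pieces together, a minimal sketch of the full `read_fwf` call might look like the following. It assumes the example table from the answer (written here without trailing spaces) and uses `io.StringIO` in place of a real file. Widening the last span to the longest line and stripping the `!` from the column names are illustrative additions, not part of the original answer. The span approach is used here rather than the `\W+` substitution because `\W` also matches the commas and `$` signs inside values such as `$1,000,000`.

```python
import io
import re

import pandas as pd

# Example table from the answer; io.StringIO stands in for a real file
text = """!Column1        !Column2     !Column3       !Column4     !Column5
Company with a  $1,000,000   Yes            Jack, Hank   Xname
Company with.   $2,000       No             Rita, Hannah Xanother name"""

header = text.splitlines()[0]

# Half-open (start, end) span of every "!"-prefixed header field
spans = [m.span() for m in re.finditer(r"![^!]*", header)]

# Assumption: widen the last column to the longest line, so values that
# run past the end of the last header are not truncated
width = max(len(line) for line in text.splitlines())
spans[-1] = (spans[-1][0], max(spans[-1][1], width))

# read_fwf expects half-open (from, to) intervals, so the spans fit as-is
df = pd.read_fwf(io.StringIO(text), colspecs=spans)

# Drop the leading "!" from the column names
df.columns = [name.lstrip("!") for name in df.columns]
print(df)
```

`read_fwf` strips the padding from each field, so a cell such as `Jack, Hank` comes back without the surrounding spaces.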
