How would I find the longest string per row in a data frame and print the row number if it exceeds a certain length?

I want to write a program that searches through a data frame and, if any item in it is more than 50 characters long, prints the row number and asks whether I want to continue through the data frame.

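exclude = []    # columns to leave out of the length check
threshold = 50

# Boolean frame: True where a string has at least `threshold` characters
mask = (df.drop(columns=exclude, errors='ignore')
          .apply(lambda s: s.str.len().ge(threshold))
        )

# Keep only the rows in which no value reaches the threshold
out = df.loc[~mask.any(axis=1)]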

I tried using this, but I don't want to drop the rows, just print the row numbers where the strings exceed 50 characters.

Input:

0 "Robert","20221019161921","London"
1 "Edward","20221019161921","London"
2 "Johnny","20221019161921","London"
3 "Insane string which is way too longggggggggggg","20221019161921","London"

Output:

Row 3 is above the 50-character limit.

I would also like the program to print the specific value or string which is too long.

Answer

You can use:

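exclude = []    # columns to leave out of the length check
threshold = 30

# Boolean frame: True where a string has at least `threshold` characters
# (use .gt(threshold) instead to flag only strictly longer strings)
mask = (df.drop(columns=exclude, errors='ignore')
          .apply(lambda s: s.str.len().ge(threshold))
        )

# True for each row that contains at least one flagged value
s = mask.any(axis=1)

for idx in s[s].index:
    print(f'row {idx} is above the {threshold}-character limit.')
    # Flagged columns of this row; excluded columns are filled in as False
    s2 = mask.loc[idx]
    for string in df.loc[idx, s2.reindex(df.columns, fill_value=False)]:
        print(string)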

Output:

row 3 is above the 30-character limit.
"Insane string which is way too longggggggggggg","20221019161921","London"

Intermediate s:

0    False
1    False
2    False
3     True
dtype: bool
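
The question also asks for a prompt before moving on to the next offending row, which the loop above does not cover. One way this could look is sketched below. It is only an illustration: the function names `find_long_values` and `review_long_rows`, the `ask` parameter, and the column names in the sample frame are all made up for this example, and the frame is assumed to have unique row labels. Unlike the code above, it converts every value to text with `astype(str)` so that non-string columns do not raise, and it uses `.gt` so that only strictly longer strings are flagged.

```python
import pandas as pd


def find_long_values(df, threshold, exclude=()):
    """Return (row_label, column, value) for each value longer than threshold."""
    checked = df.drop(columns=list(exclude), errors="ignore")
    # Assumption: measuring the text form of every value is acceptable.
    # Missing values would be measured as the text "nan".
    lengths = checked.astype(str).apply(lambda s: s.str.len())
    mask = lengths.gt(threshold)  # strictly longer than the threshold
    hits = []
    for idx in mask.index[mask.any(axis=1)]:
        # Columns of this row whose value is too long
        for col in mask.columns[mask.loc[idx].to_numpy()]:
            hits.append((idx, col, checked.at[idx, col]))
    return hits


def review_long_rows(df, threshold, exclude=(), ask=input):
    """Print each offending row, then ask whether to keep going.

    `ask` defaults to the built-in input(); pass another callable to
    drive the loop without a keyboard. Returns the row labels shown.
    """
    rows = {}
    for idx, col, value in find_long_values(df, threshold, exclude):
        rows.setdefault(idx, []).append((col, value))

    reviewed = []
    for idx, cells in rows.items():
        print(f"Row {idx} is above the {threshold}-character limit.")
        for col, value in cells:
            print(f"  {col}: {value}")
        reviewed.append(idx)
        if ask("Continue? [y/n] ").strip().lower() != "y":
            break
    return reviewed


# Hypothetical sample data modelled on the question's input
sample = pd.DataFrame({
    "name": ["Robert", "Edward", "Johnny",
             "Insane string which is way too longggggggggggg"],
    "stamp": ["20221019161921"] * 4,
    "city": ["London"] * 4,
})

hits = find_long_values(sample, threshold=30)
```

Calling `review_long_rows(sample, 30)` in an interactive session prints each flagged row with its offending values and waits for `y` before showing the next one.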
User contributions licensed under: CC BY-SA