
Subset dataframe based on integer in column name

I have a dataframe whose columns have names such as these:

column_names = ["c_12_2_heart",
                "c_29_4_lung",
                "c_21_21_stomach",
                "c_2_25_bladder",
                "c_40_1_kidney"]

In Python, how can I return a list of only the dataframe columns where the number after the first underscore is greater than 20?


Answer

We can use a list comprehension with basic string splitting logic:

column_names = ["c_12_2_heart", "c_29_4_lung", "c_21_21_stomach", "c_2_25_bladder", "c_40_1_kidney"]
# The number after the first underscore is element 1 of the split (for a DataFrame, iterate over df.columns)
output = [x for x in column_names if int(x.split("_")[1]) > 20]
print(output)  # ['c_29_4_lung', 'c_21_21_stomach', 'c_40_1_kidney']
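The split-based one-liner raises ValueError or IndexError if any column name does not follow the c_<number>_... pattern. Below is a minimal sketch of a more tolerant variant, assuming that the relevant integer always sits between the first and second underscore. The helper name columns_above, its threshold parameter, and the extra "patient_id" column are hypothetical, added only for illustration; names that do not match the pattern are simply skipped.

```python
import re

# Assumed pattern: any prefix without underscores, then "_<digits>_".
# Group 1 captures the integer after the first underscore.
_PATTERN = re.compile(r"^[^_]+_(\d+)_")


def columns_above(names, threshold=20):
    """Hypothetical helper: return names whose first numeric field exceeds threshold.

    Names that do not match the assumed pattern are skipped instead of
    raising ValueError/IndexError.
    """
    selected = []
    for name in names:
        match = _PATTERN.match(name)
        if match and int(match.group(1)) > threshold:
            selected.append(name)
    return selected


# "patient_id" is a hypothetical non-matching column, included to show it is skipped.
column_names = ["c_12_2_heart", "c_29_4_lung", "c_21_21_stomach",
                "c_2_25_bladder", "c_40_1_kidney", "patient_id"]
print(columns_above(column_names))  # ['c_29_4_lung', 'c_21_21_stomach', 'c_40_1_kidney']
```

Assuming a pandas DataFrame named df with string column labels, df[columns_above(df.columns)] should return just those columns, since indexing a DataFrame with a list of labels selects that subset of columns.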
User contributions licensed under: CC BY-SA