I want to select only the columns whose names start with 'Q1' or 'Q3'. I know that this is possible by doing:
new_df = df[['Q1_1', 'Q1_2', 'Q1_3', 'Q3_1', 'Q3_2', 'Q3_3']]
But since my real df is too large (more than 70 variables), I am looking for a way to get new_df using only the desired prefixes of the column names.
My example dataframe is:
import numpy as np
import pandas as pd

# Example frame: four question groups (Q1..Q4), three columns each
df = pd.DataFrame({
    'Q1_1': [np.random.randint(1, 100) for i in range(10)],
    'Q1_2': np.random.random(10),
    'Q1_3': np.random.randint(2, size=10),
    'Q2_1': [np.random.randint(1, 100) for i in range(10)],
    'Q2_2': np.random.random(10),
    'Q2_3': np.random.randint(2, size=10),
    'Q3_1': [np.random.randint(1, 100) for i in range(10)],
    'Q3_2': np.random.random(10),
    'Q3_3': np.random.randint(2, size=10),
    'Q4_1': [np.random.randint(1, 100) for i in range(10)],
    'Q4_2': np.random.random(10),
    'Q4_3': np.random.randint(2, size=10)
})
df has the following display:
   Q1_1      Q1_2  Q1_3  Q2_1      Q2_2  Q2_3  Q3_1      Q3_2  Q3_3  Q4_1      Q4_2  Q4_3
0    92  0.551722     1    36  0.063269     1    95  0.541573     1    91  0.521076     1
1    89  0.951076     1    82  0.853572     1    49  0.782290     1    98  0.232572     0
2    88  0.909953     1    19  0.544450     1    66  0.021061     1    51  0.951225     0
3    66  0.904642     1    17  0.727190     1    85  0.697792     0    35  0.412844     1
4    78  0.802783     1    23  0.634575     1    77  0.759861     0    55  0.460012     0
5    41  0.943271     1    63  0.460578     1    95  0.004986     1    89  0.970059     0
6    54  0.600558     0    18  0.031487     0    84  0.716314     0    84  0.636364     1
7     2  0.458006     0    95  0.029421     0    10  0.927356     1    27  0.031572     1
8    38  0.029658     1    30  0.125706     1    94  0.096702     1    32  0.241613     1
9    52  0.584300     1    85  0.026642     0    78  0.358952     0    70  0.696008     0
I want a simpler way to get the following sub-df:
   Q1_1      Q1_2  Q1_3  Q3_1      Q3_2  Q3_3
0    92  0.551722     1    95  0.541573     1
1    89  0.951076     1    49  0.782290     1
2    88  0.909953     1    66  0.021061     1
3    66  0.904642     1    85  0.697792     0
4    78  0.802783     1    77  0.759861     0
5    41  0.943271     1    95  0.004986     1
6    54  0.600558     0    84  0.716314     0
7     2  0.458006     0    10  0.927356     1
8    38  0.029658     1    94  0.096702     1
9    52  0.584300     1    78  0.358952     0
Please let me know in the comments if you need more detail.
Any help will be highly appreciated.
Answer
You can use pd.DataFrame.filter for this:
df.filter(regex=r'Q1_\d|Q3_\d')

   Q1_1      Q1_2  Q1_3  Q3_1      Q3_2  Q3_3
0     5  0.631041     0    46  0.768563     0
1    32  0.594106     1    46  0.982396     1
2    78  0.703139     1    38  0.252107     0
3    98  0.353230     0    35  0.324079     0
4    77  0.913203     1    11  0.456287     0
5    62  0.565350     1    77  0.387365     0
6    38  0.975652     1    59  0.276421     1
7    97  0.505808     1    84  0.035756     0
8    15  0.525452     0    57  0.675310     1
9    94  0.545259     0    25  0.628030     0
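Since the question asks specifically about names that start with a prefix, it may be worth anchoring the match. As far as I know, filter applies the pattern with re.search, so an unanchored pattern can also match in the middle of a column name. Below is a minimal sketch of two prefix-only variants. The small stand-in frame and the names prefixes, by_prefix and by_regex are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small stand-in for the example frame; the values are arbitrary.
df = pd.DataFrame(
    np.arange(24).reshape(2, 12),
    columns=[f"Q{q}_{i}" for q in range(1, 5) for i in range(1, 4)],
)

# Hypothetical tuple of wanted prefixes; extend it as needed.
prefixes = ("Q1_", "Q3_")

# Option 1: Python's str.startswith accepts a tuple of prefixes.
cols = [c for c in df.columns if c.startswith(prefixes)]
by_prefix = df[cols]

# Option 2: filter with a regex anchored at the start of the name.
by_regex = df.filter(regex=r"^(Q1|Q3)_")

print(list(by_prefix.columns))
```

Both variants keep the original column order. The list comprehension avoids regex escaping entirely, which can help if the real prefixes contain characters such as '.' or '('.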