Given df,

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'col1':np.arange(6), 'col2':[*'abcdef']})

       col1 col2
    0     0    a
    1     1    b
    2     2    c
    3     3    d
    4     4    e
    5     5    f
Then when selecting a single column, using:
    df['col1'] # returns a pd.Series

    0    0
    1    1
    2    2
    3    3
    4    4
    5    5
    Name: col1, dtype: int32
Likewise when selecting a single row,
    df.loc[0] # returns a pd.Series

    col1    0
    col2    a
    Name: 0, dtype: object
How can we force a single column or single row selection to return pd.DataFrame?
Answer
Getting a single row or column as a pd.DataFrame or a pd.Series
There are times you need to pass a dataframe column or a dataframe row as a series and other times you’d like to view that row or column as a dataframe. I am going to show you a few tricks using square brackets, [], and double square brackets, [[]], along with reindex and squeeze.
    df[['col1']] # Using double square brackets returns a pd.DataFrame

       col1
    0     0
    1     1
    2     2
    3     3
    4     4
    5     5

    # Also, using pd.DataFrame.reindex we can return a single-column dataframe
    df.reindex(['col1'], axis=1)

Now, let's go the other way and turn that single-column dataframe back into a pd.Series:

    # Let's squeeze to get pd.Series from this dataframe
    df.reindex(['col1'], axis=1).squeeze()

    0    0
    1    1
    2    2
    3    3
    4    4
    5    5
    Name: col1, dtype: int32
And, likewise with rows:
    df.loc[[0]] # Using double square brackets returns a single row dataframe

       col1 col2
    0     0    a

    # Also using reindex
    df.reindex([0])
Let’s squeeze to get pd.Series from this dataframe
    df.reindex([0]).squeeze()

    col1    0
    col2    a
    Name: 0, dtype: object
The advantage of using pd.DataFrame.reindex over pd.DataFrame.loc is that it handles columns or index labels that may or may not be present in your dataframe. Using .loc, you will get a KeyError if the label is not present. With reindex, you will not get an error; the result is filled with NaN, which allows the code to continue executing.
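A minimal sketch of that difference, reusing the df from the question. The column name 'col3' is a made-up label chosen only because it does not exist in df, and the behaviour shown for .loc assumes a reasonably recent pandas (1.0 or later):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.arange(6), 'col2': [*'abcdef']})

# .loc with a list containing a label that is not in the columns raises KeyError
try:
    df.loc[:, ['col3']]  # 'col3' is a hypothetical, absent column
    loc_raised = False
except KeyError:
    loc_raised = True

# reindex returns a single-column dataframe filled with NaN instead of raising
missing = df.reindex(['col3'], axis=1)

print(loc_raised)
print(type(missing))
print(missing['col3'].isna().all())
```

Whether silently getting NaN is preferable to a KeyError depends on the pipeline; if a missing column should stop execution, .loc is probably the safer choice.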
Using pd.DataFrame.squeeze allows you to convert that single-column dataframe to a pd.Series without typing in the column header.
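One thing worth keeping in mind, sketched below with the same df: squeeze collapses every length-1 axis, so a dataframe with one row and one column comes back as a scalar rather than a pd.Series. Passing the axis argument limits the squeeze to that axis. The variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.arange(6), 'col2': [*'abcdef']})

# Single-column dataframe -> Series, keeping the column name
one_col = df[['col1']]
as_series = one_col.squeeze()

# 1x1 dataframe: squeezing both axes yields a scalar
one_cell = df.loc[[0], ['col1']]
as_scalar = one_cell.squeeze()

# Squeezing only the columns axis keeps a length-1 Series
still_series = one_cell.squeeze(axis='columns')

print(type(as_series), as_series.name)
print(type(as_scalar))
print(type(still_series), len(still_series))
```

If downstream code always expects a pd.Series, specifying the axis explicitly is likely the more predictable option.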