Index Pandas Dataframe mixing row number and column name

I'm coming from R and find the indexing rules for pandas DataFrames hard to use. I have a DataFrame from which I want the i-th row and some columns selected by name. I understand how to use either iloc or loc on its own, as shown below.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df.loc[:, ['A', 'B']]
df.iloc[0:, 0:2]

Conceptually what I want is something like:

df.loc[0:,['A', 'B']]

That is, the first row with those columns. Of course, that code fails. It seems I can use:

df.loc[0:0,['A', 'B']]

This works, but it seems strange. How does one properly index using a combination of row number and column names? In R we would do something like:

df = data.frame(matrix(rnorm(32),8,4))
colnames(df) <- c("A", "B", "C", "D") 
df[1, c('A', 'B')]

*** UPDATE *** I was mistaken: the example code above does work on this toy dataframe. On my real data, however, I see the error below. Both objects are of the same type and the code is the same, so I don't understand the error.

type(poly_set)
<class 'pandas.core.frame.DataFrame'>
poly_set.loc[:,['P1', 'P2', 'P3']]
                      P1            P2           P3
29   -2.0897226679999998  -1.237649556         None
361  -2.0789117340000001   0.144751427  1.572417454
642  -2.0681314259999999  -0.196563749  1.500834574

poly_set.loc[0,['P1', 'P2', 'P3']]
Traceback (most recent call last):
  File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 998, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1005, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

Answer

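In your examples, 0: and 0:0 are slices, which select a range of index labels rather than a single row. Note also that .loc looks rows up by index label, not by position: the index of poly_set holds labels such as 29, 361 and 642, so there is no label 0 and the lookup raises KeyError: 0. If you only want the first row, reset the index so that the labels start at 0, then select by label: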

df = df.reset_index()  # old index is kept as a column; pass drop=True to discard it
df.loc[0, ['A', 'B']]
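
If you would rather not modify the index, the same selection can be sketched without reset_index. The frame below is a hypothetical stand-in for poly_set (made-up values, only the non-default integer index is taken from the question), and the three options are alternatives I am suggesting rather than part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for poly_set: integer index labels 29, 361, 642.
df = pd.DataFrame(
    np.arange(12.0).reshape(3, 4),
    columns=["A", "B", "C", "D"],
    index=[29, 361, 642],
)

# .loc is label-based, so asking for label 0 fails on this index.
try:
    df.loc[0, ["A", "B"]]
    raised = False
except KeyError:
    raised = True

# Option 1: translate the row position into its label, then use .loc.
first_by_label = df.loc[df.index[0], ["A", "B"]]

# Option 2: translate the column names into positions, then use .iloc.
col_pos = df.columns.get_indexer(["A", "B"])
first_by_pos = df.iloc[0, col_pos]

# Option 3: take the row by position, then pick columns by name.
first_chained = df.iloc[0][["A", "B"]]
```

All three options return the same Series for the first row. Options 1 and 2 do the lookup in a single indexing call, which is usually the safer choice if you later want to assign values through the same expression.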
User contributions licensed under: CC BY-SA