Skip to content
Advertisement

Problems with DataFrame indexing with pandas

Using pandas, I have to modify a DataFrame so that it only has the indexes that are also present in a vector, which was acquired by performing operations in one of the df’s columns. Here’s the specific line of code used for that (please do not mind me picking the name ‘dataset’ instead of ‘dataframe’ or ‘df’):

dataset = dataset.iloc[list(set(dataset.index).intersection(set(vector.index)))]

it worked, and the image attached here shows the df and some of its indexes. However, when I try accessing a specific value by index in the new ‘dataset’, such as the line shown below, I get an error: single positional indexer is out-of-bounds

print(dataset.iloc[:, 21612])

note: I’ve also tried the following, to make sure it isn’t simply an issue with me not knowing how to use iloc:

print(dataset.iloc[21612, :])

and

print(dataset.iloc[21612])

Do I have to create another column to “mimic” the actual indexes? What am I doing wrong? Please mind that it’s necessary for me to make it so the indexes are not changed at all, despite the size of the DataFrame changing. E.g. if the DataFrame originally had 21000 rows and the new one only 15000, I still need to use the number 20999 as an index if it passed the intersection check shown in the first code snippet. Thanks in advance

Advertisement

Answer

Try this:

print(dataset.loc[21612, :])

After you have eliminated some of the original rows, the first (i.e., index) argument to iloc[] must not be greater than len(index) - 1.

Advertisement