I have a dataframe df_params
. It contains parameters for the stored procedure.
PurchaseOrderID OrderDate SupplierReference DF_Name 0 1 2013-01-01 B2084020 dataframe1 1 2 2013-01-01 293092 dataframe2 2 3 2013-01-01 08803922 dataframe3 3 4 2013-01-01 BC0280982 dataframe4 4 5 2013-01-01 ML0300202 dataframe5
I simply want to access the elements of the dataframe in a loop:
for i in range(len(df_params)): print(df_params[i][0])
But it gives me an error without really explanation:
Traceback (most recent call last): File "C:mypathsite-packagespandascoreindexesbase.py", line 2897, in get_loc return self._engine.get_loc(key) File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 0 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "Test3.py", line 35, in <module> print(df_params[i][0]) File "C:UsersmypathPython37libsite-packagespandascoreframe.py", line 2995, in __getitem__ indexer = self.columns.get_loc(key) File "C:UsersmypathPython37libsite-packagespandascoreindexesbase.py", line 2899, in get_loc return self._engine.get_loc(self._maybe_cast_indexer(key)) File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 0 PS Microsoft.PowerShell.CoreFileSystem::\mypath>
The goal is to supply value to the stored procedure:
for i in range(len(df_params)): query = "EXEC Purchasing.GetPurchaseOrder " + df_params[i][0] + "," + str(df_params[i][1]) + "," + df_params[i][2]) df = pd.read_sql(query, conn)
desired outcome from print(query):
EXEC Purchasing.GetPurchaseOrder 1, '2013-01-01', 'B2084020' EXEC Purchasing.GetPurchaseOrder 2, '2013-01-01', '293092' EXEC Purchasing.GetPurchaseOrder 3, '2013-01-01', '08803922' EXEC Purchasing.GetPurchaseOrder 4, '2013-01-01', 'BC0280982' EXEC Purchasing.GetPurchaseOrder 5, '2013-01-01', 'ML0300202'
Advertisement
Answer
pandas.DataFrame
s don’t behave exactly like numpy.ndarray
s. There are basically three options:
option 1: iterrows-method: You can iterate over rows of a pandas.dataframe by
for idx, row in df_params.iterrows(): print(row['PurchaseOrderID'])
This is a particularly readable way, so personally I prefer this
option 2: indexing:
if you want to index pandas.dataframe just like an numpy.ndarray object, go with the method .iat[]
for i in range(len(df_params)): print(df_params.iat[i, 0])
This actually indexes all elements and ignores the index of the dataframe! So assuming that you have a different index (in the extreme some strings or a table with a pandas.DataTimeIndex
) this still works… just as if you would have done a df_params.to_numpy()[i, 0]
.
Note: There exists a similar function that uses the column name: .at[]
There is a second way to index a pandas.DataFrame
object and it is just a little safer with regard to columns:.loc[]
It takes an index and column name(s)
for idx in df_params.index: print(df_params.iloc[idx, 'PurchaseOrderID'])
option 3: slicing a pandas.Series object:
Every column in a pandas.DataFrame
is a pandas.Series
object, which you can index similar (you actually index the series as described above) to a numpy.ndarray
:
col = df_params['PurchaseOrderID'] for idx in col.index: print(col[idx])
So what went wrong in your case?
The double indexing is almost the same as the last example but it calls calls .loc[]
under the hood and thus expects a column name and not a number (that would have been the method .iloc[]
. And it is expecting to see the column first and then the row.
So if you really want, you could go like this:
for i in range(len(df_params)): print(df_params.iloc[0][i])
but this only works because your pandas.DataFrame
has continuous numeric indices starting from 0! So please don’t do this and use the actual indices of your table (actually use one of the options above and not the last one ;) )