Say I have the following Excel file:
A B C 0 - - - 1 Start - - 2 3 2 4 3 7 8 4 4 11 2 17
I want to read the file in a dataframe making sure that I start to read it below the row where the Start
value is.
Attention: the Start
value is not always located in the same row, so if I were to use:
import pandas as pd xls = pd.ExcelFile('C:UsersMyFolderMyFile.xlsx') df = xls.parse('Sheet1', skiprows=4, index_col=None)
this would fail as skiprows
needs to be fixed. Is there any workaround to make sure that xls.parse
finds the string value instead of the row number?
Advertisement
Answer
df = pd.read_excel('your/path/filename')
This answer helps in finding the location of ‘start’ in the df
for row in range(df.shape[0]): for col in range(df.shape[1]): if df.iat[row,col] == 'start': row_start = row break
after having row_start you can use subframe of pandas
df_required = df.loc[row_start:]
And if you don’t need the row containing ‘start’, just u increment row_start by 1
df_required = df.loc[row_start+1:]