Currently I’m reading all excel files and appending the “data” sheet to an array from a given directory. The issue is that some of the excel files in the directory shouldn’t actually be there and don’t contain a “data” sheet. The program then breaks whenever it encounters such a case. My workaround is printing the name of the file before it’s added and then manually deleting the file and rerunning the script. I tried creating a Sheet_check function as shown further below, although this significantly slowed down the run time and also didn’t work as intended.
Q: If there is a simple/efficient method to only add an excel file given that a specific sheet exists
read_files = glob.glob(os.path.join(file_path, "*.xlsx"))
file_array = []
for files in read_files:
#Remove_password_xlsx(files, password)
#if Sheet_check(files) == False:
#read_files.remove(files)
print(files)
data = pd.read_excel(files, 'Data')
file_array.append(data)
Doesn’t work:
def Sheet_check(filename):
bAvailable = False
book = load_workbook(filename)
sheets = book.sheetnames
for sheet in sheets:
if sheet.upper() == "DATA":
bAvailable = True
break
return bAvailable
Advertisement
Answer
Using exceptions should work:
from xlrd import XLRDError
read_files = glob.glob(os.path.join(file_path, "*.xlsx"))
file_array = []
for files in read_files:
#Remove_password_xlsx(files, password)
#if Sheet_check(files) == False:
#read_files.remove(files)
print(files)
try:
data = pd.read_excel(files, sheet_name='Data')
file_array.append(data)
except XLRDError:
print('No "Data" sheet')