Currently I’m reading all excel files and appending the “data” sheet to an array from a given directory. The issue is that some of the excel files in the directory shouldn’t actually be there and don’t contain a “data” sheet. The program then breaks whenever it encounters such a case. My workaround is printing the name of the file before it’s added and then manually deleting the file and rerunning the script. I tried creating a Sheet_check function as shown further below, although this significantly slowed down the run time and also didn’t work as intended.
Q: If there is a simple/efficient method to only add an excel file given that a specific sheet exists
read_files = glob.glob(os.path.join(file_path, "*.xlsx")) file_array = [] for files in read_files: #Remove_password_xlsx(files, password) #if Sheet_check(files) == False: #read_files.remove(files) print(files) data = pd.read_excel(files, 'Data') file_array.append(data)
Doesn’t work:
def Sheet_check(filename): bAvailable = False book = load_workbook(filename) sheets = book.sheetnames for sheet in sheets: if sheet.upper() == "DATA": bAvailable = True break return bAvailable
Advertisement
Answer
Using exceptions should work:
from xlrd import XLRDError read_files = glob.glob(os.path.join(file_path, "*.xlsx")) file_array = [] for files in read_files: #Remove_password_xlsx(files, password) #if Sheet_check(files) == False: #read_files.remove(files) print(files) try: data = pd.read_excel(files, sheet_name='Data') file_array.append(data) except XLRDError: print('No "Data" sheet')