Append all excel files in directory to an array ONLY if it contains a specific sheet

Tags: , , ,



Currently I’m reading all excel files and appending the “data” sheet to an array from a given directory. The issue is that some of the excel files in the directory shouldn’t actually be there and don’t contain a “data” sheet. The program then breaks whenever it encounters such a case. My workaround is printing the name of the file before it’s added and then manually deleting the file and rerunning the script. I tried creating a Sheet_check function as shown further below, although this significantly slowed down the run time and also didn’t work as intended.

Q: If there is a simple/efficient method to only add an excel file given that a specific sheet exists

read_files = glob.glob(os.path.join(file_path, "*.xlsx"))

file_array = []
for files in read_files:
    #Remove_password_xlsx(files, password)

    #if Sheet_check(files) == False:
        #read_files.remove(files)

    print(files)
    data = pd.read_excel(files, 'Data')
    file_array.append(data)

Doesn’t work:

def Sheet_check(filename):
    bAvailable = False
    book = load_workbook(filename)
    sheets = book.sheetnames
    for sheet in sheets:
        if sheet.upper() == "DATA":
            bAvailable = True
            break
    return bAvailable

Answer

Using exceptions should work:

from xlrd import XLRDError
read_files = glob.glob(os.path.join(file_path, "*.xlsx"))

file_array = []
for files in read_files:
    #Remove_password_xlsx(files, password)

    #if Sheet_check(files) == False:
        #read_files.remove(files)

    print(files)
    try:
        data = pd.read_excel(files, sheet_name='Data')
        file_array.append(data)
    except XLRDError:
        print('No "Data" sheet')


Source: stackoverflow