I’ve been trying to concatenate two sheets while preserving the original indices of both dataframes. However, upon concatenation I can’t seem to get the result to output the way I expect or want.
If I use ignore_index = True
The old indices are replaced by an index that encompasses both sheets total rows.
If I use ignore_index = False
The indices are preserved, but there is a new empty column preceding it shifting the rest of columns over by one.
How can I concatenate my sheets without this excess column?
import pandas as pd import easygui path = easygui.fileopenbox("Select a file") xls = pd.ExcelFile(path) potential_names = [sheet for sheet in xls.sheet_names if sheet.startswith('Numbers_')] df = pd.concat(pd.read_excel(xls, sheet_name = potential_names), ignore_index = True) command_list = ["AA", "dd", "DD"] warning_list = ["Dd", "dD"] ingenium_list = ["CC", "BB"] col_name_type = 'TYPE' col_name_other = 'OTHER' col_name_source = 'SOURCE' df_filtered_command = df[df[col_name_type].isin(command_list)] df_filtered_warnings = df[df[col_name_type].isin(warning_list)] df_filtered_other = df[df[col_name_other].isin(ingenium_list)] with pd.ExcelWriter(('(Processed) ' + os.path.basename(path))) as writer: #df_filtered_command.to_excel(writer, sheet_name="Command", index=True) df_final_command.to_excel(writer, sheet_name=("Command"), index=True) df_filtered_warnings.to_excel(writer, sheet_name="Warnings", index=True) df_filtered_other.to_excel(writer, sheet_name="Issues", index=True)
I suspect how the concat function is working, but I’ve not been able to figure out how to fix it.
Any help or direction would be amazing.
Edit: adding an example of my df after running.
I was mistaken before, it seems the first column is empty aside from the sheet names, but I’m still not able to find a way to prevent pandas from making that first column if I don’t remake the index.
Advertisement
Answer
Since you passed in a dictionary (not a list) of data frames from using a list in sheet_names
of pandas.read_excel
, pandas.concat
will preserve the dict keys, as specified for the first argument:
objs: a sequence or mapping of Series or DataFrame objects
If a mapping is passed, the sorted keys will be used as the keys argument, unless it is passed, in which case the values will be selected (see below).
Consequently, the sheet names (i.e., dict keys) and the index values of each data frame migrate as a MultIindex. Consider using the names
argument to name the indexes:
names: list, default None
Names for the levels in the resulting hierarchical index.
sheets_df = pd.concat( pd.read_excel(xls, sheet_name = potential_names), names = ["sheet_name", "index"] )
If you want to convert the dual index as new columns, simple run reset_index
afterwards:
sheets_df = ( pd.concat( pd.read_excel(xls, sheet_name = potential_names), names = ["sheet_name", "index"] ).reset_index() )