So I have a dataframe that kinda looks like this:
0   PROG # : T307710
1   REV. : NC   PART #: 121563
2   OP.# : 0200-00-0   ALL TOOL STICKOUTS SET +-0.050 FROM INDICATED VALUE
3
4   HOLDER DESCRIPTION     CUTTER #   OPERATION        TOOL DESCRIPTION
5   A.63.140.1/8z Harvey   980215     GROOVE           CREM_.125_.015R_1
6   A.63.140.1/8z Harvey   980215     ROUGHING         CREM_.125_.015R_2
7   A.63.140.1/8z Harvey   28178      SEMI-FINISH      CREM_.0781_.015R_1
8   A.63.140.1/8z Harvey   28178      FINISH           CREM_.0781_.015R_2
9   A.63.140.1/8z Harvey   74362-C3   DEBURR & BLEND   BEM_.0625
10
11
12
13  Notes: SEE SHEET 2 FOR CSYS INFORMATION
I want to search the dataframe and return exactly:
HOLDER DESCRIPTION     CUTTER #   OPERATION        TOOL DESCRIPTION
A.63.140.1/8z Harvey   980215     GROOVE           CREM_.125_.015R_1
A.63.140.1/8z Harvey   980215     ROUGHING         CREM_.125_.015R_2
A.63.140.1/8z Harvey   28178      SEMI-FINISH      CREM_.0781_.015R_1
A.63.140.1/8z Harvey   28178      FINISH           CREM_.0781_.015R_2
A.63.140.1/8z Harvey   74362-C3   DEBURR & BLEND   BEM_.0625
In other words, cutting out the top and bottom. I was able to create a function that filters out the “titles” of each column of data. Now I am stuck trying to collect the data cell by cell until I reach a “NaN” or blank in that column, and then I want to move on to the next column. This is what I tried:
for cycle in cycles:
    if na_finder.at[row_list[cycle], column_list[cycle]] == True:  # if nan is notna() = True, then do some more
        for n in total_rows:
            if insensitive_compare(df.at[row_list[n], column_list[cycle]], search_string) == True:
                if na_finder.at[n, column_list[cycle]] == True:
                    store = df.at[n, column_list[cycle]]
                    storage_list.append(store)
                else:
                    continue
            else:
                continue
    else:
        continue
I can’t get it to work because the function insensitive_compare compares two strings, and I get the error “AttributeError: ‘float’ object has no attribute ‘lower’”. That makes sense to me, but I can’t figure out the logic otherwise. How do I start at a specific point in a column, known by its index and column name, store the data until a “NaN” or blank is reached, and then move on to the next column in the DataFrame and do the same thing?
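One way to sidestep that particular error is to only call .lower() when both values are actually strings, so NaN floats simply compare as unequal instead of blowing up — a minimal sketch of a guarded version of that compare, keeping the names from the code above:

    def insensitive_compare(x1, y1):
        # guard: NaN cells come through as floats, which have no .lower()
        if isinstance(x1, str) and isinstance(y1, str):
            return x1.lower() == y1.lower()
        return False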
EDIT: This function spits out the data in the cell I want, along with the row and column it is in:
# function for case-insensitive string compare
def insensitive_compare(x1, y1):
    if x1.lower() == y1.lower():
        return True

# function to iterate through current_file for strings and compare to
# search_parameters to grab data column headers
def filter_data(columns, rows):
    # I need to fix this to stop getting that A
    for name in columns:
        for number in rows:
            cell = df.at[number, name]
            if cell == '':
                continue
            for place in length_search:
                # main compare: if str and matches search params, then do...
                if isinstance(cell, str) and insensitive_compare(search.at[place, 'Parameters'], cell) == True:
                    # this is to prevent repeats in the header list
                    if cell in header_list:
                        continue
                    else:
                        header_list.append(cell)    # store data headers
                        row_list.append(number)     # store row number where it is in the data frame
                        column_list.append(name)    # store column name where it is in the data frame
                else:
                    continue
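Building on the row_list and column_list that filter_data fills in, a rough sketch of the “collect until NaN or blank, then move to the next column” step could look like this (it assumes the DataFrame still has its default integer index, so row labels line up with row positions):

    import pandas as pd

    storage_list = []
    for start_row, col in zip(row_list, column_list):
        # start just below the header cell and walk down the column
        for row in range(start_row + 1, len(df)):
            cell = df.at[row, col]
            if pd.isna(cell) or cell == '':  # stop at the first NaN or blank cell
                break
            storage_list.append(cell)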
Answer
Just a quick idea:
If you can assume that all the valuable rows start with a non-blank first character, then you could get a new DataFrame containing just the valuable stuff:
dfe = df.loc[~df[0].str.startswith(' ')]
(Blank lines need to have a space in the first position for the above to work.)
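If some of the blank rows actually come through as NaN or empty strings rather than spaces, a slightly more defensive version of the same filter (just a sketch, not tested against your file) would be:

    col = df[0].fillna('')              # treat NaN rows as empty lines
    dfe = df.loc[col.str.match(r'\S')]  # keep rows whose first character is non-blank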
Then, if you can assume that the columns are separated by more than one space and that there are only single spaces within the data, something like:
dfe[0].str.split(r'\s\s+', expand=True)
would split into columns with the first row containing the column names.
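To finish the idea off — again just a sketch, and tools is my own name for the result — you could then push that first row into the column labels:

    tools = dfe[0].str.split(r'\s\s+', expand=True)  # split each line on runs of 2+ spaces
    tools.columns = tools.iloc[0]                    # first row holds HOLDER DESCRIPTION, CUTTER #, ...
    tools = tools.iloc[1:].reset_index(drop=True)    # drop the header row from the data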
Don’t have time to work it all out but thought I’d get you the idea in case it might help.