Unsure how to set up for loops and if statements to filter the data that I want:

So I have a dataframe that kinda looks like this:

0           PROG # :    T307710                 
1   REV. :  NC  PART #: 121563                  
2   OP.# :  0200-00-0       ALL TOOL STICKOUTS SET +-0.050 FROM INDICATED VALUE                 
3                                   
4 HOLDER DESCRIPTION  CUTTER #         OPERATION        TOOL DESCRIPTION
5 A.63.140.1/8z       Harvey 980215    GROOVE           CREM_.125_.015R_1
6 A.63.140.1/8z       Harvey 980215    ROUGHING         CREM_.125_.015R_2
7 A.63.140.1/8z       Harvey 28178     SEMI-FINISH      CREM_.0781_.015R_1
8 A.63.140.1/8z       Harvey 28178     FINISH           CREM_.0781_.015R_2   
9 A.63.140.1/8z       Harvey 74362-C3  DEBURR & BLEND   BEM_.0625
10                                  
11                                  
12                                  
13          Notes:  SEE SHEET 2 FOR CSYS INFORMATION                    

I want to search the dataframe and return exactly:

HOLDER DESCRIPTION  CUTTER #         OPERATION        TOOL DESCRIPTION
A.63.140.1/8z       Harvey 980215    GROOVE           CREM_.125_.015R_1
A.63.140.1/8z       Harvey 980215    ROUGHING         CREM_.125_.015R_2
A.63.140.1/8z       Harvey 28178     SEMI-FINISH      CREM_.0781_.015R_1
A.63.140.1/8z       Harvey 28178     FINISH           CREM_.0781_.015R_2   
A.63.140.1/8z       Harvey 74362-C3  DEBURR & BLEND   BEM_.0625

Cutting out the top and bottom. I was able to create a function that filtered out the “Titles” of each column of data. Now I am stuck trying to filter the data out cell by cell until I reach a “NaN” or blank in that column and then I want to move onto the next column. This is what I tried to get it to work:

for cycle in cycles:
    if na_finder.at[row_list[cycle], column_list[cycle]] == True:  # if notna() is True (i.e. not NaN), then do some more
        for n in total_rows:
            if insensitive_compare(df.at[row_list[n], column_list[cycle]], search_string) == True:
                if na_finder.at[n, column_list[cycle]] == True:
                    store = df.at[n, column_list[cycle]]
                    storage_list.append(store)

I can’t get it to work because the function “insensitive_compare” compares 2 strings, and I get the error: “AttributeError: ‘float’ object has no attribute ‘lower’”. That makes sense to me, since a “NaN” cell is a float and has no lower() method, but I can’t figure out the logic otherwise. How do I start at a specific point in a column that is known by its row index and column name, then store the data until a “NaN” or blank is reached? Then I would want to go to the next column in the DataFrame and do the same thing.
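A minimal sketch of that walk-down-a-column idea (the names df, collect_until_blank, start_row and col are all made up for illustration): start at a known row index in a known column, and use isinstance to guard the string handling so a NaN (which is a float) stops the loop instead of raising the AttributeError:

```python
import pandas as pd

# toy frame standing in for the real spreadsheet column
df = pd.DataFrame({'A': ['HEADER', 'x1', 'x2', float('nan'), 'ignored']})

def collect_until_blank(df, start_row, col):
    """Collect string cells down one column until a NaN or blank cell."""
    values = []
    for i in range(start_row, len(df)):
        cell = df.at[i, col]
        # NaN is a float, so isinstance(cell, str) is the guard that
        # avoids "AttributeError: 'float' object has no attribute 'lower'"
        if not isinstance(cell, str) or cell.strip() == '':
            break
        values.append(cell)
    return values

print(collect_until_blank(df, 1, 'A'))  # ['x1', 'x2']
```

Repeating the call for each (start row, column) pair found by the header search would then give one list per column.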

EDIT: This function spits out the data in the cell I want, along with the row and column it is in:

# function for case-insensitive string compare
def insensitive_compare(x1, y1):
    return x1.lower() == y1.lower()

# function to iterate through current_file for strings and compare to
# search_parameters to grab data column headers
def filter_data(columns, rows):  # I need to fix this to stop getting that A
    for name in columns:
        for number in rows:
            cell = df.at[number, name]
            if cell == '':
                continue
            for place in length_search:
                # main compare: if cell is a str and matches search params, store it
                if isinstance(cell, str) and insensitive_compare(search.at[place, 'Parameters'], cell):
                    # prevent repeats in the header list
                    if cell not in header_list:
                        header_list.append(cell)    # store data header
                        row_list.append(number)     # store row index where it was found
                        column_list.append(name)    # store column name where it was found


Answer

Just a quick idea:

If you can assume that all the valuable rows start with a non-blank first character, then you could get a new DF with just the valuable stuff:

dfe = df.loc[~df[0].str.startswith(' ', na=True)]  # na=True also drops all-NaN rows

(Blank lines need to have a space in the first position for the above to work.)

Then, if you can assume that the columns are separated by more than one space and that there are only single spaces within the data, then something like:

dfe[0].str.split(r'\s\s+', expand=True)

would split into columns with the first row containing the column names.

Don’t have time to work it all out, but thought I’d give you the idea in case it might help.
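The two steps above can be sketched end to end on a cut-down version of the sample data. Everything here is an assumption for illustration: the raw Series stands in for a one-column read of the sheet, and the regex splits on runs of two or more whitespace characters:

```python
import pandas as pd

# stand-in for the sheet read into a single string column named 0
raw = pd.Series([
    'HOLDER DESCRIPTION  CUTTER #         OPERATION        TOOL DESCRIPTION',
    'A.63.140.1/8z       Harvey 980215    GROOVE           CREM_.125_.015R_1',
    '                    ',
    '          Notes:  SEE SHEET 2 FOR CSYS INFORMATION',
])
df = raw.to_frame(0)

# step 1: keep only rows whose first character is non-blank
# (na=True makes all-NaN rows drop out as well)
dfe = df.loc[~df[0].str.startswith(' ', na=True)]

# step 2: split on runs of 2+ whitespace; the first row holds the column names
parts = dfe[0].str.split(r'\s\s+', expand=True)
parts.columns = parts.iloc[0]            # promote first row to header
parts = parts.iloc[1:].reset_index(drop=True)
# parts is now the tool table with proper column names
```

Promoting the first split row to the header is one extra step beyond the answer's sketch; after it, `parts['OPERATION']` etc. address the columns directly.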

User contributions licensed under: CC BY-SA