How to fillna in a pandas DataFrame based on a pattern, like dragging in Excel?

Question

I have a dataframe whose missing rows should be filled by inferring the pattern from the existing rows, the way Excel does when you drag the fill handle. If a column is a continuous integer sequence, it should be filled with the next numbers in the sequence. Is there any function in Python that does this?

Output required:

I tried df.interpolate(method='krogh'). It fills 1, 2, 3, 4, 5, 6 correctly, but the other columns come out wrong.

Accepted Answer

Here is my solution for the specific use case you mention. The code for the helper functions categorical_repeat, continous_interpolate and other is provided below in the EXPLANATION > Approach section.

    config = {'year': categorical_repeat,     # shortest repeating sequence
              'cat1': continous_interpolate,  # curve fitting (linear)
              'cat2': other}                  # forward fill

    print(df.agg(config))

         year  cat1 cat2
    0  2019.0     1   c1
    1  2020.0     2   c1
    2  2019.0     3   c1
    3  2020.0     4   c2
    4  2019.0     5   c2
    5  2020.0     6   c2

EXPLANATION:

As I understand it, there is no direct way of handling all types of patterns in pandas the way Excel does. Excel uses linear interpolation for continuous sequences, but it uses other methods for other column patterns.

- Continuous integer array -> linear interpolation
- Repeated cycles -> smallest repeating sequence
- Alphabet (and similar) -> tiling a fixed sequence until the length of df
- Unrecognizable pattern -> forward fill

Here is the dummy dataset that I try my approach on:

    import pandas as pd

    data = {'A': [2019, 2020, 2019, 2020, 2019, 2020],
            'B': [1, 2, 3, 4, 5, 6],
            'C': [6, 5, 4, 3, 2, 1],
            'D': ['C', 'D', 'E', 'F', 'G', 'H'],
            'E': ['A', 'B', 'C', 'A', 'B', 'C'],
            'F': [1, 2, 3, 3, 4, 2]
           }

    df = pd.DataFrame(data)
    empty = pd.DataFrame(columns=df.columns, index=df.index)[:4]
    # DataFrame.append was removed in pandas 2.0, so use pd.concat.
    # Casting to object keeps the mixed int/NaN columns that append used to produce.
    df_new = pd.concat([df.astype(object), empty]).reset_index(drop=True)
    print(df_new)

          A    B    C    D    E    F
    0  2019    1    6    C    A    1
    1  2020    2    5    D    B    2
    2  2019    3    4    E    C    3
    3  2020    4    3    F    A    3
    4  2019    5    2    G    B    4
    5  2020    6    1    H    C    2
    6   NaN  NaN  NaN  NaN  NaN  NaN
    7   NaN  NaN  NaN  NaN  NaN  NaN
    8   NaN  NaN  NaN  NaN  NaN  NaN
    9   NaN  NaN  NaN  NaN  NaN  NaN

Approach:

Let's start with some helper functions:

    import numpy as np
    import pandas as pd
    from scipy.optimize import curve_fit

    # Curve fitting (linear)
    def f(x, m, c):
        return m*x + c     # Modify to extrapolate for exponential sequences etc.

    # Interpolate continuous linear
    def continous_interpolate(s):
        clean = s.dropna()
        # Fit on the known rows only, using plain float arrays
        x = clean.index.to_numpy(dtype=float)
        y = clean.to_numpy(dtype=float)
        popt, pcov = curve_fit(f, x, y)
        # Evaluate the fitted line on every row, including the missing ones
        output = [round(i) for i in f(s.index.to_numpy(dtype=float), *popt)]  # Remove the round() for float values
        return pd.Series(output)

    # Smallest repeating sub-sequence
    def pattern(inputv):
        '''
        https://stackoverflow.com/questions/6021274/finding-shortest-repeating-cycle-in-word
        '''
        pattern_end = 0
        for j in range(pattern_end + 1, len(inputv)):
            pattern_dex = j % (pattern_end + 1)
            if inputv[pattern_dex] != inputv[j]:
                pattern_end = j
                continue
            if j == len(inputv) - 1:
                return inputv[0:pattern_end + 1]
        return inputv

    # Categorical repeat imputation
    def categorical_repeat(s):
        clean = s.dropna()
        cycle = pattern(clean.tolist())   # plain list, so indexing is positional
        repetitions = (len(s) // len(cycle)) + 1
        output = np.tile(cycle, repetitions)[:len(s)]
        return pd.Series(output)

    # Continuous sequence of alphabets
    def alphabet(s):
        alp = 'abcdefghijklmnopqrstuvwxyz'
        # One extra copy so a start letter near 'z' still leaves len(s) characters
        alp2 = alp * ((len(s) // len(alp)) + 2)
        start = s.iloc[0]
        idx = alp2.find(start.lower())
        output = alp2[idx:idx + len(s)]
        if start.isupper():
            output = output.upper()
        return pd.Series(list(output))

    # If no pattern then just ffill
    def other(s):
        return s.ffill()

Next, let's create a configuration based on what we want to solve and apply the required methods:

    config = {'A': categorical_repeat,
              'B': continous_interpolate,
              'C': continous_interpolate,
              'D': alphabet,
              'E': categorical_repeat,
              'F': other}

    output_df = df_new.agg(config)
    print(output_df)

          A   B  C  D  E  F
    0  2019   1  6  C  A  1
    1  2020   2  5  D  B  2
    2  2019   3  4  E  C  3
    3  2020   4  3  F  A  3
    4  2019   5  2  G  B  4
    5  2020   6  1  H  C  2
    6  2019   7  0  I  A  2
    7  2020   8 -1  J  B  2
    8  2019   9 -2  K  C  2
    9  2020  10 -3  L  A  2
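If SciPy is not available, the two pattern-based fills can also be sketched with NumPy and pandas alone. This is only an illustrative variant, not the code from the answer: linear_fill and cycle_fill are hypothetical names, np.polyfit with degree 1 is swapped in for curve_fit (an assumption that the column really is a straight line), and the shortest cycle is found by brute force over candidate periods.

```python
import numpy as np
import pandas as pd


def linear_fill(s):
    """Extend a linear sequence over the NaN rows with a degree-1 least-squares fit."""
    clean = s.dropna()
    x = clean.index.to_numpy(dtype=float)
    y = clean.to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # fit y = slope * x + intercept
    fitted = slope * s.index.to_numpy(dtype=float) + intercept
    # Round because the sample columns hold integers; drop this for float data.
    return pd.Series(np.rint(fitted).astype(int), index=s.index)


def cycle_fill(s):
    """Tile the shortest repeating prefix of the known values over the whole column."""
    values = s.dropna().tolist()
    # Smallest period p such that values[i] == values[i % p] for every known row.
    period = next(
        p for p in range(1, len(values) + 1)
        if all(values[i] == values[i % p] for i in range(len(values)))
    )
    tiled = (values[:period] * (len(s) // period + 1))[: len(s)]
    return pd.Series(tiled, index=s.index)


# Same first three columns as the dummy dataset, with four empty rows appended.
df_new = pd.DataFrame({
    "A": [2019, 2020, 2019, 2020, 2019, 2020] + [np.nan] * 4,
    "B": [1, 2, 3, 4, 5, 6] + [np.nan] * 4,
    "C": [6, 5, 4, 3, 2, 1] + [np.nan] * 4,
})

config = {"A": cycle_fill, "B": linear_fill, "C": linear_fill}
# Apply each column's function explicitly instead of relying on DataFrame.agg.
filled = pd.DataFrame({col: func(df_new[col]) for col, func in config.items()})
print(filled)
```

On this data, column B continues up to 10, column C continues down to -3, and column A keeps alternating 2019 and 2020, which matches the result shown above for those columns. The explicit dict comprehension is a design choice for clarity: it makes it obvious that each function receives a whole column and returns a whole column.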
