How to make a Python program that calculates a result for each row of the input table?

Question

I am trying to make a Python program that will calculate a result based on a formula, given factors and an input dataframe. I have a number of cars (N_cars) on a given length of the road (l) and their average speed (v): I also know the factors needed for the formula for each category of car, and I know

Accepted Answer

Solution

Not sure what your expected results are, but I believe this does what you're asking for:

    def formula(g, *, l, N_cars, v):
        x = (1 - g.h) * (g.a * v*v + g.b*v + g.c + g.d/v) / (g.e * v*v + g.f*v + g.g)
        return N_cars * l * (x * g.pct / 100).sum()

    groups = factors_df.rename(columns={"category %": "pct"}).groupby("option")
    result = input_df.apply(lambda r: groups.apply(lambda g: formula(g, **r)), axis=1)

Output:

    In [5]: input_df.join(result)
    Out[5]:
          l  N_cars    v         opt_1         opt_2         opt_3
    0   3.5    1000  100   5411.685077   5115.048256   5500.985916
    1   5.7     500  110   4425.339734   4169.893681   4483.595803
    2  10.0     367  110   5698.595376   5369.652565   5773.612841
    3  11.1    1800   95  30820.717985  29180.106606  31384.785443
    4   2.8     960  105   4165.270216   3930.726187   4226.877893
    5   4.7     800  120   5860.057879   5506.509637   5919.496692
    6  10.4     103  111   1663.960420   1567.455541   1685.339848
    7  20.1    1950  115  60976.735053  57375.300546  61685.075902

Explanation

The first step is to group factors_df by option. Just to show what that looks like:

    In [6]: groups.apply(print)
      category  pct option         a        b  ...        d        e        f       g     h
    0        A   58  opt_1  0.000011  0.23521  ...  0.39458  0.00817  0.24566  0.0010  0.00
    3        B   22  opt_1  0.002452  0.48327  ...  0.92852  0.00871  0.29568  0.0009  0.02
    6        C   17  opt_1  0.082583  0.39493  ...  0.82714  0.00918  0.28572  0.0012  0.00
    9        D    3  opt_1  0.018327  0.32342  ...  0.92752  0.00988  0.21958  0.0016  0.00

    [4 rows x 11 columns]
       category  pct option         a        b  ...        d        e        f       g     h
    1         A   58  opt_2  0.000011  0.23521  ...  0.39458  0.00467  0.24566  0.0010  0.00
    4         B   22  opt_2  0.002899  0.49327  ...  0.92852  0.00871  0.30468  0.0009  0.02
    7         C   17  opt_2  0.072587  0.35493  ...  0.82723  0.00912  0.29572  0.0018  0.00
    10        D    3  opt_2  0.014427  0.32342  ...  0.92752  0.00968  0.22558  0.0026  0.00

    [4 rows x 11 columns]
       category  pct option         a        b  ...        d        e        f       g     h
    2         A   58  opt_3  0.000011  0.23521  ...  0.39458  0.00467  0.24566  0.0010  0.00
    5         B   22  opt_3  0.002452  0.48327  ...  0.92852  0.00771  0.29568  0.0119  0.01
    8         C   17  opt_3  0.082583  0.39493  ...  0.82714  0.00962  0.28572  0.0012  0.01
    11        D    3  opt_3  0.018327  0.32342  ...  0.94452  0.00988  0.21258  0.0016  0.00

Note that I renamed the category % to pct. This isn't necessary, but made accessing that column in the formula() function a bit cleaner (g.pct vs g["category %"]).

The next step was to implement formula() in such a way as to accept a group from factors_df as an argument:

    def formula(g, *, l, N_cars, v):
        x = (1 - g.h) * (g.a * v*v + g.b*v + g.c + g.d/v) / (g.e * v*v + g.f*v + g.g)
        return N_cars * l * (x * g.pct / 100).sum()

In the function signature, g is a group from factors_df, then the keyword-only arguments l, N_cars, and v, which will come from a single row of input_df at a time.

Each of the three groups shown above will be entered into the formula() function one at a time, in their entirety. For example, during one call to formula(), the g argument will hold all of this data:

      category  pct option         a        b  ...        d        e        f       g     h
    0        A   58  opt_1  0.000011  0.23521  ...  0.39458  0.00817  0.24566  0.0010  0.00
    3        B   22  opt_1  0.002452  0.48327  ...  0.92852  0.00871  0.29568  0.0009  0.02
    6        C   17  opt_1  0.082583  0.39493  ...  0.82714  0.00918  0.28572  0.0012  0.00
    9        D    3  opt_1  0.018327  0.32342  ...  0.92752  0.00988  0.21958  0.0016  0.00

When the formula uses something like g.e, it's accessing the entire e column, and is taking advantage of vectorization to perform the arithmetic calculations on the entire column at the same time.
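To make the vectorization concrete, here is a minimal sketch that evaluates the same expression on a single hypothetical group. The two-row frame and its values are made up for illustration (they are not taken from factors_df) and are chosen so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Hypothetical group with two categories; values are invented for this sketch.
# The index labels 0 and 3 mimic rows that came from a larger factors table.
g = pd.DataFrame(
    {
        "pct": [60, 40],
        "a": [0.0, 0.0],
        "b": [1.0, 3.0],
        "c": [0.0, 0.0],
        "d": [0.0, 0.0],
        "e": [0.0, 0.0],
        "f": [0.0, 0.0],
        "g": [1.0, 1.0],
        "h": [0.0, 0.5],
    },
    index=[0, 3],
)
v = 10  # a single scalar speed, broadcast against every row of the group

# Each g.<column> is a whole Series, so this line computes one value per category.
x = (1 - g.h) * (g.a * v*v + g.b*v + g.c + g.d/v) / (g.e * v*v + g.f*v + g.g)

# Weight each category by its percentage share and add the pieces up.
weighted = (x * g.pct / 100).sum()

print(x)
print(weighted)
```

Each entry of x keeps the row label of its category (0 and 3 here), so the labels of the original table carry through the calculation.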
When the dust settles, x will be a Series in which each item is the result of the formula for one of the four categories of car. Here's an example:

    0    0.231242
    3    0.619018
    6    7.188941
    9    1.792376

Notice the indices? Those correspond to category A, B, C, and D from factors_df, respectively.

From there, we need to call formula() on each row of input_df, using the axis argument of pd.DataFrame.apply():

    input_df.apply(lambda r: groups.apply(lambda g: formula(g, **r)), axis=1)

The lambda r is an anonymous function passed to apply and applied over axis 1, meaning that r will be a single row from input_df at a time, for example:

    In [13]: input_df.apply(print, axis=1)
    l            3.5
    N_cars    1000.0
    v          100.0
    Name: 0, dtype: float64
    ...

Now, on each row-wise apply, we're also applying the formula() function to the groups groupby object with lambda g: formula(g, **r). The **r unpacks the row from input_df as keyword arguments, which helps to ensure that the values for v, l, and N_cars aren't misused in the formula (no need to worry about the order in which they're passed into the formula() function).
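The row-wise step can be tried end to end with a small sketch. The factor and input values below are invented for illustration, not the data from the question. The nested groups.apply is replaced by an explicit loop over the groups (in a hypothetical helper named per_row), purely to make the example easy to step through.

```python
import pandas as pd


def formula(g, *, l, N_cars, v):
    # g holds all rows of one option; l, N_cars and v are scalars from one input row.
    x = (1 - g.h) * (g.a * v*v + g.b*v + g.c + g.d/v) / (g.e * v*v + g.f*v + g.g)
    return N_cars * l * (x * g.pct / 100).sum()


# Hypothetical factors: two options with two categories each (made-up values).
factors_df = pd.DataFrame(
    {
        "category": ["A", "B", "A", "B"],
        "category %": [60, 40, 60, 40],
        "option": ["opt_1", "opt_1", "opt_2", "opt_2"],
        "a": 0.0, "b": [1.0, 3.0, 2.0, 4.0], "c": 0.0, "d": 0.0,
        "e": 0.0, "f": 0.0, "g": 1.0, "h": [0.0, 0.5, 0.0, 0.5],
    }
)

# Hypothetical input rows; the column names match formula()'s keyword arguments.
input_df = pd.DataFrame({"l": [2.0, 1.0], "N_cars": [100, 10], "v": [10, 20]})

groups = factors_df.rename(columns={"category %": "pct"}).groupby("option")


def per_row(r):
    # **r unpacks the row as l=..., N_cars=..., v=..., so column order is irrelevant.
    return pd.Series({name: formula(g, **r) for name, g in groups})


result = input_df.apply(per_row, axis=1)
print(input_df.join(result))

# Positional values are rejected because l, N_cars and v are keyword-only.
try:
    formula(groups.get_group("opt_1"), 2.0, 100, 10)
except TypeError as exc:
    print("rejected:", exc)
```

Because the arguments are keyword-only, passing them by position raises a TypeError instead of silently mixing up l, N_cars, and v.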

