How to make a Python program that calculates a result for each row of an input table?

I am trying to make a Python program that will calculate a result based on a formula, given factors and an input dataframe.

I have a number of cars (N_cars) on a given length of the road (l) and their average speed (v):

I also know the factors needed for the formula for each category of car, and I know the percentage of each category. Each category also has several options (the 3 options shown here are just an example; there are many more).

For each option (opt_1, opt_2, opt_3), I have to calculate the result based on this formula (factors are taken from the factors table, but v is coming from the input table):

However, I have to take into account the percentage of each category of car. For each row of input_df I have to perform the calculation three times, once for each of the three options. For example, at index 0 of input_df I have N_cars=1000, v=100 and l=3.5, and the output should be something like this:

So, as an output, for each of the rows in input_df, I should have three results, one for each of the three options.

I can do the calculation manually for each step, but I am having trouble writing a loop that does it automatically: for each input row, compute all 3 options, then move on to the next input row, and so on until the last one.

Answer

Solution

Not sure what your expected results are, but I believe this does what you’re asking for:

Output:

Explanation

The first step is to group factors_df by option. Just to show what that looks like:

Note that I renamed the category % to pct. This isn’t necessary, but made accessing that column in the formula() function a bit cleaner (g.pct vs g["category %"]).
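As a rough sketch of the grouping step, assuming factors_df is in long form with one row per category and option. The column names category and option, the extra factor column a, and all numbers are made up for illustration; only category %, pct and e are named in this answer.

```python
import pandas as pd

# Hypothetical factors table: 4 categories x 3 options, made-up values.
factors_df = pd.DataFrame({
    "category": list("ABCD") * 3,
    "category %": [40, 30, 20, 10] * 3,
    "option": ["opt_1"] * 4 + ["opt_2"] * 4 + ["opt_3"] * 4,
    "a": [1.0] * 4 + [2.0] * 4 + [3.0] * 4,   # hypothetical factor
    "e": [0.01, 0.02, 0.03, 0.04] * 3,        # made-up values
})

# Rename the awkward column so it can be read as g.pct, then group by option.
groups = factors_df.rename(columns={"category %": "pct"}).groupby("option")

# Each group is a DataFrame holding the four category rows of one option.
group_sizes = {name: len(g) for name, g in groups}
print(group_sizes)  # {'opt_1': 4, 'opt_2': 4, 'opt_3': 4}
```

Iterating over the groupby object like this is only for inspection; the grouping itself is lazy until something is applied to it.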

The next step was to implement formula() in such a way as to accept a group from factors_df as an argument:

In the function signature, g is a group from factors_df, and l, N_cars, and v are keyword-only arguments whose values come from a single row of input_df at a time.
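A minimal sketch of such a signature follows. The body is a placeholder, not the formula from the question, and the factor column a and the final sum over categories are assumptions made for illustration.

```python
import pandas as pd

def formula(g, *, l, N_cars, v):
    """Placeholder formula: everything after the bare * is keyword-only."""
    # Hypothetical arithmetic; the real formula would go here.
    x = (g.a + g.e * v) * l * N_cars * g.pct / 100
    # Assumption: the per-category results are summed into one number.
    return x.sum()

# One made-up group (four categories of a single option).
g = pd.DataFrame({
    "pct": [40, 30, 20, 10],
    "a": [1.0, 1.0, 1.0, 1.0],
    "e": [0.01, 0.02, 0.03, 0.04],
})

result = formula(g, l=3.5, N_cars=1000, v=100)

# Passing the values positionally is rejected, so they cannot be mixed up.
try:
    formula(g, 3.5, 1000, 100)
    positional_ok = True
except TypeError:
    positional_ok = False
```

Because the arguments are keyword-only, the call site has to name l, N_cars and v explicitly, which is what makes unpacking a row by name safe.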

Each of the three groups shown above will be entered into the formula() function one at a time, in their entirety. For example, during one call to formula(), the g argument will hold all of this data:

When the formula uses something like g.e, it’s accessing the entire e column, and is taking advantage of vectorization to perform the arithmetic calculations on the entire column at the same time. When the dust settles, x will be a Series where each item in the series will be the result of the formula for each of the four categories of car. Here’s an example:

Notice the indices? Those correspond to category A, B, C, and D from factors_df, respectively.
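To make the vectorization concrete, here is a small sketch with one made-up group. The arithmetic is a placeholder rather than the real formula, and the category column name is an assumption.

```python
import pandas as pd

# One hypothetical group: the four category rows of a single option.
g = pd.DataFrame({
    "category": list("ABCD"),
    "pct": [40, 30, 20, 10],
    "e": [0.01, 0.02, 0.03, 0.04],
})
v = 100

# g.e and g.pct are whole columns, so this computes all four rows at once.
x = g.e * v * g.pct / 100

# x keeps the row index of the group, so each item can be mapped back
# to its category.
per_category = dict(zip(g.category, x))
print(x)
print(per_category)
```

The result has one entry per category, in the same order and with the same index labels as the rows of the group.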

From there, we need to call formula() on each row of input_df, using the axis argument of pd.DataFrame.apply():

The lambda r is an anonymous function passed to apply. With axis=1 it is called once per row, so r is a single row of input_df at a time, for example:
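A minimal sketch of what the row looks like, assuming an input table with the columns N_cars, l and v. The first row uses the values from the question; the second row is made up.

```python
import pandas as pd

input_df = pd.DataFrame({
    "N_cars": [1000, 500],   # second row is hypothetical
    "l": [3.5, 2.0],
    "v": [100, 80],
})

# A single row is a Series whose index holds the column names.
first_row = input_df.iloc[0]
print(first_row)

# With axis=1, the lambda receives one such row per call.
cars_times_length = input_df.apply(lambda r: r["N_cars"] * r["l"], axis=1)
print(list(cars_times_length))  # [3500.0, 1000.0]
```

Since the row is indexed by column name, its values can be looked up by name rather than by position.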

Now, on each row-wise apply, we’re also applying the formula() function on the groups groupby object with lambda g: formula(g, **r). The **r unpacks the row from input_df as keyword arguments, which helps to ensure that the values for v, l, and N_cars aren’t misused in the formula (no need to worry about which order they’re passed into the formula() function).
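Putting the pieces together, the whole approach can be sketched as below. This is an illustration under assumptions, not the original code: the data, the factor column a, the column names category and option, and the formula body are all placeholders. Selecting the factor columns before the inner apply is my own addition, so that each group contains only the columns the formula reads.

```python
import pandas as pd

# Hypothetical inputs (first row matches the question, the rest is made up).
input_df = pd.DataFrame({"N_cars": [1000, 500], "l": [3.5, 2.0], "v": [100, 80]})
factors_df = pd.DataFrame({
    "category": list("ABCD") * 3,
    "category %": [40, 30, 20, 10] * 3,
    "option": ["opt_1"] * 4 + ["opt_2"] * 4 + ["opt_3"] * 4,
    "a": [1.0] * 4 + [2.0] * 4 + [3.0] * 4,
    "e": [0.01, 0.02, 0.03, 0.04] * 3,
})

def formula(g, *, l, N_cars, v):
    # Placeholder formula, weighted by each category's share and summed.
    x = (g.a + g.e * v) * l * N_cars * g.pct / 100
    return x.sum()

groups = factors_df.rename(columns={"category %": "pct"}).groupby("option")
factor_cols = ["pct", "a", "e"]  # only the columns the formula uses

# Outer apply: one call per input row (r). Inner apply: one call per option (g).
# **r unpacks the row into the keyword arguments l, N_cars and v.
results = input_df.apply(
    lambda r: groups[factor_cols].apply(lambda g: formula(g, **r)),
    axis=1,
)
print(results)
```

Here results has one row per row of input_df and one column per option, so adding more options to factors_df adds columns without any change to the code.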

User contributions licensed under: CC BY-SA