Featuretools: rank results from a custom primitive are wrong

Question

Using Featuretools, I want to convert the values of a certain feature to ranks. That is my exact question. If anyone can help me, please answer. First, the following code uses the pandas rank function and displays the result. I believe this result is correct. However, when I create a custom primitive and run the following code, the

Accepted Answer

NEW ANSWER:

Based on your updated code, the problem arises because you are setting n_jobs=-1. When you do this, Featuretools distributes the calculation of the feature matrix across multiple workers behind the scenes. To do so, it splits the dataframe used to calculate the transform feature values and sends one piece to each worker.

This creates a problem with the Rank primitive you have defined, because a rank is only correct when it is computed over all of the data at once. For primitives like this, set uses_full_entity=True when defining the primitive. This forces Featuretools to pass all of the data to the primitive function when it computes the feature values.

If you update the Rank primitive definition as follows, you will get the correct answer:

    # Imports assumed to match the question's code (Featuretools releases before 1.0).
    from featuretools.primitives import TransformPrimitive
    from featuretools.variable_types import Numeric

    class Rank(TransformPrimitive):
        name = 'rank'
        input_types = [Numeric]
        return_type = Numeric
        # Rank needs every row at once, so do not let Featuretools split the data.
        uses_full_entity = True

        def get_function(self):
            def rank(column):
                return column.rank(method="dense", ascending=True)
            return rank

OLD ANSWER:

In the custom primitive function you define, the parameters you pass to rank differ from the parameters you use when you call rank directly on the DataFrame.

When calling rank directly on the DataFrame, you use the following parameters:

    .rank(method="min", ascending=False, numeric_only=True)

In the custom primitive function, you use different values:

    .rank(method="dense", ascending=True)

If you update the primitive function to use the same parameters, the results you get from Featuretools should match what you get when calling rank directly on the DataFrame.
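Both effects described in this answer can be reproduced with pandas alone. The sketch below is only an illustration: the sample values are hypothetical, and splitting the column into two halves is a stand-in for whatever pieces Featuretools actually sends to its workers, not a description of its real chunking logic.

```python
import pandas as pd

# Hypothetical values, chosen only for illustration.
values = pd.Series([10.0, 30.0, 20.0, 40.0, 20.0, 50.0])

# Rank computed over the whole column at once (what uses_full_entity=True allows).
full_rank = values.rank(method="dense", ascending=True)

# Rank computed separately on two halves, mimicking data split across workers.
chunks = [values.iloc[:3], values.iloc[3:]]
chunked_rank = pd.concat(
    [chunk.rank(method="dense", ascending=True) for chunk in chunks]
)

# Same column ranked with the parameters used in the direct DataFrame call.
min_desc_rank = values.rank(method="min", ascending=False)

print(full_rank.tolist())      # [1.0, 3.0, 2.0, 4.0, 2.0, 5.0]
print(chunked_rank.tolist())   # [1.0, 3.0, 2.0, 2.0, 1.0, 3.0]
print(min_desc_rank.tolist())  # [6.0, 3.0, 4.0, 2.0, 4.0, 1.0]
```

The chunked result restarts the ranking inside each piece, which is why a rank primitive needs to see the full column. The third result shows that changing method and ascending alone is enough to produce different ranks from the same data.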
