
Make a data profiler class that takes a list of data as a parameter of __init__

I need this class to include the following methods only using self:

  1. get_summary_stats: should calculate the mean, min and max.
  2. min_max_scale: converts the array to 0-1 values.
  3. zscore_scale: converts the array to zscores.

I’m trying it this way without luck. The class definition itself runs without complaint, but when I add the last line I get this error: __init__() takes 1 positional argument but 2 were given

class summary_stats_class():
    
    def __init__(self):
        min_my_list  = min(self)
        max_my_list  = max(self)
        mean_my_list = sum(self)/len(self)
        std_my_list = np.std(self)
    def get_summary_stats(self):
        print(min_my_list,max_my_list,mean_my_list)
        
    def min_max_scale(self):
        print((i - min_my_list) / (max_my_list-mean_my_list) for i in self)
    
    def zscore_scale(self):
        print((i - mean_my_list) / std_my_list for i in self)
summary_stats_class([1,2,3,4,5,6,7,8,9,10])

I’ve tried 1) passing list as in summary_stats_class(list), because I read that in another question, 2) adding self.list = [] after the __init__ piece, and 3) adding *args, **kwargs to each method, all without any luck. Could someone please help? Thanks in advance!


Answer

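self refers to the instance a method is called on, and anything assigned as self.NAME becomes an instance variable that every method of that object can read. In your code, __init__ accepts only self, so the list you pass when creating the object has nowhere to go, and names such as min_my_list are plain local variables that disappear when __init__ returns. The Python documentation on classes has examples showing how to use self to declare instance variables and use them in methods.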
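To see where the error message comes from, here is a minimal sketch. The class names NoDataArg and WithDataArg are hypothetical, made up only for this illustration. Python passes the new instance as the first argument automatically, so NoDataArg([1, 2, 3]) ends up calling __init__ with two arguments: the instance and the list.

```python
class NoDataArg:
    # Hypothetical class: __init__ accepts only the implicit instance.
    def __init__(self):
        pass


class WithDataArg:
    # Hypothetical class: __init__ accepts the instance plus one argument.
    def __init__(self, data):
        self.data = data  # stored on the instance, visible to all methods

    def first(self):
        return self.data[0]


try:
    NoDataArg([1, 2, 3])  # instance + list = 2 positional arguments
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

obj = WithDataArg([1, 2, 3])
print(obj.first())
```

The exact wording of the message may carry a class-name prefix depending on the Python version, but the "takes 1 positional argument but 2 were given" part is the same.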

I have updated the class to fix the error. It also divides by (max - min) in min_max_scale, where your version divided by (max - mean):

import numpy as np


class summary_stats_class():

    def __init__(self, data):
        self.data = data
        self.min_data = min(data)
        self.max_data = max(data)
        self.mean_data = sum(data) / len(data)
        self.std_data = np.std(data)

    def get_summary_stats(self):
        return self.min_data, self.max_data, self.mean_data

    def min_max_scale(self):
        return [(i - self.min_data) / (self.max_data - self.min_data) for i in
                self.data]

    def zscore_scale(self):
        return [(i - self.mean_data) / self.std_data for i in self.data]


if __name__ == "__main__":
    ssc = summary_stats_class([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    print(ssc.get_summary_stats())
    print(ssc.min_max_scale())
    print(ssc.zscore_scale())

Output:

(1, 10, 5.5)
[0.0, 0.1111111111111111, 0.2222222222222222, 0.3333333333333333, 0.4444444444444444, 0.5555555555555556, 0.6666666666666666, 0.7777777777777778, 0.8888888888888888, 1.0]
[-1.5666989036012806, -1.2185435916898848, -0.8703882797784892, -0.5222329678670935, -0.17407765595569785, 0.17407765595569785, 0.5222329678670935, 0.8703882797784892, 1.2185435916898848, 1.5666989036012806]

Explanation:

  • We passed the list as an argument when we created the ssc object of the class summary_stats_class.
  • In the constructor of summary_stats_class, we stored that list in the instance variable self.data.
  • We then calculated the min, max, mean, and standard deviation of the data and stored them in self.min_data, self.max_data, self.mean_data, and self.std_data respectively. Storing them on self makes them available to every other method of the class.
  • In the get_summary_stats, min_max_scale, and zscore_scale methods, we read the previously calculated values using self.VARIABLE_NAME, and we return the results instead of printing generator objects.
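As a variation on the same idea, the sketch below keeps the data in a NumPy array so the scaling is done in one vectorized expression. This is only one possible layout, not the only way to write it. The class name DataProfiler and the ValueError guards for constant input are my own assumptions and are not part of the class above. np.std and ndarray.std use the population standard deviation (ddof=0) by default, which matches the output shown earlier.

```python
import numpy as np


class DataProfiler:
    """Hypothetical NumPy-based variant of summary_stats_class."""

    def __init__(self, data):
        # Convert once so every method can use vectorized arithmetic.
        self.data = np.asarray(data, dtype=float)
        self.min_data = self.data.min()
        self.max_data = self.data.max()
        self.mean_data = self.data.mean()
        self.std_data = self.data.std()  # population std (ddof=0)

    def get_summary_stats(self):
        # Order follows the question: mean, min, max.
        return self.mean_data, self.min_data, self.max_data

    def min_max_scale(self):
        value_range = self.max_data - self.min_data
        if value_range == 0:
            # Assumed guard: avoid dividing by zero for constant data.
            raise ValueError("min-max scaling is undefined for constant data")
        return (self.data - self.min_data) / value_range

    def zscore_scale(self):
        if self.std_data == 0:
            # Assumed guard: z-scores are undefined when std is zero.
            raise ValueError("z-scores are undefined for constant data")
        return (self.data - self.mean_data) / self.std_data


profiler = DataProfiler([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
stats = profiler.get_summary_stats()
scaled = profiler.min_max_scale()
zscores = profiler.zscore_scale()
print(stats)
print(scaled)
print(zscores)
```

The methods return NumPy arrays rather than lists; call .tolist() on the result if a plain list is needed.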

User contributions licensed under: CC BY-SA