I find it difficult to calculate the time complexity of this program because it relies on a lot of built-in methods. Could anyone please help? Basically, the task is to find the topper of each subject and the 3 overall best performers!
import re
import sys

import pandas as pd

df = pd.read_csv(sys.argv[1])
subjects = ['Maths', 'Biology', 'Physics', 'English', 'Chemistry', 'Hindi']

for column in subjects:
    a = df[column].max()  # finding the maximum value in each column
    b = df.loc[df[column] == a, ['Name']]  # locating the corresponding row(s) of the found maximum value
    # strip the list brackets and quotes before printing the name(s)
    print("Topper in " + column + " is " + re.sub(r"\[|\]|'", "", str(b.values.tolist())))

df['total'] = df['Maths'] + df['Biology'] + df['Physics'] + df['Chemistry'] + df['Hindi'] + df['English']
df_v1 = df.sort_values(by=['total'], ascending=False)
print("Best students in this class are: ")
for i in range(3):
    print(str(i + 1) + "." + df_v1.iloc[i]['Name'])
Input csv file looks something like this:
Name   Physics  Chemistry  Biology  Maths  Hindi  English
Steve  99       1000       100      95     97     85
John   80       90         75       70     100    100
Output:
Topper in maths is X
Topper in physics is y
Overall best students are X,y,z
Answer
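Your for loop runs once per column, and each iteration scans every row (once for max and once for the equality filter) => O(row * col) complexity.
Calculating the totals adds up every subject column for every row, so it does the same amount of work => O(row * col)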
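To make these two terms concrete, here is a minimal sketch in plain Python of roughly the work the pandas calls have to do. The function name toppers_and_totals and the sample rows are made up for illustration. pandas does this in vectorised code, but the number of cells it has to touch is the same.

```python
def toppers_and_totals(rows, subjects):
    """Mimic the per-subject max lookup and the totals with explicit loops."""
    toppers = {}
    for subject in subjects:  # runs col times
        best = max(r[subject] for r in rows)  # one pass over all rows
        # second pass over all rows to collect everyone who has the max
        toppers[subject] = [r["Name"] for r in rows if r[subject] == best]
    # one addition per (row, subject) cell => row * col additions
    totals = {r["Name"]: sum(r[s] for s in subjects) for r in rows}
    return toppers, totals


# Hypothetical sample data, not taken from the question's CSV.
rows = [
    {"Name": "Ann", "Maths": 90, "Physics": 70},
    {"Name": "Bob", "Maths": 85, "Physics": 95},
    {"Name": "Cy", "Maths": 90, "Physics": 60},
]
toppers, totals = toppers_and_totals(rows, ["Maths", "Physics"])
print(toppers)
print(totals)
```

Both the loop and the totals visit each of the row * col cells a constant number of times, which is where the O(row * col) comes from.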
The sort_values call sorts the rows by a single column (total). Sort functions are usually O(n * log(n)) in theory, so this gives us O(row * log(row))
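As an aside, if only the 3 best totals are needed, a full sort is not strictly required. Below is a minimal sketch using the standard library's heapq.nlargest, which picks the k largest of n items in roughly O(n * log(k)). The totals dict and the helper name top_k_by_total are hypothetical, and this is a swapped-in technique, not what the question's code does.

```python
import heapq


def top_k_by_total(totals, k=3):
    """Return the k names with the highest totals, best first."""
    # Iterating a dict yields its keys; totals.get supplies the sort key.
    return heapq.nlargest(k, totals, key=totals.get)


# Hypothetical totals, e.g. as computed from the marks.
totals_by_name = {"Ann": 160, "Bob": 180, "Cy": 150, "Dee": 175}
top3 = top_k_by_total(totals_by_name)
print(top3)
```

With k fixed at 3, this step is linear in the number of rows. pandas also offers DataFrame.nlargest (for example df.nlargest(3, 'total')); its internal algorithm is not assumed here.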
All in all, we have O(row * col) + O(row * col) + O(row * log(row)) = O(row * col + row * log(row))
So the answer is O(row * col), as long as col grows at least as fast as log(row)
Edit
If col is much smaller than log(row), the sort dominates and you actually get O(row * log(row)). So if the number of columns is fixed (as with the six subjects here), it is actually O(row * log(row))
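A rough way to see which term wins for given sizes is to compare row * col against row * log2(row) directly. This is only a sketch: it ignores constant factors, so it shows the asymptotic trend and does not predict real running times. The function name dominant_term is made up.

```python
import math


def dominant_term(rows, cols):
    """Say which of the two cost terms is larger, ignoring constant factors."""
    scan_cost = rows * cols  # per-subject loop and totals
    sort_cost = rows * math.log2(rows) if rows > 1 else 0  # sorting by total
    return "row * col" if scan_cost >= sort_cost else "row * log(row)"


few_columns = dominant_term(1000, 6)    # 6000 vs about 9966
many_columns = dominant_term(1000, 20)  # 20000 vs about 9966
print(few_columns, "|", many_columns)
```

With 6 subjects the sort term overtakes once log2(row) exceeds 6, that is, beyond 64 rows in this simplified count.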