I find it difficult to calculate the time complexity of this program because it relies on a lot of built-in methods. Could anyone please help? The task is to find the topper of each subject and the 3 overall best performers.
import re
import sys
import pandas as pd

df = pd.read_csv(sys.argv[1])  # path to the CSV file is the first command-line argument
subjects = ['Maths', 'Biology', 'Physics', 'English', 'Chemistry', 'Hindi']
for column in subjects:
    a = df[column].max()  # finding the maximum value in each column
    b = df.loc[df[column] == a, ['Name']]  # locating the row(s) holding that maximum value
    # strip the list brackets and quotes from the printed names
    print("Topper in " + column + " is " + re.sub(r"\[|\]|'", "", str(b.values.tolist())))
df['total'] = df['Maths'] + df['Biology'] + df['Physics'] + df['Chemistry'] + df['Hindi'] + df['English']
df_v1 = df.sort_values(by=['total'], ascending=False)  # highest total first
print("Best students in this class are: ")
for i in range(3):
    print(str(i + 1) + "." + df_v1.iloc[i]['Name'])
Input csv file looks something like this:
Name Physics Chemistry Biology Maths Hindi English
Steve 99 1000 100 95 97 85
John 80 90 75 70 100 100
Output:
Topper in maths is X
Topper in physics is y
Overall best students are X,y,z
Answer
Your for loop runs once per column, and each iteration scans every row (once for max() and once for the == comparison) => O(row * col) complexity.
Calculation of totals does the same => O(row * col)
The sort_values call sorts all values in one column, and usually, sort functions are O(n * log(n)) in theory, so this gives us O(row * log(row))
All in all, we have O(row * col) + O(row * col) + O(row * log(row)) = O(row * col + row * log(row))
So the answer is O(row * col) whenever col grows at least as fast as log(row)
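To make the O(row * col) counts concrete, here is a rough plain-Python stand-in for the loop and the totals step. It is only a sketch for counting work: count_cell_visits is a hypothetical helper, not how pandas is implemented, and the two rows are a subset of the sample data from the question.

```python
# Hypothetical stand-in for the pandas steps; it only counts how many cells are touched.
def count_cell_visits(table, subjects):
    """Return (loop_visits, total_visits) for a list-of-dicts table."""
    loop_visits = 0
    for column in subjects:            # one iteration per subject column
        best = None
        for row in table:              # like max(): scans every row once
            loop_visits += 1
            if best is None or row[column] > best:
                best = row[column]
        for row in table:              # like the == comparison: scans every row again
            loop_visits += 1
    total_visits = 0
    for row in table:                  # like the row-wise total: touches every subject cell
        for column in subjects:
            total_visits += 1
    return loop_visits, total_visits


table = [
    {"Name": "Steve", "Maths": 95, "Physics": 99},
    {"Name": "John", "Maths": 70, "Physics": 80},
]
print(count_cell_visits(table, ["Maths", "Physics"]))  # (8, 4)
```

With 2 rows and 2 columns the loop makes 2 * row * col = 8 visits and the totals make row * col = 4, so both steps grow proportionally to row * col.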
Edit
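If col << row (more precisely, if col grows more slowly than log(row)), the sort dominates and you actually get O(row * log(row)). So if the number of columns is a fixed constant, as with the six subjects here, it is actually O(row * log(row))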
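A quick way to see which term wins is to compare the two estimates directly. This is a back-of-the-envelope sketch only: dominant_term is a hypothetical helper, it ignores constant factors, and using a base-2 logarithm is an assumption.

```python
import math


def dominant_term(rows, cols):
    """Compare the scan cost (row * col) with the sort cost (row * log(row))."""
    scan = rows * cols
    sort = rows * math.log2(rows) if rows > 1 else 0
    return "row * col" if scan >= sort else "row * log(row)"


# Six subject columns, as in the question, with a growing number of rows.
for rows in (10, 100, 10_000, 1_000_000):
    print(rows, dominant_term(rows, cols=6))
```

Under these assumptions the scan term is larger for 10 rows, and the sort term takes over from 100 rows onward, since log2(row) exceeds 6 once row is above 64.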