
What is the time complexity of this Python program in Big O notation?

I find it difficult to calculate the time complexity of this program because it involves a lot of built-in methods. Could anyone please help? The task is to find the topper in each subject and the three best overall performers.

import re
import sys

import pandas as pd

df=pd.read_csv(sys.argv[1])  #path to the csv file is the first command-line argument
subjects=['Maths','Biology','Physics','English','Chemistry','Hindi']
total=[]
for column in subjects:
  a=df[column].max()  #finding the maximum value in each column
  b=df.loc[(df[column]==a),['Name']] #locating the corresponding row of the found maximum value
  #stripping the list brackets and quotes before printing the name(s)
  print("Topper in "+column+" is "+re.sub(r"\[|\]|'","",str(b.values.tolist())))


df['total']=df['Maths']+df['Biology']+df['Physics']+df['Chemistry']+df['Hindi']+df['English']
df_v1=df.sort_values(by=['total'],ascending=False)
print("Best students in this class are: ")
for i in range(3):
 print(str(i+1)+"."+df_v1.iloc[i]['Name'])

Input csv file looks something like this:

Name   Physics  Chemistry  Biology  Maths  Hindi  English
Steve  99       1000       100      95     97     85
John   80       90         75       70     100    100

Output:

  Topper in maths is X
  Topper in physics is y
Overall best students are X,y,z


Answer

  1. Your for loop runs once per subject column, and in each iteration max() and the == comparison both scan every row => O(row * col) complexity.

  2. Calculation of totals does the same => O(row * col)

  3. sort_values sorts the rows by a single column (total). Comparison-based sorts take O(n * log(n)), so this gives us O(row * log(row))

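All in all, we have O(row * col) + O(row * col) + O(row * log(row)) => O(row * col + row * log(row))

So the answer is O(row * col + row * log(row)); whichever of col and log(row) is larger decides the dominant term.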
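To make the counts above concrete, here is a minimal pure-Python sketch that mimics the three steps on a list of dicts and tallies the work done. It does not call pandas; the function name `count_operations`, the generated column names, and the counter layout are hypothetical and exist only for illustration. pandas performs these scans in vectorized compiled code, so only the growth rates carry over, not the constants.

```python
import functools
import math
import random


def count_operations(rows, cols, seed=0):
    """Mimic the three steps on plain Python data and count the work done.

    Hypothetical helper for illustration; it does not call pandas.
    """
    rng = random.Random(seed)
    subjects = [f"s{j}" for j in range(cols)]
    table = [{s: rng.randint(0, 100) for s in subjects} for _ in range(rows)]
    counts = {"topper_scan": 0, "totals": 0, "sort_comparisons": 0}

    # Step 1: for each column, scan every row for the maximum.
    for s in subjects:
        best = None
        for record in table:
            counts["topper_scan"] += 1
            if best is None or record[s] > best:
                best = record[s]

    # Step 2: add up every column for every row.
    totals = []
    for record in table:
        total = 0
        for s in subjects:
            counts["totals"] += 1
            total += record[s]
        totals.append(total)

    # Step 3: sort the totals, counting each comparison the sort makes.
    def compare(a, b):
        counts["sort_comparisons"] += 1
        return (a > b) - (a < b)

    totals.sort(key=functools.cmp_to_key(compare), reverse=True)
    return counts


if __name__ == "__main__":
    for rows in (1_000, 10_000):
        c = count_operations(rows, cols=6)
        print(rows, c, "row*log2(row) ~", round(rows * math.log2(rows)))
```

The first two counters come out to exactly row * col, while the comparison count stays within a small constant factor of row * log2(row).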

Edit

If col is small compared with log(row), the sort term dominates and you get O(row * log(row)). So if the number of columns is a fixed constant, as with the six subjects here, it is actually O(row * log(row))
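As a rough worked check of this edit, the sketch below compares the two terms, row * col and row * log(row), for the six subject columns in the question. The helper name `dominant_term` is hypothetical, and using the base-2 logarithm is an assumption (a different base only changes a constant factor).

```python
import math


def dominant_term(rows, cols):
    """Return which term of O(row*col + row*log(row)) is larger for these sizes.

    Hypothetical helper; the base-2 logarithm is an assumption.
    """
    scan_cost = rows * cols              # per-column scans and the totals
    sort_cost = rows * math.log2(rows)   # comparison sort of the totals
    return "row*log(row)" if sort_cost > scan_cost else "row*col"


for rows in (10, 64, 65, 1_000_000):
    print(rows, dominant_term(rows, cols=6))
```

With six columns, the sort term overtakes the scan term once log2(row) exceeds 6, that is, beyond 64 rows.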
