I have a DataFrame from Pandas:
import pandas as pd

data1 = {"a": [1., 3., 5., 2.]}
df1 = pd.DataFrame(data1)
df1:
     a
0  1.0
1  3.0
2  5.0
3  2.0
Now I want to iterate over the rows. For each row, I want to divide every element of the column by that row's value: first divide all rows by the first element, then all rows by the second element, and so on.
For example: 1./1., 3./1., 5./1., 2./1.; then with the next element, 1./3., 3./3., 5./3., 2./3.; then 1./5., 3./5., 5./5., 2./5.; and finally 1./2., 3./2., 5./2., 2./2.
This Python code works well with a for loop:
def div(x):
    d = []
    for i in range(len(x)):
        for j in range(len(x)):
            di = x.iloc[j, :] / x.iloc[i, :]
            d.append(di)
    return d

div(df1)
If I have a huge dataset, say 15000 rows, I would like to implement this using applymap() or map():
result = df.applymap(div)
or
result = map(div, df1)
print(list(result))
Both give an error. If anyone could help me optimize my code, I would appreciate it.
Thanks in advance
Answer
You can use NumPy to speed this up:
>>> df1
     a
0  1.0
1  3.0
2  5.0
3  2.0
import numpy as np

a = np.hstack(df1.values)
m = np.repeat(a, len(a)).reshape((a.shape[0], -1))
df = pd.DataFrame(a / m, columns=df1.columns.repeat(len(df1)))
>>> df
          a    a         a         a
0  1.000000  3.0  5.000000  2.000000
1  0.333333  1.0  1.666667  0.666667
2  0.200000  0.6  1.000000  0.400000
3  0.500000  1.5  2.500000  1.000000

>>> a
array([1., 3., 5., 2.])

>>> m
array([[1., 1., 1., 1.],
       [3., 3., 3., 3.],
       [5., 5., 5., 5.],
       [2., 2., 2., 2.]])
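As a possible alternative (not part of the code above), the same table can be produced with NumPy broadcasting, which avoids building the repeated matrix m explicitly. This is only a minimal sketch, assuming a single numeric column named "a" as in the question; the names ratios and expected are made up for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1., 3., 5., 2.]})

# 1-D array of the column values
a = df1["a"].to_numpy()

# Broadcasting: a[None, :] has shape (1, n), a[:, None] has shape (n, 1),
# so ratios[i, j] == a[j] / a[i] (row i uses a[i] as the denominator).
ratios = a[None, :] / a[:, None]

# Same column labelling as the answer above (the label "a" repeated n times)
df = pd.DataFrame(ratios, columns=df1.columns.repeat(len(df1)))

# Sanity check against the double loop from the question
expected = np.array([[a[j] / a[i] for j in range(len(a))]
                     for i in range(len(a))])
assert np.allclose(ratios, expected)
```

Note that either way the result has n × n entries, so for 15000 rows a float64 result takes roughly 1.8 GB of memory; if that is too much, computing the rows in smaller batches may be worth considering.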
Used function and method: