I’m trying to write a routine in Python that collects every diagonal group of values in df. Here’s a reproducible example of what I’m trying to achieve:
import numpy as np
import pandas as pd

data = {'column1': [1, 1, 2, 3, 6, 4, 5, 6],
        'column2': [np.nan, 4, 3, 5, 6, 2, 3, 4],
        'column3': [np.nan, np.nan, 3, 2, 5, np.nan, 8, 4],
        'column4': [np.nan, np.nan, np.nan, 3, 6, np.nan, np.nan, 6],
        'column5': [np.nan, np.nan, np.nan, np.nan, 8, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data, columns=['column1', 'column2', 'column3', 'column4', 'column5'])

my_list = []
# dict_list = {'list' + str(i):[] for i in list(range(len(df)))}
for i in range(len(df)):
    for j in range(len(df.columns)):
        if (i + j) < df.iloc[6, 2]:
            my_list.append(df.iloc[i + j, j])
        else:
            break
This code returns one single flat list:
my_list = [1,4.0,3.0,3.0,8.0,1,3.0,2.0,6.0,nan,2,5.0,5.0,nan,nan,3,6.0,nan,nan,nan,6,2.0,8.0,6.0,4,3.0,4.0,5,4.0,6]
Based on the structure of the given df, what I’m trying to achieve is:
dict_list = [[1,4,3,3,8],[1,3,2,6],[2,5,5],[3,6],[6,2,8,6],[4,3,4],[5,4],[6]]
From what I’ve seen, I could do this by creating a list of lists (commented out in the code as dict_list; here’s the reference: Python : creating multiple lists), but I haven’t been able to get my data into the shape shown in dict_list.
I would appreciate any help or guidance.
Thank you!
Answer
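numpy.diag() will help here: its k argument selects which diagonal to extract, and negative values of k select the diagonals below the main one.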
This is the code I used:
import pandas as pd
import numpy as np

data = {'column1': [1, 1, 2, 3, 6, 4, 5, 6],
        'column2': [np.nan, 4, 3, 5, 6, 2, 3, 4],
        'column3': [np.nan, np.nan, 3, 2, 5, np.nan, 8, 4],
        'column4': [np.nan, np.nan, np.nan, 3, 6, np.nan, np.nan, 6],
        'column5': [np.nan, np.nan, np.nan, np.nan, 8, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data, columns=['column1', 'column2', 'column3', 'column4', 'column5'])

nump = df.to_numpy()
my_list = []
# One diagonal per starting row: k=0 is the main diagonal, k=-i starts at row i
for i in range(len(nump)):
    my_list.append(np.diag(nump, k=-i))
OUTPUT:
[array([1., 4., 3., 3., 8.]), array([ 1., 3., 2., 6., nan]), array([ 2., 5., 5., nan, nan]), array([ 3., 6., nan, nan, nan]), array([6., 2., 8., 6.]), array([4., 3., 4.]), array([5., 4.]), array([6.])]
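To make the role of the negative offsets clearer, here is a minimal sketch on a small array. The array `a` and the variable names are made up for illustration and are not part of the original data.

```python
import numpy as np

# Hypothetical 4x3 array, used only to illustrate the offset argument k
a = np.arange(1, 13).reshape(4, 3)
# [[ 1  2  3]
#  [ 4  5  6]
#  [ 7  8  9]
#  [10 11 12]]

main = np.diag(a)           # k=0: diagonal starting at row 0
below1 = np.diag(a, k=-1)   # diagonal starting at row 1
below2 = np.diag(a, k=-2)   # diagonal starting at row 2

print(main.tolist())    # [1, 5, 9]
print(below1.tolist())  # [4, 8, 12]
print(below2.tolist())  # [7, 11]
```

Each diagonal stops when it reaches the last row or the last column, which is why the later diagonals in the output above are shorter.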
To clean nan values:
cleanedList = []
for i in range(len(my_list)):
    # Keep only the entries whose string form is not 'nan'
    l = [x for x in my_list[i] if str(x) != 'nan']
    print(l)
    cleanedList.append(l)
OUTPUT:
[[1.0, 4.0, 3.0, 3.0, 8.0], [1.0, 3.0, 2.0, 6.0], [2.0, 5.0, 5.0], [3.0, 6.0], [6.0, 2.0, 8.0, 6.0], [4.0, 3.0, 4.0], [5.0, 4.0], [6.0]]
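The two steps can also be combined into one helper. The following is only a sketch: the function name `diagonals_without_nan` is hypothetical, and it swaps the string comparison for `np.isnan`, which assumes every column can be converted to float.

```python
import numpy as np
import pandas as pd

def diagonals_without_nan(frame):
    """Return one list per starting row, following the diagonal down and right."""
    # Assumption: all values are numeric, so the frame converts cleanly to float
    values = frame.to_numpy(dtype=float)
    result = []
    for offset in range(values.shape[0]):
        diagonal = np.diag(values, k=-offset)
        # Boolean mask drops the NaN entries; tolist() gives plain Python floats
        result.append(diagonal[~np.isnan(diagonal)].tolist())
    return result

data = {'column1': [1, 1, 2, 3, 6, 4, 5, 6],
        'column2': [np.nan, 4, 3, 5, 6, 2, 3, 4],
        'column3': [np.nan, np.nan, 3, 2, 5, np.nan, 8, 4],
        'column4': [np.nan, np.nan, np.nan, 3, 6, np.nan, np.nan, 6],
        'column5': [np.nan, np.nan, np.nan, np.nan, 8, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)

dict_list = diagonals_without_nan(df)
print(dict_list)
```

Using `np.isnan` avoids relying on how a value prints, but it would raise an error on non-numeric columns, so the string check may still be the safer choice for mixed data.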
For more information about how to use numpy.diag(), see the NumPy documentation for numpy.diag.