I am trying to create a nested list from a pandas data frame.
I have this data frame:
   id1         Name1   ids1                          Name2     ids2                  ID       col1  Goal     col2  col3
0  ab-85643    aasd1   234,34,11223,345,345_2        vaasd1    2234,354,223,35,3435  G-0001   1     NaN      3     1
1  ab-85644    aasd2   2343,355,121,34                                               G-0002   2     56.0000  4     22
2  ab-8564312  aabsd1  24 , 23 ,244 ,2421 ,567 ,789                                  G-00023  3     NaN      32    33
3  ab-8564314  aabsd2  87 ,35 ,67_1                  averabsd  387 ,355 ,667_1       G-01034  4     89.0000  43    44

Here is the same data frame as a dictionary (the output of df.to_dict()). You can convert it back to a pandas data frame with the last command:

import pandas as pd

dic = {'id1 ': {0: 'ab-85643', 1: 'ab-85644', 2: 'ab-8564312', 3: 'ab-8564314'},
       'Name1': {0: 'aasd1 ', 1: 'aasd2 ', 2: 'aabsd1', 3: 'aabsd2'},
       'ids1 ': {0: '234,34,11223,345,345_2 ', 1: '2343,355,121,34 ', 2: '24 , 23 ,244 ,2421 ,567 ,789', 3: '87 ,35 ,67_1 '},
       'Name2': {0: 'vaasd1 ', 1: ' ', 2: ' ', 3: 'averabsd'},
       'ids2': {0: '2234,354,223,35,3435', 1: ' ', 2: ' ', 3: ' 387 ,355 ,667_1 '},
       'ID': {0: 'G-0001 ', 1: 'G-0002 ', 2: 'G-00023', 3: 'G-01034'},
       'col1': {0: 1, 1: 2, 2: 3, 3: 4},
       'Goal ': {0: ' NaN ', 1: 56, 2: ' NaN ', 3: 89},
       'col2': {0: 3, 1: 4, 2: 32, 3: 43},
       'col3': {0: 1, 1: 22, 2: 33, 3: 44}}

df = pd.DataFrame.from_dict(dic)
I want to build a nested list from the 'id1', 'Name1' and 'Name2' columns of the data frame above. For the first row, for example, id1 should go in one list (['ab-85643']) and 'Name1' and 'Name2' should go in another (['aasd1','vaasd1']). These two lists should then sit together in a single list for that row ([['aasd1','vaasd1'],['ab-85643']]). Some rows don't have a 'Name1' or 'Name2' value. This needs to be done for every row, and the final list should look like this:
collection = [[ ['aasd1','vaasd1'],['ab-85643'] ],[ ['aasd2'],['ab-85644'] ],[ ['aabsd1'],['ab-8564312'] ],[ ['aabsd2','averabsd'],['ab-8564314'] ]]
Is it possible to create that using Python?
Can someone give me an idea, please?
Anything is appreciated. Thanks in advance!
Answer
It’s easier if you apply a custom function to the relevant columns:
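# Assumes the column names have no stray whitespace; if they do,
# run df.columns = df.columns.str.strip() first.
def get_collections(row):
    # Strip whitespace from the two name columns (the first two entries of the row)
    first = row.iloc[:2].str.strip()
    # Keep only the non-empty names, and put the id in a list of its own
    return [first[first != ''].tolist(), [row.iloc[2]]]

out = df[['Name1', 'Name2', 'id1']].apply(get_collections, axis=1).tolist()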
Output:
[[['aasd1', 'vaasd1'], ['ab-85643']], [['aasd2'], ['ab-85644']], [['aabsd1'], ['ab-8564312']], [['aabsd2', 'averabsd'], ['ab-8564314']]]
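If you would rather not use apply, the same strip-and-filter logic can be written as a plain loop over the rows. This is only a minimal sketch: build_collection is a hypothetical helper name, and the sample rows are copied by hand from the question's data so that the snippet runs without pandas.

```python
def build_collection(rows):
    """Build [[names, [id]], ...] from (name1, name2, id1) triples."""
    collection = []
    for name1, name2, id1 in rows:
        # Strip padding and drop names that are missing or blank.
        names = [n.strip() for n in (name1, name2)
                 if isinstance(n, str) and n.strip()]
        # The id always goes in a one-element list of its own.
        collection.append([names, [str(id1).strip()]])
    return collection


# Sample rows copied from the question (whitespace padding kept on purpose).
rows = [
    ("aasd1 ", "vaasd1 ", "ab-85643"),
    ("aasd2 ", " ", "ab-85644"),
    ("aabsd1", " ", "ab-8564312"),
    ("aabsd2", "averabsd", "ab-8564314"),
]

collection = build_collection(rows)
print(collection)
```

With a real data frame, and assuming the column names have already been stripped of whitespace, the rows could be supplied with df[['Name1', 'Name2', 'id1']].itertuples(index=False, name=None).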