I have a DataFrame df that looks like this:
```
id  cited_ids  dummy_paper  d
2   [4]        NaN          NaN
4   [9,18,6]   NaN          NaN
6   []         9            0
7   [2]        NaN          NaN
9   [4]        7            0
14  [18,6]     3            0
18  [7]        1            0
```
What I would like to do is (i) replace an entry of df['cited_ids'] with 0 whenever the id it refers to has d = 0, and then (ii) set d = 1 for every row whose df['cited_ids'] list contains at least one 0 and whose previous d was not 0. The first step (i) would result in:
```
id  cited_ids  dummy_paper  d
2   [4]        NaN          NaN
4   [0,0,6]    NaN          NaN
6   []         9            0
7   [2]        NaN          NaN
9   [4]        7            0
14  [0,6]      3            0
18  [0]        1            0
```
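A minimal sketch of step (i) read literally, assuming the sample values from the first table are typed in by hand (the variable name zero_ids is only illustrative). It collects the ids whose own d equals 0 and swaps every cited id from that set for 0:

```python
import numpy as np
import pandas as pd

# Sample frame from the question, using the column names of the printed tables
df = pd.DataFrame({
    "id": [2, 4, 6, 7, 9, 14, 18],
    "cited_ids": [[4], [9, 18, 6], [], [2], [4], [18, 6], [7]],
    "dummy_paper": [np.nan, np.nan, 9, np.nan, 7, 3, 1],
    "d": [np.nan, np.nan, 0, np.nan, 0, 0, 0],
})

# Step (i): ids whose own d equals 0 (NaN never equals 0, so those ids are excluded)
zero_ids = set(df.loc[df["d"].eq(0), "id"].tolist())

# Replace every cited id that belongs to that set with 0; keep all other ids as they are
df["cited_ids"] = [[0 if i in zero_ids else i for i in ids] for ids in df["cited_ids"]]
print(df)
```

Applied literally to this data, the rule turns [9,18,6] into [0, 0, 0] (6 also has d = 0) and leaves [7] unchanged (7 has no d), which differs from the hand-written table for step (i).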
The second step (ii) would then result in:
```
id  cited_ids  dummy_paper  d
2   [4]        NaN          NaN
4   [0,0,6]    NaN          1
6   []         9            0
7   [2]        NaN          NaN
9   [4]        7            0
14  [0,6]      3            0
18  [0]        1            0
```
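A minimal sketch of step (ii), assuming the cited lists are as they come out of step (i) when that rule is applied literally (so id 4 cites [0, 0, 0]). "Previous d was not 0" is read here as d != 0, which also counts a missing d as "not 0":

```python
import numpy as np
import pandas as pd

# Frame after a literal step (i); d still holds the original values
df = pd.DataFrame({
    "id": [2, 4, 6, 7, 9, 14, 18],
    "cited_ids": [[4], [0, 0, 0], [], [2], [4], [0, 0], [7]],
    "dummy_paper": [np.nan, np.nan, 9, np.nan, 7, 3, 1],
    "d": [np.nan, np.nan, 0, np.nan, 0, 0, 0],
})

# True where the list of cited ids contains at least one 0
has_zero = df["cited_ids"].map(lambda ids: 0 in ids)

# NaN != 0 evaluates to True, so rows with a missing d count as "not 0"
not_zero = df["d"].ne(0)

# Step (ii): set d = 1 only where both conditions hold
df.loc[has_zero & not_zero, "d"] = 1
print(df)
```

Only id 4 is flagged: id 14 also cites a 0, but its d is already 0. Using ne(0) rather than isna() would also flag a row whose d held some other value; with this sample the two checks select the same rows.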
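Note also that df['cited_ids'] has dtype object (each cell holds a Python list).
df.to_dict() gives the following (the real column names docdb, cited_docdb, fronteer and distance correspond to id, cited_ids, dummy_paper and d above):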
```python
{'docdb': {0: 2, 1: 4, 2: 6, 3: 7, 4: 9, 5: 14, 6: 18},
 'cited_docdb': {0: [4],
  1: [9, 18, 6],
  2: [],
  3: [2],
  4: [4],
  5: [18, 6],
  6: [7]},
 'fronteer': {0: nan, 1: nan, 2: 9.0, 3: nan, 4: 7.0, 5: 3.0, 6: 1.0},
 'distance': {0: nan, 1: nan, 2: 0.0, 3: nan, 4: 0.0, 5: 0.0, 6: 0.0}}
```
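To reproduce the frame from the dict above, something like the following should work. It writes nan as np.nan so the literal is valid Python, and the rename step assumes the column correspondence suggested by the matching values in the tables:

```python
import numpy as np
import pandas as pd

# The dict printed by df.to_dict(), with nan spelled as np.nan
data = {
    "docdb": {0: 2, 1: 4, 2: 6, 3: 7, 4: 9, 5: 14, 6: 18},
    "cited_docdb": {0: [4], 1: [9, 18, 6], 2: [], 3: [2], 4: [4], 5: [18, 6], 6: [7]},
    "fronteer": {0: np.nan, 1: np.nan, 2: 9.0, 3: np.nan, 4: 7.0, 5: 3.0, 6: 1.0},
    "distance": {0: np.nan, 1: np.nan, 2: 0.0, 3: np.nan, 4: 0.0, 5: 0.0, 6: 0.0},
}

# Assumed mapping from the real column names to the ones used in the tables
df = pd.DataFrame(data).rename(columns={
    "docdb": "id",
    "cited_docdb": "cited_ids",
    "fronteer": "dummy_paper",
    "distance": "d",
})

print(df.dtypes)  # cited_ids shows as object: each cell is a Python list
```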
Thank you
Answer
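The exact logic is unclear, and your expected output doesn't seem to match the description (for example, 6 has d = 0 yet stays in [0,0,6], and 7 has no d yet [7] becomes [0]), but if I understand correctly: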
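```python
# map each id to its d value; ids whose d is NaN are dropped, so they stay unchanged below
s = df.set_index('id')['d'].dropna().convert_dtypes()

# replace each cited id by its d value when it has one, otherwise keep the id itself
df['cited_ids'] = [[s.get(i, i) for i in x]
                   for x in df['cited_ids']]

# rows whose cited list now contains at least one 0 (a Series, so it aligns with df)
m = df['cited_ids'].map(lambda x: 0 in x)

# set d = 1 where the list contains a 0 and d is still missing
df.loc[m & df['d'].isna(), 'd'] = 1
```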
Output:

```
   id  cited_ids  dummy_paper    d
0   2        [4]          NaN  NaN
1   4  [0, 0, 0]          NaN  1.0
2   6         []          9.0  0.0
3   7        [2]          NaN  NaN
4   9        [4]          7.0  0.0
5  14     [0, 0]          3.0  0.0
6  18        [7]          1.0  0.0
```
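Since the real frame uses the to_dict() column names, the same lines can be wrapped in a function that takes the column names as arguments. substitute_and_flag is a hypothetical helper name, not a pandas API. It keeps the logic above unchanged, works on a copy, and is run here on the sample data typed in with the to_dict() column names:

```python
import numpy as np
import pandas as pd


def substitute_and_flag(df, id_col="id", cited_col="cited_ids", d_col="d"):
    """Hypothetical helper: the logic above with configurable column names."""
    out = df.copy()
    # d value per id; ids with a missing d are dropped and never substituted
    s = out.set_index(id_col)[d_col].dropna().convert_dtypes()
    # replace each cited id by its d value when it has one, otherwise keep the id
    out[cited_col] = [[s.get(i, i) for i in ids] for ids in out[cited_col]]
    # flag rows whose cited list now contains a 0 and whose d is still missing
    m = out[cited_col].map(lambda ids: 0 in ids)
    out.loc[m & out[d_col].isna(), d_col] = 1
    return out


df = pd.DataFrame({
    "docdb": [2, 4, 6, 7, 9, 14, 18],
    "cited_docdb": [[4], [9, 18, 6], [], [2], [4], [18, 6], [7]],
    "fronteer": [np.nan, np.nan, 9.0, np.nan, 7.0, 3.0, 1.0],
    "distance": [np.nan, np.nan, 0.0, np.nan, 0.0, 0.0, 0.0],
})
res = substitute_and_flag(df, "docdb", "cited_docdb", "distance")
print(res)
```

Working on a copy leaves the caller's frame unchanged; dropping the copy() call would modify it in place instead.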