I have checklist form data coming into a spreadsheet that I am trying to determine if a specific value was not checked and provide info based off that. My first thought was to have a master list/df where all the form values are then do a left/right merge on each response to determine values that weren’t there.
Sample data and script. Expecting nan/NA
for the missing Address response in the second set, with 6 rows instead of the original 5.
JavaScript
x
13
13
1
import pandas as pd
2
df_responses = pd.DataFrame({'Timestamp' : ['2022-09-21 10:39:40.676','2022-09-22 10:28:57.753'],
3
'Email': ['j@gmail.com', 'm@gmail.com'],
4
'Responses' : [['Name', 'Address', 'City'],['Name','City']]})
5
6
df_responses_expanded = df_responses.explode('Responses')
7
8
possible_responses = ['Name','Address','City']
9
10
df_possible_responses = pd.DataFrame({'responses_all':possible_responses})
11
12
df_check_responses = pd.merge(df_possible_responses,df_responses_expanded,how='left',left_on='responses_all',right_on='Responses')
13
Advertisement
Answer
You can do this by constructing your own MultiIndex
and performing a .reindex
operation.
JavaScript
1
22
22
1
import pandas as pd
2
3
index = pd.MultiIndex.from_product(
4
[df_responses.index, possible_responses], names=[None, 'Responses']
5
)
6
7
out = (
8
df_responses_expanded
9
.set_index('Responses', append=True)
10
.reindex(index)
11
)
12
13
print(out)
14
Timestamp Email
15
Responses
16
0 Name 2022-09-21 10:39:40.676 j@gmail.com
17
Address 2022-09-21 10:39:40.676 j@gmail.com
18
City 2022-09-21 10:39:40.676 j@gmail.com
19
1 Name 2022-09-22 10:28:57.753 m@gmail.com
20
Address NaN NaN
21
City 2022-09-22 10:28:57.753 m@gmail.com
22