Skip to content
Advertisement

Map pandas dataframe columns to an array

I have a dataframe like this: Dataframe

{'choice_0': {0: 52, 1: 26, 2: 100, 3: 2, 4: 53},
 'choice_1': {0: 38, 1: 4, 2: 54, 3: 95, 4: 1},
 'choice_2': {0: 12, 1: 82, 2: 25, 3: 1, 4: 47},
 'choice_3': {0: 82, 1: 5, 2: 12, 3: 96, 4: 93},
 'choice_4': {0: 33, 1: 11, 2: 27, 3: 32, 4: 26},
 'choice_5': {0: 75, 1: 47, 2: 82, 3: 6, 4: 3},
 'choice_6': {0: 64, 1: 38, 2: 10, 3: 40, 4: 46},
 'choice_7': {0: 76, 1: 6, 2: 89, 3: 31, 4: 16},
 'choice_8': {0: 10, 1: 66, 2: 80, 3: 9, 4: 42},
 'choice_9': {0: 28, 1: 61, 2: 33, 3: 59, 4: 39},
 'n_people': {0: 4, 1: 4, 2: 3, 3: 2, 4: 4}}

And an array like:

input_arr = (
    np.array([[   0,   52],
              [   1,   82],
              [   2,  27],
              [   3,   2],
              [   4,   53]]))

The first element will be for family_id=0 and column “choice_0” = 52

The second element will be for family_id=1 and column “choice_2” = 82

The third element will be for family_id=2 and column “choice_4” = 27

And I will like to get:

    array([[   0,   0],
    [   1,   2],
    [   2,  3],
    [   3,   0],
    [   4,   0])

The logic will be:

  • For family_id =0 The initial array has a a 52. And I will like to receive a 0 because it belongs to the “choice_0” column.
  • For family_id = 1 The initial array has a 82. And I will like to receive a 2 because it belongs to the “choice_2″column.

Note: Number within a row(family_id) can´t be repeated.

I don´t know even what is the title, feel free to change it.

Advertisement

Answer

Suppose you have:

df = pd.DataFrame.from_dict({'family_id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
 'choice_0': {0: 52, 1: 26, 2: 100, 3: 2, 4: 53},
 'choice_1': {0: 38, 1: 4, 2: 54, 3: 95, 4: 1},
 'choice_2': {0: 12, 1: 82, 2: 25, 3: 1, 4: 47},
 'choice_3': {0: 82, 1: 5, 2: 12, 3: 96, 4: 93},
 'choice_4': {0: 33, 1: 11, 2: 27, 3: 32, 4: 26},
 'choice_5': {0: 75, 1: 47, 2: 82, 3: 6, 4: 3}})


input_arr = (
    np.array([[   0,   52],
              [   1,   82],
              [   2,  27]])
)

You can get you desired output using a list comprehension.

output_arrary=np.array([[e[0], df.iloc[i].tolist().index(e[1])-1] for i, e in enumerate(input_arr)])

print(output_arrary)

[[0 0]
 [1 2]
 [2 4]]
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement