I am hoping someone can help me optimize the following Python/Pandas code. My code works, but I know there must be a cleaner and faster way to perform the operation under consideration.
I am looking for an optimized strategy because my use case will involve 16 unique ADC Types, as opposed to 4 in the example below. Also, my initial Pandas Series (i.e. ADC Type column), will be several 100,000 data points in length, rather than 8 in the example below.
import numpy as np import pandas as pd from enum import Enum data_dict = {"RAW": [4000076160, 5354368, 4641792, 4289860736, 4136386944, 5440384, 4772864, 4289881216], "ADC_TYPE": [3, 7, 8, 9, 3, 7, 8, 9]} df = pd.DataFrame(data_dict) print(df)
The initial DataFrame (i.e. df) is:
RAW ADC_TYPE 0 4000076160 3 1 5354368 7 2 4641792 8 3 4289860736 9 4 4136386944 3 5 5440384 7 6 4772864 8 7 4289881216 9
I then manipulate the DataFrame above using the following code:
unique_types = df["ADC_TYPE"].unique() dict_concat = {"RAW": [], "ADC_TYPE_3": [], "ADC_TYPE_7": [], "ADC_TYPE_8": [], "ADC_TYPE_9": []} df_concat = pd.DataFrame(dict_concat) for adc_type in unique_types: df_group = df.groupby(["ADC_TYPE"]).get_group(adc_type).rename(columns={"ADC_TYPE": f"ADC_TYPE_{adc_type}"}) df_concat = pd.concat([df_concat, df_group]) print(df_concat.sort_index())
The returned DataFrame (i.e. df_concat) is displayed below. The ordering of RAW and the associated ADC Type values must remain unchanged. I need the return DataFrame to look just like the DataFrame below.
RAW ADC_TYPE_3 ADC_TYPE_7 ADC_TYPE_8 ADC_TYPE_9 0 4.000076e+09 3.0 NaN NaN NaN 1 5.354368e+06 NaN 7.0 NaN NaN 2 4.641792e+06 NaN NaN 8.0 NaN 3 4.289861e+09 NaN NaN NaN 9.0 4 4.136387e+09 3.0 NaN NaN NaN 5 5.440384e+06 NaN 7.0 NaN NaN 6 4.772864e+06 NaN NaN 8.0 NaN 7 4.289881e+09 NaN NaN NaN 9.0
Advertisement
Answer
I liked the idea of using get_dummies
, so I modified it a bit:
df = (pd.get_dummies(df, 'ADC_TYPE', '_', columns=['ADC_TYPE']) .replace(1, np.nan) .apply(lambda x: x.fillna(df['ADC_TYPE'])) .replace(0, np.nan))
Output:
RAW ADC_TYPE_3 ADC_TYPE_7 ADC_TYPE_8 ADC_TYPE_9 0 4000076160 3.0 NaN NaN NaN 1 5354368 NaN 7.0 NaN NaN 2 4641792 NaN NaN 8.0 NaN 3 4289860736 NaN NaN NaN 9.0 4 4136386944 3.0 NaN NaN NaN 5 5440384 NaN 7.0 NaN NaN 6 4772864 NaN NaN 8.0 NaN 7 4289881216 NaN NaN NaN 9.0