Pandas DataFrame of particle distributions: group by ID and find the half flux and the half flux radius

I am using a pandas DataFrame. I have a distribution of particles, each with its distance from the center of the distribution and an associated flux. I want to find the flux enclosed within the “half flux radius” (or “half light radius”), which by definition is the radius that encloses half of the total flux. I give an example below and then ask whether you have any idea how to do it.

Here I list two distributions of particles, identified by dist_ID, with the distance R of each particle from the center of its distribution and the flux of each particle.

     dist_ID          R        flux
0    702641.0    5.791781  0.097505
1    702641.0    2.806051  0.015750
2    702641.0    3.254907  0.086941
3    702641.0    8.291544  0.081764
4    702641.0    4.901959  0.053561
5    702641.0    8.630691  0.144661
...
228  802663.0   95.685763  0.025735
229  802663.0  116.070396  0.026012
230  802663.0  112.806001  0.022163
231  802663.0  229.388117  0.026154

For example, for the particle distribution with dist_ID=702641.0, the total flux is the sum of “flux”: total_flux=0.48; the half flux is half_flux=total_flux/2=0.24. Ordering the particles by R, the radius that encloses half of the flux satisfies R_4<R_hf<R_0 (where R_4=4.90 is the distance of particle 4 and R_0=5.79 that of particle 0), so I would take R_hf as the upper limit of that interval, i.e. R_hf=R_0.
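The definition can be checked on this one distribution with a short NumPy sketch. It is only an illustration, under the assumption that the "enclosed" flux at radius R is the summed flux of all particles with distance up to R, so the particles are ordered by R before accumulating. The variable names (order, enclosed, k) are placeholders, not part of the original data.

```python
import numpy as np

# The six particles of dist_ID=702641.0 listed above
R = np.array([5.791781, 2.806051, 3.254907, 8.291544, 4.901959, 8.630691])
flux = np.array([0.097505, 0.015750, 0.086941, 0.081764, 0.053561, 0.144661])

order = np.argsort(R)              # particle indices from innermost to outermost
enclosed = np.cumsum(flux[order])  # flux enclosed within each sorted radius
half_flux = flux.sum() / 2

# Position of the first sorted particle whose enclosed flux reaches half_flux
k = np.searchsorted(enclosed, half_flux, side="left")
R_hf = R[order][k]

print(half_flux, R_hf)
```

For these six rows the enclosed flux first reaches half_flux at the particle with R=5.791781, which is the upper limit of the interval that brackets the half flux radius.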

I would like a way to compute, grouping by dist_ID in the pandas DataFrame, the half_flux and R_hf of each distribution. Thanks

Answer

This can be done as follows:

import pandas as pd

data = {'dist_ID':  [702641.0,702641.0,702641.0,702641.0,702641.0,702641.0,802663.0,802663.0,802663.0,802663.0],
        'R':        [5.791781,2.806051,3.254907,8.291544,4.901959,8.630691,95.685763,116.070396,112.806001,229.388117],
        'flux':     [0.097505,0.015750,0.086941,0.081764,0.053561,0.144661,0.025735,0.026012,0.022163,0.026154]}
df = pd.DataFrame(data)


# Sort DF
df = df.sort_values(['dist_ID', 'R'])

# Cumulative flux within each distribution, from the innermost particle outwards
df['flux_cumsum'] = df.groupby('dist_ID')['flux'].cumsum()

# Half of the total flux of each distribution, merged back onto every row
df_halfflux = (df.groupby('dist_ID')['flux'].sum() / 2).rename('half_flux').reset_index()
df = pd.merge(df, df_halfflux, how="left", on=['dist_ID'])

# Absolute difference between the cumulative flux and the half flux
df['flux_diff'] = abs(df.half_flux - df.flux_cumsum)

print(df)

# Find the R_hf row: the particle whose cumulative flux is closest to half_flux,
# and report its radius R (not the discrepancy itself)
result = (df.loc[df.groupby('dist_ID')['flux_diff'].idxmin()]
            .set_index(['dist_ID', 'half_flux'])[['R']]
            .rename(columns={'R': 'R_hf'}))

print(result)

The code above prints:

    dist_ID           R      flux  flux_cumsum  half_flux  flux_diff
0  702641.0    2.806051  0.015750     0.015750   0.240091   0.224341
1  702641.0    3.254907  0.086941     0.102691   0.240091   0.137400
2  702641.0    4.901959  0.053561     0.156252   0.240091   0.083839
3  702641.0    5.791781  0.097505     0.253757   0.240091   0.013666
4  702641.0    8.291544  0.081764     0.335521   0.240091   0.095430
5  702641.0    8.630691  0.144661     0.480182   0.240091   0.240091
6  802663.0   95.685763  0.025735     0.025735   0.050032   0.024297
7  802663.0  112.806001  0.022163     0.047898   0.050032   0.002134
8  802663.0  116.070396  0.026012     0.073910   0.050032   0.023878
9  802663.0  229.388117  0.026154     0.100064   0.050032   0.050032

                          R_hf
dist_ID  half_flux
702641.0 0.240091     5.791781
802663.0 0.050032   112.806001
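
The question defines R_hf as the upper limit of the bracketing interval, that is, the smallest R whose enclosed flux reaches half_flux, rather than the row whose cumulative flux is closest to half_flux. A minimal sketch of that variant is given below. It assumes the same sample data as above, and the names reached and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'dist_ID': [702641.0] * 6 + [802663.0] * 4,
    'R': [5.791781, 2.806051, 3.254907, 8.291544, 4.901959, 8.630691,
          95.685763, 116.070396, 112.806001, 229.388117],
    'flux': [0.097505, 0.015750, 0.086941, 0.081764, 0.053561, 0.144661,
             0.025735, 0.026012, 0.022163, 0.026154],
})

# Order particles from the centre outwards within each distribution
df = df.sort_values(['dist_ID', 'R'], ignore_index=True)

# Enclosed flux at each particle and half of the total flux per distribution
df['flux_cumsum'] = df.groupby('dist_ID')['flux'].cumsum()
df['half_flux'] = df.groupby('dist_ID')['flux'].transform('sum') / 2

# Keep only rows where the enclosed flux has reached half_flux;
# the first such row per distribution is the upper limit of the interval
reached = df[df['flux_cumsum'] >= df['half_flux']]
result = (reached.groupby('dist_ID')
                 .first()[['half_flux', 'R']]
                 .rename(columns={'R': 'R_hf'}))

print(result)
```

The two definitions can differ: for dist_ID=802663.0 the closest row is R=112.806001, where the cumulative flux (0.047898) is still below half_flux, while the upper limit is R=116.070396. For dist_ID=702641.0 both give R=5.791781.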
User contributions licensed under: CC BY-SA