I am using a pandas DataFrame. I have distributions of particles, each particle's distance from the center of its distribution, and the associated fluxes. I want to find the flux enclosed within the “half flux radius” (or “half light radius”), which by definition is the radius that encloses half of the total flux. I give an example below and then ask whether you have any idea how to do it.
Here are two particle distributions, identified by dist_ID, with the distance R of each particle from the center of its distribution and the flux of each particle.
dist_ID R flux
0 702641.0 5.791781 0.097505
1 702641.0 2.806051 0.015750
2 702641.0 3.254907 0.086941
3 702641.0 8.291544 0.081764
4 702641.0 4.901959 0.053561
5 702641.0 8.630691 0.144661
228 802663.0 95.685763 0.025735
229 802663.0 116.070396 0.026012
230 802663.0 112.806001 0.022163
231 802663.0 229.388117 0.026154
For example, consider the particle distribution with dist_ID=702641.0. The total flux of the distribution is the sum of “flux”: total_flux=0.48; the half flux is half_flux=total_flux/2.=0.24; the radius that encloses half of the flux is R_2<R_hf<R_3 (where R_2=3.25 is the radius of particle 2 and R_3=8.29 is the radius of particle 3), so I would take R_hf as the upper limit of that interval, i.e. R_hf=R_3.
I would like a way to compute half_flux and R_hf for each distribution, grouping the pandas DataFrame by dist_ID. Thanks.
Answer
It can be done this way:
import pandas as pd
data = {'dist_ID': [702641.0,702641.0,702641.0,702641.0,702641.0,702641.0,802663.0,802663.0,802663.0,802663.0],
'R': [5.791781,2.806051,3.254907,8.291544,4.901959,8.630691,95.685763,116.070396,112.806001,229.388117],
'flux': [0.097505,0.015750,0.086941,0.081764,0.053561,0.144661,0.025735,0.026012,0.022163,0.026154]}
df = pd.DataFrame(data)
# Sort by distribution and radius so the cumulative sum runs outward from the center
df = df.sort_values(['dist_ID', 'R']).reset_index(drop=True)
# Cumulative flux within each distribution
df['flux_cumsum'] = df.groupby('dist_ID')['flux'].cumsum()
# Half of the total flux of each distribution, broadcast to every row
df['half_flux'] = df.groupby('dist_ID')['flux'].transform('sum') / 2
# Absolute discrepancy between the half flux and the cumulative flux
df['flux_diff'] = (df['half_flux'] - df['flux_cumsum']).abs()
print(df)
# Find the R_hf row: the particle whose cumulative flux is closest to half_flux
# (note: the closest particle may enclose slightly less than half of the flux)
idx = df.groupby('dist_ID')['flux_diff'].idxmin()
df = df.loc[idx, ['dist_ID', 'half_flux', 'R']].rename(columns={'R': 'R_hf'}).set_index('dist_ID')
print(df)
The code above prints:
dist_ID R flux flux_cumsum half_flux flux_diff
0 702641.0 2.806051 0.015750 0.015750 0.240091 0.224341
1 702641.0 3.254907 0.086941 0.102691 0.240091 0.137400
2 702641.0 4.901959 0.053561 0.156252 0.240091 0.083839
3 702641.0 5.791781 0.097505 0.253757 0.240091 0.013666
4 702641.0 8.291544 0.081764 0.335521 0.240091 0.095430
5 702641.0 8.630691 0.144661 0.480182 0.240091 0.240091
6 802663.0 95.685763 0.025735 0.025735 0.050032 0.024297
7 802663.0 112.806001 0.022163 0.047898 0.050032 0.002134
8 802663.0 116.070396 0.026012 0.073910 0.050032 0.023878
9 802663.0 229.388117 0.026154 0.100064 0.050032 0.050032
half_flux R_hf
dist_ID
702641.0 0.240091 5.791781
802663.0 0.050032 112.806001
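The question asks for R_hf as the upper limit of the interval, i.e. the smallest radius whose enclosed flux reaches half_flux, which is not necessarily the particle whose cumulative flux is closest to half_flux. A minimal sketch of that variant follows. It assumes the particles are accumulated in order of increasing R and that all fluxes are non-negative; the helper name half_flux_radius and the column name flux_within_R_hf are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'dist_ID': [702641.0] * 6 + [802663.0] * 4,
    'R': [5.791781, 2.806051, 3.254907, 8.291544, 4.901959, 8.630691,
          95.685763, 116.070396, 112.806001, 229.388117],
    'flux': [0.097505, 0.015750, 0.086941, 0.081764, 0.053561, 0.144661,
             0.025735, 0.026012, 0.022163, 0.026154],
})


def half_flux_radius(df):
    """Hypothetical helper: per dist_ID, return half_flux, the smallest R whose
    cumulative flux is >= half_flux, and the flux enclosed within that R."""
    # Accumulate flux outward from the center of each distribution
    out = df.sort_values(['dist_ID', 'R']).copy()
    grouped = out.groupby('dist_ID')['flux']
    out['flux_cumsum'] = grouped.cumsum()
    out['half_flux'] = grouped.transform('sum') / 2
    # Keep only particles at or beyond the half-flux point, then take the innermost one
    enclosing = out[out['flux_cumsum'] >= out['half_flux']]
    first = enclosing.groupby('dist_ID').first()
    return first[['half_flux', 'R', 'flux_cumsum']].rename(
        columns={'R': 'R_hf', 'flux_cumsum': 'flux_within_R_hf'})


result = half_flux_radius(df)
print(result)
```

For the sample data this gives R_hf = 5.791781 for dist_ID 702641.0 and R_hf = 116.070396 for dist_ID 802663.0. Sorting by R matters here: accumulating the particles in their original row order would cross the half flux at a different particle.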