I am using a pandas DataFrame. I have a distribution of particles, their distances from the center of the distribution, and the associated fluxes. I want to find the total flux enclosed within the “half flux radius” (or “half light radius”), which is by definition the radius that encloses half of the flux. I give an example below and then ask whether you have any idea how to do it.
Here I list two distributions of particles, identified by dist_ID, together with the distance R of each particle from the center of its distribution and the flux of each particle.
          dist_ID           R      flux
    0    702641.0    5.791781  0.097505
    1    702641.0    2.806051  0.015750
    2    702641.0    3.254907  0.086941
    3    702641.0    8.291544  0.081764
    4    702641.0    4.901959  0.053561
    5    702641.0    8.630691  0.144661
    ...
    228  802663.0   95.685763  0.025735
    229  802663.0  116.070396  0.026012
    230  802663.0  112.806001  0.022163
    231  802663.0  229.388117  0.026154
For example, considering the particle distribution with dist_ID=702641.0, the total flux of the particle distribution is the sum of “flux”: total_flux=0.48; the half flux is half_flux=total_flux/2.=0.24; the radius that encloses half of the flux satisfies R_2<R_hf<R_3 (where R_2=3.25 is the radius of particle 2 and R_3=8.29 is the radius of particle 3), so I would take R_hf to be the upper limit of that interval, i.e. R_hf=R_3.
I want a way to compute, grouping by dist_ID with a pandas DataFrame, half_flux and R_hf for each distribution. Thanks.
Answer
This can be done in the following way:
    import pandas as pd

    data = {'dist_ID': [702641.0,702641.0,702641.0,702641.0,702641.0,702641.0,802663.0,802663.0,802663.0,802663.0],
            'R': [5.791781,2.806051,3.254907,8.291544,4.901959,8.630691,95.685763,116.070396,112.806001,229.388117],
            'flux': [0.097505,0.015750,0.086941,0.081764,0.053561,0.144661,0.025735,0.026012,0.022163,0.026154]}
    df = pd.DataFrame(data)

    # Sort DF
    df = df.sort_values(['dist_ID', 'R'])

    # Calculate cumsum
    df['flux_cumsum'] = df.groupby('dist_ID')['flux'].transform(pd.Series.cumsum)

    # Calculate half_flux
    df_halfflux = df.groupby('dist_ID').apply(lambda x: x.flux.sum() / 2).to_frame().rename(columns={0:'half_flux'})
    df = pd.merge(df,df_halfflux, how="left", on=['dist_ID'])

    # Calculate discrepancy
    df['flux_diff'] = abs(df.half_flux- df.flux_cumsum)
    print(df)

    # Find R_hf-row
    df = df.groupby(['dist_ID', 'half_flux']).agg({'flux_diff': 'min'}).rename(columns={'flux_diff': 'R_hf'})
    print(df)
The code above outputs this:
        dist_ID           R      flux  flux_cumsum  half_flux  flux_diff
    0  702641.0    2.806051  0.015750     0.015750   0.240091   0.224341
    1  702641.0    3.254907  0.086941     0.102691   0.240091   0.137400
    2  702641.0    4.901959  0.053561     0.156252   0.240091   0.083839
    3  702641.0    5.791781  0.097505     0.253757   0.240091   0.013666
    4  702641.0    8.291544  0.081764     0.335521   0.240091   0.095430
    5  702641.0    8.630691  0.144661     0.480182   0.240091   0.240091
    6  802663.0   95.685763  0.025735     0.025735   0.050032   0.024297
    7  802663.0  112.806001  0.022163     0.047898   0.050032   0.002134
    8  802663.0  116.070396  0.026012     0.073910   0.050032   0.023878
    9  802663.0  229.388117  0.026154     0.100064   0.050032   0.050032

                            R_hf
    dist_ID  half_flux
    702641.0 0.240091   0.013666
    802663.0 0.050032   0.002134
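Note that the column labelled R_hf in the last table holds the smallest value of |half_flux - flux_cumsum| for each distribution, which is a flux difference, not a radius. If the radius itself is wanted, the following is a minimal sketch of one possible approach. It assumes the "upper limit" rule from the question: R_hf is the radius of the first particle, ordered by R, whose cumulative flux reaches half_flux. The output column names R_hf and flux_within_R_hf are my own choices, and the data are only the ten rows used above.

```python
import pandas as pd

# Sample rows from the answer above (only a subset of the asker's data).
data = {
    "dist_ID": [702641.0] * 6 + [802663.0] * 4,
    "R": [5.791781, 2.806051, 3.254907, 8.291544, 4.901959, 8.630691,
          95.685763, 116.070396, 112.806001, 229.388117],
    "flux": [0.097505, 0.015750, 0.086941, 0.081764, 0.053561, 0.144661,
             0.025735, 0.026012, 0.022163, 0.026154],
}
df = pd.DataFrame(data)

# Order particles by radius within each distribution, then accumulate flux outward.
df = df.sort_values(["dist_ID", "R"]).reset_index(drop=True)
df["flux_cumsum"] = df.groupby("dist_ID")["flux"].cumsum()
df["half_flux"] = df.groupby("dist_ID")["flux"].transform("sum") / 2

# Keep only particles at or beyond the half-flux point, then take the
# innermost one per distribution (assumed "upper limit" convention).
reached = df[df["flux_cumsum"] >= df["half_flux"]]
result = (
    reached.groupby("dist_ID")
    .first()[["half_flux", "R", "flux_cumsum"]]
    .rename(columns={"R": "R_hf", "flux_cumsum": "flux_within_R_hf"})
)
print(result)
```

For these ten rows this gives R_hf = 5.791781 for dist_ID 702641.0 and R_hf = 116.070396 for dist_ID 802663.0. The first value differs from the R_3 = 8.29 quoted in the question because the particles are sorted by R here before the flux is accumulated, whereas the question's example appears to add up the fluxes in table order.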