UPDATE:
I have managed to find the source of the error: in the current version of pandas, DataFrame columns with the 'object' dtype no longer use scientific notation. For big values the cells display the right significant figures, but for small numbers the displayed value is 0.0.
If you access the cell from the running script you still get the correct value. The issue is that if you store the dataframe, as a text file for example, you save the incorrect value.
This is a code example with the correct (for me) behaviour in previous versions:
    import pandas as pd

    print(f'pandas version {pd.__version__}')

    idx = 'H1_6563A'
    data = {'ion': 'H1',
            'wavelength': 6563.0,
            'latex_label': '$6563AA,HI$',
            'intgr_flux': 3.128572e-14,
            'dist': 2.8e20,
            'eqw': 1464.05371}

    # Fill an object-dtype Series one entry at a time
    mySeries = pd.Series(index=data.keys(), dtype='object')
    for param, value in data.items():
        mySeries[param] = value
    print(f'\nSeries:\n {mySeries}')

    # Add the Series as a row of an initially empty DataFrame
    myDF = pd.DataFrame(columns=data.keys())
    myDF.loc[idx] = mySeries
    print(f'\nDataFrame:\n {myDF}')
In pandas 1.2.3 the DataFrame shows a combination of scientific and non-scientific floats:
    pandas version 1.2.3

    Series:
     ion                                 H1
    wavelength                      6563.0
    latex_label                $6563AA,HI$
    intgr_flux                         0.0
    dist           280000000000000000000.0
    eqw                         1464.05371
    dtype: object

    DataFrame:
              ion  wavelength  latex_label    intgr_flux          dist         eqw
    H1_6563A  H1      6563.0  $6563AA,HI$  3.128572e-14  2.800000e+20  1464.05371
The same script in pandas 1.4.1 returns:
    pandas version 1.4.1

    Series:
     ion                                 H1
    wavelength                      6563.0
    latex_label                $6563AA,HI$
    intgr_flux                         0.0
    dist           280000000000000000000.0
    eqw                         1464.05371
    dtype: object

    DataFrame:
              ion wavelength  latex_label intgr_flux                     dist         eqw
    H1_6563A  H1     6563.0  $6563AA,HI$        0.0  280000000000000000000.0  1464.05371
Could anyone share an approach to replicate the original behaviour, so that I can have a DataFrame with mixed variables (strings, ints, floats, None, scientific, non-scientific) that shows the correct significant figures?
Thank you very much.
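One possible workaround for the update above is to convert every cell to text yourself before writing the table, so the stored file does not depend on how pandas renders object columns. This is only a minimal sketch: `format_cell` is a hypothetical helper (not a pandas function), and it assumes that the shortest round-trip `repr` of each float is an acceptable representation.

```python
import pandas as pd


def format_cell(value):
    """Hypothetical helper: floats keep their full repr, everything else goes through str()."""
    if isinstance(value, float):
        return repr(float(value))
    return str(value)


idx = 'H1_6563A'
data = {'ion': 'H1', 'wavelength': 6563.0, 'intgr_flux': 3.128572e-14, 'dist': 2.8e20}

# One object-dtype column per key, mimicking the mixed-type frame from the question
df = pd.DataFrame({key: pd.Series([value], index=[idx], dtype='object')
                   for key, value in data.items()})

# Convert each cell to a string before pandas gets a chance to format it
text_df = df.apply(lambda column: column.map(format_cell))
table_text = text_df.to_string()
print(table_text)
```

The values inside `df` are left untouched; only the copy that is written out is converted to strings, so the numbers remain usable in the running script.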
ORIGINAL QUESTION
I am using a pandas.Series as a container for entries of different types. I have noticed the following issue while declaring small floats in scientific notation:
    import numpy as np
    import pandas as pd

    print(f'Pandas {pd.__version__}')

    columns = ['c0', 'c1', 'c2', 'c3']
    mySeries = pd.Series(index=columns)

    mySeries['c0'] = 'None'
    mySeries['c1'] = np.nan
    mySeries['c2'] = 1234.0
    mySeries['c3'] = 1.234e-18

    print(mySeries)
which returns:
    c0      None
    c1       NaN
    c2    1234.0
    c3       0.0
    dtype: object
Accessing the 'c3' entry returns the complete float. However, if you convert this Series to a pandas.DataFrame and save it to a text file (using the .to_string() method), the value is stored as 0.0.
If the first entry is a number instead of a string, this does not happen:
    columns = ['c0', 'c1', 'c2', 'c3']
    mySeries = pd.Series(index=columns)

    mySeries['c0'] = 123
    mySeries['c1'] = np.nan
    mySeries['c2'] = 1234.0
    mySeries['c3'] = 1.234e-18

    print(mySeries)

which returns:

    c0    1.230000e+02
    c1             NaN
    c2    1.234000e+03
    c3    1.234000e-18
    dtype: float64
So my question is: what is the right way to declare the dtype of the input variables so that the entry order does not affect the display? I would also like to know which parameter decides whether a cell is displayed in scientific notation.
Thanks a lot.
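Regarding the display parameter: as far as I know, the relevant settings are the pandas options `display.float_format` and `display.precision`. The sketch below assumes that `display.float_format` is also applied to floats stored in an object-dtype Series, which matches the pandas versions I am aware of but is worth checking against your own installation. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

columns = ['c0', 'c1', 'c2', 'c3']
values = ['None', np.nan, 1234.0, 1.234e-18]

# Explicit object dtype, so the result does not depend on the entry order
mixed = pd.Series(values, index=columns, dtype='object')

# Temporarily force scientific notation with six decimals for every float
with pd.option_context('display.float_format', '{:.6e}'.format):
    series_text = mixed.to_string()

print(series_text)
```

The trade-off is that every float is then rendered in scientific notation, including values such as 1234.0, so this suits a stored file better than interactive display.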
Answer
I would shape my df first, with proper dtypes, then add the data:
    import pandas as pd

    # Declare the columns with their dtypes before adding any data
    df = pd.DataFrame(
        {'ion': pd.Series(dtype='str'),
         'wavelength': pd.Series(dtype='float'),
         'intgr_flux': pd.Series(dtype='float')})

    idx = 'H1_6563A'
    data = {
        'ion': 'H1',
        'wavelength': 6563.0,
        'intgr_flux': 3.128572e-14}

    df.loc[idx] = data

    print(df)
    # Outputs:
    #          ion  wavelength    intgr_flux
    # H1_6563A  H1      6563.0  3.128572e-14
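A related sketch, which is a variation on the idea above rather than the same code: if the whole row is available up front, building the frame from a list of dicts lets pandas infer a numeric dtype per column, and `df.dtypes` can be checked before saving. The names `df_typed` and `saved_text` are illustrative.

```python
import pandas as pd

idx = 'H1_6563A'
data = {'ion': 'H1', 'wavelength': 6563.0, 'intgr_flux': 3.128572e-14}

# One dict per row; pandas infers a dtype for each column
df_typed = pd.DataFrame([data], index=[idx])

# The float columns should be float64 rather than object
print(df_typed.dtypes)

saved_text = df_typed.to_string()
print(saved_text)
```

With float64 columns the small flux is written in scientific notation, as in the answer's output.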