This discussion covers the differences between dtypes and converters in the pandas.read_csv function.
I could not find an equivalent to converters for the pandas.DataFrame constructor in the documentation.
If I build a dataframe directly from a list of lists, what would be the best way to mimic the same behavior?
A made-up example:
# data.csv
sport,population
football,15M
darts,50k
sailing,3000

# convert_csv_to_df.py
import pandas as pd

def f_population_to_int(population):
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except KeyError:
        return population

dict_converters = {"population": f_population_to_int}
df = pd.read_csv("data.csv", converters=dict_converters)
Output:

      sport population
0  football   15000000
1     darts      50000
2   sailing       3000
What would be the best way to get the same dataframe from a list of lists?
data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]
Edit for clarification:
The example dict_converters holds only one function, but the idea is to be able to apply different conversions to multiple columns.
Answer
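Change the f_population_to_int function so that it returns the value unchanged on any error, not only on KeyError (an integer such as 3000 raises TypeError when indexed). Then, after creating the DataFrame, apply the function to the column with Series.apply: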
import pandas as pd

data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]

def f_population_to_int(population):
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except Exception:
        # Any failure (unknown suffix, non-string input) returns the value unchanged.
        return population

# The first row holds the column names, the remaining rows hold the data.
df = pd.DataFrame(data[1:], columns=data[0])
df['population'] = df['population'].apply(f_population_to_int)
print(df)

     sports  population
0  football    15000000
1     darts       50000
2   sailing        3000
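One thing to watch with the catch-all approach: a digit-only string such as "3000" (which is what read_csv would hand to a converter) is returned as a string, so the column can end up with mixed types. A minimal sketch of a stricter variant, assuming only the k and M suffixes matter; the name parse_population is made up for illustration and is not part of pandas:

```python
import pandas as pd

MULTIPLIERS = {"k": 1_000, "M": 1_000_000}

def parse_population(value):
    """Hypothetical variant of f_population_to_int that also parses plain numbers."""
    text = str(value).strip()
    try:
        if text[-1:] in MULTIPLIERS:
            # Suffix found: scale the numeric part.
            return int(text[:-1]) * MULTIPLIERS[text[-1]]
        # No suffix: accept plain integers, whether given as int or str.
        return int(text)
    except ValueError:
        # Anything that cannot be parsed is left untouched.
        return value

data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", "3000"]]
df = pd.DataFrame(data[1:], columns=data[0])
df["population"] = df["population"].apply(parse_population)
```

With this variant every row in the example becomes an integer, so the column gets an integer dtype instead of object.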
If you need the dict_converters dictionary, for example to apply different functions to several columns, loop over it:
dict_converters = {"population": f_population_to_int}

for k, v in dict_converters.items():
    df[k] = df[k].apply(v)
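The loop can be wrapped in a small helper so that building a DataFrame from a list of lists looks similar to calling read_csv with converters. This is only a sketch: frame_from_rows and population_to_int are hypothetical names, not pandas functions, and the helper assumes the first inner list is the header row.

```python
import pandas as pd

def frame_from_rows(rows, converters=None):
    """Hypothetical helper: build a DataFrame from rows and apply per-column converters."""
    header, *records = rows  # assumption: the first row holds the column names
    df = pd.DataFrame(records, columns=header)
    for column, func in (converters or {}).items():
        df[column] = df[column].apply(func)
    return df

def population_to_int(value):
    multipliers = {"k": 1_000, "M": 1_000_000}
    try:
        return int(value[:-1]) * multipliers[value[-1]]
    except (KeyError, TypeError, ValueError):
        # Unknown suffix or non-string input: keep the value as it is.
        return value

rows = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]

# Two converters, to show that each column gets its own function.
df = frame_from_rows(rows, converters={"sports": str.title, "population": population_to_int})
```

Unlike read_csv, where converters always receive strings, the functions here receive whatever objects are in the list, which is why population_to_int also catches TypeError.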