pandas.DataFrame equivalent for pandas.read_csv converters?

This discussion covers the differences between dtypes and converters in the pandas.read_csv function.

I could not find an equivalent to converters for the pandas.DataFrame constructor in the documentation.

If I build a dataframe directly from a list of lists, what would be the best way to mimic the same behavior?

Some made-up example:

# data.csv

sport,population
football,15M
darts,50k
sailing,3000

# convert_csv_to_df.py

import pandas as pd

def f_population_to_int(population):
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except KeyError:
        return population

dict_converters = {"population": f_population_to_int}
df = pd.read_csv("data.csv", converters=dict_converters)

output:

      sport population 
0  football   15000000 
1     darts      50000 
2   sailing       3000 

What would be the best way to get the same dataframe from a list of lists?

data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]

Edit for clarification:

The example dict_converters holds only one function, but the idea is to be able to apply different conversions to multiple columns.

Answer

Change the f_population_to_int function so that it returns the value unchanged on any error (drop the KeyError restriction, because the integer 3000 raises a TypeError when indexed), then create the DataFrame and use Series.apply:

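import pandas as pd

data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]


def f_population_to_int(population):
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except Exception:
        # Any failure (unknown suffix, non-string value such as the int 3000)
        # returns the value unchanged
        return population

# The first row holds the column names, the remaining rows hold the data
df = pd.DataFrame(data[1:], columns=data[0])
df['population'] = df['population'].apply(f_population_to_int)

print(df)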
     sports population
0  football   15000000
1     darts      50000
2   sailing       3000
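
As a side note on why catching only KeyError is not enough for the list input: the list holds the integer 3000 rather than the string "3000", and indexing an int raises TypeError. The sketch below is one possible variant that names the expected exceptions instead of using a catch-all. The function name f_population_to_int_explicit and the extra sample values are made up for illustration.

```python
def f_population_to_int_explicit(population):
    """Hypothetical variant: same logic, with the expected exceptions listed."""
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except (KeyError, TypeError, ValueError):
        # KeyError: the last character is not a known suffix (e.g. "3000")
        # TypeError: the value is not a string (e.g. the int 3000)
        # ValueError: the numeric part is not an integer (e.g. "1.5k")
        return population


# Made-up sample values covering each branch
samples = ["15M", "50k", 3000, "3000", "1.5k"]
results = [f_population_to_int_explicit(v) for v in samples]
print(results)
```

Values that cannot be converted come back unchanged, so a string such as "3000" stays a string, just as it does with the read_csv converter in the question.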

If you need to use the dict_converters dict, loop over it and apply each function to its column:

dict_converters = {"population": f_population_to_int}
for k, v in dict_converters.items():
    df[k] = df[k].apply(v)
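
To get closer to the read_csv(converters=...) call from the question, the loop can be wrapped in a small helper that takes the list of lists and the dict together. This is only a sketch: dataframe_with_converters is a hypothetical name, not a pandas function, and the str.upper converter for the "sports" column is added purely to show a second column being converted.

```python
import pandas as pd


def f_population_to_int(population):
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except Exception:
        # Return the value unchanged if it cannot be converted
        return population


def dataframe_with_converters(rows, converters):
    """Hypothetical helper: rows[0] is the header, rows[1:] are the records."""
    df = pd.DataFrame(rows[1:], columns=rows[0])
    for column, func in converters.items():
        # Apply each converter element-wise to its own column
        df[column] = df[column].apply(func)
    return df


data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]

# One converter per column; str.upper is an illustrative second conversion
dict_converters = {"sports": str.upper, "population": f_population_to_int}
df2 = dataframe_with_converters(data, dict_converters)
print(df2)
```

Columns that are not listed in the dict are left as they are, which matches how converters only touches the columns it names.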
User contributions licensed under: CC BY-SA