This discussion covers the differences between dtypes and converters in the pandas.read_csv function.
I could not find an equivalent to converters for the pandas.DataFrame constructor in the documentation.
If I build a dataframe directly from a list of lists, what would be the best way to mimic the same behavior?
Some made-up example:
```
# data.csv
sport,population
football,15M
darts,50k
sailing,3000
```
```python
# convert_csv_to_df.py

import pandas as pd

def f_population_to_int(population):
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except KeyError:
        return population

dict_converters = {"population": f_population_to_int}
df = pd.read_csv("data.csv", converters=dict_converters)
```
Output:
```
      sport population
0  football   15000000
1     darts      50000
2   sailing       3000
```
What would be the best way to get the same dataframe from a list of lists?
```python
data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]
```
Edit for clarification:
The example dict_converters holds only one function, but the idea is to be able to apply different conversions to multiple columns.
Answer
Change the f_population_to_int function so that it returns the original value on any error (drop the KeyError restriction from the except clause), then, after creating the DataFrame, use Series.apply:
```python
import pandas as pd

data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]


def f_population_to_int(population):
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except Exception:
        # Any failure (unknown suffix, non-string value such as the int 3000)
        # returns the value unchanged.
        return population


# First row is the header, the remaining rows are the records.
df = pd.DataFrame(data[1:], columns=data[0])
df['population'] = df['population'].apply(f_population_to_int)

print(df)
#      sports  population
# 0  football    15000000
# 1     darts       50000
# 2   sailing        3000
```
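One thing to keep in mind: the converter above returns anything it cannot parse unchanged, so a value such as the string "3000" (which is what read_csv would pass to a converter) stays a string. The following is only a sketch of a stricter variant, under the assumption that every value is an integer with an optional k or M suffix. The name population_to_int is hypothetical and not part of the original code.

```python
import pandas as pd


def population_to_int(population):
    """Hypothetical stricter variant: always returns an int or raises ValueError."""
    multipliers = {"k": 1_000, "M": 1_000_000}
    text = str(population).strip()  # accepts both "3000" and the int 3000
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)  # no suffix: parse the whole value


data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", "3000"]]
df = pd.DataFrame(data[1:], columns=data[0])
df["population"] = df["population"].apply(population_to_int)

print(df["population"].tolist())  # [15000000, 50000, 3000]
```

Because every element comes back as an int, the column ends up with an integer dtype. Values like "1.5M" are not handled by this sketch and would raise ValueError.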
If you need to use the dict_converters dictionary, loop over it:
```python
dict_converters = {"population": f_population_to_int}
for k, v in dict_converters.items():
    df[k] = df[k].apply(v)
```
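To get closer to the converters argument of read_csv, the loop can be wrapped in a small helper. This is a minimal sketch, not a pandas API: frame_from_rows is a hypothetical name, it assumes the first inner list is the header, and skipping converter keys that are not in the header is a design choice made here for illustration.

```python
import pandas as pd


def frame_from_rows(rows, converters=None):
    """Hypothetical helper: build a DataFrame from a list of lists
    (first row = header) and apply per-column converter functions."""
    header, *records = rows
    df = pd.DataFrame(records, columns=header)
    for column, func in (converters or {}).items():
        if column in df.columns:  # ignore converters for absent columns
            df[column] = df[column].apply(func)
    return df


rows = [["sports", "population"], ["football", "15M"], ["darts", "50k"]]
converters = {
    "sports": str.upper,
    "population": lambda v: int(v[:-1]) * {"k": 1000, "M": 1000000}[v[-1]],
}
result = frame_from_rows(rows, converters)
print(result)
```

Each column gets its own function, so different conversions for multiple columns only require more entries in the dictionary.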