Can you tell me when to use these vectorization methods with basic examples?
I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!
Advertisement
Answer
Comparing map, applymap and apply: Context Matters
First major difference: DEFINITION
mapis defined on Series ONLYapplymapis defined on DataFrames ONLYapplyis defined on BOTH
Second major difference: INPUT ARGUMENT
mapacceptsdicts,Series, or callableapplymapandapplyaccept callables only
Third major difference: BEHAVIOR
mapis elementwise for Seriesapplymapis elementwise for DataFramesapplyalso works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depends on the function.
Fourth major difference (the most important one): USE CASE
mapis meant for mapping values from one domain to another, so is optimised for performance (e.g.,df['A'].map({1:'a', 2:'b', 3:'c'}))applymapis good for elementwise transformations across multiple rows/columns (e.g.,df[['A', 'B', 'C']].applymap(str.strip))applyis for applying any function that cannot be vectorised (e.g.,df['sentences'].apply(nltk.sent_tokenize)).
Also see When should I (not) want to use pandas apply() in my code? for a writeup I made a while back on the most appropriate scenarios for using apply (note that there aren’t many, but there are a few— apply is generally slow).
Summarising
Footnotes
mapwhen passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
applymapin more recent versions has been optimised for some operations. You will findapplymapslightly faster thanapplyin some cases. My suggestion is to test them both and use whatever works better.
mapis optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
Series.applyreturns a scalar for aggregating operations, Series otherwise. Similarly forDataFrame.apply. Note thatapplyalso has fastpaths when called with certain NumPy functions such asmean,sum, etc.
