Difference between map, applymap and apply methods in Pandas

Question

Can you tell me when to use these vectorization methods with basic examples? I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustr…

Accepted Answer

Comparing `map`, `applymap` and `apply`: Context Matters

First major difference: DEFINITION

- `map` is defined on Series ONLY
- `applymap` is defined on DataFrames ONLY
- `apply` is defined on BOTH

Second major difference: INPUT ARGUMENT

- `map` accepts dicts, Series, or callables
- `applymap` and `apply` accept callables only

Third major difference: BEHAVIOR

- `map` is elementwise for Series
- `applymap` is elementwise for DataFrames
- `apply` also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.

Fourth major difference (the most important one): USE CASE

- `map` is meant for mapping values from one domain to another, so it is optimised for performance (e.g., `df['A'].map({1:'a', 2:'b', 3:'c'})`)
- `applymap` is good for elementwise transformations across multiple rows/columns (e.g., `df[['A', 'B', 'C']].applymap(str.strip)`)
- `apply` is for applying any function that cannot be vectorised (e.g., `df['sentences'].apply(nltk.sent_tokenize)`)

Also see "When should I (not) want to use pandas apply() in my code?" for a writeup I made a while back on the most appropriate scenarios for using `apply` (note that there aren't many, but there are a few; `apply` is generally slow).

Summarising

Footnotes

- `map`, when passed a dictionary/Series, will map elements based on the keys in that dictionary/Series. Values with no matching key will be recorded as NaN in the output.
- `applymap` in more recent versions has been optimised for some operations. You will find `applymap` slightly faster than `apply` in some cases. My suggestion is to test them both and use whatever works better.
- `map` is optimised for elementwise mappings and transformations. Operations that involve dictionaries or Series enable pandas to use faster code paths for better performance.
- `Series.apply` returns a scalar for aggregating operations, Series otherwise. Similarly for `DataFrame.apply`. Note that `apply` also has fast paths when called with certain NumPy functions such as `mean`, `sum`, etc.
- As of pandas 2.1, `DataFrame.applymap` is deprecated in favour of `DataFrame.map`, which performs the same elementwise operation; on those versions `map` is therefore available on DataFrames as well.
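A minimal sketch of the first three differences, using a small made-up DataFrame (the columns `A` and `B` and their values are purely illustrative). The helper `elementwise` is hypothetical: it calls `DataFrame.map` when the installed pandas provides it (2.1 and later) and falls back to `applymap` otherwise, so the sketch should run on either side of the rename.

```python
import pandas as pd

# Toy data, made up for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [" x", "y ", " z "]})

# Series.map accepts a dict, a Series (looked up by its index), or a callable.
by_dict = df["A"].map({1: "a", 2: "b", 3: "c"})
by_series = df["A"].map(pd.Series(["one", "two", "three"], index=[1, 2, 3]))
by_func = df["A"].map(lambda v: v * 10)


def elementwise(frame, func):
    """Hypothetical helper: apply func to every cell of a DataFrame.

    pandas 2.1 renamed DataFrame.applymap to DataFrame.map, so prefer the
    new name when it exists and fall back to applymap on older versions.
    """
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)


# applymap-style: str.strip is called once per cell.
stripped = elementwise(df[["B"]], str.strip)

# apply on a Series: the function is called once per element.
squared = df["A"].apply(lambda v: v ** 2)

# apply on a DataFrame: the function receives whole columns (axis=0, the default)...
col_totals = df[["A"]].apply(lambda col: col.sum())

# ...or whole rows (axis=1), which makes it possible to combine several columns.
row_labels = df.apply(lambda row: f"{row['A']}:{row['B'].strip()}", axis=1)
```

The row-wise `apply` is the case that neither `map` nor `applymap` covers, since each call sees several columns at once; it is also the slowest of the three, which matches the advice above to reach for it only when nothing vectorised fits.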

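Two of the footnotes (missing keys become NaN, and the return type of `apply` depends on whether the function aggregates) can be checked with a short sketch on made-up data. It sticks to `DataFrame.apply` with plain lambdas for the return-type check rather than relying on any NumPy fast path.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# 3 has no key in the dict, so its result is NaN.
partial = s.map({1: "a", 2: "b"})

# Toy numeric data, made up for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10.0, 20.0, 30.0]})

# Aggregating function: one scalar per column, so the result is a Series.
col_means = df.apply(lambda col: col.mean())

# Transforming function: returns a same-length Series per column,
# so the result is a DataFrame with the original shape.
centered = df.apply(lambda col: col - col.mean())

print(type(col_means).__name__, type(centered).__name__)  # Series DataFrame
```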