How do I melt a pandas dataframe?

Question

On the pandas tag, I often see users asking questions about melting dataframes in pandas. I am gonna attempt a canonical Q&A (self-answer) on this topic. I am gonna clarify: What is melt? How do I use melt? When do I use melt? I have seen some hotter questions about melt, like: Convert columns into rows with Pandas.

Accepted Answer

Note for pandas versions < 0.20.0: I will be using df.melt(...) for my examples, but you will need to use pd.melt(df, ...) instead.

Documentation references:

Most of the solutions here use melt, so to understand the method, see the documentation's explanation:

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.

Parameters

id_vars : tuple, list, or ndarray, optional
    Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
    Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
var_name : scalar
    Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.
value_name : scalar, default ‘value’
    Name to use for the ‘value’ column.
col_level : int or str, optional
    If columns are a MultiIndex then use this level to melt.
ignore_index : bool, default True
    If True, original index is ignored. If False, the original index is retained. Index labels will be repeated as necessary.
    New in version 1.1.0.

Logic to melting:

Melting merges multiple columns into one and converts the dataframe from wide to long. For the solution to Problem 1 (see below), the steps are:

1. We start with the original dataframe.
2. melt merges the Math and English columns into a single column, repeating the identifier rows so the dataframe becomes longer.
3. Finally, it adds the Subject column, which records which subject each value in the Grades column came from.

This is the simple logic behind what the melt function does.

Solutions:

I will solve my own questions.

Problem 1:

Problem 1 could be solved using pd.DataFrame.melt with the following code:

    print(df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades'))

This code sets id_vars to ['Name', 'Age'], so value_vars automatically defaults to the remaining columns (['Math', 'English']), which are unpivoted into rows.

You could also solve Problem 1 using stack like the below:

    print(
        df.set_index(["Name", "Age"])
        .stack()
        .reset_index(name="Grade")
        .rename(columns={"level_2": "Subject"})
        .sort_values("Subject")
        .reset_index(drop=True)
    )

This code sets the Name and Age columns as the index and stacks the remaining columns, Math and English. It then resets the index with Grade as the name of the value column, renames the level_2 column to Subject, sorts by the Subject column, and finally resets the index again.

Both of these solutions produce the same long table. The stack version, which names the value column Grade and sorts by Subject, prints:

        Name  Age  Subject Grade
    0    Bob   13  English     C
    1   John   16  English     B
    2    Foo   16  English     B
    3    Bar   15  English    A+
    4   Alex   15  English     F
    5    Tom   13  English     A
    6    Bob   13     Math    A+
    7   John   16     Math     B
    8    Foo   16     Math     A
    9    Bar   15     Math     F
    10  Alex   15     Math     D
    11   Tom   13     Math     C

Problem 2:

This is similar to my first question, but this time I only want to keep the Math column. Here the value_vars argument comes into use, like the below:

    print(
        df.melt(
            id_vars=["Name", "Age"],
            value_vars="Math",
            var_name="Subject",
            value_name="Grades",
        )
    )

Or we can also use stack with column specification:

    print(
        df.set_index(["Name", "Age"])[["Math"]]
        .stack()
        .reset_index(name="Grade")
        .rename(columns={"level_2": "Subject"})
        .sort_values("Subject")
        .reset_index(drop=True)
    )

Both of these solutions give the following (the melt version labels the last column Grades):

       Name  Age Subject Grade
    0   Bob   13    Math    A+
    1  John   16    Math     B
    2   Foo   16    Math     A
    3   Bar   15    Math     F
    4  Alex   15    Math     D
    5   Tom   13    Math     C

Problem 3:

Problem 3 could be solved with melt and groupby, using the agg function with ', '.join, like the below:

    print(
        df.melt(id_vars=["Name", "Age"], var_name="Subjects", value_name="Grade")
        .groupby("Grade", as_index=False)[["Name", "Subjects"]]
        .agg(", ".join)
    )

It melts the dataframe, groups by the grades, and joins the names and subjects in each group with a comma. The Name and Subjects columns are selected explicitly, since ", ".join cannot be applied to the integer Age column.

stack could also be used to solve this problem, with stack and groupby like the below:

    print(
        df.set_index(["Name", "Age"])
        .stack()
        .reset_index()
        .rename(columns={"level_2": "Subjects", 0: "Grade"})
        .groupby("Grade", as_index=False)[["Name", "Subjects"]]
        .agg(", ".join)
    )

Here stack reshapes the dataframe in a way that is equivalent to melt; the code then resets the index, renames the columns, and groups and aggregates.

Both solutions output the following (entries within a group can appear in a different order between the two versions):

      Grade             Name                Subjects
    0     A         Foo, Tom           Math, English
    1    A+         Bob, Bar           Math, English
    2     B  John, John, Foo  Math, English, English
    3     C         Bob, Tom           English, Math
    4     D             Alex                    Math
    5     F        Bar, Alex           Math, English

Problem 4:

We first melt the dataframe to get the input data:

    df = df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades')

Now we can start solving Problem 4.

Problem 4 could be solved with pivot_table. We have to specify the pivot_table arguments values, index, and columns, and also aggfunc.

We could solve it with the below code:

    print(
        df.pivot_table("Grades", ["Name", "Age"], "Subject", aggfunc="first")
        .reset_index()
        .rename_axis(columns=None)
    )

Output:

       Name  Age English Math
    0  Alex   15       F    D
    1   Bar   15      A+    F
    2   Bob   13       C   A+
    3   Foo   16       B    A
    4  John   16       B    B
    5   Tom   13       A    C

The melted dataframe is converted back to the wide format of the original dataframe. We first pivot the melted dataframe, then reset the index and remove the column axis name.

Problem 5:

Problem 5 could be solved with melt and groupby like the following:

    print(
        df.melt(id_vars=["Name", "Age"], var_name="Subjects", value_name="Grades")
        .groupby("Name", as_index=False)[["Subjects", "Grades"]]
        .agg(", ".join)
    )

That melts and groups by Name, joining only the Subjects and Grades columns.

Or you could stack:

    print(
        df.set_index(["Name", "Age"])
        .stack()
        .reset_index()
        .rename({"level_2": "Subjects", 0: "Grades"}, axis=1)
        .groupby("Name", as_index=False)[["Subjects", "Grades"]]
        .agg(", ".join)
    )

Both codes output:

       Name       Subjects Grades
    0  Alex  Math, English   D, F
    1   Bar  Math, English  F, A+
    2   Bob  Math, English  A+, C
    3   Foo  Math, English   A, B
    4  John  Math, English   B, B
    5   Tom  Math, English   C, A

Problem 6:

Problem 6 could be solved with melt without specifying any columns; just specify the expected column names:

    print(df.melt(var_name='Column', value_name='Value'))

That melts the whole dataframe.

Or you could stack:

    print(
        df.stack()
        .reset_index(level=1)
        .sort_values("level_1")
        .reset_index(drop=True)
        .set_axis(["Column", "Value"], axis=1)
    )

Both codes produce the same column/value pairs, although the row order differs. The stack version, which sorts by column name, prints:

         Column Value
    0       Age    16
    1       Age    15
    2       Age    15
    3       Age    16
    4       Age    13
    5       Age    13
    6   English    A+
    7   English     B
    8   English     B
    9   English     A
    10  English     F
    11  English     C
    12     Math     C
    13     Math    A+
    14     Math     D
    15     Math     B
    16     Math     F
    17     Math     A
    18     Name  Alex
    19     Name   Bar
    20     Name   Tom
    21     Name   Foo
    22     Name  John
    23     Name   Bob

Conclusion:

melt is a really handy function and is often required. Once you meet these types of problems, don't forget to try melt; it may well solve your problem.
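As a rough illustration of the three steps described under "Logic to melting", the same long table can be built by hand: take the identifier columns once per measured column, tag each copy with the column name, and stack the copies. This is only a sketch to show the idea, not how pandas implements melt internally. The sample frame is an assumption reconstructed from the outputs shown in this answer, and the names pieces and manual are purely illustrative.

```python
import pandas as pd

# Sample data assumed to match the outputs shown in the answer.
df = pd.DataFrame(
    {
        "Name": ["Bob", "John", "Foo", "Bar", "Alex", "Tom"],
        "Math": ["A+", "B", "A", "F", "D", "C"],
        "English": ["C", "B", "B", "A+", "F", "A"],
        "Age": [13, 16, 16, 15, 15, 13],
    }
)

id_vars = ["Name", "Age"]
value_vars = ["Math", "English"]

# For each measured column: keep the identifiers, record the column name
# in "Subject", and move that column's values into a shared "Grades" column.
pieces = [
    df[id_vars].assign(Subject=col, Grades=df[col])
    for col in value_vars
]

# Stacking the pieces on top of each other gives the long format.
manual = pd.concat(pieces, ignore_index=True)

# The built-in method, for comparison.
melted = df.melt(id_vars=id_vars, var_name="Subject", value_name="Grades")

# Raises an AssertionError if the two frames differ in labels or values.
pd.testing.assert_frame_equal(manual, melted, check_dtype=False)
print(manual)
```

In practice df.melt is the better choice; the manual version is only useful for seeing where each row of the long table comes from.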

Dataset:
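A frame consistent with the outputs shown in the answer can be built as follows. This is a reconstruction, not the original listing: the values are read off those outputs, and the column order (Name, Math, English, Age) is an assumption.

```python
import pandas as pd

# Reconstructed from the outputs in the answer; the column order is assumed.
df = pd.DataFrame(
    {
        "Name": ["Bob", "John", "Foo", "Bar", "Alex", "Tom"],
        "Math": ["A+", "B", "A", "F", "D", "C"],
        "English": ["C", "B", "B", "A+", "F", "A"],
        "Age": [13, 16, 16, 15, 15, 13],
    }
)

print(df)
```

With df defined this way, the snippets in the answer can be run directly.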

Problems:

Problem 1:

Problem 2:

Problem 3:

Problem 4:

Problem 5:

Problem 6:

Please check my self-answer below :)
