Skip to content
Advertisement

How do I melt a pandas dataframe?

On the tag, I often see users asking questions about melting dataframes in pandas. I am gonna attempt a cannonical Q&A (self-answer) with this topic.

I am gonna clarify:

  1. What is melt?

  2. How do I use melt?

  3. When do I use melt?

I see some hotter questions about melt, like:

So I am gonna attempt a canonical Q&A for this topic.



Dataset:

I will have all my answers on this dataset of random grades for random people with random ages (easier to explain for the answers :D):

JavaScript

Problems:

I am gonna have some problems and they will be solved in my self-answer below.

Problem 1:

How do I melt a dataframe so that the original dataframe becomes:

JavaScript

I want to transpose this so that one column would be each subject and the other columns would be the repeated names of the students and there age and score.

Problem 2:

This is similar to Problem 1, but this time I want to make the Problem 1 output Subject column only have Math, I want to filter out the English column:

JavaScript

I want the output to be like the above.

Problem 3:

If I was to group the melt and order the students by there scores, how would I be able to do that, to get the desired output like the below:

JavaScript

I need it to be ordered and the names separated by comma and also the Subjects separated by comma in the same order respectively

Problem 4:

How would I unmelt a melted dataframe? Let’s say I already melted this dataframe:

JavaScript

To become:

JavaScript

Then how would I translate this back to the original dataframe, the below:

JavaScript

How would I go about doing this?

Problem 5:

If I was to group by the names of the students and separate the subjects and grades by comma, how would I do it?

JavaScript

I want to have a dataframe like above.

Problem 6:

If I was gonna completely melt my dataframe, all columns as values, how would I do it?

JavaScript

I want to have a dataframe like above. All columns as values.

Please check my self-answer below :)

Advertisement

Answer

Note for pandas versions < 0.20.0: I will be using df.melt(...) for my examples, but you will need to use pd.melt(df, ...) instead.

Documentation references:

Most of the solutions here would be used with melt, so to know the method melt, see the documentaion explanation

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.

Parameters

  • id_vars : tuple, list, or ndarray, optional

    Column(s) to use as identifier variables.

  • value_vars : tuple, list, or ndarray, optional

    Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.

  • var_name : scalar

    Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.

  • value_name : scalar, default ‘value’

    Name to use for the ‘value’ column.

  • col_level : int or str, optional

    If columns are a MultiIndex then use this level to melt.

  • ignore_index : bool, default True

    If True, original index is ignored. If False, the original index is retained. Index labels will be repeated as necessary.

    New in version 1.1.0.

Logic to melting:

Melting merges multiple columns and converts the dataframe from wide to long, for the solution to Problem 1 (see below), the steps are:

  1. First we got the original dataframe.

  2. Then the melt firstly merges the Math and English columns and makes the dataframe replicated (longer).

  3. Then finally adds the column Subject which is the subject of the Grades columns value respectively.

This is the simple logic to what the melt function does.

Solutions:

I will solve my own questions.

Problem 1:

Problem 1 could be solve using pd.DataFrame.melt with the following code:

JavaScript

This code passes the id_vars argument to ['Name', 'Age'], then automatically the value_vars would be set to the other columns (['Math', 'English']), which is transposed into that format.

You could also solve Problem 1 using stack like the below:

JavaScript

This code sets the Name and Age columns as the index and stacks the rest of the columns Math and English, and resets the index and assigns Grade as the column name, then renames the other column level_2 to Subject and then sorts by the Subject column, then finally resets the index again.

Both of these solutions output:

JavaScript

Problem 2:

This is similar to my first question, but this one I only one to filter in the Math columns, this time the value_vars argument can come into use, like the below:

JavaScript

Or we can also use stack with column specification:

JavaScript

Both of these solutions give:

JavaScript

Problem 3:

Problem 3 could be solved with melt and groupby, using the agg function with ', '.join, like the below:

JavaScript

It melts the dataframe then groups by the grades and aggregates them and joins them by a comma.

stack could be also used to solve this problem, with stack and groupby like the below:

JavaScript

This stack function just transposes the dataframe in a way that is equivalent to melt, then resets the index, renames the columns and groups and aggregates.

Both solutions output:

JavaScript

Problem 4:

We first melt the dataframe for the input data:

JavaScript


Then now we can start solving this Problem 4.

Problem 4 could be solved with pivot_table, we would have to specify to the pivot_table arguments, values, index, columns and also aggfunc.

We could solve it with the below code:

JavaScript

Output:

JavaScript

The melted dataframe is converted back to the exact same format as the original dataframe.

We first pivot the melted dataframe and then reset the index and remove the column axis name.

Problem 5:

Problem 5 could be solved with melt and groupby like the following:

JavaScript

That melts and groups by Name.

Or you could stack:

JavaScript

Both codes output:

JavaScript

Problem 6:

Problem 6 could be solved with melt and no column needed to be specified, just specify the expected column names:

JavaScript

That melts the whole dataframe

Or you could stack:

JavaScript

Both codes output:

JavaScript

Conclusion:

melt is a really handy function, often it’s required, once you meet these types of problems, don’t forget to try melt, it may well solve your problem.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement