
Merge Dataframe rows based on the date

I have a dataframe containing the name of a company, the date, and the title of a headline that was published about that company on that day. Several headlines can be published on a single day, and each one takes up its own row, even when the date is the same.

I want to merge all the title rows by date, so that the Title column holds all the headlines published on that day. I tried to do this, but only managed to mess up my dataframe.

Any help will be greatly appreciated!


Answer

You can group by company and date, then aggregate the titles of each group into a list:

from datetime import date

import pandas as pd

df = pd.DataFrame(
    {
        "company": ["GOOG", "GOOG", "META", "META"],
        "date": [
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 2),
        ],
        "title": ["google good", "google bad", "meta good", "meta bad"],
    }
)

df.groupby(["company", "date"]).aggregate(list).reset_index()

gives

  company        date                      title
0    GOOG  2022-06-01  [google good, google bad]
1    META  2022-06-01                [meta good]
2    META  2022-06-02                 [meta bad]
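
If you would rather have a single string per day than a list, a minimal sketch along the same lines is to aggregate with a string join. The " | " separator and the name `merged` are just assumptions for illustration; pick whatever suits your data.

```python
from datetime import date

import pandas as pd

df = pd.DataFrame(
    {
        "company": ["GOOG", "GOOG", "META", "META"],
        "date": [
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 2),
        ],
        "title": ["google good", "google bad", "meta good", "meta bad"],
    }
)

# Join every title in a (company, date) group into one string.
# The " | " separator is an arbitrary choice for this example.
merged = (
    df.groupby(["company", "date"])["title"]
    .agg(" | ".join)
    .reset_index()
)

print(merged)
```

This assumes every value in title is a string; missing values would need to be dropped or filled first, since str.join fails on non-string items.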
User contributions licensed under: CC BY-SA