I have a dataframe that contains the name of a company, the date, and the title of a headline published about that company on that day. Multiple headlines can be published on a single day, and each headline takes up its own row, even when the date is the same.
What I want to do is merge all the title rows by date, so that the Title column holds ALL the headlines published on that day. I tried doing it, but just messed up my dataframe.
Any help will be greatly appreciated!
Answer
You can groupby and aggregate:
from datetime import date

import pandas as pd

df = pd.DataFrame(
    {
        "company": ["GOOG", "GOOG", "META", "META"],
        "date": [
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 2),
        ],
        "title": ["google good", "google bad", "meta good", "meta bad"],
    }
)

df.groupby(["company", "date"]).aggregate(list).reset_index()
gives
  company        date                      title
0    GOOG  2022-06-01  [google good, google bad]
1    META  2022-06-01                [meta good]
2    META  2022-06-02                 [meta bad]
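If you would rather have one string per day than a list, you can join the titles instead. This is only a minimal sketch: it rebuilds the same example data so it runs on its own, and the " | " separator and the name `merged` are just assumptions for illustration, so pick whatever suits your data.

```python
from datetime import date

import pandas as pd

# Same example data as in the answer above.
df = pd.DataFrame(
    {
        "company": ["GOOG", "GOOG", "META", "META"],
        "date": [
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 2),
        ],
        "title": ["google good", "google bad", "meta good", "meta bad"],
    }
)

# Group by company and date, then join each group's titles into one string.
# The " | " separator is an arbitrary choice for this sketch.
merged = (
    df.groupby(["company", "date"])["title"]
    .agg(" | ".join)
    .reset_index()
)

print(merged)
```

If you want to merge by date alone, regardless of company, group on just "date" instead of ["company", "date"].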