I have a file (CSV) containing data that looks something like this in a table format:
ID | employee | date | value1 | value2 |
---|---|---|---|---|
1 | a | 2022-01-01 | 123 | 456 |
2 | b | 2022-01-01 | 123 | 456 |
3 | a | 2022-01-01 | 123 | 456 |
4 | c | 2022-01-01 | 123 | 456 |
5 | d | 2022-01-01 | 123 | 456 |
6 | b | 2022-01-01 | 123 | 456 |
7 | e | 2022-01-01 | 123 | 456 |
8 | e | 2022-01-01 | 123 | 456 |
In Excel I created a macro with VBA that creates new workbooks for the table filtered by the employee column. The macro works dynamically by creating a dictionary of the unique values in column [employee] and a For Each loop that filters the original table by the key of the dictionary. In this example it would create 5 new workbooks (for employee “a”, “b”, “c”, “d” and “e”) displaying only their corresponding data. E.g. for employee a:
ID | employee | date | value1 | value2 |
---|---|---|---|---|
1 | a | 2022-01-01 | 123 | 456 |
3 | a | 2022-01-01 | 123 | 456 |
VBA code:
Sub Filter_Copy() Dim rng As Range Dim wb_new As Workbook Dim dic As Object Dim cell AsSub Range Dim key As Variant Set rng = Table1.ListObjects("table").Range Set dic = CreateObject("Scripting.Dictionary") For Each cell In Range("table[column]") dic(cell.Value) = 0 Next cell For Each key In dic.keys rng.AutoFilter 5, key Set wb_new = Workbooks.Add rng.SpecialCells(xlCellTypeVisible).Copy wb_new.Worksheets(1).Range("A1") wb_new.Close True, ThisWorkbook.Path & "" & key & ".xlsx" Next key End Sub
Now I want to do the exact same in Python but I am lost with using a for loop
to loop through a dictionary of the unique values in [employee]. I have no troubles doing it manually for every employee by creating a dedicated dataframe for each unique value, filtering it like df.employee == "a"
and saving it to_excel()
but doing it dynamically is something different.
So far I have the obvious:
import pandas as pd file = "filename.csv" df = pd.read_csv(file) dict = dict(df["employee"].unique())
Advertisement
Answer
Use pandas.groupby
and iterate through them:
groupobj = df.groupby('employee') for name,group in groupobj: group.to_excel(f'./{name}.xlsx')