I have a file (CSV) containing data that looks something like this in a table format:
| ID | employee | date | value1 | value2 | 
|---|---|---|---|---|
| 1 | a | 2022-01-01 | 123 | 456 | 
| 2 | b | 2022-01-01 | 123 | 456 | 
| 3 | a | 2022-01-01 | 123 | 456 | 
| 4 | c | 2022-01-01 | 123 | 456 | 
| 5 | d | 2022-01-01 | 123 | 456 | 
| 6 | b | 2022-01-01 | 123 | 456 | 
| 7 | e | 2022-01-01 | 123 | 456 | 
| 8 | e | 2022-01-01 | 123 | 456 | 
In Excel I created a macro with VBA that creates new workbooks for the table filtered by the employee column. The macro works dynamically by creating a dictionary of the unique values in column [employee] and a For Each loop that filters the original table by the key of the dictionary. In this example it would create 5 new workbooks (for employee “a”, “b”, “c”, “d” and “e”) displaying only their corresponding data. E.g. for employee a:
| ID | employee | date | value1 | value2 | 
|---|---|---|---|---|
| 1 | a | 2022-01-01 | 123 | 456 | 
| 3 | a | 2022-01-01 | 123 | 456 | 
VBA code:
Sub Filter_Copy()
Dim rng As Range
Dim wb_new As Workbook
Dim dic As Object
Dim cell AsSub Range
Dim key As Variant
Set rng = Table1.ListObjects("table").Range
Set dic = CreateObject("Scripting.Dictionary")
For Each cell In Range("table[column]")
    dic(cell.Value) = 0
Next cell
For Each key In dic.keys
    rng.AutoFilter 5, key
    Set wb_new = Workbooks.Add
    rng.SpecialCells(xlCellTypeVisible).Copy wb_new.Worksheets(1).Range("A1")
    wb_new.Close True, ThisWorkbook.Path & "" & key & ".xlsx"
    
Next key
End Sub
Now I want to do the exact same in Python but I am lost with using a for loop to loop through a dictionary of the unique values in [employee]. I have no troubles doing it manually for every employee by creating a dedicated dataframe for each unique value, filtering it like df.employee == "a" and saving it to_excel() but doing it dynamically is something different.
So far I have the obvious:
import pandas as pd file = "filename.csv" df = pd.read_csv(file) dict = dict(df["employee"].unique())
Advertisement
Answer
Use pandas.groupby and iterate through them:
groupobj = df.groupby('employee')
for name,group in groupobj:
    group.to_excel(f'./{name}.xlsx')
