I have a dataframe which I pivoted, and I now want to select specific rows from the result. I have seen similar questions, such as "Selecting columns in a pandas pivot table based on specific row value?". In my case I want to return all the columns but select only specific rows.
timestamp,value
2008-03-01 00:00:00,55.0
2008-03-01 00:15:00,20.0
2008-03-01 00:30:00,13.0
2008-03-01 00:45:00,78.0
2008-03-01 01:00:00,34.0
2008-03-01 01:15:00,123.0
2008-03-01 01:30:00,25.0
2008-03-01 01:45:00,91.0
2008-03-02 00:00:00,55.0
2008-03-02 00:15:00,46.0
2008-03-02 00:30:00,66.0
2008-03-02 00:45:00,24.0
2008-03-02 01:00:00,70.0
2008-03-02 01:15:00,32.0
2008-03-02 01:30:00,15.0
2008-03-02 01:45:00,92.0
I used the code below to generate the output that follows:
import pandas as pd
import numpy as np
from datetime import datetime

df = pd.read_csv('df.csv')
df.timestamp = pd.to_datetime(df.timestamp)


df = df.set_index('timestamp')

df['date'] = df.index.map(lambda t: t.date())
df['time'] = df.index.map(lambda t: t.time())
df_pivot = pd.pivot_table(df, values='value', index='timestamp', columns='time')
df_pivot = df_pivot.fillna(0.0)
print(df_pivot)
Generated output
time                 00:00:00  00:15:00  00:30:00  00:45:00  01:00:00  01:15:00  01:30:00  01:45:00
timestamp
2008-03-01 00:00:00      55.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0
2008-03-01 00:15:00       0.0      20.0       0.0       0.0       0.0       0.0       0.0       0.0
2008-03-01 00:30:00       0.0       0.0      13.0       0.0       0.0       0.0       0.0       0.0
2008-03-01 00:45:00       0.0       0.0       0.0      78.0       0.0       0.0       0.0       0.0
2008-03-01 01:00:00       0.0       0.0       0.0       0.0      34.0       0.0       0.0       0.0
2008-03-01 01:15:00       0.0       0.0       0.0       0.0       0.0     123.0       0.0       0.0
2008-03-01 01:30:00       0.0       0.0       0.0       0.0       0.0       0.0      25.0       0.0
2008-03-01 01:45:00       0.0       0.0       0.0       0.0       0.0       0.0       0.0      91.0
2008-03-02 00:00:00      55.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0
2008-03-02 00:15:00       0.0      46.0       0.0       0.0       0.0       0.0       0.0       0.0
2008-03-02 00:30:00       0.0       0.0      66.0       0.0       0.0       0.0       0.0       0.0
2008-03-02 00:45:00       0.0       0.0       0.0      24.0       0.0       0.0       0.0       0.0
2008-03-02 01:00:00       0.0       0.0       0.0       0.0      70.0       0.0       0.0       0.0
2008-03-02 01:15:00       0.0       0.0       0.0       0.0       0.0      32.0       0.0       0.0
2008-03-02 01:30:00       0.0       0.0       0.0       0.0       0.0       0.0      15.0       0.0
2008-03-02 01:45:00       0.0       0.0       0.0       0.0       0.0       0.0       0.0      92.0
I want to select, for example, only the data for 2008-03-01 00:00:00, 2008-03-01 01:15:00, and 2008-03-02 01:00:00.
Expected output
time                 00:00:00  00:15:00  00:30:00  00:45:00  01:00:00  01:15:00  01:30:00  01:45:00
timestamp
2008-03-01 00:00:00      55.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0
2008-03-01 01:15:00       0.0       0.0       0.0       0.0       0.0     123.0       0.0       0.0
2008-03-02 01:00:00       0.0       0.0       0.0       0.0      70.0       0.0       0.0       0.0
How can I do that?
Answer
Convert a list of datetime strings with to_datetime and use it to select rows by label with DataFrame.loc:
import pandas as pd

# create the DatetimeIndex while reading the file
df = pd.read_csv('df.csv', index_col='timestamp', parse_dates=['timestamp'])

# use the DatetimeIndex attributes instead of mapping a lambda
df['date'] = df.index.date
df['time'] = df.index.time
# fill_value replaces the separate fillna call
df_pivot = pd.pivot_table(df, values='value', index='timestamp', columns='time', fill_value=0)


L = ['2008-03-01 00:00:00', '2008-03-01 01:15:00', '2008-03-02 01:00:00']
df = df_pivot.loc[pd.to_datetime(L)]
print(df)

time                 00:00:00  00:15:00  00:30:00  00:45:00  01:00:00
2008-03-01 00:00:00        55         0         0         0         0
2008-03-01 01:15:00         0         0         0         0         0
2008-03-02 01:00:00         0         0         0         0        70

time                 01:15:00  01:30:00  01:45:00
2008-03-01 00:00:00         0         0         0
2008-03-01 01:15:00       123         0         0
2008-03-02 01:00:00         0         0         0
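To try the selection without a df.csv file on disk, here is a minimal self-contained sketch. It reads a shortened inline copy of the question's data, so the pivot has only five time columns. The names CSV_TEXT, wanted, extra and selected_mask are illustrative and not part of the original answer, and the 2008-03-05 timestamp is a made-up label that is deliberately absent from the data. The sketch also shows a boolean mask built with Index.isin as a possible alternative, under the assumption that you would rather skip missing timestamps than get a KeyError from .loc.

```python
import io

import pandas as pd

# Shortened inline copy of the question's CSV (illustrative subset of rows).
CSV_TEXT = """timestamp,value
2008-03-01 00:00:00,55.0
2008-03-01 01:15:00,123.0
2008-03-01 01:30:00,25.0
2008-03-02 01:00:00,70.0
2008-03-02 01:45:00,92.0
"""

# Parse the timestamp column straight into a DatetimeIndex.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col="timestamp", parse_dates=["timestamp"])
df["time"] = df.index.time
df_pivot = pd.pivot_table(df, values="value", index="timestamp", columns="time", fill_value=0)

# Row labels to keep, converted from strings to a DatetimeIndex.
wanted = pd.to_datetime(["2008-03-01 00:00:00", "2008-03-01 01:15:00", "2008-03-02 01:00:00"])

# Label-based selection: every requested timestamp must exist in the index.
selected = df_pivot.loc[wanted]

# Mask-based selection: timestamps that are not in the index are simply skipped.
extra = wanted.append(pd.to_datetime(["2008-03-05 00:00:00"]))  # hypothetical absent label
selected_mask = df_pivot[df_pivot.index.isin(extra)]

print(selected)
print(selected_mask)
```

Both selections return all columns for the three requested rows. The .loc form keeps the rows in the order of the list you pass in, while the isin mask keeps them in the order of the pivot's index.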