Is there a better way to increment a timestamp column in a pandas dataframe?

I’m working with a large pandas dataframe and want to add a timestamp column which correlates to the value of another column. For example, the current dataframe looks like this:

Server	Hour
server1	0
server2	0
server1000	0
server1	1
server2	1

and so on, with the Hour column ranging from 0 to 167, corresponding to the hourly timestamps of the following week.

I have the following code which establishes the weekly timestamps:

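from datetime import datetime, timedelta

today = datetime.today()
# Monday of the current week at midnight
start = (today - timedelta(days=today.weekday())).replace(hour=0, minute=0, second=0, microsecond=0)
# Six days later (Sunday) at midnight
end = (start + timedelta(days=6)).replace(hour=0, minute=0, second=0, microsecond=0)
print("end: " + str(end))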

From there, I try to create the new “time” column arithmetically:

end = end.timestamp()
total_df['time'] = end
total_df['time'] = total_df['time'].astype(float)  # convert to a float so I can add the scaled hour column

total_df['time'] = total_df['time'] + 3600 * total_df['hour']  # standardize timestamp to Sunday since the initial "end" was monday

Then I convert the time column back to a string and convert the unix timestamp to datetime

total_df['time'] = total_df['time'].astype(str)
total_df['time'] = pd.to_datetime(total_df['time'], unit='s', utc=True)

Unfortunately, this method doesn’t use my current timezone and standardizes to UTC, so the finalized hourly timestamps are 4 hours ahead of where they should be. I can account for this by subtracting 4 hours before conversion, but I feel like there must be a cleaner way to do this using datetime. My solution seems like such a roundabout way to say “add however many hours are in the hour column.”

My expected output should look like this:

Server	Hour	Time
server1	0	2022-04-24 00:00:00-04:00
server2	0	2022-04-24 00:00:00-04:00
serverx	0	2022-04-24 00:00:00-04:00
server1000	0	2022-04-24 00:00:00-04:00
server1	1	2022-04-24 01:00:00-04:00
server2	1	2022-04-24 01:00:00-04:00
serverx	1	2022-04-24 01:00:00-04:00
server1000	1	2022-04-24 01:00:00-04:00
x	x	x
server1000	167	2022-04-30 23:00:00-04:00

with the “x” and “serverx” covering all of the server and hour values between 1 and 1000 and 1 and 167, respectively.

Alternatively, is there an easy way to convert between time zones? My current output column looks like it should, except it’s in UTC, and I’d like it in EST.

Answer

Do I understand correctly that you start out with a dataframe that has an hour column, for example:

df = pd.DataFrame({'hour': range(5)})

   hour
0     0
1     1
2     2
3     3
4     4

In this case you could try the following:

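from datetime import date, datetime, timedelta

import pandas as pd

start = date.today()
df['time'] = (
    datetime(start.year, start.month, start.day)
    + timedelta(days=6 - start.weekday())
    # pd.to_timedelta is used here because recent pandas versions
    # reject .astype('timedelta64[h]') on an integer column
    + pd.to_timedelta(df['hour'], unit='h')
).dt.tz_localize('EST')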

Result:

   hour                      time
0     0 2022-04-24 00:00:00-05:00
1     1 2022-04-24 01:00:00-05:00
2     2 2022-04-24 02:00:00-05:00
3     3 2022-04-24 03:00:00-05:00
4     4 2022-04-24 04:00:00-05:00
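
Note that 'EST' is a fixed -05:00 offset, while the expected output in the question shows -04:00 (daylight saving time in April). A minimal sketch of one way to get that, assuming the servers are on US Eastern time and using the region zone 'America/New_York' (my assumption, not something stated in the question). The week start is hard-coded here only to make the example reproducible:

```python
import pandas as pd

# Hypothetical fixed week start (Sunday 2022-04-24), used instead of
# date.today() so the result does not depend on when this is run.
week_start = pd.Timestamp('2022-04-24')

df = pd.DataFrame({'hour': range(5)})

# Add the hour offsets to the naive week start, then attach the region
# time zone, which applies daylight saving time where relevant.
df['time'] = (
    week_start + pd.to_timedelta(df['hour'], unit='h')
).dt.tz_localize('America/New_York')

print(df)
```

For this week the timestamps come out with a -04:00 offset. One caveat: in a week that contains a daylight saving switch, localizing naive wall-clock times can hit ambiguous or nonexistent hours, so that case may need extra handling.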

JavaScript
 
   hour                      time
0     0 2022-04-24 00:00:00-05:00
1     1 2022-04-24 01:00:00-05:00
2     2 2022-04-24 02:00:00-05:00
3     3 2022-04-24 03:00:00-05:00
4     4 2022-04-24 04:00:00-05:00
​

Or use an explicit timezone offset:

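from datetime import date, datetime, timedelta, timezone

import pandas as pd

start = date.today()
df['time'] = (
    datetime(
        start.year, start.month, start.day, tzinfo=timezone(timedelta(hours=-5))
    )
    + timedelta(days=6 - start.weekday())
    # pd.to_timedelta is used here because recent pandas versions
    # reject .astype('timedelta64[h]') on an integer column
    + pd.to_timedelta(df['hour'], unit='h')
)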
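
As for the alternative question about converting between time zones: if the column already holds UTC timestamps, `tz_convert` should be enough. A rough sketch, where the `epoch` column and its sample values are hypothetical stand-ins for the unix timestamps computed in the question, and 'America/New_York' is again my assumption for the target zone:

```python
import pandas as pd

# Hypothetical unix timestamps in seconds:
# 2022-04-24 04:00 UTC and 2022-04-24 05:00 UTC.
df = pd.DataFrame({'epoch': [1650772800, 1650776400]})

# Parse the seconds as timezone-aware UTC timestamps ...
utc_times = pd.to_datetime(df['epoch'], unit='s', utc=True)

# ... then express the same instants in the target time zone.
df['time'] = utc_times.dt.tz_convert('America/New_York')

print(df['time'])
```

`tz_convert` only changes how the instant is displayed, so no manual subtraction of 4 hours is needed.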

Answer