Tag: pandas

Pandas dataframe get first row of each group

I have a pandas DataFrame like the following: I want to group this by ["id", "value"] and get the first row of each group: Expected outcome: I tried the following, which only gives the first row of the DataFrame. Any help regarding this is appreciated. Answer If you need id as column: To get n …
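A minimal sketch of the grouping described in this excerpt. The sample frame and the `score` column are hypothetical, since the question's data is not shown here; `as_index=False` is one way to keep `id` as an ordinary column.

```python
import pandas as pd

# Hypothetical sample data; the question's own frame is not shown in the excerpt.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "value": ["a", "a", "b", "c", "c", "d"],
    "score": [10, 20, 30, 40, 50, 60],
})

# first() returns the first non-null entry of each column per group;
# as_index=False keeps "id" and "value" as regular columns.
first_rows = df.groupby(["id", "value"], as_index=False).first()

# head(n) keeps the first n original rows of each group, index intact.
first_one = df.groupby(["id", "value"]).head(1)

print(first_rows)
print(first_one)
```

Note that `first()` skips nulls column by column, whereas `head(1)` returns the literal first row of each group, so the two can differ when the data contains missing values.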

Convert one row of a pandas dataframe into multiple rows

I want to turn this: Into this: Context: I have data stored with one value coded for all ages (age = 99). However, the application I am developing for needs the value explicitly stated for every id-age pair (id = 1, age = 25, 50, and 75). There are simple solutions to this: iterate over ids and append a …
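One possible way to expand the "all ages" rows without an explicit loop, sketched with `DataFrame.explode`. The column names `id`, `age`, and `value` and the sample numbers are assumptions based on the excerpt, not the asker's actual data.

```python
import pandas as pd

# Hypothetical input: age == 99 is the code for "all ages".
df = pd.DataFrame({"id": [1, 2], "age": [99, 99], "value": [0.5, 0.7]})
ages = [25, 50, 75]  # explicit ages named in the question

expanded = (
    # Swap the 99 code for the list of real ages, leave other ages alone.
    df.assign(age=df["age"].map(lambda a: ages if a == 99 else [a]))
      .explode("age")            # one row per list element
      .astype({"age": int})      # explode leaves an object column
      .reset_index(drop=True)
)

print(expanded)
```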

Comparing two pandas dataframes for differences

I’ve got a script updating 5-10 columns' worth of data, but sometimes the start CSV will be identical to the end CSV, so instead of writing an identical CSV file I want it to do nothing… How can I compare two dataframes to check if they’re the same or not? Any ideas? Answer You also need to be…
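A small sketch of one way to make that check, using `DataFrame.equals`. The frames and the helper name `frames_differ` are hypothetical; the truncated answer may recommend something else.

```python
import pandas as pd

# Hypothetical "start" and "end" data standing in for the two CSV files.
before = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
after_same = before.copy()
after_changed = before.copy()
after_changed.loc[1, "b"] = "z"


def frames_differ(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Return True when the two frames are not identical.

    equals() compares shape, values, and dtypes, and treats NaNs in the
    same positions as equal.
    """
    return not left.equals(right)


print(frames_differ(before, after_same))
print(frames_differ(before, after_changed))
```

A script could then write the output file only when `frames_differ(...)` is true.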

Convert pandas DataFrame to a nested dict

I’m looking for a generic way of turning a DataFrame into a nested dictionary. This is a sample data frame. The number of columns may differ, and so may the column names. Like this: What is the best way to achieve this? The closest I got was with the zip function, but I haven’t managed to make it work for more
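The target structure is not visible in this excerpt, so the following is only a sketch under an assumed shape: each column except the last becomes one nesting level, and the last column supplies the leaf values. The function name `to_nested` and the sample data are hypothetical.

```python
import pandas as pd


def to_nested(df: pd.DataFrame):
    """Recursively nest a frame by its leading columns (assumed target shape)."""
    if df.shape[1] == 1:
        # Last column left: return its values as the leaf list.
        return df.iloc[:, 0].tolist()
    first = df.columns[0]
    # Group on the leading column, then recurse on the remaining columns.
    return {
        key: to_nested(group.drop(columns=first))
        for key, group in df.groupby(first)
    }


# Hypothetical sample; works for any number of columns.
sample = pd.DataFrame({
    "name": ["A", "A", "B"],
    "v1": ["A1", "A2", "B1"],
    "v2": [1, 2, 3],
})
nested = to_nested(sample)
print(nested)
```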

Change one value based on another value in pandas

I’m trying to reproduce my Stata code in Python, and I was pointed in the direction of Pandas. I am, however, having a hard time wrapping my head around how to process the data. Let’s say I want to iterate over all values in the column head ‘ID.’ If that ID matches a specific number, t…
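In pandas this kind of conditional update is usually written as a boolean mask with `.loc` rather than as a loop. A minimal sketch follows; the `FirstName` column and all values are hypothetical, since the excerpt is cut off before the details.

```python
import pandas as pd

# Hypothetical data; only the "ID" column name comes from the excerpt.
df = pd.DataFrame({
    "ID": [101, 102, 103],
    "FirstName": ["Ann", "Bob", "Cy"],
})

# Select the rows where ID matches, then assign to the target column.
df.loc[df["ID"] == 103, "FirstName"] = "Matt"

print(df)
```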

Modify output from Python Pandas describe

Is there a way to omit some of the output from the pandas describe? This command gives me exactly what I want with a table output (count and mean of executeTime’s by a simpleDate) However that’s all I want, count and mean. I want to drop std, min, max, etc… So far I’ve only read how to…
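Two ways this could be done, sketched on made-up data: request only the statistics you want with `agg`, or slice the columns of the `describe()` result. The column names `simpleDate` and `executeTime` are taken from the excerpt; the values are hypothetical.

```python
import pandas as pd

# Hypothetical timings per date.
df = pd.DataFrame({
    "simpleDate": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "executeTime": [10.0, 20.0, 30.0],
})

grouped = df.groupby("simpleDate")["executeTime"]

# Option 1: compute only count and mean.
summary = grouped.agg(["count", "mean"])

# Option 2: run describe() and keep just the wanted columns.
trimmed = grouped.describe()[["count", "mean"]]

print(summary)
print(trimmed)
```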

extracting days from a numpy.timedelta64 value

I am using pandas/python and I have two date time series, s1 and s2, that have been generated using the ‘to_datetime’ function on a field of the df containing dates/times. When I subtract s1 from s2 (s3 = s2 - s1) I get a series, s3, of type timedelta64[ns]. How do I look at 1 element of the ser…
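A short sketch of how day counts can be read from such a series. The names `s1`, `s2`, and `s3` follow the excerpt; the timestamps are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps, all in one consistent format.
s1 = pd.to_datetime(pd.Series(["2024-01-01 00:00", "2024-01-10 00:00"]))
s2 = pd.to_datetime(pd.Series(["2024-01-04 00:00", "2024-01-15 12:00"]))

s3 = s2 - s1  # Series of dtype timedelta64[ns]

# Whole days for every element, via the .dt accessor.
days = s3.dt.days

# A single element is a pandas Timedelta with a .days attribute.
first_days = s3.iloc[0].days

# Fractional days: divide by a one-day Timedelta.
fractional = s3 / pd.Timedelta(days=1)

print(days, first_days, fractional, sep="\n")
```

`.dt.days` truncates to whole days, so the division form is the one to use when partial days matter.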