Tag: pandas

Replicating data in the same DataFrame

I want to replicate the data from the same dataframe when a certain condition is fulfilled. Dataframe: I want to replicate the dataframe when going through a loop and there is a difference greater than 4 in row.hour. Expected Output: I want to replicate the rows when iterating through all the rows and ther…

Have the same index for multiple rows in dataframe, shown vertically

dataframe pandas python

I am trying to create a dataframe that looks similar to an Excel file, something like this: The code I am using right now: 1) Import packages What I am getting: What I want to do is have the dataframe show the index vertically, using the same index for 3 consecutive rows at a time. Answer How about this? I

Dask dataframe crashes

dask dask-dataframe pandas python

I’m loading a large parquet dataframe using Dask but can’t seem to be able to do anything with it without the system crashing on me or getting a million errors and no output. The data weighs about 165M compressed, or 13G once loaded in pandas (it fits well in the 45G RAM available). Instead, if us…

Set dictionary keys as cells in dataframe column

pandas python

Please look at my code: Here I convert a dictionary to a DataFrame and set the index as a new column. Can it be done in one line at the stage of converting the dictionary to a DataFrame? I want the major indices to be recognized immediately as cells of the new column. Something like Answer You can just reset_index() to creat…
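A minimal sketch of the reset_index() approach the answer mentions. The question's code is not shown, so the dictionary `data` and the column names `key` and `value` are hypothetical.

```python
import pandas as pd

# Hypothetical input; the question's actual dictionary is not shown.
data = {"a": 1, "b": 2, "c": 3}

# Build a Series from the dict (its keys become the index), name the index,
# then move it into a regular column, all in one expression.
df = pd.Series(data, name="value").rename_axis("key").reset_index()

print(df)
```

The dictionary keys end up as ordinary cells in the `key` column, and the frame gets a default integer index.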

Pandas pivot_table aggfunc ignores categories if more than one line of data is being aggregated

dataframe pandas python

I am trying to aggregate a dataframe using pandas.pivot_table and find it behaves differently when multiple lines are aggregated on a categorical series. Code from this issue helps explain (though the issue is different from mine). Setting up a dataframe with a categorical column: If I pivot the dataframe wit…

Return the last non-zero value in a pandas df

pandas python

I have a dataframe The logic is if col1 is not zero, return col1. If col1 is zero, return col2 (non-zero). If col2 is zero, return col3. We don’t need to do anything for col4 My code looks like below but it only returns col1 I tried .any() and .all(), it doesn’t work either. Also, is there any way
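One possible way to express the col1, col2, col3 fallback described above is numpy.select. This is a sketch only: the sample values and the output column name `result` are assumptions, since the question's dataframe and code are not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the question's dataframe is not shown.
df = pd.DataFrame({
    "col1": [5, 0, 0, 0],
    "col2": [1, 7, 0, 0],
    "col3": [2, 3, 9, 0],
    "col4": [4, 4, 4, 4],  # left untouched, as the question asks
})

# Conditions are checked in order: take col1 if it is non-zero,
# otherwise col2 if it is non-zero, otherwise fall back to col3.
df["result"] = np.select(
    [df["col1"] != 0, df["col2"] != 0],
    [df["col1"], df["col2"]],
    default=df["col3"],
)

print(df)
```

Because the conditions are evaluated element-wise, this avoids the row-level ambiguity that .any() and .all() introduce.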

null out n% values in series dictionary python

pandas python

How can I randomly make n% of the values null in a pandas series? Let’s say I want 20% null values in my dictionary, series, or list. input something = expected output with 20% null = Answer You can just use series.sample(frac=%) to index and set the values in the original series to None.
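A short sketch of the sample-based approach from the answer, assuming a hypothetical ten-element float series; `random_state` is set only to make the example repeatable.

```python
import pandas as pd

# Hypothetical input series; float dtype so missing values are stored as NaN.
s = pd.Series(range(10), dtype="float64")

# Randomly pick 20% of the entries and keep only their index labels.
idx = s.sample(frac=0.2, random_state=0).index

# Null out the selected entries in the original series.
s.loc[idx] = None

print(s)
```

For a dictionary or list, one option is to convert to a Series first and convert back afterwards.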

ImportError while importing pandas: gettz not built

importerror pandas python python-import

Suddenly my Python file won’t run anymore due to an ImportError. I already tried updating/reinstalling pandas via conda but this didn’t change anything. What could I try to fix this? Answer As suggested by furas, installing the package dateutil fixed the error.

Using python pandas how can we select very specific rows and associated column

dataframe pandas python

I am still learning Python, so please excuse me if the question looks trivial to some. I have a csv file with the following format and I want to extract a small segment of it and write it to another csv file: So, this is what I want to do: Just extract the entries under actor_list2 and the corresponding id column and writ…

How to add row to dataframe that keeps tuple as a tuple rather than splitting out into two elements?

pandas python

I have a dataframe that I’d like to iteratively add rows to. The columns are an integer for ‘time’, an (x, y) coordinate pair, and a dichotomous status. For the sake of this example, I will just demonstrate the issue with adding one row to the dataframe, rather than…
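The question's code and its answer are cut off, so the following is only a sketch of one way to keep the tuple intact, not necessarily the approach from the original thread. It swaps row-by-row appending for collecting rows as dictionaries and building the dataframe once; the column names `time`, `coords`, and `status` are assumptions based on the description.

```python
import pandas as pd

# Collect each new row as a dict; the tuple is a single value under one key,
# so pandas stores it as one object cell instead of unpacking it.
rows = []
rows.append({"time": 0, "coords": (1.0, 2.0), "status": True})
rows.append({"time": 1, "coords": (1.5, 2.5), "status": False})

# Build the dataframe once, after the loop has finished.
df = pd.DataFrame(rows)

print(df)
print(type(df.loc[0, "coords"]))
```

Building the frame once at the end also avoids repeatedly copying the dataframe inside the loop.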