Tag: pandas

pandas dataframe loc usage: what does supplying length of index to loc actually mean?

I have read about DataFrame loc, but I could not understand why the length of the DataFrame (indexPD) is being supplied to loc as the first argument. Basically, what does this loc indicate? Answer: That simply tells pandas that you want to perform the operation on all of the rows of that column of your DataFrame. Consider this…
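The original code is not shown in this excerpt, so the following is only a sketch under the assumption that the length was used as the end of a label slice, as in df.loc[:len(df), column]. The frame and column names are made up for illustration. Because loc slices by label and includes the end label, a slice up to len(df) on a default integer index covers every row.

```python
import pandas as pd

# Hypothetical frame with a default RangeIndex (labels 0, 1, 2).
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

n = len(df)  # 3, one past the last label

# Label-based slice: every label up to and including n.
# Label n does not exist, so the slice simply covers all rows.
all_rows = df.loc[:n, "a"]

# The more common spelling of "all rows of column a".
same = df.loc[:, "a"]
```

Under this assumption the two selections are identical, which is why the answer describes it as operating on all rows of the column.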

Speeding-up pandas column operation based on several rules

bipartite networking pandas performance python

I have a data frame consisting of 5.1 million rows. Now, consider only a query of my data frame which has the following form:
date    ID1  ID2
201908  a    X
201905  b    Y
201811  a    Y
201807  a    Z
You can assume that the date is sorted and that there are no duplicates in the subset [‘ID1’, ‘ID2’]. Now, …

How to store CSV file in database?

csv mysql pandas python

There is an output file from Python pandas with a lot of columns with headers. I need to be able to handle this file with a script and get CSV files with the columns in different positions. For example, the initial file has columns As a variation, I need to get it in a different sequence: I wonder what is the best way to store this fi…
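The column lists are missing from this excerpt, so the names below are hypothetical. As a minimal sketch of the reordering step only (not the database storage the question asks about), selecting columns with a list yields them in that order, and to_csv then writes them in that order.

```python
import pandas as pd

# Hypothetical data; the real column names are not shown in the excerpt.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Desired output order (an assumption for illustration).
new_order = ["c", "a", "b"]

# Selecting with a list returns the columns in that order;
# to_csv without a path returns the CSV text as a string.
csv_text = df[new_order].to_csv(index=False)
header = csv_text.splitlines()[0]
```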

Exponential fit in pandas

numpy pandas python

I have this data: The data seems to follow an exponential curve. Let’s see the plot: I want to fit an exponential curve ($$ y = Ae^{Bx} $$, A times e to the B*X) and add it as a column in pandas. First I tried to take the log of the values: And then to use NumPy to fit the equation: But I get
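The data and the error from the question are not shown here, so this is only a sketch on synthetic data with assumed column names x and y. Taking logs turns $$ y = Ae^{Bx} $$ into log(y) = log(A) + B*x, which is a straight line that np.polyfit can fit with degree 1. This requires all y values to be positive.

```python
import numpy as np
import pandas as pd

# Synthetic data generated with A = 2.0 and B = 0.5 (assumed values).
df = pd.DataFrame({"x": np.arange(6, dtype=float)})
df["y"] = 2.0 * np.exp(0.5 * df["x"])

# Fit a line to (x, log y): slope is B, intercept is log(A).
B, log_A = np.polyfit(df["x"], np.log(df["y"]), 1)
A = np.exp(log_A)

# Add the fitted curve as a new column.
df["fit"] = A * np.exp(B * df["x"])
```

Note that fitting in log space weights small y values more heavily than a direct least-squares fit on y would; for noisy data a nonlinear fit may give different parameters.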

Identify the columns which contain zero and output its location

pandas python

Suppose I have a dataframe where some columns contain a zero value as one of their elements (or potentially more than one zero). I don’t specifically want to retrieve these columns or discard them (I know how to do that); I just want to locate them. For instance: if there are zeros somewhere …
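One possible way to locate the zeros, sketched on a made-up frame (the question's own data is not shown): build a boolean mask with eq(0), use any() to find the columns that contain a zero, then read off the row labels per column.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 0, 3], "b": [4, 5, 6], "c": [0, 8, 0]})

# True wherever a cell equals zero.
mask = df.eq(0)

# Names of columns that contain at least one zero.
cols_with_zero = mask.columns[mask.any()].tolist()

# Row labels of the zeros, per column.
zero_locations = {col: mask.index[mask[col]].tolist() for col in cols_with_zero}
```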

Multiply pandas dataframe with a differently shaped dataframe based on condition

dataframe pandas python

I have a pandas DataFrame (df_A) with this basic form: Furthermore, I have another DataFrame (df_B): What I want to do is multiply the values of the second DataFrame by the values of the first where the alt value is the same. I also do not want the d or e columns to be involved in the multiplication. So I

Get only data that are repeated any one of the given year in pandas

pandas python

Below is the raw data. I want only those events that repeat within a given list of years, e.g. [2012, 2013]. So it should return data only if an event is repeated in one of the years in the list. I want the output below. Answer: You can try groupby and filter
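The raw data and the answer's code are not included in this excerpt. The sketch below shows one way groupby and filter could be combined, assuming hypothetical columns event and year and reading "repeated" as "occurs more than once within a single listed year". A different reading of the requirement would need a different predicate.

```python
import pandas as pd

# Hypothetical data; column names are assumptions.
df = pd.DataFrame({
    "event": ["A", "A", "B", "B", "C", "C"],
    "year": [2012, 2012, 2012, 2013, 2014, 2014],
})

years = [2012, 2013]

def repeated_in_given_year(group):
    # Count occurrences per listed year for this event.
    counts = group.loc[group["year"].isin(years), "year"].value_counts()
    # Keep the event if any listed year holds it more than once.
    return bool((counts > 1).any())

# filter keeps all rows of every group for which the predicate is True.
result = df.groupby("event").filter(repeated_in_given_year)
```

Here only event A is kept: B appears once in each listed year, and C repeats only in 2014, which is not in the list.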

problem with pd.wide_to_long specifications

pandas python

I have a dataframe that looks like the following: id xx_04-Feb-94 yyy_04-Feb-94 z_04-Feb-94 xx_22-Mar-94 yyy_22-Mar-94 z_22-Mar-94 123 456 789 with values inside the table filled out. I would like to pivot the data from wide to long. The desired output looks as follows: id date xx yyy z 123 04-Feb-94 123 22-M…
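A sketch of how pd.wide_to_long might be specified for column names of this shape, using made-up values since the question's are not shown. The default suffix pattern matches digits only, so a date suffix such as 04-Feb-94 needs an explicit suffix regex, together with sep="_".

```python
import pandas as pd

# Hypothetical values; only the column naming follows the question.
df = pd.DataFrame({
    "id": [123, 456],
    "xx_04-Feb-94": [1, 2],
    "yyy_04-Feb-94": [3, 4],
    "z_04-Feb-94": [5, 6],
    "xx_22-Mar-94": [7, 8],
    "yyy_22-Mar-94": [9, 10],
    "z_22-Mar-94": [11, 12],
})

long_df = pd.wide_to_long(
    df,
    stubnames=["xx", "yyy", "z"],  # prefixes before the separator
    i="id",                        # identifier column
    j="date",                      # name for the extracted suffix
    sep="_",
    suffix=r".+",                  # accept non-numeric suffixes
).reset_index()
```

The result has one row per (id, date) pair with columns id, date, xx, yyy and z.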

Calculate column value count as a bar plot in Python dataframe

bar-chart dataframe pandas pandas-groupby python

I have time series data and want to see the total number of septic (1) and non-septic (0) patients in the SepsisLabel column. The non-septic patients have no entries of ‘1’, while the septic patients first have zeros (0) and then change to ‘1’, which means the patient now becomes se…
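A minimal sketch of counting patients rather than rows, assuming a hypothetical patient_id column that identifies each patient (the excerpt does not show one). A patient counts as septic if SepsisLabel is 1 at any time, so the per-patient maximum gives the patient-level label, and value_counts gives the totals to plot.

```python
import pandas as pd

# Hypothetical time series; patient_id is an assumed column name.
df = pd.DataFrame({
    "patient_id": ["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3"],
    "SepsisLabel": [0, 0, 1, 0, 0, 0, 1, 1],
})

# 1 if the patient was ever labelled septic, else 0.
per_patient = df.groupby("patient_id")["SepsisLabel"].max()

# Number of non-septic (0) and septic (1) patients.
counts = per_patient.value_counts().sort_index()

def plot_counts(counts):
    # matplotlib is imported lazily; it is needed only for drawing.
    import matplotlib.pyplot as plt
    ax = counts.plot(kind="bar")
    ax.set_xlabel("SepsisLabel")
    ax.set_ylabel("Number of patients")
    plt.show()
```

Calling plot_counts(counts) draws the bar chart; counting rows directly with df["SepsisLabel"].value_counts() would instead count time steps, not patients.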