Tag: pandas

plyr or dplyr in Python

This is more of a conceptual question; I do not have a specific problem. I am learning Python for data analysis, but I am very familiar with R. One of the great things about R is plyr (and of course ggplot2) and, even better, dplyr. Pandas of course has split-apply as well; however, in R I can do things

Pandas Replace NaN with blank/empty string

dataframe nan pandas python

I have a Pandas DataFrame as shown below: I want to replace the NaN values with an empty string so that it looks like so: Answer This might help. It will replace all NaNs with an empty string.
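The excerpt does not show the answer's code, so the following is only a minimal sketch of one common way to do this, using `DataFrame.fillna`. The frame and its column names `a` and `b` are hypothetical stand-ins for the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the data from the original question is not shown.
df = pd.DataFrame({"a": ["x", np.nan, "z"], "b": [np.nan, "y", np.nan]})

# fillna("") returns a new frame in which every NaN is an empty string.
cleaned = df.fillna("")
print(cleaned)
```

Note that filling a numeric column this way changes its dtype, so it is probably best applied just before display or export.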

How to split a dataframe by unique groups and save to a csv

csv dataframe pandas python

I have a pandas dataframe I would like to iterate over. A simplified example of my dataframe: I would like to iterate over each unique gene and create a new file named: For the above example I should get three iterations with 3 outfiles and 3 dataframes: The resulting data frame contents split up by chunks will be sent to
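One way this kind of split could be sketched is with `groupby`, writing each group with `to_csv`. The `gene` column comes from the question; the `value` column, the sample rows, the temporary output directory, and the `<gene>.csv` naming scheme are all assumptions, since the excerpt does not show the data or the intended file names.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data with three unique genes.
df = pd.DataFrame(
    {"gene": ["A", "A", "B", "C", "C"], "value": [1, 2, 3, 4, 5]}
)

# Assumed output location: a fresh temporary directory.
out_dir = Path(tempfile.mkdtemp())

written = []
# groupby yields one (key, sub-frame) pair per unique gene, in sorted key order.
for gene, chunk in df.groupby("gene"):
    path = out_dir / f"{gene}.csv"  # assumed naming scheme
    chunk.to_csv(path, index=False)
    written.append(path)

print([p.name for p in written])
```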

Read specific columns with pandas or other python module

csv pandas python

I have a csv file from this webpage. I want to read some of the columns in the downloaded file (the csv version can be downloaded in the upper right corner). Let’s say I want 2 columns: column 59, which in the header is star_name, and column 60, which in the header is ra. However, for some reason the authors of the webpage
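As a rough illustration of selecting columns by header name, `read_csv` accepts a `usecols` argument. The inline CSV text below is a hypothetical stand-in for the downloaded file, and the sketch does not address whatever issue with the file the truncated question goes on to describe.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file; only star_name and ra
# are names taken from the question.
csv_text = "id,star_name,ra,dec\n1,Alpha,10.5,-3.2\n2,Beta,20.1,4.4\n"

# usecols restricts parsing to the named columns.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["star_name", "ra"])
print(subset)
```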

Find the column name of the second largest value of each row in a Pandas DataFrame

dataframe pandas python python-3.x

I am trying to find the column names associated with the largest and second largest values in each row of a DataFrame. Here’s a simplified example (the real one has over 500 columns): Needs to become: I can find the column name with the largest value (i.e., 1larg above) with idxmax, but how can I find the second largest? Answer (You don’t have any
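The answer's own code is cut off in the excerpt, so the sketch below is not that answer. It shows one possible approach: sort each row's values in descending order with NumPy's `argsort` and look up the column labels at the first two positions. The frame and the output column names `largest` and `second` are hypothetical, and tie handling is left unspecified.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the real one is said to have over 500 columns.
df = pd.DataFrame({"a": [1, 9, 4], "b": [7, 2, 6], "c": [5, 3, 8]})

# Negate the values so an ascending argsort gives descending order per row.
order = np.argsort(-df.to_numpy(), axis=1)
labels = df.columns.to_numpy()

result = pd.DataFrame(
    {
        "largest": labels[order[:, 0]],  # same column idxmax(axis=1) would pick
        "second": labels[order[:, 1]],   # column holding the runner-up value
    },
    index=df.index,
)
print(result)
```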

pandas: Convert string column to ordered Category?

pandas python

I’m working with pandas for the first time. I have a column with survey responses in, which can take ‘strongly agree’, ‘agree’, ‘disagree’, ‘strongly disagree’, and ‘neither’ values. This is the output of describe() and value_counts() for the column: I want to do a linear regression on this question versus overall score. However, I have a feeling that I should
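A minimal sketch of turning such a column into an ordered categorical with `pd.CategoricalDtype` follows. The five response labels come from the question; the ranking used here (with 'neither' in the middle) and the sample responses are assumptions, and whether integer codes are appropriate as a regression input is a separate modelling decision.

```python
import pandas as pd

# Hypothetical responses using the five labels named in the question.
responses = pd.Series(
    ["agree", "neither", "strongly disagree", "strongly agree", "disagree"]
)

# Assumed ordering, from most negative to most positive.
levels = ["strongly disagree", "disagree", "neither", "agree", "strongly agree"]
likert = pd.CategoricalDtype(categories=levels, ordered=True)

ordered = responses.astype(likert)

# Integer codes follow the category order (0 = "strongly disagree").
codes = ordered.cat.codes
print(codes.tolist())
```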

How to iterate over consecutive chunks of Pandas dataframe efficiently

ipython pandas parallel-processing python

I have a large dataframe (several million rows). I want to be able to do a groupby operation on it, but just grouping by arbitrary consecutive (preferably equal-sized) subsets of rows, rather than using any particular property of the individual rows to decide which group they go to. The use case: I want to apply a function to each row
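Grouping by consecutive, equal-sized blocks of rows can be sketched by integer-dividing each row's position by the chunk size and passing the result to `groupby`. The frame, the column `x`, and `chunk_size` are hypothetical; the last chunk is simply shorter when the row count is not a multiple of the chunk size.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for the multi-million-row one.
df = pd.DataFrame({"x": range(10)})
chunk_size = 4  # assumed chunk length

# Rows 0-3 get id 0, rows 4-7 get id 1, rows 8-9 get id 2.
group_ids = np.arange(len(df)) // chunk_size

# Any groupby operation can now run per chunk; a sum is used as a placeholder.
sums = df.groupby(group_ids)["x"].sum()
print(sums.tolist())
```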

Using Python Pandas to bin data in one df according to bins defined in a second df

binning dataframe join pandas python

I am attempting to bin data in one dataframe according to bins defined in a second dataframe. I am thinking that some combination of pd.bin and pd.merge might get me there? This is basically the form each dataframe is currently in: df: And this is the table with the bins, df2: I would like to match the bin, and find
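The two frames from the question are not shown in the excerpt, so the following is only a sketch under assumptions. It uses `pd.cut` (pandas has no `pd.bin` function) and assumes the bins in the second frame are contiguous and sorted, so their edges can be passed directly. All column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical bin table: contiguous, sorted, half-open [lower, upper) bins.
bins = pd.DataFrame(
    {"bin": ["low", "mid", "high"], "lower": [0, 10, 20], "upper": [10, 20, 30]}
)
# Hypothetical data to be binned.
data = pd.DataFrame({"value": [3, 12, 25, 18]})

# Build the edge list: every lower bound plus the final upper bound.
edges = [*bins["lower"], bins["upper"].iloc[-1]]

# right=False makes each bin include its lower edge and exclude its upper edge.
data["bin"] = pd.cut(
    data["value"], bins=edges, labels=bins["bin"].tolist(), right=False
)
print(data)
```

Values outside every bin come back as NaN. For overlapping or non-contiguous bins this approach would not apply as written.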

Plotting CDF of a pandas series in python

cdf pandas python series

Is there a way to do this? I cannot seem to find an easy way to interface pandas series with plotting a CDF. Answer In case you are also interested in the values, not just the plot. This will always work (discrete and continuous distributions) Alternative example with a sample drawn from a continuous distribution or you have a lot of individual
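The answer's code is not included in the excerpt. As a hedged sketch of one way to get the empirical CDF values from a Series, the snippet below counts each distinct value, sorts by value, and takes the cumulative share. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample.
s = pd.Series([3, 1, 2, 2, 5])

# Count occurrences of each distinct value, ordered by the value itself.
counts = s.value_counts().sort_index()

# Cumulative share of observations less than or equal to each value.
ecdf = counts.cumsum() / counts.sum()
print(ecdf)
```

If matplotlib is installed, calling `ecdf.plot(drawstyle="steps-post")` should draw the result as a step function.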