This is more of a conceptual question; I do not have a specific problem. I am learning Python for data analysis, but I am very familiar with R. One of the great things about R is plyr (and of course ggplot2), and, even better, dplyr. Pandas of course has split-apply as well; however, in R I can do things
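For readers coming from dplyr, here is a minimal sketch of pandas' split-apply-combine pattern. `groupby` plays roughly the role of `group_by`, and named aggregation inside `.agg()` (available since pandas 0.25) plays roughly the role of `summarise`. The frame and the column names `g` and `x` are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical), the kind of frame you would pipe into group_by(g) %>% summarise(...)
df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1, 3, 5]})

# Split by "g", apply two aggregations to "x", combine into one flat frame
summary = df.groupby("g", as_index=False).agg(
    total=("x", "sum"),
    mean_x=("x", "mean"),
)
```

Each keyword argument names an output column and pairs an input column with an aggregation, so the result reads much like a `summarise` call.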
Tag: pandas
Pandas Replace NaN with blank/empty string
I have a Pandas DataFrame as shown below: I want to replace the NaN values with an empty string so that it looks like this: Answer: This might help. It will replace all NaNs with an empty string.
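A minimal sketch of that idea using `DataFrame.fillna`. The frame below is invented, since the question's data is not shown here. One side effect is worth knowing: a numeric column that contained NaN becomes `object` dtype once it holds empty strings.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with missing values in both a text and a numeric column
df = pd.DataFrame({"name": ["x", np.nan, "z"], "score": [1.5, np.nan, 3.0]})

# Replace every NaN in the frame with an empty string (returns a new frame).
# The "score" column is upcast to object dtype because it now mixes floats and "".
clean = df.fillna("")
```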
Python pandas apply function if a column value is not NULL
I have a dataframe (in Python 2.7, pandas 0.15.0): I want to apply a simple function to rows that do not contain NULL values in a specific column. My function is as simple as possible: And my apply code is the following: It works perfectly. If I want to check column ‘B’ for NULL values, pd.notnull() works perfectly as
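One way to restrict the function to rows where a given column is non-null is to build a boolean mask with `notnull()` and apply only to the masked rows. This is a sketch in Python 3 with a current pandas, not the question's Python 2.7 / pandas 0.15 setup; `my_func`, the columns `A` and `B`, and the output column `C` are hypothetical stand-ins, since the original function and data are not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has a NULL in column "B"
df = pd.DataFrame({"A": [1, 2, 3], "B": [10.0, np.nan, 30.0]})

def my_func(row):
    # Hypothetical stand-in for the question's simple row function
    return row["A"] + row["B"]

# True only for rows where "B" is not NULL
mask = df["B"].notnull()

# Apply row-wise to the non-null rows only; rows excluded by the mask get NaN in "C"
df.loc[mask, "C"] = df[mask].apply(my_func, axis=1)
```

Assigning through `.loc` with the mask aligns on the index, so the result lands back on exactly the rows it was computed from.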
How to split a dataframe by unique groups and save to a csv
I have a pandas dataframe I would like to iterate over. A simplified example of my dataframe: I would like to iterate over each unique gene and create a new file named: For the above example I should get three iterations, with 3 output files and 3 dataframes: The resulting data frame contents, split up by chunks, will be sent to
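Iterating over a `groupby` object yields one `(key, sub-frame)` pair per unique value, which matches the "one file per gene" goal. A minimal sketch, assuming the gene identifiers live in a column named `gene` and that a `<gene>.csv` file name is acceptable (the question's actual naming scheme is not shown). `split_by_gene` is a hypothetical helper.

```python
import os
import pandas as pd

def split_by_gene(df, out_dir, key="gene"):
    """Write one CSV per unique value of `key` and return the written paths."""
    paths = []
    # Each iteration gives the unique value and the rows that carry it
    for gene, group in df.groupby(key):
        path = os.path.join(out_dir, f"{gene}.csv")  # hypothetical naming scheme
        group.to_csv(path, index=False)
        paths.append(path)
    return paths
```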
Read specific columns with pandas or other python module
I have a csv file from this webpage. I want to read some of the columns in the downloaded file (the csv version can be downloaded in the upper right corner). Let’s say I want 2 columns: column 59, whose header is star_name, and column 60, whose header is ra. However, for some reason the authors of the webpage
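`pandas.read_csv` accepts a `usecols` argument that keeps only the listed columns, given either by header name or by 0-based position. A sketch with a small inline CSV standing in for the downloaded file; only the `star_name` and `ra` headers come from the question, and the rest of the data is made up.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded file
csv_text = "star_name,ra,dec\nAlpha,10.5,-3.2\nBeta,200.1,45.0\n"

# Keep only the two columns of interest, selected by header name
stars = pd.read_csv(io.StringIO(csv_text), usecols=["star_name", "ra"])
```

Selecting by name is usually safer than by position, because it does not depend on counting columns correctly.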
Find the column name of the second largest value of each row in a Pandas DataFrame
I am trying to find the column names associated with the largest and second-largest values in a DataFrame. Here’s a simplified example (the real one has over 500 columns): Needs to become: I can find the column name with the largest value (i.e., 1larg above) with idxmax, but how can I find the second largest? Answer: (You don’t have any
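`idxmax` only gives the top column, but sorting each row's values gives any rank. A sketch using NumPy's `argsort` on the negated values, so the order is descending; `kind="stable"` makes ties resolve to the leftmost column. The helper name `top_two_columns` and the `2larg` label are hypothetical, while `1larg` follows the question's naming.

```python
import numpy as np
import pandas as pd

def top_two_columns(df):
    """Return the column names of the largest and second-largest value in each row."""
    values = df.to_numpy()
    # argsort of the negated values lists column positions from largest to smallest;
    # a stable sort keeps the leftmost column first when values tie
    order = np.argsort(-values, axis=1, kind="stable")
    cols = df.columns.to_numpy()
    return pd.DataFrame(
        {"1larg": cols[order[:, 0]], "2larg": cols[order[:, 1]]},
        index=df.index,
    )
```

Because this works on the whole NumPy array at once, it avoids a per-row Python `apply`, which matters with 500+ columns.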
pandas: Convert string column to ordered Category?
I’m working with pandas for the first time. I have a column of survey responses, which can take the values ‘strongly agree’, ‘agree’, ‘disagree’, ‘strongly disagree’, and ‘neither’. This is the output of describe() and value_counts() for the column: I want to do a linear regression on this question versus overall score. However, I have a feeling that I should
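An ordered categorical gives the responses an explicit ranking, and `.cat.codes` turns that ranking into integers a regression can use. A sketch, assuming the natural order runs from "strongly disagree" to "strongly agree" with "neither" in the middle (the question does not state an order). The sample responses are made up.

```python
import pandas as pd

# Assumed ranking, lowest to highest
levels = ["strongly disagree", "disagree", "neither", "agree", "strongly agree"]
likert = pd.CategoricalDtype(categories=levels, ordered=True)

# Hypothetical sample of survey answers
responses = pd.Series(["agree", "neither", "strongly agree", "disagree"])
ranked = responses.astype(likert)

# Integer codes 0..4 follow the category order; a string not in `levels`
# would become NaN in `ranked` and -1 here
scores = ranked.cat.codes
```

Because the dtype is ordered, comparisons such as `ranked > "neither"` and reductions such as `ranked.max()` respect the survey order rather than alphabetical order.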
How to iterate over consecutive chunks of Pandas dataframe efficiently
I have a large dataframe (several million rows). I want to be able to do a groupby operation on it, but just grouping by arbitrary consecutive (preferably equal-sized) subsets of rows, rather than using any particular property of the individual rows to decide which group they go to. The use case: I want to apply a function to each row
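There are two common ways to get consecutive, equal-sized chunks. One slices with `iloc` inside a generator; the other groups by `row_position // chunk_size` so the usual `groupby` machinery applies. A sketch of both; `chunk_size` and the helper names are illustrative. The last chunk is shorter when the row count is not a multiple of the chunk size.

```python
import numpy as np
import pandas as pd

def iter_chunks(df, chunk_size):
    """Yield consecutive row slices of at most chunk_size rows."""
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]

def chunked_groupby(df, chunk_size):
    """Group rows by position: rows 0..chunk_size-1 form group 0, and so on."""
    # Grouping on an array of positions ignores the frame's own index labels
    return df.groupby(np.arange(len(df)) // chunk_size)
```

The `groupby` form is convenient when you want `agg`/`apply` per chunk; the generator avoids building group metadata when you only need to loop.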
Using Python Pandas to bin data in one df according to bins defined in a second df
I am attempting to bin data in one dataframe according to bins defined in a second dataframe. I am thinking that some combination of pd.bin and pd.merge might get me there? This is basically the form each dataframe is currently in: df: And this is the table with the bins, df2: I would like to match the bin, and find
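Pandas' built-in tools for this are `pd.cut` and `pd.IntervalIndex`. A sketch that builds an `IntervalIndex` from the bin table and looks up each value with `get_indexer`. It assumes `df2` has hypothetical `start`, `end`, and `label` columns describing non-overlapping, left-closed bins (the real layout of `df2` is not shown). Values outside every bin get `None`.

```python
import numpy as np
import pandas as pd

def assign_bins(values, bins_df, start="start", end="end", label="label"):
    """Map each value to the label of the [start, end) bin that contains it."""
    # Hypothetical column names: start/end bound each bin, label names it
    intervals = pd.IntervalIndex.from_arrays(bins_df[start], bins_df[end], closed="left")
    # Position of the containing bin, or -1 when no bin contains the value
    pos = intervals.get_indexer(values)
    labels = bins_df[label].to_numpy()
    # labels[pos] reads labels[-1] for misses, so mask those positions with None
    return pd.Series(np.where(pos >= 0, labels[pos], None), index=values.index, dtype=object)
```

If the bins can overlap, `get_indexer` raises an error; `IntervalIndex.get_indexer_non_unique` is the variant meant for that case.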
Plotting CDF of a pandas series in python
Is there a way to do this? I cannot seem to find an easy way to plot the CDF of a pandas Series. Answer: In case you are also interested in the values, not just the plot: This will always work (discrete and continuous distributions) Alternative example with a sample drawn from a continuous distribution or you have a lot of individual
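A minimal sketch of the values-plus-plot approach. The empirical CDF at each distinct value is the share of observations less than or equal to it, which `value_counts`, `sort_index`, and `cumsum` compute directly. This works for discrete and continuous samples alike. Plotting assumes matplotlib is installed, and the helper names are hypothetical.

```python
import pandas as pd

def empirical_cdf(s):
    """Return P(X <= x) for each distinct value x in the Series (NaNs are dropped)."""
    counts = s.value_counts().sort_index()
    return counts.cumsum() / counts.sum()

def plot_cdf(s, ax=None):
    """Draw the empirical CDF as a right-continuous step plot (needs matplotlib)."""
    cdf = empirical_cdf(s)
    return cdf.plot(drawstyle="steps-post", ax=ax)
```

`drawstyle="steps-post"` holds each CDF level until the next observed value, which matches the step shape of an empirical CDF.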