Tag: pandas

python pandas flatten a dataframe to a list

I have a df like so: I want to flatten the df so it is one continuous list like so: ['1/2/2014', 'a', '6', 'z1', '1/2/2014', 'a', '3', 'z1', '1/3/2014', 'c', '1', 'x3'] I can loop through the rows and extend a list, but is there a much easier way to do it? Answer: You can use .flatten() on the DataFrame converted
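
A minimal sketch of that approach: convert the DataFrame to a NumPy array and flatten it in row-major order. The column names and values below are hypothetical, rebuilt from the desired output list. `to_numpy()` needs pandas 0.24 or later; older versions use `df.values` instead.

```python
import pandas as pd

# Hypothetical frame rebuilt from the desired output; the column names are made up.
df = pd.DataFrame(
    {
        "date": ["1/2/2014", "1/2/2014", "1/3/2014"],
        "letter": ["a", "a", "c"],
        "number": ["6", "3", "1"],
        "code": ["z1", "z1", "x3"],
    }
)

# to_numpy() gives a 2-D array (rows x columns); flatten() reads it row by row
# (C order), and tolist() turns the result into a plain Python list.
flat = df.to_numpy().flatten().tolist()
print(flat)
```

`ravel()` works the same way here and avoids a copy when it can; either is far simpler than looping over rows.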

How to extract base path from DataFrame column of path strings

There are several questions about string manipulation, but I can’t find an answer that lets me do the following. I thought it should have been simple… I have a DataFrame which includes a column containing a filename and path. The following produces a representative example DataFrame: I want to end up with just the ‘filename’ part of the string. There
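
A sketch of two vectorised-style options, assuming forward-slash paths in a column named `filepath` (the column name and the example paths are hypothetical, since the question's example frame is not shown). The first uses pandas string methods; the second applies `posixpath.basename`, which always splits on `/` regardless of the operating system.

```python
import posixpath

import pandas as pd

# Hypothetical example: the column name and paths are made up for illustration.
df = pd.DataFrame(
    {"filepath": ["/data/raw/file1.csv", "/data/archive/2014/file2.txt", "file3.dat"]}
)

# Option 1: split on "/" and keep the last piece of each path.
df["filename"] = df["filepath"].str.split("/").str[-1]

# Option 2: apply the standard-library basename function element by element.
df["filename_alt"] = df["filepath"].map(posixpath.basename)
print(df)
```

For Windows-style paths with backslashes, `ntpath.basename` plays the same role as `posixpath.basename`.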

Setting plot background colour in Seaborn

I am using Seaborn to plot some data in pandas. I am making some very large plots (factorplots). To see them, I am using some visualisation facilities at my university: a compound screen made up of 4 by 4 monitors with a small (but nonzero) bezel, the gap between the screens. This gap is black. To minimise
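
The question is cut off, so this sketch assumes the goal is a black plot background that blends with the black bezels. `sns.set_style` takes an `rc` dict that overrides individual style parameters; the text and tick colours are switched to white here (an added choice, not from the question) so labels stay readable. Figures created afterwards, including seaborn's own figure-level plots, pick up these settings.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Assumption: a black background is wanted to match the bezels.
BACKGROUND = "black"
FOREGROUND = "white"  # keeps labels and ticks visible on the dark background

sns.set_style(
    "dark",
    rc={
        "axes.facecolor": BACKGROUND,    # area inside the axes
        "figure.facecolor": BACKGROUND,  # area around the axes
        "text.color": FOREGROUND,
        "axes.labelcolor": FOREGROUND,
        "xtick.color": FOREGROUND,
        "ytick.color": FOREGROUND,
    },
)

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("x")
ax.set_ylabel("y")
```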

Filtering multiple items in a multi-index Python pandas dataframe

I have the following table: Note: Both NSRCODE and PBL_AWI are index levels. How do I filter on values in the PBL_AWI level? For example, I want to keep the values ['Lake', 'River', 'Upland']. Answer: You can use get_level_values in conjunction with Boolean indexing. The same idea can be expressed in many different ways, such as df[df.index.get_level_values('PBL_AWI').isin(['Lake', 'River', 'Upland'])] Note that you have
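
A self-contained version of that one-liner. Only the level names `NSRCODE` and `PBL_AWI` and the three kept values come from the question; the other index labels and the `Area` column are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the level names and the kept values come from the question.
index = pd.MultiIndex.from_tuples(
    [
        ("CM", "Lake"),
        ("CM", "Remove"),
        ("LBH", "River"),
        ("LBH", "Upland"),
        ("LBH", "Wetland"),
    ],
    names=["NSRCODE", "PBL_AWI"],
)
df = pd.DataFrame({"Area": [1.5, 2.0, 3.25, 4.0, 5.5]}, index=index)

keep = ["Lake", "River", "Upland"]
# get_level_values returns the labels of one level (selected by name) for every row;
# isin turns them into a Boolean mask that selects the matching rows.
mask = df.index.get_level_values("PBL_AWI").isin(keep)
filtered = df[mask]
print(filtered)
```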

How to merge two dataframes in pandas to replace NaN

I want to do this in pandas: I have two DataFrames, A and B, and I want to replace only the NaN values in A with the corresponding values from B. Answer: The officially documented way to do exactly this is A.combine_first(B); see the pandas documentation for details. However, on large DataFrames it is massively outperformed by A.fillna(B) (tested with 25,000 elements):
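
A small sketch comparing the two calls on hypothetical frames that share the same index and columns, where both give the same result. They differ when the labels do not match: `combine_first` returns the union of both frames' indexes and columns, while `fillna` aligns B to A and keeps only A's rows and columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frames with identical index and columns.
A = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 5.0, np.nan]})
B = pd.DataFrame({"x": [10.0, 20.0, 30.0], "y": [40.0, 50.0, 60.0]})

# Keep A's values, fill A's NaNs from B; the result spans the union of labels.
via_combine = A.combine_first(B)

# Fill A's NaNs from B after aligning on labels; the result keeps A's shape.
via_fillna = A.fillna(B)
print(via_fillna)
```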

Pandas – Compute z-score for all columns

I have a DataFrame containing a single column of IDs; all other columns are numerical values for which I want to compute z-scores. Here’s a subsection of it: Some of my columns contain NaN values, which I do not want to include in the z-score calculations, so I intend to use a solution offered to this question: how to
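
A minimal pandas-only sketch, assuming the ID column is named `ID` (hypothetical) and every other column is numeric. pandas' `mean()` and `std()` skip NaN by default, so missing entries are excluded from the statistics and stay NaN in the output. The sketch uses `ddof=0`, matching the default of `scipy.stats.zscore`; pandas' own `std()` defaults to `ddof=1`.

```python
import numpy as np
import pandas as pd


def zscore_columns(df, id_col="ID", ddof=0):
    """Z-score every column except id_col; NaNs are ignored, not treated as zero."""
    values = df.drop(columns=id_col)
    # mean() and std() use skipna=True by default, and the arithmetic broadcasts
    # column by column, so each column is standardised with its own statistics.
    z = (values - values.mean()) / values.std(ddof=ddof)
    return pd.concat([df[[id_col]], z], axis=1)


# Hypothetical example data.
df = pd.DataFrame(
    {"ID": ["a", "b", "c"], "col1": [1.0, 2.0, 3.0], "col2": [4.0, np.nan, 8.0]}
)
result = zscore_columns(df)
print(result)
```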

Extending numpy.digitize to multi-dimensional data

I have a set of large arrays (about 6 million elements each) on which I basically want to perform np.digitize, but over multiple axes. I am looking for suggestions both on how to do this effectively and on how to store the results. I need all the indices (or all the values, or a mask) of array A
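
Since the question is truncated, this is only a sketch of one common approach, not the asker's method: digitize each array on its own, then merge the per-axis bin numbers into a single flat bin id with `np.ravel_multi_index`. The arrays `a`, `b`, their edges, and both helper functions are hypothetical. Storing one integer bin id per element is compact; a mask for bin `k` is just `bin_ids == k`, and one stable sort groups the indices of every occupied bin at once.

```python
import numpy as np


def digitize_2d(a, b, edges_a, edges_b):
    """Return one flat bin id per element, combining the bins of a and b."""
    ia = np.digitize(a, edges_a)  # integers in 0..len(edges_a)
    ib = np.digitize(b, edges_b)  # integers in 0..len(edges_b)
    shape = (len(edges_a) + 1, len(edges_b) + 1)
    return np.ravel_multi_index((ia, ib), shape), shape


def group_indices(bin_ids):
    """Map each occupied flat bin id to the indices of the elements inside it."""
    order = np.argsort(bin_ids, kind="stable")
    uniq, starts = np.unique(bin_ids[order], return_index=True)
    # Consecutive runs of equal ids in the sorted order form the groups.
    return dict(zip(uniq.tolist(), np.split(order, starts[1:])))


# Hypothetical small inputs; the same code applies to arrays with millions of elements.
a = np.array([0.1, 0.6, 0.2, 0.9])
b = np.array([5.0, 1.0, 6.0, 2.0])
bin_ids, shape = digitize_2d(a, b, np.array([0.5]), np.array([3.0]))
groups = group_indices(bin_ids)
mask = bin_ids == 1  # Boolean mask for a single bin
print(bin_ids, groups)
```

`np.unravel_index(k, shape)` recovers the per-axis bin numbers from a flat id `k`; the approach extends to more arrays by passing more index arrays and a longer `shape`.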
