
Tag: pandas

How to vectorize groupby and apply in pandas?

I’m trying to calculate (x - x.mean()) / (x.std() + 0.01) on several columns of a dataframe, per group. My original dataframe is very large. Although I’ve split the original file into several chunks and I’m using multiprocessing to run the script on each chunk, every chunk of the dataframe is still very large and this process never
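A minimal sketch of one way to avoid a per-group Python function, assuming the goal is the formula quoted above. It uses `groupby(...).transform` with the built-in `"mean"` and `"std"` aggregations, which return results aligned to the original rows. The helper name `group_standardize`, the column names, and the sample data are hypothetical and not taken from the question.

```python
import pandas as pd


def group_standardize(df, group_col, value_cols, eps=0.01):
    """Return a copy of df with value_cols standardized within each group.

    Computes (x - group mean) / (group std + eps) using built-in
    aggregations rather than a Python-level function per group.
    """
    grouped = df.groupby(group_col)[value_cols]
    means = grouped.transform("mean")  # same shape as df[value_cols]
    stds = grouped.transform("std")    # sample std (ddof=1), like Series.std()
    out = df.copy()
    out[value_cols] = (df[value_cols] - means) / (stds + eps)
    return out


# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "g": ["a", "a", "a", "b", "b", "b"],
        "x": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
        "y": [5.0, 5.0, 8.0, 1.0, 2.0, 6.0],
    }
)
result = group_standardize(df, "g", ["x", "y"])
```

Whether this is fast enough for a very large frame depends on the data. The sketch only illustrates replacing `apply` with aligned built-in aggregations.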

Dask concatenate 2 dataframes into 1 single dataframe

Objective: to merge df_labelled, which holds a portion of labelled points, into df, which contains all the points. What I have tried: referring to Simple way to Dask concatenate (horizontal, axis=1, columns), I tried the code below. But I get the error ValueError: Not all divisions are known, can’t align partitions. Please use set_index to set the index. Another thing

How to split string and get only one word in python [closed]

I have a string similar to this: I got this string by selecting a row and column: and I want to get only the “PARIS” to

Joining two dataframes on columns they match

I have two dataframes. df1 has more elements (3) in column ‘Table_name’ than df2 (2). I want a resultant dataframe that only outputs the rows where df1 and df2 share the same column names. df1 df2 I want this to be the result. df_result This is what I tried, but it doesn’t work: Answer: You need loc here
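The answer is cut off here, so the following is only a sketch of one plausible reading: keep the rows of df1 whose ‘Table_name’ value also appears in df2, selected with `loc` and a boolean mask. The sample values and the extra columns `rows` and `owner` are hypothetical.

```python
import pandas as pd

# Hypothetical data: df1 has three table names, df2 has two.
df1 = pd.DataFrame({"Table_name": ["t1", "t2", "t3"], "rows": [10, 20, 30]})
df2 = pd.DataFrame({"Table_name": ["t1", "t3"], "owner": ["x", "y"]})

# Boolean mask: True where df1's Table_name also occurs in df2.
mask = df1["Table_name"].isin(df2["Table_name"])

# loc keeps only the matching rows of df1.
df_result = df1.loc[mask].reset_index(drop=True)

# If columns from both frames are wanted, an inner merge is an alternative.
merged = df1.merge(df2, on="Table_name", how="inner")
```

The `loc` version keeps only df1's columns, while the inner merge also brings in df2's columns for the shared names.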

PatsyError when using statsmodels for regression

I’m using ols in statsmodels to run a regression. Once I run the regressions on each row of my dataframe, I want to retrieve the X variables from patsy that are used in those regressions. But I get an error that I just can’t seem to understand. Edit: I am trying to run a regression as presented in the answer here,

pivot table raise error uniquely valued index error

I am trying to reshape the following dataset in Python 3/pandas into a dataframe whose first column (or index) is the rank and whose second column holds all the Maj values. Something like this: … I am trying to do that with a pivot table: But I get the following error: But I do not have any duplicated

Trying to merge different files csv and to label the columns

I’m trying to get a single dataset by merging several CSV files within one folder. So I would like to merge the different files, each of which has 4 columns. I would also like to label the four columns using names=[] in pd.concatenate. I’m using this code: The problem is that instead of getting 4 columns I get 25, and I
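The asker's code is not shown, so this is only a sketch of one common pattern, under the assumption that the files have no header row. In pandas, `names=` is a parameter of `pd.read_csv`, and frames are stacked with `pd.concat`. Reading every file with the same `names` and concatenating along rows keeps the result at four columns. The helper `load_folder`, the column labels, and the temporary sample files are hypothetical.

```python
import glob
import os
import tempfile

import pandas as pd

COLUMNS = ["a", "b", "c", "d"]  # hypothetical labels for the four columns


def load_folder(folder, columns):
    """Read every .csv in folder with the same column names and stack the rows."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # header=None: assume the files carry no header row of their own.
    frames = [pd.read_csv(path, header=None, names=columns) for path in paths]
    # axis=0 stacks rows; identical labels keep the column count unchanged.
    return pd.concat(frames, axis=0, ignore_index=True)


# Build three small sample files in a temporary folder for illustration.
with tempfile.TemporaryDirectory() as folder:
    for i in range(3):
        with open(os.path.join(folder, f"part{i}.csv"), "w") as fh:
            fh.write(f"{i},1,2,3\n{i},4,5,6\n")
    combined = load_folder(folder, COLUMNS)
```

If the real files do contain a header row, `header=0` together with `names=columns` would replace it instead of treating it as data.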
