I’m trying to calculate (x - x.mean()) / (x.std() + 0.01) on several columns of a dataframe, by group. My original dataframe is very large. I’ve split the original file into several chunks and I’m using multiprocessing to run the script on each chunk, but every chunk is still very large and this process never
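A minimal sketch of the group-wise calculation described above, using vectorised `groupby(...).transform` calls instead of a per-group Python function. The column names `group`, `v1` and `v2` and the sample values are hypothetical, since the original dataframe is not shown; whether this is fast enough for the asker's data is not something the excerpt lets us confirm.

```python
import pandas as pd

# Hypothetical sample data; the real dataframe is not shown in the question.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "v1": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "v2": [0.0, 1.0, 2.0, 5.0, 5.0, 5.0],
})
cols = ["v1", "v2"]  # columns to standardise (assumed names)

grouped = df.groupby("group")[cols]

# transform() returns frames aligned with df's index, so the arithmetic
# below is ordinary element-wise column math.
mean = grouped.transform("mean")
std = grouped.transform("std")  # sample standard deviation (ddof=1)

# (x - x.mean()) / (x.std() + 0.01), computed per group.
scaled = (df[cols] - mean) / (std + 0.01)
print(scaled)
```

The built-in `"mean"` and `"std"` aggregations run in compiled code, which is usually cheaper than passing a Python lambda to `transform` or `apply`.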
Tag: pandas
Dask concatenate 2 dataframes into 1 single dataframe
Objective: To merge df_labelled, which holds a portion of labelled points, into df, which contains all the points. What I have tried: Referring to “Simple way to Dask concatenate (horizontal, axis=1, columns)”, I tried the code below, but I get the error ValueError: Not all divisions are known, can’t align partitions. Please use set_index to set the index. Another thing
How to split string and get only one word in python [closed]
I have a string similar to this: I got this string from selecting a row and column: and I want to get the “PARIS” only to
Joining two dataframes on columns they match
I have two dataframes. df1 has more elements (3) in column ‘Table_name’ than df2 (2). I want a resultant dataframe that only outputs the rows where df1 and df2 share the same column names. df1 df2 I want this to be the result. df_result This is what I tried, but it doesn’t work: Answer You need loc here
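The answer's code is cut off, so the following is only one plausible reading of "You need loc here": select the rows of df1 whose ‘Table_name’ value also appears in df2. The sample values and the extra columns `rows` and `owner` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the two dataframes in the question.
df1 = pd.DataFrame({
    "Table_name": ["orders", "customers", "items"],
    "rows": [10, 20, 30],
})
df2 = pd.DataFrame({
    "Table_name": ["orders", "items"],
    "owner": ["x", "y"],
})

# Boolean mask: True where df1's Table_name also occurs in df2.
mask = df1["Table_name"].isin(df2["Table_name"])

# .loc with a boolean mask keeps only the matching rows of df1.
df_result = df1.loc[mask]
print(df_result)
```

If columns from both frames are needed in the result, `df1.merge(df2, on="Table_name")` (an inner join by default) is another option.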
Getting error in dataframe TypeError: ‘Series’ objects are mutable, thus they cannot be hashed
I am trying to apply this operation on my dataframe df: where the data types of a, b, c are: But I am getting the error TypeError: ‘Series’ objects are mutable, thus they cannot be hashed. Is it happening because of NA values present in column b or c? If yes, is there a way to skip the operation for NA values? Thanks.
PatsyError when using statsmodels for regression
I’m using ols in statsmodels to run a regression. Once I run the regressions on each row of my dataframe, I want to retrieve the X variables from patsy that are used in those regressions. But I get an error that I just can’t seem to understand. Edit: I am trying to run a regression as presented in the answer here,
Is there a quick way in python to convert a string ‘1/100’ to float 0.01?
I have this df: which I would like to convert to decimal odds. I know I could use .split(‘/’) to achieve this, but was wondering if there was a quicker way to do this. Answer As suggested by @ch3steR, use pd.eval and try this
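The answer's pd.eval code is not shown in this excerpt. As an alternative sketch (not the answer's method), the standard-library `fractions.Fraction` parses strings such as ‘1/100’ directly, without evaluating them as expressions. The column names `odds` and `as_float` and the sample values are hypothetical.

```python
from fractions import Fraction

import pandas as pd

# Hypothetical column of fractional strings; the real df is not shown.
df = pd.DataFrame({"odds": ["1/100", "5/2", "10/1"]})

# Fraction("1/100") parses numerator and denominator; float() converts it.
df["as_float"] = df["odds"].map(lambda s: float(Fraction(s)))
print(df)
```

`Fraction` only accepts numeric strings, so malformed input raises a ValueError instead of being evaluated.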
Pivot table raises a “uniquely valued index” error
I am trying to modify the following dataset in Python 3/pandas into a dataframe that will have the rank as the first column or index and all the Maj values as the second column. Something like this: … I am trying to do that with a pivot table: But I get the following error: But I do not have any duplicated
Error when trying to set column as index in pandas dataframe
I have the following code: which works fine until I do (trying to set column ‘idx’ as the index for the dataframe) which throws an error. What does this mean? Answer The error comes from how you create A with If you print A.columns you will get: So ‘idx’ is not really one of your columns, and it cannot be set as the index.
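The original code is not shown, so this is only a sketch of the check the answer describes: inspect the column labels before calling `set_index`. The dataframe contents here are hypothetical; only the names `A` and ‘idx’ come from the question.

```python
import pandas as pd

# Hypothetical dataframe in which 'idx' really is a column.
A = pd.DataFrame({"idx": [10, 20, 30], "value": ["a", "b", "c"]})

# Inspect the actual column labels first, as the answer suggests.
print(A.columns.tolist())

# set_index raises KeyError when the label is not among the columns,
# so guard the call with a membership test.
if "idx" in A.columns:
    A = A.set_index("idx")

print(A)
```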
Trying to merge different files csv and to label the columns
I’m trying to get a single dataset by merging several CSV files within one folder. So I would like to merge the different files, which each have 4 columns. I would also like to label the four columns using names=[] in pd.concatenate. I’m using this code: The problem is that instead of getting 4 columns I get 25, and I
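The asker's code is not shown, so the following is a sketch under stated assumptions rather than a fix for it. In pandas, column labels are supplied through the `names=` parameter of `pd.read_csv`, and frames are stacked with `pd.concat`. The sketch assumes the files have no header row and uses in-memory `io.StringIO` buffers as stand-ins for the files; the column names are hypothetical.

```python
import io

import pandas as pd

names = ["a", "b", "c", "d"]  # hypothetical labels for the four columns

# Stand-ins for CSV files on disk; with real files this list could be
# built with glob.glob("folder/*.csv") instead.
sources = [
    io.StringIO("1,2,3,4\n5,6,7,8\n"),
    io.StringIO("9,10,11,12\n"),
]

# header=None tells pandas the files carry no header row (an assumption);
# names= assigns the same four labels to every file.
frames = [pd.read_csv(src, header=None, names=names) for src in sources]

# Stack the rows of all files; identical labels keep the result at 4 columns.
merged = pd.concat(frames, ignore_index=True)
print(merged)
```

Giving every frame the same column labels before concatenating matters: `pd.concat` aligns on labels, so frames with differing labels produce extra columns filled with NaN.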