Can we apply a log transform to one variable and a sqrt transform to another for LinearRegression? If yes, what should be done when computing the MSE? Should I exp or square y_test and the prediction? Answer If you transform the variables in both the training and test sets, you don’t need to care about your evaluation metric. In case you transform your target variable (with the log
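The excerpt is cut off, so the following is only a minimal sketch of one common way to handle a log-transformed target, not necessarily what the answer goes on to say. It assumes synthetic data and a plain least-squares fit with NumPy standing in for LinearRegression; all variable names are made up for illustration. The model is fit on log(y), and the predictions are mapped back with exp before computing the MSE in the original units, while y_test itself is left untransformed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y grows exponentially with x.
x = rng.uniform(0.0, 2.0, size=200)
y = np.exp(1.0 + 0.8 * x + rng.normal(0.0, 0.1, size=200))

x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

# Fit ordinary least squares on the log of the target: log(y) ~ b0 + b1 * x.
X_train = np.column_stack([np.ones_like(x_train), x_train])
coef, *_ = np.linalg.lstsq(X_train, np.log(y_train), rcond=None)

# Predictions come out on the log scale.
X_test = np.column_stack([np.ones_like(x_test), x_test])
log_pred = X_test @ coef

# MSE on the log scale: compare log(y_test) with the raw predictions.
mse_log = np.mean((np.log(y_test) - log_pred) ** 2)

# MSE in the original units: invert the transform on the predictions only.
pred = np.exp(log_pred)
mse_original = np.mean((y_test - pred) ** 2)

print(f"MSE (log scale): {mse_log:.4f}")
print(f"MSE (original units): {mse_original:.4f}")
```

A transform applied to a feature (such as sqrt) needs no inversion, as long as the same transform is applied to the training and test features.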
Tag: data-science
Error when trying to set column as index in pandas dataframe
I have the following code: which works fine until I do (trying to set column ‘idx’ as an index for the dataframe) which throws an error What does this mean? Answer The error arises when you create A with If you print A.columns you will get: So ‘idx’ is not really among your columns, and it cannot be set as the index.
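The original code and error are not shown in the excerpt, so this is only a hedged sketch of the diagnostic the answer describes: inspect the columns before calling set_index. The frame contents and the helper name set_index_checked are hypothetical.

```python
import pandas as pd

# Hypothetical frame: how A was really built is not shown in the excerpt.
A = pd.DataFrame({"idx": [10, 20, 30], "value": [1.0, 2.0, 3.0]})


def set_index_checked(df, column):
    """Set `column` as the index, failing with a readable message if it is missing."""
    if column not in df.columns:
        raise KeyError(
            f"{column!r} is not a column; available columns: {list(df.columns)}"
        )
    return df.set_index(column)


print(A.columns.tolist())  # check what the columns really are
indexed = set_index_checked(A, "idx")
print(indexed)
```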
Plotly reformatting Subplot Y axis values
Trying to turn the values in the Y axis into dollar amounts. When using the update_layout method, it only affects the first chart but not the others. I am not sure where to put the method, or how I could apply the formatting to each trace individually. Example of the Chart I am generating Answer You can format each y-axis
Calling an attribute defined in a method from another method in data science (python)
I’m learning object-oriented programming in a data science context. I want to understand what good practice is in terms of writing methods within a class that relate to one another. When I run my code: I get the following output (only part of the output is shown due to space constraints): I am happy with the output generated by
How to Generate a dataset based on mean, median, 1st & 9th decile values?
I have the following values that describe a dataset: I need to generate any datasets that will fit these values. All the examples I found require you to have the standard deviation, which I don’t. How can this be done? Thanks! Answer Interesting question! Based on Scott’s suggestions I gave it a quick try. Inputs: The Function: Comparison: Output: Getting
I am unable to check the files available in the directory
I am trying to read the csv files in the current directory. To do that, I want to check all the files present in my current directory. I have tried doing it with the check_output function. However, I received this error and I’m unable to figure out how to deal with it. This is the code I have tried: this
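The code and error are cut off in the excerpt, so the cause of the check_output failure cannot be diagnosed here. As a hedged alternative to shelling out, the standard library's os.listdir can list a directory directly; the helper name list_csv_files below is made up for illustration.

```python
import os


def list_csv_files(directory="."):
    """Return the sorted names of the .csv files directly inside `directory`."""
    return sorted(
        name
        for name in os.listdir(directory)
        if name.lower().endswith(".csv")
        and os.path.isfile(os.path.join(directory, name))
    )


print(os.listdir("."))    # everything in the current directory
print(list_csv_files())   # only the CSV files
```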
imblearn.oversampling SMOTENC ValueError
This is my first time using SMOTENC to upsample my categorical data. However, I’ve been getting an error. Can you please advise what I should pass for categorical_features in SMOTENC? ERROR: Answer As per documentation: So, just replace the line with the line
Getting min and max datetime for each date in csv
I’m kind of new to data science and Python. First of all, do you suggest using any library other than pandas when dealing with a huge dataset (100K+ rows)? Second of all, let me explain my current problem. I have a dataset in which I have a Datetime column, to make it easy to understand, let’s say I only
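The question is cut off and no answer is shown, so this is only a minimal sketch of one way to get the earliest and latest timestamp per calendar day with pandas. It assumes a single Datetime column as the question describes; the sample values are invented. Grouping 100K rows this way is usually well within what pandas handles comfortably.

```python
import pandas as pd

# Invented sample data; the real CSV is not shown in the excerpt.
df = pd.DataFrame(
    {
        "Datetime": pd.to_datetime(
            [
                "2021-01-01 08:15:00",
                "2021-01-01 17:40:00",
                "2021-01-02 09:05:00",
                "2021-01-02 12:30:00",
                "2021-01-02 23:59:00",
            ]
        )
    }
)

# Group by the calendar date and take the earliest and latest timestamp of each day.
per_day = df.groupby(df["Datetime"].dt.date)["Datetime"].agg(["min", "max"])
print(per_day)
```

When reading from a file, passing parse_dates=["Datetime"] to pd.read_csv gives the column a datetime dtype so that the .dt accessor works.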
Difference between StandardScaler and MinMaxScaler
What is the difference between MinMaxScaler() and StandardScaler()? mms = MinMaxScaler(feature_range = (0, 1)) (Used in a machine learning model) sc = StandardScaler() (In another machine learning model they used StandardScaler and not MinMaxScaler) Answer From the scikit-learn site: StandardScaler removes the mean and scales the data to unit variance. However, the outliers have an influence when computing the empirical mean
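To make the difference concrete, here is a minimal sketch that computes by hand, with NumPy, what the two scalers do under their default settings (scikit-learn itself is not imported). The toy column with one outlier is an assumption chosen for illustration.

```python
import numpy as np

# Toy feature column with one large outlier (invented for illustration).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# What StandardScaler() does by default: subtract the column mean and
# divide by the population standard deviation (ddof=0).
standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# What MinMaxScaler(feature_range=(0, 1)) does: map the column minimum to 0
# and the column maximum to 1.
min_max = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(standardized.ravel())
print(min_max.ravel())
```

In both outputs the four ordinary values end up squeezed close together because the single outlier dominates the mean, the standard deviation, and the range.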
How to do superscripts and subscripts in Jupyter Notebook?
I want to use numbers to indicate references in footnotes, so I was wondering how I can use superscripts and subscripts inside of Jupyter Notebook? Answer You can do this inside of a markdown cell. A markdown cell can be created by selecting a cell then pressing the Esc key followed by the M key. You can tell when