Tag: data-science

Does it make sense? If yes then how to handle in MSE?

data-analysis data-science linear-regression python scikit-learn

Can we apply a log transform to one variable and a sqrt transform to another for LinearRegression? If so, how should MSE be handled? Should I exp or square y_test and the predictions? Answer If you transform the variables in both the training and test sets, you don’t need to worry about your evaluation metric. In case you transform your target variable (with the log
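A minimal sketch of the back-transform idea, assuming a log-transformed target and synthetic data (the data, seed, and variable names are illustrative and not taken from the question). Transformed features need no inversion. If MSE is wanted in the original units of y, predictions made on the log scale can be inverted before scoring:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic, strictly positive data (an assumption made for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(200, 2))
y = np.exp(0.3 * np.log(X[:, 0]) + 0.1 * np.sqrt(X[:, 1]) + rng.normal(0, 0.05, 200))

# Feature transforms: log on the first column, sqrt on the second.
X_t = np.column_stack([np.log(X[:, 0]), np.sqrt(X[:, 1])])

X_train, X_test, y_train, y_test = train_test_split(
    X_t, y, test_size=0.25, random_state=0
)

# Fit on the log-transformed target.
model = LinearRegression().fit(X_train, np.log1p(y_train))
pred_log = model.predict(X_test)

# MSE on the log scale: compare against the transformed y_test.
mse_log = mean_squared_error(np.log1p(y_test), pred_log)

# MSE in original units: invert the predictions, leave y_test untouched.
pred = np.expm1(pred_log)
mse_original = mean_squared_error(y_test, pred)

print(mse_log, mse_original)
```

The two numbers are not comparable with each other; which one to report depends on the scale in which errors matter for the problem.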

Error when trying to set column as index in pandas dataframe

data-science dataframe numpy pandas python

I have the following code: which works fine until I do (trying to set column ‘idx’ as the index of the dataframe) which throws an error What does this mean? Answer The error occurs when you create A with If you print A.columns you will get: So ‘idx’ is not actually among your columns, and it cannot be set as the index.
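The original code for A is not shown, so the following is only a hypothetical reconstruction of the same kind of mistake: a DataFrame built without column names has integer labels, and set_index('idx') then fails with a KeyError. Checking A.columns first makes the problem visible:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the question's own construction of A is not shown.
data = np.array([[0, 1.5], [1, 2.5], [2, 3.5]])

# Without explicit column names, pandas assigns integer labels.
A = pd.DataFrame(data)
print(list(A.columns))  # [0, 1]

try:
    A.set_index("idx")
except KeyError as err:
    # 'idx' is not a column label, so it cannot become the index.
    print("KeyError:", err)

# Naming the columns at creation time makes 'idx' available.
B = pd.DataFrame(data, columns=["idx", "value"]).set_index("idx")
print(B)
```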

Plotly reformatting Subplot Y axis values

data-science finance plotly python

I am trying to turn the Y-axis values into dollar amounts. When I use the update_layout method, it only affects the first chart and not the others. I am not sure where to put the method, or how I could apply the formatting to each trace individually. Example of the Chart I am generating Answer You can format each y-axis

Calling an attribute defined in a method from another method in data science (python)

data-science oop pandas python

I’m learning object-oriented programming in a data science context. I want to understand what good practice is for writing methods within a class that relate to one another. When I run my code: I get the following output (only part of the output is shown due to space constraints): I am happy with the output generated by
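The question's class is not shown, so the following is a generic sketch with a hypothetical class name and hypothetical methods. It illustrates one common convention: declare the attribute in __init__, let one method fill it in, and let other methods read it from self rather than recomputing it:

```python
class SalesReport:
    """Hypothetical class: one method stores a result on self, another reads it."""

    def __init__(self, values):
        self.values = list(values)
        # Declared up front so every method can rely on the attribute existing.
        self.total = None

    def compute_total(self):
        # Stores the result on the instance and also returns it.
        self.total = sum(self.values)
        return self.total

    def describe(self):
        # Uses the attribute set by compute_total, computing it on demand.
        if self.total is None:
            self.compute_total()
        return f"{len(self.values)} values, total {self.total}"


report = SalesReport([10, 20, 30])
print(report.describe())  # 3 values, total 60
```

Defining the attribute only inside compute_total would also work, but describe would then raise an AttributeError if called first.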

How to Generate a dataset based on mean, median, 1st & 9th decile values?

data-science numpy pandas python statistics

I have the following values that describe a dataset: I need to generate any dataset that fits these values. All the examples I found require the standard deviation, which I don’t have. How can this be done? Thanks! Answer Interesting question! Based on Scott’s suggestions I gave it a quick try. Inputs: The Function: Comparison: Output: Getting
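The answer's function and inputs are not shown, so this is a different, swapped-in technique with hypothetical target values. It builds a piecewise-linear quantile function through the known points, assumes a minimum, and solves for the maximum that makes the mean come out right:

```python
import numpy as np

# Hypothetical targets (the question's actual values are not shown).
mean, median, d1, d9 = 52.0, 45.0, 20.0, 90.0
minimum = 10.0  # assumed lower bound, chosen freely

# The mean of a piecewise-linear quantile function with knots at
# p = 0, 0.1, 0.5, 0.9, 1 is the sum of the trapezoid areas under it.
# Everything except the maximum is known, so solve for the maximum.
partial = 0.05 * (minimum + d1) + 0.2 * (d1 + median) + 0.2 * (median + d9) + 0.05 * d9
maximum = (mean - partial) / 0.05
if maximum < d9:
    raise ValueError("Targets not reachable with this minimum; try a smaller one.")

# Evaluate the quantile function at evenly spaced probabilities.
n = 1000
probs = (np.arange(n) + 0.5) / n
data = np.interp(probs, [0.0, 0.1, 0.5, 0.9, 1.0], [minimum, d1, median, d9, maximum])

# Shuffle so the values are not sorted (the statistics are unaffected).
np.random.default_rng(0).shuffle(data)

p10, p90 = np.percentile(data, [10, 90])
print(data.mean(), np.median(data), p10, p90)
```

The mean matches up to rounding, while the median and deciles match approximately and get closer as n grows. Many other datasets fit the same four numbers; this is just one of them.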

I am unable to check the files available in the directory

anaconda data-science python subprocess unix

I am trying to read the csv files in the current directory. In order to do that, I want to check all the files present in my current directory. I have tried doing it with the check_output function. However, I received this error and I’m unable to figure out how to deal with it. This is the code I have tried: this
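The error and the original code are not shown, so this does not diagnose them. As an alternative that avoids spawning a shell command at all, the standard library can list the files directly. A small sketch with a hypothetical helper name:

```python
from pathlib import Path


def list_csv_files(directory="."):
    """Return the sorted names of the .csv files directly inside `directory`."""
    return sorted(p.name for p in Path(directory).glob("*.csv") if p.is_file())


# List the CSV files in the current working directory.
print(list_csv_files())
```

Each returned name can then be passed to a CSV reader such as pandas.read_csv.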

imblearn.oversampling SMOTENC ValueError

data-science data-science-experience pandas python scikit-learn

This is my first time using SMOTENC to upsample my categorical data. However, I’ve been getting an error. Can you please advise what I should pass for categorical_features in SMOTENC? ERROR: Answer As per documentation: So, just replace the line with the line

Getting min and max datetime for each date in csv

data-science dataset pandas python

I’m kind of new to data science and Python. First of all, do you suggest using any library other than pandas when dealing with a huge dataset (100K+ rows)? Second, let me explain my current problem. I have a dataset with a Datetime column; to make it easy to understand, let’s say I only
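The rest of the question is cut off, so the following is only a sketch of the task named in the title, using made-up timestamps and assuming the column is called Datetime. Grouping by the calendar date and aggregating gives the earliest and latest timestamp per day:

```python
import pandas as pd

# Made-up sample; with a real file this would come from pd.read_csv(...).
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-01-01 08:00:00",
        "2021-01-01 17:30:00",
        "2021-01-02 09:15:00",
        "2021-01-02 12:00:00",
        "2021-01-02 18:45:00",
    ])
})

# Group rows by calendar date, then take the min and max timestamp of each group.
per_day = df.groupby(df["Datetime"].dt.date)["Datetime"].agg(["min", "max"])
print(per_day)
```

pandas handles a few hundred thousand rows of this kind comfortably in memory on typical hardware.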

Difference between Standard scaler and MinMaxScaler

data-science machine-learning python python-3.x scikit-learn

What is the difference between MinMaxScaler() and StandardScaler()? mms = MinMaxScaler(feature_range = (0, 1)) (used in one machine learning model) sc = StandardScaler() (another machine learning model used StandardScaler rather than MinMaxScaler) Answer From the scikit-learn site: StandardScaler removes the mean and scales the data to unit variance. However, the outliers have an influence when computing the empirical mean
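A small side-by-side sketch on made-up data with one outlier. StandardScaler computes (x - mean) / std per feature, while MinMaxScaler with feature_range=(0, 1) computes (x - min) / (max - min):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One feature, five samples; the last value is a deliberate outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Zero mean and unit variance; the output range is not bounded.
X_std = StandardScaler().fit_transform(X)

# Values squeezed into [0, 1]; the outlier becomes 1 and the rest crowd near 0.
X_mm = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

print(X_std.ravel())
print(X_mm.ravel())
```

In both cases the outlier affects the result, because it enters the mean and standard deviation for one scaler and the maximum for the other.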

How to do superscripts and subscripts in Jupyter Notebook?

data-science jupyter jupyter-notebook python

I want to use numbers to indicate references in footnotes, so I was wondering how I can use superscripts and subscripts inside Jupyter Notebook. Answer You can do this inside of a markdown cell. A markdown cell can be created by selecting a cell then pressing the Esc key followed by the M key. You can tell when