Tag: data-science

Apply a for loop to multiple dataframes for multiple columns?

The dataframe is like the one below: I want to change the dataframes’ values to ‘dead’ if the age is more than 100. Desired outcome: I was trying something like this: Error shown: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). I am looking for a loop that works on all dataframes. Please correct my
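The ambiguous-truth-value error typically appears when a whole-column comparison such as `df['age'] > 100` is used in an `if`, because that comparison returns one boolean per row. A minimal sketch, assuming hypothetical dataframes that each have an `age` column (the post's real frames and code are not shown), passes the boolean mask to `Series.mask` and loops over every dataframe and every listed column:

```python
import pandas as pd

# Hypothetical example data: the original dataframes are not shown in the post
df1 = pd.DataFrame({"name": ["a", "b"], "age": [45, 120]})
df2 = pd.DataFrame({"name": ["c", "d"], "age": [101, 99]})


def mark_dead(df, cols, limit=100):
    """Replace values above `limit` with 'dead' in each listed column."""
    for col in cols:
        # `df[col] > limit` is an elementwise boolean Series; using it in an `if`
        # raises "truth value ... is ambiguous", so hand it to mask() instead
        df[col] = df[col].mask(df[col] > limit, "dead")
    return df


# Loop over every dataframe and every (hypothetical) age column
frames = [mark_dead(df, ["age"]) for df in (df1, df2)]
print(frames[0])
print(frames[1])
```

Mixing the string `'dead'` into a numeric column turns it into an `object` column, so any later numeric comparison on it should happen before this step.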

How to replace NaN values in a column of a DataFrame based on values from other columns in the same DataFrame

data-science dataframe numpy pandas python

Below is the DataFrame I’m working with. I want to replace NaN values in the ‘Score’ column using values from the ‘Country’ and ‘Sectors’ columns. Below is the code which I’ve tried: I want to replace only the NaN values specific to Country == ‘USA’ and Sectors == ‘CHEM’ and keep all other values as they are. Could anyone please help? Answer: You can use
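The excerpt does not say what the NaNs should be replaced with, so this sketch assumes a hypothetical fill value: the mean of the existing USA/CHEM scores. The point it shows is building one boolean mask that combines "Score is NaN" with both column conditions, then assigning through `.loc` so every other row stays untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical example data mirroring the column names in the question
df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "UK"],
    "Sectors": ["CHEM", "CHEM", "AUTO", "CHEM"],
    "Score": [10.0, np.nan, np.nan, np.nan],
})

# Rows belonging to the USA/CHEM combination
group = (df["Country"] == "USA") & (df["Sectors"] == "CHEM")

# Only the NaN scores inside that combination are targeted
target = df["Score"].isna() & group

# Assumed fill value: mean of the non-null USA/CHEM scores
fill_value = df.loc[group, "Score"].mean()

df.loc[target, "Score"] = fill_value
print(df)
```

Any other fill rule (a constant, a lookup from another frame) can replace `fill_value` without changing the mask logic.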

How to remove extra quotes in between quotes for the following example "Dec 01, 1999","Pocket Aquarium "Pocker" Pocket","Random : USA","USA" using python

csv data-cleaning data-science python

I want to remove the extra quotes in each line of a CSV file. Example: Ideal output required: Answer: You could try this: input: code: test_modified.csv
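The ideal output is not shown in the excerpt; this sketch assumes it is the same line with the inner quotes dropped (`"Pocket Aquarium Pocker Pocket"`). It also assumes every field is wrapped in double quotes and fields are separated by `","`, so a field that itself contains the sequence `","` would be split incorrectly. `clean_file` is a hypothetical helper for processing a whole file:

```python
import csv


def strip_inner_quotes(line):
    """Remove stray quotes inside fields, keeping one pair around each field."""
    body = line.strip()
    if not body:
        return ""
    # Drop the outer quotes of the first and last field
    if body.startswith('"') and body.endswith('"'):
        body = body[1:-1]
    # Assumption: fields are separated by the three-character sequence ","
    fields = body.split('","')
    cleaned = [field.replace('"', "") for field in fields]
    return ",".join(f'"{field}"' for field in cleaned)


def clean_file(src, dst):
    """Hypothetical helper, e.g. clean_file("test.csv", "test_modified.csv")."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(strip_inner_quotes(line) + "\n")


raw = '"Dec 01, 1999","Pocket Aquarium "Pocker" Pocket","Random : USA","USA"'
cleaned = strip_inner_quotes(raw)
print(cleaned)

# The cleaned line now parses into four fields with the standard csv module
row = next(csv.reader([cleaned]))
print(row)
```

If the inner quotes should be kept as literal characters instead, replacing `field.replace('"', "")` with `field.replace('"', '""')` would produce standard CSV escaping.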

Can Pandas output inferred schema for a CSV file?

csv data-science data-wrangling pandas python

Is there a method I can use to output the inferred schema of a large CSV using pandas? In addition, is there any way to have it tell me, along with the type, whether each column is nullable/blank based on the CSV? The file is about 500k rows with 250 columns. With my new job, I’m constantly being handed CSV files with zero format documentation.
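A minimal sketch, assuming reading the whole file into memory is acceptable: after `pd.read_csv`, `DataFrame.dtypes` holds the inferred type per column and `isna()` shows which columns contain blanks. The helper name `inferred_schema` and the small inline CSV are hypothetical stand-ins for the real file:

```python
import io

import pandas as pd


def inferred_schema(df):
    """One row per column: inferred dtype, whether any value is missing, and how many."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "nullable": df.isna().any(),
        "null_count": df.isna().sum(),
    })


# Stand-in for the real file; with a path you would call pd.read_csv(path, low_memory=False)
csv_text = "id,name,score\n1,a,3.5\n2,,\n3,c,4.0\n"
df = pd.read_csv(io.StringIO(csv_text))

schema = inferred_schema(df)
print(schema)
```

Two things worth knowing when reading the output: an integer column that contains blanks is reported as `float64`, because `int64` cannot hold NaN, and passing `low_memory=False` to `read_csv` avoids chunked type inference that can otherwise yield mixed-type `object` columns on large files.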

Replace grouped columns’ outliers with mean of the group based on defined zscore

data-science dataframe python

I have a very large DataFrame with many data points on a map (latitudes and longitudes), with outliers that are very close to each other in the dataset. I would like to group all the rows by column A as shown below, calculate their z-scores, and replace every value within a group whose z-score is > 1.5 with the mean value for
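A sketch of the grouped z-score replacement, under several assumptions the excerpt does not settle: the value column is a hypothetical `lat`, the threshold applies to the absolute z-score, the standard deviation is the population one (`ddof=0`, which is also the default of `scipy.stats.zscore`), and the replacement is the mean of the whole group including its outliers. `groupby(...).transform` broadcasts each group's mean and standard deviation back to the original rows, so the comparison stays vectorized:

```python
import pandas as pd

# Hypothetical data: group key "A" and one coordinate column "lat"
df = pd.DataFrame({
    "A": ["g1"] * 5 + ["g2"] * 3,
    "lat": [10.0, 10.1, 9.9, 10.0, 15.0, 20.0, 20.2, 19.8],
})


def replace_outliers_with_group_mean(df, group_col, value_col, threshold=1.5):
    """Replace values whose |z-score| within their group exceeds `threshold`."""
    grouped = df.groupby(group_col)[value_col]
    mean = grouped.transform("mean")                      # group mean per row
    std = grouped.transform(lambda s: s.std(ddof=0))      # population std per row
    z = (df[value_col] - mean) / std                      # NaN if a group has zero spread
    outlier = z.abs() > threshold                         # NaN compares as False
    out = df.copy()
    out.loc[outlier, value_col] = mean[outlier]
    return out


out = replace_outliers_with_group_mean(df, "A", "lat")
print(out)
```

Here only `15.0` in group `g1` exceeds the threshold and is replaced by that group's mean; to average only the non-outlier values instead, compute the mean on `df.loc[~outlier]` before assigning.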

How to divide one column by another where one dataframe’s column value corresponds to another dataframe’s column’s value in Python Pandas?

data-science dataframe pandas python python-3.x

Consider the following data frames in Python Pandas:

DataframeA

ColA  ColB  ColC
1     dog   439
1     cat   932
1     frog  932
2     dog   2122
2     cat   454
2     frog  773
3     dog   9223
3     cat   3012
3     frog  898

DataframeB

ColD  ColE
1     101
2     314
3     124

To note, ColB just repeats its string values as ColA iterates upwards.
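Assuming the goal is to divide ColC by the ColE value whose ColD matches the row's ColA (the excerpt stops before stating it), one way is to turn DataframeB into a lookup Series indexed by ColD and `map` ColA through it. The result column name `Ratio` is hypothetical:

```python
import pandas as pd

# The two frames from the question
df_a = pd.DataFrame({
    "ColA": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "ColB": ["dog", "cat", "frog"] * 3,
    "ColC": [439, 932, 932, 2122, 454, 773, 9223, 3012, 898],
})
df_b = pd.DataFrame({"ColD": [1, 2, 3], "ColE": [101, 314, 124]})

# Lookup Series: ColD -> ColE
lookup = df_b.set_index("ColD")["ColE"]

# For each row of df_a, fetch the ColE matching its ColA, then divide
divisor = df_a["ColA"].map(lookup)
df_a["Ratio"] = df_a["ColC"] / divisor
print(df_a)
```

An equivalent route is `df_a.merge(df_b, left_on="ColA", right_on="ColD")` followed by a column division; `map` avoids carrying the extra ColD/ColE columns along.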

Jupyter Notebook ImportError: cannot import name ‘example_var’

data-science import jupyter-notebook python

When I change or add a variable in my config.py file and then try to import it into my Jupyter Notebook, I get: ImportError: cannot import name ‘example_var’ from ‘config’ (config.py: jp_notebook.ipynb:). But after I restart the Jupyter kernel it works fine until I modify the config.py file again. I read somewhere that it’s because Jupyter has already cached that import. Is there
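Python caches imported modules in `sys.modules`, so a second `import config` in the same kernel does not re-read the edited file. `importlib.reload` re-executes the module's source in place. The sketch below simulates the situation with a throwaway `config.py` in a temporary directory (a hypothetical setup, standing in for the real project file):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical setup: a throwaway config.py in a temp dir placed first on sys.path
tmp = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp))
(tmp / "config.py").write_text("example_var = 1\n")
sys.modules.pop("config", None)  # make sure no other "config" module is cached

import config  # first import: the module object is cached in sys.modules

# Simulate editing config.py while the kernel keeps running
(tmp / "config.py").write_text("example_var = 1\nnew_var = 2\n")

# `from config import new_var` here would still see the cached module and fail;
# reload() re-executes the updated source into the same module object
config = importlib.reload(config)
from config import new_var

print(config.example_var, new_var)
```

In IPython/Jupyter, running `%load_ext autoreload` followed by `%autoreload 2` makes the kernel reload changed modules automatically before each cell executes.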

Weibull: R vs Python – slightly different results

data-science python r scipy

I’m trying to replicate R’s fitdist() results (the reference; I cannot modify the R code) in Python using scipy.stats. The results are quite close but still different, and the difference is not at an acceptable level. Does anybody know why the results are different? How can I reduce the difference between the results? The scipy_stats.weibull_min definition (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.weibull_min.html) seems to be the same as R’s weibull (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Weibull.html). Data
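One common source of discrepancy (a guess, since the post's data and code are truncated) is that `scipy.stats.weibull_min` has a third `loc` parameter while R's Weibull has only shape and scale; `weibull_min.fit(data, floc=0)` pins it at 0. Remaining small gaps can come from different optimizers and stopping tolerances. This sketch uses synthetic data (not the post's), fits with `floc=0`, and also solves the two-parameter MLE shape equation directly with `scipy.optimize.brentq` at a tight tolerance, which gives an independent reference to compare both the R and the scipy numbers against:

```python
import numpy as np
from scipy import optimize, stats

# Synthetic data for illustration only
rng = np.random.default_rng(0)
data = stats.weibull_min.rvs(c=1.5, scale=2.0, size=500, random_state=rng)

# Fix loc at 0 so scipy fits the same two-parameter Weibull as R
c_fit, loc_fit, scale_fit = stats.weibull_min.fit(data, floc=0)


def shape_score(k, x):
    """Profile-likelihood equation for the Weibull shape k (root = MLE)."""
    xk = x ** k
    return np.sum(xk * np.log(x)) / np.sum(xk) - 1.0 / k - np.mean(np.log(x))


# Solve for the shape with a tight tolerance, then recover the scale in closed form
k_mle = optimize.brentq(shape_score, 0.05, 50.0, args=(data,), xtol=1e-12)
lam_mle = np.mean(data ** k_mle) ** (1.0 / k_mle)

print(f"scipy fit : shape={c_fit:.6f} scale={scale_fit:.6f}")
print(f"root-solve: shape={k_mle:.6f} scale={lam_mle:.6f}")
```

The root-solved values satisfy $\frac{\sum x_i^k \ln x_i}{\sum x_i^k} - \frac{1}{k} - \frac{1}{n}\sum \ln x_i = 0$ and $\lambda = \left(\frac{1}{n}\sum x_i^k\right)^{1/k}$, so whichever implementation lands closer to them is the one limited by optimizer tolerance.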

Converting a MultiIndex dataframe to a nested dictionary [closed]

data-science dataframe multi-index pandas python

I have a grouped dataframe as shown in this link: I want to
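The grouped dataframe is only linked, so this sketch uses a hypothetical two-level MultiIndex with a single `value` column. The hypothetical helper `to_nested_dict` walks each index tuple and creates one dict level per index level, which works for any number of levels:

```python
import pandas as pd

# Hypothetical grouped frame; the real one in the post is only linked
idx = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x")], names=["outer", "inner"]
)
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)


def to_nested_dict(series):
    """Turn a Series with a MultiIndex into dicts nested one level per index level."""
    nested = {}
    # to_dict() on a MultiIndexed Series yields tuple keys like ("a", "x")
    for keys, val in series.to_dict().items():
        d = nested
        for k in keys[:-1]:
            d = d.setdefault(k, {})  # create intermediate levels on demand
        d[keys[-1]] = val
    return nested


nested = to_nested_dict(df["value"])
print(nested)
```

For a frame with several value columns, calling the helper once per column (or on `df.stack()`) is one way to extend it.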

Convert timeseries csv in Python

csv data-science pandas python

I want to convert a CSV file of time-series data with multiple sensors. This is what the data currently looks like: The different sensors are described by numbers and have different numbers of axes. If a new activity is labeled, everything below belongs to this new label. The label is in the same column as the first entry of each