Tag: pandas

Create a new category by using a value from another column

My dataset currently has one column with different opportunity types and another column with a dummy variable indicating whether or not the opportunity is a first-time client. I would like to create a new category within col_opptype based on col_first, where only one category (e.g. a) is matched to its corresponding col_first value, i.e., col_opptype
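A minimal sketch of one way to do this. It assumes col_first is a 0/1 dummy and that only first-time clients of type "a" should be split off. The sample data and the new label "a_first" are hypothetical, chosen for illustration only.

```python
import pandas as pd

# Hypothetical sample data: col_opptype holds the opportunity type,
# col_first is a 0/1 dummy for "first-time client".
df = pd.DataFrame({
    "col_opptype": ["a", "a", "b", "c", "a"],
    "col_first":   [1,   0,   1,   0,   0],
})

# Only rows of type "a" that are first-time clients get the new
# (hypothetical) category label "a_first"; every other row keeps its type.
mask = (df["col_opptype"] == "a") & (df["col_first"] == 1)
df.loc[mask, "col_opptype"] = "a_first"

print(df["col_opptype"].tolist())
# ['a_first', 'a', 'b', 'c', 'a']
```

Using a boolean mask with `.loc` leaves the other categories untouched, so types other than "a" are never matched even when their col_first is 1.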

pandas groupby dataframes, calculate diffs between consecutive rows

dataframe pandas pandas-groupby python

Using pandas, I open some CSV files in a loop and set the index to the cycleID column; however, the cycleID column is not unique. I print the two columns (cycleID and mean) of the dataframe that I am interested in for further computations. The objective is to use the rows corresponding to the same cycleID and calculate the
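Assuming the goal is the difference between consecutive rows within each cycleID (as the title suggests), a grouped `diff()` works even when the index is not unique. The sample values below are hypothetical.

```python
import pandas as pd

# Hypothetical data with a non-unique cycleID index, as after set_index("cycleID").
df = pd.DataFrame({
    "cycleID": [1, 1, 1, 2, 2],
    "mean":    [10.0, 12.5, 11.0, 3.0, 7.0],
}).set_index("cycleID")

# groupby(level=...) groups on the index itself; diff() then subtracts the
# previous row from each row *within the same cycle*, so the first row of
# every cycle gets NaN instead of a difference across cycles.
df["mean_diff"] = df.groupby(level="cycleID")["mean"].diff()
print(df)
```

Because the grouping happens on the index level, there is no need to reset the index first.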

Streamlit auto populate multiselect widgets to filter dataframe

loops pandas python streamlit

I have a Streamlit app where a user can upload a CSV file. I would like Streamlit to detect the object/dimension columns and create a multiselect filter for each of them, populated with the unique values in that column. For example, if the user uploads a file with 3 object/dimension columns, 3 separate multiselect filters should be created. I
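A sketch of one possible approach, assuming "object/dimension columns" means text-typed columns. The helper names `object_columns`, `apply_filters` and `main` are hypothetical. Streamlit is imported inside `main()` so the two pandas helpers can be used and tested without it.

```python
import pandas as pd


def object_columns(df):
    # Assumption: "dimension" columns are object- or string-typed columns.
    return [
        col for col in df.columns
        if pd.api.types.is_object_dtype(df[col]) or pd.api.types.is_string_dtype(df[col])
    ]


def apply_filters(df, selections):
    # selections maps column name -> list of chosen values; an empty list
    # is treated as "no filter" for that column.
    mask = pd.Series(True, index=df.index)
    for col, chosen in selections.items():
        if chosen:
            mask &= df[col].isin(chosen)
    return df[mask]


def main():
    import streamlit as st  # imported lazily; only needed when the app runs

    uploaded = st.file_uploader("Upload a CSV file", type="csv")
    if uploaded is None:
        return
    df = pd.read_csv(uploaded)

    # One multiselect per detected column, filled with that column's unique values.
    selections = {}
    for col in object_columns(df):
        options = df[col].dropna().unique().tolist()
        selections[col] = st.multiselect(f"Filter {col}", options)

    st.dataframe(apply_filters(df, selections))
```

To use it, add a `main()` call at the bottom of the script and launch it with `streamlit run`. Because the filters are built from `object_columns(df)`, a file with three such columns automatically gets three multiselects.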

How to skip unsupported .nc files while reading from different directories

netcdf pandas python python-xarray

I have several folders in a directory containing .nc files. While reading them, I get the error "NETCDF can not read unsupported file". Since there are more than 5,000 files, I don't know which file is corrupted or unsupported. Is there any way to skip such a file and move on to the next supported one?
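One common pattern is to wrap each open call in `try`/`except`, so that a bad file is recorded and skipped instead of stopping the loop. This is a sketch: the helper names `find_nc_files` and `open_readable` are hypothetical. The opener is passed in as a parameter, so the same loop works with `xarray.open_dataset` or any other reader. The broad `except Exception` is deliberate, because the exact exception type for an unsupported file depends on the backend.

```python
from pathlib import Path


def find_nc_files(root):
    # Recursively collect every .nc file under root, across all sub-folders.
    return sorted(Path(root).rglob("*.nc"))


def open_readable(paths, opener):
    """Call opener(path) on each path. Collect the successes and record the
    failures instead of stopping at the first unreadable file."""
    opened, skipped = [], []
    for path in paths:
        try:
            opened.append(opener(path))
        except Exception as exc:  # corrupted/unsupported files end up here
            skipped.append((path, repr(exc)))
    return opened, skipped

# Possible usage (requires xarray and a netCDF backend):
# import xarray as xr
# datasets, bad = open_readable(find_nc_files("data"), xr.open_dataset)
# print(f"skipped {len(bad)} files:", [p for p, _ in bad])
```

The `skipped` list also answers the "which file is corrupted" question, since it keeps each failing path together with its error message.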

Remove rows in which a string contains letters other than A, C, T, G, N

arrays numpy pandas python

I’m fairly new to NumPy and pandas. Let’s say that I have a 2D NumPy array and I need to keep only the rows in which the second value contains only the letters ‘A’, ‘C’, ‘T’, ‘G’ and ‘N’, i.e., delete every row whose second value contains any other character. I was thinking of using three nested for loops to check each character one by one.
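Instead of nested loops over characters, a single anchored regular expression can test each string at once. The minimal sketch below uses a hypothetical array. The check is case-sensitive, and `+` also rejects empty strings. With pandas, `pd.Series(arr[:, 1]).str.fullmatch(...)` gives the same mask.

```python
import re

import numpy as np

# Hypothetical 2D array: column 0 is an ID, column 1 is the sequence.
arr = np.array([
    ["id1", "ACGTN"],
    ["id2", "ACGXT"],
    ["id3", "GGCA"],
    ["id4", "acgt"],
])

# fullmatch succeeds only if *every* character is one of A, C, T, G, N.
pattern = re.compile(r"[ACTGN]+")
keep = np.array([pattern.fullmatch(s) is not None for s in arr[:, 1]])

filtered = arr[keep]  # boolean indexing keeps the matching rows
print(filtered)
```

Here "ACGXT" (contains X) and "acgt" (lowercase) are dropped. Add `re.IGNORECASE` to the compile call if lowercase bases should be accepted.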

How to set non-adjacent cell range for XlsxWriter Data Validation

pandas python xlsxwriter

I am using the Python XlsxWriter module to add a drop-down list with the data_validation() method. Currently, I drop duplicates on a pandas Series, convert it into a list, and set that list as the values of the drop-down. This works fine; however, if the list exceeds 255 characters, as according
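A frequently used workaround for the 255-character limit on inline list values is to write the values into a contiguous cell range and point the validation's `source` at that range. The sketch below does this with a hidden helper sheet. The helper names, the sheet name "Lists" and column A are hypothetical choices. It shows a contiguous range only and does not address non-adjacent ranges.

```python
def list_source_range(sheet_name, col_letter, n_values, first_row=1):
    """Absolute reference to n_values cells in one column,
    e.g. "='Lists'!$A$1:$A$300"."""
    last_row = first_row + n_values - 1
    return f"='{sheet_name}'!${col_letter}${first_row}:${col_letter}${last_row}"


def add_dropdown(workbook, target_ws, target_cells, values, list_sheet="Lists"):
    # Write the (already deduplicated) values into column A of a helper sheet.
    # Call once per workbook, or pass a different list_sheet name each time.
    helper = workbook.add_worksheet(list_sheet)
    helper.write_column(0, 0, values)  # zero-indexed: row 0, col 0 is A1
    helper.hide()
    # Reference the range instead of passing the values inline, which is
    # what runs into the 255-character limit.
    target_ws.data_validation(target_cells, {
        "validate": "list",
        "source": list_source_range(list_sheet, "A", len(values)),
    })

# Possible usage (requires xlsxwriter):
# import xlsxwriter
# values = series.drop_duplicates().tolist()
# wb = xlsxwriter.Workbook("dropdown.xlsx")
# ws = wb.add_worksheet("Data")
# add_dropdown(wb, ws, "B2:B100", values)
# wb.close()
```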

Python: how to get the smoothed value?

matplotlib pandas python seaborn

Seaborn draws a smoother line than the actual data. For example, at x = 0.18 the actual data is about 11, but the value on the smoothed line is about 3. How can I get that value (3) for a given x-value from the list of data? Answer: you can access the plotted data directly from the plot.
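A minimal sketch of reading a drawn curve back from the Matplotlib Axes and interpolating at a chosen x. It assumes the smoothed curve is a `Line2D` in `ax.lines`, which is the case for line-based Seaborn plots that return an Axes. The helper name `value_on_line` and the sample curve are hypothetical. Which `line_index` is the smoothed line depends on the plot.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no window needed
import matplotlib.pyplot as plt
import numpy as np


def value_on_line(ax, x, line_index=0):
    # Read the coordinates actually drawn for one line and linearly
    # interpolate between them at the requested x.
    xy = ax.lines[line_index].get_xydata()
    order = np.argsort(xy[:, 0])  # np.interp needs increasing x values
    return float(np.interp(x, xy[order, 0], xy[order, 1]))


# Hypothetical smoothed curve standing in for the Seaborn line.
fig, ax = plt.subplots()
ax.plot([0.0, 0.1, 0.2, 0.3], [1.0, 2.0, 4.0, 3.0])
v = value_on_line(ax, 0.18)
print(v)  # approximately 3.6
plt.close(fig)
```

With Seaborn, pass the Axes it returns (for example `ax = sns.lineplot(...)`) instead of creating one. Shaded confidence bands are not part of `ax.lines`.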

Repeat pattern using python regex

dataframe pandas python regex

I’m cleaning a dataset using pandas. I have a column called “Country” where some rows contain numbers or other information in parentheses, which I have to remove, for example: Australia1, Perú (country), 3Costa Rica, etc. To do this, I take the column and map a function over it. But I have a problem with this regex,
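Instead of mapping a Python function over the column, one vectorized `str.replace` with an alternation can remove both kinds of noise. This is a sketch on the examples from the question. The pattern assumes parentheses are not nested.

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Australia1", "Perú (country)", "3Costa Rica", "Chile"]})

# \([^)]*\) removes a "( ... )" group (anything up to the next closing paren);
# \d+ removes any run of digits. strip() then trims the leftover spaces.
df["Country"] = (
    df["Country"]
    .str.replace(r"\([^)]*\)|\d+", "", regex=True)
    .str.strip()
)
print(df["Country"].tolist())
# ['Australia', 'Perú', 'Costa Rica', 'Chile']
```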

Make a correlation plot on time series data in Python

matplotlib pandas python

I want to see the correlation on a rolling weekly basis in time series data, because I want to see how the rolling correlation moves each year. To do so, I tried the built-in pandas functions pandas.corr() and pandas.rolling_corr() to get the rolling correlation and tried to make a line plot, but I couldn’t get the correlation line chart right. I don’t know
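In current pandas, the old top-level `pandas.rolling_corr()` function is no longer available. The equivalent is `Series.rolling(window).corr(other)`. The sketch below assumes daily data, so a 7-observation window is one week. The two series are hypothetical random data, and the yearly mean is just one way to summarise how the correlation moves each year.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data for two related series over two years.
idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
rng = np.random.default_rng(0)
a = pd.Series(rng.normal(size=len(idx)), index=idx).cumsum()
b = a * 0.5 + pd.Series(rng.normal(size=len(idx)), index=idx)

# 7-observation rolling window = one week for daily data.
# The first 6 values are NaN because the window is not yet full.
rolling_corr = a.rolling(window=7).corr(b)

# Summarise how the weekly correlation moves from year to year.
yearly_mean = rolling_corr.groupby(rolling_corr.index.year).mean()
print(yearly_mean)

# rolling_corr.plot() would draw the line chart (requires matplotlib).
```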

Issue with conversion of text data into a dataframe

pandas python regex

I have a text file with several lines, and between them there is some data (the useful data) that I need to convert into a DataFrame. I iterated over the text file line by line and captured the useful data with the help of a regex. I thought to iterate over each captured row and
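Rather than converting each captured row separately, one option is to collect each regex match as a dict and build the DataFrame once at the end. The line format `name: value unit` and the sample text are hypothetical. `io.StringIO` stands in for an open file, and `with open(path) as f: for line in f:` works the same way.

```python
import io
import re

import pandas as pd

# Hypothetical file content: only lines shaped like "name: value unit" are data.
text = """header line
temp: 21.5 C
some noise here
pressure: 1013 hPa
footer
"""

# Named groups become the DataFrame's column names.
pattern = re.compile(r"^(?P<name>\w+):\s*(?P<value>[-\d.]+)\s+(?P<unit>\S+)$")

rows = []
for line in io.StringIO(text):  # iterates line by line, like a file object
    m = pattern.match(line.strip())
    if m:
        rows.append(m.groupdict())  # one dict per captured row

df = pd.DataFrame(rows)
df["value"] = pd.to_numeric(df["value"])  # captured groups are strings
print(df)
```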