My dataset currently has one column with different opportunity types and another column with a dummy variable indicating whether the opportunity is a first-time client. I would like to create a new category within col_opptype based on col_first, where only 1 category (i.e. a) will be matched to its…
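One possible reading of this question can be sketched as follows. Because the excerpt is cut off, the rule used here is an assumption: rows of category `a` with `col_first == 1` get a new label. The label `a_first`, the output column `col_opptype_new`, and the sample data are all hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "col_opptype": ["a", "a", "b", "c"],
    "col_first": [1, 0, 1, 0],
})

# Assumed rule: only category "a" is split, and only for first-time clients.
is_first_a = (df["col_opptype"] == "a") & (df["col_first"] == 1)

# Series.mask replaces values where the condition is True.
df["col_opptype_new"] = df["col_opptype"].mask(is_first_a, "a_first")
```

Writing to a new column keeps the original categories available for comparison.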
Tag: pandas
pandas groupby dataframes, calculate diffs between consecutive rows
Using pandas, I open some csv files in a loop and set the index to the cycleID column, except the cycleID column is not unique. See below: This prints the 2 columns (cycleID and mean) of the dataframe I am interested in for further computations: The objective is to use the rows corresponding to the same cycle…
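A minimal sketch of per-group differences between consecutive rows, assuming the frame is indexed by a non-unique `cycleID` and has a `mean` column as described. The sample values and the `diff` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical data: cycleID repeats, as in the question.
df = pd.DataFrame({
    "cycleID": [1, 1, 2, 2, 2],
    "mean": [10.0, 12.5, 3.0, 4.0, 7.0],
}).set_index("cycleID")

# Group on the (non-unique) index level, then take row-to-row differences.
# The first row of each cycle has no predecessor, so its diff is NaN.
df["diff"] = df.groupby(level="cycleID")["mean"].diff()
```

Grouping before calling `diff` prevents a difference from being computed across the boundary between two cycles.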
Streamlit auto populate multiselect widgets to filter dataframe
I have a streamlit app where a user can upload a csv file. I would like streamlit to detect the object/dimension columns and create a multiselect filter for each of them with the unique values inside each of the columns. For example if the user uploads a file with 3 object/dimension, 3 separate multi select f…
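The pandas side of this idea might look like the sketch below. It assumes that "object/dimension" columns means every non-numeric column, and the helper names `dimension_columns` and `apply_filters` are hypothetical. The Streamlit wiring is shown only as a comment so the block runs without the app.

```python
import pandas as pd

def dimension_columns(df):
    """Return the names of non-numeric columns (assumed to be the 'dimensions')."""
    return list(df.select_dtypes(exclude="number").columns)

def apply_filters(df, selections):
    """Keep rows matching every non-empty selection; an empty selection filters nothing."""
    mask = pd.Series(True, index=df.index)
    for column, chosen in selections.items():
        if chosen:
            mask &= df[column].isin(chosen)
    return df[mask]

# Hypothetical uploaded data.
df = pd.DataFrame({
    "region": ["EU", "US", "EU"],
    "product": ["x", "y", "y"],
    "sales": [1, 2, 3],
})

# In the app, each selection would come from one widget per column, e.g.
#   selections = {c: st.multiselect(c, sorted(df[c].unique()))
#                 for c in dimension_columns(df)}
selections = {"region": ["EU"], "product": []}
filtered = apply_filters(df, selections)
```

Treating an empty selection as "no filter" means the full table is shown until the user picks a value.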
How to avoid unsupported .nc file while reading from different directory
I have several folders in a directory containing .nc files. While reading them, I get an error: NETCDF can not read unsupported file. Since there are more than 5 thousand files, I don’t know which file is corrupted or unsupported. Is there any way to keep reading by skipping to the next supported file? T…
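One common way to skip unreadable files is to wrap each read in `try`/`except` and record the failures. The sketch below assumes the reader raises `OSError` or `ValueError` for a bad file; the set of exceptions to catch depends on the library actually used. The helper `read_supported` and the stand-in `fake_reader` are hypothetical; in practice the reader might be something like `xarray.open_dataset`.

```python
def read_supported(paths, reader):
    """Read every path with `reader`, skipping files that raise on open."""
    loaded, skipped = {}, []
    for path in paths:
        try:
            loaded[path] = reader(path)
        except (OSError, ValueError) as exc:
            # Record the failure and move on to the next file.
            skipped.append((path, str(exc)))
    return loaded, skipped

def fake_reader(path):
    """Stand-in reader for the demo: fails on names containing 'bad'."""
    if "bad" in path:
        raise OSError("unsupported file: " + path)
    return "contents of " + path

paths = ["a/run1.nc", "a/bad2.nc", "b/run3.nc"]
loaded, skipped = read_supported(paths, fake_reader)
```

The `skipped` list doubles as a report of which files are corrupted or unsupported.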
Remove rows in which string contains other letters than A,C,T,G,N
I’m fairly new to numpy and pandas. Let’s say that I have a 2D numpy array and I need to delete all rows in which the second value contains anything other than the letters ‘A’, ‘C’, ‘T’, ‘G’ and ‘N’, so that after filtering I get this. I wanted to do 3 for …
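Following the title, a minimal sketch that keeps only rows whose second value consists solely of A, C, T, G and N. The sample array and the names `valid`, `mask`, and `filtered` are made up for illustration.

```python
import re
import numpy as np

# Hypothetical 2D array: an identifier and a sequence per row.
arr = np.array([
    ["seq1", "ACTGN"],
    ["seq2", "ACXT"],   # contains 'X', so this row is dropped
    ["seq3", "NNNA"],
])

# fullmatch succeeds only if the whole string is made of the allowed letters.
valid = re.compile(r"[ACGTN]+")
mask = np.array([valid.fullmatch(s) is not None for s in arr[:, 1]])

# Boolean indexing keeps the rows where mask is True.
filtered = arr[mask]
```

A single boolean mask replaces the nested loops hinted at in the question.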
How to set non-adjacent cell range for XlsxWriter Data Validation
I am using the Python XlsxWriter module to add a drop-down list with the data_validation method. Currently I drop duplicates on a Pandas Series, convert the result into a list, and set it as the values for the drop-down list like so: This works fine; however, if the list exceeds 255 characters as acc…
python, How to get smoother value?
Somehow seaborn draws a smoother line than the actual data. For example, for x-value 0.18, the actual data is around 11 but the value on the smoothed line is about 3. How would I get the value 3 for that x-value when given the list of data? The actual data are: Answer You can access the plot data with: out:
Repeat pattern using python regex
Well, I’m cleaning a dataset using Pandas. I have a column called “Country”, where different rows can contain numbers or other information in parentheses, and I have to remove them, for example: Australia1, Perú (country), 3Costa Rica, etc. To do this, I’m getting the column and I mak…
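A minimal sketch of this kind of cleanup with the standard `re` module, using the three example values from the question. The function name `clean_country` and the exact pattern are assumptions, not the asker's own code.

```python
import re

# Match either a parenthesised note (with leading spaces) or a run of digits.
NOISE = re.compile(r"\s*\([^)]*\)|\d+")

def clean_country(name):
    """Remove digits and parenthesised notes, then trim surrounding spaces."""
    return NOISE.sub("", name).strip()

raw = ["Australia1", "Perú (country)", "3Costa Rica"]
cleaned = [clean_country(name) for name in raw]
```

On a DataFrame, the same pattern could be applied to the whole column with `df["Country"].str.replace(NOISE.pattern, "", regex=True).str.strip()`.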
make correlation plot on time series data in python
I want to see the correlation on a rolling weekly basis in time series data, because I want to see how the rolling correlation moves each year. To do so, I tried to use the pandas.corr() and pandas.rolling_corr() built-in functions to get the rolling correlation and tried to make a line plot, but I couldn’t c…
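A rolling correlation can be sketched with the `Series.rolling(...).corr(...)` method; the `rolling_corr` function mentioned in the question is no longer available in current pandas releases. The daily synthetic data, the 7-row window standing in for "one week", and the variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series; b is a noisy copy of a.
idx = pd.date_range("2020-01-01", periods=28, freq="D")
rng = np.random.default_rng(0)
a = pd.Series(rng.normal(size=28), index=idx)
b = a + pd.Series(rng.normal(scale=0.5, size=28), index=idx)

# Correlation of a and b over each trailing 7-row window.
# The first 6 values are NaN because the window is not yet full.
rolling_corr = a.rolling(window=7).corr(b)
```

Calling `rolling_corr.plot()` would then draw the line plot, provided matplotlib is installed.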
Issue with conversion of text data into a dataframe
I have a text file with several lines and, between them, some data (the useful data) which I need to convert to a dataframe. I iterated over the text file line by line and captured the useful data with the help of a regex. Something like this, The data captured look like this I thought to iterate each captured…