I need to scrape hundreds of pages and, instead of storing the whole JSON of each page, I want to store just several columns from each page in a pandas dataframe. However, at the beginning, when the dataframe is empty, I have a problem: I need to fill an empty dataframe that has no columns or rows. So the loo…
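One common way around the empty-dataframe problem is to collect one dict per page in a plain list and build the dataframe once after the loop. The sketch below assumes that approach; `pages`, `wanted` and the field names are hypothetical stand-ins for the scraped JSON, which is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped JSON pages (field names are assumptions).
pages = [
    {"id": 1, "title": "first", "body": "long text", "views": 10},
    {"id": 2, "title": "second", "body": "long text", "views": 25},
]

# The fields to keep from each page (also an assumption).
wanted = ["id", "title", "views"]

rows = []
for page in pages:
    # Keep only the selected fields; a missing key becomes None.
    rows.append({key: page.get(key) for key in wanted})

# Build the dataframe once, after the loop, instead of growing an empty one.
df = pd.DataFrame(rows, columns=wanted)
print(df)
```

Appending to a list and constructing the dataframe at the end avoids having to define columns on an empty dataframe up front.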
Tag: dataframe
TypeError: cannot concatenate object of type ”; only Series and DataFrame objs are valid
I have a list of 10 dataframes named d0, d1, d2,…d9. All have 3 columns and 100 rows. I want to merge all dataframes so that I have 3 columns and 1000 rows, and then convert the result into an array. The above code throws an error: I used the solution suggested in pd.concat in pandas is giving a TypeError: cann…
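`pd.concat` raises this TypeError when an item in the list is not a Series or DataFrame. The original code is not shown, so the cause here is a guess; a frequent one is passing the variable names as strings rather than the dataframes themselves. A minimal sketch with made-up data standing in for d0 to d9:

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten frames of 100 rows x 3 columns, standing in for d0..d9.
frames = [
    pd.DataFrame(np.full((100, 3), i), columns=["a", "b", "c"])
    for i in range(10)
]

# Pass the DataFrame objects themselves, not their names as strings.
combined = pd.concat(frames, ignore_index=True)

# Convert the stacked result to a NumPy array.
array = combined.to_numpy()
print(combined.shape, array.shape)
```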
Dedup records (window function pandas)
Hi, I am looking to dedup my records ordered by cancel date, so that I keep only the most recent record for each id.

sample data

id  cancel_date  type_of_fruit
1   2021-03-02   apple
1   2021-01-01   apple
2   2021-02-01   orange

expected output

id  cancel_date  type_of_fruit
1   2021-03-02   apple
2   2021-02-01   orange

I wrote the …
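One way to get the expected output from the sample data is to sort by `cancel_date` descending and keep the first row per `id`. This is a sketch of that idea, not necessarily the approach the asker attempted:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "cancel_date": pd.to_datetime(["2021-03-02", "2021-01-01", "2021-02-01"]),
    "type_of_fruit": ["apple", "apple", "orange"],
})

latest = (
    df.sort_values("cancel_date", ascending=False)  # newest record first
      .drop_duplicates(subset="id", keep="first")   # keep one row per id
      .sort_values("id")
      .reset_index(drop=True)
)
print(latest)
```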
How to groupby and calculate new field with python pandas?
I’d like to group by a specific column within a dataframe called ‘Fruit’ and calculate the percentage of that particular fruit that are ‘Good’. See below for my initial dataframe: See below for my desired output dataframe: Note: Because there is 1 “Good” Appl…
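The original dataframe is not shown, so the sketch below uses made-up rows and assumes the quality label lives in a column named `Condition` (a hypothetical name). The mean of a boolean mask within each group gives the share of ‘Good’ rows:

```python
import pandas as pd

# Hypothetical data; only the 'Fruit' column name comes from the question.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Banana", "Banana", "Banana"],
    "Condition": ["Good", "Bad", "Good", "Good", "Bad"],
})

percent_good = (
    df["Condition"].eq("Good")   # True where the row is 'Good'
      .groupby(df["Fruit"])      # group the mask by fruit
      .mean()                    # fraction of True values per fruit
      .mul(100)
      .rename("Percent_Good")
      .reset_index()
)
print(percent_good)
```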
Duration between two timestamps
I have a dataframe with a different timestamp for each user, and I want to calculate the duration. I used this code to import my CSV files: df.head() And I want to get something like that. I’ve used this code, but it doesn’t work for me. Answer Operations which occur over groups of values are GroupBy ope…
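Since the data and the desired output are not shown, the following is only a sketch of one reading of the question: the duration per user is taken as the gap between that user's first and last timestamp. The column names `user` and `timestamp` are assumptions.

```python
import pandas as pd

# Hypothetical data; column names are assumptions.
df = pd.DataFrame({
    "user": ["a", "a", "b", "b"],
    "timestamp": pd.to_datetime([
        "2021-01-01 10:00:00", "2021-01-01 10:30:00",
        "2021-01-01 09:00:00", "2021-01-01 11:15:00",
    ]),
})

# GroupBy operation: last timestamp minus first timestamp for each user.
grouped = df.groupby("user")["timestamp"]
duration = grouped.max() - grouped.min()
print(duration)
```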
How to count number of rows with a specific string value in a column using pandas?
I have a pandas column with dtype ‘object’ that contains numeric values and the value ‘?’. How should I proceed to count the number of rows that have the value ‘?’? I’m trying to run: in a column that has numeric values and some question marks ‘?’, but I…
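A minimal sketch of one way to count the ‘?’ rows: compare the column to the string and sum the resulting boolean mask. The column name `value` and the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical object-dtype column mixing numbers and '?'.
df = pd.DataFrame({"value": [1, "?", 3.5, "?", "7"]})

# The comparison yields a boolean Series; summing it counts the True entries.
count = int((df["value"] == "?").sum())
print(count)
```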
How to calculate average percentages of values within group?
I have a dataframe: I want to calculate the percentage of each ‘type’ within each date group and then average the values among all dates. So the desired results must be: and then the average among all dates. This is the desired final result: How to do that? Answer You can try this: or this: It’s not quite clear …
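The answer's code is not shown, so this is a separate sketch under assumed column names `date` and `type` with made-up rows. `pd.crosstab` with `normalize="index"` gives the share of each type within each date, and the column means then average those shares across dates:

```python
import pandas as pd

# Hypothetical data; column names are assumptions.
df = pd.DataFrame({
    "date": ["2021-01-01"] * 4 + ["2021-01-02"] * 2,
    "type": ["A", "A", "A", "B", "A", "B"],
})

# Percentage of each type within each date (every row sums to 100).
pct = pd.crosstab(df["date"], df["type"], normalize="index") * 100

# Average of those percentages across all dates.
avg = pct.mean()
print(pct)
print(avg)
```

Note that this averages the per-date percentages with equal weight per date, which is not the same as the overall percentage when dates have different row counts.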
Convert comma-separated values into integer list in pandas dataframe
How to convert a comma-separated value into a list of integers in a pandas dataframe? Input: Desired output: Answer There are 2 steps, split and convert to integers, because after the split the values are lists of strings. This solution also works well when the lists have different lengths (no Nones are added): Or: Alternat…
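The two steps described in the answer (split, then convert each piece to an integer) can be sketched as follows. The column name `col` and the sample values are made up, since the input is not shown:

```python
import pandas as pd

# Hypothetical input with lists of different lengths.
df = pd.DataFrame({"col": ["1,2,3", "4,5", "6"]})

# Step 1: split each string on commas (gives lists of strings).
# Step 2: convert every element of each list to int.
df["col_list"] = df["col"].str.split(",").apply(
    lambda parts: [int(p) for p in parts]
)
print(df)
```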
Pandas data frame how to make a scatter plot for clustering a list of values into a set of groups
I have this pandas df with 2 columns and I want to create a plot that clusters the drugs by their target, so there will be 7 clusters (7 targets). I am not sure how to do it. This is the df: Answer You can plot a scatterplot with seaborn like below: (Because you say in the comments of other
Dask dataframe: Can `set_index` put a single index into multiple partitions?
Empirically it seems that whenever you set_index on a Dask dataframe, Dask will always put rows with equal indexes into a single partition, even if it results in wildly imbalanced partitions. Here is a demonstration: However, I found no guarantee of this behaviour anywhere. I have tried to sift through the co…