I’m trying to plot several boxplots from different dataframes in one graph. Each dataframe has a different length. What I’m doing is the following: However, the result is that all the boxplots are plotted one over the other, and it’s not possible to distinguish anything. Can you help …
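The question's own code is not shown in this excerpt, so the following is only a minimal sketch of one common way to avoid overlapping boxes. Separate `boxplot` calls each default to position 1, so they land on top of each other; passing all series in a single call gives each its own slot. The frames `df_a`, `df_b`, `df_c` and the column name `value` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical dataframes of different lengths.
rng = np.random.default_rng(0)
df_a = pd.DataFrame({"value": rng.normal(0, 1, 50)})
df_b = pd.DataFrame({"value": rng.normal(1, 2, 80)})
df_c = pd.DataFrame({"value": rng.normal(-1, 0.5, 30)})
frames = {"a": df_a, "b": df_b, "c": df_c}

# A list of 1-D arrays may have unequal lengths; each array becomes one box.
data = [df["value"].to_numpy() for df in frames.values()]
positions = list(range(1, len(data) + 1))

fig, ax = plt.subplots()
result = ax.boxplot(data, positions=positions)
ax.set_xticks(positions)
ax.set_xticklabels(list(frames))
# Call plt.show() or fig.savefig(...) to view the figure.
```

If the boxes must be drawn in separate calls, giving each call its own `positions=[k]` should have the same effect.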
Tag: dataframe
How to order categorical month variable when plotting using matplotlib?
I am doing some topic modelling, and I am interested in showing how the average topic weight changes over time. The problem arises when I plot it using matplotlib (version 3.3.4). On the x-axis I would like to have the categorical month_year variable. The problem is that it is not ordered in a sensible way. I…
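As a rough sketch of one way to get a chronological axis: parse the month labels into real dates, sort by them, and plot the rows in that order (matplotlib places string categories in order of first appearance). The column names other than `month_year`, the sample values, and the `"Jan 2021"` label format are assumptions, since the excerpt does not show the data.

```python
import pandas as pd

# Hypothetical data; the real label format is not shown in the question.
df = pd.DataFrame({
    "month_year": ["Mar 2021", "Jan 2021", "Dec 2020", "Feb 2021"],
    "topic_weight": [0.30, 0.10, 0.25, 0.20],
})

# Parse the labels into timestamps so sorting is chronological, not alphabetical.
df["month_start"] = pd.to_datetime(df["month_year"], format="%b %Y")
df = df.sort_values("month_start").reset_index(drop=True)

ordered_labels = df["month_year"].tolist()
# Plotting df["month_year"] against df["topic_weight"] now follows this order;
# alternatively, plot df["month_start"] directly for a true date axis.
```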
Pandas appending dictionary values with iterrows row values
I have a dict of city names, each having an empty list as a value. I am trying to use df.iterrows() to append corresponding names to each dict key(city): Can somebody explain why the code above appends all possible ‘fullname’ values to each dict’s key instead of appending them to their respe…
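The question's code is elided here, so the cause can only be guessed. One frequent reason for this exact symptom is building the dict with `dict.fromkeys(cities, [])`, which makes every key point at the same list object. The sketch below shows that pitfall and a version with one list per key; the `city` column and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the 'fullname' column name comes from the question.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo"],
    "fullname": ["Ann Berg", "Bo Rossi", "Cy Dahl"],
})
cities = ["Oslo", "Rome"]

# Pitfall: every key shares ONE list, so each append shows up under all keys.
shared = dict.fromkeys(cities, [])
for _, row in df.iterrows():
    shared[row["city"]].append(row["fullname"])

# Fix: a dict comprehension creates a separate list for each key.
names_by_city = {city: [] for city in cities}
for _, row in df.iterrows():
    names_by_city[row["city"]].append(row["fullname"])
```

If the goal is only the grouping, `df.groupby("city")["fullname"].apply(list).to_dict()` gives the same mapping without an explicit loop.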
Trouble subtracting two column values correctly/precisely in pandas dataframe in Python
I’m trying to create a new column in my pandas dataframe which will be the difference of two other columns, but the new column has values that are significantly different from what the differences between the values of the columns are. I have heard that ‘float’ values often don’t subtr…
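A small sketch of the floating-point effect the asker mentions, with hypothetical columns `a` and `b`. Binary floats cannot represent most decimal fractions exactly, so a difference can be off in the last digits; rounding to the precision the data actually carries hides that noise. This only addresses tiny representation errors; a large discrepancy would point to a different cause that the truncated excerpt does not reveal.

```python
import pandas as pd

# Hypothetical columns; the question's data is not shown.
df = pd.DataFrame({"a": [0.3, 1.1], "b": [0.1, 1.0]})

# Plain subtraction: results carry binary floating-point error.
df["diff"] = df["a"] - df["b"]

# Round to the number of decimals that is meaningful for the data.
df["diff_rounded"] = df["diff"].round(2)
```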
Row wise operation in Pandas DataFrame
I have a DataFrame as follows: I would like to use a lambda function in the apply method to create a list of dictionaries (including the index item) as below: If I use something like .apply(lambda x: <some operation>) here, x does not include the index, only the values. Cheers, DD Answer To expand Hans Bambel’s ans…
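The referenced answer is cut off, so this is only a sketch of one way to reach the index inside a row-wise `apply`: with `axis=1`, each row arrives as a Series whose `.name` attribute holds its index label. The sample frame and the `"index"` key name are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a meaningful index.
df = pd.DataFrame(
    {"x": [1, 2], "y": ["a", "b"]},
    index=["row1", "row2"],
)

# With axis=1, row.name is the index label of the current row.
records = df.apply(
    lambda row: {"index": row.name, **row.to_dict()},
    axis=1,
).tolist()
```

An equivalent result without `apply` would be `df.reset_index().to_dict("records")`.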
Pandas dataframe manipulation/re-sizing of a single-column count file
I have a file that looks like this: I want to read this into a pandas dataframe and re-shape it so that it looks like this: Is this possible? If so, how? Notes: it will not always be this size, so the solution needs to be size-independent. The input file will be max ~200gRNAs x 20genes. There will be gRNA_som…
select first occurrence where column value is greater than x for each A(key) | dataframe
What I have: What I want: select the first occurrence where D >= 4 for each A (key). So the end result will look like: Answer: You can first slice the rows that match the condition on D, then group by A and get the first element of each group: output:
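The answer's code is not included in this excerpt; the approach it describes can be sketched as follows. The column names `A` and `D` come from the question, while the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical data using the question's column names.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 3],
    "D": [2, 4, 6, 5, 1, 3],
})

# 1) keep only rows where D >= 4, 2) take the first remaining row per key A.
result = df[df["D"] >= 4].groupby("A", as_index=False).first()
```

Keys with no row satisfying the condition (here `A == 3`) simply do not appear in the result.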
Explode a column with multiple values separated by comma
How are you? I have a database where some lines have more than one product and they are separated by a comma, as in the example below (there are other columns, but to make it more practical I only took these three). id produdct value 47 product1, product 2 12000.0 48 product3 48000.0 49 product4, product1, pr…
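A minimal sketch of one way to do this with `Series.str.split` followed by `DataFrame.explode`. Only the first two rows of the question's sample are used because the third is truncated, and the column is called `product` here as an assumption (the excerpt's header may be spelled differently in the real data).

```python
import pandas as pd

# First two sample rows from the question; the column name is an assumption.
df = pd.DataFrame({
    "id": [47, 48],
    "product": ["product1, product 2", "product3"],
    "value": [12000.0, 48000.0],
})

# Split each cell into a list, then emit one row per list element.
exploded = (
    df.assign(product=df["product"].str.split(","))
      .explode("product")
      .reset_index(drop=True)
)
# Remove the whitespace left around items after splitting on the comma.
exploded["product"] = exploded["product"].str.strip()
```

The other columns (`id`, `value`) are repeated for every product that came from the same original row.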
I am getting an error while I am running a function in pandas dataframe. I am getting invalid syntax for the first line of the function
def revised_price(engine-location,price): if engine-location==front: updated_price== price else: updated_price== 2*price return new_profit df[‘updated_price’] = df.apply(lambda x: revised_price(x[‘engine-location’], x[‘price’]),axis=1) Please find the error that i am gettin…
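For reference, a hedged sketch of how the function above could be written so that it runs. The syntax error on the first line most likely comes from the hyphen in the parameter name `engine-location`, since Python identifiers cannot contain `-`. The sketch also assumes that `==` was meant as assignment, that `front` is a string value, and that the function should return the updated price; the sample dataframe is hypothetical.

```python
import pandas as pd

def revised_price(engine_location, price):
    """Return the price unchanged for front engines, doubled otherwise."""
    if engine_location == "front":
        return price
    return 2 * price

# Hypothetical data; the column may keep its hyphenated name as a string key.
df = pd.DataFrame({
    "engine-location": ["front", "rear"],
    "price": [100.0, 200.0],
})

df["updated_price"] = df.apply(
    lambda x: revised_price(x["engine-location"], x["price"]),
    axis=1,
)
```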
Nested dictionary to CSV conversion optimization
I have a dictionary like this: My function to transform that into a CSV is this one: My output is a csv file like this: It is working, but it isn’t well optimized. The process is very slow when I run into a dictionary with more than 10,000 entries. Any ideas on how to speed this process up? Thank
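Neither the dictionary nor the original function is shown in this excerpt, so the following is only a sketch under an assumed shape: an outer dict whose values are flat dicts sharing the same keys. It writes all rows in a single pass with `csv.DictWriter.writerows`. The data, the field names, and the `key` column are hypothetical, and whether this is faster depends on what the original function does.

```python
import csv
import io

# Hypothetical nested dictionary: outer key -> flat record.
data = {
    "id1": {"name": "a", "score": 1},
    "id2": {"name": "b", "score": 2},
}

fieldnames = ["key", "name", "score"]

# A StringIO buffer stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
# One generator pass over the dict; no intermediate per-row string building.
writer.writerows({"key": key, **inner} for key, inner in data.items())

csv_text = buffer.getvalue()
```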