Tag: pandas

pandas df appending altering variables in multithreading: problems creating the initial index for the df, and is pd the correct tool for this?

I need to create a df that looks like this:

items  y  y  y
item   z  z  z
item   z  z  z
item   z  z  z

The first column is named ['items'] for convenience, because the rows created under this custom index are based on the changing variable item. It will be passed into the items column to create

Splitting object data into new columns in dataframe

dataframe pandas pyspark python

I have a dataframe with the columns business_id and attributes, with thousands of rows like this: How do I create a new column for each attribute, holding that attribute's value for the business id? If an attribute is not applicable to that business id, the column should specify false. Example: while also noting that there are some attributes wit…

Retain strings in a column using a dictionary’s value

pandas python

I want to retain the string with the largest value based on a dictionary’s key and value. Any suggestions on how to do it effectively? Expected output: Answer One way is to use apply with max and fruit_dict.get as key: or, if you expect some names to be missing from the dictionary: output:
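
The answer's code is not preserved in this excerpt, so the following is only a minimal sketch of the technique it names (`apply` with `max` and `fruit_dict.get` as the key). The data, the column names `fruits`, `best`, and `best_safe`, and the assumption that each cell holds a list of names are all hypothetical.

```python
import pandas as pd

# Hypothetical data: the original example is not shown in the excerpt.
fruit_dict = {"apple": 3, "banana": 5, "cherry": 1}
df = pd.DataFrame(
    {"fruits": [["apple", "banana"], ["cherry", "apple"], ["banana", "kiwi"]]}
)

# Variant 1: every name is assumed to be a key of fruit_dict.
# max(...) keeps the name whose dictionary value is largest.
df["best"] = df["fruits"].iloc[:2].apply(lambda names: max(names, key=fruit_dict.get))

# Variant 2: tolerate names missing from the dictionary ("kiwi" here)
# by ranking them below every known name.
df["best_safe"] = df["fruits"].apply(
    lambda names: max(names, key=lambda name: fruit_dict.get(name, float("-inf")))
)
```

The first variant is applied only to the rows without unknown names, because `fruit_dict.get` returns `None` for a missing key and Python cannot compare `None` with an integer.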

pivot df with duplicates as new rows

pandas pivot python

Evening, I have a dataframe that I want to reshape. There are duplicate id vars for some columns, and I want the duplicate values to appear as new rows. My data looks like this, and I want to have the ids as rows, with the group as columns, and the choices as the values. If there are multiple choices
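
The question's data and expected output are cut off here, so this is only one possible reading, sketched under assumptions: the column names `id`, `group`, and `choice` and the sample values are hypothetical. The idea is to number the duplicates within each (id, group) pair with `cumcount` so that each duplicate lands on its own row after unstacking.

```python
import pandas as pd

# Hypothetical data: id 1 has two choices in group "x".
df = pd.DataFrame(
    {
        "id": [1, 1, 1, 2],
        "group": ["x", "x", "y", "x"],
        "choice": ["a", "b", "c", "d"],
    }
)

# Number repeated (id, group) pairs: 0 for the first occurrence, 1 for the next, ...
df["n"] = df.groupby(["id", "group"]).cumcount()

# With the counter in the index, (id, n, group) is unique, so unstack succeeds
# and every duplicate becomes an extra row for its id.
wide = df.set_index(["id", "n", "group"])["choice"].unstack("group")
```

Calling `wide.reset_index()` afterwards turns `id` and the helper counter `n` back into ordinary columns, and `n` can then be dropped if it is not needed.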

Annotate just specific windows of imshow heatmaps with marks (e.g. “x”)

imshow matplotlib pandas python

Is it possible to annotate an imshow heatmap so that, if the value from a pandas DataFrame is e.g. less than 3, an “x” mark is drawn in that specific heatmap window? Let’s assume I have data similar to this example: I saw that we can annotate all heatmap windows with corresponding values, however I ca…
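
A minimal sketch of one way this could be done, assuming a small hypothetical DataFrame in place of the example that is missing from the excerpt: find the cells below the threshold and draw text only there with `ax.text`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data; the question's own example is not shown in the excerpt.
df = pd.DataFrame([[1, 4, 2], [5, 3, 6], [0, 7, 8]], columns=["a", "b", "c"])
values = df.to_numpy()

fig, ax = plt.subplots()
ax.imshow(values, cmap="viridis")

# Mark only the cells whose value is strictly below the threshold.
threshold = 3
rows, cols = np.where(values < threshold)
for r, c in zip(rows, cols):
    # imshow draws the cell at (row r, column c) centred on x=c, y=r.
    ax.text(c, r, "x", ha="center", va="center", color="white")

marked = sorted(zip(rows.tolist(), cols.tolist()))
```

Here `marked` collects the (row, column) positions that received an “x”, which is handy for checking the selection before looking at the figure.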

Converting dict to DataFrame gives too many rows

dataframe pandas python python-3.x

I am trying to convert a dict to a pandas DataFrame as follows: And when I print out the DataFrame, I see the following output: I expect to see only 1 row in the DataFrame, but it gives 5, and I cannot understand why. What am I doing wrong here? Answer You’re not doing anything wrong. Since tags is a
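
The dict from the question is not shown in the excerpt, so the sketch below uses a hypothetical one and assumes that `tags` holds a five-element list. When a dict mixes scalars with a list-like value, `pd.DataFrame` creates one row per list element and repeats the scalars; wrapping the dict in a list makes pandas treat it as a single record.

```python
import pandas as pd

# Hypothetical record; "tags" is assumed to be a list of five items.
record = {"id": 1, "name": "example", "tags": ["a", "b", "c", "d", "e"]}

# The list sets the number of rows; the scalars are repeated on each row.
expanded = pd.DataFrame(record)

# A list of dicts is read as a list of rows, so this gives exactly one row
# whose "tags" cell contains the whole list.
single = pd.DataFrame([record])
```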

The most efficient way to sum all possible pairs (x_ik, y_j) for a given k?

algorithm arrays numpy pandas python

I have two NumPy arrays: x with shape (n, m) and y with shape (p,). I would like to sum all possible pairs x[k, i] and y[j] to create a new NumPy array z with shape (n, m*p). A naïve algorithm would be: This algorithm has polynomial complexity: O(n*m*p). Knowing I am working on array with $n ~
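
The naïve algorithm itself is missing from the excerpt, so the sketch below assumes the output ordering z[k, i*p + j] = x[k, i] + y[j]. Under that assumption, broadcasting builds all pairs without Python loops. The number of operations is still n*m*p, since z has that many entries, but the work moves into vectorised NumPy code. The sizes and sample values are hypothetical.

```python
import numpy as np

# Small hypothetical sizes so the result is easy to inspect.
n, m, p = 2, 3, 4
x = np.arange(n * m).reshape(n, m)
y = 10 * np.arange(p)

# Broadcasting: (n, m, 1) + (1, 1, p) -> (n, m, p), then flatten the last two axes.
z = (x[:, :, None] + y[None, None, :]).reshape(n, m * p)

# Loop-based reference with the assumed ordering z[k, i*p + j] = x[k, i] + y[j].
z_ref = np.empty((n, m * p), dtype=z.dtype)
for k in range(n):
    for i in range(m):
        for j in range(p):
            z_ref[k, i * p + j] = x[k, i] + y[j]
```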

In pandas, how to pivot a dataframe on a categorical series with missing categories?

categorical-data pandas pivot python

I have a pandas dataframe with a categorical series that has missing categories. In the example shown below, group has the categories “a”, “b”, and “c”, but there are no cases of “c” in the dataframe. The resulting pivoted dataframe has columns a and b. I expect…
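
The example from the question is not included in the excerpt. As a minimal sketch under assumed column names (`id`, `group`, `value`) and hypothetical data, one way to get a column for every category is to reindex the pivoted columns against the full category list.

```python
import pandas as pd

# Hypothetical data: "c" is a declared category but never occurs.
df = pd.DataFrame(
    {
        "id": [1, 1, 2],
        "group": pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"]),
        "value": [10, 20, 30],
    }
)

# The plain pivot only produces columns for the categories that are present.
pivoted = df.pivot(index="id", columns="group", values="value")

# Reindexing on the full category list adds the missing column, filled with NaN.
wide = pivoted.reindex(columns=list(df["group"].cat.categories))
```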

Python Pandas: Append column value, based on another same column value

dataframe pandas python

I have a pandas dataframe like this. I want to append the Town value based on rows that have the same Source, Level and County values. I have tried isin, groupby and diff (but my values are str), but I still cannot figure it out. The image below is what I want to get. Really appreciate your help! Answer The way we can make this

Replace unknown values (with different median values)

pandas python

I have a particular problem: I would like to clean and prepare my data, and I have a lot of unknown values in the “highpoint_metres” column of my dataframe (members). As there is no missing information for the “peak_id”, I calculated the median value of the height according to the peak…
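
The rest of the question is cut off, so this is only a sketch of one common approach, using the names from the excerpt (`members`, `peak_id`, `highpoint_metres`) with hypothetical values: compute the per-peak median with `groupby(...).transform("median")` and use it to fill only the missing entries.

```python
import pandas as pd

# Hypothetical sample of the members dataframe.
members = pd.DataFrame(
    {
        "peak_id": ["A", "A", "A", "B", "B"],
        "highpoint_metres": [8000.0, None, 8200.0, 6000.0, None],
    }
)

# transform returns a Series aligned with members, holding each row's
# peak-level median (missing values are ignored when computing it).
peak_median = members.groupby("peak_id")["highpoint_metres"].transform("median")

# Only the unknown values are replaced; known heights stay untouched.
members["highpoint_metres"] = members["highpoint_metres"].fillna(peak_median)
```

A peak whose heights are all unknown has no median, so its rows would remain missing and would need a separate fallback.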