I need to create a df that looks like this: items y y y item z z z item z z z item z z z The first column is named ['items'] for convenience, because the rows created under this custom index depend on the changing variable item. It will be passed into the items column to create
Tag: pandas
Splitting object data into new columns in dataframe
I have a dataframe with the columns business_id and attributes, with thousands of rows like this: How do I create a new column for each attribute, holding that attribute's value for the business id? If an attribute is not applicable to that business id, the column should specify False. Example: while also noting that there are some attributes wit…
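The excerpt above does not show the data, so the following is only a minimal sketch. It assumes each attributes cell holds a Python dict (or None); the sample rows and the attribute names WiFi and Parking are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: the real rows are not shown in the question.
df = pd.DataFrame({
    "business_id": ["b1", "b2", "b3"],
    "attributes": [{"WiFi": "free", "Parking": True}, {"WiFi": "no"}, None],
})

# Expand each dict into columns; rows with no dict become empty dicts.
attrs = pd.DataFrame([a or {} for a in df["attributes"]], index=df.index)

# Attributes that do not apply to a business come out as NaN; set them to False.
attrs = attrs.astype(object).where(attrs.notna(), False)

result = df.drop(columns="attributes").join(attrs)
print(result)
```

If the attributes are stored as strings rather than dicts, they would need to be parsed first (for example with ast.literal_eval), which this sketch does not cover.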
Retain strings in a column using a dictionary’s value
I want to retain the string with the largest value based on a dictionary's key and value. Any suggestions on how to do it effectively? Expected output: Answer One way is to use apply with max and fruit_dict.get as key: or, if you expect some names to be missing from the dictionary: output:
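The code from the answer did not survive in this excerpt. A possible sketch of the approach it names (apply with max, using fruit_dict.get as the key) is shown here. The column name fruits, the comma-separated layout, and the dictionary contents are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical dictionary and data; only the name fruit_dict comes from the post.
fruit_dict = {"apple": 3, "banana": 7, "cherry": 5}
df = pd.DataFrame({"fruits": ["apple,banana", "cherry,apple", "kiwi,apple"]})

# Variant 1: every name is expected to be in the dictionary.
known = df["fruits"].iloc[:2].apply(
    lambda s: max(s.split(","), key=fruit_dict.get)
)

# Variant 2: names missing from the dictionary rank below everything else.
df["best"] = df["fruits"].apply(
    lambda s: max(s.split(","), key=lambda name: fruit_dict.get(name, float("-inf")))
)
print(df)
```

The first variant raises a TypeError when a name is missing and at least two names are compared, because fruit_dict.get then returns None, which cannot be compared with a number. The second variant avoids that with a default value.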
pivot df with duplicates as new rows
Evening, I have a dataframe that I want to reshape. There are duplicate id vars for some columns, and I want the duplicate values to appear as new rows. My data looks like this, and I want to have the ids as rows, with the group as columns and the choices as the values. if there are multiple choices
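The sample data and expected output are not visible in this excerpt, so this is a sketch under assumptions: the columns are called id, group and choice, and each repeated (id, group) pair should spill into an additional row. The helper column n is introduced here for illustration only.

```python
import pandas as pd

# Hypothetical data: id 1 has two choices for group g1.
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "group": ["g1", "g1", "g2", "g1"],
    "choice": ["x", "y", "z", "w"],
})

# Number the repeats within each (id, group) pair: 0, 1, 2, ...
df["n"] = df.groupby(["id", "group"]).cumcount()

# With the counter in the index, every (id, n, group) key is unique,
# so the reshape no longer fails on duplicates.
wide = df.set_index(["id", "n", "group"])["choice"].unstack("group")
print(wide)
```

Each id now occupies as many rows as its most frequent group has choices, and the remaining cells in those rows are NaN.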
Annotate just specific windows of imshow heatmaps with marks (e.g. “x”)
Is it possible to annotate an imshow heatmap so that, if the value from a pandas DataFrame is e.g. less than 3, a mark “x” is placed in that specific heatmap window? Let's assume I have similar data to this example: I saw that we can annotate all heatmap windows with corresponding values, however I ca…
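One way this could be done is to loop only over the cells that satisfy the condition and call ax.text for each of them. The sketch below uses made-up data, since the example data from the question is not shown, and takes the threshold of 3 from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the DataFrame from the question.
df = pd.DataFrame([[1, 5, 2], [4, 3, 0]], columns=["a", "b", "c"])
values = df.to_numpy()

fig, ax = plt.subplots()
ax.imshow(values, cmap="viridis")

# nonzero() returns the row and column indices where the condition holds.
rows, cols = (values < 3).nonzero()
for row, col in zip(rows, cols):
    # imshow puts columns on the x axis and rows on the y axis.
    ax.text(col, row, "x", ha="center", va="center", color="white")

fig.savefig("heatmap.png")
```

For interactive use, the matplotlib.use("Agg") line can be dropped and fig.savefig replaced with plt.show().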
Converting dict to DataFrame gives too many rows
I am trying to convert a dict to a pandas DataFrame as follows: And when I print out the DataFrame, I see the following output: I expect to see only 1 row in the DataFrame, but it gives 5. And I cannot understand why. What am I doing wrong here? Answer You're not doing anything wrong. Since tags is a
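The dict from the question is not shown, so the sketch below uses a hypothetical one in which tags is a list of five items, which would reproduce the five rows described. It illustrates how pandas broadcasts scalar values against a list-valued key, and one way to get a single row instead.

```python
import pandas as pd

# Hypothetical dict: only the key name "tags" comes from the post.
data = {"id": 1, "name": "example", "tags": ["a", "b", "c", "d", "e"]}

# The list supplies five values, so the scalars are repeated to match it.
long_df = pd.DataFrame(data)
print(len(long_df))

# Wrapping the dict in a list treats it as one record:
# a single row whose tags cell holds the whole list.
one_row = pd.DataFrame([data])
print(len(one_row))
```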
The most efficient way to sum all possible pairs (x_ik, y_j) for a given k?
I have two numpy arrays, x with shape (n,m) and y with shape (p,). I would like to sum all possible pairs x[k, i] and y[j] to create a new numpy array z with shape (n, m*p). A naïve algorithm would be: This algorithm has polynomial complexity: O(n*m*p). Knowing I am working on array with $n ~
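The naive loop is not shown in this excerpt. A vectorized sketch using NumPy broadcasting follows; it assumes the intended layout is z[k, i*p + j] = x[k, i] + y[j], which may differ from the ordering in the original question.

```python
import numpy as np

n, m, p = 2, 3, 4
x = np.arange(n * m).reshape(n, m)
y = np.arange(p) * 10

# x[:, :, None] has shape (n, m, 1); adding y of shape (p,) broadcasts
# to (n, m, p), which is then flattened over the last two axes.
z = (x[:, :, None] + y).reshape(n, m * p)

# Reference loop, used only to check the assumed layout.
z_loop = np.empty((n, m * p), dtype=x.dtype)
for k in range(n):
    for i in range(m):
        for j in range(p):
            z_loop[k, i * p + j] = x[k, i] + y[j]

print(np.array_equal(z, z_loop))
```

Since z itself holds n*m*p values, the work stays O(n*m*p) either way. Broadcasting removes the Python-level loop overhead rather than changing the complexity.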
In pandas, how to pivot a dataframe on a categorical series with missing categories?
I have a pandas dataframe with a categorical series that has missing categories. In the example shown below, group has the categories “a”, “b”, and “c”, but there are no cases of “c” in the dataframe. The resulting pivoted dataframe has columns a and b. I expect…
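The excerpt cuts off before the expected result, so this sketch assumes the goal is a column for every declared category, including the unobserved "c". The columns id and value and the sample rows are hypothetical; only group and its categories come from the post.

```python
import pandas as pd

# Hypothetical data: "c" is a declared category but never occurs.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "group": pd.Categorical(["a", "b", "a", "b"], categories=["a", "b", "c"]),
    "value": [10, 20, 30, 40],
})

pivoted = df.pivot(index="id", columns="group", values="value")

# Reindex the columns against the full category list so that
# unobserved categories appear as all-NaN columns.
pivoted = pivoted.reindex(columns=df["group"].cat.categories)
print(pivoted)
```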
Python Pandas: Append column value, based on another same column value
I have a pandas dataframe like this. I want to append the Town value based on rows that have the same Source, Level and County values. I have tried isin, groupby, and diff (but my values are str), but still cannot figure it out. The image below is what I want to get. I really appreciate your help! Answer The way we can make this
Replace unknown values (with different median values)
I have a particular problem: I would like to clean and prepare my data, and I have a lot of unknown values in the “highpoint_metres” column of my dataframe (members). As there is no missing information for the “peak_id”, I calculated the median value of the height according to the peak…
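A minimal sketch of filling each unknown height with the median of its own peak is given below. The names members, peak_id and highpoint_metres come from the post; the sample values are invented, and unknown values are assumed to be stored as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the members dataframe.
members = pd.DataFrame({
    "peak_id": ["A", "A", "A", "B", "B"],
    "highpoint_metres": [6000.0, np.nan, 6400.0, 7000.0, np.nan],
})

# transform returns one median per row, aligned with the original index,
# so it can be passed straight to fillna.
medians = members.groupby("peak_id")["highpoint_metres"].transform("median")
members["highpoint_metres"] = members["highpoint_metres"].fillna(medians)
print(members)
```

A peak whose heights are all unknown has a NaN median, so its rows would stay unfilled and need a separate fallback.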