Tag: dataframe

How to choose values from col1 if their values are in col2 but not on list in Python Pandas?

I have a DataFrame in Python Pandas like below: I need to select only those people from “col1” whose value from “col” appears in the “description” column together with something else (no matter whether before or after), where that something else must not be from bad_list. So I need to select only John Bravo and Ann Still because they have their value from “col1”
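
The excerpt is cut off and its data is not shown, so the following is only a minimal sketch of one possible reading: keep a row when “description” contains the “col1” value plus some extra text, and that extra text contains nothing from bad_list. The sample rows, the contents of bad_list, and the helper name `keep_row` are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: the real DataFrame is not shown in the excerpt.
df = pd.DataFrame({
    "col1": ["John Bravo", "Ann Still", "Mike Own", "Eva Lane"],
    "description": [
        "John Bravo manager",   # name + allowed extra text
        "senior Ann Still",     # allowed extra text + name
        "Mike Own intern",      # extra text is on the bad list
        "Eva Lane",             # name only, nothing else
    ],
})
bad_list = ["intern"]  # assumed contents


def keep_row(row):
    """Return True if description holds the col1 value plus allowed extra text."""
    name, desc = row["col1"], row["description"]
    if name not in desc:
        return False
    rest = desc.replace(name, "").strip()
    # Require some extra text, none of which comes from bad_list.
    return bool(rest) and not any(bad in rest for bad in bad_list)


selected = df[df.apply(keep_row, axis=1)]
print(selected["col1"].tolist())
```

A row-wise `apply` is used here for readability; on a large frame a vectorised string approach would likely be faster.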

How to combine rows that have the same values in two columns (Python)?

csv dataframe numpy python python-3.x

I currently have a CSV file as follows. The first part just shows the column names. The g column values are the same for every f value. The only unique part is p. Using Python, how could I combine this as follows: One thing to note is that the CSV file is much larger and that some f values might
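
The desired output is not visible in the excerpt, so this is a sketch under the assumption that all p values sharing the same (f, g) pair should be collected into a single row. The sample values are hypothetical; only the column names f, g and p come from the question.

```python
import pandas as pd

# Hypothetical rows standing in for the CSV, which is not shown in the excerpt.
df = pd.DataFrame({
    "f": ["a", "a", "b", "b", "b"],
    "g": [1, 1, 2, 2, 2],
    "p": ["x", "y", "x", "z", "w"],
})

# Group on the two columns that repeat and join the unique part (p).
combined = (
    df.groupby(["f", "g"], sort=False)["p"]
      .agg(", ".join)
      .reset_index()
)
print(combined)
```

Replacing `", ".join` with `list` would keep the p values as a Python list per row instead of one joined string.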

How to Create a Correlation Dataframe from already related data

dataframe pandas python

I have a data frame of language similarity. Here is a small snippet that’s been edited for simplicity: I would like to create a correlation dataframe such as: To create the first dataframe, I ran: I have tried: Which returns: I have looked at other similar questions but it seems that the data for use in .corr() is by itself
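
Since the frames in the question are not shown, the following is only a sketch of one common way to turn pairwise scores into a square matrix without calling .corr(). It assumes a long-format frame with one row per language pair; the column names `lang_a`, `lang_b`, `similarity` and the scores are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format similarity data (one row per language pair).
pairs = pd.DataFrame({
    "lang_a": ["en", "en", "de"],
    "lang_b": ["de", "fr", "fr"],
    "similarity": [0.6, 0.3, 0.2],
})

langs = sorted(set(pairs["lang_a"]) | set(pairs["lang_b"]))

# Pivot to a square layout; pairs that were not listed become NaN.
square = (
    pairs.pivot(index="lang_a", columns="lang_b", values="similarity")
         .reindex(index=langs, columns=langs)
)

# Mirror the known half onto the missing half, then set the diagonal to 1.
values = square.to_numpy(dtype=float, copy=True)
values = np.where(np.isnan(values), values.T, values)
np.fill_diagonal(values, 1.0)

matrix = pd.DataFrame(values, index=langs, columns=langs)
print(matrix)
```

The scores are reused as they are; nothing is recomputed, which matches the idea that the data is already related.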

How to automatically fill blank column with None in pandas

dataframe pandas python

I have an info.txt file that looks like this: And when I use pandas to read it: the error is: Is there any way to automatically fill the rows that do not have the same column length? The output should look like: I mean every blank column will be filled with None Answer This works, and should(?) be the same as reading
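
The answer text is truncated, so this is not the answer's code but a minimal sketch of one way to pad ragged rows with None. It assumes whitespace-separated fields, and the file contents below are hypothetical.

```python
import pandas as pd

# Hypothetical contents of info.txt; the real file is not shown in the excerpt.
raw = "a b c\n1 2\n3\n"

# Split each line on whitespace (assumed delimiter).
rows = [line.split() for line in raw.splitlines()]

# Pad every short row with None up to the widest row.
width = max(len(row) for row in rows)
padded = [row + [None] * (width - len(row)) for row in rows]

# dtype=object keeps the None values as None instead of converting them.
df = pd.DataFrame(padded, dtype=object)
print(df)
```

For a real file, `raw` could be replaced by the result of reading info.txt as text.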

Write a loop code to calculate average 77 different times, using another column as criteria

average dataframe function loops python

First of all, this is my first code and question, so sorry for the beginner level here and lack of vocabulary. I would like to calculate and store in a dataframe the average of the first 5 rows in a column “returns” with column “N” numbered as 1, and afterwards proceeding to calculate the average return of next 5 rows using
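
The question is cut off, so this sketch rests on an assumption: column “N” labels each block (1, 2, ...), and the average is wanted over the first 5 “returns” rows of every block. Under that reading a groupby can stand in for an explicit loop over all 77 values. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: two blocks of six rows each, labelled by N.
df = pd.DataFrame({
    "N": [1] * 6 + [2] * 6,
    "returns": [float(x) for x in range(1, 13)],
})

# Keep the first 5 rows of every N block, then average returns per block.
averages = (
    df.groupby("N").head(5)
      .groupby("N")["returns"].mean()
      .rename("avg_return")
)
print(averages)
```

The result is a Series indexed by N, so `averages.reset_index()` gives a two-column dataframe if that is the preferred storage.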

Dataframe: shifting values over columns

dataframe numpy pandas python

I have a dataframe with some NaN values in my s_x columns. If NaN values exist in them, I want them to be in the last columns. Example: Given values in the s_x columns of [NaN, 1, NaN, 2] I want the values to shift left over the columns to result in [1, 2, NaN, NaN] Example 2: My current
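
A minimal sketch of the left shift described above, assuming the relevant columns all start with the prefix `s_` (the actual column names and the second row of data are hypothetical). Each row keeps its non-NaN values in order and is padded with NaN on the right.

```python
import numpy as np
import pandas as pd

# First row mirrors the example in the question; the second is made up.
df = pd.DataFrame({
    "s_1": [np.nan, 1.0],
    "s_2": [1.0, np.nan],
    "s_3": [np.nan, np.nan],
    "s_4": [2.0, 3.0],
})
s_cols = [c for c in df.columns if c.startswith("s_")]


def push_nans_right(row):
    """Move non-NaN values to the left, keeping their order."""
    kept = row.dropna().tolist()
    return pd.Series(kept + [np.nan] * (len(row) - len(kept)), index=row.index)


df[s_cols] = df[s_cols].apply(push_nans_right, axis=1)
print(df)
```

The row-wise `apply` is easy to read but slow on large frames; a NumPy-based sort on the NaN mask is a common faster alternative.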

Using a for loop index in .loc to access a rolling slice of a dataframe?

dataframe loops pandas python slice

I want to create rolling slices of a master dataframe. I’m trying to measure the difference in outcomes over rolling periods. The master dataframe has 120 years of data and I want to create rolling slices of 10 years of a column(s), i.e. slice one goes from year 1 to 10, slice 2 goes from year 2 to 11, etc…
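
The rolling slices described above can be sketched with a loop index passed to .loc, assuming the dataframe is indexed by year numbers 1 to 120. The column name `value` and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical master dataframe: 120 yearly rows, indexed by year 1..120.
df = pd.DataFrame(
    {"value": range(120)},
    index=pd.RangeIndex(1, 121, name="year"),
)

window = 10
slices = []
# .loc slicing is label-based and includes both endpoints,
# so start..start+9 covers exactly 10 years.
for start in range(1, len(df) - window + 2):
    slices.append(df.loc[start:start + window - 1, "value"])

print(len(slices))
print(slices[0].index.min(), slices[0].index.max())
```

With a positional rather than year-labelled index, `df.iloc[i:i + window]` would be the equivalent, keeping in mind that .iloc excludes the end position.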

Python : Use of the previous value generated from a function in the same function

dataframe function pandas python shift

I am trying to compute a rolling average of all the highs of [‘RSIndex’] greater than 52. The series will have a NaN value if the first ref value is less than 52, but I want to use the previous iterated value of [‘high_r’] that the function has generated if any other ref value is less than 52. If anyone has any

Pandas how to explode several items of list for each new row

data-munging data-science dataframe pandas python

I have a dataframe: I want to explode it such that every 3 elements from each list in the column l become a new row, with a column for the triplet index within the original list. So I will get: What is the best way to do so? Answer Break list element into chunks first and then explode: If you
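
The answer's own code is not included in the excerpt; the following is a sketch of the approach it names (chunk each list, then explode). The sample data, the column `a`, and the name `triplet_index` are assumptions; only the column l comes from the question.

```python
import pandas as pd

# Hypothetical dataframe with a list column named l, as in the question.
df = pd.DataFrame({
    "a": [1, 2],
    "l": [[1, 2, 3, 4, 5, 6], [7, 8, 9]],
})

chunk = 3
# Break every list into consecutive chunks of three elements.
chunked = df.assign(
    l=df["l"].apply(lambda xs: [xs[i:i + chunk] for i in range(0, len(xs), chunk)])
)

# Explode so that each chunk becomes its own row.
out = chunked.explode("l")

# Number the chunks within each original row (the index repeats after explode).
out["triplet_index"] = out.groupby(level=0).cumcount()
out = out.reset_index(drop=True)
print(out)
```

A list whose length is not a multiple of three would end with a shorter final chunk under this slicing.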

Get the first and last value of a column of dataframe respect another column

dataframe format multiple-columns python unique-values

I’m a beginner in Python and I would like to get the first and last value of the date column whenever the mac_address is the same, for example: I’ve ordered my dataframe by mac_address, date with the next line: And the data are: NOTE: the date column has the format “2021-01-01 05:50:54” and the different mac addresses that appear
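
A minimal sketch of getting the first and last date per mac_address with a groupby, assuming the dates parse with the format given in the note. The sample rows and the output column names `first_seen` and `last_seen` are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the real data is not shown in the excerpt.
df = pd.DataFrame({
    "mac_address": ["aa:aa", "bb:bb", "aa:aa", "bb:bb", "aa:aa"],
    "date": [
        "2021-01-01 05:50:54",
        "2021-01-01 06:00:00",
        "2021-01-02 07:10:00",
        "2021-01-03 08:20:00",
        "2021-01-01 09:30:00",
    ],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S")

# The earliest and latest timestamp per address are the first and last
# values once the frame is ordered by mac_address and date.
summary = (
    df.groupby("mac_address")["date"]
      .agg(first_seen="min", last_seen="max")
      .reset_index()
)
print(summary)
```

Using `min` and `max` avoids depending on the sort order; with an already sorted frame, `"first"` and `"last"` would give the same result.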