Tag: pandas

Pandas – Count repeating values by condition

Dataframe: I have columns “group” and “val”, and I don’t know how to write the pandas code to get the column “count”. The logic is as follows: it should count the number of consecutive values that are on the same side (either positive or negative), grouped by column “group…
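A minimal sketch of one common approach, assuming "count" means the running position inside each run of same-signed values (restarting at 1 whenever the sign flips or the group changes). The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "group" and "val" match the question's column names.
df = pd.DataFrame({
    "group": ["a", "a", "a", "a", "b", "b", "b"],
    "val":   [1.5, 2.0, -0.3, -1.2, -4.0, 3.0, 2.5],
})

# Sign of each value: 1.0 for positive, -1.0 for negative (0 stays 0).
sign = np.sign(df["val"])

# A new run starts when the sign differs from the previous row in the same group.
# The first row of each group compares against NaN, so it always starts a run.
new_run = sign != sign.groupby(df["group"]).shift()
run_id = new_run.cumsum()

# Running position within each run, starting at 1.
df["count"] = df.groupby(run_id).cumcount() + 1
print(df)
```

If "count" should instead be the total length of the run on every row, `df.groupby(run_id)["val"].transform("size")` gives that with the same `run_id`.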

Websocket Json Data to DataFrame

pandas python websocket

I am learning how to work with APIs and websockets in finance. My goal for this code is to access data and create a DataFrame with only the columns (index, ask, bid & quote). I have tried appending values to the DataFrame, but it creates a new DataFrame every time I receive a message, similar to the df = new_df…
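A common pattern is to collect each parsed message as a plain dict in a list and build the DataFrame once (or periodically), rather than creating a new DataFrame per message. This sketch simulates the websocket with a list of JSON strings; the message layout, the `time` field used as the index, and the `on_message` handler name are all assumptions, not the asker's actual feed.

```python
import json
import pandas as pd

# Hypothetical raw messages standing in for what a websocket callback receives.
# Field names ("time", "ask", "bid", "quote", "extra") are assumptions.
messages = [
    '{"time": "2023-01-02T10:00:00", "ask": 1.0712, "bid": 1.0710, "quote": 1.0711, "extra": 5}',
    '{"time": "2023-01-02T10:00:01", "ask": 1.0713, "bid": 1.0711, "quote": 1.0712, "extra": 7}',
]

rows = []  # plain dicts are cheap to append; DataFrames are not

def on_message(message: str) -> None:
    """Hypothetical handler: parse one JSON message and keep only the wanted fields."""
    data = json.loads(message)
    rows.append({k: data[k] for k in ("time", "ask", "bid", "quote")})

for msg in messages:
    on_message(msg)

# Build the DataFrame in one step from everything collected so far.
df = pd.DataFrame(rows)
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")
print(df)
```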

How to compare each value of column B with the value of column A?

dataframe pandas python

Compare each value in column B with the first value in column A until B is greater than it, then set the expected column to True. Then take the value of column A in the row where expected is True and compare the following B values with it until one is greater, then set the expected column to True for that row. Input: Expected Output Answe…
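Because each comparison depends on where the previous True landed, this is a stateful scan that is easiest to express as a plain loop. The sketch below implements one reading of the rule (the reference starts as the first A value and moves to the A value of each newly flagged row); the function name `flag_exceedances` and the sample data are hypothetical.

```python
import pandas as pd

def flag_exceedances(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper implementing one interpretation of the rule.

    The reference starts as the first value of A. When a row's B exceeds the
    reference, the row is flagged True and the reference becomes that row's A.
    """
    expected = []
    ref = df["A"].iloc[0]
    for a, b in zip(df["A"], df["B"]):
        if b > ref:
            expected.append(True)
            ref = a  # later rows are compared with A from the last flagged row
        else:
            expected.append(False)
    return pd.Series(expected, index=df.index)

df = pd.DataFrame({"A": [10, 12, 15, 11, 20], "B": [8, 11, 13, 14, 12]})
df["expected"] = flag_exceedances(df)
print(df)
```

For very large frames the same loop can be compiled with a tool such as Numba, but the logic stays sequential by nature.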

Pandas groupby column and sum nulls of all other columns

pandas python

I have a dataframe with the following structure: I’d like to know, grouping by group, how many nulls there are in each column. In this case, the output should be: I don’t have control over how many columns I have or their names. Thanks! Answer: Convert column group to the index, test all other values f…
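A minimal sketch of the approach the answer starts to describe: move `group` into the index, test every remaining cell for missing values, then sum per group. It works whatever the number or names of the other columns; the sample data is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the "group" column name comes from the question.
df = pd.DataFrame({
    "group": ["x", "x", "y", "y", "y"],
    "c1": [1.0, np.nan, np.nan, 4.0, np.nan],
    "c2": [np.nan, np.nan, 3.0, 5.0, 6.0],
})

# isna() marks nulls as True; summing booleans per index level counts them.
nulls = df.set_index("group").isna().groupby(level=0).sum()
print(nulls)
```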

Apply T-Test test per group

pandas pandas-apply python

I have a dataframe like this: And I want to calculate the p-value from a t-test for each variable between groups. I can manually calculate each p-value like this: So the question is, how can I get a result dataframe like the one shown below for all variables automatically? Answer: There are several ways; the core idea is to us…
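One way to automate this, assuming exactly two groups and using `scipy.stats.ttest_ind` (an independent two-sample t-test; the truncated answer may use a different method). The column names `var1`/`var2` and the group labels are hypothetical.

```python
import pandas as pd
from scipy import stats

# Hypothetical data: two groups, several numeric variables.
df = pd.DataFrame({
    "group": ["A"] * 4 + ["B"] * 4,
    "var1": [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0],
    "var2": [10.0, 11.0, 9.0, 10.5, 14.0, 15.0, 13.5, 16.0],
})

variables = df.columns.drop("group")
a = df[df["group"] == "A"]
b = df[df["group"] == "B"]

# Run one t-test per variable and collect the statistic and p-value.
rows = {}
for var in variables:
    res = stats.ttest_ind(a[var], b[var])
    rows[var] = {"t_stat": res.statistic, "p_value": res.pvalue}

results = pd.DataFrame.from_dict(rows, orient="index")
print(results)
```

Passing `equal_var=False` to `ttest_ind` switches to Welch's t-test when the groups' variances should not be assumed equal.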

Add additional timestamp to Pandas DataFrame items based on item timestamp/index

dataframe pandas python

I have a large time-indexed pandas DataFrame with time-series data from a couple of devices. The structure of this DataFrame (self._combined_data_frame in the code below) looks like this: The DatetimeIndex and device_name are filled for every row; the other columns contain NaN values. Sample data is available on Go…
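The question is cut off before the exact rule, so this is only a sketch of a typical building block: deriving an extra timestamp from each row's own index value within its device, here the timestamp of the same device's previous row. The `prev_timestamp` column and the sample data are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical time-indexed data for two devices.
idx = pd.to_datetime([
    "2023-05-01 00:00:00", "2023-05-01 00:00:05", "2023-05-01 00:00:07",
    "2023-05-01 00:00:12", "2023-05-01 00:00:20",
])
df = pd.DataFrame(
    {"device_name": ["dev1", "dev2", "dev1", "dev2", "dev1"],
     "value": [1.0, np.nan, 3.0, np.nan, 5.0]},
    index=idx,
)

# Copy the index into a column so it can be shifted per device.
df["timestamp"] = df.index
# Previous timestamp of the same device (NaT for each device's first row).
df["prev_timestamp"] = df.groupby("device_name")["timestamp"].shift()
print(df)
```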

In Jupyter notebooks, how to connect to MS SQL with a different Windows user

jupyter-notebook pandas python sql-server windows

I have SELECT access to an MS SQL database from which I would like to extract data into a pandas DataFrame running inside a Jupyter notebook. For reasons out of my control, my access to the database is through a different user. How can I query the database from Jupyter while connected to my current user account? Answe…

Creating another column in pandas based on a pre-existing column

data-cleaning dataframe pandas python

I have a third column in my data frame, and I want to create a fourth column that looks almost the same, except that it has no double quotes and there is a ‘user/’ prefix before each ID in the list. Also, sometimes the value is just a single ID instead of a list of IDs (as shown in the example DF).
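A minimal sketch using `Series.apply` with a small helper, assuming the third column holds either a JSON-style list string such as `'["123", "456"]'` or a single, possibly quoted, ID. The column names `ids`/`user_ids` and the helper `add_user_prefix` are hypothetical.

```python
import json
import pandas as pd

# Hypothetical values: a list stored as a string, a quoted single ID, a bare ID.
df = pd.DataFrame({"ids": ['["123", "456"]', '"789"', "42"]})

def add_user_prefix(value):
    """Hypothetical helper: drop double quotes and prefix each ID with 'user/'."""
    if isinstance(value, str) and value.strip().startswith("["):
        value = json.loads(value)  # '["123", "456"]' -> ['123', '456']
    if isinstance(value, list):
        return ["user/" + str(v).strip('"') for v in value]
    return "user/" + str(value).strip('"')  # single ID stays a single value

df["user_ids"] = df["ids"].apply(add_user_prefix)
print(df)
```

Lists stay lists and single IDs stay single strings, which keeps the new column shaped like the original one.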

Update column based on grouped date values

dataframe numpy pandas python

Edited/reposted with correct sample output. I have a dataframe that looks like the following: This dataframe is split into groups by ID. I would like to make an updated combined column based on whether df['bool'] == True, but only if df['bool'] == True AND there is another ‘finished’…