Tag: pandas

How to convert the first column of a dataframe into its headers

I have a dataframe df: The output should be: I want the column containing (a, b, c, d, e) as the header of my dataframe. Could anyone help? Answer: If your dataframe is a pandas dataframe named df, try solving it with pandas: first convert the initial df content to a list, then create a new dataframe defining its columns with th…
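The approach described in the answer excerpt can be sketched as follows. The input frame is hypothetical, since the original example data is not shown: it assumes a first column holding the labels a to e and a single value column.

```python
import pandas as pd

# Hypothetical input: the first column holds the labels that should become headers.
df = pd.DataFrame({"key": ["a", "b", "c", "d", "e"], "value": [1, 2, 3, 4, 5]})

# Pull the contents out as plain lists ...
headers = df.iloc[:, 0].tolist()
rows = df.iloc[:, 1:].T.values.tolist()

# ... then build a new frame whose columns are the first column's values.
result = pd.DataFrame(rows, columns=headers)
print(result)
```

Under the same assumption, `df.set_index(df.columns[0]).T` should give an equivalent layout without going through lists.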

Pandas: how to pivot/unpivot/add a dummy column name

pandas python

I want to convert from a long to a wide table with dummy column names created based on the number of accid. Sample Excel input vs. output attached. Please help. Answer: I was able to get it down to 2 steps: pivot_table using aggfunc=list, and then creating new columns from that list. I’m not sure I’ve com…
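A minimal sketch of the two steps named in the answer excerpt. The column names `id` and `accid`, the sample values, and the `accid_N` naming scheme are assumptions, since the attached input and output are not shown.

```python
import pandas as pd

# Hypothetical long-format input: one row per (id, accid) pair.
long_df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "accid": ["A10", "A11", "A20", "A21", "A22"],
})

# Step 1: collect every accid belonging to an id into a single list.
lists = long_df.pivot_table(index="id", values="accid", aggfunc=list)

# Step 2: spread each list across dummy-named columns; shorter lists are padded.
wide = pd.DataFrame(lists["accid"].tolist(), index=lists.index)
wide.columns = [f"accid_{i + 1}" for i in range(wide.shape[1])]
print(wide)
```

The number of dummy columns follows the id with the most accid values; ids with fewer values get missing entries in the trailing columns.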

How to accumulate parsed data from web scraping in a df through a loop with pandas?

dataframe loops pandas python

I want to create a df with a historical dataset by scraping a website, but I struggle to accumulate the full period within the loop. I am able to download a day, but when I try to create a loop to store a set of iterations I am not able to accumulate the data in the dataframe. The df I…
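One common pattern for this kind of accumulation is to collect each iteration's frame in a list and concatenate once after the loop. The sketch below assumes this fits the question; `fetch_day` is a hypothetical stand-in for the scraping step, which is not shown in the excerpt.

```python
import pandas as pd


def fetch_day(day):
    """Hypothetical stand-in for the scraper: returns one day's rows as a frame."""
    return pd.DataFrame({"date": [day], "value": [day.day]})


frames = []
for day in pd.date_range("2021-01-01", periods=3, freq="D"):
    # Keep every day's frame instead of overwriting a single variable.
    frames.append(fetch_day(day))

# Concatenate once, after the loop, to get the full period.
history = pd.concat(frames, ignore_index=True)
print(history)
```

Concatenating once at the end avoids repeatedly copying a growing dataframe inside the loop.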

Find trigrams for all groupby clusters in a Pandas Dataframe and return in a new column

dataframe n-gram nltk pandas python

I’m trying to return the highest frequency trigram in a new column in a pandas dataframe for each group of keywords. (Essentially something like a groupby with transform, returning the highest trigram in a new column). An example dataframe with dummy data Desired Output Minimum Reproducible Example What…
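A rough sketch of one way to do this, using plain Python for the trigram counting rather than nltk so that it stays self-contained. The column names `cluster` and `keyword` and the sample rows are assumptions, since the question's dummy data is not shown.

```python
from collections import Counter

import pandas as pd

# Hypothetical data: keywords that already carry a cluster label.
df = pd.DataFrame({
    "cluster": ["a", "a", "a", "b", "b"],
    "keyword": [
        "buy red running shoes",
        "red running shoes sale",
        "cheap red running shoes",
        "best coffee maker deals",
        "coffee maker deals online",
    ],
})


def top_trigram(keywords):
    """Return the most frequent word trigram across a group's keywords."""
    counts = Counter()
    for text in keywords:
        tokens = text.split()
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    if not counts:
        return None
    return " ".join(counts.most_common(1)[0][0])


# Compute one trigram per cluster, then broadcast it back onto every row.
per_cluster = df.groupby("cluster")["keyword"].apply(top_trigram)
df["top_trigram"] = df["cluster"].map(per_cluster)
print(df)
```

With nltk installed, `nltk.util.ngrams(tokens, 3)` could replace the `zip` call.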

pandas cumsum on lag-differenced dataframe

cumsum diff pandas python

Say I have a pd.DataFrame() that I differenced with .diff(5), which works like “new number at idx i = (number at idx i) – (number at idx i-5)”. Now I want to undo this operation using the first 5 entries of example_df, and using df_diff. If I had done .diff(1), I would simply use .cumsum(). B…
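One way to invert a lag-5 difference, sketched under the assumption of a default integer-position ordering: seed the first 5 rows with the original values, then take a cumulative sum within each group of rows that share the same position modulo 5. The column name `x` and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

lag = 5
example_df = pd.DataFrame({"x": np.arange(12) ** 2})  # hypothetical data
df_diff = example_df.diff(lag)

# Replace the leading NaN rows with the known first `lag` original values.
seeded = df_diff.copy()
seeded.iloc[:lag] = example_df.iloc[:lag].to_numpy()

# Rows i, i+5, i+10, ... form one chain; a cumsum along each chain undoes the diff.
chain = np.arange(len(seeded)) % lag
restored = seeded.groupby(chain).cumsum()
print(restored)
```

For `lag = 1` every row falls into the same chain, so this reduces to the plain `.cumsum()` mentioned in the question.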

Replicate a function from pandas into pyspark

apache-spark pandas pyspark python

I am trying to execute the same function on a Spark dataframe rather than a pandas one. Answer: A direct translation would require multiple collect calls, one for each column calculation. I suggest you do all calculations for columns in the dataframe as a single row and then collect that row. Here’s an example.…

How to remove duplicate values in one column but keep the rows in pandas?

duplicates pandas python

I have a dataframe as per below. Country: China, China, China, United Kingdom, United Kingdom, United Kingdom. Country code: CN, CN, CN, UK, UK, UK. Port Name: Yantian, Shekou, Quanzhou, Plymouth, Cardiff, Bird port. I want to remove the duplicates in the first two columns and only keep: Country: China, , , United K…
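A minimal sketch of one way to blank repeated values while keeping every row, built from the data listed in the question. It assumes that an empty string is an acceptable placeholder and that the two columns should be treated as a pair.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["China"] * 3 + ["United Kingdom"] * 3,
    "Country code": ["CN"] * 3 + ["UK"] * 3,
    "Port Name": ["Yantian", "Shekou", "Quanzhou", "Plymouth", "Cardiff", "Bird port"],
})

cols = ["Country", "Country code"]

# duplicated() flags every repeat after the first occurrence of each pair;
# blank those cells but leave the rows (and Port Name) untouched.
df.loc[df.duplicated(subset=cols), cols] = ""
print(df)
```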

How to get prior close when you have all stocks in a single DF?

numpy ohlc pandas python stock

Sorry for the noob question. I have a bunch of stocks in a sqlite3 database: When I print the df, it gives me the following (where each stock_id refers to a unique stock, e.g. APPL): I need to target each unique stock_id individually, and get the prior close. I know if each stock was in its own separate datafr…
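A grouped shift is one common way to get a per-stock prior close without splitting the frame. In this sketch only `stock_id` comes from the question; the `date` and `close` column names and the sample prices are assumptions, since the printed df is not shown.

```python
import pandas as pd

# Hypothetical data: two stocks stored in a single frame.
df = pd.DataFrame({
    "stock_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-04", "2021-01-05"]
    ),
    "close": [10.0, 11.0, 12.0, 50.0, 49.0],
})

# Make sure rows are in chronological order within each stock.
df = df.sort_values(["stock_id", "date"]).reset_index(drop=True)

# shift(1) inside each group looks one row back without crossing stock boundaries.
df["prior_close"] = df.groupby("stock_id")["close"].shift(1)
print(df)
```

The first row of each stock has no earlier close, so its `prior_close` is missing.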