Sort dataframe by substring condition excluding similar strings

I have a dataframe with a string-type column named 'tag'. The column has three categories (data_types): df['tag'] data_types=['DATA','DATAKIND','DATAKINDSIM']. If I want to count the number of rows there are …
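Since the excerpt is truncated, the exact goal is not known. The following is only a minimal sketch, assuming the aim is to count rows per category without 'DATA' also matching 'DATAKIND' and 'DATAKINDSIM'. The sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data using the three categories from the question.
df = pd.DataFrame({"tag": ["DATA", "DATAKIND", "DATAKINDSIM", "DATA", "DATAKIND"]})

# A substring test over-counts: "DATA" is contained in every category.
substring_count = df["tag"].str.contains("DATA").sum()

# An equality test counts only rows whose tag is exactly "DATA".
exact_count = (df["tag"] == "DATA").sum()

# value_counts() gives the exact count for every category at once.
counts = df["tag"].value_counts()
```

If the tags are embedded in longer strings rather than being the whole cell value, a regex with word boundaries or anchors would be needed instead of plain equality.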

Test of one dataframe in another

I have a pyspark dataframe df:
    A    B   C
    E00  FT  AS
    E01  FG  AD
    E02  FF  AB
    E03  FH  AW
    E04  FF  AQ
    E05  FV  AR
    E06  FD  AE
and another smaller pyspark dataframe but with 3 rows with …

CSV data to Python dictionary

I wrote my data, which was in lists and dicts, to a csv file, and when I import the csv file using pd.read_csv('file.csv'), everything becomes strings. How can I keep or convert it to its original …
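One possible approach, sketched here under the assumption that the cells were written as Python literals (the default when lists and dicts are passed through `to_csv`), is to parse them back with `ast.literal_eval` via the `converters` argument of `pd.read_csv`. The column names `items` and `meta` are hypothetical, and an in-memory buffer stands in for 'file.csv'.

```python
import ast
import io

import pandas as pd

# Hypothetical frame holding lists and dicts, written to an in-memory CSV.
original = pd.DataFrame({"items": [[1, 2], [3]], "meta": [{"a": 1}, {"b": 2}]})
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)

# literal_eval safely turns the string "[1, 2]" back into the list [1, 2].
restored = pd.read_csv(
    buffer,
    converters={"items": ast.literal_eval, "meta": ast.literal_eval},
)
```

`ast.literal_eval` only accepts Python literals, so it is safer than `eval`, but it will raise on cells that are empty or not valid literals.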

Check whether a value of a dataframe exists in another and set values in a specific way, accounting for duplicates

I have two dataframes. In df1, I have an order of IDs assigned to people; each person can have at most 2 IDs:
df1
    id1   id2
    2040  0
    2041  2050
    2042  0
    2043  0
    2044  2051
    2045  …

Replace grouped columns’ outliers with mean of the group based on defined zscore

I have a very large DataFrame with many data points on a map, with outliers that are very close to each other in the dataset (latitudes and longitudes). I would like to group all the rows as shown below …

Pandas dataframe custom formatting string to time

I have a dataframe that looks like this:
            DEP_TIME
    0       1851
    1       1146
    2       2016
    3       1350
    4       916
    …
    607341  554
    607342  633
    607343  657
    607344  …
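The desired output format is cut off in the excerpt. As a rough sketch, assuming the DEP_TIME integers encode a 24-hour clock time as HHMM (so 916 means 09:16), the values can be zero-padded to four digits and parsed with `pd.to_datetime`. The column name `DEP_TIME_parsed` is hypothetical.

```python
import pandas as pd

# First five values from the question.
df = pd.DataFrame({"DEP_TIME": [1851, 1146, 2016, 1350, 916]})

# Pad to four digits so that 916 becomes "0916" before parsing.
padded = df["DEP_TIME"].astype(str).str.zfill(4)

# Parse as hours and minutes, then keep only the time-of-day part.
df["DEP_TIME_parsed"] = pd.to_datetime(padded, format="%H%M").dt.time
```

Values that are not valid clock times (for example 2400, or missing entries) would raise under this sketch and would need separate handling.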

How to select rows where date is in index in Python Pandas DataFrame?

I have a DataFrame in Python like the one below, where the date is in the index (we can name this column “date”), and I would like to select all columns of this DF where the date in the index is later than 01.01.2020, …
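The original DataFrame is not shown in the excerpt, so the following is a minimal sketch with made-up data, assuming the index is (or can be converted to) a `DatetimeIndex`. The column name `value` is hypothetical.

```python
import pandas as pd

# Hypothetical frame with dates in the index.
df = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.to_datetime(["2019-12-31", "2020-01-01", "2020-01-02", "2021-06-15"]),
)
df.index.name = "date"

# A boolean mask on the index keeps every column of the matching rows.
after = df[df.index > pd.Timestamp("2020-01-01")]
```

If the index holds strings such as "01.01.2020", it would first need converting, for example with `pd.to_datetime(df.index, format="%d.%m.%Y")`.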

Pandas average of previous rows fulfilling condition

I have a huge data-frame (>20m rows) with each row containing a timestamp and a numeric variable X. I want to assign a new column where for each row the value in this new column is the average of X …

Sorting a table in python with alphabet and numbers

I have the following table:
    Column1  Column2
    99       QA
    65       CD
    134      LL
    N12      OO
    127      KK
    Q23      MM
    1        AA
    A10      KL
    K9       MA
I would like to sort the table such that the numbers are sorted in descending order …

Merging pandas columns into a new column

Suppose I have a dataframe as follows:
          earningspersharebasic  earningsPerShareBasic
    2019  -0.19                  NaN
    2018  NaN                    4.00
    2017  …
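A possible sketch, assuming the goal is a single column that takes whichever of the two differently cased columns is not NaN in each row, uses `Series.combine_first`. Only the two complete rows from the question are used, and the output column name `eps` is hypothetical.

```python
import numpy as np
import pandas as pd

# The two complete rows shown in the question.
df = pd.DataFrame(
    {
        "earningspersharebasic": [-0.19, np.nan],
        "earningsPerShareBasic": [np.nan, 4.00],
    },
    index=[2019, 2018],
)

# Take the first column and fill its NaN entries from the second.
df["eps"] = df["earningspersharebasic"].combine_first(df["earningsPerShareBasic"])
```

When both columns hold a value in the same row, `combine_first` keeps the one from the column it is called on, so the order matters in that case.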