Tag: dataframe

Duplicated rows in pandas append inside a for loop

I am having trouble with a for loop inside a function. I am calculating cosine distances for a list of word vectors. For each vector, I calculate the cosine distance and then append it as a new column to the pandas DataFrame. The problem is that there are several models, so I am comparing a word vector from model
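A common cause of duplicated rows is appending to the DataFrame along the row axis inside the loop. The sketch below is one way to avoid that, assuming each model is a plain dict mapping a word to a NumPy vector and that every model is compared against one reference model (`models`, `reference` and the `dist_<model>` column names are hypothetical). It collects one column per model in a dict and concatenates once, along `axis=1`, so the row count never changes.

```python
import numpy as np
import pandas as pd


def cosine_distance(u, v):
    # Cosine distance = 1 - cosine similarity.
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


words = ["cat", "dog"]
# Hypothetical models: each maps a word to its vector.
models = {
    "model_a": {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])},
    "model_b": {"cat": np.array([1.0, 1.0]), "dog": np.array([1.0, 1.0])},
}
reference = models["model_a"]  # assumption: compare every model to model_a

# Build all new columns first, keyed by column name.
columns = {}
for name, vectors in models.items():
    columns[f"dist_{name}"] = [cosine_distance(reference[w], vectors[w]) for w in words]

df = pd.DataFrame({"word": words})
# Concatenate once, side by side, instead of appending rows in the loop.
df = pd.concat([df, pd.DataFrame(columns, index=df.index)], axis=1)
print(df)
```

Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so collecting results and calling `pd.concat` once is also the forward-compatible pattern.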

Pandas skipping lines in read_csv: can I record these to a variable/log file?

dataframe pandas python

I’ve seen similar questions on here, but nothing that is quite what I want to do. I’m reading in a tsv/csv file using read_csv. I have clearly defined headers within the file, but sometimes the file has unexpected additional columns, and I get the following messages in the console: Skipping line 251643: Expected 20 fields in line 251643, saw
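One way to capture the skipped lines, assuming pandas 1.4 or newer: `on_bad_lines` accepts a callable when `engine="python"`. The callable receives the offending line already split into a list of strings; returning `None` drops it. The sketch below stores each bad line in a list (the in-memory CSV and the `record_bad_line` name are only illustrative); you could write to a log file from the same callback instead.

```python
import io

import pandas as pd

bad_lines = []  # collected rows that had too many fields


def record_bad_line(fields):
    # fields is the bad line split by the separator (a list of str).
    bad_lines.append(fields)
    return None  # returning None tells pandas to skip this line


data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"
df = pd.read_csv(
    io.StringIO(data),
    engine="python",  # callable on_bad_lines requires the python engine
    on_bad_lines=record_bad_line,
)
print(df)
print("skipped:", bad_lines)
```

As far as I know, the callback receives only the fields and not the original line number, so if you need line numbers in the log you would have to track them separately.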

Creating new columns within a dataframe, based on the latest value from previous columns

dataframe pandas python

I’ve just completed a beginner’s course in Python, so please bear with me if the code below doesn’t make sense or if my issue is due to a rookie mistake. I’ve been trying to put what I learned to use by working with the college production of NFL players, with a view to understanding which statistics can be predictive or at least correlate
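For the title's goal (a new column holding the latest available value from a set of earlier columns), a minimal sketch is to forward-fill across the columns and take the last one. The per-season column names and values here are hypothetical, and the sketch assumes the columns are already ordered from oldest to newest.

```python
import numpy as np
import pandas as pd

# Hypothetical per-season stats; NaN means no data that season.
df = pd.DataFrame({
    "player": ["A", "B", "C"],
    "yds_2017": [500.0, np.nan, 300.0],
    "yds_2018": [650.0, 400.0, np.nan],
    "yds_2019": [np.nan, 800.0, np.nan],
})
season_cols = ["yds_2017", "yds_2018", "yds_2019"]  # oldest to newest

# Forward-fill left to right along each row, then the last column
# holds the most recent non-null value for that row.
df["latest_yds"] = df[season_cols].ffill(axis=1).iloc[:, -1]
print(df)
```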

How to group by multiple columns and count unique values in Python Pandas

dataframe pandas pandas-groupby python

I have a DataFrame df_data: I have a function and parameters like this: Explanation of the parameters: with CustID = 1, the parameters should be list_minor = [3,1] (order is not important) and list_major = [1], because for this customer LocationID = 324 appears 3 times and LocationID = 490 appears 1 time (324 and 490 have isMajor = 0, so they should go into
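A sketch of one approach, assuming `df_data` has `CustID`, `LocationID` and `isMajor` columns with `isMajor` in {0, 1} (the sample rows are made up so that CustID 1 reproduces the example in the question): count rows per (CustID, isMajor, LocationID), then gather those counts into a list per customer for each `isMajor` value.

```python
import pandas as pd

# Hypothetical sample data.
df_data = pd.DataFrame({
    "CustID":     [1, 1, 1, 1, 1, 2],
    "LocationID": [324, 324, 324, 490, 100, 324],
    "isMajor":    [0, 0, 0, 0, 1, 0],
})

# Number of rows for each (customer, major flag, location) combination.
counts = df_data.groupby(["CustID", "isMajor", "LocationID"]).size()

# Select one isMajor value, then collect the counts per customer into a list.
list_minor = counts.xs(0, level="isMajor").groupby(level="CustID").agg(list)
list_major = counts.xs(1, level="isMajor").groupby(level="CustID").agg(list)

print(list_minor.loc[1], list_major.loc[1])
```

Customers with no rows for a given `isMajor` value simply do not appear in that Series, so a lookup like `list_major.get(cust_id, [])` gives an empty list for them.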

How to assign a value to a column for a subset of dataframe based on a condition in Pandas?

dataframe pandas python

I have a data frame, df:

    index  A  class  label
    0      4  0      0
    1      5  1      0
    2      6  0      0
    3      7  1      0

I want to change the label to 1 if the mean of column A for the rows with class 0 is bigger than the mean of all the data in column A. How can I do this?
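A sketch using the sample data from the question. The first part follows the literal reading (relabel only the class-0 rows, and only if their mean of A exceeds the overall mean); the second is a group-wise generalisation, added here as an assumption, that applies the same test to every class with `groupby(...).transform("mean")`.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 6, 7], "class": [0, 1, 0, 1], "label": [0, 0, 0, 0]})

# Literal reading: compare the class-0 mean with the overall mean.
literal = df.copy()
mask = literal["class"] == 0
if literal.loc[mask, "A"].mean() > literal["A"].mean():
    literal.loc[mask, "label"] = 1  # boolean-mask assignment on a subset

# Group-wise variant: each row gets its class's mean of A, compared with the overall mean.
grouped = df.copy()
class_mean = grouped.groupby("class")["A"].transform("mean")
grouped.loc[class_mean > grouped["A"].mean(), "label"] = 1

print(literal)
print(grouped)
```

With this data the class-0 mean is 5 and the overall mean is 5.5, so the literal version leaves every label at 0, while the group-wise version relabels the class-1 rows (mean 6).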

Create a new list of dictionaries from the index of a DataFrame in Python, the fastest way

dataframe dictionary multiprocessing python python-multiprocessing

I have ~200 million entries in a dictionary, index_data: The key is a CustID value and the value is the index of that CustID in df_data: I have a DataFrame df_data: NOTE: If a CustID is duplicated, only the Score column differs between its rows. I want to create a new list of dicts (Total_Score is the average Score of each CustID, Number is
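Rather than walking a 200-million-entry index dictionary (with or without multiprocessing), a vectorised `groupby` may be enough. This is a minimal sketch with made-up data; it only computes `Total_Score` as the mean Score per CustID, since the rest of the desired output (such as `Number`) is cut off in the question.

```python
import pandas as pd

# Hypothetical data: duplicated CustIDs differ only in Score.
df_data = pd.DataFrame({
    "CustID": [1, 1, 2, 3, 3, 3],
    "Score": [10, 20, 30, 5, 10, 15],
})

records = (
    df_data.groupby("CustID", as_index=False)["Score"]
    .mean()                                     # average Score per customer
    .rename(columns={"Score": "Total_Score"})
    .to_dict("records")                         # list of dicts, one per CustID
)
print(records)
```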

Transpose a 3-column Excel file with K:V pairs into columns with Pandas

conditional-statements dataframe numpy pandas python

I have a 3-column Excel file that I’m reading into pandas; the columns basically contain k:v pairs. I need to not only tie the information in unnamed:1 & unnamed:2 to the unique animal ID, since that is how I will track the animal, but also transpose these columns so that everything to the left of the “:” becomes the column header
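A sketch of the melt/split/pivot pattern, assuming the sheet loads as an ID column plus two columns of `key:value` strings (the column names `animal_id`, `Unnamed: 1`, `Unnamed: 2` and the sample values are assumptions standing in for the real `read_excel` output). Melting keeps every pair tied to its animal ID; splitting on the first `:` separates header from value; pivoting turns the keys into columns.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_excel(...).
df = pd.DataFrame({
    "animal_id": ["A1", "A2"],
    "Unnamed: 1": ["weight:12", "weight:15"],
    "Unnamed: 2": ["color:brown", "color:black"],
})

# One row per (animal, k:v string), keeping the animal ID on every row.
long = df.melt(id_vars="animal_id", value_name="kv").dropna(subset=["kv"])

# Split on the first ":" only; left part is the header, right part the value.
parts = long["kv"].str.split(":", n=1, expand=True)
long["key"] = parts[0].str.strip()
long["value"] = parts[1].str.strip()

# Keys become columns, one row per animal.
wide = long.pivot(index="animal_id", columns="key", values="value").reset_index()
wide.columns.name = None
print(wide)
```

`pivot` raises if an animal has the same key twice; in that case `pivot_table` with an explicit `aggfunc` is the usual fallback.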

Extract new columns and fill values based on categorical values in a data frame in Python

categorical-data dataframe multiple-columns pivot python

I have a data frame where one column contains categorical strings and the next contains the corresponding values: I want to create new columns based on the df.status column and fill the empty ones with np.nan; this requires a pivot on multiple columns: I am looking for an efficient solution that works for large data frames. Answer: You want:
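A minimal sketch with hypothetical `id`, `status` and `value` columns: `DataFrame.pivot` turns each distinct `status` into its own column, and any (id, status) combination that does not occur is filled with NaN automatically, so no separate fill step is needed.

```python
import pandas as pd

# Hypothetical long-format data: one row per (id, status) observation.
df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "status": ["open", "closed", "open", "closed"],
    "value": [10, 20, 30, 40],
})

# Each status becomes a column; missing combinations become NaN.
wide = df.pivot(index="id", columns="status", values="value")
print(wide)
```

If there are several value columns to spread, passing a list to `values` (or omitting it) produces MultiIndex columns; duplicated (id, status) pairs require `pivot_table` with an aggregation instead.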

Check if Dataframe is empty and print results

dataframe pandas python

I would like to go through an Excel file containing different stock symbols. After reading a stock’s values (Open, Close, High, Low, Volume) from Yahoo into a DataFrame, how can I check whether the DataFrame is empty? The Excel list contains more than 700 symbols, and sometimes Yahoo has no data for some of them. So I would like to exclude these symbols,
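`DataFrame.empty` is `True` when either axis has length zero, so a frame that has the Open/Close/High/Low/Volume columns but no rows counts as empty. In the sketch below, `fetch_prices` is a hypothetical offline stand-in for the Yahoo download call, and the symbol list would in practice come from the Excel file (for example via `pd.read_excel`).

```python
import pandas as pd

COLUMNS = ["Open", "Close", "High", "Low", "Volume"]


def fetch_prices(symbol: str) -> pd.DataFrame:
    # Hypothetical stand-in for the Yahoo download step:
    # returns data for known symbols and an empty frame otherwise.
    known = {
        "AAPL": pd.DataFrame(
            {"Open": [1.0], "Close": [1.1], "High": [1.2], "Low": [0.9], "Volume": [100]}
        ),
    }
    return known.get(symbol, pd.DataFrame(columns=COLUMNS))


symbols = ["AAPL", "NODATA"]  # in practice, read from the Excel sheet
kept, skipped = {}, []
for sym in symbols:
    prices = fetch_prices(sym)
    if prices.empty:  # no rows (or no columns) returned for this symbol
        skipped.append(sym)
        continue
    kept[sym] = prices

print("kept:", list(kept), "skipped:", skipped)
```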