
Tag: pandas

Python Pandas merge only certain columns

Is it possible to merge only some columns? I have a DataFrame df1 with columns x, y, z, and df2 with columns x, a, b, c, d, e, f, etc. I want to merge the two DataFrames on x, but I only want to bring in columns df2.a and df2.b, not the entire DataFrame. The result would be a DataFrame with
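A minimal sketch of one common approach, using made-up toy data that mirrors the question's column names: select the join key plus the wanted columns from df2 before merging, so the unwanted columns never enter the result.

```python
import pandas as pd

# Toy frames mirroring the question's column names (values are made up)
df1 = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30], "z": [100, 200, 300]})
df2 = pd.DataFrame({
    "x": [1, 2, 3],
    "a": ["a1", "a2", "a3"],
    "b": ["b1", "b2", "b3"],
    "c": [0, 0, 0],
    "d": [0, 0, 0],
})

# Keep only the join key and the wanted columns from df2;
# "x" must be in the subset, otherwise merge(on="x") cannot find the key.
result = df1.merge(df2[["x", "a", "b"]], on="x")
print(result)  # columns: x, y, z, a, b
```

merge defaults to an inner join; pass how="left" if every row of df1 should be kept even when x has no match in df2.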

What’s the fastest way in Python to calculate cosine similarity given sparse matrix data?

Given a sparse matrix listing, what’s the best way to calculate the cosine similarity between each pair of columns (or rows) in the matrix? I would rather not iterate over all n-choose-2 pairs. Say the input matrix is: The sparse representation is: In Python, it’s straightforward to work with the matrix-input format: Gives: That’s fine for a full-matrix input, but I really
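A minimal sketch, assuming SciPy is available (the question does not say which library holds the data) and that the sparse listing is (row, column, value) triplets; the triplets and the helper name column_cosine_similarity are hypothetical. Normalizing every column to unit L2 norm and then taking one sparse product NᵀN yields all pairwise column similarities at once, with no explicit loop over pairs.

```python
import numpy as np
from scipy import sparse

def column_cosine_similarity(mat):
    """Cosine similarity between every pair of columns of a sparse matrix."""
    mat = sparse.csc_matrix(mat, dtype=float)
    # L2 norm of each column, computed without densifying the matrix
    norms = np.sqrt(np.asarray(mat.multiply(mat).sum(axis=0))).ravel()
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero columns
    # Scale each column to unit length
    normalized = mat @ sparse.diags(1.0 / norms)
    # Dot products of unit columns are exactly the cosine similarities
    return (normalized.T @ normalized).toarray()

# Hypothetical sparse listing: (row, column, value) triplets
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 1, 1, 0, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mat = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3))

sim = column_cosine_similarity(mat)  # sim[i, j] compares column i with column j
print(sim)
```

For row similarities, pass mat.T instead. The final toarray() returns a dense n×n result, which is convenient for small n; for many columns, returning the sparse product instead may be preferable. scikit-learn's sklearn.metrics.pairwise.cosine_similarity also accepts SciPy sparse input and compares rows.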

How to add pandas data to an existing csv file?

I want to know whether the pandas to_csv() function can add a DataFrame to an existing CSV file. The CSV file has the same structure as the loaded data. Answer: You can specify a Python write mode via the mode parameter of to_csv. For append, it is 'a'. In your case, pass mode='a'. The default mode is 'w'.
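A minimal sketch of appending with mode='a'. The temporary file name is hypothetical, and header=False is an addition not mentioned in the answer: without it, to_csv writes the header row again in the middle of the file.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "data.csv")  # hypothetical file name

first = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
first.to_csv(path, index=False)  # default mode='w': create/overwrite, write header

more = pd.DataFrame({"a": [5], "b": [6]})
# mode='a' appends; header=False avoids repeating the column names
more.to_csv(path, mode="a", header=False, index=False)

combined = pd.read_csv(path)
print(combined)
```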

Dropping infinite values from dataframes in pandas?

How do I drop NaN, inf, and -inf values from a DataFrame without resetting mode.use_inf_as_null? Can I tell dropna to include inf in its definition of missing values so that the following works? Answer: First replace() the infs with NaN, and then drop the NaNs via dropna(). The same method also works for a Series.
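A short illustration of that two-step answer on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, np.inf, 3.0, np.nan, 5.0],
    "y": [1.0, 2.0, -np.inf, 4.0, 5.0],
})

# Step 1: turn +inf and -inf into NaN so dropna() treats them as missing
# Step 2: drop every row that now contains a NaN
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()
print(cleaned)  # only rows 0 and 4 survive

# The same two steps work on a Series
s = pd.Series([1.0, np.inf, -np.inf, np.nan, 2.0])
s_clean = s.replace([np.inf, -np.inf], np.nan).dropna()
```

dropna() also accepts subset= to restrict which columns are checked.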

Split a large pandas dataframe

I have a large DataFrame with 423244 rows. I want to split it into 4. I tried the following code, which gave an error: ValueError: array split does not result in an equal division. How do I split this DataFrame into 4 groups? Answer: Use np.array_split.
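Unlike np.split, np.array_split allows sections of unequal size, so the division does not have to be exact. A minimal sketch that applies np.array_split to the row positions and slices with iloc; splitting positions rather than passing the DataFrame itself is a choice made here, since recent pandas versions may emit a deprecation warning when NumPy's split functions receive a DataFrame directly. split_frame is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def split_frame(df, n):
    """Split df into n consecutive pieces whose sizes differ by at most one row."""
    # array_split on positional indices tolerates an unequal division
    return [df.iloc[idx] for idx in np.array_split(np.arange(len(df)), n)]

df = pd.DataFrame({"value": range(10)})  # made-up data
parts = split_frame(df, 4)
print([len(p) for p in parts])  # [3, 3, 2, 2]
```

When the length is not divisible by n, the leftover rows go to the first pieces, one extra row each.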

Finding label location in a DataFrame Index

I have a pandas DataFrame and am interested in finding the integer location of one of its index labels. Looking at the index values, I know that this label is at integer location 1. How can I get pandas to tell me the integer location of this label? Answer: You’re looking for the Index method get_loc.
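A minimal illustration with made-up labels:

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])  # made-up labels

pos = df.index.get_loc("b")
print(pos)  # 1

row = df.iloc[pos]  # the integer position can be used with iloc
```

On a unique index get_loc returns an integer; if the label is duplicated, it returns a slice (for a monotonic index) or a boolean mask instead.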

Writing a pandas DataFrame to CSV file

I have a DataFrame in pandas which I would like to write to a CSV file. I am doing this using: And getting the following error: Is there any way to get around this easily (i.e., I have Unicode characters in my DataFrame)? And is there a way to write to a tab-delimited file instead of a CSV
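A minimal sketch covering both parts of the question, on made-up data with a hypothetical file name: sep="\t" produces a tab-delimited file, and encoding="utf-8" states the output encoding explicitly. In Python 3, to_csv already defaults to UTF-8; the exact error in the question is not shown, so whether this alone resolves it is an assumption.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Zürich", "São Paulo", "Tōkyō"], "id": [1, 2, 3]})

path = os.path.join(tempfile.mkdtemp(), "out.tsv")  # hypothetical file name
# Tab-separated output with an explicit UTF-8 encoding for non-ASCII text
df.to_csv(path, sep="\t", encoding="utf-8", index=False)

# Read it back with the same separator and encoding to confirm the round trip
back = pd.read_csv(path, sep="\t", encoding="utf-8")
print(back)
```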
