Tag: pandas

Python Pandas merge only certain columns

Is it possible to only merge some columns? I have a DataFrame df1 with columns x, y, z, and df2 with columns x, a, b, c, d, e, f, etc. I want to merge the two DataFrames on x, but I only want to merge columns df2.a, df2.b – not the entire DataFrame. The result would be a DataFrame with
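
One way to approach this, sketched under the assumption that the frames look roughly like the ones described (the sample values below are made up for illustration), is to select the join key plus the wanted columns from df2 before merging:

```python
import pandas as pd

# Illustrative data only; the column names follow the question, the values are invented
df1 = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30], "z": [100, 200, 300]})
df2 = pd.DataFrame(
    {"x": [1, 2, 3], "a": ["p", "q", "r"], "b": [0.1, 0.2, 0.3], "c": [7, 8, 9]}
)

# Keep only the join key and the columns to bring over, then merge on the key
merged = df1.merge(df2[["x", "a", "b"]], on="x")
print(merged.columns.tolist())
```

The join key has to be part of the selection, otherwise there is nothing to merge on.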

Pandas: Looking up the list of sheets in an excel file

excel openpyxl pandas python xlrd

The new version of Pandas uses the following interface to load Excel files: but what if I don’t know the sheets that are available? For example, I am working with Excel files that have the following sheets: Data 1, Data 2, …, Data N, foo, bar, but I don’t know N a priori. Is there any way to get the list
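
A minimal sketch of one possible approach: pandas exposes the sheet names of a workbook through the sheet_names attribute of pd.ExcelFile. The helper name list_sheets and the file name in the comment are hypothetical, and an Excel engine such as openpyxl is assumed to be installed.

```python
import pandas as pd


def list_sheets(path):
    """Return the sheet names of the Excel workbook at `path` (hypothetical helper)."""
    # ExcelFile opens the workbook without loading any sheet into a DataFrame
    with pd.ExcelFile(path) as workbook:
        return workbook.sheet_names


# Example call, assuming a file named "workbook.xlsx" exists:
# print(list_sheets("workbook.xlsx"))
```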

What’s the fastest way in Python to calculate cosine similarity given sparse matrix data?

Given a sparse matrix listing, what’s the best way to calculate the cosine similarity between each of the columns (or rows) in the matrix? I would rather not iterate n-choose-two times. Say the input matrix is: The sparse representation is: In Python, it’s straightforward to work with the matrix-input format: Gives: That’s fine for a full-matrix input, but I really
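
The excerpt is cut off before any answer, so the following is only a sketch of one common approach, not necessarily the one given on the original page: normalize each row to unit length and take a single sparse matrix product, which avoids looping over all pairs. The function name and the sample matrix are assumptions for illustration; transpose the input first to compare columns instead of rows.

```python
import numpy as np
from scipy import sparse


def sparse_cosine_similarity(mat):
    """Cosine similarity between the rows of a sparse matrix (illustrative helper)."""
    mat = sparse.csr_matrix(mat, dtype=float)
    # Euclidean norm of every row, computed without densifying the matrix
    norms = np.sqrt(np.asarray(mat.multiply(mat).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0  # leave all-zero rows untouched instead of dividing by zero
    normalized = sparse.diags(1.0 / norms) @ mat
    # Dot products of unit-length rows are the cosine similarities
    return normalized @ normalized.T


# Made-up example: rows 0 and 1 point the same way, row 2 is orthogonal to both
A = sparse.csr_matrix([[0, 1, 0], [0, 2, 0], [1, 0, 0]])
sim = sparse_cosine_similarity(A).toarray()
print(sim)
```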

How to add pandas data to an existing csv file?

csv dataframe pandas python

I want to know if it is possible to use the pandas to_csv() function to add a dataframe to an existing csv file. The csv file has the same structure as the loaded data. Answer You can specify a Python write mode in the pandas to_csv function. For append it is ‘a’. In your case: The default mode is ‘w’.
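
The original snippet is missing from this excerpt; a sketch of what appending with mode='a' might look like follows. The file path and data are invented for illustration, and header=False is an added assumption so that the column names are not written a second time.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file location, created in a temporary directory for the demo
path = os.path.join(tempfile.mkdtemp(), "data.csv")

first = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
more = pd.DataFrame({"id": [3], "value": ["c"]})

first.to_csv(path, index=False)  # default mode "w" creates or overwrites the file
more.to_csv(path, mode="a", header=False, index=False)  # append rows, skip the header

combined = pd.read_csv(path)
print(combined)
```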

Dropping infinite values from dataframes in pandas?

numpy pandas python

How do I drop nan, inf, and -inf values from a DataFrame without resetting mode.use_inf_as_null? Can I tell dropna to include inf in its definition of missing values so that the following works? Answer First replace() infs with NaN: and then drop NaNs via dropna(): For example: The same method also works for Series.
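
The code from the answer is not included in this excerpt; the two steps it describes could be sketched like this, with made-up sample data:

```python
import numpy as np
import pandas as pd

# Illustrative frame containing inf, -inf and NaN
df = pd.DataFrame(
    {"a": [1.0, np.inf, 3.0, np.nan], "b": [4.0, 5.0, -np.inf, 7.0]}
)

# Step 1: turn +/-inf into NaN; step 2: drop every row that contains a NaN
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()
print(cleaned)
```

Only the first row survives here, since every other row holds an inf or a NaN.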

How can I map True/False to 1/0 in a Pandas DataFrame?

boolean dataframe numpy pandas python

I have a column in a pandas DataFrame that has boolean True/False values, but for further calculations I need a 1/0 representation. Is there a quick pandas/numpy way to do that? Answer A succinct way to convert a single column of boolean values to a column of integers 1 or 0:
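
The answer's one-liner is cut off in this excerpt. One succinct option, shown here as a sketch with a hypothetical column name flag, is astype(int):

```python
import pandas as pd

# "flag" is a made-up column name for illustration
df = pd.DataFrame({"flag": [True, False, True]})

# Casting booleans to int maps True -> 1 and False -> 0
df["flag"] = df["flag"].astype(int)
print(df["flag"].tolist())
```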

Split a large pandas dataframe

pandas python

I have a large dataframe with 423244 lines. I want to split this into 4. I tried the following code, which gave an error: ValueError: array split does not result in an equal division How can I split this dataframe into 4 groups? Answer Use np.array_split:
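
The answer's code is not part of this excerpt. The sketch below uses np.array_split, which tolerates unequal chunk sizes, on the row positions and then slices with iloc; splitting positions rather than passing the DataFrame directly is a choice made here so that the example does not depend on how a particular NumPy/pandas version handles DataFrames. The 10-row frame is made up.

```python
import numpy as np
import pandas as pd

# Small stand-in for the large frame in the question
df = pd.DataFrame({"n": range(10)})

# array_split allows chunks of unequal length, unlike np.split
index_chunks = np.array_split(np.arange(len(df)), 4)
parts = [df.iloc[idx] for idx in index_chunks]

sizes = [len(part) for part in parts]
print(sizes)
```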

Finding label location in a DataFrame Index

pandas python

I have a pandas dataframe: I am interested in finding the label location of one of the labels, say, Looking at the index values, I know that the integer location of this label is 1. How can I get pandas to tell me what the integer location of this label is? Answer You’re looking for the index method get_loc:
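
Since the question's DataFrame is not shown in this excerpt, here is a sketch of get_loc on an invented frame with string labels:

```python
import pandas as pd

# Made-up frame; the labels "a", "b", "c" are placeholders for the real index
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# get_loc returns the integer position of a label in a unique index
position = df.index.get_loc("b")
print(position)
```

For an index with duplicate labels, get_loc can return a slice or a boolean mask instead of a single integer.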

Writing a pandas DataFrame to CSV file

csv dataframe pandas python

I have a dataframe in pandas which I would like to write to a CSV file. I am doing this using: And getting the following error: Is there any way to get around this easily (i.e. I have unicode characters in my data frame)? And is there a way to write to a tab-delimited file instead of a CSV
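
Neither the failing call nor the error message survives in this excerpt, so the following only sketches the two to_csv parameters that usually matter here, assuming the problem is an encoding issue: encoding for non-ASCII text and sep for the delimiter. The file path and data are invented.

```python
import os
import tempfile

import pandas as pd

# Hypothetical output path in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "out.tsv")

df = pd.DataFrame({"name": ["José", "Zoë"], "score": [1, 2]})

# sep="\t" writes a tab-delimited file; encoding="utf-8" handles non-ASCII characters
df.to_csv(path, sep="\t", encoding="utf-8", index=False)

round_trip = pd.read_csv(path, sep="\t", encoding="utf-8")
print(round_trip)
```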

How to read a .xlsx file using the pandas library in IPython?

dataframe ipython jupyter-notebook pandas python

I want to read a .xlsx file using the pandas library for Python and port the data to a PostgreSQL table. All I could do up until now is: Now I know that the step got executed successfully, but I want to know how I can parse the Excel file that has been read so that I can understand how
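
The question is truncated and its code is missing, so this is only a sketch of one way to get from a workbook to an inspectable DataFrame. The helper names load_sheet and column_types and the file name in the comment are hypothetical, and an Excel engine such as openpyxl is assumed to be installed. Writing to PostgreSQL is left out.

```python
import pandas as pd


def load_sheet(path, sheet_name=0):
    """Read one sheet of an Excel workbook into a DataFrame (hypothetical helper)."""
    # sheet_name accepts a position (0 = first sheet) or a sheet title
    return pd.read_excel(path, sheet_name=sheet_name)


def column_types(df):
    """Map each column name to its dtype, as a quick look at the parsed structure."""
    return {col: str(dtype) for col, dtype in df.dtypes.items()}


# Example calls, assuming a file named "data.xlsx" exists:
# df = load_sheet("data.xlsx")
# print(df.head())
# print(column_types(df))
```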