Tag: pandas

How do I construct an incidence matrix from two dataframe columns using scipy.sparse.coo_matrix((data, (i, j)))?

numpy-ndarray pandas python scipy sparse-matrix

I have a pandas DataFrame containing two columns [‘A’, ‘B’]. Each column is made up of integers. I want to construct a sparse matrix with the following properties: the row index is all integers from 0 to the max value in the dataframe; the column index is the same as the row index; entry i,j = 1 if [i,j] or [j,i] is a
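
The excerpt above is cut off, so the following is only a minimal sketch. It assumes the condition ends with "a row in the dataframe", i.e. the matrix is symmetric and holds a 1 wherever a pair occurs in either order. The function name `symmetric_adjacency` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix


def symmetric_adjacency(df, a="A", b="B"):
    """Build an n x n sparse 0/1 matrix from two integer columns."""
    i = df[a].to_numpy()
    j = df[b].to_numpy()
    # Indices run from 0 to the largest value found in either column.
    n = int(max(i.max(), j.max())) + 1
    # Add every pair in both orientations so the result is symmetric.
    rows = np.concatenate([i, j])
    cols = np.concatenate([j, i])
    data = np.ones(rows.size, dtype=np.int64)
    mat = coo_matrix((data, (rows, cols)), shape=(n, n)).tocsr()
    # Converting COO to CSR sums duplicate coordinates; clamp them back to 1.
    mat.data[:] = 1
    return mat


# Hypothetical sample: the pair (1, 0) repeats (0, 1) in reverse order.
example = pd.DataFrame({"A": [0, 1, 3, 1], "B": [1, 2, 0, 0]})
adjacency = symmetric_adjacency(example)
print(adjacency.toarray())
```

Passing `shape=(n, n)` explicitly keeps rows and columns for integers that never occur in the data, which matches the "all integers from 0 to the max value" requirement.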

Are optimizations possible when finding duplicates in pandas dataframe based on array overlap?

dataframe pandas python

I’ve a pandas dataframe that contains information about meetings. I need to find duplicate entries based on certain conditions. Let’s look at a sample dataframe first. The meeting-id is a unique identifier for a meeting. Brokers are the attendees of these meetings. The catch is, brokers are not always honest, and many of them can enter the same meeting info multiple

How to convert a whole column from string type to date type in mongodb with pymongo

date mongodb pandas pymongo python

My data consists of 1 million rows. A sample looks like this: The thing is that date2 and date4 are in the form that I want, but they are strings and I want to convert them to dates. The code I have used looks like this: Do I need to convert them before inserting or after? Does anyone know how I

How to slice/chop a string using multiple indexes in a pandas DataFrame

dataframe pandas python python-3.x string

I’m in need of some advice on the following issue: I have a DataFrame that looks like this: And what I need to get is the SEQ that’s separated between the different BEG_GAP and END_GAP. I already have worked it out (thanks to a previous question) for sequences that have only one pair of gaps, but here they have multiple.

Python reads csv in just one column

csv pandas python

So I am relatively new to Python (using Python 3 with the Spyder IDE) and I am trying to read in a CSV file with some weather data. The problem is that the file I have contains some empty cells and information I don’t need. I only need from row 18 as a header (all the physical
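
A common reason for everything landing in one column is a delimiter that does not match the default comma. The sketch below assumes a semicolon-separated file with 17 preamble lines before the header on row 18. The excerpt does not state the delimiter, and the sample content and column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical file: 17 lines of station metadata, then the real table.
preamble = "".join(f"station info line {i};;\n" for i in range(1, 18))
table = "date;temp;humidity\n2020-01-01;1.5;80\n2020-01-02;;75\n"
raw = preamble + table

# skiprows=17 drops the preamble, so row 18 becomes the header.
# sep=";" is an assumption; use whatever delimiter the file really has.
weather = pd.read_csv(io.StringIO(raw), sep=";", skiprows=17)
print(weather)
```

Empty cells are read as `NaN`, so they can be handled later with `dropna` or `fillna` rather than by editing the file.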

Select entries in one dataframe based on cross-sectional statistic of another dataframe

dataframe pandas python

I want to select the entries of one dataframe, say df2, based on the cross-sectional statistic of another dataframe, say df1: For instance, if the cross-sectional statistic on df1 is a max operation, then for the 3 rows in df1 the corresponding columns with the max entries are ‘D’, ‘C’, ‘B’ (corresponding to entries 11, 45, 314). Selecting only those
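
One way this could look, as a sketch only: `idxmax(axis=1)` gives the column label of each row's maximum in df1, and those labels are then used to pick one value per row from df2. The sample frames are hypothetical, chosen so that the maxima of df1 are 11, 45 and 314 in columns ‘D’, ‘C’, ‘B’ as in the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row maxima of df1 are 11 (D), 45 (C), 314 (B).
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 314], "C": [7, 45, 9], "D": [11, 8, 6]})
df2 = pd.DataFrame({"A": [10, 20, 30], "B": [40, 50, 60], "C": [70, 80, 90], "D": [100, 110, 120]})

# Column label of the maximum in each row of df1.
max_cols = df1.idxmax(axis=1)

# Translate labels to integer positions in df2, then pick one cell per row.
col_pos = df2.columns.get_indexer(max_cols)
picked = df2.to_numpy()[np.arange(len(df2)), col_pos]
selected = pd.Series(picked, index=df2.index)
print(selected)
```

Swapping `idxmax` for `idxmin` covers a min statistic in the same way. This assumes df1 and df2 share the same row order and column labels.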

reorder data in pandas pivot_table function

pandas pivot-table python

I’ve a sample dataframe. I’m trying to pivot the data using The values are not in the order which I’ve mentioned in the above snippet. How can I re-structure my data to (by also repeating the row labels) Answer: Use DataFrame.swaplevel with DataFrame.reindex: EDIT:
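
The original snippets are missing from this excerpt, so the following is only an illustration of the swaplevel-plus-reindex idea on made-up data. It assumes the pivot produces two-level column labels. The column names `region`, `year`, `sales` and `cost` are hypothetical.

```python
import pandas as pd

# Hypothetical input data.
df = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [1, 2, 3, 4],
    "cost": [5, 6, 7, 8],
})

# pivot_table sorts the value names, giving (cost, ...) before (sales, ...).
pivoted = df.pivot_table(index="region", columns="year",
                         values=["sales", "cost"], aggfunc="sum")

# Swap the two column levels so the year comes first ...
swapped = pivoted.swaplevel(0, 1, axis=1)

# ... then reindex to spell out the exact column order wanted.
wanted = pd.MultiIndex.from_product([[2020, 2021], ["sales", "cost"]])
result = swapped.reindex(columns=wanted)
print(result)
```

`reindex` is what fixes the order. `swaplevel` alone only exchanges the levels and leaves the columns in their previous sequence.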

Python: How to add groupby but not affect ngroup()?

pandas pandas-groupby python

Per user I want a unique item order (as they click through them). If an item has already been seen, then don’t count cumulatively, but place the already assigned value there. For example, c, d, g & b in the tables below. I used the function below, but it’s not getting the job done at the moment. If I add the
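
The tables and the original function are not shown in this excerpt. A possible approach, sketched on hypothetical data, is to number items by first appearance within each user with `pd.factorize`, so a repeated item gets back the number it was first assigned. The column names `user` and `item` are assumptions.

```python
import pandas as pd

# Hypothetical click log.
clicks = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u1", "u2", "u2", "u2"],
    "item": ["a", "b", "a", "c", "c", "d", "c"],
})


def first_seen_rank(items):
    """Number items 1, 2, 3, ... in order of first appearance."""
    codes, _ = pd.factorize(items)
    return pd.Series(codes + 1, index=items.index)


# Apply the numbering separately within each user's clicks.
clicks["item_order"] = clicks.groupby("user")["item"].transform(first_seen_rank)
print(clicks)
```

Unlike `ngroup()`, which numbers the groups themselves, this restarts the numbering for every user and keeps it stable for items the user has already seen.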

Two-point Euclidean distance from csv file

dataframe pandas python

I want to calculate the distance between two points and label them. The problem is that the code doesn’t work on more than 1 row. When there is 1 row, the program shows me the result which I want: This is the error when there is more than 1 row: “cannot convert the series to <class ‘float’>” This is my
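
That error typically appears when a scalar function such as `math.sqrt` is handed a whole pandas Series. The original code is not shown here, so this is just a sketch of the vectorized alternative using NumPy. The column names `x1`, `y1`, `x2`, `y2` and the sample values are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content with one pair of points per row.
csv_text = "x1,y1,x2,y2\n0,0,3,4\n1,1,4,5\n"
points = pd.read_csv(io.StringIO(csv_text))

# np.hypot works element-wise on Series, so every row is handled at once.
points["distance"] = np.hypot(points["x2"] - points["x1"],
                              points["y2"] - points["y1"])
print(points)
```

`np.sqrt(dx**2 + dy**2)` gives the same result. The key point is to use NumPy functions rather than the `math` module on columns.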