I have a pandas DataFrame containing two columns [‘A’, ‘B’]. Each column is made up of integers. I want to construct a sparse matrix with the following properties: the row index is all integers from 0 to the max value in the dataframe; the column index is the same as the row index; entry i,j = 1 if [i,j] or [j,i] is a
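A minimal sketch of one way to build such a matrix with scipy.sparse. The excerpt is cut off, so this assumes the condition is "[i,j] or [j,i] is a row of the dataframe". The function name `symmetric_indicator` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix


def symmetric_indicator(df):
    """Return an (n x n) CSR matrix with 1 at (i, j) and (j, i) for each row [i, j] of df."""
    a = df["A"].to_numpy()
    b = df["B"].to_numpy()
    n = int(max(a.max(), b.max())) + 1  # indices run from 0 to the max value

    # Add every pair in both directions so the result is symmetric.
    rows = np.concatenate([a, b])
    cols = np.concatenate([b, a])
    data = np.ones(rows.size, dtype=np.int64)

    m = coo_matrix((data, (rows, cols)), shape=(n, n)).tocsr()
    # Converting to CSR sums repeated pairs, so reset stored entries to 1.
    m.data[:] = 1
    return m


# Hypothetical sample data: the pair (2, 3) appears twice.
sample = pd.DataFrame({"A": [0, 2, 2], "B": [1, 3, 3]})
matrix = symmetric_indicator(sample)
```

The matrix stays sparse throughout; `matrix.toarray()` is only practical for small inputs.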
Tag: pandas
Are optimizations possible when finding duplicates in pandas dataframe based on array overlap?
I’ve a pandas dataframe that contains information about meetings. I need to find duplicate entries based on certain conditions. Let’s look at a sample dataframe first. The meeting-id is a unique identifier for a meeting. Brokers are the attendees of these meetings. The catch is, brokers are not always honest, and many of them can enter the same meeting info multiple
How to convert a whole column from string type to date type in mongodb with pymongo
My data consists of 1 million rows. A sample looks like this: The thing is that date2 and date4 are in the form that I want, but they are strings and I want to convert them to dates. The code I have used looks like this: Do I need to convert them before inserting or after? Does anyone know how I
How to slice/chop a string using multiple indexes in a pandas DataFrame
I’m in need of some advice on the following issue: I have a DataFrame that looks like this: And what I need to get is the SEQ that’s separated between the different BEG_GAP and END_GAP. I already have worked it out (thanks to a previous question) for sequences that have only one pair of gaps, but here they have multiple.
Python reads csv in just one column
So I am relatively new to Python (using Python 3 with the Spyder IDE) and I am trying to read in a CSV file with some weather data. The problem is that the file I have contains some empty cells and information I don’t need. I only need from row 18 as a header (all the physical
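A hedged sketch of how such a file might be read. A CSV landing in a single column usually means the delimiter differs from the default comma. The semicolon separator, the helper name `read_weather`, and the sample text are all assumptions, since the actual file is not shown.

```python
import io

import pandas as pd


def read_weather(source, header_row=18, sep=";"):
    """Read a delimited file whose column names sit on line `header_row` (1-based).

    `sep` is an assumption; change it to match the real file.
    """
    # skiprows counts raw file lines, so everything above the header is dropped.
    return pd.read_csv(source, sep=sep, skiprows=header_row - 1)


# Hypothetical miniature file: two lines of preamble, header on line 3.
text = (
    "Station: hypothetical\n"
    "Source: hypothetical\n"
    "date;temp;rain\n"
    "2020-01-01;1.5;\n"
    "2020-01-02;2.0;0.3\n"
)
weather = read_weather(io.StringIO(text), header_row=3)
```

Empty cells come back as NaN, and unneeded columns can be dropped afterwards or excluded with the `usecols` parameter.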
Select entries in one dataframe based on cross-sectional statistic of another dataframe
I want to select the entries of one dataframe, say df2, based on the cross-sectional statistic of another dataframe, say df1: For instance, if the cross-sectional statistic on df1 is a max operation, then for the 3 rows in df1 the corresponding columns with the max entries are ‘D’, ‘C’, ‘B’ (corresponding to entries 11, 45, 314). Selecting only those
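One possible reading of this, sketched under the assumption that df1 and df2 share the same index and columns and that exactly one entry per row should be picked from df2. The helper name `select_by_row_max` and all values other than 11, 45 and 314 are hypothetical.

```python
import numpy as np
import pandas as pd


def select_by_row_max(df1, df2):
    """For each row, pick the df2 entry in the column where df1 has its row maximum."""
    max_cols = df1.idxmax(axis=1)                   # column label of each row's max
    col_pos = df2.columns.get_indexer(max_cols)     # labels -> positional indices
    values = df2.to_numpy()[np.arange(len(df2)), col_pos]
    return pd.Series(values, index=df2.index, name="selected")


# Hypothetical data: row maxima of df1 are 11 (D), 45 (C) and 314 (B).
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 314], "C": [7, 45, 9], "D": [11, 8, 10]})
df2 = pd.DataFrame({"A": [10, 20, 30], "B": [40, 50, 60], "C": [70, 80, 90], "D": [100, 110, 120]})
selected = select_by_row_max(df1, df2)
```

Swapping `idxmax` for `idxmin` would cover a min statistic in the same way.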
reorder data in pandas pivot_table function
I’ve a sample dataframe: I’m trying to pivot the data using: The values are not in the order which I’ve mentioned in the above snippet. How can I restructure my data to (by also repeating the row labels): Answer: Use DataFrame.swaplevel with DataFrame.reindex: EDIT:
Python: How to add groupby but not affect ngroup()?
Per user, I want a unique item order (as they click through them). If an item has already been seen, then don’t cumulatively count, but place the already assigned value there. For example, c, d, g & b in the tables below. I used the function below, but it’s not getting the job done at the moment. If I add the
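A minimal sketch of one way to get this numbering, using `pd.factorize` within each user group rather than `ngroup()`. The column names `user` and `item` and the sample clicks are hypothetical, since the original tables are not shown.

```python
import pandas as pd


def item_order_per_user(df, user_col="user", item_col="item"):
    """Number items per user by first appearance; a repeated item keeps its earlier number."""
    def first_seen_rank(items):
        # factorize assigns codes 0, 1, 2, ... in order of first appearance.
        codes, _ = pd.factorize(items)
        return pd.Series(codes + 1, index=items.index)

    return df.groupby(user_col)[item_col].transform(first_seen_rank)


# Hypothetical click log: user u1 revisits item "a".
clicks = pd.DataFrame(
    {
        "user": ["u1", "u1", "u1", "u1", "u2", "u2"],
        "item": ["a", "b", "a", "c", "b", "b"],
    }
)
clicks["item_order"] = item_order_per_user(clicks)
```

Here the numbering restarts for each user, and a revisited item reuses the value it was given on first sight.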
Parsing nested JSON with list comprehension in Python
My data is as follows (this is just an extract, but there are many more objects; some don’t have additionalData). I’m trying to iterate with a list comprehension to get a dataframe of referenceDataItems and everything within that key, also additionalData if it appears. Expected result: Answer: I did some research and this almost got my desired data; it needs a little modification in COLUMNS_TO_DROP
Two-point Euclidean distance from csv file
I want to calculate the distance between two points and label them. The problem is that the code doesn’t work on more than 1 line. When there is 1 row, the program shows me the result which I want: This is the error when there is more than 1 line: “cannot convert the series to <class ‘float’>” This is my
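That error commonly appears when a scalar function such as `math.sqrt` or `float()` is applied to a whole Series; whether that is the cause here is a guess, since the asker's code is not shown. A sketch of a vectorized alternative with NumPy, assuming hypothetical column names `x1`, `y1`, `x2`, `y2`:

```python
import io

import numpy as np
import pandas as pd


def add_distance(df, x1="x1", y1="y1", x2="x2", y2="y2"):
    """Return a copy of df with a row-wise Euclidean distance column."""
    out = df.copy()
    # np.hypot works element-wise on Series, so any number of rows is fine.
    out["distance"] = np.hypot(out[x2] - out[x1], out[y2] - out[y1])
    return out


# Hypothetical CSV content with more than one row.
csv_text = "x1,y1,x2,y2\n0,0,3,4\n1,1,4,5\n1,2,1,2\n"
points = add_distance(pd.read_csv(io.StringIO(csv_text)))
```

The same function handles a single-row file, so no special case is needed.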