I have this dataframe: I want to replace the non-first values of the columns with NaN, for each day. This is how the dataframe should look: This is what I tried: #i’m trying to use replace, but this does not consider the date Answer groupby + rank First create a boolean mask with isna, then use group…
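A minimal sketch of the "first value per day" idea, assuming a datetime column named `date` and value columns `A` and `B` (all hypothetical, since the original frame is not shown). Instead of `rank`, this swaps in a per-day running count of non-null cells, which keeps only the first non-null value of each column per calendar day.

```python
import numpy as np
import pandas as pd

def keep_first_per_day(df, date_col, value_cols):
    """Keep only the first non-null value of each column per calendar day."""
    day = df[date_col].dt.date
    # True where a cell actually holds a value
    present = df[value_cols].notna()
    # Running count of non-null cells, per column, restarting each day
    running = present.astype(int).groupby(day).cumsum()
    out = df.copy()
    # Keep a cell only if it is non-null and is the first non-null of its day
    out[value_cols] = df[value_cols].where(present & running.eq(1))
    return out

# Hypothetical example data
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01 09:00", "2021-01-01 10:00", "2021-01-01 11:00",
                            "2021-01-02 09:00", "2021-01-02 10:00"]),
    "A": [np.nan, 1.0, 2.0, 3.0, 4.0],
    "B": [5.0, 6.0, np.nan, np.nan, 7.0],
})
result = keep_first_per_day(df, "date", ["A", "B"])
```

If "first" should instead mean the earliest timestamp regardless of whether the cell is null, `df.groupby(day).cumcount() > 0` can serve as the mask.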
Tag: pandas
Operating on a large .csv file with pandas/dask in Python
I’ve got a large .csv file (5GB) from the UK Land Registry. I need to find all properties that have been bought/sold two or more times. Each row of the table looks like this: I’ve never used pandas or any data science library. So far I’ve come up with this plan: Load the .csv file and add header…
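One way to avoid loading all 5GB at once is to read it in chunks with `pd.read_csv(..., chunksize=...)` and count how often each address appears. This is a sketch under stated assumptions: the column names and the choice of address fields that identify "the same property" are hypothetical, and `COLUMNS` must list every field of the real file in order (the six names here only fit the sample).

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical header: the raw file has no header row, so names are supplied here.
# These must cover every field in the file, in order; six fields only fit the sample below.
COLUMNS = ["price", "date", "postcode", "paon", "saon", "street"]
# Hypothetical definition of "the same property": identical address fields.
KEY = ["postcode", "paon", "saon", "street"]

def find_repeat_sales(source, chunksize=1_000_000):
    """Return {address tuple: number of sales} for addresses sold two or more times."""
    counts = Counter()
    # Read only the key columns, one chunk at a time; keep_default_na=False keeps
    # empty fields as "" so they compare equal across rows.
    reader = pd.read_csv(source, header=None, names=COLUMNS, usecols=KEY,
                         dtype=str, keep_default_na=False, chunksize=chunksize)
    for chunk in reader:
        counts.update(chunk.itertuples(index=False, name=None))
    return {key: n for key, n in counts.items() if n >= 2}

# Tiny in-memory stand-in for the real file
sample = io.StringIO(
    "100,2020-01-01,AB1 2CD,1,,High St\n"
    "200,2021-06-01,AB1 2CD,1,,High St\n"
    "300,2020-05-05,EF3 4GH,7,Flat 2,Low Rd\n"
)
repeats = find_repeat_sales(sample, chunksize=2)
```

The counter still holds one entry per distinct address; if that alone is too big for memory, a `dask.dataframe` groupby over the same key columns is the natural next step.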
How to skip the apply function for cells with missing values in pandas
I have a dataset as below: I wrote code to calculate network days for the rows that have a date value in column ‘End Date’; however, I got the error below and I don’t know why. Could you please take a look? My expected output is below: Answer I believe the problem comes from how you ca…
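Rather than making `apply` skip missing cells, one option is to compute only on the rows where both dates exist and leave the rest as NaN. This sketch swaps `apply` for the vectorized `np.busday_count`; the `Start Date` and `Network Days` column names are hypothetical, and it assumes Excel-style NETWORKDAYS counting (both ends inclusive, Monday to Friday, no holidays).

```python
import numpy as np
import pandas as pd

def add_network_days(df, start_col="Start Date", end_col="End Date"):
    """Add a 'Network Days' column; rows missing either date stay NaN."""
    has_both = df[start_col].notna() & df[end_col].notna()
    start = df.loc[has_both, start_col].to_numpy().astype("datetime64[D]")
    end = df.loc[has_both, end_col].to_numpy().astype("datetime64[D]")
    out = df.copy()
    out["Network Days"] = np.nan
    # busday_count counts the half-open range [start, end), so add one day
    # to make the end date inclusive, as NETWORKDAYS does
    out.loc[has_both, "Network Days"] = np.busday_count(start, end + np.timedelta64(1, "D"))
    return out

# Hypothetical example data
df = pd.DataFrame({
    "Start Date": pd.to_datetime(["2021-10-01", "2021-10-04", "2021-10-05"]),
    "End Date": pd.to_datetime(["2021-10-08", None, "2021-10-05"]),
})
result = add_network_days(df)
```

Public holidays can be passed through `np.busday_count(..., holidays=[...])` if needed.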
How to map a list of strings to an existing list of integers?
I have this string vocab file: https://drive.google.com/file/d/1mL461QGC5KcA3M1r8AESaPjZ3D_ufgPA/view?usp=sharing. I have this sentences file, built from the vocab file above: https://drive.google.com/file/d/1w5ma4ROjyp6xmZfvnIQjsdH2I_K7lHoo/view?usp=sharing. I want to map every sentence into its correspondin…
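A common approach is to build a dictionary from word to index once, then look up each token. This sketch assumes the vocab file has one entry per line with the line number as its id, that sentences are whitespace-tokenized, and that unknown words get a hypothetical `unk_id` of -1; none of this is confirmed by the linked files.

```python
def build_vocab(words):
    """Map each vocab entry to its position (assumes index = line number)."""
    return {word: idx for idx, word in enumerate(words)}

def encode(sentence, vocab, unk_id=-1):
    """Turn a whitespace-tokenized sentence into vocab ids; unknown tokens get unk_id."""
    return [vocab.get(token, unk_id) for token in sentence.split()]

# Reading the real files (file names are hypothetical):
# with open("vocab.txt", encoding="utf-8") as f:
#     vocab = build_vocab(line.strip() for line in f)
# with open("sentences.txt", encoding="utf-8") as f:
#     encoded = [encode(line, vocab) for line in f]
```

Dictionary lookups keep the mapping linear in the number of tokens, instead of searching the vocab list for every word.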
DataFrame contains a column of dates in these formats: ‘5-15-2019’ and 05152021. I want to extract their pattern
DataFrame contains dates in these formats: “21-10-2021” and 29052021. I want to extract their pattern. For example, ‘5-15-2019’ should produce ‘%m-%d-%Y’ and ‘05152021’ should produce ‘%m%d%Y’. I tried it this way: output: I got a list…
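A simple way to "extract the pattern" is to try a list of candidate formats with `datetime.strptime` and return the first one that parses. The candidate list below is an assumption covering only the four shapes in the question. Values that fit more than one format (for example `01-02-2021`) are genuinely ambiguous and will get whichever candidate comes first.

```python
from datetime import datetime

import pandas as pd

# Hypothetical candidate list, ordered by preference
CANDIDATE_FORMATS = ["%d-%m-%Y", "%m-%d-%Y", "%d%m%Y", "%m%d%Y"]

def infer_format(value, candidates=CANDIDATE_FORMATS):
    """Return the first format string that parses value, or None."""
    for fmt in candidates:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        return fmt
    return None

dates = pd.Series(["21-10-2021", "29052021", "5-15-2019", "05152021"])
patterns = dates.map(infer_format)
```

If the undashed values are stored as integers, the leading zero is already gone; `.astype(str).str.zfill(8)` would restore it for the 8-digit forms.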
Pandas UDF throws a ‘not the required length’ error
I have a Delta table containing Thrift data from Kafka, and I am using a UDF to deserialize it. I have no issues with a regular UDF, but I get an error when I try to use a Pandas UDF. This runs fine (i.e. the regular UDF): But when I use the Pandas UDF I get an error: PythonException: ‘RuntimeError: Result
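A scalar Pandas UDF receives a whole batch of rows as a `pd.Series` and must return a Series of the same length; returning a single value or a shorter list is a frequent cause of this kind of length error, though whether it is the cause here depends on the code not shown. In this sketch `deserialize_one` is a hypothetical stand-in for the real Thrift decoder, and the PySpark registration is shown only as comments (assuming Spark 3.x type-hint style).

```python
import pandas as pd

def deserialize_one(payload):
    """Hypothetical stand-in for the real Thrift deserializer."""
    if payload is None:
        return None
    return bytes(payload).decode("utf-8")

def deserialize_batch(payloads: pd.Series) -> pd.Series:
    # One output value per input row, so the returned Series has the same
    # length as the batch Spark passed in
    return payloads.map(deserialize_one)

# Registering it in PySpark (assumes Spark 3.x):
#   from pyspark.sql.functions import pandas_udf
#   deserialize_udf = pandas_udf(deserialize_batch, returnType="string")
#   df = df.withColumn("decoded", deserialize_udf("value"))
```

Keeping the per-row logic in a plain function makes it easy to check the batch wrapper locally with pandas before running it on the cluster.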
Labeling sentences from different nested dictionaries
I created a function to extract sentences from a specific key in a nested file. Now I would like this function to add a label each time it reaches a new dictionary. Each occurrence of the value HEADER marks the beginning of a NEW story, so I would like to label the sentences that belong to the
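One way to do this is a recursive walk that bumps a story counter whenever a dictionary contains the value `HEADER`, and tags every sentence found under the text key with the current counter. The nested layout and the `text` key below are hypothetical, since the original file is not shown.

```python
def label_sentences(data, key="text", marker="HEADER"):
    """Return (story_number, sentence) pairs; a dict holding `marker` starts a new story."""
    labelled = []
    story = 0

    def walk(obj):
        nonlocal story
        if isinstance(obj, dict):
            # A dict containing the marker value opens a new story
            if marker in obj.values():
                story += 1
            for k, v in obj.items():
                if k == key and isinstance(v, str):
                    labelled.append((story, v))
                else:
                    walk(v)
        elif isinstance(obj, list):
            for item in obj:
                walk(item)

    walk(data)
    return labelled

# Hypothetical nested input
data = [
    {"type": "HEADER", "text": "Story one"},
    {"type": "BODY", "text": "First sentence."},
    {"type": "BODY", "children": [{"text": "Nested sentence."}]},
    {"type": "HEADER", "text": "Story two"},
    {"type": "BODY", "text": "Another sentence."},
]
labels = label_sentences(data)
```

Because Python dicts keep insertion order, the walk follows the order of the file, which is what makes "everything after a HEADER belongs to that story" work.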
pandas DataFrame: moving certain headers to the index
I have the following dataframe: Desired output: I have tried: The real dictionary is very large, with over 30 versions, so simply typing out the version numbers into a list is not an option. Thanks. Answer Try this: Output:
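To avoid typing out 30+ version numbers, the version columns can be selected by a pattern with `DataFrame.filter(regex=...)` and then moved into the index with `melt` plus `set_index`. The `name` column and the `v1`, `v2`, … naming scheme are hypothetical; adjust the regex to the real headers.

```python
import pandas as pd

def versions_to_index(df, id_col, pattern=r"^v\d+$"):
    """Move every column whose name matches `pattern` into a (id, version) index."""
    # Pick the version columns by pattern instead of listing them by hand
    version_cols = list(df.filter(regex=pattern).columns)
    long = df.melt(id_vars=id_col, value_vars=version_cols, var_name="version")
    return long.set_index([id_col, "version"])["value"].sort_index()

# Hypothetical example data
df = pd.DataFrame({"name": ["a", "b"], "v1": [1, 2], "v2": [3, 4], "note": ["x", "y"]})
s = versions_to_index(df, "name")
```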
Transforming many columns into 3 category columns that contain lists?
I have a DataFrame with 31 columns containing 3 categories, “Classic”, “Premium” and “Luxe”. I want to restructure the DataFrame so that it has only 3 columns, “Classic”, “Premium” and “Luxe”, with the 31 categories listed inside…
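A sketch under one reading of the question: each of the 31 columns holds a category label per row, and the goal is, for every row, the list of column names falling under each category. It reshapes with `melt`, collects column names per (row, category) with `groupby(...).agg(list)`, and spreads categories back out with `unstack`. The example column names are hypothetical.

```python
import pandas as pd

CATEGORIES = ["Classic", "Premium", "Luxe"]

def columns_by_category(df):
    """For each row, list the column names that hold each category."""
    # One row per (original row, column), keeping the original index
    long = df.melt(ignore_index=False, var_name="column", value_name="category")
    grouped = long.groupby([long.index, "category"])["column"].agg(list)
    # Categories become columns; combinations that never occur become NaN
    return grouped.unstack("category").reindex(columns=CATEGORIES)

# Hypothetical example with 3 columns instead of 31
df = pd.DataFrame({
    "a": ["Classic", "Luxe"],
    "b": ["Premium", "Luxe"],
    "c": ["Classic", "Classic"],
})
out = columns_by_category(df)
```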
How to sum a value based on group?
I am trying to figure out how to sum a value from rank 5 to the LOWEST rank (i.e. 5-1,000) for each geography in my dataframe. However, I am getting the error: ‘DataFrameGroupBy’ object has no attribute ‘iloc’. Am I using iloc incorrectly?
Answer IIUC, try:
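`DataFrameGroupBy` has no `.iloc`, but the filtering can happen before grouping, so no positional indexing on the groups is needed. This sketch assumes columns named `geography`, `rank` and `value` (hypothetical, since the frame is not shown) and that `rank` holds the rank numbers.

```python
import pandas as pd

def sum_from_rank(df, start_rank=5, group_col="geography", rank_col="rank", value_col="value"):
    """Sum value_col over rows ranked start_rank or lower, per group."""
    # Filter first, then aggregate: avoids indexing into the GroupBy object
    eligible = df[df[rank_col] >= start_rank]
    return eligible.groupby(group_col)[value_col].sum()

# Hypothetical example data
df = pd.DataFrame({
    "geography": ["A"] * 6 + ["B"] * 5,
    "rank": list(range(1, 7)) + list(range(1, 6)),
    "value": [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50],
})
totals = sum_from_rank(df)
```

Geographies with no rows at rank 5 or below simply do not appear in the result; if "rank" is only the row order within each geography, `GroupBy.cumcount()` on a sorted frame can build the same filter.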