Tag: pandas

Replace duplicate value with NaN using groupby

Dataset (MWE): I am trying to replace duplicates in the columns {people_vaccinated, people_fully_vaccinated, people_vaccinated_per_hundred} with NaN while using groupby() on location. I tried some solutions online but couldn’t get them working for me, so I used the logic below instead. The above logic fails when…
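A minimal sketch of one way to approach the question above. It assumes that "duplicate" means any repeat of a value that already appeared earlier within the same location; the sample rows are made up for illustration and are not the original dataset.

```python
import pandas as pd

# Hypothetical sample data (not the poster's dataset).
df = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "people_vaccinated": [10.0, 10.0, 12.0, 10.0, 15.0, 15.0],
    "people_fully_vaccinated": [1.0, 2.0, 2.0, 1.0, 1.0, 3.0],
})

cols = ["people_vaccinated", "people_fully_vaccinated"]

for col in cols:
    # Within each location, keep the first occurrence of a value
    # and replace every later repeat with NaN.
    df[col] = df.groupby("location")[col].transform(
        lambda s: s.mask(s.duplicated())
    )

print(df)
```

Because the check runs per location, the value 10.0 in location B is kept even though it also appears in location A.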

Extracting Specific Text From column in dataframe

I have the following dataframe and I’m trying to extract the string that consists of ABC followed by its numbers. Description ABC12345679 132465 Test ABC12346548 Test ABC1231321 4645 I have tried: But it’s giving me what comes after it in instances where there’s more text after the ABC*, like so: …
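One possible approach, sketched with a regular expression that captures only ABC and the digits that immediately follow it. The row boundaries in the sample below are an assumption, since the excerpt flattens the original table.

```python
import pandas as pd

# Assumed row layout; the excerpt does not show where each row ends.
df = pd.DataFrame({
    "Description": [
        "ABC12345679 132465",
        "Test ABC12346548",
        "Test ABC1231321 4645",
    ]
})

# Capture "ABC" plus the run of digits directly after it; anything
# before or after that token is ignored. "Code" is a hypothetical
# column name chosen for this example.
df["Code"] = df["Description"].str.extract(r"(ABC\d+)", expand=False)

print(df)
```

Rows without a match would get NaN in the new column.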

Remove by column in pandas.DataFrame.hist

After specifying grouping by column a and restricting the histogram to columns f and g, I still have column a showing up in green. Is there a way to remove it without resorting to matplotlib or a for loop? Answer: This is clearly a bug in the pandas library. The problem seems to arise when by is a numeric dtype c…

How to upload the pandas and sqlalchemy packages to Lambda to avoid the error “Unable to import module ‘lambda_function’: No module named ‘importlib_metadata’”?

I’m trying to upload a deployment package to my AWS Lambda function, following the article https://korniichuk.medium.com/lambda-with-pandas-fd81aa2ff25e. My final zip file is as follows: https://drive.google.com/file/d/1NLjvf_-Ks50E8z53DJezHtx7-ZRmwwBM/view but when I run my Lambda function I get the err…

Pandas, drop duplicates but merge certain columns

I’m looking for a way to drop duplicate rows based on a certain column subset, but merge some of the data so that it does not get removed.

Parcel  Res    Bill   Year
001     Henry  4,100  1995
002     Nick   2,300  1990
003     Paul   5,200  2008
003     Bill   4,000  2008

Some pseudo code would look something like this: Parcel Res Bill Year 001…
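A rough sketch of how rows could be collapsed while keeping their data. The expected output is cut off in the excerpt, so the choices here are assumptions: rows are treated as duplicates when Parcel and Year match, Res names are joined with a comma, and Bill amounts (stored as integers for simplicity) are summed.

```python
import pandas as pd

df = pd.DataFrame({
    "Parcel": ["001", "002", "003", "003"],
    "Res": ["Henry", "Nick", "Paul", "Bill"],
    "Bill": [4100, 2300, 5200, 4000],
    "Year": [1995, 1990, 2008, 2008],
})

# Group on the columns that define a duplicate (assumed: Parcel, Year),
# then state how each remaining column should be merged.
merged = (
    df.groupby(["Parcel", "Year"], as_index=False, sort=False)
      .agg(
          Res=("Res", lambda s: ", ".join(s)),  # keep every resident name
          Bill=("Bill", "sum"),                 # assumed: add the bills up
      )
)

print(merged)
```

If the bills should be kept separately rather than summed, the same pattern works with a different aggregation for that column.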