I am trying to solve an NLP problem. The text column of my dataframe has many rows containing URLs like http.somethingsomething. Some of the URLs have no space between them and the neighbouring text, for example ‘:http:\something’, ‘;http:\something’, ‘,http:\something’. So there some…
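One possible way to separate URLs that are glued to preceding punctuation is to insert a space before each "http". This is only a sketch: the sample strings, the `text` column name and the `text_spaced` column are assumptions, since the question's own data is not shown.

```python
import pandas as pd

# Hypothetical sample rows; the real data is not shown in the question.
df = pd.DataFrame({
    "text": [
        "see this:http://example.com now",
        "done;http://example.org",
        "ok,http://example.net end",
    ]
})

# Capture the non-space character directly before "http:" or "https:"
# and re-emit it followed by a space, so the URL becomes its own token.
df["text_spaced"] = df["text"].str.replace(
    r"(\S)(https?:)", r"\1 \2", regex=True
)

print(df["text_spaced"].tolist())
```

Once the URLs are separate tokens, they can be removed or extracted with an ordinary whitespace-based pattern.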
Tag: pandas
ImportError: Missing optional dependency ‘xlrd’. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd
I used pandas to read an Excel file and received the ImportError shown below. Code: Error: Then I installed xlrd on my computer using the code shown below: But I still got the same error. The output always returned this ImportError. It left me confused and frustrated, because I have installed xlrd o…
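A frequent cause of this symptom is that xlrd was installed into a different Python environment from the one running the script or notebook. The excerpt does not confirm this, so treat it as an assumption. A minimal check from inside the failing session might look like this:

```python
import importlib.util
import sys

# Path of the interpreter that is actually running this code.
print("interpreter:", sys.executable)

# find_spec returns None when the package cannot be imported
# from this interpreter's environment.
xlrd_available = importlib.util.find_spec("xlrd") is not None
print("xlrd importable:", xlrd_available)
```

If this prints `False`, installing with that same interpreter (`python -m pip install xlrd`, using the path printed above) targets the right environment. Note also that xlrd 2.0 and later reads only `.xls` files; for `.xlsx` files pandas relies on openpyxl instead.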
Replace duplicate value with NaN using groupby
Dataset (MWE) I am trying to replace duplicates in the columns {people_vaccinated, people_fully_vaccinated, people_vaccinated_per_hundred} with NaN while using groupby() on location. I tried some solutions I found online but couldn’t get them working, so I used the logic below instead. The above logic fails when…
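The asker's own logic is not shown, so the following is just one way to approach it, not the original method. It treats a value as a duplicate when it has already appeared earlier in the same location, and the sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "people_vaccinated": [10.0, 10.0, 12.0, 10.0, 10.0],
    "people_fully_vaccinated": [1.0, 2.0, 2.0, 5.0, 6.0],
    "people_vaccinated_per_hundred": [0.1, 0.1, 0.2, 0.3, 0.3],
})

cols = [
    "people_vaccinated",
    "people_fully_vaccinated",
    "people_vaccinated_per_hundred",
]

for col in cols:
    # True where this (location, value) pair has been seen in an earlier row.
    is_dup = df.duplicated(subset=["location", col])
    # mask() replaces the flagged entries with NaN and keeps the rest.
    df[col] = df[col].mask(is_dup)

print(df)
```

Because the check is done per column, a row can keep its value in one column while losing it in another, and the first occurrence within each location is always preserved.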
Extracting Specific Text From column in dataframe
I have the following dataframe and I’m trying to extract the string that has ABC followed by its numbers. Description ABC12345679 132465 Test ABC12346548 Test ABC1231321 4645 I have tried: But it’s giving me what comes after in instances where there’s more text after the ABC*, like so: …
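A short sketch of one way to pull out only the ABC code with `Series.str.extract`. The way the sample values are split into rows is an assumption based on the flattened table above, and the `code` column name is made up.

```python
import pandas as pd

# Rows reconstructed from the flattened sample; the split is an assumption.
df = pd.DataFrame({
    "Description": [
        "ABC12345679 132465",
        "Test ABC12346548",
        "Test ABC1231321 4645",
    ]
})

# Capture "ABC" followed by one or more digits and nothing else.
# expand=False returns a Series instead of a one-column DataFrame.
df["code"] = df["Description"].str.extract(r"(ABC\d+)", expand=False)

print(df["code"].tolist())
```

Anchoring the pattern on `\d+` makes the match stop at the first non-digit character, so trailing text after the code is not captured.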
Remove by column in pandas.DataFrame.hist
After grouping by column a and restricting the histogram to columns f and g, I still have column a showing up in green. Is there a way to remove it without going into matplotlib or a for loop? Answer This is clearly a bug with the pandas library. The problem seems to arise when by is a numeric dtype c…
How to upload pandas, sqlalchemy package in lambda to avoid error “Unable to import module ‘lambda_function’: No module named ‘importlib_metadata'”?
I’m trying to upload a deployment package to my AWS lambda function following the article https://korniichuk.medium.com/lambda-with-pandas-fd81aa2ff25e. My final zip file is as follows: https://drive.google.com/file/d/1NLjvf_-Ks50E8z53DJezHtx7-ZRmwwBM/view but when I run my lambda function I get the err…
Apply a for loop to multiple dataframes for multiple columns?
The dataframes look like the one below: I want to change a dataframe’s value to ‘dead’ if age is more than 100. Desired outcome I was trying something like this: Error shown: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() I am looking for a loop that works on all d…
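The "truth value ... is ambiguous" error typically appears when a whole column comparison such as `df["age"] > 100` is used inside a plain `if`, because the comparison yields one boolean per row. A sketch of a loop that uses boolean indexing instead is shown here. The frames and the `name` and `status` columns are hypothetical, since the question's data is not shown.

```python
import pandas as pd

# Hypothetical frames; only the "age" column is implied by the question.
df1 = pd.DataFrame({"name": ["x", "y"], "age": [30, 120], "status": ["alive", "alive"]})
df2 = pd.DataFrame({"name": ["z", "w"], "age": [101, 45], "status": ["alive", "alive"]})

for frame in (df1, df2):
    # The comparison produces a boolean Series, and .loc uses it as a
    # row mask, so only rows with age above 100 are overwritten.
    frame.loc[frame["age"] > 100, "status"] = "dead"

print(df1)
print(df2)
```

To update several columns at once, a list of column names can be passed as the second argument to `.loc`.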
How to convert generated data into a pandas dataframe
After creating the data, it is a tuple. After converting the tuple into a pandas dataframe I have 9 features (columns), but when I try to insert 9 columns it says: ValueError: Shape of passed values is (2, 1), indices imply (2, 9) Basically I want to generate data and convert it into a pandas dataframe but could not get…
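The reported shape `(2, 1)` suggests that a two-element tuple was passed to `pd.DataFrame` directly, so each tuple element became one row. The sketch below assumes the tuple holds a feature array and a target array; that layout, the array sizes and the column names are assumptions, because the question does not show how the data was generated.

```python
import numpy as np
import pandas as pd

# Hypothetical generated data: 8 feature columns plus 1 target column.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
y = rng.integers(0, 2, size=5)
data = (X, y)  # a two-element tuple, as assumed above

# Unpack the tuple first, then place the arrays side by side.
features, target = data
columns = [f"f{i}" for i in range(features.shape[1])] + ["target"]
frame = pd.DataFrame(np.column_stack([features, target]), columns=columns)

print(frame.shape)
```

The key step is unpacking the tuple before building the frame, so that the number of columns in the data matches the number of column names.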
Pandas, drop duplicates but merge certain columns
I’m looking for a way to drop duplicate rows based on a certain column subset, but merge some data, so it does not get removed. Parcel Res Bill Year 001 Henry 4,100 1995 002 Nick 2,300 1990 003 Paul 5,200 2008 003 Bill 4,000 2008 Some pseudo code would look something like this: Parcel Res Bill Year 001…
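The desired output is cut off, so the merge rule here is a guess: rows sharing the same Parcel and Year are collapsed, the Res names are joined with a comma, and the first Bill is kept. A minimal sketch under those assumptions:

```python
import pandas as pd

# Sample rows taken from the question's table.
df = pd.DataFrame({
    "Parcel": ["001", "002", "003", "003"],
    "Res": ["Henry", "Nick", "Paul", "Bill"],
    "Bill": ["4,100", "2,300", "5,200", "4,000"],
    "Year": [1995, 1990, 2008, 2008],
})

# Assumed rule: group on Parcel and Year, join the residents' names,
# and keep the first Bill value seen in each group.
merged = df.groupby(["Parcel", "Year"], as_index=False).agg(
    Res=("Res", ", ".join),
    Bill=("Bill", "first"),
)

print(merged)
```

Swapping `"first"` for another aggregation, or changing the grouping columns, adapts the same pattern to a different merge rule.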
How do I pass a function parameter into a lambda function subsequently
I am trying to pass the timeframe=’month’ parameter into my function. I tried applying it with a lambda function but it doesn’t seem to work. Any advice on how to apply my timeframe inside? I want to be able to extract the day, month or year with a function. Answer You can do that by using the du…
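The answer above is truncated, so the following is an independent sketch rather than the answerer's method. It looks up the requested part on the `.dt` accessor by name; the `extract_part` helper and the `date` column are hypothetical.

```python
import pandas as pd

def extract_part(series, timeframe="month"):
    """Return the day, month or year of a datetime Series (hypothetical helper)."""
    allowed = {"day", "month", "year"}
    if timeframe not in allowed:
        raise ValueError(f"timeframe must be one of {sorted(allowed)}")
    # series.dt.day, series.dt.month and series.dt.year are selected by name.
    return getattr(series.dt, timeframe)

df = pd.DataFrame({"date": pd.to_datetime(["2021-03-15", "2020-12-01"])})

df["month"] = extract_part(df["date"], timeframe="month")

# The same parameter can also be forwarded through a lambda per element.
timeframe = "year"
df["year"] = df["date"].apply(lambda ts: getattr(ts, timeframe))

print(df)
```

The vectorised `.dt` version is usually preferable to the per-row lambda, which is shown only because the question asks about lambdas.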