I am trying to accept a variable input of many search terms seperated by commas via html form (@search) and query 2 columns of a dataframe.
Each column query works on its own but I cannot get them to work together in a and/or way.
First column query:
filtered = df.query ('`Drug Name` in @search')
Second column query:
filtered = df.query ('BP.str.contains(@search, na=False)', engine='python')
edit combining like this:
filtered = df.query ("('`Drug Name` in @search') and ('BP.str.contains(@search, na=False)', engine='python')")
Gives the following error, highlighting the python identifier in the engine argument
SyntaxError: Python keyword not valid identifier in numexpr query
edit 2
The dataframe is read from an excel file, with columns: Drug Name (containing a single drug name), BP, U&E (with long descriptive text entries)
The search terms will be input via html form:
search = request.values.get('searchinput').replace(" ","").split(',')
as a list of drugs which a patient may be on sometimes with the addition of specific conditions relating to medication use. sample user input:
Captopril, Paracetamol, kidney disease, chronic
I want the list to be checked against specific drug names and also to check other columns such as BP and U&E for any mention of the search terms.
edit 3
Apologies, but trying to implement the answers given is giving me stacks of errors. What I have below is giving me 90% of what I’m after, letting me search both columns including the whole contents of ‘BP’. But I can only search a single term via the terminal, if I # out and swap the lines which collect the use input (taking it from the html form as apposed to the terminal) I get:
TypeError: unhashable type: ‘list’
@app.route('/', methods=("POST", "GET")) def html_table(): searchterms = [] #searchterms = request.values.get('searchinput').replace(" ","").split(',') searchterms = input("Enter drug...") filtered = df.query('`Drug Name` in @searchterms | BP.str.contains(@searchterms, na=False)', engine='python') return render_template('drugsafety.html', tables=[filtered.to_html(classes='data')], titles=['na', 'Drug List']) <form action="" method="post"> <p><label for="search">Search</label> <input type="text" name="searchinput"></p> <p><input type="submit"></p> </form>
Sample data
The contents of the BP column can be quite long, descriptive and variable but an example is:
Every 12 months – Patients with CKD every 3 to 6 months.
Drug Name BP U&E Perindopril Every 12 months Not needed Alendronic Acid Not needed Every 12 months Allopurinol Whilst titrating - 3 months Not needed
With this line:
searchterms = request.values.get('searchinput')
Entering ‘months’ into the html form outputs:
1 Perindopril Every 12 months Not needed 14 Allopurinol Whilst titrating – 3 months Not needed
All good.
Entering ‘Alendronic Acid’ into the html form outputs:
13 Alendronic Acid Not needed Every 12 months
Also good, but entering ‘Perindopril, Allopurinol’ returns nothing.
If I change the line to:
searchterms = request.values.get('searchinput').replace(" ","").split(',')
I get TypeError: unhashable type: ‘list’ when the page reloads.
However – If I then change:
filtered = df.query('`Drug Name` in @searchterms | BP.str.contains(@searchterms, na=False)', engine='python')
to:
filtered = df.query('`Drug Name` in @searchterms')
Then the unhashable type error goes and entering ‘Perindopril, Allopurinol’ returns:
1 Perindopril Every 12 months Not needed 14 Allopurinol Whilst titrating – Every 3 months Not needed
But I’m now no longer searching the BP column for the searchterms.
Just thought that maybe its because searchterms is a list ‘[]’ changed it t oa tuple ‘()’ Didn’t change anything.
Any help is much appreciated.
Advertisement
Answer
I am assuming you want to query 2 columns and want to return the row if any of the query matches.
In this line, the issue is that engine=python is inside query.
filtered = df.query ("('`Drug Name` in @search') and ('BP.str.contains(@search, na=False)', engine='python')")
It should be
df.query("BP.str.contains(@search, na=False)", engine='python')
If you do searchterms = request.values.get('searchinput').replace(" ","").split(',')
, it converts your string to list of words which will cause Unhashable type list
error because str.contains expects str as input.
What you can do is use regex to search for search terms in list, it will look something like this:
df.query("BP.str.contains('|'.join(@search), na=False, regex=True)", engine='python')
What this does is it searches for all the individual words using regex. ('|'.join(@search)
will be “searchterm_1|search_term2|…” and “|” is used to represent or in regex, so it looks for searchterm_1 or searchterm_2 in BP column value)
To combine the outputs of both queries, you can run those separately and concatenate the results
pd.concat([df.query("`Drug Name` in @search", engine='python'),df.query("BP.str.contains('|'.join(@search), na=False, regex=True)", engine='python')])
Also any string based matching will require your strings to match perfectly, including case. so you can maybe lowercase everything in dataframe and query. Similarly for space separated words, this will remove spaces.
if you do searchterms = request.values.get('searchinput').replace(" ","").split(',')
on Every 12 months
, it will get converted to “Every12months”. so you can maybe remove the .replace() part and just use searchterms = request.values.get('searchinput').split(',')