How to combine queries with a single external variable using Pandas

Tags: , , ,



I am trying to accept a variable input of many search terms seperated by commas via html form (@search) and query 2 columns of a dataframe.

Each column query works on its own but I cannot get them to work together in a and/or way.

First column query:

filtered = df.query ('`Drug Name` in @search')

Second column query:

filtered = df.query ('BP.str.contains(@search, na=False)', engine='python')

edit combining like this:

filtered = df.query ("('`Drug Name` in @search') and ('BP.str.contains(@search, na=False)', engine='python')")

Gives the following error, highlighting the python identifier in the engine argument

SyntaxError: Python keyword not valid identifier in numexpr query

edit 2

The dataframe is read from an excel file, with columns: Drug Name (containing a single drug name), BP, U&E (with long descriptive text entries)

The search terms will be input via html form:

search = request.values.get('searchinput').replace(" ","").split(',')

as a list of drugs which a patient may be on sometimes with the addition of specific conditions relating to medication use. sample user input:

Captopril, Paracetamol, kidney disease, chronic

I want the list to be checked against specific drug names and also to check other columns such as BP and U&E for any mention of the search terms.

edit 3

Apologies, but trying to implement the answers given is giving me stacks of errors. What I have below is giving me 90% of what I’m after, letting me search both columns including the whole contents of ‘BP’. But I can only search a single term via the terminal, if I # out and swap the lines which collect the use input (taking it from the html form as apposed to the terminal) I get:

TypeError: unhashable type: ‘list’

@app.route('/', methods=("POST", "GET"))

    def html_table():
        searchterms = []
        #searchterms = request.values.get('searchinput').replace(" ","").split(',')
        searchterms = input("Enter drug...")   
        filtered = df.query('`Drug Name` in @searchterms | BP.str.contains(@searchterms, na=False)', engine='python')
        return render_template('drugsafety.html', tables=[filtered.to_html(classes='data')], titles=['na', 'Drug List'])

<form action="" method="post">
  <p><label for="search">Search</label>
  <input type="text" name="searchinput"></p>        
  <p><input type="submit"></p>
</form>

Sample data

The contents of the BP column can be quite long, descriptive and variable but an example is:

Every 12 months – Patients with CKD every 3 to 6 months.

Drug Name         BP                            U&E
Perindopril       Every 12 months               Not needed
Alendronic Acid   Not needed                    Every 12 months
Allopurinol       Whilst titrating - 3 months   Not needed

With this line:

searchterms = request.values.get('searchinput')

Entering ‘months’ into the html form outputs:

1   Perindopril  Every 12 months                Not needed 
14  Allopurinol  Whilst titrating – 3 months    Not needed

All good.

Entering ‘Alendronic Acid’ into the html form outputs:

13  Alendronic Acid Not needed  Every 12 months

Also good, but entering ‘Perindopril, Allopurinol’ returns nothing.

If I change the line to:

searchterms = request.values.get('searchinput').replace(" ","").split(',')

I get TypeError: unhashable type: ‘list’ when the page reloads.

However – If I then change:

filtered = df.query('`Drug Name` in @searchterms | BP.str.contains(@searchterms, na=False)', engine='python')

to:

filtered = df.query('`Drug Name` in @searchterms')

Then the unhashable type error goes and entering ‘Perindopril, Allopurinol’ returns:

1   Perindopril   Every 12 months                   Not needed
14  Allopurinol   Whilst titrating – Every 3 months Not needed

But I’m now no longer searching the BP column for the searchterms.

Just thought that maybe its because searchterms is a list ‘[]’ changed it t oa tuple ‘()’ Didn’t change anything.

Any help is much appreciated.

Answer

I am assuming you want to query 2 columns and want to return the row if any of the query matches.

In this line, the issue is that engine=python is inside query.

filtered = df.query ("('`Drug Name` in @search') and ('BP.str.contains(@search, na=False)', engine='python')")

It should be

df.query("BP.str.contains(@search, na=False)", engine='python')

If you do searchterms = request.values.get('searchinput').replace(" ","").split(','), it converts your string to list of words which will cause Unhashable type list error because str.contains expects str as input.

What you can do is use regex to search for search terms in list, it will look something like this:

df.query("BP.str.contains('|'.join(@search), na=False, regex=True)", engine='python')

What this does is it searches for all the individual words using regex. ('|'.join(@search) will be “searchterm_1|search_term2|…” and “|” is used to represent or in regex, so it looks for searchterm_1 or searchterm_2 in BP column value)

To combine the outputs of both queries, you can run those separately and concatenate the results

pd.concat([df.query("`Drug Name` in @search", engine='python'),df.query("BP.str.contains('|'.join(@search), na=False, regex=True)", engine='python')])

Also any string based matching will require your strings to match perfectly, including case. so you can maybe lowercase everything in dataframe and query. Similarly for space separated words, this will remove spaces.

if you do searchterms = request.values.get('searchinput').replace(" ","").split(',') on Every 12 months, it will get converted to “Every12months”. so you can maybe remove the .replace() part and just use searchterms = request.values.get('searchinput').split(',')



Source: stackoverflow