I have a dataframe like this:
Name | Alt_01 | Alt_02 |
---|---|---|
AAPL | Apple | apple Inc. |
AMZN | Amazon | NaN |
To check whether a query string contains any of the alt names, I built this code:

```python
search_dict = df.set_index('Name').T.dropna().to_dict('list')

match = []  # tickers whose alt names appear in query
for key in search_dict:
    if any(name in query for name in search_dict[key]):
        match.append(key)
```
Since not all names have the same number of alternative names, I added dropna() to remove the NaN values.
But after doing this, I get this warning:
UserWarning: DataFrame columns are not unique, some columns will be omitted.
and the returned dict contains only the first alt name, e.g. {'AAPL': ['Apple'], 'AMZN': ['Amazon']}
Is there a good way to solve this?
Answer
If I interpret your question correctly, you want a resulting dictionary that looks like this:
{'AAPL': ['Apple', 'apple Inc.'], 'AMZN': ['Amazon']}
If that is the case, the following code will work:
```python
import pandas as pd

temp = df.set_index('Name').T.to_dict('list')
# Keep only the non-missing alt names for each ticker
search_dict = {k: [elem for elem in v if pd.notna(elem)] for k, v in temp.items()}
```
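For completeness, here is a minimal end-to-end sketch. It rebuilds the sample frame from the question (assuming np.nan for the missing alt name), builds search_dict with the transpose-and-filter approach, and wraps the matching loop from the question in a hypothetical helper, find_matches. The filter uses pd.notna(), which treats None and any float NaN as missing instead of relying on identity with the np.nan object.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; np.nan marks the missing alt name
df = pd.DataFrame({
    "Name": ["AAPL", "AMZN"],
    "Alt_01": ["Apple", "Amazon"],
    "Alt_02": ["apple Inc.", np.nan],
})

# {ticker: [alt names]}, with missing values removed per ticker
temp = df.set_index("Name").T.to_dict("list")
search_dict = {k: [elem for elem in v if pd.notna(elem)] for k, v in temp.items()}


def find_matches(query, search_dict):
    """Hypothetical helper: tickers whose alt names occur as substrings of query."""
    return [key for key, names in search_dict.items()
            if any(name in query for name in names)]


print(search_dict)
print(find_matches("Shares of Amazon rose today", search_dict))
```

As in the question's loop, the substring test is case-sensitive, so "Apple" and "apple Inc." are checked separately.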
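The reason pandas' dropna() doesn't work here is that it always removes whole rows or whole columns: either an entire column (in your example Alt_02, so 'apple Inc.' would be deleted as well) or an entire row (in your example the whole AMZN row would be deleted). In your code, dropna() runs on the transposed frame, where Alt_02 is a row containing a NaN, so the default axis=0 drops that row and only the first alt name survives.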
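The behaviour described above can be checked directly on the sample data. This sketch (the frame is rebuilt from the question, with np.nan assumed for the missing value) shows the two whole-frame dropna() variants and, for contrast, dropna() applied to each row separately via iterrows(), which keeps every non-missing name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["AAPL", "AMZN"],
    "Alt_01": ["Apple", "Amazon"],
    "Alt_02": ["apple Inc.", np.nan],
})
t = df.set_index("Name").T  # rows: Alt_01, Alt_02; columns: AAPL, AMZN

# Default axis=0: the Alt_02 row contains a NaN, so the whole row goes,
# taking 'apple Inc.' with it
print(t.dropna().to_dict("list"))        # {'AAPL': ['Apple'], 'AMZN': ['Amazon']}

# axis=1: every column with a NaN goes, i.e. the whole AMZN entry
print(t.dropna(axis=1).to_dict("list"))  # {'AAPL': ['Apple', 'apple Inc.']}

# dropna() on each row separately keeps every non-missing name
per_row = {name: row.dropna().tolist()
           for name, row in df.set_index("Name").iterrows()}
print(per_row)  # {'AAPL': ['Apple', 'apple Inc.'], 'AMZN': ['Amazon']}
```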
In case the way search_dict is created is unfamiliar: it is a dictionary comprehension with a list comprehension inside it. For more info see: https://realpython.com/list-comprehension-python/
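The UserWarning in the question is what DataFrame.to_dict() emits when the frame has duplicate column labels. After .T the Name values become the column labels, so the warning suggests the real data repeats some tickers. The sketch below avoids the transpose altogether. It assumes (the question does not say) that alt names from repeated rows of the same ticker should be merged. It reshapes to long format with melt(), drops only the missing cells, and collects the names per ticker with groupby(). The data is hypothetical, with AAPL listed twice.

```python
import numpy as np
import pandas as pd

# Hypothetical data: AAPL appears in two rows
df = pd.DataFrame({
    "Name": ["AAPL", "AMZN", "AAPL"],
    "Alt_01": ["Apple", "Amazon", "Apple Computer"],
    "Alt_02": ["apple Inc.", np.nan, np.nan],
})

# Long format: one (Name, variable, value) row per cell, column by column
long_df = df.melt(id_vars="Name")

search_dict = (
    long_df.dropna(subset=["value"])        # drop only the missing cells
           .groupby("Name", sort=False)["value"]
           .agg(list)                        # merge all names for a ticker
           .to_dict()
)
print(search_dict)
```

Because melt() walks the columns in order, each ticker's list holds its Alt_01 values first, then its Alt_02 values.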