Pandas: DataFrame columns are not unique when making dictionary


I have a dataframe like this:

Name  Alt_01  Alt_02
AAPL  Apple   apple Inc.
AMZN  Amazon  NaN

To check whether a string contains any of the alternative names, I wrote code like this:

search_dict = df.set_index('Name').T.dropna().to_dict('list')
for key in search_dict:
    if any(name in query for name in search_dict[key]):
        match.append(key)

Since not all names have the same number of alternative names, I added dropna() to remove the NaN values.

But when I do this, I get a message like:

UserWarning: DataFrame columns are not unique, some columns will be omitted.

and it returns a dict with only the first alternative name, e.g. {'AAPL': ['Apple'], 'AMZN': ['Amazon']}

Is there a good way to solve this?

Answer

If I interpret your question correctly, you want a resulting dictionary that looks like this:

{'AAPL': ['Apple', 'apple Inc.'], 'AMZN': ['Amazon']}

If that is the case, the following code will work:

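import pandas as pd
temp = df.set_index('Name').T.to_dict('list')
# pd.notna is a more robust missing-value test than an identity check against np.nan
search_dict = {k: [elem for elem in v if pd.notna(elem)] for k, v in temp.items()}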
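
A note on the warning itself: as far as I know, pandas raises it from to_dict() when column labels repeat, and after the transpose the column labels are the Name values. So your real data probably contains a name more than once. The sketch below assumes such a case (the duplicated AAPL row is made up, it is not in your sample) and groups by Name first so that no alternative name is lost:

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data: AAPL appears twice, which the question's sample does not show.
df = pd.DataFrame({
    "Name": ["AAPL", "AAPL", "AMZN"],
    "Alt_01": ["Apple", "AAPL Inc.", "Amazon"],
    "Alt_02": ["apple Inc.", np.nan, np.nan],
})

# Transposing turns the Name values into column labels, so AAPL is duplicated
# and to_dict can keep only one of the two AAPL columns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lossy = df.set_index("Name").T.to_dict("list")

# Grouping by Name merges the duplicate rows; missing cells are skipped.
search_dict = {
    name: [alt for alt in group.drop(columns="Name").to_numpy().ravel() if pd.notna(alt)]
    for name, group in df.groupby("Name")
}
```

If your names are in fact unique, you can ignore this and use the two lines above as they are.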

The reason pandas' dropna() doesn't work here is that it can only drop a whole row or a whole column, never a single cell. On the transposed frame the default drops every row that contains a NaN, so the entire Alt_02 row goes and takes 'apple Inc.' with it. Dropping columns instead (axis=1) would remove AMZN entirely.
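
To see this on the sample data, here is a small sketch. The way df is built is my assumption about how your data is stored; the rest uses only dropna(), to_dict() and pd.notna():

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (construction is an assumption).
df = pd.DataFrame({
    "Name": ["AAPL", "AMZN"],
    "Alt_01": ["Apple", "Amazon"],
    "Alt_02": ["apple Inc.", np.nan],
})

transposed = df.set_index("Name").T

# Default dropna() removes every row with at least one NaN: the Alt_02 row goes.
by_row = transposed.dropna().to_dict("list")

# axis=1 removes every column with at least one NaN: the AMZN column goes.
by_col = transposed.dropna(axis=1).to_dict("list")

# Filtering cell by cell after to_dict keeps everything that is not missing.
search_dict = {
    name: [alt for alt in alts if pd.notna(alt)]
    for name, alts in transposed.to_dict("list").items()
}

print(by_row)       # {'AAPL': ['Apple'], 'AMZN': ['Amazon']}
print(by_col)       # {'AAPL': ['Apple', 'apple Inc.']}
print(search_dict)  # {'AAPL': ['Apple', 'apple Inc.'], 'AMZN': ['Amazon']}
```

The first result is exactly the truncated dict from the question, which is why the filtering has to happen per value rather than per row or column.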

In case the way search_dict is created is unfamiliar to you: it combines a dictionary comprehension with a list comprehension. For more info see: https://realpython.com/list-comprehension-python/
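
If it helps, the same logic can be written as two plain nested loops. This is only an equivalent sketch: temp is written out by hand here as a stand-in for the result of to_dict('list'), and kept is a name I made up for the intermediate list:

```python
import pandas as pd

# Hand-written stand-in for df.set_index('Name').T.to_dict('list').
temp = {"AAPL": ["Apple", "apple Inc."], "AMZN": ["Amazon", float("nan")]}

search_dict = {}
for k, v in temp.items():       # outer part: the dictionary comprehension
    kept = []
    for elem in v:              # inner part: the list comprehension
        if pd.notna(elem):      # skip missing values
            kept.append(elem)
    search_dict[k] = kept

print(search_dict)  # {'AAPL': ['Apple', 'apple Inc.'], 'AMZN': ['Amazon']}
```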



Source: stackoverflow