Streamlit auto populate multiselect widgets to filter dataframe

I have a Streamlit app where a user can upload a CSV file. I would like Streamlit to detect the object (dimension) columns and create a multiselect filter for each one, populated with that column's unique values. For example, if the user uploads a file with 3 object columns, 3 separate multiselect filters should be created.

I came up with the code below, but it does not work and ends with the error shown underneath it. I assume the issue is creating each multiselect filter inside the loop, but I'm not sure how else to do this dynamically. I have also tried passing data[y].unique() directly in place of ucolumns, but that still does not work.

Any help would be great.

        for y in data.columns:
            if (data[y].dtype == np.object):
                ucolumns=list(data[y].unique())
                data[y+"filter"]=st.sidebar.multiselect('Filter '+y, ucolumns)
            else:
                pass

ValueError: Length of values (0) does not match length of index (97)
File "c:users###pycharmprojectspythonprojectvenvlibsite-packagesstreamlitscript_runner.py", line 332, in _run_script
    exec(code, module.__dict__)
File "C:Users###PycharmProjectspythonProjectstreamlittest.py", line 109, in <module>
    helper.run()
File "C:Users###PycharmProjectspythonProjectstreamlittest.py", line 66, in run
    data[y+"filter"]=st.sidebar.multiselect('Filter '+y, ucolumns)
File "c:users###pycharmprojectspythonprojectvenvlibsite-packagespandascoreframe.py", line 3163, in __setitem__
    self._set_item(key, value)
File "c:users###pycharmprojectspythonprojectvenvlibsite-packagespandascoreframe.py", line 3239, in _set_item
    value = self._sanitize_column(key, value)
File "c:users###pycharmprojectspythonprojectvenvlibsite-packagespandascoreframe.py", line 3896, in _sanitize_column
    value = sanitize_index(value, self.index)
File "c:users###pycharmprojectspythonprojectvenvlibsite-packagespandascoreinternalsconstruction.py", line 751, in sanitize_index
    raise ValueError(

Testing a few things, I wrapped the multiselect call in try/except and printed y to see which columns raise the exception. All of the dimension columns end up in the except branch, yet all the multiselects are created and seem to work. Could anyone explain what is going on? Here is the adjustment I made:

try:
    data[y+"filter"]=st.sidebar.multiselect('Filter '+y, ucolumns)
except:
    print(y+"had to pass")

Answer

The problem is not the loop or the widget itself. It happens when you assign the return value of st.sidebar.multiselect to a column of the pandas DataFrame.

That's why, when you let execution continue with the try/except block, the widget is still created even though the exception is raised.

Put another way, if you split the problematic line in two, you get the following:

sidebar = st.sidebar.multiselect('Filter '+y, ucolumns) # <-- This line is OK
data[y+"filter"] = sidebar # <-- This line fails

That line fails because data is a pandas DataFrame, so assigning to data[y+"filter"] sets a whole column. st.sidebar.multiselect returns the list of currently selected options, which is empty until the user picks something, and a list assigned to a column must contain exactly one value per row. That is what the rather cryptic error message says:

ValueError: Length of values (0) does not match length of index (97)

In other words, your DataFrame has 97 rows and you are assigning an empty list (length 0) to one of its columns.
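The same error can be reproduced without Streamlit. The snippet below is a minimal sketch: the 3-row frame and the column names are made up for illustration, and the empty list stands in for what the multiselect returns before anything is selected.

```python
import pandas as pd

# Hypothetical 3-row frame standing in for the uploaded CSV.
data = pd.DataFrame({"city": ["Oslo", "Rome", "Oslo"]})

# A scalar is broadcast to every row, so this assignment succeeds.
data["flag"] = "x"

# An empty list has length 0, which does not match the 3 rows,
# so pandas raises a ValueError like the one in the question.
try:
    data["cityfilter"] = []
    raised = False
except ValueError as err:
    raised = True
    print(err)
```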

When you wrap the line in a try/except block, the part that works (the call to st.sidebar.multiselect('Filter '+y, ucolumns)) is still executed, which is why the widgets do appear: the error happens right after each one is created.

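You can solve the problem by collecting the return values of the multiselects in a dictionary instead of in the DataFrame:

sidebars = {}
for y in data.columns:
    # np.object was removed in NumPy 1.24; the builtin object is equivalent here
    if data[y].dtype == object:
        ucolumns = list(data[y].unique())
        # multiselect returns the list of options the user has selected
        sidebars[y+"filter"] = st.sidebar.multiselect('Filter '+y, ucolumns)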
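Once the selections are held outside the DataFrame, they can be used to filter it. The following is only a sketch of one way to do that: apply_filters is a hypothetical helper, not part of Streamlit or pandas. It assumes the dictionary is keyed by the plain column name (rather than y+"filter") and that an empty selection means "do not filter on this column".

```python
import pandas as pd

def apply_filters(data, selections):
    """Keep the rows that match every non-empty selection.

    `selections` maps a column name to the list of chosen values,
    e.g. what st.sidebar.multiselect returned for that column.
    """
    mask = pd.Series(True, index=data.index)
    for column, chosen in selections.items():
        if chosen:  # an empty selection means "no filter on this column"
            mask &= data[column].isin(chosen)
    return data[mask]

# Hypothetical data and selections, standing in for the upload and the widgets.
data = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo"],
    "kind": ["a", "b", "b"],
    "n": [1, 2, 3],
})
filtered = apply_filters(data, {"city": ["Oslo"], "kind": []})
```

In the app, the result could then be displayed with something like st.dataframe(filtered).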


Source: stackoverflow