I'm reading from a sqlite3 db into a df: Then I have a symbols.csv file which I want to use to filter the above df: Here's how I've tried to do it: But for some reason I only get the first line returned from the csv: Where am I going wrong here? Answer You need convert csv file to Series,

isin only returning first line from csv

I’m reading from a sqlite3 db into a df:

          id symbol                                               name
0          1   QCLR  Global X Funds Global X NASDAQ 100 Collar 95-1...
1          2    LCW                    Learn CW Investment Corporation
2          3    BUG          Global X Funds Global X Cybersecurity ETF
3          4   LDOS                              Leidos Holdings, Inc.
4          5    LDP  COHEN & STEERS LIMITED DURATION PREFERRED AND ...
...      ...    ...                                                ...
10999  11000   ERIC                Ericsson American Depositary Shares
11000  11001    EDI  Virtus Stone Harbor Emerging Markets Total Inc...
11001  11002    EVX                  VanEck Environmental Services ETF
11002  11003   QCLN  First Trust NASDAQ Clean Edge Green Energy Ind...
11003  11004    DTB  DTE Energy Company 2020 Series G 4.375% Junior...

[11004 rows x 3 columns]

JavaScript
​x
 
          id symbol                                               name
0          1   QCLR  Global X Funds Global X NASDAQ 100 Collar 95-1...
1          2    LCW                    Learn CW Investment Corporation
2          3    BUG          Global X Funds Global X Cybersecurity ETF
3          4   LDOS                              Leidos Holdings, Inc.
4          5    LDP  COHEN & STEERS LIMITED DURATION PREFERRED AND ...
...      ...    ...                                                ...
10999  11000   ERIC                Ericsson American Depositary Shares
11000  11001    EDI  Virtus Stone Harbor Emerging Markets Total Inc...
11001  11002    EVX                  VanEck Environmental Services ETF
11002  11003   QCLN  First Trust NASDAQ Clean Edge Green Energy Ind...
11003  11004    DTB  DTE Energy Company 2020 Series G 4.375% Junior...
​
[11004 rows x 3 columns]
​

Then I have a symbols.csv file which I want to use to filter the above df:

AKAM
AKRO

JavaScript
 
AKAM
AKRO
​

Here’s how I’ve tried to do it:

origin_symbols = pd.read_sql_query("SELECT  id, symbol, name from stock", conn)
mikey_symbols = pd.read_csv("symbols.csv")
df = origin_symbols[origin_symbols['symbol'].isin(mikey_symbols)]

JavaScript
 
origin_symbols = pd.read_sql_query("SELECT  id, symbol, name from stock", conn)
mikey_symbols = pd.read_csv("symbols.csv")
df = origin_symbols[origin_symbols['symbol'].isin(mikey_symbols)]
​

But for some reason I only get the first line returned from the csv:

        id symbol                                    name
6475  6476   AKAM  Akamai Technologies, Inc. Common Stock
1 df

JavaScript
 
        id symbol                                    name
6475  6476   AKAM  Akamai Technologies, Inc. Common Stock
1 df
​

Where am I going wrong here?

Answer

You need convert csv file to Series, here is added column name and for Series select it (e.g. by position):

mikey_symbols = pd.read_csv("symbols.csv", names=['tmp']).iloc[:, 0]
#or by column name
#mikey_symbols = pd.read_csv("symbols.csv", names=['tmp'])['tmp']

JavaScript
 
mikey_symbols = pd.read_csv("symbols.csv", names=['tmp']).iloc[:, 0]
#or by column name
#mikey_symbols = pd.read_csv("symbols.csv", names=['tmp'])['tmp']
​

And then remove possible traling spaces in both by Series.str.strip:

df = origin_symbols[origin_symbols['symbol'].str.strip().isin(mikey_symbols.str.strip())]

JavaScript
 
df = origin_symbols[origin_symbols['symbol'].str.strip().isin(mikey_symbols.str.strip())]
​

Advertisement

Answer