Sort dataframe by substring condition excluding similar strings

I have a dataframe with a string-type column named ‘tag’.

The ‘tag’ column has three categories (data_types):

DATA
DATAKIND
DATAKINDSIM

To count the number of rows for each data_type in the ‘tag’ column, I apply a string-contains condition like this:


But, obviously, the count for the tag ‘DATA’ includes the real ‘DATA’ rows as well as the ‘DATAKIND’ and ‘DATAKINDSIM’ rows; likewise, the count for ‘DATAKIND’ also includes the ‘DATAKINDSIM’ rows. How can I exclude the similar strings in the column when counting ‘DATA’?
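To make the overcounting concrete, here is a minimal sketch. The dataframe below is hypothetical (the values and counts are made up for illustration, not taken from my real data), and it assumes each row of ‘tag’ holds exactly one of the three category names. It compares a substring count with an exact-match count:

```python
import pandas as pd

# Hypothetical data: values and counts are illustrative only.
df = pd.DataFrame(
    {"tag": ["DATA", "DATA", "DATAKIND", "DATAKINDSIM", "DATAKIND", "DATA"]}
)

categories = ["DATA", "DATAKIND", "DATAKINDSIM"]

# Substring matching: "DATA" also matches "DATAKIND" and "DATAKINDSIM",
# so the shorter names absorb the rows of the longer ones.
substring_counts = {
    c: int(df["tag"].str.contains(c, regex=False).sum()) for c in categories
}

# Exact matching: each row is counted only under its own tag.
exact_counts = {c: int((df["tag"] == c).sum()) for c in categories}

print(substring_counts)  # {'DATA': 6, 'DATAKIND': 3, 'DATAKINDSIM': 1}
print(exact_counts)      # {'DATA': 3, 'DATAKIND': 2, 'DATAKINDSIM': 1}
```

With substring matching, ‘DATA’ picks up all six rows, while the exact comparison gives the per-category counts I am after.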

This is a reproducible example:


And the output:


This would be the expected output if the count excluded the similar strings and only counted exact string matches:

Answer

If I understand you correctly, you can use isin to first filter your tag column, then use groupby.size:
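A minimal sketch of that idea, assuming a hypothetical dataframe `df` whose ‘tag’ values are exact category names (the sample rows, including the extra ‘OTHER’ tag, are made up for illustration):

```python
import pandas as pd

# Hypothetical data; "OTHER" is included only to show the effect of the filter.
df = pd.DataFrame(
    {"tag": ["DATA", "DATA", "DATAKIND", "DATAKINDSIM", "DATAKIND", "DATA", "OTHER"]}
)

wanted = ["DATA", "DATAKIND", "DATAKINDSIM"]

# isin compares whole values, so "DATA" does not match "DATAKIND".
filtered = df[df["tag"].isin(wanted)]

# groupby.size returns the number of rows per distinct tag.
counts = filtered.groupby("tag").size()
print(counts)
```

Because isin tests whole values rather than substrings, each row lands in exactly one group. If no filtering is needed, `df["tag"].value_counts()` should give the same per-tag counts for every value in the column.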

User contributions licensed under: CC BY-SA