Skip to content
Advertisement

Sorting rows by the number of list elements the row contains

Taking as example the following table:

index column_1 column_2
0 bli bli d e
1 bla bla a b c d e
2 ble ble a b c

If I give a token_list = ['c', 'e'] I want to order the table by the number of times the tokens each row contains in column number 2.

By ordering the table I should get the following:

index column_1 column_2 score_tmp
1 bla bla a b c d e 2
0 bli bli d e 1
2 ble ble a b c 1

Currently, I have reached the following way of doing this, but it is taking a lot of time. How could I improve the time? Thank you in advance.

JavaScript

Advertisement

Answer

You can split column_2 based on whitespaces, convert each row into a set and then use df.apply with set intersection with sort_values:

JavaScript

Timings:

JavaScript
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement