Skip to content
Advertisement

get the number of involved singer in a phase

I have a dataset like this

JavaScript
name date singer language phase
1 Yes or No 02.01.20 Benjamin Smith en 1
2 Parabens 01.06.21 Rafael Galvao;Simon Murphy pt;en 2
3 Love 12.11.20 Michaela Condell en 1
4 Paz 11.07.19 Ana Perez; Eduarda Pinto es;pt 3
5 Stop 12.01.21 Michael Conway;Gabriel Lee en;en 1
6 Shalom 18.06.21 Shimon Cohen hebr 1
7 Habibi 22.12.19 Fuad Khoury ar 3
8 viva 01.08.21 Veronica Barnes en 1
9 Buznanna 23.09.20 Kurt Azzopardi mt 1
10 Frieden 21.05.21 Gabriel Meier dt 1
11 Uruguay 11.04.21 Julio Ramirez es 1
12 Beautiful 17.03.21 Cameron Armstrong en 3
13 Holiday 19.06.20 Bianca Watson en 3
14 Kiwi 21.10.20 Lachlan McNamara en 1
15 Amore 01.12.20 Vasco Grimaldi it 1
16 La vie 28.04.20 Victor Dubois fr 3
17 Yom 21.02.20 Ori Azerad; Naeem al-Hindi hebr;ar 2
18 Elefthería 15.06.19 Nikolaos Gekas gr 1

I convert it to 1NF.

JavaScript

And I create a dataframe. Now I would like to find out which phase has the most singers involved.

I used this

JavaScript

But I could not get a solution

The dataframe has 42 observations, so some singers occur again

Source: convert data to 1NF

Advertisement

Answer

You do not need to split/explode, you can directly count the number of ; per row and add 1:

JavaScript

If you want the classical split/explode:

JavaScript

output:

JavaScript
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement