I have a dataframe:
c1  c2  c3  l
1   2   3   [1,2,3,4,5,6,7]
3   4   8   [8,9,0]
I want to explode it so that every 3 elements from each list in column l become a new row, with an extra column holding the triplet's index within the original list. So I would get:
c1  c2  c3  l        idx
1   2   3   [1,2,3]  0
1   2   3   [4,5,6]  1
3   4   8   [8,9,0]  0
What is the best way to do so?
Answer
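Break each list into chunks of 3 first, then explode. Trailing elements that do not fill a complete chunk (the 7 in the first row) are dropped, which matches the expected output: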
df.l = df.l.apply(lambda lst: [lst[3*i:3*(i+1)] for i in range(len(lst) // 3)])
df
#   c1  c2  c3                       l
#0   1   2   3  [[1, 2, 3], [4, 5, 6]]
#1   3   4   8             [[8, 9, 0]]

df.explode('l')
#   c1  c2  c3          l
#0   1   2   3  [1, 2, 3]
#0   1   2   3  [4, 5, 6]
#1   3   4   8  [8, 9, 0]
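To look at the chunking step on its own, here is a minimal sketch in plain Python. The helper name `chunk_list` and its `keep_remainder` flag are hypothetical (they are not part of pandas); with the defaults it should behave like the lambda above and drop a trailing partial chunk.

```python
def chunk_list(lst, size=3, keep_remainder=False):
    """Split lst into consecutive chunks of length `size`.

    Hypothetical helper for illustration only. By default a trailing
    partial chunk is dropped, like range(len(lst) // 3) in the lambda.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Stop before the leftover elements unless the caller wants to keep them.
    stop = len(lst) if keep_remainder else len(lst) - len(lst) % size
    return [lst[i:i + size] for i in range(0, stop, size)]


print(chunk_list([1, 2, 3, 4, 5, 6, 7]))
# [[1, 2, 3], [4, 5, 6]]
print(chunk_list([1, 2, 3, 4, 5, 6, 7], keep_remainder=True))
# [[1, 2, 3], [4, 5, 6], [7]]
```

Passing `keep_remainder=True` is only useful if a short final row is acceptable; the expected output in the question does not include one.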
If you need the index column:
# store index as second element of the tuple
df.l = df.l.apply(lambda lst: [(lst[3*i:3*(i+1)], i) for i in range(len(lst) // 3)])
df
#   c1  c2  c3                                 l
#0   1   2   3  [([1, 2, 3], 0), ([4, 5, 6], 1)]
#1   3   4   8                  [([8, 9, 0], 0)]

df = df.explode('l')
df
#   c1  c2  c3               l
#0   1   2   3  ([1, 2, 3], 0)
#0   1   2   3  ([4, 5, 6], 1)
#1   3   4   8  ([8, 9, 0], 0)

# extract list and index from the tuple column
df['l'], df['idx'] = df.l.str[0], df.l.str[1]
df
#   c1  c2  c3          l  idx
#0   1   2   3  [1, 2, 3]    0
#0   1   2   3  [4, 5, 6]    1
#1   3   4   8  [8, 9, 0]    0
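As an alternative to carrying the index inside a tuple, the index can be derived after exploding. The sketch below is a variation on the answer, not the answer's own method: it rebuilds the question's dataframe, wraps the steps in a hypothetical function `explode_chunks`, and uses `groupby(level=0).cumcount()` to number the chunks that came from the same original row. The final `reset_index(drop=True)` is an assumption about what is wanted; omit it to keep the repeated original index labels.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "c1": [1, 3],
    "c2": [2, 4],
    "c3": [3, 8],
    "l": [[1, 2, 3, 4, 5, 6, 7], [8, 9, 0]],
})


def explode_chunks(frame, col="l", size=3):
    """Hypothetical helper: one row per full chunk of `size` items, plus idx."""
    out = frame.copy()
    # Chunk each list; an incomplete trailing chunk is dropped.
    out[col] = out[col].apply(
        lambda lst: [lst[i:i + size]
                     for i in range(0, len(lst) - len(lst) % size, size)]
    )
    out = out.explode(col)
    # Rows exploded from the same original row share an index label, so a
    # running count within each label is the chunk's position in the list.
    out["idx"] = out.groupby(level=0).cumcount()
    return out.reset_index(drop=True)


result = explode_chunks(df)
print(result)
```

One caveat with this sketch: a list shorter than `size` chunks to an empty list, which `explode` turns into a single NaN row, so a `dropna(subset=[col])` may be needed if such rows can occur.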