How to slice/chop a string using multiple indexes in a pandas DataFrame

I need some advice on the following problem. I have a DataFrame that looks like this:

What I need is the pieces of SEQ that are separated by the gaps each BEG_GAP/END_GAP pair defines. I have already worked this out (thanks to a previous question) for sequences with only one BEG_GAP/END_GAP pair, but here some sequences have several.
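
A minimal sketch of the slicing logic for a single sequence, assuming BEG_GAP and END_GAP hold equal-length lists of 0-based positions and that END_GAP is inclusive (the last character of the gap). Those conventions are assumptions, since the original data is not shown; shift the offsets if your positions are 1-based. The helper name split_outside_gaps and the toy string are hypothetical.

```python
from typing import List, Sequence


def split_outside_gaps(seq: str, beg_gaps: Sequence[int], end_gaps: Sequence[int]) -> List[str]:
    """Return the pieces of seq that lie outside every gap.

    Assumption: positions are 0-based and gap ends are inclusive,
    so gap i covers seq[beg_gaps[i] : end_gaps[i] + 1].
    """
    pieces = []
    cursor = 0  # start of the current non-gap stretch
    # Sort the gaps so they are walked left to right, whatever the input order.
    for beg, end in sorted(zip(beg_gaps, end_gaps)):
        if beg > cursor:
            pieces.append(seq[cursor:beg])  # stretch before this gap
        cursor = max(cursor, end + 1)  # jump past the gap (also tolerates overlaps)
    if cursor < len(seq):
        pieces.append(seq[cursor:])  # tail after the last gap
    return pieces


# Hypothetical toy sequence with gaps at 4-5 and 9-11 (shown as '-').
print(split_outside_gaps("AAAA--BBB---CC", [4, 9], [5, 11]))
```

Because every gap is considered before a piece is emitted, no returned piece can still contain another gap.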

This is what the sequences should look like:

Or, as an exploded DataFrame:

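
Once each row holds a list of pieces, pandas' DataFrame.explode turns that list column into one row per piece, which is the exploded layout. A minimal sketch on hypothetical toy data; the ID and PIECES column names are assumptions, not the asker's columns.

```python
import pandas as pd

# Hypothetical toy data: PIECES already holds the list of non-gap slices per row.
df = pd.DataFrame({
    "ID": ["s1", "s2"],
    "PIECES": [["AAAA", "BBB", "CC"], ["XYZ"]],
})

# explode() repeats the other columns once per list element;
# reset_index(drop=True) replaces the repeated index with a fresh 0..n-1 one.
exploded = df.explode("PIECES").reset_index(drop=True)
print(exploded)
```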

At the moment I’m using a piece of code (also from a previous question) that only works when there is a single gap. It looks like this:

The problem is that it generates a bunch of sequences that don’t actually exist, because they still contain another gap in the middle. For example, this is what it would generate:

And so on for the other sequences. As you can see, some slices are not generated at all and some are wrong, because I don’t know how to make the code take all the gaps into account when slicing the sequence.
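
To see why handling one gap at a time goes wrong, here is a small sketch on a hypothetical string with two gaps (0-based positions with inclusive ends, assumed). Cutting before and after each gap independently, as a one-gap approach does, leaves the other gap inside the resulting pieces.

```python
seq = "AAAA--BBB---CC"    # hypothetical sequence, gaps shown as '-'
gaps = [(4, 5), (9, 11)]  # (BEG_GAP, END_GAP) pairs, assumed 0-based and inclusive

# Per-gap slicing: each gap is treated as if it were the only one.
naive = []
for beg, end in gaps:
    naive.append(seq[:beg])      # everything before this gap
    naive.append(seq[end + 1:])  # everything after this gap

# Pieces that still contain (part of) another gap.
wrong = [piece for piece in naive if "-" in piece]
print(naive)
print(wrong)
```

Two of the four pieces still span a gap, and the middle stretch "BBB" is never produced on its own, which matches the symptoms described above.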

All advice is appreciated; I hope I was clear!

Answer

Let’s try defining a function and applying it to each row:

Output:

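
A sketch of the define-a-function-and-apply approach, not the answer's original code. It assumes SEQ is a string and BEG_GAP/END_GAP hold equal-length lists of 0-based, inclusive, non-overlapping gap positions; the helper cut_row, the output column SEQ_PARTS, and the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; adjust the offsets if the real positions are 1-based.
df = pd.DataFrame({
    "SEQ": ["AAAA--BBB---CC", "XX-YY"],
    "BEG_GAP": [[4, 9], [2]],
    "END_GAP": [[5, 11], [2]],
})


def cut_row(row: pd.Series) -> list:
    """Slice row['SEQ'] at every gap of the row (gaps assumed not to overlap)."""
    gaps = sorted(zip(row["BEG_GAP"], row["END_GAP"]))    # left-to-right order
    starts = [0] + [end + 1 for _, end in gaps]           # a piece begins after each gap
    stops = [beg for beg, _ in gaps] + [len(row["SEQ"])]  # and stops at the next gap
    # Skip empty slices (e.g. a gap at the very start or end of SEQ).
    return [row["SEQ"][a:b] for a, b in zip(starts, stops) if a < b]


# axis=1 passes each row to cut_row; list results come back as a Series of lists.
df["SEQ_PARTS"] = df.apply(cut_row, axis=1)
print(df)
```

The SEQ_PARTS list column can then be turned into one row per piece with DataFrame.explode if the exploded layout is needed.
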
User contributions licensed under: CC BY-SA