I have a list of start and stop coordinates for ranges and would like to fill a pandas DataFrame according to whether each row falls inside a range.
The number of rows is predetermined and every cell starts as ‘0’. If, for example, a column has the range 1,3, then rows (index) 1-3 are filled with ‘1’.
import pandas as pd

# Inclusive [start, stop] ranges for each column
d = {
    'a': [[0, 2], [3, 7], [13, 23], [24, 25]],
    'b': [[1, 5], [8, 12], [15, 18], [20, 24]],
}
presabsdict = {}

for G in d.keys():
    # One '0' per row; 50 rows in total
    refpositions = list('0' * 50)
    positions = d.get(G)
    for pos in positions:
        pos2 = pos[1]
        pos1 = pos[0]
        poslength = (pos2 - pos1)
        # Mark every row from pos1 to pos2 (inclusive) as present
        refpositions[pos1:(pos2 + 1)] = (list('1' * (poslength + 1)))
    presabsdict[G] = refpositions

df = pd.DataFrame.from_dict(presabsdict, orient='index').transpose()
# Number of columns in which each row is covered by a range
df["Sitespresent"] = df.astype(int).sum(axis=1).astype(int)
print(df)
This is hugely inefficient for large datasets. The ultimate goal is the 'Sitespresent'
column, so a solution that forgoes the DataFrame would also be suitable.
Answer
You can do something like this:
import pandas as pd

# `d` is the dictionary of ranges defined in the question
refpositions = pd.DataFrame({'pos': range(50)})
# All ranges from all keys, treated as closed on both ends
intervals = pd.arrays.IntervalArray([pd.Interval(start, end) for _, v in d.items() for start, end in v], closed='both')
# Each row position as a zero-width closed interval
pos_as_intv = [pd.Interval(i, i, closed='both') for i in refpositions.pos]

# Walk through overlaps and count
refpositions['total'] = [intervals.overlaps(x).sum() for x in pos_as_intv]
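Since the question says a solution without the DataFrame would also do, here is a minimal sketch of a different technique (not the approach above): a difference array followed by a running sum. It assumes the ranges are inclusive, that every stop is below the row count, and that ranges under the same key do not overlap each other, as in the question's data. The names `n_positions`, `diff` and `sites_present` are made up for illustration.

```python
from itertools import accumulate

# Same example ranges as in the question (inclusive start/stop pairs)
d = {
    'a': [[0, 2], [3, 7], [13, 23], [24, 25]],
    'b': [[1, 5], [8, 12], [15, 18], [20, 24]],
}
n_positions = 50  # assumed number of rows, as in the question

# Difference array: +1 where a range starts, -1 just past where it ends
diff = [0] * (n_positions + 1)
for ranges in d.values():
    for start, stop in ranges:
        diff[start] += 1
        diff[stop + 1] -= 1

# The running sum gives, for each position, how many ranges cover it
sites_present = list(accumulate(diff))[:n_positions]
print(sites_present)
```

Each range costs two updates regardless of its length, so the work grows with the number of ranges plus the number of rows rather than with the total covered length. If ranges under one key could overlap, this would count that key more than once at a position, so it would need a per-key pass instead.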