I am trying to compare values (0s and 1s) in an array. For each “1” that appears in one column, I want to search for another “1” in the other column within a specific timeframe (for example, 5 seconds, 10 seconds, etc.). I will call the 1s “signals”.
For example, I have an array such as:
data1 = [[ 0 0 0]
 [ 1 0 0]
 [ 2 0 0]
 [ 3 0 0]
 [ 4 0 0]
 [ 5 0 0]
 [ 6 0 1]
 [ 7 0 0]
 [ 8 0 0]
 [ 9 0 0]
 [10 1 0]
 [11 0 0]
 [12 0 0]
 [13 0 0]
 [14 0 0]
 [15 0 0]
 [16 0 0]
 [17 0 0]
 [18 0 0]
 [19 0 0]
 [20 0 1]
 [21 0 0]
 [22 0 0]
 [23 0 0]
 [24 0 0]
 [25 0 0]]
This is much smaller than the data I have, but the idea is this: the first column holds the timestamps, and the second and third columns hold the signals. What I would like to do is calculate the proportion of signals that occur in the same time interval as at least one other signal (in the other column). I would like to do this for multiple timeframes, such as 5 seconds, 10 seconds, etc., to see the differences.
I tried looping over the array with a for loop and was able to find the signals. However, I was unable to write the condition that checks whether a signal in the other column falls within a certain timeframe.
Hope I was clear. Thank you!
Answer
I have a working solution, though I’m sure there are more efficient ones. I have abbreviated data to d, which I am assuming is a NumPy array.
import numpy as np

# Get all signal columns from the array. d1 = middle column, d2 = last column.
d1 = d[:, 1]
d2 = d[:, 2]

# final_d is 1 wherever either signal column has a 1 in that row,
# i.e. wherever there is a signal in any column.
final_d = np.logical_or(d1, d2).astype(int)
length = final_d.shape[0]

# flags is in int form for now: 0 means False, 1 means True. All flags start out False.
flags = np.zeros(length, dtype=int)

# What range you want to work within, e.g. 5 seconds, 10 seconds, etc.
time_range = 5

# This loop visits every subgroup (window) of time_range consecutive rows.
# It does not go all the way to len(final_d) because there are not that many full windows.
for i in range(length - time_range + 1):
    # Get each window, then the row indices (in the full array) of the signals inside it.
    this_range = final_d[i:i + time_range]
    indices_of_signals = np.flatnonzero(this_range == 1) + i
    # There is more than one signal in the window if the sum of the signals is at least 2.
    # If this is the case, set the flag for every signal within this_range to 1.
    if np.sum(this_range) >= 2:
        flags[indices_of_signals] = 1

# Change flags from int form to boolean (True/False) form.
flags = flags.astype(bool)
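The flags above can be turned into the proportion asked for in the question. The following is only a rough sketch: `flag_signals` is a hypothetical helper that wraps the same sliding-window loop, and the sample array is rebuilt from the question's data (signals at rows 6, 10, and 20, one row per second). Like the code above, it merges both signal columns and does not check which column a signal came from.

```python
import numpy as np

def flag_signals(signal, time_range):
    """Hypothetical helper: same sliding-window idea as the loop above."""
    signal = np.asarray(signal).astype(int)
    flags = np.zeros(signal.shape[0], dtype=bool)
    for i in range(signal.shape[0] - time_range + 1):
        window = signal[i:i + time_range]
        # Flag every signal in a window that holds at least two signals.
        if window.sum() >= 2:
            flags[i + np.flatnonzero(window)] = True
    return flags

# Combined signal column for the question's sample data (assumed one row per second).
final_d = np.zeros(26, dtype=int)
final_d[[6, 10, 20]] = 1

# Proportion of signals that share a window with another signal, per time range.
proportions = {}
for time_range in (5, 10, 15):
    flags = flag_signals(final_d, time_range)
    proportions[time_range] = flags[final_d == 1].mean()
    print(time_range, proportions[time_range])
```

Indexing `flags` with `final_d == 1` restricts the mean to rows that actually contain a signal, so rows without a signal do not dilute the proportion.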
I would like to note that the reason I did not use chunking (i.e. considering chunks 0-4, 5-9, 10-14, etc.) is that with chunking, signals in rows 4 and 7 would not be in the same 5-second chunk, even though they are within 5 seconds of each other. My method returns a True flag for a signal if any other signal falls in the same window of time_range consecutive rows, i.e. fewer than time_range rows away in either direction (assuming one row per second, as in the sample data).
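If the goal is specifically to match each signal against the other column, as the question describes, one possible alternative is to compare timestamps directly instead of sliding a window over rows. This is a minimal sketch, not the method above: `cross_column_proportion` is a hypothetical helper, the window is assumed to be inclusive (`+/- window` seconds), and the timestamps are assumed to be sorted in ascending order.

```python
import numpy as np

def cross_column_proportion(t, a, b, window):
    """Fraction of signals in `a` with at least one signal in `b` within +/- window.

    Hypothetical helper; assumes the timestamps `t` are sorted ascending.
    """
    t_a = t[a == 1]
    t_b = t[b == 1]
    if t_a.size == 0:
        return float("nan")
    # For each a-signal, locate the b-signals with timestamps in
    # [t_a - window, t_a + window]; hi > lo means at least one exists.
    lo = np.searchsorted(t_b, t_a - window, side="left")
    hi = np.searchsorted(t_b, t_a + window, side="right")
    return float(np.mean(hi > lo))

# Sample data from the question: column 0 = timestamps, columns 1 and 2 = signals.
d = np.zeros((26, 3), dtype=int)
d[:, 0] = np.arange(26)
d[10, 1] = 1
d[[6, 20], 2] = 1

results = {}
for window in (3, 5, 10):
    results[window] = (
        cross_column_proportion(d[:, 0], d[:, 1], d[:, 2], window),  # column 1 vs column 2
        cross_column_proportion(d[:, 0], d[:, 2], d[:, 1], window),  # column 2 vs column 1
    )
    print(window, results[window])
```

Because it works on timestamps rather than row positions, this sketch should also cope with unevenly spaced rows, and it avoids the Python-level loop over windows.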