I currently have a pandas DataFrame with some columns. I’m looking to build a column, Sequential, that lists what iteration is recorded at that part of the cycle. I’m currently doing this using itertools.cycle and a fixed number of iterations, block_cycles, like so:
    import itertools

    # Fill out Sequential Numbers
    block_cycles = 330
    lens = len(raw_data.index)
    # range(1, block_cycles + 1) yields 1..block_cycles inclusive
    sequential = list(itertools.islice(itertools.cycle(range(1, block_cycles + 1)), lens))
    interim_output['Sequential'] = sequential
With an output like this:
    print(interim_output['Sequential'])
    0        1
    1        2
    2        3
    ...
    329    330
    330      1
    331      2
    332      3
And this would be fine if every cycle contained the same number of iterations. However, upon investigation, I’ve found that this is not the case. I have another column, CycleNumber, that records which cycle each iteration belongs to. It looks like this:
    print(raw_data['CycleNumber'])
    0           1
    1           1
    2           1
    3           1
    4           1
    ...
    51790    4936
    51791    4936
    51792    4936
    51793    4936
    51794    4936
So, for example, one cycle might contain 330 iterations, and another could contain 333, 331, and so forth – it’s not guaranteed to be the same. The values in cycle number increase incrementally.
I’ve built a dictionary of the number of iterations each cycle contains, cycle_freq, which looks like this:
    # Calculate the number of iterations each cycle contains
    cycle_freq = {}
    for item in cycle_number:
        if item in cycle_freq:
            cycle_freq[item] += 1
        else:
            cycle_freq[item] = 1
    print(cycle_freq)

    {1: 330, 2: 332, 3: 331, 4: 332, 5: 332, 6: 333, 7: 333, 8: 330.... 4933: 331, 4934: 334, 4935: 287, 4936: 24}
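As an aside, the same tally could probably be written more compactly with collections.Counter. This is only a minimal sketch: the short cycle_number list below is made-up stand-in data, not my real column.

```python
from collections import Counter

# Hypothetical stand-in for the real CycleNumber values
cycle_number = [1, 1, 1, 2, 2, 3, 3, 3, 3]

# Counter tallies how many times each cycle number occurs;
# dict() turns the result back into a plain dictionary
cycle_freq = dict(Counter(cycle_number))
print(cycle_freq)  # {1: 3, 2: 2, 3: 4}
```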
How could I go about using this dictionary to replace the constant variable block_cycles, creating a big column list of sequential numbers based on exactly how many iterations were in that cycle? So far, this is my logic to try to get it to use the values contained in the dictionary cycle_freq, but to no avail:
    for i in cycle_freq:
        iteration = list(itertools.islice(itertools.cycle(range(1, cycle_freq[i])), lens))
        sequential.append(iteration)
My desired output would look like this:
    0        1
    1        2
    2        3
    ...
    329    330
    330      1
    331      2
    ...
    661    332
    662      1
    663      2
Any help would be greatly appreciated!
Answer
I used a workaround and gave up on itertools:
    sequential = []
    # Append 1..n for each cycle, in the dictionary's insertion order
    for cycles in cycle_freq.values():
        sequential.extend(range(1, cycles + 1))
    interim_output['Sequential'] = sequential
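Another option that might be worth trying skips the dictionary entirely and lets pandas number the rows within each cycle via groupby(...).cumcount(). This is a minimal sketch under assumptions: the small raw_data frame below is made-up stand-in data, and only the CycleNumber column name comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for raw_data: three cycles of unequal length
raw_data = pd.DataFrame({"CycleNumber": [1, 1, 1, 2, 2, 3, 3, 3, 3]})

# cumcount() numbers the rows within each group starting at 0,
# so adding 1 gives a counter that restarts at 1 for every cycle
sequential = raw_data.groupby("CycleNumber").cumcount() + 1

print(sequential.tolist())  # [1, 2, 3, 1, 2, 1, 2, 3, 4]
```

The result is a Series that keeps the index of raw_data, so assigning it to interim_output['Sequential'] should line up as long as both frames share that index. Unlike the loop above, it does not depend on the order of the dictionary's keys.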