Starting with two lists such as:
    lstOne = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
    lstTwo = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

I want the user to input how many items to extract, as a percentage of the overall list length, and have the same randomly chosen indices extracted from each list. For example, if I wanted 50%, the output would be:

    newLstOne = ['8', '1', '3', '7', '5']
    newLstTwo = ['8', '1', '3', '7', '5']
I have achieved this using the following code:
    from random import randrange

    lstOne = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
    lstTwo = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

    LengthOfList = len(lstOne)
    print LengthOfList

    PercentageToUse = input("What Percentage Of Reads Do you want to extract? ")
    RangeOfListIndices = []
    HowManyIndicesToMake = (float(PercentageToUse)/100)*float(LengthOfList)
    print HowManyIndicesToMake

    for x in lstOne:
        if len(RangeOfListIndices)==int(HowManyIndicesToMake):
            break
        else:
            random_index = randrange(0,LengthOfList)
            RangeOfListIndices.append(random_index)

    print RangeOfListIndices

    newlstOne = []
    newlstTwo = []
    for x in RangeOfListIndices:
        newlstOne.append(lstOne[int(x)])
    for x in RangeOfListIndices:
        newlstTwo.append(lstTwo[int(x)])

    print newlstOne
    print newlstTwo
But I was wondering if there is a more efficient way of doing this. In my actual use case, this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
Thank you
Answer
Q. I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted.
A. The most straightforward approach directly matches your specification:

    import random

    percentage = float(raw_input('What percentage? '))
    # Number of items to keep; sample() needs an integer count.
    k = int(len(list1) * percentage // 100)
    # Draw k distinct indices, then reuse them for both lists.
    indices = random.sample(xrange(len(list1)), k)
    new_list1 = [list1[i] for i in indices]
    new_list2 = [list2[i] for i in indices]
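For readers on Python 3, the same idea can be wrapped in a small function. This is only a sketch: the name subsample_pair and its rng parameter are made up for illustration, and it assumes both lists have the same length.

```python
import random


def subsample_pair(list1, list2, percentage, rng=random):
    """Return same-index random subsamples of two equal-length lists.

    Hypothetical helper for illustration; `rng` can be the random module
    or a random.Random instance (handy for reproducible runs).
    """
    if len(list1) != len(list2):
        raise ValueError("lists must have the same length")
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Number of items to keep, rounded down to a whole number.
    k = int(len(list1) * percentage // 100)
    # sample() draws k distinct indices, so no position is picked twice
    # (a loop of randrange() calls can return the same index repeatedly).
    indices = rng.sample(range(len(list1)), k)
    return [list1[i] for i in indices], [list2[i] for i in indices]


if __name__ == "__main__":
    lst_one = [str(n) for n in range(1, 11)]
    lst_two = [str(n) for n in range(1, 11)]
    new_one, new_two = subsample_pair(lst_one, lst_two, 50)
    print(new_one)
    print(new_two)
```

Because the indices are drawn once and reused, element i of the first result always comes from the same position as element i of the second. The work done grows with the number of items kept, so 145,000 items should not be a problem.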
Q. In my actual use case, this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
A. In Python 2 and Python 3, the random.randrange() function completely eliminates bias (it uses the internal _randbelow() method that makes multiple random choices until a bias-free result is found).
In Python 2, the random.sample() function is slightly biased but only in the round-off in the last of 53 bits. In Python 3, the random.sample() function uses the internal _randbelow() method and is bias-free.
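To make the "multiple random choices until a bias-free result is found" idea concrete, here is a simplified Python 3 sketch of rejection sampling. It is not a copy of the standard library's source; randbelow_sketch is a hypothetical name, and the only library calls it relies on are int.bit_length() and getrandbits().

```python
import random


def randbelow_sketch(n, rng=random):
    """Return a uniform random integer in [0, n) by rejection sampling.

    Illustrative only; in real code just call random.randrange(n).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()      # enough bits that 2**k > n
    r = rng.getrandbits(k)  # uniform over [0, 2**k)
    while r >= n:           # outside the wanted range: discard and redraw
        r = rng.getrandbits(k)
    return r


if __name__ == "__main__":
    # Tally 100,000 draws from [0, 10); the counts should come out close.
    counts = [0] * 10
    for _ in range(100000):
        counts[randbelow_sketch(10)] += 1
    print(counts)
```

Every accepted value is equally likely because out-of-range draws are thrown away rather than folded back into the range. Since 2**k is less than twice n, each draw is accepted more than half the time, so the loop needs fewer than two draws on average.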