I have the following data structure (a list of lists):
[
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]
I would like to be able to:
Use a function to reorder the list so that I can group by any column. For example, I’d like to be able to group by the second column (so that all the 21s are together).
Use a function to display only certain inner lists. For example, I’d like to reduce this list to contain only the inner lists whose 4th field is ‘2somename’,
so that the list would look like this:
[
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]
Answer
For the first question, the first thing you should do is sort the list by the second field using itemgetter from the operator module:
from operator import itemgetter

x = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

# Sort in place by the second field (index 1)
x.sort(key=itemgetter(1))
Then you can use itertools’ groupby function, which groups only consecutive items that share a key (this is why the list is sorted first):
from itertools import groupby

y = groupby(x, itemgetter(1))
Now y is an iterator that yields tuples of (key, iterator over the items with that key). It’s more confusing to explain these tuples than it is to show code:
for elt, items in groupby(x, itemgetter(1)):
    print(elt, items)
    for i in items:
        print(i)
Which prints:
21 <itertools._grouper object at 0x511a0>
['4', '21', '1', '14', '2008-10-24 15:42:58']
['5', '21', '3', '19', '2008-10-24 15:45:45']
['6', '21', '1', '1somename', '2008-10-24 15:45:49']
22 <itertools._grouper object at 0x51170>
['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']
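Since the question asks for a function, the sort and groupby steps above could be wrapped up roughly like this. This is only a sketch: the name group_by_column is made up for illustration, and it assumes you want the groups collected into a dict of lists. Collecting them is often convenient because each group iterator shares the underlying data with groupby and is no longer usable once groupby moves on to the next key.

```python
from itertools import groupby
from operator import itemgetter


def group_by_column(rows, index):
    """Hypothetical helper: map each value found in column `index` to its rows."""
    # sorted() returns a new list, so the caller's list is left untouched.
    ordered = sorted(rows, key=itemgetter(index))
    # Turn each group iterator into a list before groupby advances past it.
    return {key: list(items)
            for key, items in groupby(ordered, key=itemgetter(index))}


x = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

groups = group_by_column(x, 1)
for key, rows in groups.items():
    print(key, len(rows))
```

With this, groups['21'] holds the three rows whose second field is '21', and any other column can be grouped by passing a different index.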
For the second part, you should use a list comprehension, as already mentioned here:
from pprint import pprint as pp

pp([y for y in x if y[3] == '2somename'])
Which prints:
[['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]
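If you want the filter as a reusable function, as the question asks, the same list comprehension can be parameterised by column and value. A minimal sketch, where filter_by_column is a hypothetical name chosen here and not something from the standard library:

```python
def filter_by_column(rows, index, value):
    """Hypothetical helper: keep only the rows whose field at `index` equals `value`."""
    return [row for row in rows if row[index] == value]


x = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

matches = filter_by_column(x, 3, '2somename')
for row in matches:
    print(row)
```

The comparison is an exact string match, so filter_by_column(x, 1, '21') would select by the second column in the same way.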