I have a large CSV file, which is a log of caller data.
A short snippet of my file:
CompanyName High Priority QualityIssue Customer1 Yes User Customer1 Yes User Customer2 No User Customer3 No Equipment Customer1 No Neither Customer3 No User Customer3 Yes User Customer3 Yes Equipment Customer4 No User
I want to sort the entire list by the frequency of occurrence of customers so it will be like:
CompanyName High Priority QualityIssue Customer3 No Equipment Customer3 No User Customer3 Yes User Customer3 Yes Equipment Customer1 Yes User Customer1 Yes User Customer1 No Neither Customer2 No User Customer4 No User
I’ve tried groupby
, but that only prints out the Company Name and the frequency but not the other columns, I also tried
df['Totals']= [sum(df['CompanyName'] == df['CompanyName'][i]) for i in xrange(len(df))]
and
df = [sum(df['CompanyName'] == df['CompanyName'][i]) for i in xrange(len(df))]
But these give me errors:
ValueError: The wrong number of items passed 1, indices imply 24
I’ve looked at something like this:
for key, value in sorted(mydict.iteritems(), key=lambda (k,v): (v,k)): print "%s: %s" % (key, value)
but this only prints out two columns, and I want to sort my entire CSV. My output should be my entire CSV sorted by the first column.
Thanks for the help in advance!
Advertisement
Answer
This seems to do what you want, basically add a count column by performing a groupby
and transform
with value_counts
and then you can sort on that column:
df['count'] = df.groupby('CompanyName')['CompanyName'].transform(pd.Series.value_counts) df.sort_values('count', ascending=False)
Output:
CompanyName HighPriority QualityIssue count 5 Customer3 No User 4 3 Customer3 No Equipment 4 7 Customer3 Yes Equipment 4 6 Customer3 Yes User 4 0 Customer1 Yes User 3 4 Customer1 No Neither 3 1 Customer1 Yes User 3 8 Customer4 No User 1 2 Customer2 No User 1
You can drop the extraneous column using df.drop
:
df.drop('count', axis=1)
Output:
CompanyName HighPriority QualityIssue 5 Customer3 No User 3 Customer3 No Equipment 7 Customer3 Yes Equipment 6 Customer3 Yes User 0 Customer1 Yes User 4 Customer1 No Neither 1 Customer1 Yes User 8 Customer4 No User 2 Customer2 No User