Plotting top 10 Values in Big Data

I need help plotting some categorical and numerical values in Python. The code is given below:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df=pd.read_csv('train_feature_store.csv')
df.info()
df.head()
df.columns

plt.figure(figsize=(20,6))
sns.countplot(x='Store', data=df)
plt.show()

Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)

However, the dataset is so large that I'm not able to produce a meaningful plot in Python. Basically, I just want to take the top 5 or top 10 values and plot them, like the plot shown below:

[Image: example of the desired plot]

In an attempt to plot this, I'm trying to put the result of the code below into a DataFrame and plot it, but I'm not able to do so. Can anyone help me out with this?

Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)

Below is a link to a sample dataset. The sample is only representative; the original dataset on which I'm doing the EDA has around 3 thousand unique stores and 60 thousand rows of data. Please help! Thanks!

https://drive.google.com/drive/folders/1PdXaKXKiQXX0wrHYT3ZABjfT3QLIYzQ0?usp=sharing


Answer

You were pretty close.
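
To spell out what was missing: the aggregation in the question already produces the right numbers, but the result of sort_values(...).head(10) is never assigned, so there is nothing to pass to a plotting call. Below is a minimal sketch of that step alone. The small DataFrame is made-up stand-in data (not the real train_feature_store.csv), and the name top and the cut-off of 3 are chosen only for illustration.

```python
import pandas as pd

# Made-up stand-in for train_feature_store.csv (assumption: it has Store and Size columns)
df = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3, 3, 4, 5],
    "Size":  [10, 20, 50, 60, 5, 5, 300, 40],
})

# Total Size per store
totals = df[["Size", "Store"]].groupby(["Store"], as_index=False).sum()

# Keep the sorted, truncated result by assigning it to a name
top = totals.sort_values(by=["Size"], ascending=False).head(3)
print(top)
```

With the result held in top, it can be handed to any plotting function as an ordinary DataFrame.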

import pandas as pd
import seaborn as sns

df = pd.read_csv('train_feature_store.csv')

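# Use a larger default figure size for seaborn plots
sns.set(rc={'figure.figsize':(16,9)})

# Total Size per store, sorted descending, keeping only the 10 largest
g = df.groupby('Store', as_index=False)['Size'].sum().sort_values(by='Size', ascending=False).head(10)
# One bar per store; hue colours the bars by store and the x tick labels are hidden
sns.barplot(data=g, x='Store', y='Size', hue='Store', dodge=False).set(xticklabels=[]);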

[Image: resulting bar plot]
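
One caveat, offered as a hedged variation rather than a correction: when Store holds numeric IDs, seaborn appears to arrange the bars by store ID rather than by Size, so the ranking is not visible from left to right. The sketch below passes an explicit order to keep the bars ranked and uses Series.nlargest as a shorter way to take the top N. The data is made up, and top_n, top, fig and ax are illustrative names.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for train_feature_store.csv (assumption: Store and Size columns)
df = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3, 3, 4, 5],
    "Size":  [10, 20, 50, 60, 5, 5, 300, 40],
})

top_n = 3  # use 5 or 10 on the real data

# Sum Size per store, then keep the top_n largest totals (already sorted descending)
top = df.groupby("Store")["Size"].sum().nlargest(top_n).reset_index()

fig, ax = plt.subplots(figsize=(8, 4))
# order= fixes the left-to-right bar order to the ranking computed above
sns.barplot(data=top, x="Store", y="Size", order=top["Store"].tolist(), ax=ax)
ax.set_title(f"Top {top_n} stores by total Size")
# In a script, call plt.show() here; a notebook renders the figure automatically.
```

Because only top_n rows reach the plotting call, the number of unique stores in the full dataset does not affect how readable the chart is.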

User contributions licensed under: CC BY-SA