I need help plotting some categorical and numerical values in Python. The code is given below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('train_feature_store.csv')
df.info()
df.head()
df.columns

plt.figure(figsize=(20,6))
sns.countplot(x='Store', data=df)
plt.show()

Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
However, the dataset is so large that I'm not able to make a meaningful plot in Python. Basically, I just want to take the top 5 or top 10 values and plot them, as shown below:
To do that, I'm trying to put the result of the code below into a dataframe and plot it, but I haven't been able to. Can anyone help me out with this?
Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
Below is a link to a sample dataset. The sample is only representative; the original dataset I'm doing the EDA on has around 3 thousand unique stores and 60 thousand rows of data. Please help! Thanks!
https://drive.google.com/drive/folders/1PdXaKXKiQXX0wrHYT3ZABjfT3QLIYzQ0?usp=sharing
Answer
You were pretty close.
import pandas as pd
import seaborn as sns

df = pd.read_csv('train_feature_store.csv')
sns.set(rc={'figure.figsize':(16,9)})

g = df.groupby('Store', as_index=False)['Size'].sum().sort_values(by='Size', ascending=False).head(10)
sns.barplot(data=g, x='Store', y='Size', hue='Store', dodge=False).set(xticklabels=[]);
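The key difference from the code in the question is that the sorted, truncated result is assigned to a variable before plotting. If it helps, the same idea can be wrapped in a small helper. This is only a sketch: `top_n_by_total` and `plot_top_n` are hypothetical names, and the sample frame is made up rather than taken from the linked dataset. Because `Store` is numeric, seaborn may order the bars by store id rather than by total, so the sketch casts the ids to strings and passes `order` explicitly.

```python
import pandas as pd


def top_n_by_total(df, key="Store", value="Size", n=10):
    """Return the n groups of `key` with the largest summed `value`."""
    totals = df.groupby(key, as_index=False)[value].sum()
    # nlargest sorts descending by `value` and keeps the first n rows.
    return totals.nlargest(n, value).reset_index(drop=True)


def plot_top_n(top, key="Store", value="Size"):
    """Draw one bar per group, keeping the largest-first order of `top`."""
    # Imported here so the aggregation above works without seaborn installed.
    import seaborn as sns

    # String labels make the axis categorical; `order` pins the bars to the
    # ranking instead of letting them be sorted by store id.
    labels = top[key].astype(str)
    return sns.barplot(x=labels, y=top[value], order=list(labels))


# Made-up sample data for illustration, not the asker's file.
sample = pd.DataFrame({
    "Store": [1, 1, 2, 3, 3, 3, 4],
    "Size": [10, 20, 50, 5, 5, 5, 40],
})
top3 = top_n_by_total(sample, n=3)
print(top3)
```

With the real data, something like `plot_top_n(top_n_by_total(df, n=10))` should give the top 10 stores by total size, largest first.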