I have a data frame like the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import csv
import seaborn as sns
import warnings

df_input = pd.read_csv('combine_input.csv', delimiter=',')
df_output = pd.read_csv('combine_output.csv', delimiter=',')
In this data frame there are many repeated rows. For example, the first row is repeated more than 1000 times, and the same goes for the other rows.
When I plot the distribution of time, I get a figure showing the frequency of each value of the time parameter:
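To make "repeated rows" concrete, here is a minimal sketch of how the repeats could be counted. The small `df_demo` frame and its `value` column are made up for illustration; only the `time` column name comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for df_input; the real file has more rows and columns.
df_demo = pd.DataFrame({
    "time": [0.001, 0.001, 0.001, 0.002, 0.002, 0.006],
    "value": [1.0, 1.0, 1.0, 2.0, 2.0, 3.0],
})

# Number of rows that are exact copies of an earlier row.
n_repeats = int(df_demo.duplicated().sum())

# How many rows share each distinct value of the time column.
rows_per_time = df_demo["time"].value_counts().sort_index()

print(n_repeats)
print(rows_per_time)
```

The per-value counts are the same numbers the histogram shows as bar heights when each bin holds a single time value.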
df_input.plot(y='time', kind='hist', figsize=(10, 10))
plt.grid()
plt.show()
My question is: how can I keep only the data inside the red rectangle, for example the region bounded by time = 0.006 and frequency = 0.75 × 1e6 (see the following picture)?
Answer
Note: the code below uses target as the column name. Since your column is named time, either replace target with time in the code or rename your column to target.
def calRows(df, x, y):
    # Keep only the rows whose target value is at or below the x limit.
    df1 = df[df.target <= x]
    # Start from the total row count, then shrink to the smallest group size.
    minCount = len(df1)
    for i in df1.target.unique():
        count = len(df1[df1.target == i])
        if minCount > count:
            minCount = count
    # Never take more rows per target value than the y limit.
    if minCount > y:
        minCount = int(y)
    return minCount
Pass your data frame, the x limit and the y limit of the region on the graph to calRows(df, x, y). It returns the number of rows to take for each target value.
rows = calRows(df, 6, 75)
print(rows)
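The same count can probably be computed without an explicit loop. The sketch below is an alternative under stated assumptions, not the answer's own code: `cal_rows_vectorized`, its `column` parameter and the `demo` frame are hypothetical names introduced here, and the column is assumed to be called `time`.

```python
import pandas as pd

def cal_rows_vectorized(df, x, y, column="time"):
    """Smallest group size among values at or below x, capped at y."""
    # Count the rows for each distinct value that passes the x limit.
    counts = df.loc[df[column] <= x, column].value_counts()
    if counts.empty:
        # Nothing lies at or below x, so there is nothing to take.
        return 0
    return int(min(counts.min(), y))

# Made-up data: value 1 appears 5 times, 2 three times, 6 four times, 9 twice.
demo = pd.DataFrame({"time": [1] * 5 + [2] * 3 + [6] * 4 + [9] * 2})

rows_demo = cal_rows_vectorized(demo, x=6, y=75)
rows_capped = cal_rows_vectorized(demo, x=6, y=2)
print(rows_demo, rows_capped)
```

Here the value 9 is ignored because it lies above x, so the smallest remaining group decides the result unless y is smaller still.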
The takeFeatures(df, rows, x) function takes the data frame, rows (the result of the first function) and the x limit of the graph, and returns the final data frame.
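def takeFeatures(df, rows, x):
    # Keep only the rows whose target value is at or below the x limit.
    df1 = df[df.target <= x]
    samples = []
    for i in df1.target.unique():
        # Randomly draw the same number of rows for every target value.
        samples.append(df1[df1.target == i].sample(rows))
    # Fall back to an empty frame with the same columns if nothing matched.
    if not samples:
        return pd.DataFrame(columns=df.columns)
    return pd.concat(samples)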
Calling the takeFeatures() function:
final = takeFeatures(df, rows, 6)
print(final)
Your final data frame will contain the values you expected from the graph. After plotting this final data frame, you will get a graph like this:
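For reference, the sampling step could also be written with pandas' group-wise `sample` (available on `groupby` objects since pandas 1.1). This is a sketch of an equivalent approach, not the code above: `take_features_grouped`, its `column` and `seed` parameters, and the `demo` frame are hypothetical, and the column is assumed to be named `time`.

```python
import pandas as pd

def take_features_grouped(df, rows, x, column="time", seed=0):
    """Draw `rows` random rows for every distinct value at or below x."""
    subset = df[df[column] <= x]
    # A fixed seed keeps the draw reproducible; like the loop version, this
    # raises ValueError if any group has fewer than `rows` rows.
    return subset.groupby(column).sample(n=rows, random_state=seed)

# Made-up data: value 1 appears 5 times, 2 three times, 6 four times, 9 twice.
demo = pd.DataFrame({"time": [1] * 5 + [2] * 3 + [6] * 4 + [9] * 2})

balanced = take_features_grouped(demo, rows=3, x=6)
print(balanced["time"].value_counts().sort_index())
```

Every remaining time value ends up with the same number of rows, which is what flattens the histogram to the height of the rectangle.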