
Split a data frame in Python based on the shape of one parameter

I have a data frame that is loaded as follows:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import csv
import seaborn as sns
import warnings

df_input = pd.read_csv('combine_input.csv', delimiter=',')
df_output = pd.read_csv('combine_output.csv', delimiter=',')

Dataframe

This data frame contains many repeated rows. For example, the first row is repeated more than 1000 times, and the same goes for the other rows.

When I plot the time distribution, I get the following figure, which shows the frequency of the time parameter:

df_input.plot(y='time', kind='hist', figsize=(10, 10))
plt.grid()
plt.show()

Time distribution

My question is: how can I keep only the data inside the red rectangle, for example at time = 0.006 and frequency = 0.75e6 (see the following picture)?

Red rectangular


Answer

Note: the code below uses target as the column name. Since your column is called time, either replace target with time in the code or rename your column to target.

def calRows(df, x, y):
    # Keep only the rows whose target value is at most x
    df1 = df[df.target <= x]
    minCount = len(df1)
    # Find the smallest number of rows that any single target value has
    for i in df1.target.unique():
        count = len(df1[df1.target == i])
        if minCount > count:
            minCount = count

    # Cap the result at y, the upper limit on the count
    if minCount > y:
        minCount = int(y)
    return minCount

Pass your data frame, the x limit from the graph (the largest target value to keep), and the y limit from the graph (the largest number of rows to keep per value) to calRows(df, x, y). It returns the number of rows to take for each target value.

rows = calRows(df, 6, 75)
print(rows)
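
To make the return value concrete, here is a minimal sketch on made-up toy data. The name min_rows_per_value is hypothetical; it is meant as a compact equivalent of calRows written with value_counts(), not as a replacement for it.

```python
import pandas as pd

# Toy data (made up for illustration): value 1 appears 5 times,
# value 2 appears 3 times, value 9 appears 4 times.
toy = pd.DataFrame({"target": [1] * 5 + [2] * 3 + [9] * 4})

def min_rows_per_value(df, x, y):
    """Hypothetical equivalent of calRows, written with value_counts()."""
    kept = df[df["target"] <= x]          # drop values above the x limit
    if kept.empty:
        return 0
    smallest = int(kept["target"].value_counts().min())
    return min(smallest, int(y))          # cap at the y limit

rows_a = min_rows_per_value(toy, x=6, y=75)  # 9 is excluded; smallest group has 3 rows
rows_b = min_rows_per_value(toy, x=6, y=2)   # capped by y
print(rows_a, rows_b)  # 3 2
```

With x=6 the value 9 is ignored, so the smallest remaining group (value 2, three rows) decides the result unless y is lower.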

The takeFeatures(df, rows, x) function takes the data frame, rows (the result of the first function), and the same x limit from the graph, and returns the final data frame.

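def takeFeatures(df, rows, x):
    # Keep only the rows whose target value is at most x
    df1 = df[df.target <= x]
    samples = []
    for i in df1.target.unique():
        # Randomly pick `rows` rows for this target value
        samples.append(df1[df1.target == i].sample(rows))

    # Nothing matched: return an empty frame with the same columns
    if not samples:
        return pd.DataFrame(columns=df.columns)
    return pd.concat(samples)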

Calling the takeFeatures() function

final = takeFeatures(df,rows,6)
print(final)
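
As an alternative sketch, the per-value sampling can also be done in one step with groupby(...).sample(...), which is available in pandas 1.1 and later. The toy frame, the value column, and the numbers below are made up for illustration; this is a different technique from the loop above, not the answer's original method.

```python
import pandas as pd

# Toy data (made up): three distinct target values with 5, 3 and 4 rows.
toy = pd.DataFrame({
    "target": [1] * 5 + [2] * 3 + [9] * 4,
    "value": range(12),
})

x, rows = 6, 3  # assumed limits: keep target <= 6, take 3 rows per value

subset = toy[toy["target"] <= x]
# Draw `rows` random rows from every group; random_state makes it repeatable.
final = subset.groupby("target").sample(n=rows, random_state=0)

# Each remaining target value should now appear exactly `rows` times.
counts = final["target"].value_counts()
print(counts)
```

Note that sample(n=rows) raises an error if any group has fewer than rows rows, which is why the row count is computed from the smallest group first.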

Your final data frame will contain the values that you expected from the graph, and you can plot it the same way to check the resulting distribution.