I need to add specific rows to a pandas DataFrame at specific positions

Question

I'm working on a project in which I need to add specific rows around each tagged sentence. Whenever the 'N' column equals 1, a new sentence starts. I want to add two rows for each sentence: a row with 'Pos' = START at the beginning of the sentence, and a row with 'Pos' = End at the end of the sentence.
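The rule "a new sentence starts wherever N equals 1" can be turned directly into a sentence number for every row. This is a minimal sketch, assuming the columns `N`, `Name` and `Pos` that appear in the accepted answer's output; the few rows below are hypothetical, borrowed from that output:

```python
import pandas as pd

# Hypothetical sample data: two short sentences (tokens taken from the answer's output)
df = pd.DataFrame({
    'N': [1, 2, 3, 1, 2],
    'Name': ['ἐρᾷ', 'μὲν', '.', 'ὄμβρος', '.'],
    'Pos': ['VERB', 'ADV', 'PUNCT', 'NOUN', 'PUNCT'],
})

# eq(1) is True on the first token of each sentence; the running count of
# those starts is the sentence number of every row
sentence_id = df['N'].eq(1).cumsum()
print(sentence_id.tolist())  # [1, 1, 1, 2, 2]
```

Grouping the DataFrame by this series gives one group per sentence, which is where the START and END rows have to go.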

Accepted Answer

Analyzing your dataframe, I assume you want to insert START before each value 1 in column N and END after the last (highest) value of each consecutive run in column N. If so, you could do the following.

First, create two dummy dataframes, start_df and end_df:

    import numpy as np
    import pandas as pd

    start_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': ['->START']})
    end_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': ['END<-']})

Then split the dataframe into groups of consecutive values in column N:

    # True at the first row of each run of consecutive N values (each sentence start)
    mask = ~df['N'].diff().fillna(0).eq(1)
    # The running count of run starts labels every row with its group number
    gb = df.groupby(mask.cumsum())
    groups = [gb.get_group(x) for x in gb.groups]

Next, insert a dummy dataframe before and after each group:

    res = []
    for group in groups:
        res.append(start_df)
        res.append(group)
        res.append(end_df)

Finally, build a single dataframe by concatenating the dataframes in the list:

    df_ = pd.concat(res).reset_index(drop=True)

    # print(df_)
        N     Name        Pos
    0   NaN   NaN         ->START
    1   1.0   ἐρᾷ         VERB
    2   2.0   μὲν         ADV
    3   3.0   ἁγνὸς       ADJ
    4   4.0   οὐρανὸς     NOUN
    5   5.0   τρῶσαι      VERB
    6   6.0   χθόνα       NOUN
    7   7.0   ,           PUNCT
    8   8.0   ἔρως        NOUN
    9   9.0   δὲ          CCONJ
    10  10.0  γαῖαν       NOUN
    11  11.0  λαμβάνει    VERB
    12  12.0  γάμου       NOUN
    13  13.0  τυχεῖν      VERB
    14  14.0  .           PUNCT
    15  NaN   NaN         END<-
    16  NaN   NaN         ->START
    17  1.0   ὄμβρος      NOUN
    18  2.0   δ̓           ADV
    19  3.0   ἀπ̓          ADP
    20  4.0   εὐνάοντος   ADJ
    21  5.0   οὐρανοῦ     NOUN
    22  6.0   πεσὼν       VERB
    23  7.0   ἔκυσε       VERB
    24  8.0   γαῖαν       NOUN
    25  9.0   .           PUNCT
    26  NaN   NaN         END<-
    27  NaN   NaN         ->START
    28  1.0   ἡ           DET
    29  2.0   δὲ          ADV
    30  3.0   τίκτεται    VERB
    31  4.0   βροτοῖς     NOUN
    32  5.0   μήλων       NOUN
    33  6.0   τε          ADV
    34  7.0   βοσκὰς      NOUN
    35  8.0   καὶ         CCONJ
    36  9.0   βίον        NOUN
    37  10.0  Δημήτριον   ADJ
    38  11.0  .           PUNCT
    39  NaN   NaN         END<-
    40  NaN   NaN         ->START
    41  1.0   δενδρῶτις   NOUN
    42  2.0   ὥρα         NOUN
    43  3.0   δ̓           ADV
    44  4.0   ἐκ          ADP
    45  5.0   νοτίζοντος  VERB
    46  6.0   γάμου       NOUN
    47  7.0   τέλειος     ADJ
    48  8.0   ἐστί        VERB
    49  9.0   .           PUNCT
    50  NaN   NaN         END<-
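The steps above can be collected into one self-contained function. This is a sketch rather than the answerer's exact code: the function name `add_sentence_markers` is hypothetical, and it splits sentences with `df['N'].eq(1).cumsum()` (the rule stated in the question) instead of the `diff()`-based mask. The marker strings `->START` and `END<-` are copied from the answer.

```python
import numpy as np
import pandas as pd


def add_sentence_markers(df, start='->START', end='END<-'):
    """Hypothetical helper: return a copy of df with a start row before and
    an end row after each sentence, assuming a sentence begins where N == 1."""
    # One-row marker frames; N and Name are left empty (NaN)
    start_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': [start]})
    end_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': [end]})

    # Sentence number of each row: increments on every N == 1
    sentence_id = df['N'].eq(1).cumsum()

    pieces = []
    # sort=False keeps the sentences in their original order
    for _, sentence in df.groupby(sentence_id, sort=False):
        pieces.extend([start_df, sentence, end_df])

    # Stitch the pieces together and renumber the index 0..n-1
    return pd.concat(pieces, ignore_index=True)


# Hypothetical sample data: two short sentences
df = pd.DataFrame({
    'N': [1, 2, 3, 1, 2],
    'Name': ['ἐρᾷ', 'μὲν', '.', 'ὄμβρος', '.'],
    'Pos': ['VERB', 'ADV', 'PUNCT', 'NOUN', 'PUNCT'],
})
result = add_sentence_markers(df)
print(result)
```

Both masks produce the same groups when every sentence is numbered 1, 2, 3 and so on without gaps. They differ if a token number is ever skipped: the answer's `diff()` mask starts a new group whenever `N` does not increase by exactly 1, whereas `eq(1)` only starts one at `N == 1`.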


Answer