How to split a single column containing 1000 rows into chunk of two columns containing 500 rows per column in pandas.
I have a csv file that contains a single column and I need to split this into multiple columns. Below is the format in csv.
Steps I took:
I had multiple csv files containing one column with 364 rows. I concatenated them after converting them into a dataframe, but it copies the file in a linear fashion.
Code I tried
monthly_list = [] for file in ['D0_monthly.csv','c1_monthly.csv','c2_monthly.csv','c2i_monthly.csv','c3i_monthly.csv','c4i_monthly.csv','D1_monthly.csv','D2i_monthly.csv','D3i_monthly.csv','D4i_monthly.csv', 'D2j_monthly.csv','D3j_monthly.csv','D4j_monthly.csv','c2j_monthly.csv','c3j_monthly.csv','c4j_monthly.csv']: monthly_file = pd.read_csv(file,header=None,index_col=None,skiprows=[0]) monthly_list.append(monthly_file) monthly_all_file = pd.concat(monthly_list)
How the data is:
column1 |
---|
1 |
2 |
3 |
. |
. |
364 |
1 |
2 |
3 |
. |
. |
364 |
I need to split the above column in the format shown below.
What the data should be:
column1 | column2 |
---|---|
1 | 1 |
2 | 2 |
3 | 3 |
4 | 4 |
5 | 5 |
. | . |
. | . |
. | . |
364 | 364 |
Advertisement
Answer
Answer updated to work for arbitrary number of columns
You could start with number of columns or row length. For a given initial column length you could calculate one given the other. In this answer I use desired target column length – tgt_row_len
.
nb_groups = 4 tgt_row_len = 5 df = pd.DataFrame({'column1': np.arange(1,tgt_row_len*nb_groups+1)}) print(df) column1 0 1 1 2 2 3 3 4 4 5 5 6 6 7 ... 17 18 18 19 19 20
Create groups in the index for the following grouping operation
df.index = df.reset_index(drop=True).index // tgt_row_len column1 0 1 0 2 0 3 0 4 0 5 1 6 1 7 ... 3 17 3 18 3 19 3 20 dfn = ( df.groupby(level=0).apply(lambda x: x['column1'].reset_index(drop=True)).T .rename(columns = lambda x: 'col' + str(x+1)).rename_axis(None) ) print(dfn) col1 col2 col3 col4 0 1 6 11 16 1 2 7 12 17 2 3 8 13 18 3 4 9 14 19 4 5 10 15 20
Previous answer that handles creating two columns
This answer just shows 10 target rows as an example. That can easily be changed to 364 or 500.
A dataframe where column1 contains 2 sets of 10 rows
tgt_row_len = 10 df = pd.DataFrame({'column1': np.tile(np.arange(1,tgt_row_len+1),2)}) print(df) column1 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 1 11 2 12 3 13 4 14 5 15 6 16 7 17 8 18 9 19 10
Move the bottom set of rows to column2
df.assign(column2=df['column1'].shift(-tgt_row_len)).iloc[:tgt_row_len].astype(int) column1 column2 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10