How to move a column in a pandas dataframe

Question

I want to take a column indexed 'length' and make it my second column. It currently exists as the 5th column. I have tried: I see the following error: TypeError: must be str, not list I'm not sure how to interpret this error because it actually should be a list, right? Also, is there a general method to move any

Accepted Answer

Correcting your errorI&#8217;m not sure how to interpret this error because it actually should bea list, right?No: colnames[0] and colnames[4] are scalars, not lists. You can&#8217;t concatenate a scalar with a list. To make them lists, use square brackets:colnames = [colnames[0]] + [colnames[4]] + colnames[:-1]You can either use df[[colnames]] or df.reindex(columns=colnames): both necessarily trigger a copy operation as this transformation cannot be processed in place.Generic solutionBut converting arrays to lists and then concatenating lists manually is not only expensive, but prone to error. A related answer has many list-based solutions, but a NumPy-based solution is worthwhile since pd.Index objects are stored as NumPy arrays.The key here is to modify the NumPy array via slicing rather than concatenation. There are only 2 cases to handle: when the desired position exists after the current position, and vice versa.import pandas as pd, numpy as npfrom string import ascii_uppercasedf = pd.DataFrame(columns=list(ascii_uppercase))def shifter(df, col_to_shift, pos_to_move):    arr = df.columns.values    idx = df.columns.get_loc(col_to_shift)    if idx == pos_to_move:        pass    elif idx > pos_to_move:        arr[pos_to_move+1: idx+1] = arr[pos_to_move: idx]    else:        arr[idx: pos_to_move] = arr[idx+1: pos_to_move+1]    arr[pos_to_move] = col_to_shift    df = df.reindex(columns=arr)    return df    df = df.pipe(shifter, 'J', 1)print(df.columns)Index(['A', 'J', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N',       'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'],      dtype='object')Performance benchmarkingUsing NumPy slicing is more efficient with a large number of columns versus a list-based method:n = 10000df = pd.DataFrame(columns=list(range(n)))def shifter2(df, col_to_shift, pos_to_move):    cols = df.columns.tolist()    cols.insert(pos_to_move, cols.pop(df.columns.get_loc(col_to_shift)))    df = df.reindex(columns=cols)    return df%timeit df.pipe(shifter, 590, 5)   # 381 µs%timeit df.pipe(shifter2, 590, 5)  # 1.92 ms

Advertisement

Answer

Correcting your error

Generic solution

Performance benchmarking