Hi, can someone please help me with this? What should I do if I want to use NumPy to get an array X with shape (2638, 1838) when the dataframe has shape (2638, 1840)?
Here is my code:
import pandas as pd
import numpy as np
df = pd.read_csv('pbmc_data.csv', index_col=0)
df.shape
Answer
Converting to NumPy and back to Pandas, as suggested in one of the comments on your post, is not an elegant solution. Fortunately, Pandas can handle both tasks on its own.
Your first task is to select all columns of the input df except the last two (cell_type and cell_type_string). To do it, run:
X = df.iloc[:, :-2]
The second task is to extract the second-to-last column (cell_type). To do it, run:
y = df.iloc[:, -2]
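Both selections above return Pandas objects (a DataFrame and a Series). Since the question asks for a NumPy array, here is a minimal sketch of the same two selections followed by a conversion with to_numpy(). The small DataFrame and the gene column names (g1 to g4) are made up as a stand-in for pbmc_data.csv; only the two trailing column names come from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pbmc_data.csv: 5 rows, 4 feature columns,
# followed by the two label columns named in the answer.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 4)), columns=["g1", "g2", "g3", "g4"])
df["cell_type"] = [0, 1, 0, 2, 1]
df["cell_type_string"] = ["B", "T", "B", "NK", "T"]

# All columns except the last two, converted to a 2-D NumPy array.
X = df.iloc[:, :-2].to_numpy()

# The second-to-last column, converted to a 1-D NumPy array.
y = df.iloc[:, -2].to_numpy()

print(df.shape)  # (5, 6)
print(X.shape)   # (5, 4)
print(y.shape)   # (5,)
```

With the real file, the same two lines should give X a shape of (2638, 1838), assuming the two label columns really are the last two columns of df.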