Good Day,
I am trying to train an LSTM using multiple Excel files (motion-capture data) as input. Each Excel file represents one body motion, and I would like to train the network with multiple motions in both the training set and the test set. Below is an example of a single Excel file:
The input shape is (1, 2751, 93), which breaks down as samples: 1, time steps: 2751, features: 93.
The independent input variable (x) consists of the human joints and their positions, and the dependent variable (y) is the label of each movement.
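As an illustration of that shape breakdown: one motion file loaded with pandas is a 2-D table (time steps × features), and an LSTM expects a 3-D batch, so a single file becomes a single sample once a leading axis is added. This is a minimal sketch that assumes all 93 columns are numeric and uses random data as a stand-in for the real motion-capture file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one motion file: 2751 time steps x 93 joint-position features
rng = np.random.default_rng(0)
motion = pd.DataFrame(rng.normal(size=(2751, 93)))

# Add a leading "samples" axis: (time steps, features) -> (samples, time steps, features)
x_single = motion.to_numpy(dtype=np.float32)[np.newaxis, ...]
samples, time_steps, features = x_single.shape
print(x_single.shape)
```

In Keras, for instance, an `LSTM` layer is given the per-sample shape `(time_steps, features)`, e.g. `input_shape=(2751, 93)`; the samples axis is the batch dimension and is not part of that shape.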
Thanks in advance!
EDIT: Added more detailed code
```python
# Multiple Sheets
import glob
import os

import pandas as pd

# Collect the file name of every motion CSV in the working directory
motionName = []
for ds in glob.glob("*.csv"):
    head, tail = os.path.split(str(ds))
    motionName.append(tail)
    print('Motion Name: ', tail)

num_rows = 300
samples = 0
datasets = []
activityIndex = []
list_num_features = []

for i, activity in enumerate(motionName):
    # Read at most num_rows rows of each file, skipping its first line
    data = pd.read_csv(activity, nrows=num_rows, header=None, skiprows=1)
    datasets.append(data)
    # Record the column labels (features) of this file
    list_num_features.append(list(data.columns))
    activityIndex.append(activity)
    samples += 1

print('activityIndex : {} '.format(activityIndex))

# Print the first rows of every loaded dataframe
for i in range(len(datasets)):
    print(motionName[i])
    print(datasets[i].head())
```
The output:
The output I expect when calling `df.head()` is something similar to this:
What I am trying to do is get (or print) every record (row) separately whenever I need it. I was able to do that when loading a single dataframe with the sample code below, but it failed when I loaded multiple dataframes into a list and then tried to apply the same step to each dataframe in a loop.
```python
# Single Sheet
import pandas as pd

dataset = pd.read_csv('motion.csv')
index = dataset.index
num_rows = len(index)  # number of records (rows) in the file
print(num_rows)
print(dataset.head())
```
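The single-sheet approach carries over to a list of dataframes: index the list to pick a motion, then use `.iloc` (or `iterrows()`) on that dataframe to reach individual records. This is a minimal sketch; the small random dataframes stand in for the loaded CSV files, and `get_record` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the loaded motions: a list of DataFrames, like `datasets` in the question
rng = np.random.default_rng(0)
datasets = [pd.DataFrame(rng.normal(size=(5, 3))) for _ in range(2)]


def get_record(datasets, motion, row):
    """Hypothetical helper: return one record (row) of one motion as a pandas Series."""
    return datasets[motion].iloc[row]


# Print every record of every motion separately
for m, df in enumerate(datasets):
    for row_number, record in df.iterrows():
        print(f"motion {m}, row {row_number}: {record.to_numpy()}")

# Fetch a single record on demand: first row of the second motion
first = get_record(datasets, motion=1, row=0)
print(first)
```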
EDIT: Question Clarified!
Put simply, this is what I have now:
- 8 dataframes stored in a list (list shape (8,))
- Each dataframe shape is (300,93)
What I want is to turn this list into an array of shape (8, 300, 93), so that it matches the input layer of the neural network.
However, I keep getting the following error:
ValueError: cannot reshape array of size 8 into shape (8,300,93)
Could someone explain why I am getting this error? It is not clear to me.
Thanks in advance!
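A likely cause, assuming the list is converted with `np.array(...)` before `reshape`: NumPy only builds a regular (8, 300, 93) array when every dataframe has exactly the same shape. `nrows=300` only caps the row count, so a file with fewer rows (or a different number of columns) makes the list ragged. Older NumPy versions then fall back to a 1-D object array holding the 8 dataframes (NumPy 1.24 and later raise an error instead), and reshaping an array of size 8 into (8, 300, 93) fails with exactly this message. The sketch below checks shapes explicitly and stacks with `np.stack`; `stack_motions` is a hypothetical helper, and random data replaces the CSV files.

```python
import numpy as np
import pandas as pd


def stack_motions(frames, num_rows=300, num_features=93):
    """Hypothetical helper: stack equally shaped DataFrames into (samples, time steps, features).

    Raises ValueError listing every frame whose shape differs from
    (num_rows, num_features), instead of silently producing an object array.
    """
    expected = (num_rows, num_features)
    bad = [(i, df.shape) for i, df in enumerate(frames) if df.shape != expected]
    if bad:
        raise ValueError(f"frames with unexpected shape (index, shape): {bad}")
    # np.stack adds a new leading axis: (len(frames), num_rows, num_features)
    return np.stack([df.to_numpy(dtype=np.float32) for df in frames])


# Eight synthetic motions of 300 rows x 93 features stand in for the CSV data
rng = np.random.default_rng(0)
frames = [pd.DataFrame(rng.normal(size=(300, 93))) for _ in range(8)]
x = stack_motions(frames)
print(x.shape)

# A single short file is enough to make the list ragged
ragged = frames[:7] + [frames[7].iloc[:250]]
try:
    stack_motions(ragged)
except ValueError as err:
    print(err)
```

If some files are genuinely shorter than 300 rows, one option is to truncate every motion to the shortest length (or pad the short ones) before stacking.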
Answer
I wrote this function to handle the preprocessing and overcome the reshaping issue. It also encodes the labels (y) using scikit-learn's `LabelEncoder()`.
```python
## Data Preprocessing
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess_df(df, start, quantity, numRows, df_name):
    """Load the files df[start:quantity] into x and encode their names as labels y."""
    x = []
    features = []
    y = []
    label_encoder = LabelEncoder()
    for i in range(start, quantity):
        data = pd.read_csv(df[i], nrows=numRows, skiprows=1)
        y.append(df[i])  # the file name is used as the motion label
        x.append(data)
        if i == start:
            # Keep the column names of the first file as the feature names
            features = list(data.columns)
        k = i - start  # position of this file within x
        print('({}/{}) x[{}]: {}'.format(k + 1, quantity - start, k, x[k].shape))
    print('{} set (x) shape: {}, {} set (y) shape: {}'.format(
        df_name, np.array(x).shape, df_name, np.array(y).shape))
    y = np.array(label_encoder.fit_transform(y))
    return np.array(x), y, np.array(features)
```
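One thing to watch with this function: it creates a new `LabelEncoder` on every call, so calling it once for the training set and once for the test set gives each split its own label-to-integer mapping. Those mappings disagree whenever the two splits do not contain exactly the same set of labels. A minimal sketch of the alternative, fitting one encoder on all labels and reusing it for both splits (the label strings here are hypothetical examples, not the actual file names):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical motion labels for a train split and a test split
train_labels = ["walk", "run", "jump", "walk"]
test_labels = ["run", "walk"]

# Fit once on every label that can occur, then reuse the same mapping for both splits
label_encoder = LabelEncoder()
label_encoder.fit(train_labels + test_labels)
y_train = label_encoder.transform(train_labels)
y_test = label_encoder.transform(test_labels)

# For comparison: an encoder fitted on the test labels alone maps "run" to a different integer
y_test_separate = LabelEncoder().fit_transform(test_labels)

print({str(c): i for i, c in enumerate(label_encoder.classes_)})
print(y_train, y_test, y_test_separate)
```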