Good Day,
I am trying to train an LSTM using multiple CSV files (motion capture data) as input. Each file represents one body motion, and I would like to train the network on multiple motions in both the training set and the test set. Below is an example of a single file:
As for the input shape, it is (1, 2751, 93); the input dimensions break down as: samples: 1, time steps: 2751, features: 93.
The independent variable (x) is the human joints along with their positions, and the dependent variable (y) is the label of each movement.
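For context, that (1, 2751, 93) shape can be sketched as follows, assuming NumPy and the Keras-style (samples, time steps, features) convention; the array contents here are made up:

```python
import numpy as np

# Hypothetical single-motion recording: 2751 frames x 93 joint-position features
motion = np.random.rand(2751, 93)

# A Keras-style LSTM input expects (samples, time steps, features),
# so add a leading sample axis for this single motion
x = motion[np.newaxis, ...]
print(x.shape)  # (1, 2751, 93)
```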
Thanks in Advance!
EDIT: Added more elaborate code
# Multiple Sheets
import os
import glob

motionName = []
for ds in glob.glob("*.csv"):
    head, tail = os.path.split(str(ds))
    motionName.append(tail)
    print('Motion Name: ', tail)
import pandas as pd

num_rows = 300
samples = 0
datasets = []
activityIndex = []
list_num_features = [[]]
for i, activity in enumerate(motionName):
    data = pd.read_csv('{}'.format(motionName[i]), nrows=num_rows, header=None, skiprows=1)
    list_num_features.append([])
    datasets.append(data)
    #datasets[i].append(data)
    for j in range(0, len(data.columns)):
        list_num_features[i].append(data.columns[j])
    activityIndex.append('{}'.format(motionName[i]))
    samples += 1
print('activityIndex : {} '.format(activityIndex))
for i in range(0, len(datasets)-1):
    print('{}'.format(motionName[i]))
    print(datasets[i].head())
The output:
The expected output when invoking 'df.head()' should be similar to this:
What I am trying to do is print every record (row) separately on demand. I was able to do that when loading a single DataFrame using the sample code below, but it failed when I loaded multiple DataFrames into a list and tried to apply the same step to each DataFrame in a loop.
# Single Sheet
import pandas as pd
dataset = pd.read_csv('motion.csv')
index = dataset.index
print(len(index))
num_rows = len(index)
dataset.head()
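For the multi-DataFrame case, one sketch (using hypothetical stand-in DataFrames and file names) is to keep the names alongside the frames and loop over both; `DataFrame.iterrows()` then yields each record separately:

```python
import pandas as pd

# Stand-ins for the loaded motion DataFrames and their file names
names = ['motion_a.csv', 'motion_b.csv']
datasets = [pd.DataFrame({'x': [1, 2]}), pd.DataFrame({'x': [3, 4]})]

for name, df in zip(names, datasets):
    for idx, row in df.iterrows():
        # each record (row) is handled on its own here
        print(name, idx, row.tolist())
```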
EDIT: Question Clarified!
Simply, what I have now is the following:
- 8 dataframes stored in a list (list shape (8,))
- Each dataframe shape is (300,93)
What I want to do is shape this list into (8, 300, 93), for instance, so that it matches the input layer of the neural network.
However, I keep getting the error below:
ValueError: cannot reshape array of size 8 into shape (8,300,93)
Could someone clarify why I am getting this error? Things are rather vague on my end.
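One likely cause, sketched below with made-up data: calling np.array on a plain Python list of DataFrames can produce a 1-D object array of length 8 (especially if the frames' shapes differ), and an array of size 8 cannot be reshaped to (8, 300, 93). Stacking the underlying NumPy arrays explicitly avoids this:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the 8 motion DataFrames, each of shape (300, 93)
datasets = [pd.DataFrame(np.random.rand(300, 93)) for _ in range(8)]

# Stack the underlying arrays into (samples, time steps, features)
x = np.stack([df.to_numpy() for df in datasets])
print(x.shape)  # (8, 300, 93)
```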
Thanks in advance!
Answer
I wrote the function below to handle the preprocessing and overcome the reshaping issue. The function also encodes the labels (y) using scikit-learn's 'LabelEncoder()'.
## Data Preprocessing
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def preprocess_df(df, start, quantity, numRows, df_name):
    x = []
    features = []
    y = []
    label_encoder = LabelEncoder()
    for i in range(start, quantity):
        data = pd.read_csv('{}'.format(df[i]), nrows=numRows, skiprows=1)
        y.append(df[i])
        x.append(data)
        if i == start:
            for j in range(0, len(data.columns)):
                features.append(data.columns[j])
        if df_name == 'test':
            i = i - start
            print('({}/{}) x[{}]: {}'.format(i+1, (quantity - start), i, x[i].shape))
        else:
            print('({}/{}) x[{}]: {}'.format(i+1, quantity, i, x[i].shape))
    print('{} set (x) shape: {}, {} set (y) shape: {}'.format(df_name, np.array(x).shape, df_name, np.array(y).shape))
    y = np.array(label_encoder.fit_transform(y))
    return np.array(x), y, np.array(features)
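As a sanity check on the two behaviours this function relies on (sketched with made-up shapes and label names): equal-shaped DataFrames convert to a true 3-D array rather than an object array, and the string labels map to one integer per class. Here np.unique with return_inverse is used as a stand-in for LabelEncoder.fit_transform:

```python
import numpy as np
import pandas as pd

# Hypothetical: three equal-shaped DataFrames, one label (file name) per file
x = [pd.DataFrame(np.zeros((300, 93))) for _ in range(3)]
labels = ['walk.csv', 'run.csv', 'jump.csv']

x_arr = np.array(x)   # equal shapes -> a real 3-D array, not an object array
print(x_arr.shape)    # (3, 300, 93)

# np.unique(..., return_inverse=True) mimics LabelEncoder.fit_transform:
# classes are sorted, and each label becomes its class index
classes, y = np.unique(labels, return_inverse=True)
print(y)              # [2 1 0]
```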