Good Day,
I am trying to train an LSTM using multiple CSV files (motion capture data) as input. Each file represents one body motion, and I would like to train the network on multiple motions in both the training set and the test set. Below is an example of a single file:
As for the input shape, it is (1, 2751, 93); the input dimensions break down as: samples: 1, time steps: 2751, features: 93.
The independent variable (x) is the human joints along with their positions, and the dependent variable (y) is the label of each movement.
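For context, that (1, 2751, 93) shape can be sketched as follows, assuming NumPy and the Keras-style (samples, time steps, features) convention; the array contents here are made up:

```python
import numpy as np

# Hypothetical single-motion recording: 2751 frames x 93 joint-position features
motion = np.random.rand(2751, 93)

# A Keras-style LSTM input expects (samples, time steps, features),
# so add a leading sample axis for this single motion
x = motion[np.newaxis, ...]
print(x.shape)  # (1, 2751, 93)
```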
Thanks in Advance!
EDIT: Added more elaborate code
# Multiple Sheets
import os
import glob

motionName = []
for ds in glob.glob("*.csv"):
    head, tail = os.path.split(str(ds))
    motionName.append(tail)
    print('Motion Name: ', tail)
import pandas as pd

num_rows = 300
samples = 0
datasets = []
activityIndex = []
list_num_features = [[]]
for i, activity in enumerate(motionName):
    data = pd.read_csv('{}'.format(motionName[i]), nrows=num_rows, header=None, skiprows=1)
    list_num_features.append([])
    datasets.append(data)
    #datasets[i].append(data)
    for j in range(0, len(data.columns)):
        list_num_features[i].append(data.columns[j])
    activityIndex.append('{}'.format(motionName[i]))
    samples += 1
print('activityIndex : {} '.format(activityIndex))
for i in range(0, len(datasets)-1):
    print('{}'.format(motionName[i]))
    print(datasets[i].head())
The output:
The expected output when invoking 'df.head()' should be similar to this:
What I am trying to do is print every record (row) separately on demand. I was able to do that when loading a single DataFrame using the sample code below, but it failed when I loaded multiple DataFrames into a list and tried to apply the same step to each DataFrame in a loop.
# Single Sheet
import pandas as pd
dataset = pd.read_csv('motion.csv')
index = dataset.index
print(len(index))
num_rows = len(index)
dataset.head()
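For the multi-DataFrame case, one sketch (using hypothetical stand-in DataFrames and file names) is to keep the names alongside the frames and loop over both; `DataFrame.iterrows()` then yields each record separately:

```python
import pandas as pd

# Stand-ins for the loaded motion DataFrames and their file names
names = ['motion_a.csv', 'motion_b.csv']
datasets = [pd.DataFrame({'x': [1, 2]}), pd.DataFrame({'x': [3, 4]})]

for name, df in zip(names, datasets):
    for idx, row in df.iterrows():
        # each record (row) is handled on its own here
        print(name, idx, row.tolist())
```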
EDIT: Question Clarified!
Simply, what I have now is the following:
- 8 dataframes stored in a list (list shape (8,))
- Each dataframe shape is (300,93)
What I want to do is shape this list into (8, 300, 93), for instance, so that it matches the input layer of the neural network.
However, I keep getting the error below:
ValueError: cannot reshape array of size 8 into shape (8,300,93)
Could someone clarify why I am getting this error? Things are rather vague on my end.
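One likely cause, sketched below with made-up data: calling np.array on a plain Python list of DataFrames can produce a 1-D object array of length 8 (especially if the frames' shapes differ), and an array of size 8 cannot be reshaped to (8, 300, 93). Stacking the underlying NumPy arrays explicitly avoids this:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the 8 motion DataFrames, each of shape (300, 93)
datasets = [pd.DataFrame(np.random.rand(300, 93)) for _ in range(8)]

# Stack the underlying arrays into (samples, time steps, features)
x = np.stack([df.to_numpy() for df in datasets])
print(x.shape)  # (8, 300, 93)
```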
Thanks in advance!
Answer
I wrote the function below to handle the preprocessing and overcome the reshaping issue. The function also encodes the labels (y) using scikit-learn's 'LabelEncoder()'.
## Data Preprocessing
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def preprocess_df(df, start, quantity, numRows, df_name):
    x = []
    features = []
    y = []
    label_encoder = LabelEncoder()
    for i in range(start, quantity):
        data = pd.read_csv('{}'.format(df[i]), nrows=numRows, skiprows=1)
        y.append(df[i])
        x.append(data)
        if i == start:
            for j in range(0, len(data.columns)):
                features.append(data.columns[j])
        if df_name == 'test':
            i = i - start
            print('({}/{}) x[{}]: {}'.format(i+1, (quantity - start), i, x[i].shape))
        else:
            print('({}/{}) x[{}]: {}'.format(i+1, quantity, i, x[i].shape))
    print('{} set (x) shape: {}, {} set (y) shape: {}'.format(df_name, np.array(x).shape, df_name, np.array(y).shape))
    y = np.array(label_encoder.fit_transform(y))
    return np.array(x), y, np.array(features)
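As a sanity check on the two behaviours this function relies on (sketched with made-up shapes and label names): equal-shaped DataFrames convert to a true 3-D array rather than an object array, and the string labels map to one integer per class. Here np.unique with return_inverse is used as a stand-in for LabelEncoder.fit_transform:

```python
import numpy as np
import pandas as pd

# Hypothetical: three equal-shaped DataFrames, one label (file name) per file
x = [pd.DataFrame(np.zeros((300, 93))) for _ in range(3)]
labels = ['walk.csv', 'run.csv', 'jump.csv']

x_arr = np.array(x)   # equal shapes -> a real 3-D array, not an object array
print(x_arr.shape)    # (3, 300, 93)

# np.unique(..., return_inverse=True) mimics LabelEncoder.fit_transform:
# classes are sorted, and each label becomes its class index
classes, y = np.unique(labels, return_inverse=True)
print(y)              # [2 1 0]
```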