How could I transform my dataset (composed of images) into a federated dataset? I am trying to create something similar to EMNIST, but for my own dataset.

```python
tff.simulation.datasets.emnist.load_data(only_digits=True, cache_dir=None)
```
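For reference, this is roughly how I load and consume the EMNIST federated data today, and I would like the same ClientData structure for my own images:

```python
import tensorflow_federated as tff

# load_data() returns a (train, test) pair of ClientData objects.
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data(only_digits=True)

# Each client id maps to that client's local tf.data.Dataset.
example_ds = emnist_train.create_tf_dataset_for_client(emnist_train.client_ids[0])
```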
Answer
You will need to create the ClientData object first, for example:
```python
client_data = tff.simulation.datasets.ClientData.from_clients_and_tf_fn(
    client_ids, create_dataset)
```
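Here client_ids is simply a Python list with one identifier per client. A minimal way to build it, assuming a hypothetical layout with one sub-folder per client under a data_root directory, might be:

```python
import os

# Hypothetical layout: data_root/<client_id>/<class_name>/<image>.jpg
data_root = "/path/to/my_dataset"  # placeholder, adjust to your setup
client_ids = sorted(os.listdir(data_root))
```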
The second argument, create_dataset, is a serializable function that builds the tf.data.Dataset for one client. But first you have to prepare your images; see the TFF tutorial on preprocessing data for more detail:
```python
import os
import tensorflow as tf

# 'labels' is a Python list of the class names (see the sketch below).
labels_tf = tf.convert_to_tensor(labels)

def parse_image(filename):
    # The class name is the parent directory of the image file.
    parts = tf.strings.split(filename, os.sep)
    label_str = parts[-2]
    # Map the class name to its integer index in 'labels_tf'.
    label_int = tf.where(labels_tf == label_str)[0][0]

    # Read, decode, and resize the image.
    image = tf.io.read_file(filename)
    image = tf.io.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [32, 32])
    return image, label_int
```
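The labels list above is assumed to be a plain Python list of class names. Under the same hypothetical folder-per-client layout, it could be collected from the class sub-directories of one client, for example:

```python
# Hypothetical: take the class names from the first client's sub-directories.
labels = sorted(os.listdir(os.path.join(data_root, client_ids[0])))
labels_tf = tf.convert_to_tensor(labels)
```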
Once the parsing function is ready, use it inside the create_dataset function:
```python
def create_dataset(client_id):
    # <path of your dataset>: a file pattern selecting this client's images.
    list_ds = tf.data.Dataset.list_files(<path of your dataset>)
    images_ds = list_ds.map(parse_image)
    return images_ds
```
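What goes into the &lt;path of your dataset&gt; placeholder depends on how your files are laid out. Assuming the folder-per-client layout sketched above, one possible version is:

```python
def create_dataset(client_id):
    # Select only the files that belong to this client.
    pattern = tf.strings.join([data_root, client_id, "*", "*.jpg"], separator=os.sep)
    list_ds = tf.data.Dataset.list_files(pattern)
    return list_ds.map(parse_image)
```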
After this step, you can define a preprocessing function:
```python
NUM_CLIENTS = 10
NUM_EPOCHS = 5
BATCH_SIZE = 20
SHUFFLE_BUFFER = 100
PREFETCH_BUFFER = 10

def preprocess(dataset):
    # Repeat for local epochs, shuffle, batch, and prefetch each client's dataset.
    return dataset.repeat(NUM_EPOCHS).shuffle(SHUFFLE_BUFFER, seed=1).batch(
        BATCH_SIZE).prefetch(PREFETCH_BUFFER)
```
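A quick sanity check on a single client (using the names from the snippets above) might look like:

```python
example_dataset = client_data.create_tf_dataset_for_client(client_data.client_ids[0])
preprocessed_example = preprocess(example_dataset)
print(preprocessed_example.element_spec)
```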
With these pieces, you can build a preprocessed tf.data.Dataset per client, which is what federated training expects:
```python
def make_federated_data(client_data, client_ids):
    return [
        preprocess(client_data.create_tf_dataset_for_client(x))
        for x in client_ids
    ]
```
After this your dataset is ready for federated learning!
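As a rough sketch of how the result plugs into training (the exact builder depends on your TFF version, and model_fn here is a hypothetical function returning a tff.learning.Model that matches your data):

```python
federated_train_data = make_federated_data(client_data, client_data.client_ids[:NUM_CLIENTS])

# 'model_fn' is assumed to be defined elsewhere and to return a tff.learning.Model.
training_process = tff.learning.algorithms.build_weighted_fed_avg(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02))

state = training_process.initialize()
result = training_process.next(state, federated_train_data)
print(result.metrics)
```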