
# Parallelization for standalone mode

To help developers quickly verify their algorithms, we designed and implemented `StandaloneMultiGPURunner` based on torch distributed data parallel (DDP). The new runner makes better use of the computing resources of multiple GPUs and accelerates training in the standalone mode of FederatedScope.
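The runner builds on the standard torch DDP pattern sketched below. This is an illustrative toy example of the underlying mechanism, not the runner's actual code: each spawned process drives one GPU, and gradients are synchronized across processes during `backward()`.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    # One process per GPU; the rank doubles as the CUDA device index.
    os.environ.setdefault('MASTER_ADDR', 'localhost')
    os.environ.setdefault('MASTER_PORT', '29500')
    dist.init_process_group('nccl', rank=rank, world_size=world_size)

    model = torch.nn.Linear(10, 2).to(rank)    # toy model on GPU `rank`
    ddp_model = DDP(model, device_ids=[rank])  # wrap for gradient sync
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    out = ddp_model(torch.randn(8, 10, device=rank))  # forward on local data
    out.sum().backward()  # gradients are all-reduced across processes here
    opt.step()
    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```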

## When to use

Use `StandaloneMultiGPURunner` when your machine has multiple GPUs (>= 2) and you need quick verification in standalone mode.

## Configuration

Add the `federate.process_num` item to the configuration file to parallelize training.

Note: `federate.process_num` takes effect only when `use_gpu=True`, `backend='torch'`, `federate.mode='standalone'`, and `federate.share_local_model=False`; its value must not exceed the number of available GPUs.

```yaml
use_gpu: True
backend: 'torch'
device: 0
early_stop:
  patience: 5
seed: 12345
federate:
  mode: standalone
  client_num: 100
  total_round_num: 20
  sample_client_rate: 0.2
  share_local_model: False
  process_num: 4  # run 4 processes simultaneously
...
```
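As a quick sanity check before launching, the conditions in the note above can be expressed as a single predicate. The helper below is hypothetical (not part of FederatedScope's API) and assumes a yacs-style `cfg` object like the one FederatedScope uses:

```python
import torch

def multigpu_runner_applicable(cfg):
    """Hypothetical helper mirroring the conditions in the note above."""
    return (cfg.use_gpu
            and cfg.backend == 'torch'
            and cfg.federate.mode == 'standalone'
            and not cfg.federate.share_local_model
            and 1 < cfg.federate.process_num <= torch.cuda.device_count())
```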

## Use cases

Here we give an example demonstrating the efficiency of `StandaloneMultiGPURunner` compared to `StandaloneRunner`. The configuration file and experiment results are listed below. With 8 GPUs, the total running time of `StandaloneMultiGPURunner` is only about one third that of `StandaloneRunner` (0.2406 min vs. 0.7292 min, a ~3x speedup).

```yaml
use_gpu: True
device: 0
early_stop:
  patience: 5
seed: 12345
federate:
  mode: standalone
  client_num: 100
  total_round_num: 10
  sample_client_rate: 0.4
  share_local_model: False
  # use StandaloneMultiGPURunner with 8 GPUs
  process_num: 8
  # use StandaloneRunner
  # process_num: 1

data:
  root: data/
  type: femnist
  splits: [0.6,0.2,0.2]
  batch_size: 10
  subsample: 0.05
  num_workers: 0
  transform: [['ToTensor'], ['Normalize', {'mean': [0.1307], 'std': [0.3081]}]]
model:
  type: convnet2
  hidden: 2048
  out_channels: 62
train:
  local_update_steps: 1
  batch_or_epoch: epoch
  optimizer:
    lr: 0.01
    weight_decay: 0.0
grad:
  grad_clip: 5.0
criterion:
  type: CrossEntropyLoss
trainer:
  type: cvtrainer
eval:
  freq: 10
  metrics: ['acc', 'correct']
```
|                              | StandaloneMultiGPURunner | StandaloneRunner |
| ---------------------------- | ------------------------ | ---------------- |
| Total running time (minutes) | 0.2406                   | 0.7292           |
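To reproduce the comparison, the config above can be passed to FederatedScope's standard entry point (`python federatedscope/main.py --cfg <your_config>.yaml`). Equivalently, a programmatic launch looks roughly like the sketch below; the filename is ours, and the module paths and builder names follow the repo's `main.py` but may differ across versions, so treat this as a sketch rather than a drop-in script.

```python
from federatedscope.core.configs.config import global_cfg
from federatedscope.core.auxiliaries.data_builder import get_data
from federatedscope.core.auxiliaries.worker_builder import get_server_cls, get_client_cls
from federatedscope.core.auxiliaries.runner_builder import get_runner

init_cfg = global_cfg.clone()
init_cfg.merge_from_file('femnist_multigpu.yaml')  # the YAML listed above

# Build the FEMNIST splits; get_data may adjust the config.
data, modified_cfg = get_data(config=init_cfg.clone())
init_cfg.merge_from_other_cfg(modified_cfg)

# get_runner dispatches to StandaloneMultiGPURunner when
# federate.process_num > 1 under the conditions noted earlier.
runner = get_runner(data=data,
                    server_class=get_server_cls(init_cfg),
                    client_class=get_client_cls(init_cfg),
                    config=init_cfg.clone())
runner.run()
```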