Parallelization for standalone mode
To help developers verify their algorithms quickly, we designed and implemented StandaloneMultiGPURunner with torch distributed data parallel (DDP). The new runner makes better use of the computing resources of multiple GPUs and accelerates training in the standalone mode of FederatedScope.
When to use
Use StandaloneMultiGPURunner when your machine has multiple GPUs (>=2) and you need quick verification in standalone mode.
Configuration
Add the federate.process_num item to the configuration file to parallelize training.
Note:
federate.process_num only takes effect when use_gpu=True, backend='torch', federate.mode='standalone' and federate.share_local_model=False, and its value must not be greater than the number of GPUs.
```yaml
use_gpu: True
backend: 'torch'
device: 0
early_stop:
  patience: 5
seed: 12345
federate:
  mode: standalone
  client_num: 100
  total_round_num: 20
  sample_client_rate: 0.2
  share_local_model: False
  process_num: 4 # run 4 processes simultaneously
...
```
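The conditions in the note can be written down as a small check. The following is a minimal sketch, not FederatedScope code: `effective_process_num` is a hypothetical helper, the configuration is modeled as a plain nested dict, and the GPU count is passed in explicitly so the example runs without torch. Falling back to a single process when the conditions are not met, and raising an error when process_num exceeds the GPU count, are assumptions of this sketch; how the library itself reacts is not described here.

```python
def effective_process_num(cfg, num_gpus):
    """Hypothetical helper: how many processes the note's rules would allow.

    cfg      -- nested dict mirroring the YAML configuration
    num_gpus -- number of GPUs on the machine (passed in by the caller)
    """
    fed = cfg.get("federate", {})
    requested = fed.get("process_num", 1)

    # The four conditions listed in the note.
    conditions_met = (
        cfg.get("use_gpu") is True
        and cfg.get("backend") == "torch"
        and fed.get("mode") == "standalone"
        and fed.get("share_local_model") is False
    )

    # Assumption: without these conditions the setting has no effect,
    # which this sketch models as running with a single process.
    if not conditions_met or requested <= 1:
        return 1

    # Assumption: this sketch rejects values larger than the GPU count.
    if requested > num_gpus:
        raise ValueError(
            f"process_num={requested} exceeds the number of GPUs ({num_gpus})"
        )
    return requested


example_cfg = {
    "use_gpu": True,
    "backend": "torch",
    "federate": {
        "mode": "standalone",
        "share_local_model": False,
        "process_num": 4,
    },
}

print(effective_process_num(example_cfg, num_gpus=8))
```

On a real machine the GPU count could be obtained with `torch.cuda.device_count()` and passed as `num_gpus`.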
Use cases
Here we give an example to demonstrate the efficiency of StandaloneMultiGPURunner compared to StandaloneRunner. The configuration file and experiment result are listed below.
The experiment result shows that, with 8 GPUs, the total running time of StandaloneMultiGPURunner is only about 1/3 of that of StandaloneRunner.
```yaml
use_gpu: True
device: 0
early_stop:
  patience: 5
seed: 12345
federate:
  mode: standalone
  client_num: 100
  total_round_num: 10
  sample_client_rate: 0.4
  share_local_model: False
  # use StandaloneMultiGPURunner with 8 GPUs
  process_num: 8
  # use StandaloneRunner
  # process_num: 1
data:
  root: data/
  type: femnist
  splits: [0.6,0.2,0.2]
  batch_size: 10
  subsample: 0.05
  num_workers: 0
  transform: [['ToTensor'], ['Normalize', {'mean': [0.1307], 'std': [0.3081]}]]
model:
  type: convnet2
  hidden: 2048
  out_channels: 62
train:
  local_update_steps: 1
  batch_or_epoch: epoch
  optimizer:
    lr: 0.01
    weight_decay: 0.0
grad:
  grad_clip: 5.0
criterion:
  type: CrossEntropyLoss
trainer:
  type: cvtrainer
eval:
  freq: 10
  metrics: ['acc', 'correct']
```
| | StandaloneMultiGPURunner | StandaloneRunner |
|---|---|---|
| Total running time (minutes) | 0.2406 | 0.7292 |
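
The "1/3" figure follows directly from the two running times in the table. As a small worked example, the snippet below derives the speedup and time ratio from those numbers. The parallel efficiency (speedup divided by the number of GPUs) is a standard derived metric added here for illustration; it is not reported in the original experiment.

```python
# Running times in minutes, taken from the table above.
multi_gpu_minutes = 0.2406   # StandaloneMultiGPURunner, process_num: 8
single_minutes = 0.7292      # StandaloneRunner, process_num: 1
num_gpus = 8

# How many times faster the multi-GPU run finished.
speedup = single_minutes / multi_gpu_minutes

# Fraction of the single-process time that the multi-GPU run needed.
time_ratio = multi_gpu_minutes / single_minutes

# Speedup per GPU; 1.0 would mean perfectly linear scaling.
efficiency = speedup / num_gpus

print(f"speedup:    {speedup:.2f}x")     # speedup:    3.03x
print(f"time ratio: {time_ratio:.2f}")   # time ratio: 0.33
print(f"efficiency: {efficiency:.2f}")   # efficiency: 0.38
```

These numbers come from a single short run (10 rounds), so they indicate the order of magnitude of the gain rather than a precise scaling factor.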