# Parallelization for standalone mode
To help developers quickly verify their algorithms, we designed and implemented `StandaloneMultiGPURunner` on top of torch distributed data parallel (DDP). The new runner makes better use of the computing resources of multiple GPUs and accelerates training in the standalone mode of FederatedScope.
## When to use
Use `StandaloneMultiGPURunner` when your machine has **multiple GPUs (>=2)** and you need quick verification in **standalone mode**.
## Configuration
Add the `federate.process_num` item to the configuration file to parallelize the training.
> Note: `federate.process_num` only takes effect when `use_gpu=True`, `backend='torch'`, `federate.mode='standalone'` and `federate.share_local_model=False`. Its value must not exceed the number of GPUs.
```yaml
use_gpu: True
backend: 'torch'
device: 0
early_stop:
  patience: 5
seed: 12345
federate:
  mode: standalone
  client_num: 100
  total_round_num: 20
  sample_client_rate: 0.2
  share_local_model: False
  process_num: 4 # run 4 processes simultaneously
...
```
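The conditions in the note above can be checked before launching a run. The following is a minimal sketch, not part of FederatedScope: `unmet_conditions` is a hypothetical helper, the configuration is assumed to be a plain nested dict, and the GPU count is passed in explicitly (in practice it could come from `torch.cuda.device_count()`).

```python
def unmet_conditions(cfg, num_gpus):
    """Return the conditions from the note that `cfg` does not satisfy.

    Hypothetical helper for illustration only; FederatedScope performs
    its own configuration checks, which may behave differently.
    """
    fed = cfg.get("federate", {})
    problems = []
    if cfg.get("use_gpu") is not True:
        problems.append("use_gpu must be True")
    if cfg.get("backend") != "torch":
        problems.append("backend must be 'torch'")
    if fed.get("mode") != "standalone":
        problems.append("federate.mode must be 'standalone'")
    if fed.get("share_local_model") is not False:
        problems.append("federate.share_local_model must be False")
    if fed.get("process_num", 1) > num_gpus:
        problems.append("federate.process_num must not exceed the number of GPUs")
    return problems


# Mirrors the relevant fields of the YAML example above.
example_cfg = {
    "use_gpu": True,
    "backend": "torch",
    "federate": {
        "mode": "standalone",
        "share_local_model": False,
        "process_num": 4,
    },
}

print(unmet_conditions(example_cfg, num_gpus=4))  # no unmet conditions
print(unmet_conditions(example_cfg, num_gpus=2))  # process_num exceeds GPU count
```

An empty list means every condition listed in the note is met for the given GPU count.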
## Use cases
The following example compares the efficiency of `StandaloneMultiGPURunner` with that of `StandaloneRunner`. The configuration file and the experiment result are listed below.
The result shows that, with 8 GPUs, the total running time of `StandaloneMultiGPURunner` is only about 1/3 of that of `StandaloneRunner`.
```yaml
use_gpu: True
device: 0
early_stop:
  patience: 5
seed: 12345
federate:
  mode: standalone
  client_num: 100
  total_round_num: 10
  sample_client_rate: 0.4
  share_local_model: False
  # use StandaloneMultiGPURunner with 8 GPUs
  process_num: 8
  # use StandaloneRunner
  # process_num: 1
data:
  root: data/
  type: femnist
  splits: [0.6,0.2,0.2]
  batch_size: 10
  subsample: 0.05
  num_workers: 0
  transform: [['ToTensor'], ['Normalize', {'mean': [0.1307], 'std': [0.3081]}]]
model:
  type: convnet2
  hidden: 2048
  out_channels: 62
train:
  local_update_steps: 1
  batch_or_epoch: epoch
  optimizer:
    lr: 0.01
    weight_decay: 0.0
grad:
  grad_clip: 5.0
criterion:
  type: CrossEntropyLoss
trainer:
  type: cvtrainer
eval:
  freq: 10
  metrics: ['acc', 'correct']
```
| | StandaloneMultiGPURunner | StandaloneRunner |
| :---: | :---: | :---: |
| Total running time (minutes) | 0.2406 | 0.7292 |
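
As a quick check of the numbers in the table, the short sketch below derives the speedup and the time ratio from the two reported running times. The per-GPU efficiency line is an added back-of-the-envelope figure that assumes all 8 processes ran on separate GPUs, as in the configuration above; it is not a measurement from the original experiment.

```python
# Running times in minutes, copied from the table above.
multi_gpu_minutes = 0.2406   # StandaloneMultiGPURunner, process_num: 8
single_minutes = 0.7292      # StandaloneRunner, process_num: 1
num_processes = 8

speedup = single_minutes / multi_gpu_minutes
time_ratio = multi_gpu_minutes / single_minutes
# Assumption: one process per GPU, so efficiency = speedup / process count.
efficiency = speedup / num_processes

print(f"speedup: {speedup:.2f}x")          # speedup: 3.03x
print(f"time ratio: {time_ratio:.2f}")     # time ratio: 0.33
print(f"efficiency: {efficiency:.2f}")     # efficiency: 0.38
```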