DataZoo
FederatedScope provides a rich collection of federated datasets for researchers, covering images, text, graphs, recommendation systems, and speech, as well as the utility class `BaseDataTranslator` for building your own FS datasets.
Built-in FS data
All datasets are accessible through `federatedscope.core.auxiliaries.data_builder.get_data`, which builds them into a `federatedscope.core.data.StandaloneDataDict` (for more details, see [DataZoo Advanced]). When you set `cfg.data.type = DATASET_NAME`, FS downloads and pre-processes the specified dataset so that it can be passed to `FedRunner`. For example:
# Source: federatedscope/main.py
data, cfg = get_data(cfg)
runner = FedRunner(data=data,
                   server_class=get_server_cls(cfg),
                   client_class=get_client_cls(cfg),
                   config=cfg.clone())
We provide a look-up table for you to get started with our DataZoo:
| cfg.data.type | Domain |
|---|---|
| FEMNIST | CV |
| Celeba | CV |
| {DNAME}@torchvision | CV |
| Shakespeare | NLP |
| SubReddit | NLP |
| Twitter (Sentiment140) | NLP |
| {DNAME}@torchtext | NLP |
| {DNAME}@huggingface_datasets | NLP |
| Cora | Graph (node-level) |
| CiteSeer | Graph (node-level) |
| PubMed | Graph (node-level) |
| DBLP_conf | Graph (node-level) |
| DBLP_org | Graph (node-level) |
| csbm | Graph (node-level) |
| Epinions | Graph (link-level) |
| Ciao | Graph (link-level) |
| FB15k | Graph (link-level) |
| FB15k-237 | Graph (link-level) |
| WN18 | Graph (link-level) |
| MUTAG | Graph (graph-level) |
| BZR | Graph (graph-level) |
| COX2 | Graph (graph-level) |
| DHFR | Graph (graph-level) |
| PTC_MR | Graph (graph-level) |
| AIDS | Graph (graph-level) |
| NCI1 | Graph (graph-level) |
| ENZYMES | Graph (graph-level) |
| DD | Graph (graph-level) |
| PROTEINS | Graph (graph-level) |
| COLLAB | Graph (graph-level) |
| IMDB-BINARY | Graph (graph-level) |
| IMDB-MULTI | Graph (graph-level) |
| REDDIT-BINARY | Graph (graph-level) |
| HIV | Graph (graph-level) |
| ESOL | Graph (graph-level) |
| FREESOLV | Graph (graph-level) |
| LIPO | Graph (graph-level) |
| PCBA | Graph (graph-level) |
| MUV | Graph (graph-level) |
| BACE | Graph (graph-level) |
| BBBP | Graph (graph-level) |
| TOX21 | Graph (graph-level) |
| TOXCAST | Graph (graph-level) |
| SIDER | Graph (graph-level) |
| CLINTOX | Graph (graph-level) |
| graph_multi_domain_mol | Graph (graph-level) |
| graph_multi_domain_small | Graph (graph-level) |
| graph_multi_domain_biochem | Graph (graph-level) |
| cikmcup | Graph (graph-level) |
| toy | Tabular |
| synthetic | Tabular |
| quadratic | Tabular |
| {DNAME}@openml | Tabular |
| vertical_fl_data | Tabular (vertical) |
| VFLMovieLens1M | Recommendation |
| VFLMovieLens10M | Recommendation |
| HFLMovieLens1M | Recommendation |
| HFLMovieLens10M | Recommendation |
| VFLNetflix | Recommendation |
| HFLNetflix | Recommendation |
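The `{DNAME}@package` entries in the table combine a dataset name with the library that supplies it. As a rough illustration of that naming convention only, the sketch below separates the two parts. `split_data_type` is a hypothetical helper written for this example, not a FederatedScope function, and `CIFAR10@torchvision` is just one assumed instance of the pattern.

```python
def split_data_type(data_type):
    """Split 'NAME@package' into (name, package); package is None without '@'.

    Hypothetical helper for illustration; not part of FederatedScope.
    """
    name, sep, package = data_type.partition('@')
    return (name, package) if sep else (name, None)


print(split_data_type('CIFAR10@torchvision'))  # ('CIFAR10', 'torchvision')
print(split_data_type('FEMNIST'))              # ('FEMNIST', None)
```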
DataZoo Advanced
In this section, we will introduce key concepts and tools to help you understand how FS data works and how to use it to build your own data in FS.
Concepts:
- `federatedscope.core.data.ClientData`
  - `ClientData` is a subclass of `dict`. In federated learning, each client (server) owns a `ClientData` for training, validation, or testing. Thus, each `ClientData` has one or more of `train`, `val`, and `test` as keys, each with a corresponding `DataLoader`.
  - The `DataLoader` of each key is created by the `setup()` method, which takes the `DataLoader` arguments, such as `batch_size` and `shuffle`, from `cfg`.

  Example:

      # Instantiate client_data for each Client
      client_data = ClientData(DataLoader, cfg, train=train_data, val=None, test=test_data)
      # other_cfg with different batch size
      client_data.setup(other_cfg)
      print(client_data)
      # >> {'train': DataLoader(train_data), 'test': DataLoader(test_data)}
- `federatedscope.core.data.StandaloneDataDict`
  - `StandaloneDataDict` is a subclass of `dict`. As the name implies, in standalone mode a `StandaloneDataDict` holds every `ClientData`, keyed by client index (`0`, `1`, `2`, ...). Key `0` holds the server's data, used for global evaluation or other purposes.
  - The `preprocess()` method of `StandaloneDataDict` updates the inner `ClientData` when `cfg` changes. For example, in global mode (`cfg.federate.method == "global"`), `StandaloneDataDict` merges all `ClientData` into one client to perform global training.
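To make the two concepts above concrete, here is a minimal toy model of them in plain Python. It is a sketch under stated assumptions, not FederatedScope's implementation: `ToyLoader`, `ToyClientData`, and `merge_clients` are hypothetical names, `cfg` is reduced to a plain `dict`, and the choices to skip key `0` during the merge and to store the merged client under key `1` are assumptions made for the example.

```python
class ToyLoader:
    """Hypothetical stand-in for a DataLoader: it only records its settings."""

    def __init__(self, data, batch_size, shuffle):
        self.data = list(data)
        self.batch_size = batch_size
        self.shuffle = shuffle


class ToyClientData(dict):
    """Dict with 'train'/'val'/'test' keys; splits passed as None are skipped."""

    def __init__(self, cfg, train=None, val=None, test=None):
        super().__init__()
        self._raw = {'train': train, 'val': val, 'test': test}
        self.setup(cfg)

    def setup(self, cfg):
        # Rebuild every loader from the given configuration.
        self.clear()
        for split, data in self._raw.items():
            if data is not None:
                self[split] = ToyLoader(data, cfg['batch_size'], cfg['shuffle'])


def merge_clients(data_dict, cfg):
    """Merge the raw splits of all clients into a single client (global mode)."""
    merged = {'train': [], 'val': [], 'test': []}
    for client_id, client_data in data_dict.items():
        if client_id == 0:  # assumption: the server's data is left out
            continue
        for split, data in client_data._raw.items():
            if data is not None:
                merged[split].extend(data)
    # Empty splits become None so that they do not turn into keys.
    splits = {name: data or None for name, data in merged.items()}
    return {1: ToyClientData(cfg, **splits)}


cfg = {'batch_size': 4, 'shuffle': True}
data_dict = {
    1: ToyClientData(cfg, train=[1, 2], test=[3]),
    2: ToyClientData(cfg, train=[4, 5], test=[6]),
}
# Calling setup() again with another config rebuilds the loaders.
data_dict[1].setup({'batch_size': 8, 'shuffle': True})
print(sorted(data_dict[1]), data_dict[1]['train'].batch_size)  # ['test', 'train'] 8

merged = merge_clients(data_dict, cfg)
print(merged[1]['train'].data)  # [1, 2, 4, 5]
```

Keeping the raw splits next to the loaders is what lets `setup()` be called repeatedly with different settings, which mirrors the `client_data.setup(other_cfg)` call in the example above.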
Tools
- `federatedscope.core.data.BaseDataTranslator`
  - `BaseDataTranslator` converts a `torch.utils.data.Dataset` or a `dict` of data splits to a `StandaloneDataDict` according to `cfg`. After translation, the result can be passed directly to `FedRunner` to launch an FL course.
  - `BaseDataTranslator` first splits the data into `train`, `val`, and `test` according to `cfg.data.splits` (ML split), then uses a `Splitter` to distribute each of those splits across the clients (FL split). To use `BaseDataTranslator`, you must specify `cfg.data.splitter`, `cfg.federate.client_num`, and any other arguments of the `Splitter`.

  Example:

      # Configure the FL split
      cfg.data.splitter = 'lda'
      cfg.federate.client_num = 5
      cfg.data.splitter_args = [{'alpha': 0.2}]

      # Translate the raw dataset into FS data
      translator = BaseDataTranslator(cfg, DataLoader)
      raw_data = CIFAR10()
      fs_data = translator(raw_data)

      runner = FedRunner(data=fs_data,
                         server_class=get_server_cls(cfg),
                         client_class=get_client_cls(cfg),
                         config=cfg.clone())
- To generate simulated federated datasets, we provide `Splitter`s, which distribute a given standalone dataset across multiple clients with configurable statistical heterogeneity among them.

  We provide a look-up table for you to get started with our `Splitter`:

  | cfg.data.splitter | Domain | Arguments |
  |---|---|---|
  | LDA | Generic | alpha |
  | Louvain | Graph (node-level) | delta |
  | Random | Graph (node-level) | sampling_rate, overlapping_rate, drop_edge |
  | rel_type | Graph (link-level) | alpha |
  | Scaffold | Molecular | - |
  | Scaffold_lda | Molecular | alpha |
  | Rand_chunk | Graph (graph-level) | - |
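The two-stage split described above (ML split, then FL split) can be sketched with the standard library alone. This is a generic Dirichlet-based label partition, written to show how an `alpha` argument controls heterogeneity; it is not FederatedScope's `Splitter` code. The function names `ml_split` and `dirichlet_label_split`, the `seed` parameter, and the toy labels are all assumptions made for the example.

```python
import random
from collections import defaultdict


def ml_split(num_samples, splits, seed=0):
    """ML split: shuffle sample indices and cut them into train/val/test."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    train_end = int(splits[0] * num_samples)
    val_end = train_end + int(splits[1] * num_samples)
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


def dirichlet_label_split(indices, labels, client_num, alpha, seed=0):
    """FL split: assign indices to clients with Dirichlet-skewed label mixes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)

    clients = [[] for _ in range(client_num)]
    for label in sorted(by_class):
        members = by_class[label]
        # A Dirichlet(alpha, ..., alpha) sample equals normalised
        # Gamma(alpha, 1) draws.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(client_num)]
        total = sum(draws) or 1.0  # guard: all draws underflowed to 0
        start, cumulative = 0, 0.0
        for client_id, draw in enumerate(draws):
            cumulative += draw / total
            if client_id == client_num - 1:
                end = len(members)  # last client takes the remainder
            else:
                end = min(len(members), round(cumulative * len(members)))
            clients[client_id].extend(members[start:end])
            start = end
    return clients


labels = [i % 3 for i in range(60)]  # toy labels for 60 samples
train, val, test = ml_split(len(labels), splits=[0.8, 0.1, 0.1])
parts = dirichlet_label_split(train, labels, client_num=5, alpha=0.2)
print([len(p) for p in parts])  # client sizes; smaller alpha gives more skew
```

Each class is divided among the clients by its own Dirichlet draw, so a small `alpha` concentrates a class on a few clients while a large `alpha` approaches an even split.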