FS-TFP/materials/notebook/(CN)CIKMCUP2022.ipynb

228 lines
7.2 KiB
Plaintext
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step-by-step Guidance for CIKM 2022 AnalytiCup Competition"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1. 安装FederatedScope"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2022-07-26T09:23:22.036523Z",
"start_time": "2022-07-26T09:23:21.642646Z"
}
},
"outputs": [],
"source": [
"# 拷贝FederatedScope并切换到比赛所使用的分支cikm22competition\n",
"\n",
"!cp -r fs_latest FederatedScope\n",
"\n",
"# 如果不在playground中请输入命令`cd FederatedScope`\n",
"import os\n",
"os.chdir('FederatedScope')\n",
"\n",
"!git checkout cikm22competition"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2. 搭建运行环境"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2022-07-26T09:23:28.796456Z",
"start_time": "2022-07-26T09:23:22.038843Z"
}
},
"outputs": [],
"source": [
"# 安装 FS\n",
"!pip install -e . --user"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3. 下载比赛数据集"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2022-07-26T09:23:29.189922Z",
"start_time": "2022-07-26T09:23:28.798365Z"
}
},
"outputs": [],
"source": [
"# 比赛数据集已经提前下载到`data`目录下,通过如下的命令进行解压:\n",
"!mkdir -p data\n",
"!cp -r ../data/CIKM22Competition ./data/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"现在比赛数据集已经被放置在`FederatedScope/data/CIKM22Competition`目录下,遵循`CIKM22Competition/${client_id}`的方式进行组织,其中`${client_id}`是client的序号并从1开始计数。每个client的目录下包含训练train.pt测试test.pt和验证val.pt三个部分的数据。可以通过`torch.load`的方式来查看数据:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2022-07-26T09:23:29.823058Z",
"start_time": "2022-07-26T09:23:29.192273Z"
}
},
"outputs": [],
"source": [
"import torch\n",
"# 加载序号为1的client的训练数据\n",
"train_data_client1 = torch.load('./data/CIKM22Competition/1/train.pt')\n",
"# 查看训练数据中的第一个训练样例\n",
"print(train_data_client1[0])\n",
"# 查看训练样例的label\n",
"print(train_data_client1[0].y)\n",
"# 查看训练样例的序号,即${sample_id}\n",
"print(train_data_client1[0].data_index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4. 在比赛数据上运行baseline算法"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FederatedScope为比赛数据内置了两个baseline算法\"isolated training\"和\"FedAvg\"。假设你已经成功的搭建好FederatedScope的运行环境并下载了比赛数据那么可以通过以下的命令运行两个baseline算法"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2022-07-26T09:23:29.824268Z",
"start_time": "2022-07-26T09:23:29.824254Z"
}
},
"outputs": [],
"source": [
"# 作为示例我们在这里仅运行3轮用户可以按照自己的需求更改运行轮数\n",
"# 按照如下命令运行isolated training\n",
"!python federatedscope/main.py --cfg federatedscope/gfl/baseline/isolated_gin_minibatch_on_cikmcup.yaml --client_cfg federatedscope/gfl/baseline/isolated_gin_minibatch_on_cikmcup_per_client.yaml federate.total_round_num 3\n",
"\n",
"# 按照如下命令运行FedAvg\n",
"!python federatedscope/main.py --cfg federatedscope/gfl/baseline/fedavg_gin_minibatch_on_cikmcup.yaml --client_cfg federatedscope/gfl/baseline/fedavg_gin_minibatch_on_cikmcup_per_client.yaml federate.total_round_num 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"其中参数--cfg xxxx.yaml指定了全局的参数--client_cfg xxx.yaml为每一个client单独指定了参数"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5. 保存并提交预测结果"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 提交格式\n",
"如CIKM 2022 AnalytiCup Competition介绍中所述参赛者需要将所有client的预测结果保存到一个csv文件中进行提交。在这个csv文件中每一行代表一个测试样例的预测结果并且由`${client_id}`和`${sample_id}`标识所预测的测试样例。其中,`${client_id}`从1开始计数`${sample_id}`需要与比赛数据中的测试样例的序号保持一致(可以通过测试样例的`data_index`属性获得测试样例的序号)。\n",
"\n",
"需要注意的是,分类任务和多维度回归任务分别遵循以下不同的格式:\n",
"* 对分类任务,每一行应当遵循如下的格式(`${category_id}`从0开始计数)\n",
"\n",
"```\n",
"${client_id},${sample_id},${category_id}\n",
"```\n",
"\n",
"* 对于N维的回归任务每一行应当遵循如下的格式\n",
"```\n",
"${client_id},${sample_id},${prediction_1st_dimension},…,${prediction_N-th_dimension}\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 通过FederatedScope保存预测结果\n",
"FederatedScope中的\"cikm22competition\"分支目前支持在训练结束时保存预测结果(相关代码见`federatedscope/gfl/trainer/graphtrainer.py`和`federatedscope/core/trainers/torch_trainer.py`)。\n",
"预测结果将会被保存在一个名为prediction.csv的文件中该文件位于`outdir`参数所指定的目录中。每次运行时,如果`outdir`所指定的目录已经存在,则会使用一个时间戳后缀用以区分。\n",
"在训练结束时FederatedScope的训练记录也将标识出预测结果所在的目录如下所示\n",
"![Finish saving prediction results](https://img.alicdn.com/imgextra/i1/O1CN01L68gve1nGu3BNp5vL_!!6000000005063-2-tps-4766-404.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 提交预测结果\n",
"最后,参赛选手可以将预测结果下载到本地,并上传到[天池](https://tianchi.aliyun.com/competition/entrance/532008/introduction)。"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.2 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
},
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}