FS-TFP/materials/notebook/(EN)CIKMCUP2022.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step-by-step Guidance for CIKM 2022 AnalytiCup Competition"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step1. Install FederatedScope"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-07-26T09:23:22.036523Z",
     "start_time": "2022-07-26T09:23:21.642646Z"
    }
   },
   "outputs": [],
   "source": [
    "# Copy FederatedScope and switch to the stable branch cikm22competitionfor competition\n",
    "\n",
    "!cp -r fs_latest FederatedScope\n",
    "\n",
    "# `cd FederatedScope` if you are NOT in the playground.\n",
    "import os\n",
    "os.chdir('FederatedScope')\n",
    "\n",
    "!git checkout cikm22competition"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2. Setup the running environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-07-26T09:23:28.796456Z",
     "start_time": "2022-07-26T09:23:22.038843Z"
    }
   },
   "outputs": [],
   "source": [
    "# Install FS as follows\n",
    "!pip install -e . --user"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3. Download contest data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-07-26T09:23:29.189922Z",
     "start_time": "2022-07-26T09:23:28.798365Z"
    }
   },
   "outputs": [],
   "source": [
    "# We have downloaded the data in advance in `data`, unzip the contest data as follows:\n",
    "!mkdir -p data\n",
    "!cp -r ../data/CIKM22Competition ./data/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then you can access the contest data with the directory FederatedScope/data/CIKM22Competition. The contest data is organized by the index of the client CIKM22Competition/${client_id}(counts from 1), and the data of each client contains the train, test and validate splitted parts. You can load it by torch.loadas follows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-07-26T09:23:29.823058Z",
     "start_time": "2022-07-26T09:23:29.192273Z"
    }
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "# The train split of client 1\n",
    "train_data_client1 = torch.load('./data/CIKM22Competition/1/train.pt')\n",
    "# Check the first sample\n",
    "print(train_data_client1[0])\n",
    "# Check the label of the first sample\n",
    "print(train_data_client1[0].y)\n",
    "# Check the index of the first sample as ${sample_id}\n",
    "print(train_data_client1[0].data_index)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4. Execute baselines on the contest data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Within FederatedScope, we build in two baselines for the contest data, \"isolated training\" and \"FedAvg\". Suppose you have successfully built the running environment, and downloaded the contest data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-07-26T09:23:29.824268Z",
     "start_time": "2022-07-26T09:23:29.824254Z"
    }
   },
   "outputs": [],
   "source": [
    "# We only run 3 round in playground, you can modify it by yourself.\n",
    "# Run the following command to execute the isolated training\n",
    "!python federatedscope/main.py --cfg federatedscope/gfl/baseline/isolated_gin_minibatch_on_cikmcup.yaml --client_cfg federatedscope/gfl/baseline/isolated_gin_minibatch_on_cikmcup_per_client.yaml federate.total_round_num 3\n",
    "\n",
    "# Run the following command to execute the FedAvg solution\n",
    "!python federatedscope/main.py --cfg federatedscope/gfl/baseline/fedavg_gin_minibatch_on_cikmcup.yaml --client_cfg federatedscope/gfl/baseline/fedavg_gin_minibatch_on_cikmcup_per_client.yaml federate.total_round_num 3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where the argument--cfg xxxx.yaml specifies the global configuration, and --client_cfg xxx.yaml specifies the client-wise hyper parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5. Save and submit the prediction results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Submission format\n",
    "As stated in the introduction of CIKM 2022 AnalytiCup Competition, participants are required to submit the prediction results for all clients within one csv file. Within the file, each line records one prediction and is identified by `${client_id}` and `${sample_id}`. The `${client_id}` counts from 1 and `${sample_id}` should be consistent with the contest data (You can access it by the attribute data_index).\n",
    "\n",
    "The classification and multi-dimensional regression tasks follow different formats as follows\n",
    "* For classification tasks, each line follows (`${category_id}` counts from 0)\n",
    "\n",
    "```\n",
    "${client_id},${sample_id},${category_id}\n",
    "```\n",
    "\n",
    "For N-dimensional regression task, each line follows\n",
    "```\n",
    "${client_id},${sample_id},${prediction_1st_dimension},…,${prediction_N-th_dimension}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Saving prediction results\n",
    "By FederatedScope\n",
    "The \"cikm22competition\" branch in FederatedScope supports to save prediction results at the end of training. You can refer to code in federatedscope/gfl/trainer/graphtrainer.py and federatedscope/core/trainers/torch_trainer.py.\n",
    "The prediction results will be saved in a csv file named prediction.csv. For the convenience of users to conduct multiple experiments (e.g., for HPO), prediction.csv of each experimental run with a spcific configuration will be placed in this experiment's output directory (specified by outdir), which will be automatically appended with a suffix of timestamp if the specified directory has been there.\n",
    "The training log will report the directory of the prediction results. Taking FedAvg as an example, at the end of training FederatedScope will report the path of prediction results as follows.\n",
    "![Finish saving prediction results](https://img.alicdn.com/imgextra/i1/O1CN01L68gve1nGu3BNp5vL_!!6000000005063-2-tps-4766-404.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Submit prediction results\n",
    "Finally, you can submit your prediction results and get your score in [Tianchi](https://tianchi.aliyun.com/competition/entrance/532008/introduction)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.2 64-bit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  },
  "vscode": {
   "interpreter": {
    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}