diff --git a/examples/benchmarks/README.md b/examples/benchmarks/README.md
index 9dc7bec5e..b1b1be82a 100644
--- a/examples/benchmarks/README.md
+++ b/examples/benchmarks/README.md
@@ -47,9 +47,10 @@ The numbers shown below demonstrate the performance of the entire `workflow` of
| ALSTM (Yao Qin, et al.) | Alpha360 | 0.0497±0.00 | 0.3829±0.04 | 0.0599±0.00 | 0.4736±0.03 | 0.0626±0.02 | 0.8651±0.31 | -0.0994±0.03 |
| LSTM(Sepp Hochreiter, et al.) | Alpha360 | 0.0448±0.00 | 0.3474±0.04 | 0.0549±0.00 | 0.4366±0.03 | 0.0647±0.03 | 0.8963±0.39 | -0.0875±0.02 |
| GRU(Kyunghyun Cho, et al.) | Alpha360 | 0.0493±0.00 | 0.3772±0.04 | 0.0584±0.00 | 0.4638±0.03 | 0.0720±0.02 | 0.9730±0.33 | -0.0821±0.02 |
-| TCTS(Xueqing Wu, et al.) | Alpha360 | 0.0454±0.01 | 0.3457±0.06 | 0.0566±0.01 | 0.4492±0.05 | 0.0744±0.03 | 1.0594±0.41 | -0.0761±0.03 |
| GATs (Petar Velickovic, et al.) | Alpha360 | 0.0476±0.00 | 0.3508±0.02 | 0.0598±0.00 | 0.4604±0.01 | 0.0824±0.02 | 1.1079±0.26 | -0.0894±0.03 |
+| TCTS(Xueqing Wu, et al.) | Alpha360 | 0.0508±0.00 | 0.3931±0.04 | 0.0599±0.00 | 0.4756±0.03 | 0.0893±0.03 | 1.2256±0.36 | -0.0857±0.02 |
| TRA(Hengxu Lin, et al.) | Alpha360 | 0.0485±0.00 | 0.3787±0.03 | 0.0587±0.00 | 0.4756±0.03 | 0.0920±0.03 | 1.2789±0.42 | -0.0834±0.02 |
- The selected 20 features are based on the feature importance of a lightgbm-based model.
- The base model of DoubleEnsemble is LGBM.
+- The base model of TCTS is GRU.
diff --git a/examples/benchmarks/TCTS/README.md b/examples/benchmarks/TCTS/README.md
index ee67ffbeb..0b405c6be 100644
--- a/examples/benchmarks/TCTS/README.md
+++ b/examples/benchmarks/TCTS/README.md
@@ -1,52 +1,38 @@
# Temporally Correlated Task Scheduling for Sequence Learning
-We provide the [code](https://github.com/microsoft/qlib/blob/main/qlib/contrib/model/pytorch_tcts.py) for reproducing the stock trend forecasting experiments.
-
### Background
Sequence learning has attracted much research attention from the machine learning community in recent years. In many applications, a sequence learning task is usually associated with multiple temporally correlated auxiliary tasks, which are different in terms of how much input information to use or which future step to predict. In stock trend forecasting, as demonstrated in Figure1, one can predict the price of a stock in different future days (e.g., tomorrow, the day after tomorrow). In this paper, we propose a framework to make use of those temporally correlated tasks to help each other.
-
-
-
-
-
### Method
-Given that there are usually multiple temporally correlated tasks, the key challenge lies in which tasks to use and when to use them in the training process. In this work, we introduce a learnable task scheduler for sequence learning, which adaptively selects temporally correlated tasks during the training process. The scheduler accesses the model status and the current training data (e.g., in current minibatch), and selects the best auxiliary task to help the training of the main task. The scheduler and the model for the main task are jointly trained through bi-level optimization: the scheduler is trained to maximize the validation performance of the model, and the model is trained to minimize the training loss guided by the scheduler. The process is demonstrated in Figure2.
+Given that there are usually multiple temporally correlated tasks, the key challenge lies in which tasks to use and when to use them in the training process. This work introduces a learnable task scheduler for sequence learning, which adaptively selects temporally correlated tasks during the training process. The scheduler accesses the model status and the current training data (e.g., in the current minibatch) and selects the best auxiliary task to help the training of the main task. The scheduler and the model for the main task are jointly trained through bi-level optimization: the scheduler is trained to maximize the validation performance of the model, and the model is trained to minimize the training loss guided by the scheduler. The process is demonstrated in Figure2.
-At step $s$, with training data $x_s, y_s$, the scheduler $\varphi$ chooses a suitable task $T_{i_s}$ (green solid lines) to update the model $f$ (blue solid lines). After $S$ steps, we evaluate the model $f$ on the validation set and update the scheduler $\varphi$ (green dashed lines).
-
-### DataSet
-* We use the historical transaction data for 300 stocks on [CSI300](http://www.csindex.com.cn/en/indices/index-detail/000300) from 01/01/2008 to 08/01/2020.
-* We split the data into training (01/01/2008-12/31/2013), validation (01/01/2014-12/31/2015), and test sets (01/01/2016-08/01/2020) based on the transaction time.
+At step $s$, with training data $x_s, y_s$, the scheduler $\varphi$ chooses a suitable task $T_{i_s}$ (green solid lines) to update the model $f$ (blue solid lines). After $S$ steps, we evaluate the model $f$ on the validation set and update the scheduler $\varphi$ (green dashed lines).
### Experiments
-#### Task Description
-* The main tasks
(
in Figure1) refers to forecasting return of stock
as following,
+Because the data and Qlib versions differ, the data and preprocessing used for the experiments in the paper differ from those in the current Qlib version. We therefore provide two versions of the code, one per setting: 1) the [code](https://github.com/lwwang1995/tcts) that reproduces the experimental results in the paper, and 2) the [code](https://github.com/microsoft/qlib/blob/main/qlib/contrib/model/pytorch_tcts.py) in the current Qlib baseline.
+
+#### Setting1
+* Dataset: We use the historical transaction data for 300 stocks on [CSI300](http://www.csindex.com.cn/en/indices/index-detail/000300) from 01/01/2008 to 08/01/2020. We split the data into training (01/01/2008-12/31/2013), validation (01/01/2014-12/31/2015), and test sets (01/01/2016-08/01/2020) based on the transaction time.
+
+* The main task $T_k$ refers to forecasting the return of stock $i$ as follows: $r_i^{t,k} = \frac{price_i^{t+k}}{price_i^{t+k-1}} - 1$
-

+
-* Temporally correlated task sets $\mathcal{T}_k = \{T_1, T_2, \ldots, T_k\}$, in this paper, $\mathcal{T}_3$, $\mathcal{T}_5$ and $\mathcal{T}_{10}$ are used.
-#### Baselines
-* GRU/MLP/LightGBM (LGB)/Graph Attention Networks (GAT)
-* Multi-task learning (MTL): In multi-task learning, multiple tasks are jointly trained and mutually boosted. Each task is treated equally, while in our setting, we focus on the main task.
-* Curriculum transfer learning (CL): Transfer learning also leverages auxiliary tasks to boost the main task. [Curriculum transfer learning](https://arxiv.org/pdf/1804.00810.pdf) is one kind of transfer learning which schedules auxiliary tasks according to certain rules. Our problem can also be regarded as a special kind of transfer learning, where the auxiliary tasks are temporally correlated with the main task. Our learning process is dynamically controlled by a scheduler rather than some pre-defined rules. In the CL baseline, we start from the task
, then
, and gradually move to the last one.
-#### Result
-| Methods | $T_1$ | $T_2$ | $T_3$ |
-| :----: | :----: | :----: | :----: |
-| GRU | 0.049 / 1.903 | 0.018 / 1.972 | 0.014 / 1.989 |
-| MLP | 0.023 / 1.961 | 0.022 / 1.962 | 0.015 / 1.978 |
-| LGB | 0.038 / 1.883 | 0.023 / 1.952 | 0.007 / 1.987 |
-| GAT | 0.052 / 1.898 | 0.024 / 1.954 | 0.015 / 1.973 |
-| MTL($\mathcal{T}_3$) | 0.061 / 1.862 | 0.023 / 1.942 | 0.012 / 1.956 |
-| CL($\mathcal{T}_3$) | 0.051 / 1.880 | 0.028 / 1.941 | 0.016 / 1.962 |
-| Ours($\mathcal{T}_3$) | 0.071 / 1.851 | 0.030 / 1.939 | 0.017 / 1.963 |
-| MTL($\mathcal{T}_5$) | 0.057 / 1.875 | 0.021 / 1.939 | 0.017 / 1.959 |
-| CL($\mathcal{T}_5$) | 0.056 / 1.877 | 0.028 / 1.942 | 0.015 / 1.962 |
-| Ours($\mathcal{T}_5$) | 0.075 / 1.849 | 0.032 / 1.939 | 0.021 / 1.955 |
-| MTL($\mathcal{T}_{10}$) | 0.052 / 1.882 | 0.020 / 1.947 | 0.019 / 1.952 |
-| CL($\mathcal{T}_{10}$) | 0.051 / 1.882 | 0.028 / 1.950 | 0.016 / 1.961 |
-| Ours($\mathcal{T}_{10}$) | 0.067 / 1.867 | 0.030 / 1.960 | 0.022 / 1.942 |
\ No newline at end of file
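+* Temporally correlated task sets $\mathcal{T}_k = \{T_1, T_2, \ldots, T_k\}$, in this paper, $\mathcal{T}_3$, $\mathcal{T}_5$ and $\mathcal{T}_{10}$ are used in $T_1$, $T_2$, and $T_3$.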
+
+#### Setting2
+* Dataset: We use the historical transaction data for 300 stocks on [CSI300](http://www.csindex.com.cn/en/indices/index-detail/000300) from 01/01/2008 to 08/01/2020. We split the data into training (01/01/2008-12/31/2014), validation (01/01/2015-12/31/2016), and test sets (01/01/2017-08/01/2020) based on the transaction time.
+
+* The main task $T_k$ refers to forecasting the return of stock $i$ as follows: $r_i^{t,k} = \frac{price_i^{t+1+k}}{price_i^{t+1}} - 1$
+
+

+
+
+* In the Qlib baseline, $\mathcal{T}_3$ is used in $T_1$.
+
+### Experimental Result
+The experimental results for Setting1 are in the [paper](http://proceedings.mlr.press/v139/wu21e/wu21e.pdf), and the results for Setting2 are on this [page](https://github.com/microsoft/qlib/tree/main/examples/benchmarks).
\ No newline at end of file
diff --git a/examples/benchmarks/TCTS/task_description.png b/examples/benchmarks/TCTS/task_description.png
deleted file mode 100644
index 7a9005bf2..000000000
Binary files a/examples/benchmarks/TCTS/task_description.png and /dev/null differ
diff --git a/examples/benchmarks/TCTS/workflow_config_tcts_Alpha360.yaml b/examples/benchmarks/TCTS/workflow_config_tcts_Alpha360.yaml
index cd3bbf59c..484ed45b1 100644
--- a/examples/benchmarks/TCTS/workflow_config_tcts_Alpha360.yaml
+++ b/examples/benchmarks/TCTS/workflow_config_tcts_Alpha360.yaml
@@ -22,9 +22,9 @@ data_handler_config: &data_handler_config
- class: CSRankNorm
kwargs:
fields_group: label
- label: ["Ref($close, -1) / $close - 1",
- "Ref($close, -2) / Ref($close, -1) - 1",
- "Ref($close, -3) / Ref($close, -2) - 1"]
+ label: ["Ref($close, -2) / Ref($close, -1) - 1",
+ "Ref($close, -3) / Ref($close, -1) - 1",
+ "Ref($close, -4) / Ref($close, -1) - 1"]
port_analysis_config: &port_analysis_config
strategy:
class: TopkDropoutStrategy
@@ -53,9 +53,8 @@ task:
d_feat: 6
hidden_size: 64
num_layers: 2
- dropout: 0.0
+ dropout: 0.3
n_epochs: 200
- lr: 1e-3
early_stop: 20
batch_size: 800
metric: loss
@@ -64,12 +63,11 @@ task:
fore_optimizer: adam
weight_optimizer: adam
output_dim: 3
- fore_lr: 5e-4
- weight_lr: 5e-4
+ fore_lr: 2e-3
+ weight_lr: 2e-3
steps: 3
- target_label: 1
+ target_label: 0
lowest_valid_performance: 0.993
- seed: 0
dataset:
class: DatasetH
module_path: qlib.data.dataset
@@ -93,8 +91,7 @@ task:
kwargs:
ana_long_short: False
ann_scaler: 252
- label_col: 1
- class: PortAnaRecord
module_path: qlib.workflow.record_temp
kwargs:
- config: *port_analysis_config
+ config: *port_analysis_config
\ No newline at end of file
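
Note on the `label` change in the config above: the three new expressions all share the base price `Ref($close, -1)` and differ only in the horizon. The following is a minimal sketch in plain Python of what they compute, assuming `Ref($close, -n)` denotes the close price `n` trading days ahead. The helper `forward_returns` and the sample prices are hypothetical and are not part of Qlib.

```python
def forward_returns(close, horizons=(1, 2, 3)):
    """For each day t, compute close[t+1+k] / close[t+1] - 1 for every horizon k.

    Mirrors the three label expressions in the config:
      k=1 -> Ref($close, -2) / Ref($close, -1) - 1
      k=2 -> Ref($close, -3) / Ref($close, -1) - 1
      k=3 -> Ref($close, -4) / Ref($close, -1) - 1
    Entries without enough future data are None.
    """
    rows = []
    for t in range(len(close)):
        base = t + 1  # all horizons start from the next day's close
        row = []
        for k in horizons:
            target = base + k
            if target < len(close):
                row.append(close[target] / close[base] - 1)
            else:
                row.append(None)  # not enough future prices
        rows.append(row)
    return rows


# Hypothetical close prices for one stock over five trading days.
close = [10.0, 11.0, 12.1, 11.0, 13.2]
labels = forward_returns(close)
```

With `target_label: 0`, the first column (the `k=1` return) is the main task and the other two columns act as auxiliary tasks.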
diff --git a/qlib/contrib/model/pytorch_tcts.py b/qlib/contrib/model/pytorch_tcts.py
index da7fda5f5..c0dae98e4 100644
--- a/qlib/contrib/model/pytorch_tcts.py
+++ b/qlib/contrib/model/pytorch_tcts.py
@@ -61,8 +61,9 @@ class TCTS(Model):
weight_lr=5e-7,
steps=3,
GPU=0,
- seed=None,
target_label=0,
+ mode="soft",
+ seed=None,
lowest_valid_performance=0.993,
**kwargs
):
@@ -87,6 +88,7 @@ class TCTS(Model):
self.weight_lr = weight_lr
self.steps = steps
self.target_label = target_label
+ self.mode = mode
self.lowest_valid_performance = lowest_valid_performance
self._fore_optimizer = fore_optimizer
self._weight_optimizer = weight_optimizer
@@ -100,6 +102,8 @@ class TCTS(Model):
"\nn_epochs : {}"
"\nbatch_size : {}"
"\nearly_stop : {}"
+ "\ntarget_label : {}"
+ "\nmode : {}"
"\nloss_type : {}"
"\nvisible_GPU : {}"
"\nuse_GPU : {}"
@@ -111,6 +115,8 @@ class TCTS(Model):
n_epochs,
batch_size,
early_stop,
+ target_label,
+ mode,
loss,
GPU,
self.use_gpu,
@@ -120,9 +126,17 @@ class TCTS(Model):
def loss_fn(self, pred, label, weight):
- loc = torch.argmax(weight, 1)
- loss = (pred - label[np.arange(weight.shape[0]), loc]) ** 2
- return torch.mean(loss)
+ if self.mode == "hard":
+ loc = torch.argmax(weight, 1)
+ loss = (pred - label[np.arange(weight.shape[0]), loc]) ** 2
+ return torch.mean(loss)
+
+ elif self.mode == "soft":
+ loss = (pred - label.transpose(0, 1)) ** 2
+ return torch.mean(loss * weight.transpose(0, 1))
+
+ else:
+ raise NotImplementedError("mode {} is not supported!".format(self.mode))
def train_epoch(self, x_train, y_train, x_valid, y_valid):
@@ -132,6 +146,10 @@ class TCTS(Model):
indices = np.arange(len(x_train_values))
np.random.shuffle(indices)
+ task_embedding = torch.zeros([self.batch_size, self.output_dim])
+ task_embedding[:, self.target_label] = 1
+ task_embedding = task_embedding.to(self.device)
+
init_fore_model = copy.deepcopy(self.fore_model)
for p in init_fore_model.parameters():
p.init_fore_model = False
@@ -155,12 +173,13 @@ class TCTS(Model):
init_pred = init_fore_model(feature)
pred = self.fore_model(feature)
-
dis = init_pred - label.transpose(0, 1)
- weight_feature = torch.cat((feature, dis.transpose(0, 1), label, init_pred.view(-1, 1)), 1)
+ weight_feature = torch.cat(
+ (feature, dis.transpose(0, 1), label, init_pred.view(-1, 1), task_embedding), 1
+ )
weight = self.weight_model(weight_feature)
- loss = self.loss_fn(pred, label, weight) # hard
+ loss = self.loss_fn(pred, label, weight)
self.fore_optimizer.zero_grad()
loss.backward()
@@ -188,11 +207,11 @@ class TCTS(Model):
pred = self.fore_model(feature)
dis = pred - label.transpose(0, 1)
- weight_feature = torch.cat((feature, dis.transpose(0, 1), label, pred.view(-1, 1)), 1)
+ weight_feature = torch.cat((feature, dis.transpose(0, 1), label, pred.view(-1, 1), task_embedding), 1)
weight = self.weight_model(weight_feature)
loc = torch.argmax(weight, 1)
- valid_loss = torch.mean((pred - label[:, 0]) ** 2)
- loss = torch.mean(-valid_loss * torch.log(weight[np.arange(weight.shape[0]), loc]))
+ valid_loss = torch.mean((pred - label[:, abs(self.target_label)]) ** 2)
+ loss = torch.mean(valid_loss * torch.log(weight[np.arange(weight.shape[0]), loc]))
self.weight_optimizer.zero_grad()
loss.backward()
@@ -207,7 +226,6 @@ class TCTS(Model):
self.fore_model.eval()
- scores = []
losses = []
indices = np.arange(len(x_values))
@@ -277,7 +295,7 @@ class TCTS(Model):
dropout=self.dropout,
)
self.weight_model = MLPModel(
- d_feat=360 + 2 * self.output_dim + 1,
+ d_feat=360 + 3 * self.output_dim + 1,
hidden_size=self.hidden_size,
num_layers=self.num_layers,
dropout=self.dropout,
@@ -303,8 +321,6 @@ class TCTS(Model):
best_loss = np.inf
best_epoch = 0
stop_round = 0
- fore_best_param = copy.deepcopy(self.fore_optimizer.state_dict())
- weight_best_param = copy.deepcopy(self.weight_optimizer.state_dict())
for epoch in range(self.n_epochs):
print("Epoch:", epoch)
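
Note on the `loss_fn` change in `pytorch_tcts.py` above: the `hard` mode trains on the single task with the largest scheduler weight, while the `soft` mode weights the squared error of every task by the scheduler output. The following is a minimal NumPy sketch of that distinction, assuming `pred` has shape `(batch,)` and `label` and `weight` have shape `(batch, n_tasks)`. The function name `tcts_loss` and the sample arrays are hypothetical. It mirrors the PyTorch code in the diff but is not the Qlib implementation.

```python
import numpy as np


def tcts_loss(pred, label, weight, mode="soft"):
    """Mean squared error guided by per-sample task weights."""
    if mode == "hard":
        # Pick, for each sample, the task with the largest weight.
        loc = weight.argmax(axis=1)
        chosen = label[np.arange(len(pred)), loc]
        return float(np.mean((pred - chosen) ** 2))
    if mode == "soft":
        # Squared error against every task, scaled by its weight,
        # then averaged over all (sample, task) pairs.
        sq_err = (pred[:, None] - label) ** 2
        return float(np.mean(sq_err * weight))
    raise NotImplementedError("mode {} is not supported!".format(mode))


# Hypothetical batch of two samples and two tasks.
pred = np.array([0.1, 0.2])
label = np.array([[0.1, 0.3], [0.0, 0.2]])
weight = np.array([[0.8, 0.2], [0.3, 0.7]])

hard_loss = tcts_loss(pred, label, weight, mode="hard")
soft_loss = tcts_loss(pred, label, weight, mode="soft")
```

In the `soft` mode every task contributes to the gradient in proportion to its weight, whereas the `argmax` in the `hard` mode discards all but one task per sample.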