mirror of
https://github.com/microsoft/qlib.git
synced 2026-06-29 17:11:20 +08:00
154 lines
6.1 KiB
ReStructuredText
.. _model:

============================================
Interday Model: Model Training & Prediction
============================================

Introduction
===================

``Interday Model`` is designed to generate `prediction scores` for stocks. Users can run the ``Interday Model`` in an automated workflow with ``qrun``; please refer to `Workflow: Workflow Management <workflow.html>`_.

Because the components in ``Qlib`` are loosely coupled, ``Interday Model`` can also be used as an independent module.

Base Class & Interface
======================

``Qlib`` provides a base class `qlib.model.base.Model <../reference/api.html#module-qlib.model.base>`_ from which all models should inherit.

The base class provides the following interfaces:

- `__init__(**kwargs)`
    - Initialization.

- `fit(self, dataset, **kwargs)`
    - Train model.
    - Parameter:
        - `dataset`, ``Qlib``'s ``DatasetH`` type. For more information about ``DatasetH``, users can refer to the related document: `Qlib Dataset <../component/data.html#dataset>`_.
          The `dataset` is passed directly to the model's method because each model may need its own data preprocessing; this gives each model maximum flexibility to prepare the data in the way that suits it.
          The following code example shows how to retrieve `x_train`, `y_train` and `w_train` from the `dataset`:

          .. code-block:: Python

              import numpy as np
              import pandas as pd
              from qlib.data.dataset.handler import DataHandlerLP

              # get features and labels
              df_train, df_valid = dataset.prepare(
                  ["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
              )
              x_train, y_train = df_train["feature"], df_train["label"]
              x_valid, y_valid = df_valid["feature"], df_valid["label"]

              # get weights; fall back to uniform weights if the dataset has no "weight" column
              try:
                  wdf_train, wdf_valid = dataset.prepare(["train", "valid"], col_set=["weight"], data_key=DataHandlerLP.DK_L)
                  w_train, w_valid = wdf_train["weight"], wdf_valid["weight"]
              except KeyError:
                  w_train = pd.DataFrame(np.ones_like(y_train.values), index=y_train.index)
                  w_valid = pd.DataFrame(np.ones_like(y_valid.values), index=y_valid.index)

- `predict(self, dataset, **kwargs)`
    - Predict test data.
    - Parameter:
        - `dataset`, ``Qlib``'s ``DatasetH`` type. The usage is similar to the example above.
    - Returns:
        - Prediction results with type: `pandas.Series`.

- `finetune(self, dataset, **kwargs)`
    - Finetune the model.
    - Parameter:
        - `dataset`, ``Qlib``'s ``DatasetH`` type. The usage is similar to the example above.

For the complete interface, including `finetune`, please refer to `Model API <../reference/api.html#module-qlib.model.base>`_.

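
As a rough illustration of the `fit` / `predict` contract described above, the following sketch trains a trivial model that always predicts the mean training label. It is not taken from ``Qlib``: ``ToyDataset`` and ``MeanModel`` are hypothetical names, ``ToyDataset`` only imitates the ``prepare`` call with plain Python lists, and the ``segment`` argument of ``predict`` is an assumption made for this sketch. A real model would inherit from ``qlib.model.base.Model``, receive a ``DatasetH`` and return a `pandas.Series`.

.. code-block:: python

    from statistics import mean


    class ToyDataset:
        """Hypothetical stand-in for ``DatasetH``; holds rows per segment."""

        def __init__(self, segments):
            # segments maps a segment name to a list of (feature, label) pairs
            self._segments = segments

        def prepare(self, segment, col_set="feature"):
            rows = self._segments[segment]
            if col_set == "feature":
                return [feature for feature, _ in rows]
            if col_set == "label":
                return [label for _, label in rows]
            raise KeyError(col_set)


    class MeanModel:
        """Hypothetical model: predicts the mean label seen during training."""

        def __init__(self, **kwargs):
            self.mean_label = None

        def fit(self, dataset, **kwargs):
            # The model decides for itself which data to pull from the dataset.
            self.mean_label = mean(dataset.prepare("train", col_set="label"))
            return self

        def predict(self, dataset, segment="test", **kwargs):
            if self.mean_label is None:
                raise ValueError("model is not fitted yet")
            features = dataset.prepare(segment, col_set="feature")
            # One score per row of the requested segment.
            return [self.mean_label for _ in features]


    toy = ToyDataset({
        "train": [(0.1, 1.0), (0.2, 3.0)],
        "test": [(0.3, None), (0.4, None), (0.5, None)],
    })
    scores = MeanModel().fit(toy).predict(toy)

The point of the sketch is that the dataset object, not pre-extracted arrays, crosses the interface, so each model chooses the segments and columns it needs.
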
Example
==================

``Qlib``'s `Model Zoo` includes models such as ``LightGBM``, ``DNN`` and ``LSTM``. These models are treated as the baselines of ``Interday Model``. The following steps show how to run ``LightGBM`` as an independent module.

- Initialize ``Qlib`` with `qlib.init` first; please refer to `Initialization <../start/initialization.html>`_.
- Run the following code to get the `prediction score` `pred_score`:

    .. code-block:: Python

        from qlib.contrib.model.gbdt import LGBModel
        from qlib.contrib.data.handler import Alpha158
        from qlib.utils import init_instance_by_config, flatten_dict
        from qlib.workflow import R
        from qlib.workflow.record_temp import SignalRecord, PortAnaRecord

        market = "csi300"
        benchmark = "SH000300"

        data_handler_config = {
            "start_time": "2008-01-01",
            "end_time": "2020-08-01",
            "fit_start_time": "2008-01-01",
            "fit_end_time": "2014-12-31",
            "instruments": market,
        }

        task = {
            "model": {
                "class": "LGBModel",
                "module_path": "qlib.contrib.model.gbdt",
                "kwargs": {
                    "loss": "mse",
                    "colsample_bytree": 0.8879,
                    "learning_rate": 0.0421,
                    "subsample": 0.8789,
                    "lambda_l1": 205.6999,
                    "lambda_l2": 580.9768,
                    "max_depth": 8,
                    "num_leaves": 210,
                    "num_threads": 20,
                },
            },
            "dataset": {
                "class": "DatasetH",
                "module_path": "qlib.data.dataset",
                "kwargs": {
                    "handler": {
                        "class": "Alpha158",
                        "module_path": "qlib.contrib.data.handler",
                        "kwargs": data_handler_config,
                    },
                    "segments": {
                        "train": ("2008-01-01", "2014-12-31"),
                        "valid": ("2015-01-01", "2016-12-31"),
                        "test": ("2017-01-01", "2020-08-01"),
                    },
                },
            },
        }

        # model initialization
        model = init_instance_by_config(task["model"])
        dataset = init_instance_by_config(task["dataset"])

        # start the experiment
        with R.start(experiment_name="workflow"):
            # train
            R.log_params(**flatten_dict(task))
            model.fit(dataset)

            # prediction
            recorder = R.get_recorder()
            sr = SignalRecord(model, dataset, recorder)
            sr.generate()

.. note::

    `Alpha158` is the data handler provided by ``Qlib``; please refer to `Data Handler <data.html#data-handler>`_.
    `SignalRecord` is the `Record Template` in ``Qlib``; please refer to `Workflow <recorder.html#record-template>`_.

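
The ``class`` / ``module_path`` / ``kwargs`` dictionaries in ``task`` describe which object to build rather than building it directly. The sketch below shows the general idea using only the standard library. ``build_from_config`` is a hypothetical helper written for illustration; it is not ``Qlib``'s ``init_instance_by_config``, whose actual behavior may differ.

.. code-block:: python

    import importlib


    def build_from_config(config):
        """Hypothetical helper: import ``module_path`` and call ``class`` with ``kwargs``."""
        module = importlib.import_module(config["module_path"])
        cls = getattr(module, config["class"])
        return cls(**config.get("kwargs", {}))


    # A standard-library class is used so the sketch runs without Qlib installed.
    config = {
        "class": "Fraction",
        "module_path": "fractions",
        "kwargs": {"numerator": 3, "denominator": 4},
    }
    obj = build_from_config(config)

Describing objects as plain dictionaries keeps the whole task serializable, which is presumably why the example can log it with ``R.log_params``.
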
The ``LightGBM`` example above is also available in ``examples/train_backtest_analyze.ipynb``.

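
The ``train``, ``valid`` and ``test`` segments in the example are consecutive and do not overlap, so no date is shared between training and evaluation. A small check along the following lines could catch accidental overlaps before a run. ``check_segments`` is a hypothetical helper, not part of ``Qlib``, and it assumes ISO-formatted date strings as used in the example.

.. code-block:: python

    from datetime import date


    def check_segments(segments, order=("train", "valid", "test")):
        """Hypothetical helper: raise ValueError if segments are reversed or overlap."""
        previous_end = None
        for name in order:
            start, end = (date.fromisoformat(d) for d in segments[name])
            if start > end:
                raise ValueError(f"{name}: start {start} is after end {end}")
            if previous_end is not None and start <= previous_end:
                raise ValueError(f"{name}: starts on or before the previous segment ends")
            previous_end = end
        return True


    segments = {
        "train": ("2008-01-01", "2014-12-31"),
        "valid": ("2015-01-01", "2016-12-31"),
        "test": ("2017-01-01", "2020-08-01"),
    }
    segments_ok = check_segments(segments)
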
Custom Model
===================

``Qlib`` supports custom models. Users who want to build their own models and integrate them into ``Qlib`` can refer to `Custom Model Integration <../start/integration.html>`_.

API
===================
Please refer to `Model API <../reference/api.html#module-qlib.model.base>`_.