diff --git a/README.md b/README.md
index 35028328f..d9805b621 100644
--- a/README.md
+++ b/README.md
@@ -229,8 +229,11 @@ It also provides the API to run specific models at once. For more use cases, ple
 
 # Quant Dataset Zoo
 Dataset plays a very important role in Quant. Here is a list of the datasets built on `Qlib`.
-- [Alpha360](./qlib/contrib/data/handler.py)
-- [Alpha158](./qlib/contrib/data/handler.py)
+
+| Dataset | US Market | China Market |
+| -- | -- | -- |
+| [Alpha360](./qlib/contrib/data/handler.py) | √ | √ |
+| [Alpha158](./qlib/contrib/data/handler.py) | √ | √ |
 
 [Here](https://qlib.readthedocs.io/en/latest/advanced/alpha.html) is a tutorial to build dataset with `Qlib`.
 Your PR to build new Quant dataset is highly welcomed.
diff --git a/docs/component/workflow.rst b/docs/component/workflow.rst
index 4ca010851..c44f1100f 100644
--- a/docs/component/workflow.rst
+++ b/docs/component/workflow.rst
@@ -19,9 +19,10 @@ With ``qrun``, user can easily run an `experiment`, which includes the following
     - Processing
     - Slicing
 - Model
-    - Training and inference (static or rolling)
+    - Training and inference
     - Saving & loading
 - Evaluation
+    - Forecast signal analysis
     - Backtest
 
 For each `experiment`, ``Qlib`` has a complete system to tracking all the information as well as artifacts generated during training, inference and evaluation phase. For more information about how Qlib handles `experiment`, please refer to the related document: `Recorder: Experiment Management <../component/recorder.html>`_.
@@ -276,4 +277,4 @@ Here is the configuration details of different `Record Template` such as ``Signa
               kwargs:
                 config: *port_analysis_config
 
-For more information about the ``Record`` module in ``Qlib``, user can refer to the related document: `Record <../component/recorder.html#record-template>`_.
\ No newline at end of file
+For more information about the ``Record`` module in ``Qlib``, user can refer to the related document: `Record <../component/recorder.html#record-template>`_.
diff --git a/docs/introduction/quick.rst b/docs/introduction/quick.rst
index 55835b970..f228ce2af 100644
--- a/docs/introduction/quick.rst
+++ b/docs/introduction/quick.rst
@@ -61,7 +61,7 @@ Auto Quant Research Workflow
 
 
 - Workflow result
-    The result of ``qrun`` is as follows, which is also the result of ``Intraday Trading``. Please refer to `Intraday Trading <../component/backtest.html>`_. for more details about the result.
+    The result of ``qrun`` is as follows, which is also the typical result of ``Forecast model(alpha)``. Please refer to `Intraday Trading <../component/backtest.html>`_. for more details about the result.
 
     .. code-block:: python
 
@@ -91,4 +91,4 @@ Auto Quant Research Workflow
 Custom Model Integration
 ===============================================
 
-``Qlib`` provides several models such as ``lightGBM`` and ``MLP`` model as the baseline of ``Interday Model``. In addition to the default model, users can integrate their own custom models into ``Qlib``. If users are interested in the custom model, please refer to `Custom Model Integration <../start/integration.html>`_.
+``Qlib`` provides a batch of models (such as ``lightGBM`` and ``MLP`` models) as examples of ``Interday Model``. In addition to the default model, users can integrate their own custom models into ``Qlib``. If users are interested in the custom model, please refer to `Custom Model Integration <../start/integration.html>`_.
diff --git a/docs/start/initialization.rst b/docs/start/initialization.rst
index 5615556b6..05a329df7 100644
--- a/docs/start/initialization.rst
+++ b/docs/start/initialization.rst
@@ -63,13 +63,14 @@ Besides `provider_uri` and `region`, `qlib.init` has other parameters. The follo
         If Qlib fails to connect redis via `redis_host` and `redis_port`, cache mechanism will not be used! Please refer to `Cache <../component/data.html#cache>`_ for details.
 - `exp_manager`
     Type: dict, optional parameter, the setting of `experiment manager` to be used in qlib. Users can specify an experiment manager class, as well as the tracking URI for all the experiments. However, please be aware that we only support input of a dictionary in the following style for `exp_manager`. For more information about `exp_manager`, users can refer to `Recorder: Experiment Management <../component/recorder.html>`_.
-    ::
+    .. code-block:: Python
 
-        {
+        # For example, if you want to set your tracking_uri to a , you can initialize qlib below
+        qlib.init(provider_uri=provider_uri, region=REG_CN, exp_manager= {
             "class": "MLflowExpManager",
             "module_path": "qlib.workflow.expm",
             "kwargs": {
                 "uri": "python_execution_path/mlruns",
                 "default_exp_name": "Experiment",
             }
-        }
\ No newline at end of file
+        })
diff --git a/docs/start/integration.rst b/docs/start/integration.rst
index 437c5ef6a..e36805c01 100644
--- a/docs/start/integration.rst
+++ b/docs/start/integration.rst
@@ -5,7 +5,7 @@ Custom Model Integration
 Introduction
 ===================
 
-``Qlib``'s `Model Zoo` includes models such as ``LightGBM``, ``MLP``, ``LSTM``, etc.. These models are treated as the baselines of ``Interday Model``. In addition to the default models ``Qlib`` provide, users can integrate their own custom models into ``Qlib``.
+``Qlib``'s `Model Zoo` includes models such as ``LightGBM``, ``MLP``, ``LSTM``, etc.. These models are examples of ``Interday Model``. In addition to the default models ``Qlib`` provide, users can integrate their own custom models into ``Qlib``.
 
 Users can integrate their own custom models according to the following steps.
 
@@ -87,6 +87,7 @@ The Custom models need to inherit `qlib.model.base.Model <../reference/api.html#
     .. code-block:: Python
 
         def finetune(self, dataset: DatasetH, num_boost_round=10, verbose_eval=20):
+            # Based on existing model and finetune by train more rounds
             dtrain, _ = self._prepare_data(dataset)
             self.model = lgb.train(
                 self.params,
@@ -101,7 +102,7 @@ The Custom models need to inherit `qlib.model.base.Model <../reference/api.html#
 Configuration File
 =======================
 
-The configuration file is described in detail in the `Workflow <../component/workflow.html#complete-example>`_ document. In order to integrate the custom model into ``Qlib``, users need to modify the "model" field in the configuration file.
+The configuration file is described in detail in the `Workflow <../component/workflow.html#complete-example>`_ document. In order to integrate the custom model into ``Qlib``, users need to modify the "model" field in the configuration file. The configuration describes which models to use and how we can initialize it.
 
 - Example: The following example describes the `model` field of configuration file about the custom lightgbm model mentioned above, where `module_path` is the module path, `class` is the class name, and `args` is the hyperparameter passed into the __init__ method. All parameters in the field is passed to `self._params` by `\*\*kwargs` in `__init__` except `loss = mse`.
 
diff --git a/examples/workflow_by_code.ipynb b/examples/workflow_by_code.ipynb
index d5711e0b5..5a992e339 100644
--- a/examples/workflow_by_code.ipynb
+++ b/examples/workflow_by_code.ipynb
@@ -1,11 +1,11 @@
 {
  "cells": [
   {
+   "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "\"Open"
-   ],
-   "cell_type": "markdown",
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
@@ -28,16 +28,17 @@
     "import sys, site\n",
     "from pathlib import Path\n",
     "\n",
-    "TEMP_CODE_DIR = str(Path(\"~/tmp/qlib_code\").expanduser().resolve())\n",
     "\n",
     "try:\n",
     "    import qlib\n",
-    "    scripts_dir = Path.cwd().parent.joinpath(\"scripts\")\n",
     "except ImportError:\n",
     "    # install qlib\n",
     "    ! pip install pyqlib\n",
     "    # reload\n",
     "    site.main()\n",
+    "\n",
+    "scripts_dir = Path.cwd().parent.joinpath(\"scripts\")\n",
+    "if not scripts_dir.joinpath(\"get_data.py\").exists():\n",
     "    # download get_data.py script\n",
     "    scripts_dir = Path(\"~/tmp/qlib_code/scripts\").expanduser().resolve()\n",
     "    scripts_dir.mkdir(parents=True, exist_ok=True)\n",
@@ -376,4 +377,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
\ No newline at end of file
+}
diff --git a/examples/workflow_by_code_finetune.py b/examples/workflow_by_code_finetune.py
deleted file mode 100644
index 5e7c179ae..000000000
--- a/examples/workflow_by_code_finetune.py
+++ /dev/null
@@ -1,128 +0,0 @@
-#  Copyright (c) Microsoft Corporation.
-#  Licensed under the MIT License.
-
-import sys
-from pathlib import Path
-
-import qlib
-import pandas as pd
-from qlib.config import REG_CN
-from qlib.contrib.model.gbdt import LGBModel
-from qlib.contrib.data.handler import Alpha158
-from qlib.contrib.strategy.strategy import TopkDropoutStrategy
-from qlib.contrib.evaluate import (
-    backtest as normal_backtest,
-    risk_analysis,
-)
-from qlib.utils import exists_qlib_data, init_instance_by_config
-from qlib.workflow import R
-from qlib.workflow.record_temp import SignalRecord, PortAnaRecord
-
-
-if __name__ == "__main__":
-
-    # use default data
-    provider_uri = "~/.qlib/qlib_data/cn_data"  # target_dir
-    if not exists_qlib_data(provider_uri):
-        print(f"Qlib data is not found in {provider_uri}")
-        sys.path.append(str(Path(__file__).resolve().parent.parent.joinpath("scripts")))
-        from get_data import GetData
-
-        GetData().qlib_data(target_dir=provider_uri, region=REG_CN)
-
-    qlib.init(provider_uri=provider_uri, region=REG_CN)
-
-    market = "csi300"
-    benchmark = "SH000300"
-
-    ###################################
-    # train model
-    ###################################
-    data_handler_config = {
-        "start_time": "2008-01-01",
-        "end_time": "2020-08-01",
-        "fit_start_time": "2008-01-01",
-        "fit_end_time": "2014-12-31",
-        "instruments": market,
-    }
-
-    task = {
-        "model": {
-            "class": "LGBModel",
-            "module_path": "qlib.contrib.model.gbdt",
-            "kwargs": {
-                "loss": "mse",
-                "colsample_bytree": 0.8879,
-                "learning_rate": 0.0421,
-                "subsample": 0.8789,
-                "lambda_l1": 205.6999,
-                "lambda_l2": 580.9768,
-                "max_depth": 8,
-                "num_leaves": 210,
-                "num_threads": 20,
-            },
-        },
-        "dataset": {
-            "class": "DatasetH",
-            "module_path": "qlib.data.dataset",
-            "kwargs": {
-                "handler": {
-                    "class": "Alpha158",
-                    "module_path": "qlib.contrib.data.handler",
-                    "kwargs": data_handler_config,
-                },
-                "segments": {
-                    "train": ("2008-01-01", "2014-12-31"),
-                    "valid": ("2015-01-01", "2016-12-31"),
-                    "test": ("2017-01-01", "2020-08-01"),
-                },
-            },
-        },
-    }
-
-    port_analysis_config = {
-        "strategy": {
-            "class": "TopkDropoutStrategy",
-            "module_path": "qlib.contrib.strategy.strategy",
-            "kwargs": {
-                "topk": 50,
-                "n_drop": 5,
-            },
-        },
-        "backtest": {
-            "verbose": False,
-            "limit_threshold": 0.095,
-            "account": 100000000,
-            "benchmark": benchmark,
-            "deal_price": "close",
-            "open_cost": 0.0005,
-            "close_cost": 0.0015,
-            "min_cost": 5,
-        },
-    }
-
-    # model initiaiton
-    model = init_instance_by_config(task["model"])
-    dataset = init_instance_by_config(task["dataset"])
-
-    # start exp to train init model
-    with R.start(experiment_name="init models"):
-        model.fit(dataset)
-        R.save_objects(init_model=model)
-        rid = R.get_recorder().id
-
-    # Finetune model based on previous trained model
-    with R.start(experiment_name="finetune model"):
-        recorder = R.get_recorder(rid, experiment_name="init models")
-        model = recorder.load_object("init_model")
-        model.finetune(dataset, num_boost_round=10)
-        R.save_objects(model=model)
-
-        # prediction
-        recorder = R.get_recorder()
-        sr = SignalRecord(model, dataset, recorder)
-        sr.generate()
-
-        # backtest
-        par = PortAnaRecord(recorder, port_analysis_config)
-        par.generate()
diff --git a/qlib/__init__.py b/qlib/__init__.py
index 3fecc85c3..2b8989303 100644
--- a/qlib/__init__.py
+++ b/qlib/__init__.py
@@ -2,7 +2,7 @@
 # Licensed under the MIT License.
 
 
-__version__ = "0.5.1.dev0"
+__version__ = "0.6.0.alpha"
 
 import os
 import re
diff --git a/qlib/contrib/model/catboost_model.py b/qlib/contrib/model/catboost_model.py
index 01830d1b5..d57c32b70 100644
--- a/qlib/contrib/model/catboost_model.py
+++ b/qlib/contrib/model/catboost_model.py
@@ -1,14 +1,5 @@
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
 
 import numpy as np
 import pandas as pd
diff --git a/qlib/contrib/model/gbdt.py b/qlib/contrib/model/gbdt.py
index e52c05906..058d9a0e3 100644
--- a/qlib/contrib/model/gbdt.py
+++ b/qlib/contrib/model/gbdt.py
@@ -80,6 +80,7 @@ class LGBModel(ModelFT):
         verbose_eval : int
             verbose level
         """
+        # Based on existing model and finetune by train more rounds
         dtrain, _ = self._prepare_data(dataset)
         self.model = lgb.train(
             self.params,
diff --git a/qlib/contrib/model/pytorch_sfm.py b/qlib/contrib/model/pytorch_sfm.py
index 8fddd1612..228c0aee5 100644
--- a/qlib/contrib/model/pytorch_sfm.py
+++ b/qlib/contrib/model/pytorch_sfm.py
@@ -1,15 +1,6 @@
 # Copyright (c) Microsoft Corporation.
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Licensed under the MIT License.
+
 
 from __future__ import division
 from __future__ import print_function
diff --git a/qlib/contrib/model/xgboost.py b/qlib/contrib/model/xgboost.py
index c9e45d4ac..ba2e5789b 100755
--- a/qlib/contrib/model/xgboost.py
+++ b/qlib/contrib/model/xgboost.py
@@ -1,14 +1,5 @@
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
 
 import numpy as np
 import pandas as pd
diff --git a/qlib/model/base.py b/qlib/model/base.py
index fd220cd7e..c9bef1152 100644
--- a/qlib/model/base.py
+++ b/qlib/model/base.py
@@ -56,6 +56,23 @@ class ModelFT(Model):
     def finetune(self, dataset: Dataset):
         """finetune model based given dataset
 
+        A typical use case of finetuning model with qlib.workflow.R
+
+        .. code-block:: python
+
+            # start exp to train init model
+            with R.start(experiment_name="init models"):
+                model.fit(dataset)
+                R.save_objects(init_model=model)
+                rid = R.get_recorder().id
+
+            # Finetune model based on previous trained model
+            with R.start(experiment_name="finetune model"):
+                recorder = R.get_recorder(rid, experiment_name="init models")
+                model = recorder.load_object("init_model")
+                model.finetune(dataset, num_boost_round=10)
+
+
         Parameters
         ----------
         dataset : Dataset
diff --git a/setup.py b/setup.py
index 3438781b2..0696a766f 100644
--- a/setup.py
+++ b/setup.py
@@ -12,7 +12,7 @@ from setuptools import find_packages, setup, Extension
 NAME = "pyqlib"
 DESCRIPTION = "A Quantitative-research Platform"
 REQUIRES_PYTHON = ">=3.5.0"
-VERSION = "0.5.1.dev0"
+VERSION = "0.6.0.alpha"
 
 # Detect Cython
 try: