Merge

2026-07-04 03:21:00 +08:00 · 2020-11-28 16:38:31 +08:00
parent 680e8f9260 3f47a282cc
commit 1aeff4f1e2
14 changed files with 50 additions and 180 deletions
--- a/README.md
+++ b/README.md
@@ -229,8 +229,11 @@ It also provides the API to run specific models at once. For more use cases, ple

 # Quant Dataset Zoo
 Dataset plays a very important role in Quant. Here is a list of the datasets built on `Qlib`.
- [Alpha360](./qlib/contrib/data/handler.py)
- [Alpha158](./qlib/contrib/data/handler.py)
+
+| Dataset                                    | US Market | China Market |
+| --                                         | --        | --           |
+| [Alpha360](./qlib/contrib/data/handler.py) |  √        |  √           |
+| [Alpha158](./qlib/contrib/data/handler.py) |  √        |  √           | 

 [Here](https://qlib.readthedocs.io/en/latest/advanced/alpha.html) is a tutorial to build dataset with `Qlib`.
 Your PR to build new Quant dataset is highly welcomed.
--- a/docs/component/workflow.rst
+++ b/docs/component/workflow.rst
@@ -19,9 +19,10 @@ With ``qrun``, user can easily run an `experiment`, which includes the following
    - Processing
    - Slicing
 - Model
-    - Training and inference (static or rolling)
+    - Training and inference
    - Saving & loading
 - Evaluation
+    - Forecast signal analysis
    - Backtest

 For each `experiment`, ``Qlib`` has a complete system to tracking all the information as well as artifacts generated during training, inference and evaluation phase. For more information about how Qlib handles `experiment`, please refer to the related document: `Recorder: Experiment Management <../component/recorder.html>`_.
@@ -276,4 +277,4 @@ Here is the configuration details of different `Record Template` such as ``Signa
          kwargs: 
            config: *port_analysis_config

-For more information about the ``Record`` module in ``Qlib``, user can refer to the related document: `Record <../component/recorder.html#record-template>`_.
+For more information about the ``Record`` module in ``Qlib``, user can refer to the related document: `Record <../component/recorder.html#record-template>`_.
--- a/docs/introduction/quick.rst
+++ b/docs/introduction/quick.rst
@@ -61,7 +61,7 @@ Auto Quant Research Workflow


    - Workflow result
-        The result of ``qrun`` is as follows, which is also the result of ``Intraday Trading``. Please refer to  `Intraday Trading <../component/backtest.html>`_. for more details about the result.
+        The result of ``qrun`` is as follows, which is also the typical result of ``Forecast model(alpha)``. Please refer to `Intraday Trading <../component/backtest.html>`_ for more details about the result.

        .. code-block:: python
        
@@ -91,4 +91,4 @@ Auto Quant Research Workflow
 Custom Model Integration
 ===============================================

-``Qlib`` provides several models such as ``lightGBM`` and ``MLP`` model as the baseline of ``Interday Model``. In addition to the default model, users can integrate their own custom models into ``Qlib``. If users are interested in the custom model, please refer to `Custom Model Integration <../start/integration.html>`_.
+``Qlib`` provides a set of models (such as ``LightGBM`` and ``MLP``) as examples of ``Interday Model``. In addition to the default models, users can integrate their own custom models into ``Qlib``. For details on custom models, please refer to `Custom Model Integration <../start/integration.html>`_.
--- a/docs/start/initialization.rst
+++ b/docs/start/initialization.rst
@@ -63,13 +63,14 @@ Besides `provider_uri` and `region`, `qlib.init` has other parameters. The follo
        If Qlib fails to connect redis via `redis_host` and `redis_port`, cache mechanism will not be used! Please refer to `Cache <../component/data.html#cache>`_ for details.
 - `exp_manager`
    Type: dict, optional parameter, the setting of `experiment manager` to be used in qlib. Users can specify an experiment manager class, as well as the tracking URI for all the experiments. However, please be aware that we only support input of a dictionary in the following style for `exp_manager`. For more information about `exp_manager`, users can refer to `Recorder: Experiment Management <../component/recorder.html>`_.
-    ::
+    .. code-block:: Python

-        {
+        # For example, to set the tracking URI to a <specific folder>, initialize qlib as follows
+        qlib.init(provider_uri=provider_uri, region=REG_CN, exp_manager={
            "class": "MLflowExpManager",
            "module_path": "qlib.workflow.expm",
            "kwargs": {
                "uri": "python_execution_path/mlruns",
                "default_exp_name": "Experiment",
            }
-        }
+        })
--- a/docs/start/integration.rst
+++ b/docs/start/integration.rst
@@ -5,7 +5,7 @@ Custom Model Integration
 Introduction
 ===================

-``Qlib``'s `Model Zoo` includes models such as ``LightGBM``, ``MLP``, ``LSTM``, etc.. These models are treated as the baselines of ``Interday Model``. In addition to the default models ``Qlib`` provide, users can integrate their own custom models into ``Qlib``.
+``Qlib``'s `Model Zoo` includes models such as ``LightGBM``, ``MLP``, ``LSTM``, etc. These models are examples of ``Interday Model``. In addition to the default models that ``Qlib`` provides, users can integrate their own custom models into ``Qlib``.

 Users can integrate their own custom models according to the following steps.

@@ -87,6 +87,7 @@ The Custom models need to inherit `qlib.model.base.Model <../reference/api.html#
    .. code-block:: Python

        def finetune(self, dataset: DatasetH, num_boost_round=10, verbose_eval=20):
+            # Fine-tune the existing model by training it for more rounds
            dtrain, _ = self._prepare_data(dataset)
            self.model = lgb.train(
                self.params,
@@ -101,7 +102,7 @@ The Custom models need to inherit `qlib.model.base.Model <../reference/api.html#
 Configuration File
 =======================

-The configuration file is described in detail in the `Workflow <../component/workflow.html#complete-example>`_ document. In order to integrate the custom model into ``Qlib``, users need to modify the "model" field in the configuration file.
+The configuration file is described in detail in the `Workflow <../component/workflow.html#complete-example>`_ document. In order to integrate the custom model into ``Qlib``, users need to modify the "model" field in the configuration file. The configuration describes which model to use and how to initialize it.

 - Example: The following example describes the `model` field of configuration file about the custom lightgbm model mentioned above, where `module_path` is the module path, `class` is the class name, and `args` is the hyperparameter passed into the __init__ method. All parameters in the field is passed to `self._params` by `\*\*kwargs` in `__init__` except `loss = mse`. 

--- a/examples/workflow_by_code.ipynb
+++ b/examples/workflow_by_code.ipynb
@@ -1,11 +1,11 @@
 {
 "cells": [
  {
+   "cell_type": "markdown",
+   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/microsoft/qlib/blob/main/examples/workflow_by_code.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
-   ],
-   "cell_type": "markdown",
-   "metadata": {}
+   ]
  },
  {
   "cell_type": "code",
@@ -28,16 +28,17 @@
    "import sys, site\n",
    "from pathlib import Path\n",
    "\n",
-    "TEMP_CODE_DIR = str(Path(\"~/tmp/qlib_code\").expanduser().resolve())\n",
    "\n",
    "try:\n",
    "    import qlib\n",
-    "    scripts_dir = Path.cwd().parent.joinpath(\"scripts\")\n",
    "except ImportError:\n",
    "    # install qlib\n",
    "    ! pip install pyqlib\n",
    "    # reload\n",
    "    site.main()\n",
+    "\n",
+    "scripts_dir = Path.cwd().parent.joinpath(\"scripts\")\n",
+    "if not scripts_dir.joinpath(\"get_data.py\").exists():\n",
    "    # download get_data.py script\n",
    "    scripts_dir = Path(\"~/tmp/qlib_code/scripts\").expanduser().resolve()\n",
    "    scripts_dir.mkdir(parents=True, exist_ok=True)\n",
@@ -376,4 +377,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}
--- a/examples/workflow_by_code_finetune.py
+++ b/examples/workflow_by_code_finetune.py
@@ -1,128 +0,0 @@
-#  Copyright (c) Microsoft Corporation.
-#  Licensed under the MIT License.
-
-import sys
-from pathlib import Path
-
-import qlib
-import pandas as pd
-from qlib.config import REG_CN
-from qlib.contrib.model.gbdt import LGBModel
-from qlib.contrib.data.handler import Alpha158
-from qlib.contrib.strategy.strategy import TopkDropoutStrategy
-from qlib.contrib.evaluate import (
-    backtest as normal_backtest,
-    risk_analysis,
-)
-from qlib.utils import exists_qlib_data, init_instance_by_config
-from qlib.workflow import R
-from qlib.workflow.record_temp import SignalRecord, PortAnaRecord
-
-
-if __name__ == "__main__":
-
-    # use default data
-    provider_uri = "~/.qlib/qlib_data/cn_data"  # target_dir
-    if not exists_qlib_data(provider_uri):
-        print(f"Qlib data is not found in {provider_uri}")
-        sys.path.append(str(Path(__file__).resolve().parent.parent.joinpath("scripts")))
-        from get_data import GetData
-
-        GetData().qlib_data(target_dir=provider_uri, region=REG_CN)
-
-    qlib.init(provider_uri=provider_uri, region=REG_CN)
-
-    market = "csi300"
-    benchmark = "SH000300"
-
-    ###################################
-    # train model
-    ###################################
-    data_handler_config = {
-        "start_time": "2008-01-01",
-        "end_time": "2020-08-01",
-        "fit_start_time": "2008-01-01",
-        "fit_end_time": "2014-12-31",
-        "instruments": market,
-    }
-
-    task = {
-        "model": {
-            "class": "LGBModel",
-            "module_path": "qlib.contrib.model.gbdt",
-            "kwargs": {
-                "loss": "mse",
-                "colsample_bytree": 0.8879,
-                "learning_rate": 0.0421,
-                "subsample": 0.8789,
-                "lambda_l1": 205.6999,
-                "lambda_l2": 580.9768,
-                "max_depth": 8,
-                "num_leaves": 210,
-                "num_threads": 20,
-            },
-        },
-        "dataset": {
-            "class": "DatasetH",
-            "module_path": "qlib.data.dataset",
-            "kwargs": {
-                "handler": {
-                    "class": "Alpha158",
-                    "module_path": "qlib.contrib.data.handler",
-                    "kwargs": data_handler_config,
-                },
-                "segments": {
-                    "train": ("2008-01-01", "2014-12-31"),
-                    "valid": ("2015-01-01", "2016-12-31"),
-                    "test": ("2017-01-01", "2020-08-01"),
-                },
-            },
-        },
-    }
-
-    port_analysis_config = {
-        "strategy": {
-            "class": "TopkDropoutStrategy",
-            "module_path": "qlib.contrib.strategy.strategy",
-            "kwargs": {
-                "topk": 50,
-                "n_drop": 5,
-            },
-        },
-        "backtest": {
-            "verbose": False,
-            "limit_threshold": 0.095,
-            "account": 100000000,
-            "benchmark": benchmark,
-            "deal_price": "close",
-            "open_cost": 0.0005,
-            "close_cost": 0.0015,
-            "min_cost": 5,
-        },
-    }
-
-    # model initiaiton
-    model = init_instance_by_config(task["model"])
-    dataset = init_instance_by_config(task["dataset"])
-
-    # start exp to train init model
-    with R.start(experiment_name="init models"):
-        model.fit(dataset)
-        R.save_objects(init_model=model)
-        rid = R.get_recorder().id
-
-    # Finetune model based on previous trained model
-    with R.start(experiment_name="finetune model"):
-        recorder = R.get_recorder(rid, experiment_name="init models")
-        model = recorder.load_object("init_model")
-        model.finetune(dataset, num_boost_round=10)
-        R.save_objects(model=model)
-
-        # prediction
-        recorder = R.get_recorder()
-        sr = SignalRecord(model, dataset, recorder)
-        sr.generate()
-
-        # backtest
-        par = PortAnaRecord(recorder, port_analysis_config)
-        par.generate()
--- a/qlib/__init__.py
+++ b/qlib/__init__.py
@@ -2,7 +2,7 @@
 # Licensed under the MIT License.


-__version__ = "0.5.1.dev0"
+__version__ = "0.6.0.alpha"

 import os
 import re
--- a/qlib/contrib/model/catboost_model.py
+++ b/qlib/contrib/model/catboost_model.py
@@ -1,14 +1,5 @@
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.

 import numpy as np
 import pandas as pd
--- a/qlib/contrib/model/gbdt.py
+++ b/qlib/contrib/model/gbdt.py
@@ -80,6 +80,7 @@ class LGBModel(ModelFT):
        verbose_eval : int
            verbose level
        """
+        # Fine-tune the existing model by training it for more rounds
        dtrain, _ = self._prepare_data(dataset)
        self.model = lgb.train(
            self.params,
--- a/qlib/contrib/model/pytorch_sfm.py
+++ b/qlib/contrib/model/pytorch_sfm.py
@@ -1,15 +1,6 @@
 # Copyright (c) Microsoft Corporation.
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Licensed under the MIT License.
+
 from __future__ import division
 from __future__ import print_function

--- a/qlib/contrib/model/xgboost.py
+++ b/qlib/contrib/model/xgboost.py
@@ -1,14 +1,5 @@
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.

 import numpy as np
 import pandas as pd
--- a/qlib/model/base.py
+++ b/qlib/model/base.py
@@ -56,6 +56,23 @@ class ModelFT(Model):
    def finetune(self, dataset: Dataset):
        """finetune model based given dataset

+        A typical use case of finetuning model with qlib.workflow.R
+
+        .. code-block:: python
+
+            # start exp to train init model
+            with R.start(experiment_name="init models"):
+                model.fit(dataset)
+                R.save_objects(init_model=model)
+                rid = R.get_recorder().id
+
+            # Finetune the model based on the previously trained model
+            with R.start(experiment_name="finetune model"):
+                recorder = R.get_recorder(rid, experiment_name="init models")
+                model = recorder.load_object("init_model")
+                model.finetune(dataset, num_boost_round=10)
+
+
        Parameters
        ----------
        dataset : Dataset
--- a/setup.py
+++ b/setup.py
@@ -12,7 +12,7 @@ from setuptools import find_packages, setup, Extension
 NAME = "pyqlib"
 DESCRIPTION = "A Quantitative-research Platform"
 REQUIRES_PYTHON = ">=3.5.0"
-VERSION = "0.5.1.dev0"
+VERSION = "0.6.0.alpha"

 # Detect Cython
 try: