Update Version

Remove Json
Because it is part of Python's standard library.
2026-06-29 00:51:19 +08:00 · 2023-01-29 18:53:25 +08:00 · 2023-01-20 09:03:08 +08:00 · 2023-01-18 16:17:06 +08:00 · 2023-01-10 09:46:18 +08:00 · 2023-01-06 21:44:23 +08:00
184 changed files with 5395 additions and 1797 deletions
--- a/.github/labeler.yml
+++ b/.github/labeler.yml
@@ -0,0 +1,6 @@
+documentation:
+- 'docs/**/*'
+- '**/*.md'
+
+waiting for triage:
+- any: ['**/*', '!docs/**/*', '!**/*.md']
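The two labeling rules above can be approximated in a few lines of Python. This is only a sketch: it assumes that `any: [...]` applies a label when at least one changed path satisfies every pattern in the list, and it replaces the action's real glob matching with simple prefix and suffix checks. The helper names `is_documentation` and `labels_for` are hypothetical and are not part of `actions/labeler`.

```python
def is_documentation(path: str) -> bool:
    # Rough stand-in for the globs 'docs/**/*' and '**/*.md'.
    return path.startswith("docs/") or path.endswith(".md")


def labels_for(changed_paths: list[str]) -> set[str]:
    """Return the labels a pull request would receive under the assumptions above."""
    labels = set()
    # 'documentation': at least one changed path is a docs file or a Markdown file.
    if any(is_documentation(p) for p in changed_paths):
        labels.add("documentation")
    # 'waiting for triage': at least one changed path is neither of those.
    if any(not is_documentation(p) for p in changed_paths):
        labels.add("waiting for triage")
    return labels


print(labels_for(["README.md", "setup.py"]))
```

Under these assumptions, a pull request that touches both documentation and code receives both labels.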
--- a/.github/workflows/labeler.yml
+++ b/.github/workflows/labeler.yml
@@ -0,0 +1,14 @@
+name: "Add label automatically"
+on:
+- pull_request_target
+
+jobs:
+  triage:
+    permissions:
+      contents: read
+      pull-requests: write
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/labeler@v4
+      with:
+        repo-token: "${{ secrets.GITHUB_TOKEN }}"
--- a/.github/workflows/test_qlib_from_source.yml
+++ b/.github/workflows/test_qlib_from_source.yml
@@ -60,7 +60,7 @@ jobs:
    - name: Make html with sphinx
      run: |
        cd docs 
-        sphinx-build -b html . build
+        sphinx-build -W --keep-going -b html . _build
        cd ..

    # Check Qlib with pylint
@@ -87,9 +87,10 @@ jobs:
      # E1102: not-callable
      # E1136: unsubscriptable-object
    # References for parameters: https://github.com/PyCQA/pylint/issues/4577#issuecomment-1000245962
+    # We use sys.setrecursionlimit(2000) to make the recursion depth larger to ensure that pylint works properly (the default recursion depth is 1000).
    - name: Check Qlib with pylint
      run: |
-        pylint --disable=C0104,C0114,C0115,C0116,C0301,C0302,C0411,C0413,C1802,R0401,R0801,R0902,R0903,R0911,R0912,R0913,R0914,R0915,R1720,W0105,W0123,W0201,W0511,W0613,W1113,W1514,E0401,E1121,C0103,C0209,R0402,R1705,R1710,R1725,R1735,W0102,W0212,W0221,W0223,W0231,W0237,W0612,W0621,W0622,W0703,W1309,E1102,E1136 --const-rgx='[a-z_][a-z0-9_]{2,30}$' qlib --init-hook "import astroid; astroid.context.InferenceContext.max_inferred = 500"
+        pylint --disable=C0104,C0114,C0115,C0116,C0301,C0302,C0411,C0413,C1802,R0401,R0801,R0902,R0903,R0911,R0912,R0913,R0914,R0915,R1720,W0105,W0123,W0201,W0511,W0613,W1113,W1514,E0401,E1121,C0103,C0209,R0402,R1705,R1710,R1725,R1735,W0102,W0212,W0221,W0223,W0231,W0237,W0612,W0621,W0622,W0703,W1309,E1102,E1136 --const-rgx='[a-z_][a-z0-9_]{2,30}$' qlib --init-hook "import astroid; astroid.context.InferenceContext.max_inferred = 500; import sys; sys.setrecursionlimit(2000)"

    # The following flake8 error codes were ignored:
      # E501 line too long
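The effect of the `sys.setrecursionlimit(2000)` call added to the pylint init hook can be illustrated in isolation. The sketch below is not taken from Qlib or pylint; `depth` and `run_with_limit` are hypothetical helpers that show how a deeply recursive call fails under a limit of 1000 but completes under a limit of 2000.

```python
import sys


def depth(n: int) -> int:
    # Recurse n levels deep and report how deep we went.
    return 0 if n == 0 else 1 + depth(n - 1)


def run_with_limit(limit: int, n: int):
    """Run depth(n) under a temporary recursion limit; return None on overflow."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        return depth(n)
    except RecursionError:
        return None
    finally:
        # Always restore the previous limit so the caller is unaffected.
        sys.setrecursionlimit(old_limit)


print(run_with_limit(1000, 1200))  # too deep for a limit of 1000
print(run_with_limit(2000, 1200))  # fits within a limit of 2000
```

Raising the limit does not make recursion cheaper; it only allows deeper call stacks, which is why the workflow uses a moderate value.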
@@ -139,10 +140,7 @@ jobs:

    - name: Test workflow by config (install from source)
      run: |
-        # Version 0.52.0 of numba must be installed manually in CI, otherwise it will cause incompatibility with the latest version of numpy.
-        python -m pip install numba==0.52.0
-        # You must update numpy manually, because when installing python tools, it will try to uninstall numpy and cause CI to fail.
-        python -m pip install --upgrade numpy
+        python -m pip install numba
        python qlib/workflow/cli.py examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml

    - name: Unit tests with Pytest
--- a/.gitignore
+++ b/.gitignore
@@ -24,6 +24,9 @@ qlib/VERSION.txt
 qlib/data/_libs/expanding.cpp
 qlib/data/_libs/rolling.cpp
 examples/estimator/estimator_example/
+examples/rl/data/
+examples/rl/checkpoints/
+examples/rl/outputs/

 *.egg-info/

--- a/CHANGES.rst
+++ b/CHANGES.rst
@@ -85,7 +85,7 @@ Version 0.4.0
 -------------
 - Add `data` package that holds all data-related codes
 - Reform the data provider structure
-- Create a server for data centralized management `qlib-server<https://amc-msra.visualstudio.com/trading-algo/_git/qlib-server>`_
+- Create a server for data centralized management `qlib-server <https://amc-msra.visualstudio.com/trading-algo/_git/qlib-server>`_
 - Add a `ClientProvider` to work with server
 - Add a pluggable cache mechanism
 - Add a recursive backtracking algorithm to inspect the furthest reference date for an expression
@@ -166,12 +166,12 @@ Version 0.8.0
    - Nested decision execution framework is supported
    - There are lots of changes for daily trading, it is hard to list all of them. But a few important changes could be noticed
        - The trading limitation is more accurate;
-            - In `previous version <https://github.com/microsoft/qlib/blob/v0.7.2/qlib/contrib/backtest/exchange.py#L160>`_, longing and shorting actions share the same action.
-            - In `current version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/backtest/exchange.py#L304>`_, the trading limitation is different between logging and shorting action.
+            - In `previous version <https://github.com/microsoft/qlib/blob/v0.7.2/qlib/contrib/backtest/exchange.py#L160>`__, longing and shorting actions share the same action.
+            - In `current version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/backtest/exchange.py#L304>`__, the trading limitation is different between logging and shorting action.
        - The constant is different when calculating annualized metrics.
-            - `Current version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/contrib/evaluate.py#L42>`_ uses more accurate constant than `previous version <https://github.com/microsoft/qlib/blob/v0.7.2/qlib/contrib/evaluate.py#L22>`_
-        - `A new version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/tests/data.py#L17>`_ of data is released. Due to the unstability of Yahoo data source, the data may be different after downloading data again.
-        - Users could check out the backtesting results between  `Current version <https://github.com/microsoft/qlib/tree/7c31012b507a3823117bddcc693fc64899460b2a/examples/benchmarks>`_ and `previous version <https://github.com/microsoft/qlib/tree/v0.7.2/examples/benchmarks>`_
+            - `Current version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/contrib/evaluate.py#L42>`_ uses more accurate constant than `previous version <https://github.com/microsoft/qlib/blob/v0.7.2/qlib/contrib/evaluate.py#L22>`__
+        - `A new version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/tests/data.py#L17>`__ of data is released. Due to the unstability of Yahoo data source, the data may be different after downloading data again.
+        - Users could check out the backtesting results between  `Current version <https://github.com/microsoft/qlib/tree/7c31012b507a3823117bddcc693fc64899460b2a/examples/benchmarks>`__ and `previous version <https://github.com/microsoft/qlib/tree/v0.7.2/examples/benchmarks>`__


 Other Versions
--- a/README.md
+++ b/README.md
@@ -11,6 +11,8 @@
 Recent released features
 | Feature | Status |
 | --                      | ------    |
+| Release Qlib v0.9.0 | :octocat: [Released](https://github.com/microsoft/qlib/releases/tag/v0.9.0) on Dec 9, 2022 |
+| RL Learning Framework | :hammer: :chart_with_upwards_trend: Released on Nov 10, 2022. [#1332](https://github.com/microsoft/qlib/pull/1332), [#1322](https://github.com/microsoft/qlib/pull/1322), [#1316](https://github.com/microsoft/qlib/pull/1316),[#1299](https://github.com/microsoft/qlib/pull/1299),[#1263](https://github.com/microsoft/qlib/pull/1263), [#1244](https://github.com/microsoft/qlib/pull/1244), [#1169](https://github.com/microsoft/qlib/pull/1169), [#1125](https://github.com/microsoft/qlib/pull/1125), [#1076](https://github.com/microsoft/qlib/pull/1076)|
 | HIST and IGMTF models | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/1040) on Apr 10, 2022 |
 | Qlib [notebook tutorial](https://github.com/microsoft/qlib/tree/main/examples/tutorial) | 📖 [Released](https://github.com/microsoft/qlib/pull/1037) on Apr 7, 2022 | 
 | Ibovespa index data | :rice: [Released](https://github.com/microsoft/qlib/pull/990) on Apr 6, 2022 |
@@ -67,6 +69,7 @@ For more details, please refer to our paper ["Qlib: An AI-oriented Quantitative
            <li type="circle"><a href="#auto-quant-research-workflow">Auto Quant Research Workflow</a></li>
            <li type="circle"><a href="#building-customized-quant-research-workflow-by-code">Building Customized Quant Research Workflow by Code</a></li></ul>
        <li><a href="#quant-dataset-zoo"><strong>Quant Dataset Zoo</strong></a></li>
+        <li><a href="#learning-framework">Learning Framework</a></li>
        <li><a href="#more-about-qlib">More About Qlib</a></li>
        <li><a href="#offline-mode-and-online-mode">Offline Mode and Online Mode</a>
        <ul>
@@ -105,21 +108,16 @@ Your feedbacks about the features are very important.
 # Framework of Qlib

 <div style="align: center">
-<img src="docs/_static/img/framework.svg" />
+<img src="docs/_static/img/framework-abstract.jpg" />
 </div>

-At the module level, Qlib is a platform that consists of the above components. The components are designed as loose-coupled modules, and each component could be used stand-alone.
+The high-level framework of Qlib can be found above(users can find the [detailed framework](https://qlib.readthedocs.io/en/latest/introduction/introduction.html#framework) of Qlib's design when getting into nitty gritty).
+The components are designed as loose-coupled modules, and each component could be used stand-alone.

-| Name                   | Description                                                                                                                                                                                                                                                                                                                                                             |
-| ------                 | -----                                                                                                                                                                                                                                                                                                                                                                   |
-| `Infrastructure` layer | `Infrastructure` layer provides underlying support for Quant research. `DataServer` provides a high-performance infrastructure for users to manage and retrieve raw data. `Trainer` provides a flexible interface to control the training process of models, which enable algorithms to control the training process.                                                       |
-| `Workflow` layer       | `Workflow` layer covers the whole workflow of quantitative investment. `Information Extractor` extracts data for models. `Forecast Model` focuses on producing all kinds of forecast signals (e.g. _alpha_, risk) for other modules. With these signals `Decision Generator` will generate the target trading decisions(i.e. portfolio, orders)  to be executed by `Execution Env` (i.e. the trading market).  There may be multiple levels of `Trading Agent` and `Execution Env` (e.g. an _order executor trading agent and intraday order execution environment_ could behave like an interday trading environment and nested in  _daily portfolio management trading agent and interday trading environment_  ) |
-| `Interface` layer      | `Interface` layer tries to present a user-friendly interface for the underlying system. `Analyser` module will provide users detailed analysis reports of forecasting signals, portfolios and execution results                                                                                                                                                                 |
-
-* The modules with hand-drawn style are under development and will be released in the future.
-* The modules with dashed borders are highly user-customizable and extendible.
-
-(p.s. framework image is created with https://draw.io/)
+Qlib provides a strong infrastructure to support Quant research. [Data](https://qlib.readthedocs.io/en/latest/component/data.html) is always an important part.
+A strong learning framework is designed to support diverse learning paradigms (e.g. [reinforcement learning](https://qlib.readthedocs.io/en/latest/component/rl.html), [supervised learning](https://qlib.readthedocs.io/en/latest/component/workflow.html#model-section)) and patterns at different levels(e.g. [market dynamic modeling](https://qlib.readthedocs.io/en/latest/component/meta.html)).
+By modeling the market, [trading strategies](https://qlib.readthedocs.io/en/latest/component/strategy.html) will generate trade decisions that will be executed. Multiple trading strategies and executors in different levels or granularities can be [nested to be optimized and run together](https://qlib.readthedocs.io/en/latest/component/highfreq.html).
+At last, a comprehensive [analysis](https://qlib.readthedocs.io/en/latest/component/report.html) will be provided and the model can be [served online](https://qlib.readthedocs.io/en/latest/component/online.html) in a low cost.


 # Quick Start
@@ -170,7 +168,7 @@ Also, users can install the latest dev version ``Qlib`` by the source code accor
    git clone https://github.com/microsoft/qlib.git && cd qlib
    pip install .
    ```
-  **Note**:  You can install Qlib with `python setup.py install` as well. But it is not the recommanded approach. It will skip `pip` and cause obscure problems. For example, **only** the command ``pip install .`` **can** overwrite the stable version installed by ``pip install pyqlib``, while the command ``python setup.py install`` **can't**.
+  **Note**:  You can install Qlib with `python setup.py install` as well. But it is not the recommended approach. It will skip `pip` and cause obscure problems. For example, **only** the command ``pip install .`` **can** overwrite the stable version installed by ``pip install pyqlib``, while the command ``python setup.py install`` **can't**.

 **Tips**: If you fail to install `Qlib` or run the examples in your environment,  comparing your steps and the [CI workflow](.github/workflows/test_qlib_from_source.yml) may help you find the problem.

@@ -404,6 +402,17 @@ Dataset plays a very important role in Quant. Here is a list of the datasets bui
 [Here](https://qlib.readthedocs.io/en/latest/advanced/alpha.html) is a tutorial to build dataset with `Qlib`.
 Your PR to build new Quant dataset is highly welcomed.

+
+# Learning Framework
+Qlib is high customizable and a lot of its components are learnable.
+The learnable components are instances of `Forecast Model` and `Trading Agent`. They are learned based on the `Learning Framework` layer and then applied to multiple scenarios in `Workflow` layer.
+The learning framework leverages the `Workflow` layer as well(e.g. sharing `Information Extractor`, creating environments based on `Execution Env`).
+
+Based on learning paradigms, they can be categorized into reinforcement learning and supervised learning.
+- For supervised learning, the detailed docs can be found [here](https://qlib.readthedocs.io/en/latest/component/model.html).
+- For reinforcement learning, the detailed docs can be found [here](https://qlib.readthedocs.io/en/latest/component/rl.html). Qlib's RL learning framework leverages `Execution Env` in `Workflow` layer to create environments.  It's worth noting that `NestedExecutor` is supported as well. This empowers users to optimize different level of strategies/models/agents together (e.g. optimizing an order execution strategy for a specific portfolio management strategy).
+
+
 # More About Qlib
 If you want to have a quick glance at the most frequently used components of qlib, you can try notebooks [here](examples/tutorial/).

--- a/docs/Makefile
+++ b/docs/Makefile
@@ -17,4 +17,5 @@ help:
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
+	pip install -r requirements.txt
 	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
--- a/docs/_static/img/QlibRL_framework.png
+++ b/docs/_static/img/QlibRL_framework.png
--- a/docs/_static/img/RL_framework.png
+++ b/docs/_static/img/RL_framework.png
--- a/docs/_static/img/framework-abstract.jpg
+++ b/docs/_static/img/framework-abstract.jpg
--- a/docs/_static/img/framework.svg
+++ b/docs/_static/img/framework.svg
--- a/docs/advanced/alpha.rst
+++ b/docs/advanced/alpha.rst
@@ -38,7 +38,7 @@ Example

        DIF = \frac{EMA(CLOSE, 12) - EMA(CLOSE, 26)}{CLOSE}

-    `DEA`means a 9-period EMA of the DIF.
+    `DEA` means a 9-period EMA of the DIF.

    .. math::

--- a/docs/advanced/task_management.rst
+++ b/docs/advanced/task_management.rst
@@ -18,7 +18,7 @@ With this module, users can run their ``task`` automatically at different period

 This whole process can be used in `Online Serving <../component/online.html>`_.

-An example of the entire process is shown `here <https://github.com/microsoft/qlib/tree/main/examples/model_rolling/task_manager_rolling.py>`_.
+An example of the entire process is shown `here <https://github.com/microsoft/qlib/tree/main/examples/model_rolling/task_manager_rolling.py>`__.

 Task Generating
 ===============
@@ -31,9 +31,10 @@ Here is the base class of ``TaskGen``:

 .. autoclass:: qlib.workflow.task.gen.TaskGen
    :members:
+    :noindex:

 ``Qlib`` provides a class `RollingGen <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_ to generate a list of ``task`` of the dataset in different date segments.
-This class allows users to verify the effect of data from different periods on the model in one experiment. More information is `here <../reference/api.html#TaskGen>`_.
+This class allows users to verify the effect of data from different periods on the model in one experiment. More information is `here <../reference/api.html#TaskGen>`__.

 Task Storing
 ============
@@ -53,8 +54,9 @@ Users need to provide the MongoDB URL and database name for using ``TaskManager`

 .. autoclass:: qlib.workflow.task.manage.TaskManager
    :members:
+    :noindex:

-More information of ``Task Manager`` can be found in `here <../reference/api.html#TaskManager>`_.
+More information of ``Task Manager`` can be found in `here <../reference/api.html#TaskManager>`__.

 Task Training
 =============
@@ -64,11 +66,13 @@ An easy way to get the ``task_func`` is using ``qlib.model.trainer.task_train``
 It will run the whole workflow defined by ``task``, which includes *Model*, *Dataset*, *Record*.

 .. autofunction:: qlib.workflow.task.manage.run_task
+    :noindex:

 Meanwhile, ``Qlib`` provides a module called ``Trainer``. 

 .. autoclass:: qlib.model.trainer.Trainer
    :members:
+    :noindex:

 ``Trainer`` will train a list of tasks and return a list of model recorders.
 ``Qlib`` offer two kinds of Trainer, TrainerR is the simplest way and TrainerRM is based on TaskManager to help manager tasks lifecycle automatically. 
--- a/docs/component/data.rst
+++ b/docs/component/data.rst
@@ -24,8 +24,8 @@ The introduction of ``Data Layer`` includes the following parts.
 Here is a typical example of Qlib data workflow

 - Users download data and converting data into Qlib format(with filename suffix `.bin`).  In this step, typically only some basic data are stored on disk(such as OHLCV).
-- Creating some basic features based on Qlib's expression Engine(e.g. "Ref($close, 60) / $close", the return of last 60 trading days). Supported operators in the expression engine can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/ops.py>`_. This step is typically implemented in Qlib's `Data Loader <https://qlib.readthedocs.io/en/latest/component/data.html#data-loader>`_ which is a component of `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_ .
-- If users require more complicated data processing (e.g. data normalization),  `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_ support user-customized processors to process data(some predefined processors can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/dataset/processor.py>`_).  The processors are different from operators in expression engine. It is designed for some complicated data processing methods which is hard to supported in operators in expression engine.
+- Creating some basic features based on Qlib's expression Engine(e.g. "Ref($close, 60) / $close", the return of last 60 trading days). Supported operators in the expression engine can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/ops.py>`__. This step is typically implemented in Qlib's `Data Loader <https://qlib.readthedocs.io/en/latest/component/data.html#data-loader>`_ which is a component of `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_ .
+- If users require more complicated data processing (e.g. data normalization),  `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_ support user-customized processors to process data(some predefined processors can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/dataset/processor.py>`__).  The processors are different from operators in expression engine. It is designed for some complicated data processing methods which is hard to supported in operators in expression engine.
 - At last, `Dataset <https://qlib.readthedocs.io/en/latest/component/data.html#dataset>`_ is responsible to prepare model-specific dataset from the processed data of Data Handler
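To make the expression `Ref($close, 60) / $close` from the list above concrete, here is a plain-Python sketch of the arithmetic. It assumes that `Ref(x, n)` denotes the value of `x` from `n` periods earlier. The functions `ref` and `ref_ratio` are hypothetical illustrations and are not Qlib's expression engine.

```python
from typing import List, Optional


def ref(values: List[float], n: int) -> List[Optional[float]]:
    # Value from n periods earlier; None where no such history exists.
    return [values[t - n] if t >= n else None for t in range(len(values))]


def ref_ratio(close: List[float], n: int) -> List[Optional[float]]:
    # Element-wise Ref(close, n) / close, skipping positions without history.
    return [
        past / current if past is not None else None
        for past, current in zip(ref(close, n), close)
    ]


close = [10.0, 11.0, 12.0, 15.0]
print(ref_ratio(close, 2))
```

A lag of 2 is used here only to keep the sample data short; the documentation's example uses a lag of 60 trading days.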

 Data Preparation
@@ -37,7 +37,7 @@ Qlib Format Data
 We've specially designed a data structure to manage financial data, please refer to the `File storage design section in Qlib paper <https://arxiv.org/abs/2009.11189>`_ for detailed information.
 Such data will be stored with filename suffix `.bin` (We'll call them `.bin` file, `.bin` format, or qlib format). `.bin` file is designed for scientific computing on finance data.

-``Qlib`` provides two different off-the-shelf datasets, which can be accessed through this `link <https://github.com/microsoft/qlib/blob/main/qlib/contrib/data/handler.py>`_:
+``Qlib`` provides two different off-the-shelf datasets, which can be accessed through this `link <https://github.com/microsoft/qlib/blob/main/qlib/contrib/data/handler.py>`__:

 ========================  =================  ================
 Dataset                   US Market          China Market
@@ -47,7 +47,7 @@ Alpha360                  √                  √
 Alpha158                  √                  √
 ========================  =================  ================

-Also, ``Qlib`` provides a high-frequency dataset. Users can run a high-frequency dataset example through this `link <https://github.com/microsoft/qlib/tree/main/examples/highfreq>`_.
+Also, ``Qlib`` provides a high-frequency dataset. Users can run a high-frequency dataset example through this `link <https://github.com/microsoft/qlib/tree/main/examples/highfreq>`__.

 Qlib Format Dataset
 -------------------
@@ -332,6 +332,7 @@ Here are some interfaces of the ``QlibDataLoader`` class:

 .. autoclass:: qlib.data.dataset.loader.DataLoader
    :members:
+    :noindex:

 API
 ---
@@ -361,6 +362,7 @@ Here are some important interfaces that ``DataHandlerLP`` provides:

 .. autoclass:: qlib.data.dataset.handler.DataHandlerLP
    :members: __init__, fetch, get_cols
+    :noindex:


 If users want to load features and labels by config, users can define a new handler and call the static method `parse_config_to_fields` of ``qlib.contrib.data.handler.Alpha158``.
@@ -451,6 +453,7 @@ The ``DatasetH`` class is the `dataset` with `Data Handler`. Here is the most im

 .. autoclass:: qlib.data.dataset.__init__.DatasetH
    :members:
+    :noindex:

 API
 ---
@@ -470,9 +473,11 @@ Global Memory Cache

 .. autoclass:: qlib.data.cache.MemCacheUnit
    :members:
+    :noindex:

 .. autoclass:: qlib.data.cache.MemCache
    :members:
+    :noindex:


 ExpressionCache
@@ -487,6 +492,7 @@ The following shows the details about the interfaces:

 .. autoclass:: qlib.data.cache.ExpressionCache
    :members:
+    :noindex:

 ``Qlib`` has currently provided implemented disk cache `DiskExpressionCache` which inherits from `ExpressionCache` . The expressions data will be stored in the disk.

@@ -502,6 +508,7 @@ The following shows the details about the interfaces:

 .. autoclass:: qlib.data.cache.DatasetCache
    :members:
+    :noindex:

 ``Qlib`` has currently provided implemented disk cache `DiskDatasetCache` which inherits from `DatasetCache` . The datasets' data will be stored in the disk.

@@ -512,7 +519,7 @@ Data and Cache File Structure

 We've specially designed a file structure to manage data and cache, please refer to the `File storage design section in Qlib paper <https://arxiv.org/abs/2009.11189>`_ for detailed information. The file structure of data and cache is listed as follows.

-.. code-block:: json
+.. code-block::

    - data/
        [raw data] updated by data providers
--- a/docs/component/highfreq.rst
+++ b/docs/component/highfreq.rst
@@ -8,31 +8,33 @@ Design of Nested Decision Execution Framework for High-Frequency Trading
 Introduction
 ============

-Daily trading (e.g. portfolio management) and intraday trading (e.g. orders execution) are two hot topics in Quant investment and usually studied separately.
+Daily trading (e.g. portfolio management) and intraday trading (e.g. orders execution) are two hot topics in Quant investment and are usually studied separately.

 To get the join trading performance of daily and intraday trading, they must interact with each other and run backtest jointly.
-In order to support the joint backtest strategies in multiple levels, a corresponding framework is required. None of the publicly available high-frequency trading frameworks considers multi-level joint trading, which make the backtesting aforementioned inaccurate.
+In order to support the joint backtest strategies at multiple levels, a corresponding framework is required. None of the publicly available high-frequency trading frameworks considers multi-level joint trading, which makes the backtesting aforementioned inaccurate.

 Besides backtesting, the optimization of strategies from different levels is not standalone and can be affected by each other.
-For example, the best portfolio management strategy may change with the performance of order executions(e.g. a portfolio with higher turnover may becomes a better choice when we improve the order execution strategies).
-To achieve the overall good performance , it is necessary to consider the interaction of strategies in different level.
+For example, the best portfolio management strategy may change with the performance of order executions(e.g. a portfolio with higher turnover may become a better choice when we improve the order execution strategies).
+To achieve overall good performance, it is necessary to consider the interaction of strategies at a different levels.

-Therefore, building a new framework for trading in multiple levels becomes necessary to solve the various problems mentioned above, for which we designed a nested decision execution framework that consider the interaction of strategies.
+Therefore, building a new framework for trading on multiple levels becomes necessary to solve the various problems mentioned above, for which we designed a nested decision execution framework that considers the interaction of strategies.

 .. image:: ../_static/img/framework.svg

 The design of the framework is shown in the yellow part in the middle of the figure above. Each level consists of ``Trading Agent`` and ``Execution Env``. ``Trading Agent`` has its own data processing module (``Information Extractor``), forecasting module (``Forecast Model``) and decision generator (``Decision Generator``). The trading algorithm generates the decisions by the ``Decision Generator`` based on the forecast signals output by the ``Forecast Module``, and the decisions generated by the trading algorithm are passed to the ``Execution Env``, which returns the execution results.

-The frequency of trading algorithm, decision content and execution environment can be customized by users (e.g. intraday trading, daily-frequency trading, weekly-frequency trading), and the execution environment can be nested with finer-grained trading algorithm and execution environment inside (i.e. sub-workflow in the figure, e.g. daily-frequency orders can be turned into finer-grained decisions by splitting orders within the day). The flexibility of nested decision execution framework makes it easy for users to explore the effects of combining different levels of trading strategies and break down the optimization barriers between different levels of trading algorithm.
+The frequency of the trading algorithm, decision content and execution environment can be customized by users (e.g. intraday trading, daily-frequency trading, weekly-frequency trading), and the execution environment can be nested with finer-grained trading algorithm and execution environment inside (i.e. sub-workflow in the figure, e.g. daily-frequency orders can be turned into finer-grained decisions by splitting orders within the day). The flexibility of the nested decision execution framework makes it easy for users to explore the effects of combining different levels of trading strategies and break down the optimization barriers between different levels of the trading algorithm.
+
+The optimization for the nested decision execution framework can be implemented with the support of `QlibRL <https://qlib.readthedocs.io/en/latest/component/rl.html>`_. To know more about how to use the QlibRL, go to API Reference: `RL API <../reference/api.html#rl>`_. 

 Example
 =======

-An example of nested decision execution framework for high-frequency can be found `here <https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution/workflow.py>`_.
+An example of a nested decision execution framework for high-frequency can be found `here <https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution/workflow.py>`_.


-Besides, the above examples, here are some other related work about high-frequency trading in Qlib.
+Besides, the above examples, here are some other related works about high-frequency trading in Qlib.

 - `Prediction with high-frequency data <https://github.com/microsoft/qlib/tree/main/examples/highfreq#benchmarks-performance-predicting-the-price-trend-in-high-frequency-data>`_
-- `Examples <https://github.com/microsoft/qlib/blob/main/examples/orderbook_data/>`_ to extract features form high-frequency data without fixed frequency.
+- `Examples <https://github.com/microsoft/qlib/blob/main/examples/orderbook_data/>`_ to extract features from high-frequency data without fixed frequency.
 - `A paper <https://github.com/microsoft/qlib/tree/high-freq-execution#high-frequency-execution>`_ for high-frequency trading.
--- a/docs/component/model.rst
+++ b/docs/component/model.rst
@@ -20,6 +20,7 @@ The base class provides the following interfaces:

 .. autoclass:: qlib.model.base.Model
    :members:
+    :noindex:

 ``Qlib`` also provides a base class `qlib.model.base.ModelFT <../reference/api.html#qlib.model.base.ModelFT>`_, which includes the method for finetuning the model.

--- a/docs/component/online.rst
+++ b/docs/component/online.rst
@@ -1,4 +1,4 @@
-.. _online:
+.. _online_serving:

 ==============
 Online Serving
@@ -32,21 +32,25 @@ Online Manager

 .. automodule:: qlib.workflow.online.manager
    :members:
+    :noindex:

 Online Strategy
 ===============

 .. automodule:: qlib.workflow.online.strategy
    :members:
+    :noindex:

 Online Tool
 ===========

 .. automodule:: qlib.workflow.online.utils
    :members:
+    :noindex:

 Updater
 =======

 .. automodule:: qlib.workflow.online.update
    :members:
+    :noindex:
--- a/docs/component/recorder.rst
+++ b/docs/component/recorder.rst
@@ -61,6 +61,7 @@ The ``ExpManager`` module in ``Qlib`` is responsible for managing different expe

 .. autoclass:: qlib.workflow.expm.ExpManager
    :members: get_exp, list_experiments
+    :noindex:

 For other interfaces such as `create_exp`, `delete_exp`, please refer to `Experiment Manager API <../reference/api.html#experiment-manager>`_.

@@ -71,6 +72,7 @@ The ``Experiment`` class is solely responsible for a single experiment, and it w

 .. autoclass:: qlib.workflow.exp.Experiment
    :members: get_recorder, list_recorders
+    :noindex:

 For other interfaces such as `search_records`, `delete_recorder`, please refer to `Experiment API <../reference/api.html#experiment>`_.

@@ -85,6 +87,7 @@ Here are some important APIs that are not included in the ``QlibRecorder``:

 .. autoclass:: qlib.workflow.recorder.Recorder
    :members: list_artifacts, list_metrics, list_params, list_tags
+    :noindex:

 For other interfaces such as `save_objects`, `load_object`, please refer to `Recorder API <../reference/api.html#recorder>`_.

@@ -107,7 +110,7 @@ Here is a simple example of what is done in ``SigAnaRecord``, which users can re

 - ``PortAnaRecord``: This class generates the results of `backtest`. The detailed information about `backtest` as well as the available `strategy`, users can refer to `Strategy <../component/strategy.html>`_ and `Backtest <../component/backtest.html>`_.

-Here is a simple exampke of what is done in ``PortAnaRecord``, which users can refer to if they want to do backtest based on their own prediction and label.
+Here is a simple example of what is done in ``PortAnaRecord``, which users can refer to if they want to do backtest based on their own prediction and label.

 .. code-block:: Python

--- a/docs/component/report.rst
+++ b/docs/component/report.rst
@@ -51,6 +51,7 @@ API

 .. automodule:: qlib.contrib.report.analysis_position.report
    :members:
+    :noindex:

 Graphical Result
 ~~~~~~~~~~~~~~~~
@@ -93,6 +94,7 @@ API

 .. automodule:: qlib.contrib.report.analysis_position.score_ic
    :members:
+    :noindex:


 Graphical Result
@@ -151,6 +153,7 @@ API

 .. automodule:: qlib.contrib.report.analysis_position.risk_analysis
    :members:
+    :noindex:


 Graphical Result
@@ -174,6 +177,7 @@ Graphical Result
                The `Information Ratio` without cost.
            - `excess_return_with_cost`
                The `Information Ratio` with cost.
+
            To know more about `Information Ratio`, please refer to `Information Ratio – IR <https://www.investopedia.com/terms/i/informationratio.asp>`_.
        -  `max_drawdown`
            - `excess_return_without_cost`
@@ -269,6 +273,7 @@ API

 .. automodule:: qlib.contrib.report.analysis_model.analysis_model_performance
    :members:
+    :noindex:


 Graphical Results
--- a/docs/component/rl/framework.rst
+++ b/docs/component/rl/framework.rst
@@ -0,0 +1,49 @@
+The Framework of QlibRL
+=======================
+
+QlibRL contains a full set of components that cover the entire lifecycle of an RL pipeline, including building the simulator of the market, shaping states & actions, training policies (strategies), and backtesting strategies in the simulated environment.
+
+QlibRL is built on top of the Tianshou and Gym frameworks. The high-level structure of QlibRL is demonstrated below:
+
+.. image:: ../../_static/img/QlibRL_framework.png
+   :width: 600
+   :align: center
+
+Here, we briefly introduce each component in the figure.
+
+EnvWrapper
+------------
+EnvWrapper is the complete encapsulation of the simulated environment. It receives actions from outside (policy/strategy/agent), simulates the changes in the market, and then returns rewards and updated states, thus forming an interaction loop.
+
+In QlibRL, EnvWrapper is a subclass of gym.Env, so it implements all necessary interfaces of gym.Env. Any classes or pipelines that accept gym.Env should also accept EnvWrapper. Developers do not need to implement their own EnvWrapper to build their own environment. Instead, they only need to implement 4 components of the EnvWrapper:
+
+- `Simulator`
+    The simulator is the core component responsible for the environment simulation. Developers could implement all the logic that is directly related to the environment simulation in the Simulator in any way they like. In QlibRL, there are already two implementations of Simulator for single asset trading: 1) ``SingleAssetOrderExecution``, which is built on Qlib's backtest toolkits and hence considers a lot of practical trading details but is slow. 2) ``SingleAssetOrderExecutionSimple``, which is built on a simplified trading simulator that ignores a lot of details (e.g. trading limitations, rounding) but is quite fast.
+- `State interpreter` 
+    The state interpreter is responsible for interpreting states in the original format (the format provided by the simulator) into states in a format that the policy can understand, for example, transforming unstructured raw features into numerical tensors.
+- `Action interpreter` 
+    The action interpreter is similar to the state interpreter. But instead of states, it interprets actions generated by the policy, from the format provided by the policy to the format that is acceptable to the simulator.
+- `Reward function` 
+    The reward function returns a numerical reward to the policy each time the policy takes an action.
+
+EnvWrapper will organically organize these components. Such decomposition allows for better flexibility in development. For example, if the developers want to train multiple types of policies in the same environment, they only need to design one simulator and design different state interpreters/action interpreters/reward functions for different types of policies.
+
+QlibRL has well-defined base classes for all these 4 components. All the developers need to do is define their own components by inheriting the base classes and then implementing all interfaces required by the base classes. The API for the above base components can be found `here <../../reference/api.html#module-qlib.rl>`__.
+
+Policy
+------------
+QlibRL directly uses Tianshou's policy. Developers could use policies provided by Tianshou off the shelf, or implement their own policies by inheriting Tianshou's policies.
+
+Training Vessel & Trainer
+-------------------------
+As stated by their names, training vessels and trainers are helper classes used in training. A training vessel is a ship that contains a simulator/interpreters/reward function/policy, and it controls algorithm-related parts of training. Correspondingly, the trainer is responsible for controlling the runtime parts of training.
+
+As you may have noticed, a training vessel itself holds all the required components to build an EnvWrapper rather than holding an instance of EnvWrapper directly. This allows the training vessel to create duplicates of EnvWrapper dynamically when necessary (for example, under parallel training).
+
+With a training vessel, the trainer could finally launch the training pipeline by simple, Scikit-learn-like interfaces (i.e., ``trainer.fit()``).
+
+The API for Trainer and TrainingVessel can be found `here <../../reference/api.html#module-qlib.rl.trainer>`__.
+
+The RL module is designed in a loosely-coupled way. Currently, RL examples are integrated with concrete business logic.
+But the core part of RL is much simpler than what you see.
+To demonstrate the simple core of RL, `a dedicated notebook <https://github.com/microsoft/qlib/tree/main/examples/rl/simple_example.ipynb>`__ for RL without business logic is created.
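
Note on the EnvWrapper decomposition described in framework.rst: the four components can be illustrated with a minimal sketch in plain Python. Everything here (``ToySimulator``, ``ToyEnvWrapper``, the helper functions, and the toy reward) is hypothetical and is not QlibRL's actual API; it only assumes the interaction loop described in the text, where the wrapper passes an interpreted action to the simulator and returns an interpreted state and a reward.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToySimulator:
    """Hypothetical simulator: sell `total` shares within `max_step` steps."""

    total: float
    max_step: int
    remaining: float = 0.0
    step_count: int = 0

    def __post_init__(self) -> None:
        # The whole order is still open at the start of an episode.
        self.remaining = self.total

    def step(self, amount: float) -> None:
        # Never trade more than what is left of the order.
        self.remaining -= min(amount, self.remaining)
        self.step_count += 1

    def done(self) -> bool:
        return self.step_count >= self.max_step or self.remaining <= 0


def interpret_state(sim: ToySimulator) -> List[float]:
    """State interpreter: raw simulator state -> numeric features."""
    return [sim.remaining / sim.total, sim.step_count / sim.max_step]


def interpret_action(sim: ToySimulator, action: int, n: int = 4) -> float:
    """Action interpreter: categorical action k -> trade k/n of the remainder."""
    return sim.remaining * action / n


def reward_fn(traded: float, total: float) -> float:
    """Toy reward: fraction of the whole order filled in this step."""
    return traded / total


class ToyEnvWrapper:
    """Ties the four components into a reset/step interaction loop."""

    def __init__(self, total: float, max_step: int) -> None:
        self.total, self.max_step = total, max_step
        self.sim = ToySimulator(total, max_step)

    def reset(self) -> List[float]:
        self.sim = ToySimulator(self.total, self.max_step)
        return interpret_state(self.sim)

    def step(self, action: int) -> Tuple[List[float], float, bool]:
        amount = interpret_action(self.sim, action)
        before = self.sim.remaining
        self.sim.step(amount)
        reward = reward_fn(before - self.sim.remaining, self.total)
        return interpret_state(self.sim), reward, self.sim.done()


env = ToyEnvWrapper(total=100.0, max_step=3)
state = env.reset()
state, reward, done = env.step(2)  # trade 2/4 of the remaining order
```

Because the wrapper only holds the pieces needed to build a simulator, a fresh copy can be created on every ``reset``, which mirrors why a training vessel keeps the components rather than an EnvWrapper instance.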
--- a/docs/component/rl/overall.rst
+++ b/docs/component/rl/overall.rst
@@ -0,0 +1,50 @@
+=====================================================
+Reinforcement Learning in Quantitative Trading
+=====================================================
+
+Reinforcement Learning
+======================
+Unlike supervised learning tasks such as classification and regression, Reinforcement Learning (RL) is another important paradigm in machine learning,
+which attempts to optimize a cumulative numerical reward signal by directly interacting with the environment under a few assumptions such as the Markov Decision Process (MDP).
+
+As demonstrated in the following figure, an RL system consists of four elements: 1) the agent, 2) the environment the agent interacts with, 3) the policy that the agent follows to take actions on the environment, and 4) the reward signal from the environment to the agent.
+In general, the agent can perceive and interpret its environment, take actions and learn through reward, to seek long-term and maximum overall reward to achieve an optimal solution.
+
+.. image:: ../../_static/img/RL_framework.png
+   :width: 300
+   :align: center 
+
+RL attempts to learn to produce actions by trial and error. 
+By sampling actions and then observing which one leads to our desired outcome, a policy is obtained to generate optimal actions. 
+In contrast to supervised learning, RL learns this not from a label but from a time-delayed label called a reward. 
+This scalar value lets us know whether the current outcome is good or bad. 
+In short, the goal of RL is to take actions that maximize reward.
+
+The Qlib Reinforcement Learning toolkit (QlibRL) is an RL platform for quantitative investment, which provides support to implement the RL algorithms in Qlib.
+
+
+Potential Application Scenarios in Quantitative Trading
+=======================================================
+RL methods have already achieved outstanding results in many applications, such as game playing, resource allocation, recommendation, marketing, and advertising.
+Investment is a continuous process. Taking the stock market as an example, investors need to control their positions and stock holdings through one or more buy and sell operations to maximize investment returns.
+Besides, each buy and sell decision is made by investors after fully considering the overall market information and stock information.
+From the view of an investor, the process can be described as a continuous decision-making process driven by interaction with the market. Such problems can be solved by RL algorithms.
+Following are some scenarios where RL can potentially be used in quantitative investment.
+
+Portfolio Construction
+----------------------
+Portfolio construction is a process of selecting securities optimally by taking a minimum risk to achieve maximum returns. With an RL-based solution, an agent allocates stocks at every time step by obtaining information for each stock and the market. The key is to develop a policy for building a portfolio and to make the policy able to pick the optimal portfolio.
+
+Order Execution
+---------------
+As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument. Essentially, the goal of order execution is twofold: it requires not only fulfilling the whole order but also achieving a more economical execution by maximizing profit gain (or minimizing capital loss). Order execution with only one order of liquidation or acquirement is called single-asset order execution.
+
+Since stock investment aims to pursue long-term maximized profits, it usually manifests as a sequential process of continuously adjusting asset portfolios. Executing multiple orders, including orders of liquidation and acquirement, brings more constraints and requires the execution sequence of different orders to be considered, e.g. before executing an order to buy some stocks, we have to sell at least one stock. Order execution with multiple assets is called multi-asset order execution.
+
+Because order execution is a sequential decision-making problem, an RL-based solution can be applied to it. With an RL-based solution, an agent optimizes the execution strategy by interacting with the market environment.
+
+With QlibRL, the RL algorithm in the above scenarios can be easily implemented.
+
+Nested Portfolio Construction and Order Executor
+------------------------------------------------
+QlibRL makes it possible to jointly optimize different levels of strategies/models/agents. Taking the `Nested Decision Execution Framework <https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution>`_ as an example, the optimization of the order execution strategy and the portfolio management strategy can interact with each other to maximize returns.
--- a/docs/component/rl/quickstart.rst
+++ b/docs/component/rl/quickstart.rst
@@ -0,0 +1,175 @@
+
+Quick Start
+============
+.. currentmodule:: qlib
+
+QlibRL provides an example implementation of a single-asset order execution task. The following is an example config file for training with QlibRL.
+
+.. code-block:: yaml
+
+    simulator:
+        # Each step contains 30mins
+        time_per_step: 30
+        # Upper bound of volume. Should be null or a float between 0 and 1; if it is a float, the upper bound is calculated as that fraction of the market volume.
+        vol_limit: null
+    env:
+        # Concurrent environment workers.
+        concurrency: 1
+        # dummy or subproc or shmem. Corresponding to `parallelism in tianshou <https://tianshou.readthedocs.io/en/master/api/tianshou.env.html#vectorenv>`_.
+        parallel_mode: dummy
+    action_interpreter:
+        class: CategoricalActionInterpreter
+        kwargs:
+            # Candidate actions, it can be a list with length L: [a_1, a_2,..., a_L] or an integer n, in which case the list of length n+1 is auto-generated, i.e., [0, 1/n, 2/n,..., n/n].
+            values: 14
+            # Total number of steps (an upper-bound estimation)
+            max_step: 8
+        module_path: qlib.rl.order_execution.interpreter
+    state_interpreter:
+        class: FullHistoryStateInterpreter
+        kwargs:
+            # Number of dimensions in data.
+            data_dim: 6
+            # Equal to the total number of records. For example, in SAOE per minute, data_ticks is the length of the day in minutes.
+            data_ticks: 240
+            # The total number of steps (an upper-bound estimation). For example, 390min / 30min-per-step = 13 steps.
+            max_step: 8
+            # Provider of the processed data.
+            processed_data_provider:
+                class: PickleProcessedDataProvider
+                module_path: qlib.rl.data.pickle_styled
+                kwargs:
+                    data_dir: ./data/pickle_dataframe/feature
+        module_path: qlib.rl.order_execution.interpreter
+    reward:
+        class: PAPenaltyReward
+        kwargs:
+            # The penalty for a large volume in a short time.
+            penalty: 100.0
+        module_path: qlib.rl.order_execution.reward
+    data:
+        source:
+            order_dir: ./data/training_order_split
+            data_dir: ./data/pickle_dataframe/backtest
+            # number of time indexes
+            total_time: 240
+            # start time index
+            default_start_time: 0
+            # end time index
+            default_end_time: 240
+            proc_data_dim: 6
+        num_workers: 0
+        queue_size: 20
+    network:
+        class: Recurrent
+        module_path: qlib.rl.order_execution.network
+    policy:
+        class: PPO
+        kwargs:
+            lr: 0.0001
+        module_path: qlib.rl.order_execution.policy
+    runtime:
+        seed: 42
+        use_cuda: false
+    trainer:
+        max_epoch: 2
+        # Number of times the policy update is repeated on the data from each collect
+        repeat_per_collect: 5
+        earlystop_patience: 2
+        # Episodes per collect at training.
+        episode_per_collect: 20
+        batch_size: 16
+        # Perform validation every n iterations
+        val_every_n_epoch: 1
+        checkpoint_path: ./checkpoints
+        checkpoint_every_n_iters: 1
+
+
+And the config file for backtesting:
+
+.. code-block:: yaml
+
+    order_file: ./data/backtest_orders.csv
+    start_time: "9:45"
+    end_time: "14:44"
+    qlib:
+        provider_uri_1min: ./data/bin
+        feature_root_dir: ./data/pickle
+        # feature generated by today's information
+        feature_columns_today: [
+            "$open", "$high", "$low", "$close", "$vwap", "$volume",
+        ]
+        # feature generated by yesterday's information
+        feature_columns_yesterday: [
+            "$open_v1", "$high_v1", "$low_v1", "$close_v1", "$vwap_v1", "$volume_v1",
+        ]
+    exchange:
+        # the expression for buying and selling stock limitation
+        limit_threshold: ['$close == 0', '$close == 0']
+        # deal price for buying and selling
+        deal_price: ["If($close == 0, $vwap, $close)", "If($close == 0, $vwap, $close)"]
+    volume_threshold:
+        # volume limits for both buying and selling; "cum" means that this is a cumulative value over time
+        all: ["cum", "0.2 * DayCumsum($volume, '9:45', '14:44')"]
+        # the volume limits of buying
+        buy: ["current", "$close"]
+        # the volume limits of selling, "current" means that this is a real-time value and will not accumulate over time
+        sell: ["current", "$close"]
+    strategies: 
+        30min: 
+            class: TWAPStrategy
+            module_path: qlib.contrib.strategy.rule_strategy
+            kwargs: {}
+        1day: 
+            class: SAOEIntStrategy
+            module_path: qlib.rl.order_execution.strategy
+            kwargs:
+                state_interpreter:
+                    class: FullHistoryStateInterpreter
+                    module_path: qlib.rl.order_execution.interpreter
+                    kwargs:
+                        max_step: 8
+                        data_ticks: 240
+                        data_dim: 6
+                        processed_data_provider:
+                            class: PickleProcessedDataProvider
+                            module_path: qlib.rl.data.pickle_styled
+                            kwargs:
+                                data_dir: ./data/pickle_dataframe/feature
+                action_interpreter:
+                    class: CategoricalActionInterpreter
+                    module_path: qlib.rl.order_execution.interpreter
+                    kwargs:
+                        values: 14
+                        max_step: 8
+                network:
+                    class: Recurrent
+                    module_path: qlib.rl.order_execution.network
+                    kwargs: {}
+                policy:
+                    class: PPO
+                    module_path: qlib.rl.order_execution.policy
+                    kwargs:
+                        lr: 1.0e-4
+                        # Local path to the latest model. The model is generated during training, so please run training first if you want to run backtest with a trained policy. You could also remove this parameter to run backtest with a randomly initialized policy.
+                        weight_file: ./checkpoints/latest.pth
+    # Concurrent environment workers.
+    concurrency: 5
+
+With the above config files, you can start training the agent by the following command:
+
+.. code-block:: console
+
+    $ python -m qlib.rl.contrib.train_onpolicy --config_path train_config.yml
+
+After the training, you can backtest with the following command:
+
+.. code-block:: console
+
+    $ python -m qlib.rl.contrib.backtest --config_path backtest_config.yml
+
+In this example, :class:`~qlib.rl.order_execution.simulator_qlib.SingleAssetOrderExecution` and :class:`~qlib.rl.order_execution.simulator_simple.SingleAssetOrderExecutionSimple` serve as example simulators, :class:`qlib.rl.order_execution.interpreter.FullHistoryStateInterpreter` and :class:`qlib.rl.order_execution.interpreter.CategoricalActionInterpreter` as example interpreters, :class:`qlib.rl.order_execution.policy.PPO` as an example policy, and :class:`qlib.rl.order_execution.reward.PAPenaltyReward` as an example reward function.
+For the single asset order execution task, if developers have already defined their simulator/interpreters/reward function/policy, they could launch the training and backtest pipeline by simply modifying the corresponding settings in the config files.
+The details about the example can be found `here <https://github.com/microsoft/qlib/blob/main/examples/rl/README.md>`_. 
+
+In the future, we will provide more examples for different scenarios such as RL-based portfolio construction.
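
Note on the quickstart config: the comment on ``values`` says an integer ``n`` expands to the ``n + 1`` candidate ratios ``[0, 1/n, ..., n/n]``. The sketch below illustrates that rule in plain Python. ``candidate_actions`` is a hypothetical helper written for this note, not the actual ``CategoricalActionInterpreter`` code, and the step arithmetic only assumes that ``data_ticks`` and ``time_per_step`` are both measured in minutes, as the config comments suggest.

```python
from fractions import Fraction
from typing import List, Sequence, Union


def candidate_actions(values: Union[int, Sequence[float]]) -> List[float]:
    """Expand `values` the way the config comment describes (illustrative only)."""
    if isinstance(values, int):
        # An integer n yields the n + 1 ratios 0, 1/n, ..., n/n.
        return [float(Fraction(k, values)) for k in range(values + 1)]
    # An explicit list [a_1, ..., a_L] is used as given.
    return [float(v) for v in values]


# `values: 14` from the training config above.
actions = candidate_actions(14)

# 240 one-minute ticks split into 30-minute steps, consistent with `max_step: 8`.
steps = 240 // 30
```

With ``values: 14`` this gives 15 candidate ratios between 0 and 1, and a 240-minute day split into 30-minute steps gives the 8 steps used for ``max_step`` in both config files.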
--- a/docs/component/rl/toctree.rst
+++ b/docs/component/rl/toctree.rst
@@ -0,0 +1,10 @@
+.. _rl:
+
+========================================================================
+Reinforcement Learning in Quantitative Trading
+========================================================================
+
+.. toctree::
+    Overall <overall>
+    Quick Start <quickstart>
+    Framework <framework>
--- a/docs/component/strategy.rst
+++ b/docs/component/strategy.rst
@@ -80,6 +80,7 @@ TopkDropoutStrategy
        In most cases, ``TopkDrop`` algorithm sells and buys `Drop` stocks every trading day, which yields a turnover rate of 2$\times$`Drop`/$K$.

        The following images illustrate a typical scenario.
+
        .. image:: ../_static/img/topk_drop.png
            :alt: Topk-Drop

--- a/docs/conf.py
+++ b/docs/conf.py
@@ -77,7 +77,7 @@ language = "en_US"
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 # This patterns also effect to html_static_path and html_extra_path
-exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "hidden"]

 # The name of the Pygments (syntax highlighting) style to use.
 pygments_style = "sphinx"
--- a/docs/developer/code_standard_and_dev_guide.rst
+++ b/docs/developer/code_standard_and_dev_guide.rst
@@ -15,7 +15,8 @@ Continuous Integration (CI) tools help you stick to the quality standards by run
 When you submit a PR request, you can check whether your code passes the CI tests in the "check" section at the bottom of the web page.

 1. Qlib will check the code format with black. The PR will raise error if your code does not align to the standard of Qlib(e.g. a common error is the mixed use of space and tab).
- You can fix the bug by inputing the following code in the command line.
+
+   You can fix the bug by inputting the following code in the command line.

 .. code-block:: bash

@@ -32,7 +33,8 @@ When you submit a PR request, you can check whether your code passes the CI test


 3. Qlib will check your code style flake8. The checking command is implemented in [github action workflow](https://github.com/microsoft/qlib/blob/0e8b94a552f1c457cfa6cd2c1bb3b87ebb3fb279/.github/workflows/test.yml#L73).
- You can fix the bug by inputing the following code in the command line.
+
+   You can fix the bug by inputting the following code in the command line.

 .. code-block:: bash

@@ -40,7 +42,8 @@ When you submit a PR request, you can check whether your code passes the CI test


 4. Qlib has integrated pre-commit, which will make it easier for developers to format their code.
- Just run the following two commands, and the code will be automatically formatted using black and flake8 when the git commit command is executed.
+
+   Just run the following two commands, and the code will be automatically formatted using black and flake8 when the git commit command is executed.

 .. code-block:: bash

--- a/docs/hidden/client.rst
+++ b/docs/hidden/client.rst
@@ -81,6 +81,7 @@ If running on Windows, open **NFS** features and write correct **mount_path**, i
    * Open ``Programs and Features``.
    * Click ``Turn Windows features on or off``.
    * Scroll down and check the option ``Services for NFS``, then click OK
+
    Reference address: https://graspingtech.com/mount-nfs-share-windows-10/
 2.config correct mount_path
    * In windows, mount path must be not exist path and root path,
@@ -161,7 +162,7 @@ Limitations
 API
 ***

-The client is based on `python-socketio<https://python-socketio.readthedocs.io>`_ which is a framework that supports WebSocket client for Python language. The client can only propose requests and receive results, which do not include any calculating procedure.
+The client is based on `python-socketio <https://python-socketio.readthedocs.io>`_ which is a framework that supports WebSocket client for Python language. The client can only propose requests and receive results, which do not include any calculating procedure.

 Class
 -----
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -33,7 +33,7 @@ Document Structure

 .. toctree::
   :maxdepth: 3
-   :caption: COMPONENTS:
+   :caption: MAIN COMPONENTS:

   Workflow: Workflow Management <component/workflow.rst>
   Data Layer: Data Framework & Usage <component/data.rst>
@@ -44,10 +44,11 @@ Document Structure
   Qlib Recorder: Experiment Management <component/recorder.rst>
   Analysis: Evaluation & Results Analysis <component/report.rst>
   Online Serving: Online Management & Strategy & Tool <component/online.rst>
+   Reinforcement Learning <component/rl/toctree>

 .. toctree::
   :maxdepth: 3
-   :caption: ADVANCED TOPICS:
+   :caption: OTHER COMPONENTS/FEATURES/TOPICS:

   Building Formulaic Alphas <advanced/alpha.rst>
   Online & Offline mode <advanced/server.rst>
@@ -55,6 +56,12 @@ Document Structure
   Task Management <advanced/task_management.rst>
   Point-In-Time database <advanced/PIT.rst>

+.. toctree::
+   :maxdepth: 3
+   :caption: FOR DEVELOPERS:
+
+   Code Standard & Development Guidance <developer/code_standard_and_dev_guide.rst>
+
 .. toctree::
   :maxdepth: 3
   :caption: REFERENCE:
--- a/docs/introduction/introduction.rst
+++ b/docs/introduction/introduction.rst
@@ -15,38 +15,56 @@ With ``Qlib``, users can easily try their ideas to create better Quant investmen
 Framework
 =========

+
 .. image:: ../_static/img/framework.svg
    :align: center


 At the module level, Qlib is a platform that consists of above components. The components are designed as loose-coupled modules and each component could be used stand-alone.

+This framework diagram may be intimidating for users new to Qlib because it tries to accurately capture many details of Qlib's design.
+New users can skip it at first and come back to it later.


-========================  ==============================================================================
-Name                      Description
-========================  ==============================================================================
-`Infrastructure` layer    `Infrastructure` layer provides underlying support for Quant research.
-                          `DataServer` provides high-performance infrastructure for users to manage
-                          and retrieve raw data. `Trainer` provides flexible interface to control
-                          the training process of models which enable algorithms controlling the
-                          training process.

-`Workflow` layer          `Workflow` layer covers the whole workflow of quantitative investment.
-                          `Information Extractor` extracts data for models. `Forecast Model` focuses
-                          on producing all kinds of forecast signals (e.g. *alpha*, risk) for other
-                          modules.  With these signals `Decision Generator` will generate the target
-                          trading decisions(i.e. portfolio, orders)  to be executed by `Execution Env`
-                          (i.e. the trading market).  There may be multiple levels of `Trading Agent`
-                          and `Execution Env` (e.g. an *order executor trading agent and intraday
-                          order execution environment* could behave like an interday trading
-                          environment and nested in  *daily portfolio management trading agent and
-                          interday trading environment*  )
+===========================  ==============================================================================
+Name                         Description
+===========================  ==============================================================================
+`Infrastructure` layer       `Infrastructure` layer provides underlying support for Quant research.
+                             `DataServer` provides high-performance infrastructure for users to manage
+                             and retrieve raw data. `Trainer` provides flexible interface to control
+                             the training process of models which enable algorithms controlling the
+                             training process.

-`Interface` layer         `Interface` layer tries to present a user-friendly interface for the underlying
-                          system. `Analyser` module will provide users detailed analysis reports of
-                          forecasting signals, portfolios and execution results
-========================  ==============================================================================
+`Learning Framework` layer   The `Forecast Model` and `Trading Agent` are learnable. They are learned
+                             based on the `Learning Framework` layer and then applied to multiple scenarios
+                             in `Workflow` layer. The supported learning paradigms can be categorized into
+                             reinforcement learning and supervised learning.  The learning framework
+                             leverages the `Workflow` layer as well (e.g. sharing `Information Extractor`,
+                             creating environments based on `Execution Env`).
+
+`Workflow` layer             `Workflow` layer covers the whole workflow of quantitative investment.
+                             Both supervised-learning-based strategies and RL-based Strategies
+                             are supported.
+                             `Information Extractor` extracts data for models. `Forecast Model` focuses
+                             on producing all kinds of forecast signals (e.g. *alpha*, risk) for other
+                             modules.  With these signals `Decision Generator` will generate the target
+                             trading decisions (i.e. portfolio, orders).
+                             If RL-based Strategies are adopted, the `Policy` is learned in an end-to-end way
+                             and the trading decisions are generated directly.
+                             Decisions will be executed by `Execution Env`
+                             (i.e. the trading market).  There may be multiple levels of `Strategy`
+                             and `Executor` (e.g. an *order executor trading strategy and intraday order executor*
+                             could behave like an interday trading loop and be nested in
+                             *daily portfolio management trading strategy and interday trading executor*
+                             trading loop)
+
+`Interface` layer            `Interface` layer tries to present a user-friendly interface for the underlying
+                             system. `Analyser` module will provide users detailed analysis reports of
+                             forecasting signals, portfolios and execution results
+===========================  ==============================================================================

 - The modules with hand-drawn style are under development and will be released in the future.
 - The modules with dashed borders are highly user-customizable and extendible.
+
+(p.s. framework image is created with https://draw.io/)
--- a/docs/introduction/quick.rst
+++ b/docs/introduction/quick.rst
@@ -21,6 +21,7 @@ Users can easily intsall ``Qlib`` according to the following steps:
 - Before installing ``Qlib`` from source, users need to install some dependencies:

    .. code-block::
+
        pip install numpy
        pip install --upgrade  cython

--- a/docs/reference/api.rst
+++ b/docs/reference/api.rst
@@ -1,4 +1,5 @@
 .. _api:
+
 =============
 API Reference
 =============
@@ -116,7 +117,7 @@ Model
 Strategy
 --------

-.. automodule:: qlib.contrib.strategy.strategy
+.. automodule:: qlib.contrib.strategy
    :members:

 Evaluate
@@ -254,5 +255,38 @@ Utils
 Serializable
 ------------

-.. automodule:: qlib.utils.serial.Serializable
+.. automodule:: qlib.utils.serial
    :members:
+
+RL
+==============
+
+Base Component
+--------------
+.. automodule:: qlib.rl
+    :members:
+    :imported-members:
+
+Strategy
+--------
+.. automodule:: qlib.rl.strategy
+    :members:
+    :imported-members:
+
+Trainer
+-------
+.. automodule:: qlib.rl.trainer
+    :members:
+    :imported-members:
+
+Order Execution
+---------------
+.. automodule:: qlib.rl.order_execution
+    :members:
+    :imported-members:
+
+Utils
+---------------
+.. automodule:: qlib.rl.utils
+    :members:
+    :imported-members:
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -4,3 +4,4 @@ numpy
 scipy
 scikit-learn
 pandas
+tianshou
--- a/docs/start/getdata.rst
+++ b/docs/start/getdata.rst
@@ -83,15 +83,14 @@ Load features of certain instruments in a given time range:
   >> from qlib.data import D
   >> instruments = ['SH600000']
   >> fields = ['$close', '$volume', 'Ref($close, 1)', 'Mean($close, 3)', '$high-$low']
-   >> D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day').head()
-
-                              $close     $volume  Ref($close, 1)  Mean($close, 3)  $high-$low
-      instrument  datetime
-      SH600000    2010-01-04  86.778313  16162960.0       88.825928        88.061483    2.907631
-                  2010-01-05  87.433578  28117442.0       86.778313        87.679273    3.235252
-                  2010-01-06  85.713585  23632884.0       87.433578        86.641825    1.720009
-                  2010-01-07  83.788803  20813402.0       85.713585        85.645322    3.030487
-                  2010-01-08  84.730675  16044853.0       83.788803        84.744354    2.047623
+   >> D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day').head().to_string()
+   '                           $close     $volume  Ref($close, 1)  Mean($close, 3)  $high-$low
+   ... instrument  datetime
+   ... SH600000    2010-01-04  86.778313  16162960.0       88.825928        88.061483    2.907631
+   ...             2010-01-05  87.433578  28117442.0       86.778313        87.679273    3.235252
+   ...             2010-01-06  85.713585  23632884.0       87.433578        86.641825    1.720009
+   ...             2010-01-07  83.788803  20813402.0       85.713585        85.645322    3.030487
+   ...             2010-01-08  84.730675  16044853.0       83.788803        84.744354    2.047623'

 Load features of certain stock pool in a given time range:

@@ -105,15 +104,14 @@ Load features of certain stock pool in a given time range:
   >> expressionDFilter = ExpressionDFilter(rule_expression='$close>Ref($close,1)')
   >> instruments = D.instruments(market='csi300', filter_pipe=[nameDFilter, expressionDFilter])
   >> fields = ['$close', '$volume', 'Ref($close, 1)', 'Mean($close, 3)', '$high-$low']
-   >> D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day').head()
-
-                                 $close        $volume  Ref($close, 1)  Mean($close, 3)  $high-$low
-      instrument  datetime
-      SH600655    2010-01-04  2699.567383  158193.328125     2619.070312      2626.097738  124.580566
-                  2010-01-08  2612.359619   77501.406250     2584.567627      2623.220133   83.373047
-                  2010-01-11  2712.982422  160852.390625     2612.359619      2636.636556  146.621582
-                  2010-01-12  2788.688232  164587.937500     2712.982422      2704.676758  128.413818
-                  2010-01-13  2790.604004  145460.453125     2788.688232      2764.091553  128.413818
+   >> D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day').head().to_string()
+   '                              $close        $volume  Ref($close, 1)  Mean($close, 3)  $high-$low
+   ... instrument  datetime
+   ... SH600655    2010-01-04  2699.567383  158193.328125     2619.070312      2626.097738  124.580566
+   ...             2010-01-08  2612.359619   77501.406250     2584.567627      2623.220133   83.373047
+   ...             2010-01-11  2712.982422  160852.390625     2612.359619      2636.636556  146.621582
+   ...             2010-01-12  2788.688232  164587.937500     2712.982422      2704.676758  128.413818
+   ...             2010-01-13  2790.604004  145460.453125     2788.688232      2764.091553  128.413818'


 For more details about features, please refer `Feature API <../component/data.html>`_.
--- a/docs/start/integration.rst
+++ b/docs/start/integration.rst
@@ -21,84 +21,88 @@ The Custom models need to inherit `qlib.model.base.Model <../reference/api.html#
    - ``Qlib`` passes the initialized parameters to the \_\_init\_\_ method.
    - The hyperparameters of model in the configuration must be consistent with those defined in the `__init__` method.
    - Code Example: In the following example, the hyperparameters of model in the configuration file should contain parameters such as `loss:mse`.
-    .. code-block:: Python

-        def __init__(self, loss='mse', **kwargs):
-            if loss not in {'mse', 'binary'}:
-                raise NotImplementedError
-            self._scorer = mean_squared_error if loss == 'mse' else roc_auc_score
-            self._params.update(objective=loss, **kwargs)
-            self._model = None
+        .. code-block:: Python
+
+            def __init__(self, loss='mse', **kwargs):
+                if loss not in {'mse', 'binary'}:
+                    raise NotImplementedError
+                self._scorer = mean_squared_error if loss == 'mse' else roc_auc_score
+                self._params.update(objective=loss, **kwargs)
+                self._model = None

 - Override the `fit` method
    - ``Qlib`` calls the fit method to train the model.
    - The parameters must include training feature `dataset`, which is designed in the interface.
    - The parameters could include some `optional` parameters with default values, such as `num_boost_round = 1000` for `GBDT`.
    - Code Example: In the following example, `num_boost_round = 1000` is an optional parameter.
-    .. code-block:: Python

-        def fit(self, dataset: DatasetH, num_boost_round = 1000, **kwargs):
+        .. code-block:: Python

-            # prepare dataset for lgb training and evaluation
-            df_train, df_valid = dataset.prepare(
-                ["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
-            )
-            x_train, y_train = df_train["feature"], df_train["label"]
-            x_valid, y_valid = df_valid["feature"], df_valid["label"]
+            def fit(self, dataset: DatasetH, num_boost_round = 1000, **kwargs):

-            # Lightgbm need 1D array as its label
-            if y_train.values.ndim == 2 and y_train.values.shape[1] == 1:
-                y_train, y_valid = np.squeeze(y_train.values), np.squeeze(y_valid.values)
-            else:
-                raise ValueError("LightGBM doesn't support multi-label training")
+                # prepare dataset for lgb training and evaluation
+                df_train, df_valid = dataset.prepare(
+                    ["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
+                )
+                x_train, y_train = df_train["feature"], df_train["label"]
+                x_valid, y_valid = df_valid["feature"], df_valid["label"]

-            dtrain = lgb.Dataset(x_train.values, label=y_train)
-            dvalid = lgb.Dataset(x_valid.values, label=y_valid)
+                # LightGBM needs a 1D array as its label
+                if y_train.values.ndim == 2 and y_train.values.shape[1] == 1:
+                    y_train, y_valid = np.squeeze(y_train.values), np.squeeze(y_valid.values)
+                else:
+                    raise ValueError("LightGBM doesn't support multi-label training")

-            # fit the model
-            self.model = lgb.train(
-                self.params,
-                dtrain,
-                num_boost_round=num_boost_round,
-                valid_sets=[dtrain, dvalid],
-                valid_names=["train", "valid"],
-                early_stopping_rounds=early_stopping_rounds,
-                verbose_eval=verbose_eval,
-                evals_result=evals_result,
-                **kwargs
-            )
+                dtrain = lgb.Dataset(x_train.values, label=y_train)
+                dvalid = lgb.Dataset(x_valid.values, label=y_valid)
+
+                # fit the model
+                self.model = lgb.train(
+                    self.params,
+                    dtrain,
+                    num_boost_round=num_boost_round,
+                    valid_sets=[dtrain, dvalid],
+                    valid_names=["train", "valid"],
+                    early_stopping_rounds=early_stopping_rounds,
+                    verbose_eval=verbose_eval,
+                    evals_result=evals_result,
+                    **kwargs
+                )

 - Override the `predict` method
    - The parameters must include the parameter `dataset`, which will be userd to get the test dataset.
    - Return the `prediction score`.
    - Please refer to `Model API <../reference/api.html#module-qlib.model.base>`_ for the parameter types of the fit method.
    - Code Example: In the following example, users need to use `LightGBM` to predict the label(such as `preds`) of test data `x_test` and return it.
-    .. code-block:: Python

-        def predict(self, dataset: DatasetH, **kwargs)-> pandas.Series:
-            if self.model is None:
-                raise ValueError("model is not fitted yet!")
-            x_test = dataset.prepare("test", col_set="feature", data_key=DataHandlerLP.DK_I)
-            return pd.Series(self.model.predict(x_test.values), index=x_test.index)
+        .. code-block:: Python
+
+            def predict(self, dataset: DatasetH, **kwargs)-> pandas.Series:
+                if self.model is None:
+                    raise ValueError("model is not fitted yet!")
+                x_test = dataset.prepare("test", col_set="feature", data_key=DataHandlerLP.DK_I)
+                return pd.Series(self.model.predict(x_test.values), index=x_test.index)

 - Override the `finetune` method (Optional)
    - This method is optional to the users. When users want to use this method on their own models, they should inherit the ``ModelFT`` base class, which includes the interface of `finetune`.
    - The parameters must include the parameter `dataset`.
    - Code Example: In the following example, users will use `LightGBM` as the model and finetune it.
-    .. code-block:: Python

-        def finetune(self, dataset: DatasetH, num_boost_round=10, verbose_eval=20):
-            # Based on existing model and finetune by train more rounds
-            dtrain, _ = self._prepare_data(dataset)
-            self.model = lgb.train(
-                self.params,
-                dtrain,
-                num_boost_round=num_boost_round,
-                init_model=self.model,
-                valid_sets=[dtrain],
-                valid_names=["train"],
-                verbose_eval=verbose_eval,
-            )
+        .. code-block:: Python
+
+            def finetune(self, dataset: DatasetH, num_boost_round=10, verbose_eval=20):
+                # Fine-tune the existing model by training for more rounds
+                dtrain, _ = self._prepare_data(dataset)
+                self.model = lgb.train(
+                    self.params,
+                    dtrain,
+                    num_boost_round=num_boost_round,
+                    init_model=self.model,
+                    valid_sets=[dtrain],
+                    valid_names=["train"],
+                    verbose_eval=verbose_eval,
+                )

 Configuration File
 ==================
@@ -107,21 +111,21 @@ The configuration file is described in detail in the `Workflow <../component/wor

 - Example: The following example describes the `model` field of configuration file about the custom lightgbm model mentioned above, where `module_path` is the module path, `class` is the class name, and `args` is the hyperparameter passed into the __init__ method. All parameters in the field is passed to `self._params` by `\*\*kwargs` in `__init__` except `loss = mse`.

-.. code-block:: YAML
+    .. code-block:: YAML

-    model:
-        class: LGBModel
-        module_path: qlib.contrib.model.gbdt
-        args:
-            loss: mse
-            colsample_bytree: 0.8879
-            learning_rate: 0.0421
-            subsample: 0.8789
-            lambda_l1: 205.6999
-            lambda_l2: 580.9768
-            max_depth: 8
-            num_leaves: 210
-            num_threads: 20
+        model:
+            class: LGBModel
+            module_path: qlib.contrib.model.gbdt
+            args:
+                loss: mse
+                colsample_bytree: 0.8879
+                learning_rate: 0.0421
+                subsample: 0.8789
+                lambda_l1: 205.6999
+                lambda_l2: 580.9768
+                max_depth: 8
+                num_leaves: 210
+                num_threads: 20

 Users could find configuration file of the baselines of the ``Model`` in ``examples/benchmarks``. All the configurations of different models are listed under the corresponding model folder.

--- a/examples/benchmarks/DoubleEnsemble/workflow_config_doubleensemble_early_stop_Alpha158.yaml
+++ b/examples/benchmarks/DoubleEnsemble/workflow_config_doubleensemble_early_stop_Alpha158.yaml
@@ -0,0 +1,95 @@
+qlib_init:
+    provider_uri: "~/.qlib/qlib_data/cn_data"
+    region: cn
+market: &market csi300
+benchmark: &benchmark SH000300
+data_handler_config: &data_handler_config
+    start_time: 2008-01-01
+    end_time: 2020-08-01
+    fit_start_time: 2008-01-01
+    fit_end_time: 2014-12-31
+    instruments: *market
+port_analysis_config: &port_analysis_config
+    strategy:
+        class: TopkDropoutStrategy
+        module_path: qlib.contrib.strategy
+        kwargs:
+            signal:
+                - <MODEL> 
+                - <DATASET>
+            topk: 50
+            n_drop: 5
+    backtest:
+        start_time: 2017-01-01
+        end_time: 2020-08-01
+        account: 100000000
+        benchmark: *benchmark
+        exchange_kwargs:
+            limit_threshold: 0.095
+            deal_price: close
+            open_cost: 0.0005
+            close_cost: 0.0015
+            min_cost: 5
+task:
+    model:
+        class: DEnsembleModel
+        module_path: qlib.contrib.model.double_ensemble
+        kwargs:
+            base_model: "gbm"
+            loss: mse
+            num_models: 3
+            enable_sr: True
+            enable_fs: True
+            alpha1: 1
+            alpha2: 1
+            bins_sr: 10
+            bins_fs: 5
+            decay: 0.5
+            sample_ratios:
+                - 0.8
+                - 0.7
+                - 0.6
+                - 0.5
+                - 0.4
+            sub_weights:
+                - 1
+                - 1
+                - 1
+            epochs: 1000
+            early_stopping_rounds: 50
+            colsample_bytree: 0.8879
+            learning_rate: 0.2
+            subsample: 0.8789
+            lambda_l1: 205.6999
+            lambda_l2: 580.9768
+            max_depth: 8
+            num_leaves: 210
+            num_threads: 20
+            verbosity: -1
+    dataset:
+        class: DatasetH
+        module_path: qlib.data.dataset
+        kwargs:
+            handler:
+                class: Alpha158
+                module_path: qlib.contrib.data.handler
+                kwargs: *data_handler_config
+            segments:
+                train: [2008-01-01, 2014-12-31]
+                valid: [2015-01-01, 2016-12-31]
+                test: [2017-01-01, 2020-08-01]
+    record: 
+        - class: SignalRecord
+          module_path: qlib.workflow.record_temp
+          kwargs: 
+            model: <MODEL>
+            dataset: <DATASET>
+        - class: SigAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs: 
+            ana_long_short: False
+            ann_scaler: 252
+        - class: PortAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs: 
+            config: *port_analysis_config
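Illustrative sketch (not part of this patch): the `class` / `module_path` / `kwargs` triples in the config above describe objects that are built at run time. The pattern can be sketched as follows, assuming a hypothetical helper `build_from_config` written only for this illustration (it is not Qlib's own loader) and using a standard-library class so that it runs without Qlib installed.

```python
import importlib
from typing import Any, Dict


def build_from_config(config: Dict[str, Any]) -> Any:
    """Instantiate config["class"] from config["module_path"] with config["kwargs"].

    Hypothetical helper for illustration only; it is not Qlib's implementation.
    """
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    return cls(**config.get("kwargs", {}))


# Standard-library stand-in for an entry such as the `model` block above.
example = {
    "class": "Fraction",
    "module_path": "fractions",
    "kwargs": {"numerator": 3, "denominator": 4},
}
obj = build_from_config(example)
print(type(obj).__name__, obj)
```

Keeping the class name and module path as plain strings is what lets a YAML file swap one model or dataset for another without touching code.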
--- a/examples/benchmarks/LightGBM/features_sample.py
+++ b/examples/benchmarks/LightGBM/features_sample.py
@@ -5,6 +5,8 @@ from qlib.data.inst_processor import InstProcessor


 class Resample1minProcessor(InstProcessor):
+    """This processor resamples the data from 1min frequency to day frequency by selecting a specific minute."""
+
    def __init__(self, hour: int, minute: int, **kwargs):
        self.hour = hour
        self.minute = minute
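Illustrative sketch (not part of this patch): the docstring above describes turning 1-minute data into daily data by keeping one chosen minute per day. A minimal plain-Python version of that idea is shown below. The function name `pick_minute_per_day` and the `(timestamp, value)` tuple layout are assumptions made for the example; the real processor works on Qlib's own data structures.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Bar = Tuple[datetime, float]  # (timestamp, value) pair for one 1-minute bar


def pick_minute_per_day(bars: List[Bar], hour: int, minute: int) -> Dict[str, float]:
    """Keep, for each calendar day, the bar whose time equals hour:minute."""
    daily: Dict[str, float] = {}
    for ts, value in bars:
        if ts.hour == hour and ts.minute == minute:
            daily[ts.date().isoformat()] = value
    return daily


bars = [
    (datetime(2020, 1, 2, 9, 30), 10.0),
    (datetime(2020, 1, 2, 9, 31), 10.2),
    (datetime(2020, 1, 3, 9, 30), 10.5),
    (datetime(2020, 1, 3, 9, 31), 10.4),
]
print(pick_minute_per_day(bars, hour=9, minute=31))
```

Days that have no bar at the requested minute simply do not appear in the result, so a caller would need to decide how to treat such gaps.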
--- a/examples/benchmarks_dynamic/DDG-DA/workflow.py
+++ b/examples/benchmarks_dynamic/DDG-DA/workflow.py
@@ -170,7 +170,7 @@ class DDGDA:
        # 3) train and logging meta model
        with R.start(experiment_name=self.meta_exp_name):
            R.log_params(**kwargs)
-            mm = MetaModelDS(step=self.step, hist_step_n=kwargs["hist_step_n"], lr=0.001, max_epoch=200, seed=43)
+            mm = MetaModelDS(step=self.step, hist_step_n=kwargs["hist_step_n"], lr=0.001, max_epoch=100, seed=43)
            mm.fit(md)
            R.save_objects(model=mm)

--- a/examples/benchmarks_dynamic/README.md
+++ b/examples/benchmarks_dynamic/README.md
@@ -4,15 +4,21 @@ So adapting the forecasting models/strategies to market dynamics is very importa

 The table below shows the performances of different solutions on different forecasting models.

-## Alpha158 dataset
+## Alpha158 Dataset
+Here is the [crowd sourced version of qlib data](data_collector/crowd_source/README.md): https://github.com/chenditc/investment_data/releases
+```bash
+wget https://github.com/chenditc/investment_data/releases/download/20220720/qlib_bin.tar.gz
+tar -zxvf qlib_bin.tar.gz -C ~/.qlib/qlib_data/cn_data --strip-components=2
+```

 | Model Name       | Dataset | IC | ICIR | Rank IC | Rank ICIR | Annualized Return | Information Ratio | Max Drawdown |
 |------------------|---------|----|------|---------|-----------|-------------------|-------------------|--------------|
-| RR[Linear]       |Alpha158 |0.088|0.570|0.102    |0.622      |0.077              |1.175              |-0.086        |
-| DDG-DA[Linear]   |Alpha158 |0.093|0.622|0.106    |0.670      |0.085              |1.213              |-0.093        |
-| RR[LightGBM]     |Alpha158 |0.079|0.566|0.088    |0.592      |0.075              |1.226              |-0.096        |
-| DDG-DA[LightGBM] |Alpha158 |0.084|0.639|0.093    |0.664      |0.099              |1.442              |-0.071        |
+| RR[Linear]       |Alpha158 |0.089|0.577|0.102    |0.627      |0.093              |1.458              |-0.073        |
+| DDG-DA[Linear]   |Alpha158 |0.096|0.636|0.107    |0.677      |0.067              |0.996              |-0.091        |
+| RR[LightGBM]     |Alpha158 |0.082|0.589|0.091    |0.626      |0.077              |1.320              |-0.091        |
+| DDG-DA[LightGBM] |Alpha158 |0.085|0.658|0.094    |0.686      |0.115              |1.792              |-0.068        |

 - The label horizon of the `Alpha158` dataset is set to 20.
 - The rolling time intervals are set to 20 trading days.
 - The test rolling periods are from January 2017 to August 2020.
+- The results are based on the crowd-sourced version. The Yahoo version of qlib data does not contain `VWAP`, so all related factors are missing and filled with 0. This leads to a rank-deficient matrix (a matrix that does not have full rank), so the lower-level optimization of DDG-DA cannot be solved.
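Illustrative sketch (not part of this patch): the effect of a zero-filled factor on matrix rank can be checked on a toy example. The numbers below are invented for illustration and are not Qlib data, and `matrix_rank` is a small helper written for this example.

```python
from fractions import Fraction
from typing import List


def matrix_rank(rows: List[List[float]]) -> int:
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: the column adds nothing to the rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Three samples, three factors; the last factor is missing and filled with 0.
features = [
    [1, 2, 0],
    [3, 1, 0],
    [2, 5, 0],
]
print(matrix_rank(features))  # 2, fewer than the 3 columns
```

With a rank below the number of columns, a linear solve on this matrix has no unique solution, which is the situation the note above describes.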
--- a/examples/rl/README.md
+++ b/examples/rl/README.md
@@ -0,0 +1,60 @@
+This folder contains a simple example of how to run Qlib RL. It contains:
+
+```
+.
+├── experiment_config
+│   ├── backtest       # Backtest config
+│   └── training       # Training config
+├── README.md          # Readme (the current file)
+└── scripts            # Scripts for data pre-processing
+```
+
+## Data preparation
+
+Use [AzCopy](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10) to download data:
+
+```
+azcopy copy https://qlibpublic.blob.core.windows.net/data/rl/qlib_rl_example_data ./ --recursive
+mv qlib_rl_example_data data
+```
+
+The downloaded data will be placed at `./data`. The original data are in `data/csv`. To create all data needed by the case, run:
+
+```
+bash scripts/data_pipeline.sh
+```
+
+After the execution finishes, the `data/` directory should look like this:
+
+```
+data
+├── backtest_orders.csv
+├── bin
+├── csv
+├── pickle
+├── pickle_dataframe
+└── training_order_split
+```
+
+## Run training
+
+Run:
+
+```
+python -m qlib.rl.contrib.train_onpolicy --config_path ./experiment_config/training/config.yml
+```
+
+After training, checkpoints will be stored under `checkpoints/`.
+
+## Run backtest
+
+```
+python -m qlib.rl.contrib.backtest --config_path ./experiment_config/backtest/config.yml
+```
+
+The backtest workflow will use the trained model in `checkpoints/`. The backtest summary can be found in `outputs/`.
+
+## Others
+The RL module is designed in a loosely-coupled way. Currently, RL examples are integrated with concrete business logic.
+But the core part of RL is much simpler than what you see.
+To demonstrate the simple core of RL, [a dedicated notebook](./simple_example.ipynb) for RL without business logic is provided.
--- a/examples/rl/experiment_config/backtest/config.yml
+++ b/examples/rl/experiment_config/backtest/config.yml
@@ -0,0 +1,57 @@
+order_file: ./data/backtest_orders.csv
+start_time: "9:45"
+end_time: "14:44"
+qlib:
+  provider_uri_1min: ./data/bin
+  feature_root_dir: ./data/pickle
+  feature_columns_today: [
+    "$open", "$high", "$low", "$close", "$vwap", "$volume",
+  ]
+  feature_columns_yesterday: [
+    "$open_v1", "$high_v1", "$low_v1", "$close_v1", "$vwap_v1", "$volume_v1",
+  ]
+exchange:
+  limit_threshold: ['$close == 0', '$close == 0']
+  deal_price: ["If($close == 0, $vwap, $close)", "If($close == 0, $vwap, $close)"]
+  volume_threshold:
+    all: ["cum", "0.2 * DayCumsum($volume, '9:45', '14:44')"]
+    buy: ["current", "$close"]
+    sell: ["current", "$close"]
+strategies: 
+  30min: 
+    class: TWAPStrategy
+    module_path: qlib.contrib.strategy.rule_strategy
+    kwargs: {}
+  1day: 
+    class: SAOEIntStrategy
+    module_path: qlib.rl.order_execution.strategy
+    kwargs:
+      state_interpreter:
+        class: FullHistoryStateInterpreter
+        module_path: qlib.rl.order_execution.interpreter
+        kwargs:
+          max_step: 8
+          data_ticks: 240
+          data_dim: 6
+          processed_data_provider:
+            class: PickleProcessedDataProvider
+            module_path: qlib.rl.data.pickle_styled
+            kwargs:
+              data_dir: ./data/pickle_dataframe/feature
+      action_interpreter: 
+        class: CategoricalActionInterpreter
+        module_path: qlib.rl.order_execution.interpreter
+        kwargs: 
+          values: 14
+          max_step: 8
+      network: 
+          class: Recurrent
+          module_path: qlib.rl.order_execution.network
+          kwargs: {}
+      policy: 
+          class: PPO
+          module_path: qlib.rl.order_execution.policy
+          kwargs: 
+            lr: 1.0e-4
+            weight_file: ./checkpoints/latest.pth
+concurrency: 5
--- a/examples/rl/experiment_config/training/config.yml
+++ b/examples/rl/experiment_config/training/config.yml
@@ -0,0 +1,59 @@
+simulator:
+  time_per_step: 30
+  vol_limit: null
+env:
+  concurrency: 1
+  parallel_mode: dummy
+action_interpreter:
+  class: CategoricalActionInterpreter
+  kwargs:
+    values: 14
+    max_step: 8
+  module_path: qlib.rl.order_execution.interpreter
+state_interpreter:
+  class: FullHistoryStateInterpreter
+  kwargs:
+    data_dim: 6
+    data_ticks: 240
+    max_step: 8
+    processed_data_provider:
+      class: PickleProcessedDataProvider
+      module_path: qlib.rl.data.pickle_styled
+      kwargs:
+        data_dir: ./data/pickle_dataframe/feature
+  module_path: qlib.rl.order_execution.interpreter
+reward:
+  class: PAPenaltyReward
+  kwargs:
+    penalty: 100.0
+  module_path: qlib.rl.order_execution.reward
+data:
+  source:
+    order_dir: ./data/training_order_split
+    data_dir: ./data/pickle_dataframe/backtest
+    total_time: 240
+    default_start_time: 0
+    default_end_time: 240
+    proc_data_dim: 6
+  num_workers: 0
+  queue_size: 20
+network:
+  class: Recurrent
+  module_path: qlib.rl.order_execution.network
+policy:
+  class: PPO
+  kwargs:
+    lr: 0.0001
+  module_path: qlib.rl.order_execution.policy
+runtime:
+  seed: 42
+  use_cuda: false
+trainer:
+  max_epoch: 2
+  repeat_per_collect: 5
+  earlystop_patience: 2
+  episode_per_collect: 20
+  batch_size: 16
+  val_every_n_epoch: 1
+  checkpoint_path: ./checkpoints
+  checkpoint_every_n_iters: 1
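Illustrative sketch (not part of this patch): the training config sets `data_ticks: 240`, `time_per_step: 30` and `max_step: 8`. Assuming that `time_per_step` is counted in the same one-minute ticks as `data_ticks` and that `max_step` is meant to equal the number of steps in a day (neither is stated in the patch), the three values can be cross-checked with a hypothetical helper:

```python
def steps_per_day(data_ticks: int, time_per_step: int) -> int:
    """Number of decision steps when the day is cut into equal-length steps."""
    if time_per_step <= 0:
        raise ValueError("time_per_step must be positive")
    if data_ticks % time_per_step != 0:
        raise ValueError("data_ticks is not a multiple of time_per_step")
    return data_ticks // time_per_step


# Values copied from the training config above.
config = {"data_ticks": 240, "time_per_step": 30, "max_step": 8}
print(steps_per_day(config["data_ticks"], config["time_per_step"]) == config["max_step"])
```

A check like this could run before training starts, so that a mismatch between the simulator and the interpreters is caught early.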
--- a/examples/rl/scripts/collect_pickle_dataframe.py
+++ b/examples/rl/scripts/collect_pickle_dataframe.py
@@ -0,0 +1,21 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+import os
+import pickle
+import pandas as pd
+from tqdm import tqdm
+
+os.makedirs(os.path.join("data", "pickle_dataframe"), exist_ok=True)
+
+for tag in ("backtest", "feature"):
+    df = pickle.load(open(os.path.join("data", "pickle", f"{tag}.pkl"), "rb"))
+    df = pd.concat(list(df.values())).reset_index()
+    df["date"] = df["datetime"].dt.date.astype("datetime64")
+    instruments = sorted(set(df["instrument"]))
+
+    os.makedirs(os.path.join("data", "pickle_dataframe", tag), exist_ok=True)
+    for instrument in tqdm(instruments):
+        cur = df[df["instrument"] == instrument].sort_values(by=["datetime"])
+        cur = cur.set_index(["instrument", "datetime", "date"])
+        pickle.dump(cur, open(os.path.join("data", "pickle_dataframe", tag, f"{instrument}.pkl"), "wb"))
--- a/examples/rl/scripts/data_pipeline.sh
+++ b/examples/rl/scripts/data_pipeline.sh
@@ -0,0 +1,14 @@
+# Generate `bin` format data
+set -e
+python ../../scripts/dump_bin.py dump_all --csv_path ./data/csv --qlib_dir ./data/bin --include_fields open,close,high,low,vwap,volume --symbol_field_name symbol --date_field_name date --freq 1min
+
+# Generate pickle format data
+python scripts/gen_pickle_data.py -c scripts/pickle_data_config.yml
+if [ -e stat/ ]; then
+    rm -r stat/
+fi
+python scripts/collect_pickle_dataframe.py
+
+# Sample orders
+python scripts/gen_training_orders.py
+python scripts/gen_backtest_orders.py
--- a/examples/rl/scripts/gen_backtest_orders.py
+++ b/examples/rl/scripts/gen_backtest_orders.py
@@ -0,0 +1,55 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+import argparse
+import os
+import pandas as pd
+import numpy as np
+import pickle
+
+parser = argparse.ArgumentParser()
+parser.add_argument("--seed", type=int, default=20220926)
+parser.add_argument("--num_order", type=int, default=10)
+args = parser.parse_args()
+
+np.random.seed(args.seed)
+
+path = os.path.join("data", "pickle", "backtesttest.pkl")
+df = pickle.load(open(path, "rb")).reset_index()
+df["date"] = df["datetime"].dt.date.astype("datetime64")
+
+instruments = sorted(set(df["instrument"]))
+
+# TODO: The example is expected to be able to handle data containing missing values.
+# TODO: Currently, we just simply skip dates that contain missing data. We will add
+# TODO: this feature in the future.
+skip_dates = {}
+for instrument in instruments:
+    csv_df = pd.read_csv(os.path.join("data", "csv", f"{instrument}.csv"))
+    csv_df = csv_df[csv_df["close"].isna()]
+    dates = set([str(d).split(" ")[0] for d in csv_df["date"]])
+    skip_dates[instrument] = dates
+
+df_list = []
+for instrument in instruments:
+    print(instrument)
+
+    cur_df = df[df["instrument"] == instrument]
+
+    dates = sorted(set([str(d).split(" ")[0] for d in cur_df["date"]]))
+    dates = [date for date in dates if date not in skip_dates[instrument]]
+
+    n = args.num_order
+    df_list.append(
+        pd.DataFrame(
+            {
+                "date": sorted(np.random.choice(dates, size=n, replace=False)),
+                "instrument": [instrument] * n,
+                "amount": np.random.randint(low=3, high=11, size=n) * 100.0,
+                "order_type": np.random.randint(low=0, high=2, size=n),
+            }
+        ).set_index(["date", "instrument"]),
+    )
+
+total_df = pd.concat(df_list)
+total_df.to_csv("data/backtest_orders.csv")
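Illustrative sketch (not part of this patch): the script above filters out dates with missing data and then draws order dates without replacement. The same two steps are sketched below in plain Python. The helper name `sample_order_dates` is hypothetical, and it uses the standard-library `random` module instead of `np.random.choice`, so the dates it draws will differ from the script's even with the same seed.

```python
import random
from typing import List, Set


def sample_order_dates(dates: List[str], skip: Set[str], n: int, seed: int) -> List[str]:
    """Drop skipped dates, then draw n distinct dates and return them sorted."""
    candidates = [d for d in sorted(set(dates)) if d not in skip]
    if n > len(candidates):
        raise ValueError(f"requested {n} dates but only {len(candidates)} are usable")
    rng = random.Random(seed)  # local generator, so the draw is reproducible
    return sorted(rng.sample(candidates, n))


dates = ["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04", "2020-06-05"]
skip = {"2020-06-03"}  # e.g. a day whose CSV has a missing close price
picked = sample_order_dates(dates, skip, n=3, seed=20220926)
print(picked)
```

The explicit size check turns the "not enough usable dates" case into a clear error message instead of a failure inside the sampler.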
--- a/examples/rl/scripts/gen_pickle_data.py
+++ b/examples/rl/scripts/gen_pickle_data.py
@@ -0,0 +1,43 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+import yaml
+import argparse
+import os
+from copy import deepcopy
+
+from qlib.contrib.data.highfreq_provider import HighFreqProvider
+
+loader = yaml.FullLoader
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("-c", "--config", type=str, default="config.yml")
+    parser.add_argument("-d", "--dest", type=str, default=".")
+    parser.add_argument("-s", "--split", type=str, choices=["none", "date", "stock", "both"], default="stock")
+    args = parser.parse_args()
+
+    conf = yaml.load(open(args.config), Loader=loader)
+
+    for k, v in conf.items():
+        if isinstance(v, dict) and "path" in v:
+            v["path"] = os.path.join(args.dest, v["path"])
+    provider = HighFreqProvider(**conf)
+
+    # Gen dataframe
+    if "feature_conf" in conf:
+        feature = provider._gen_dataframe(deepcopy(provider.feature_conf))
+    if "backtest_conf" in conf:
+        backtest = provider._gen_dataframe(deepcopy(provider.backtest_conf))
+
+    provider.feature_conf["path"] = os.path.splitext(provider.feature_conf["path"])[0] + "/"
+    provider.backtest_conf["path"] = os.path.splitext(provider.backtest_conf["path"])[0] + "/"
+    # Split by date
+    if args.split == "date" or args.split == "both":
+        provider._gen_day_dataset(deepcopy(provider.feature_conf), "feature")
+        provider._gen_day_dataset(deepcopy(provider.backtest_conf), "backtest")
+
+    # Split by stock
+    if args.split == "stock" or args.split == "both":
+        provider._gen_stock_dataset(deepcopy(provider.feature_conf), "feature")
+        provider._gen_stock_dataset(deepcopy(provider.backtest_conf), "backtest")
--- a/examples/rl/scripts/gen_training_orders.py
+++ b/examples/rl/scripts/gen_training_orders.py
@@ -0,0 +1,39 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+import argparse
+import os
+import pandas as pd
+import numpy as np
+import pickle
+
+parser = argparse.ArgumentParser()
+parser.add_argument("--seed", type=int, default=20220926)
+parser.add_argument("--stock", type=str, default="AAPL")
+parser.add_argument("--train_size", type=int, default=10)
+parser.add_argument("--valid_size", type=int, default=2)
+parser.add_argument("--test_size", type=int, default=2)
+args = parser.parse_args()
+
+np.random.seed(args.seed)
+
+os.makedirs(os.path.join("data", "training_order_split"), exist_ok=True)
+
+for group, n in zip(("train", "valid", "test"), (args.train_size, args.valid_size, args.test_size)):
+    path = os.path.join("data", "pickle", f"backtest{group}.pkl")
+    df = pickle.load(open(path, "rb")).reset_index()
+    df["date"] = df["datetime"].dt.date.astype("datetime64")
+
+    dates = sorted(set([str(d).split(" ")[0] for d in df["date"]]))
+
+    data_df = pd.DataFrame(
+        {
+            "date": sorted(np.random.choice(dates, size=n, replace=False)),
+            "instrument": [args.stock] * n,
+            "amount": np.random.randint(low=3, high=11, size=n) * 100.0,
+            "order_type": [0] * n,
+        }
+    ).set_index(["date", "instrument"])
+
+    os.makedirs(os.path.join("data", "training_order_split", group), exist_ok=True)
+    pickle.dump(data_df, open(os.path.join("data", "training_order_split", group, f"{args.stock}.pkl"), "wb"))
--- a/examples/rl/scripts/pickle_data_config.yml
+++ b/examples/rl/scripts/pickle_data_config.yml
@@ -0,0 +1,57 @@
+# start & end time for training/validation/test datasets
+start_time: !!str &start 2020-01-01
+end_time: !!str &end 2020-07-31
+train_end_time: !!str &tend 2020-03-31
+valid_start_time: !!str &vstart 2020-04-01
+valid_end_time: !!str &vend 2020-05-31
+test_start_time: !!str &tstart 2020-06-01
+# the instrument set
+instruments: &ins all
+# qlib related configuration
+qlib_conf:
+    provider_uri: ./data/bin # path to generated qlib bin
+    redis_port: 233
+feature_conf:
+    path: ./data/pickle/feature.pkl # output path of feature
+    class: DatasetH
+    module_path: qlib.data.dataset
+    kwargs:
+        handler:
+            class: HighFreqGeneralHandler
+            module_path: qlib.contrib.data.highfreq_handler
+            kwargs:
+                start_time: *start
+                end_time: *end
+                fit_start_time: *start
+                fit_end_time: *tend
+                instruments: *ins
+                day_length: 240 # how many minutes in one trading day
+                infer_processors:
+                - class: HighFreqNorm
+                  module_path: qlib.contrib.data.highfreq_processor
+                  kwargs:
+                    feature_save_dir: ./stat/  #  output path of statistics of features (for feature normalization)
+                    norm_groups: 
+                        price: 10
+                        volume: 2
+        segments:
+            train: !!python/tuple [*start, *tend]
+            valid: !!python/tuple [*vstart, *vend]
+            test: !!python/tuple [*tstart, *end]
+backtest_conf:
+    path: ./data/pickle/backtest.pkl # output path of backtest
+    class: DatasetH
+    module_path: qlib.data.dataset
+    kwargs:
+        handler:
+            class: HighFreqGeneralBacktestHandler
+            module_path: qlib.contrib.data.highfreq_handler
+            kwargs:
+                start_time: *start
+                end_time: *end
+                instruments: *ins
+                day_length: 240
+        segments:
+            train: !!python/tuple [*start, *tend]
+            valid: !!python/tuple [*vstart, *vend]
+            test: !!python/tuple [*tstart, *end]
--- a/examples/rl/simple_example.ipynb
+++ b/examples/rl/simple_example.ipynb
--- a/examples/run_all_model.py
+++ b/examples/run_all_model.py
@@ -253,7 +253,7 @@ class ModelRunner:
            default "" indicates that
        qlib_uri : str
            the uri to install qlib with pip
-            it could be url on the we or local path (NOTE: the local path must be a absolute path)
+            it could be a remote URI or a local path (NOTE: the local path must be an absolute path)
        exp_folder_name: str
            the name of the experiment folder
        wait_before_rm_env : bool
--- a/qlib/__init__.py
+++ b/qlib/__init__.py
@@ -2,7 +2,7 @@
 # Licensed under the MIT License.
 from pathlib import Path

-__version__ = "0.8.6.99"
+__version__ = "0.9.1"
 __version__bak = __version__  # This version is backup for QlibConfig.reset_qlib_version
 import os
 from typing import Union
@@ -34,8 +34,7 @@ def init(default_conf="client", **kwargs):
    from .config import C  # pylint: disable=C0415
    from .data.cache import H  # pylint: disable=C0415

-    # FIXME: this logger ignored the level in config
-    logger = get_module_logger("Initialization", level=logging.INFO)
+    logger = get_module_logger("Initialization")

    skip_if_reg = kwargs.pop("skip_if_reg", False)
    if skip_if_reg and C.registered:
@@ -48,6 +47,7 @@ def init(default_conf="client", **kwargs):
    if clear_mem_cache:
        H.clear()
    C.set(default_conf, **kwargs)
+    get_module_logger.setLevel(C.logging_level)

    # mount nfs
    for _freq, provider_uri in C.provider_uri.items():
--- a/qlib/backtest/__init__.py
+++ b/qlib/backtest/__init__.py
@@ -10,7 +10,6 @@ from typing import TYPE_CHECKING, Any, Generator, List, Optional, Tuple, Union
 import pandas as pd

 from .account import Account
-from .report import Indicator, PortfolioMetrics

 if TYPE_CHECKING:
    from ..strategy.base import BaseStrategy
@@ -20,7 +19,7 @@ if TYPE_CHECKING:
 from ..config import C
 from ..log import get_module_logger
 from ..utils import init_instance_by_config
-from .backtest import backtest_loop, collect_data_loop
+from .backtest import INDICATOR_METRIC, PORT_METRIC, backtest_loop, collect_data_loop
 from .decision import Order
 from .exchange import Exchange
 from .utils import CommonInfrastructure
@@ -114,7 +113,7 @@ def get_exchange(
 def create_account_instance(
    start_time: Union[pd.Timestamp, str],
    end_time: Union[pd.Timestamp, str],
-    benchmark: str,
+    benchmark: Optional[str],
    account: Union[float, int, dict],
    pos_type: str = "Position",
 ) -> Account:
@@ -163,7 +162,9 @@ def create_account_instance(
        init_cash=init_cash,
        position_dict=position_dict,
        pos_type=pos_type,
-        benchmark_config={
+        benchmark_config={}
+        if benchmark is None
+        else {
            "benchmark": benchmark,
            "start_time": start_time,
            "end_time": end_time,
@@ -176,7 +177,7 @@ def get_strategy_executor(
    end_time: Union[pd.Timestamp, str],
    strategy: Union[str, dict, object, Path],
    executor: Union[str, dict, object, Path],
-    benchmark: str = "SH000300",
+    benchmark: Optional[str] = "SH000300",
    account: Union[float, int, dict] = 1e9,
    exchange_kwargs: dict = {},
    pos_type: str = "Position",
@@ -221,7 +222,7 @@ def backtest(
    account: Union[float, int, dict] = 1e9,
    exchange_kwargs: dict = {},
    pos_type: str = "Position",
-) -> Tuple[PortfolioMetrics, Indicator]:
+) -> Tuple[PORT_METRIC, INDICATOR_METRIC]:
    """initialize the strategy and executor, then backtest function for the interaction of the outermost strategy and
    executor in the nested decision execution

@@ -242,7 +243,7 @@ def backtest(
    benchmark: str
        the benchmark for reporting.
    account : Union[float, int, Position]
-        information for describing how to creating the account
+        information for describing how to create the account
        For `float` or `int`:
            Using Account with only initial cash
        For `Position`:
@@ -254,9 +255,9 @@ def backtest(

    Returns
    -------
-    portfolio_metrics_dict: Dict[PortfolioMetrics]
+    portfolio_dict: PORT_METRIC
        it records the trading portfolio_metrics information
-    indicator_dict: Dict[Indicator]
+    indicator_dict: INDICATOR_METRIC
        it computes the trading indicator
        It is organized in a dict format

@@ -271,8 +272,7 @@ def backtest(
        exchange_kwargs,
        pos_type=pos_type,
    )
-    portfolio_metrics, indicator = backtest_loop(start_time, end_time, trade_strategy, trade_executor)
-    return portfolio_metrics, indicator
+    return backtest_loop(start_time, end_time, trade_strategy, trade_executor)


 def collect_data(
@@ -345,4 +345,4 @@ def format_decisions(
    return res


-__all__ = ["Order", "backtest"]
+__all__ = ["Order", "backtest", "get_strategy_executor"]
--- a/qlib/backtest/account.py
+++ b/qlib/backtest/account.py
@@ -236,7 +236,7 @@ class Account:
        if not self.current_position.skip_update():
            stock_list = self.current_position.get_stock_list()
            for code in stock_list:
-                # if suspend, no new price to be updated, profit is 0
+                # if suspended, no new price to be updated, profit is 0
                if trade_exchange.check_stock_suspended(code, trade_start_time, trade_end_time):
                    continue
                bar_close = cast(float, trade_exchange.get_close(code, trade_start_time, trade_end_time))
--- a/qlib/backtest/backtest.py
+++ b/qlib/backtest/backtest.py
@@ -3,12 +3,12 @@

 from __future__ import annotations

-from typing import TYPE_CHECKING, Generator, Optional, Tuple, Union, cast
+from typing import Dict, TYPE_CHECKING, Generator, Optional, Tuple, Union, cast

 import pandas as pd

 from qlib.backtest.decision import BaseTradeDecision
-from qlib.backtest.report import Indicator, PortfolioMetrics
+from qlib.backtest.report import Indicator

 if TYPE_CHECKING:
    from qlib.strategy.base import BaseStrategy
@@ -19,30 +19,35 @@ from tqdm.auto import tqdm
 from ..utils.time import Freq


+PORT_METRIC = Dict[str, Tuple[pd.DataFrame, dict]]
+INDICATOR_METRIC = Dict[str, Tuple[pd.DataFrame, Indicator]]
+
+
 def backtest_loop(
    start_time: Union[pd.Timestamp, str],
    end_time: Union[pd.Timestamp, str],
    trade_strategy: BaseStrategy,
    trade_executor: BaseExecutor,
-) -> Tuple[PortfolioMetrics, Indicator]:
+) -> Tuple[PORT_METRIC, INDICATOR_METRIC]:
    """backtest function for the interaction of the outermost strategy and executor in the nested decision execution

    please refer to the docs of `collect_data_loop`

    Returns
    -------
-    portfolio_metrics: PortfolioMetrics
+    portfolio_dict: PORT_METRIC
        it records the trading portfolio_metrics information
-    indicator: Indicator
+    indicator_dict: INDICATOR_METRIC
        it computes the trading indicator
    """
    return_value: dict = {}
    for _decision in collect_data_loop(start_time, end_time, trade_strategy, trade_executor, return_value):
        pass

-    portfolio_metrics = cast(PortfolioMetrics, return_value.get("portfolio_metrics"))
-    indicator = cast(Indicator, return_value.get("indicator"))
-    return portfolio_metrics, indicator
+    portfolio_dict = cast(PORT_METRIC, return_value.get("portfolio_dict"))
+    indicator_dict = cast(INDICATOR_METRIC, return_value.get("indicator_dict"))
+
+    return portfolio_dict, indicator_dict


 def collect_data_loop(
@@ -83,18 +88,23 @@ def collect_data_loop(
        while not trade_executor.finished():
            _trade_decision: BaseTradeDecision = trade_strategy.generate_trade_decision(_execute_result)
            _execute_result = yield from trade_executor.collect_data(_trade_decision, level=0)
+            trade_strategy.post_exe_step(_execute_result)
            bar.update(1)
+        trade_strategy.post_upper_level_exe_step()

    if return_value is not None:
        all_executors = trade_executor.get_all_executors()
-        all_portfolio_metrics = {
-            "{}{}".format(*Freq.parse(_executor.time_per_step)): _executor.trade_account.get_portfolio_metrics()
-            for _executor in all_executors
-            if _executor.trade_account.is_port_metr_enabled()
-        }
-        all_indicators = {}
-        for _executor in all_executors:
-            key = "{}{}".format(*Freq.parse(_executor.time_per_step))
-            all_indicators[key] = _executor.trade_account.get_trade_indicator().generate_trade_indicators_dataframe()
-            all_indicators[key + "_obj"] = _executor.trade_account.get_trade_indicator()
-        return_value.update({"portfolio_metrics": all_portfolio_metrics, "indicator": all_indicators})
+
+        portfolio_dict: PORT_METRIC = {}
+        indicator_dict: INDICATOR_METRIC = {}
+
+        for executor in all_executors:
+            key = "{}{}".format(*Freq.parse(executor.time_per_step))
+            if executor.trade_account.is_port_metr_enabled():
+                portfolio_dict[key] = executor.trade_account.get_portfolio_metrics()
+
+            indicator_df = executor.trade_account.get_trade_indicator().generate_trade_indicators_dataframe()
+            indicator_obj = executor.trade_account.get_trade_indicator()
+            indicator_dict[key] = (indicator_df, indicator_obj)
+
+        return_value.update({"portfolio_dict": portfolio_dict, "indicator_dict": indicator_dict})
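
The hunk above replaces the flat indicator layout (one entry per frequency key plus a second `key + "_obj"` entry) with a single tuple per key. A minimal sketch of that shape change, assuming plain strings as stand-ins for the `pd.DataFrame` and `Indicator` objects; `to_indicator_dict` and the `"1day"` / `"30min"` keys are hypothetical and only illustrate the layout:

```python
from typing import Any, Dict, Tuple

# Hypothetical stand-in for Tuple[pd.DataFrame, Indicator].
IndicatorEntry = Tuple[Any, Any]


def to_indicator_dict(legacy: Dict[str, Any]) -> Dict[str, IndicatorEntry]:
    """Convert the old flat layout (key, key + "_obj") into one tuple per key."""
    result: Dict[str, IndicatorEntry] = {}
    for key, value in legacy.items():
        if key.endswith("_obj"):
            # The object entry is folded into the tuple of its base key.
            continue
        result[key] = (value, legacy[key + "_obj"])
    return result


legacy = {
    "1day": "df_1day",
    "1day_obj": "obj_1day",
    "30min": "df_30min",
    "30min_obj": "obj_30min",
}
indicator_dict = to_indicator_dict(legacy)

# Callers now unpack both parts from a single lookup.
indicator_df, indicator_obj = indicator_dict["1day"]
```

Code that previously read `indicator["1day_obj"]` would, under this layout, read the second element of `indicator_dict["1day"]` instead.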
--- a/qlib/backtest/decision.py
+++ b/qlib/backtest/decision.py
@@ -135,6 +135,21 @@ class Order:
        else:
            raise NotImplementedError(f"This type of input is not supported")

+    @property
+    def key_by_day(self) -> tuple:
+        """A hashable & unique key to identify this order, under the granularity in day."""
+        return self.stock_id, self.date, self.direction
+
+    @property
+    def key(self) -> tuple:
+        """A hashable & unique key to identify this order."""
+        return self.stock_id, self.start_time, self.end_time, self.direction
+
+    @property
+    def date(self) -> pd.Timestamp:
+        """Date of the order."""
+        return pd.Timestamp(self.start_time.replace(hour=0, minute=0, second=0))
+

 class OrderHelper:
    """
@@ -286,7 +301,7 @@ class TradeRangeByTime(TradeRange):

 class BaseTradeDecision(Generic[DecisionType]):
    """
-    Trade decisions ara made by strategy and executed by executor
+    Trade decisions are made by strategy and executed by executor

    Motivation:
        Here are several typical scenarios for `BaseTradeDecision`
@@ -561,3 +576,21 @@ class TradeDecisionWO(BaseTradeDecision[Order]):
            f"trade_range: {self.trade_range}; "
            f"order_list[{len(self.order_list)}]"
        )
+
+
+class TradeDecisionWithDetails(TradeDecisionWO):
+    """
+    Decision with detail information.
+    Detail information is used to generate execution reports.
+    """
+
+    def __init__(
+        self,
+        order_list: List[Order],
+        strategy: BaseStrategy,
+        trade_range: Optional[Tuple[int, int]] = None,
+        details: Optional[Any] = None,
+    ) -> None:
+        super().__init__(order_list, strategy, trade_range)
+
+        self.details = details
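
The `key` and `key_by_day` properties added to `Order` above differ only in granularity. A minimal sketch, assuming a hypothetical `DemoOrder` dataclass with `datetime` values in place of `pd.Timestamp` and an illustrative integer `direction`:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class DemoOrder:
    """Hypothetical stand-in for Order; only the fields used by the keys."""

    stock_id: str
    start_time: datetime
    end_time: datetime
    direction: int

    @property
    def date(self) -> datetime:
        # Same truncation as in the diff: hour, minute and second are zeroed.
        return self.start_time.replace(hour=0, minute=0, second=0)

    @property
    def key_by_day(self) -> Tuple:
        return self.stock_id, self.date, self.direction

    @property
    def key(self) -> Tuple:
        return self.stock_id, self.start_time, self.end_time, self.direction


morning = DemoOrder("AAPL", datetime(2020, 6, 1, 9, 30), datetime(2020, 6, 1, 11, 29), 1)
afternoon = DemoOrder("AAPL", datetime(2020, 6, 1, 13, 0), datetime(2020, 6, 1, 14, 59), 1)

# Group orders of the same stock, day and direction under one key.
by_day: Dict[Tuple, List[DemoOrder]] = {}
for order in (morning, afternoon):
    by_day.setdefault(order.key_by_day, []).append(order)
```

Both orders share one `key_by_day` but have distinct `key` values, so the day-level key suits per-day aggregation while the full key identifies a single order.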
--- a/qlib/backtest/exchange.py
+++ b/qlib/backtest/exchange.py
@@ -18,7 +18,7 @@ import pandas as pd
 from qlib.backtest.position import BasePosition

 from ..config import C
-from ..constant import REG_CN
+from ..constant import REG_CN, REG_TW
 from ..data.data import D
 from ..log import get_module_logger
 from .decision import Order, OrderDir, OrderHelper
@@ -26,6 +26,15 @@ from .high_performance_ds import BaseQuote, NumpyQuote


 class Exchange:
+    # `quote_df` is a pd.DataFrame class that contains basic information for backtesting
+    # After some processing, the data will later be maintained by a `quote_cls` object for faster data retrieval.
+    # Some conventions for `quote_df`
+    # - $close is for calculating the total value at end of each day.
+    #   - if $close is None, the stock on that day is regarded as suspended.
+    # - $factor is for rounding to the trading unit;
+    #   - if any $factor is missing when $close exists, trading unit rounding will be disabled
+    quote_df: pd.DataFrame
+
    def __init__(
        self,
        freq: str = "day",
@@ -132,17 +141,17 @@ class Exchange:
        if deal_price is None:
            deal_price = C.deal_price

-        # we have some verbose information here. So logging is enable
+        # we have some verbose information here. So logging is enabled
        self.logger = get_module_logger("online operator")

        # TODO: the quote, trade_dates, codes are not necessary.
        # It is just for performance consideration.
        self.limit_type = self._get_limit_type(limit_threshold)
        if limit_threshold is None:
-            if C.region == REG_CN:
+            if C.region in [REG_CN, REG_TW]:
                self.logger.warning(f"limit_threshold not set. The stocks hit the limit may be bought/sold")
        elif self.limit_type == self.LT_FLT and abs(cast(float, limit_threshold)) > 0.1:
-            if C.region == REG_CN:
+            if C.region in [REG_CN, REG_TW]:
                self.logger.warning(f"limit_threshold may not be set to a reasonable value")

        if isinstance(deal_price, str):
@@ -159,6 +168,7 @@ class Exchange:
        self.codes = codes
        # Necessary fields
        # $close is for calculating the total value at end of each day.
+        # - if $close is None, the stock on that day is regarded as suspended.
        # $factor is for rounding to the trading unit
        # $change is for calculating the limit of the stock

@@ -199,7 +209,7 @@ class Exchange:
            self.end_time,
            freq=self.freq,
            disk_cache=True,
-        ).dropna(subset=["$close"])
+        )
        self.quote_df.columns = self.all_fields

        # check buy_price data and sell_price data
@@ -209,7 +219,7 @@ class Exchange:
                self.logger.warning("{} field data contains nan.".format(pstr))

        # update trade_w_adj_price
-        if self.quote_df["$factor"].isna().any():
+        if (self.quote_df["$factor"].isna() & ~self.quote_df["$close"].isna()).any():
            # The 'factor.day.bin' file not exists, and `factor` field contains `nan`
            # Use adjusted price
            self.trade_w_adj_price = True
@@ -245,9 +255,9 @@ class Exchange:
            assert set(self.extra_quote.columns) == set(self.quote_df.columns) - {"$change"}
            self.quote_df = pd.concat([self.quote_df, self.extra_quote], sort=False, axis=0)

-    LT_TP_EXP = "(exp)"  # Tuple[str, str]
-    LT_FLT = "float"  # float
-    LT_NONE = "none"  # none
+    LT_TP_EXP = "(exp)"  # Tuple[str, str]:  the limitation is calculated by a Qlib expression.
+    LT_FLT = "float"  # float:  the trading limitation is based on `abs($change) < limit_threshold`
+    LT_NONE = "none"  # none:  there is no trading limitation

    def _get_limit_type(self, limit_threshold: Union[tuple, float, None]) -> str:
        """get limit type"""
@@ -261,20 +271,25 @@ class Exchange:
            raise NotImplementedError(f"This type of `limit_threshold` is not supported")

    def _update_limit(self, limit_threshold: Union[Tuple, float, None]) -> None:
+        # $close may contain NaN; a NaN indicates that the stock is not tradable at that timestamp
+        suspended = self.quote_df["$close"].isna()
        # check limit_threshold
        limit_type = self._get_limit_type(limit_threshold)
        if limit_type == self.LT_NONE:
-            self.quote_df["limit_buy"] = False
-            self.quote_df["limit_sell"] = False
+            self.quote_df["limit_buy"] = suspended
+            self.quote_df["limit_sell"] = suspended
        elif limit_type == self.LT_TP_EXP:
            # set limit
            limit_threshold = cast(tuple, limit_threshold)
-            self.quote_df["limit_buy"] = self.quote_df[limit_threshold[0]]
-            self.quote_df["limit_sell"] = self.quote_df[limit_threshold[1]]
+            # astype bool is necessary, because the limit column is computed from an expression and could be float
+            self.quote_df["limit_buy"] = self.quote_df[limit_threshold[0]].astype("bool") | suspended
+            self.quote_df["limit_sell"] = self.quote_df[limit_threshold[1]].astype("bool") | suspended
        elif limit_type == self.LT_FLT:
            limit_threshold = cast(float, limit_threshold)
-            self.quote_df["limit_buy"] = self.quote_df["$change"].ge(limit_threshold)
-            self.quote_df["limit_sell"] = self.quote_df["$change"].le(-limit_threshold)  # pylint: disable=E1130
+            self.quote_df["limit_buy"] = self.quote_df["$change"].ge(limit_threshold) | suspended
+            self.quote_df["limit_sell"] = (
+                self.quote_df["$change"].le(-limit_threshold) | suspended
+            )  # pylint: disable=E1130

    @staticmethod
    def _get_vol_limit(volume_threshold: Union[tuple, dict, None]) -> Tuple[Optional[list], Optional[list], set]:
@@ -338,8 +353,18 @@ class Exchange:
            - if direction is None, check if tradable for buying and selling.
            - if direction == Order.BUY, check the if tradable for buying
            - if direction == Order.SELL, check the sell limit for selling.
+
+        Returns
+        -------
+        True: the trading of the stock is limited (maybe hit the highest/lowest price), hence the stock is not tradable
+        False: the trading of the stock is not limited, hence the stock may be tradable
        """
+        # NOTE:
+        # **all** is used when checking limitation.
+        # For example, the stock trading is limited in a day if every minute is limited.
        if direction is None:
+            # The trading limitation is related to the trading direction
+            # if the direction is not provided, then any limitation from buy or sell will result in trading limitation
            buy_limit = self.quote.get_data(stock_id, start_time, end_time, field="limit_buy", method="all")
            sell_limit = self.quote.get_data(stock_id, start_time, end_time, field="limit_sell", method="all")
            return bool(buy_limit or sell_limit)
@@ -356,10 +381,24 @@ class Exchange:
        start_time: pd.Timestamp,
        end_time: pd.Timestamp,
    ) -> bool:
+        """if stock is suspended(hence not tradable), True will be returned"""
        # is suspended
        if stock_id in self.quote.get_all_stock():
-            return self.quote.get_data(stock_id, start_time, end_time, "$close") is None
+            # suspended stocks are represented by a None $close
+            # the $close may also contain NaN
+            close = self.quote.get_data(stock_id, start_time, end_time, "$close")
+            if close is None:
+                # if no close record exists
+                return True
+            elif isinstance(close, IndexData):
+                # **any** non-NaN $close means a trading opportunity may exist
+                # if all returned values are NaN, then the stock is suspended
+                return cast(bool, cast(IndexData, close).isna().all())
+            else:
+                # it is a single value; check that it is not NaN
+                return np.isnan(close)
        else:
+            # if the stock is not in the stock list, then it is not tradable and regarded as suspended
            return True

    def is_stock_tradable(
@@ -501,8 +540,8 @@ class Exchange:
        direction: OrderDir = OrderDir.BUY,
    ) -> dict:
        """
-        The generate the target position according to the weight and the cash.
-        NOTE: All the cash will assigned to the tradable stock.
+        Generates the target position according to the weight and the cash.
+        NOTE: All the cash will be assigned to the tradable stock.
        Parameter:
        weight_position : dict {stock_id : weight}; allocate cash by weight_position
            among then, weight must be in this range: 0 < weight < 1
@@ -600,7 +639,7 @@ class Exchange:
        random.shuffle(sorted_ids)
        for stock_id in sorted_ids:

-            # Do not generate order for the nontradable stocks
+            # Do not generate order for the non-tradable stocks
            if not self.is_stock_tradable(stock_id=stock_id, start_time=start_time, end_time=end_time):
                continue
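
The reworked `check_stock_suspended` above distinguishes three shapes of `$close` data. A minimal sketch of that branching, assuming a plain list or tuple in place of `IndexData`; `is_suspended` and `CloseData` are hypothetical names used only for illustration:

```python
import math
from typing import Sequence, Union

# Hypothetical stand-in for what the quote lookup may return.
CloseData = Union[None, float, Sequence[float]]


def is_suspended(close: CloseData) -> bool:
    """Return True when no usable close price exists in the queried range."""
    if close is None:
        # No close record exists at all.
        return True
    if isinstance(close, (list, tuple)):
        # Multiple bars: suspended only if every bar is NaN.
        return all(math.isnan(value) for value in close)
    # Single value: suspended if it is NaN.
    return math.isnan(close)


nan = float("nan")
results = {
    "missing": is_suspended(None),
    "all_nan": is_suspended([nan, nan]),
    "partly_traded": is_suspended([nan, 10.5]),
    "single_price": is_suspended(10.5),
    "single_nan": is_suspended(nan),
}
```

A single non-NaN bar is enough to treat the stock as possibly tradable, which mirrors the use of `all` in the limit checks earlier in this file.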

--- a/qlib/backtest/executor.py
+++ b/qlib/backtest/executor.py
@@ -114,7 +114,7 @@ class BaseExecutor:
        self.track_data = track_data
        self._trade_exchange = trade_exchange
        self.level_infra = LevelInfrastructure()
-        self.level_infra.reset_infra(common_infra=common_infra)
+        self.level_infra.reset_infra(common_infra=common_infra, executor=self)
        self._settle_type = settle_type
        self.reset(start_time=start_time, end_time=end_time, common_infra=common_infra)
        if common_infra is None:
@@ -134,6 +134,8 @@ class BaseExecutor:
        else:
            self.common_infra.update(common_infra)

+        self.level_infra.reset_infra(common_infra=self.common_infra)
+
        if common_infra.has("trade_account"):
            # NOTE: there is a trick in the code.
            # shallow copy is used instead of deepcopy.
@@ -256,6 +258,7 @@ class BaseExecutor:
        object
            trade decision
        """
+
        if self.track_data:
            yield trade_decision

@@ -296,6 +299,7 @@ class BaseExecutor:

        if return_value is not None:
            return_value.update({"execute_result": res})
+
        return res

    def get_all_executors(self) -> List[BaseExecutor]:
@@ -396,7 +400,7 @@ class NestedExecutor(BaseExecutor):
            trade_decision = updated_trade_decision
            # NEW UPDATE
            # create a hook for inner strategy to update outer decision
-            self.inner_strategy.alter_outer_trade_decision(trade_decision)
+            trade_decision = self.inner_strategy.alter_outer_trade_decision(trade_decision)
        return trade_decision

    def _collect_data(
@@ -473,6 +477,9 @@ class NestedExecutor(BaseExecutor):
                # do nothing and just step forward
                sub_cal.step()

+        # Let inner strategy know that the outer level execution is done.
+        self.inner_strategy.post_upper_level_exe_step()
+
        return execute_result, {"inner_order_indicators": inner_order_indicators, "decision_list": decision_list}

    def post_inner_exe_step(self, inner_exe_res: List[object]) -> None:
@@ -580,20 +587,18 @@ class SimulatorExecutor(BaseExecutor):
            raise NotImplementedError(f"This type of input is not supported")
        return order_it

-    def _update_dealt_order_amount(self, order: Order) -> None:
-        """update date and dealt order amount in the day."""
-
-        now_deal_day = self.trade_calendar.get_step_time()[0].floor(freq="D")
-        if self.deal_day is None or now_deal_day > self.deal_day:
-            self.dealt_order_amount = defaultdict(float)
-            self.deal_day = now_deal_day
-        self.dealt_order_amount[order.stock_id] += order.deal_amount
-
    def _collect_data(self, trade_decision: BaseTradeDecision, level: int = 0) -> Tuple[List[object], dict]:
        trade_start_time, _ = self.trade_calendar.get_step_time()
        execute_result: list = []

        for order in self._get_order_iterator(trade_decision):
+            # Each time we move into a new date, clear `self.dealt_order_amount` since it only maintains intraday
+            # information.
+            now_deal_day = self.trade_calendar.get_step_time()[0].floor(freq="D")
+            if self.deal_day is None or now_deal_day > self.deal_day:
+                self.dealt_order_amount = defaultdict(float)
+                self.deal_day = now_deal_day
+
            # execute the order.
            # NOTE: The trade_account will be changed in this function
            trade_val, trade_cost, trade_price = self.trade_exchange.deal_order(
@@ -602,7 +607,9 @@ class SimulatorExecutor(BaseExecutor):
                dealt_order_amount=self.dealt_order_amount,
            )
            execute_result.append((order, trade_val, trade_cost, trade_price))
-            self._update_dealt_order_amount(order)
+
+            self.dealt_order_amount[order.stock_id] += order.deal_amount
+
            if self.verbose:
                print(
                    "[I {:%Y-%m-%d %H:%M:%S}]: {} {}, price {:.2f}, amount {}, deal_amount {}, factor {}, "
--- a/qlib/backtest/utils.py
+++ b/qlib/backtest/utils.py
@@ -3,9 +3,8 @@

 from __future__ import annotations

-import bisect
 from abc import abstractmethod
-from typing import TYPE_CHECKING, Any, Set, Tuple, Union
+from typing import Any, Set, Tuple, TYPE_CHECKING, Union

 import numpy as np

@@ -184,8 +183,8 @@ class TradeCalendarManager:
        Tuple[int, int]:
            the index of the range.  **the left and right are closed**
        """
-        left = bisect.bisect_right(list(self._calendar), start_time) - 1
-        right = bisect.bisect_right(list(self._calendar), end_time) - 1
+        left = int(np.searchsorted(self._calendar, start_time, side="right") - 1)
+        right = int(np.searchsorted(self._calendar, end_time, side="right") - 1)
        left -= self.start_index
        right -= self.start_index

@@ -248,7 +247,7 @@ class LevelInfrastructure(BaseInfrastructure):
        sub_level_infra:
        - **NOTE**: this will only work after _init_sub_trading !!!
        """
-        return {"trade_calendar", "sub_level_infra", "common_infra"}
+        return {"trade_calendar", "sub_level_infra", "common_infra", "executor"}

    def reset_cal(
        self,
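
The `get_range_idx` change above swaps `bisect.bisect_right` on a list copy for `np.searchsorted(..., side="right")`; both return the same insertion point, so the closed-range indices are unchanged. A minimal sketch of the index arithmetic using only the standard library, with a hypothetical five-day calendar and without the `start_index` offset applied in the real method:

```python
import bisect
from datetime import datetime
from typing import List, Tuple


def closed_range_index(calendar: List[datetime], start: datetime, end: datetime) -> Tuple[int, int]:
    """Index of the last calendar entry <= start and the last entry <= end."""
    left = bisect.bisect_right(calendar, start) - 1
    right = bisect.bisect_right(calendar, end) - 1
    return left, right


calendar = [datetime(2020, 6, day) for day in (1, 2, 3, 4, 5)]

# Boundaries that fall exactly on calendar entries.
exact = closed_range_index(calendar, datetime(2020, 6, 2), datetime(2020, 6, 4))

# Boundaries that fall between entries snap back to the previous entry.
between = closed_range_index(calendar, datetime(2020, 6, 2, 12), datetime(2020, 6, 4, 12))

# A start before the first entry yields -1 for the left index.
before = closed_range_index(calendar, datetime(2020, 5, 31), datetime(2020, 6, 1))
```

The practical difference in the diff is cost: the old code rebuilt a Python list from the calendar on every call, while `np.searchsorted` works on the array directly.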
--- a/qlib/config.py
+++ b/qlib/config.py
@@ -75,7 +75,8 @@ class Config:
    def set_conf_from_C(self, config_c):
        self.update(**config_c.__dict__["_config"])

-    def register_from_C(self, config, skip_register=True):
+    @staticmethod
+    def register_from_C(config, skip_register=True):
        from .utils import set_log_with_config  # pylint: disable=C0415

        if C.registered and skip_register:
@@ -172,6 +173,9 @@ _default_config = {
            }
        },
        "loggers": {"qlib": {"level": logging.DEBUG, "handlers": ["console"]}},
+        # To let qlib work with other packages, we shouldn't disable existing loggers.
+        # Note that this param defaults to True according to the documentation of logging.
+        "disable_existing_loggers": False,
    },
    # Default config for experiment manager
    "exp_manager": {
@@ -199,7 +203,7 @@ _default_config = {
        "task_url": "mongodb://localhost:27017/",
        "task_db_name": "default_task_db",
    },
-    # Shift minute for highfreq minite data, used in backtest
+    # Shift minute for highfreq minute data, used in backtest
    # if min_data_shift == 0, use default market time [9:30, 11:29, 1:00, 2:59]
    # if min_data_shift != 0, use shifted market time [9:30, 11:29, 1:00, 2:59] - shift*minute
    "min_data_shift": 0,
@@ -408,8 +412,7 @@ class QlibConfig(Config):
        if _logging_config:
            set_log_with_config(_logging_config)

-        # FIXME: this logger ignored the level in config
-        logger = get_module_logger("Initialization", level=logging.INFO)
+        logger = get_module_logger("Initialization", kwargs.get("logging_level", self.logging_level))
        logger.info(f"default_conf: {default_conf}.")

        self.set_mode(default_conf)
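
The `"disable_existing_loggers": False` entry added to the default logging config above matters because `logging.config.dictConfig` otherwise disables loggers that already exist and are not named in the config. A minimal sketch of the effect; the logger names `qlib_demo` and `third_party_demo` and the `configure` helper are hypothetical:

```python
import logging
import logging.config


def configure(disable_existing: bool) -> None:
    """Apply a tiny dictConfig that only names one logger."""
    logging.config.dictConfig(
        {
            "version": 1,
            "disable_existing_loggers": disable_existing,
            "loggers": {"qlib_demo": {"level": "DEBUG"}},
        }
    )


# A logger created by some other package before the config is applied.
third_party = logging.getLogger("third_party_demo")

# Behaviour with the logging default (True): the unrelated logger is disabled.
configure(disable_existing=True)
disabled_with_default = third_party.disabled

# Behaviour with the value set in the diff (False): the logger stays usable.
configure(disable_existing=False)
disabled_with_fix = third_party.disabled
```

With the default, a package imported before `qlib.init` could silently stop logging; setting the flag to `False` avoids that side effect.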
--- a/qlib/constant.py
+++ b/qlib/constant.py
@@ -2,6 +2,11 @@
 # Licensed under the MIT License.

 # REGION CONST
+from typing import TypeVar
+
+import numpy as np
+import pandas as pd
+
 REG_CN = "cn"
 REG_US = "us"
 REG_TW = "tw"
@@ -10,4 +15,8 @@ REG_TW = "tw"
 EPS = 1e-12

 # Infinity in integer
-INF = 10**18
+INF = int(1e18)
+ONE_DAY = pd.Timedelta("1day")
+ONE_MIN = pd.Timedelta("1min")
+EPS_T = pd.Timedelta("1s")  # use 1 second to exclude the right interval point
+float_or_ndarray = TypeVar("float_or_ndarray", float, np.ndarray)
--- a/qlib/contrib/data/handler.py
+++ b/qlib/contrib/data/handler.py
@@ -57,7 +57,7 @@ class Alpha360(DataHandlerLP):
        fit_end_time=None,
        filter_pipe=None,
        inst_processor=None,
-        **kwargs,
+        **kwargs
    ):
        infer_processors = check_transform_proc(infer_processors, fit_start_time, fit_end_time)
        learn_processors = check_transform_proc(learn_processors, fit_start_time, fit_end_time)
@@ -67,7 +67,7 @@ class Alpha360(DataHandlerLP):
            "kwargs": {
                "config": {
                    "feature": self.get_feature_config(),
-                    "label": kwargs.get("label", self.get_label_config()),
+                    "label": kwargs.pop("label", self.get_label_config()),
                },
                "filter_pipe": filter_pipe,
                "freq": freq,
@@ -82,12 +82,14 @@ class Alpha360(DataHandlerLP):
            data_loader=data_loader,
            learn_processors=learn_processors,
            infer_processors=infer_processors,
+            **kwargs
        )

    def get_label_config(self):
-        return (["Ref($close, -2)/Ref($close, -1) - 1"], ["LABEL0"])
+        return ["Ref($close, -2)/Ref($close, -1) - 1"], ["LABEL0"]

-    def get_feature_config(self):
+    @staticmethod
+    def get_feature_config():
        # NOTE:
        # Alpha360 tries to provide a dataset with original price data
        # the original price data includes the prices and volume in the last 60 days.
@@ -99,33 +101,33 @@ class Alpha360(DataHandlerLP):
        names = []

        for i in range(59, 0, -1):
-            fields += ["Ref($close, %d)/$close" % (i)]
-            names += ["CLOSE%d" % (i)]
+            fields += ["Ref($close, %d)/$close" % i]
+            names += ["CLOSE%d" % i]
        fields += ["$close/$close"]
        names += ["CLOSE0"]
        for i in range(59, 0, -1):
-            fields += ["Ref($open, %d)/$close" % (i)]
-            names += ["OPEN%d" % (i)]
+            fields += ["Ref($open, %d)/$close" % i]
+            names += ["OPEN%d" % i]
        fields += ["$open/$close"]
        names += ["OPEN0"]
        for i in range(59, 0, -1):
-            fields += ["Ref($high, %d)/$close" % (i)]
-            names += ["HIGH%d" % (i)]
+            fields += ["Ref($high, %d)/$close" % i]
+            names += ["HIGH%d" % i]
        fields += ["$high/$close"]
        names += ["HIGH0"]
        for i in range(59, 0, -1):
-            fields += ["Ref($low, %d)/$close" % (i)]
-            names += ["LOW%d" % (i)]
+            fields += ["Ref($low, %d)/$close" % i]
+            names += ["LOW%d" % i]
        fields += ["$low/$close"]
        names += ["LOW0"]
        for i in range(59, 0, -1):
-            fields += ["Ref($vwap, %d)/$close" % (i)]
-            names += ["VWAP%d" % (i)]
+            fields += ["Ref($vwap, %d)/$close" % i]
+            names += ["VWAP%d" % i]
        fields += ["$vwap/$close"]
        names += ["VWAP0"]
        for i in range(59, 0, -1):
-            fields += ["Ref($volume, %d)/($volume+1e-12)" % (i)]
-            names += ["VOLUME%d" % (i)]
+            fields += ["Ref($volume, %d)/($volume+1e-12)" % i]
+            names += ["VOLUME%d" % i]
        fields += ["$volume/($volume+1e-12)"]
        names += ["VOLUME0"]

@@ -134,7 +136,7 @@ class Alpha360(DataHandlerLP):

 class Alpha360vwap(Alpha360):
    def get_label_config(self):
-        return (["Ref($vwap, -2)/Ref($vwap, -1) - 1"], ["LABEL0"])
+        return ["Ref($vwap, -2)/Ref($vwap, -1) - 1"], ["LABEL0"]


 class Alpha158(DataHandlerLP):
@@ -151,7 +153,7 @@ class Alpha158(DataHandlerLP):
        process_type=DataHandlerLP.PTYPE_A,
        filter_pipe=None,
        inst_processor=None,
-        **kwargs,
+        **kwargs
    ):
        infer_processors = check_transform_proc(infer_processors, fit_start_time, fit_end_time)
        learn_processors = check_transform_proc(learn_processors, fit_start_time, fit_end_time)
@@ -161,7 +163,7 @@ class Alpha158(DataHandlerLP):
            "kwargs": {
                "config": {
                    "feature": self.get_feature_config(),
-                    "label": kwargs.get("label", self.get_label_config()),
+                    "label": kwargs.pop("label", self.get_label_config()),
                },
                "filter_pipe": filter_pipe,
                "freq": freq,
@@ -176,6 +178,7 @@ class Alpha158(DataHandlerLP):
            infer_processors=infer_processors,
            learn_processors=learn_processors,
            process_type=process_type,
+            **kwargs
        )

    def get_feature_config(self):
@@ -190,7 +193,7 @@ class Alpha158(DataHandlerLP):
        return self.parse_config_to_fields(conf)

    def get_label_config(self):
-        return (["Ref($close, -2)/Ref($close, -1) - 1"], ["LABEL0"])
+        return ["Ref($close, -2)/Ref($close, -1) - 1"], ["LABEL0"]

    @staticmethod
    def parse_config_to_fields(config):
@@ -426,4 +429,4 @@ class Alpha158(DataHandlerLP):

 class Alpha158vwap(Alpha158):
    def get_label_config(self):
-        return (["Ref($vwap, -2)/Ref($vwap, -1) - 1"], ["LABEL0"])
+        return ["Ref($vwap, -2)/Ref($vwap, -1) - 1"], ["LABEL0"]
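Review note: the switch from `kwargs.get("label", ...)` to `kwargs.pop("label", ...)` matters because the remaining `**kwargs` are now forwarded to the parent constructor. A minimal sketch of the difference, using hypothetical classes (`BaseHandler`, `GetHandler`, `PopHandler`) rather than the real `DataHandlerLP`:

```python
class BaseHandler:
    """Hypothetical parent that accepts only the arguments it knows about."""

    def __init__(self, data_loader, drop_raw=False):
        self.data_loader = data_loader
        self.drop_raw = drop_raw


class GetHandler(BaseHandler):
    """Reads `label` with dict.get, so the key stays in kwargs and is forwarded."""

    def __init__(self, **kwargs):
        label = kwargs.get("label", "default_label")
        super().__init__(data_loader={"label": label}, **kwargs)


class PopHandler(BaseHandler):
    """Removes `label` with dict.pop before forwarding the remaining kwargs."""

    def __init__(self, **kwargs):
        label = kwargs.pop("label", "default_label")
        super().__init__(data_loader={"label": label}, **kwargs)


handler = PopHandler(label="custom", drop_raw=True)

try:
    GetHandler(label="custom", drop_raw=True)
    get_error = None
except TypeError as exc:
    # `label` reached BaseHandler.__init__, which does not accept it.
    get_error = exc
```

With `pop`, the handler consumes its own option and passes everything else through untouched.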
--- a/qlib/contrib/data/highfreq_handler.py
+++ b/qlib/contrib/data/highfreq_handler.py
@@ -1,5 +1,7 @@
 from qlib.data.dataset.handler import DataHandler, DataHandlerLP

+from .handler import check_transform_proc
+
 EPSILON = 1e-4


@@ -15,20 +17,9 @@ class HighFreqHandler(DataHandlerLP):
        fit_end_time=None,
        drop_raw=True,
    ):
-        def check_transform_proc(proc_l):
-            new_l = []
-            for p in proc_l:
-                p["kwargs"].update(
-                    {
-                        "fit_start_time": fit_start_time,
-                        "fit_end_time": fit_end_time,
-                    }
-                )
-                new_l.append(p)
-            return new_l

-        infer_processors = check_transform_proc(infer_processors)
-        learn_processors = check_transform_proc(learn_processors)
+        infer_processors = check_transform_proc(infer_processors, fit_start_time, fit_end_time)
+        learn_processors = check_transform_proc(learn_processors, fit_start_time, fit_end_time)

        data_loader = {
            "class": "QlibDataLoader",
@@ -110,6 +101,100 @@ class HighFreqHandler(DataHandlerLP):
        return fields, names


+class HighFreqGeneralHandler(DataHandlerLP):
+    def __init__(
+        self,
+        instruments="csi300",
+        start_time=None,
+        end_time=None,
+        infer_processors=[],
+        learn_processors=[],
+        fit_start_time=None,
+        fit_end_time=None,
+        drop_raw=True,
+        day_length=240,
+        freq="1min",
+        columns=["$open", "$high", "$low", "$close", "$vwap"],
+    ):
+        self.day_length = day_length
+        self.columns = columns
+
+        infer_processors = check_transform_proc(infer_processors, fit_start_time, fit_end_time)
+        learn_processors = check_transform_proc(learn_processors, fit_start_time, fit_end_time)
+
+        data_loader = {
+            "class": "QlibDataLoader",
+            "kwargs": {
+                "config": self.get_feature_config(),
+                "swap_level": False,
+                "freq": freq,
+            },
+        }
+        super().__init__(
+            instruments=instruments,
+            start_time=start_time,
+            end_time=end_time,
+            data_loader=data_loader,
+            infer_processors=infer_processors,
+            learn_processors=learn_processors,
+            drop_raw=drop_raw,
+        )
+
+    def get_feature_config(self):
+        fields = []
+        names = []
+
+        template_if = "If(IsNull({1}), {0}, {1})"
+        template_paused = f"Cut({{0}}, {self.day_length * 2}, None)"
+
+        def get_normalized_price_feature(price_field, shift=0):
+            # norm with the close price of 237th minute of yesterday.
+            if shift == 0:
+                template_norm = f"{{0}}/DayLast(Ref({{1}}, {self.day_length * 2}))"
+            else:
+                template_norm = f"Ref({{0}}, {shift})/DayLast(Ref({{1}}, {self.day_length}))"
+
+            template_fillnan = "FFillNan({0})"
+            # calculate -> ffill -> remove paused
+            feature_ops = template_paused.format(
+                template_fillnan.format(
+                    template_norm.format(template_if.format("$close", price_field), template_fillnan.format("$close"))
+                )
+            )
+            return feature_ops
+
+        for column_name in self.columns:
+            fields.append(get_normalized_price_feature(column_name, 0))
+            names.append(column_name)
+
+        for column_name in self.columns:
+            fields.append(get_normalized_price_feature(column_name, self.day_length))
+            names.append(column_name + "_1")
+
+        # calculate and fill nan with 0
+        fields += [
+            template_paused.format(
+                "If(IsNull({0}), 0, {0})".format(
+                    f"{{0}}/Ref(DayLast(Mean({{0}}, {self.day_length * 30})), {self.day_length})".format("$volume")
+                )
+            )
+        ]
+        names += ["$volume"]
+
+        fields += [
+            template_paused.format(
+                "If(IsNull({0}), 0, {0})".format(
+                    f"Ref({{0}}, {self.day_length})/Ref(DayLast(Mean({{0}}, {self.day_length * 30})), {self.day_length})".format(
+                        "$volume"
+                    )
+                )
+            )
+        ]
+        names += ["$volume_1"]
+
+        return fields, names
+
+
 class HighFreqBacktestHandler(DataHandler):
    def __init__(
        self,
@@ -163,6 +248,59 @@ class HighFreqBacktestHandler(DataHandler):
        return fields, names


+class HighFreqGeneralBacktestHandler(DataHandler):
+    def __init__(
+        self,
+        instruments="csi300",
+        start_time=None,
+        end_time=None,
+        day_length=240,
+        freq="1min",
+        columns=["$close", "$vwap", "$volume"],
+    ):
+        self.day_length = day_length
+        self.columns = set(columns)
+        data_loader = {
+            "class": "QlibDataLoader",
+            "kwargs": {
+                "config": self.get_feature_config(),
+                "swap_level": False,
+                "freq": freq,
+            },
+        }
+        super().__init__(
+            instruments=instruments,
+            start_time=start_time,
+            end_time=end_time,
+            data_loader=data_loader,
+        )
+
+    def get_feature_config(self):
+        fields = []
+        names = []
+
+        template_paused = f"Cut({{0}}, {self.day_length * 2}, None)"
+        template_fillnan = "FFillNan({0})"
+        template_if = "If(IsNull({1}), {0}, {1})"
+        if "$close" in self.columns:
+            fields += [
+                template_paused.format(template_fillnan.format("$close")),
+            ]
+            names += ["$close0"]
+
+        if "$vwap" in self.columns:
+            fields += [
+                template_paused.format(template_if.format(template_fillnan.format("$close"), "$vwap")),
+            ]
+            names += ["$vwap0"]
+
+        if "$volume" in self.columns:
+            fields += [template_paused.format("If(IsNull({0}), 0, {0})".format("$volume"))]
+            names += ["$volume0"]
+
+        return fields, names
+
+
 class HighFreqOrderHandler(DataHandlerLP):
    def __init__(
        self,
@@ -175,20 +313,9 @@ class HighFreqOrderHandler(DataHandlerLP):
        fit_end_time=None,
        drop_raw=True,
    ):
-        def check_transform_proc(proc_l):
-            new_l = []
-            for p in proc_l:
-                p["kwargs"].update(
-                    {
-                        "fit_start_time": fit_start_time,
-                        "fit_end_time": fit_end_time,
-                    }
-                )
-                new_l.append(p)
-            return new_l

-        infer_processors = check_transform_proc(infer_processors)
-        learn_processors = check_transform_proc(learn_processors)
+        infer_processors = check_transform_proc(infer_processors, fit_start_time, fit_end_time)
+        learn_processors = check_transform_proc(learn_processors, fit_start_time, fit_end_time)

        data_loader = {
            "class": "QlibDataLoader",
@@ -356,7 +483,6 @@ class HighFreqBacktestOrderHandler(DataHandler):

        template_if = "If(IsNull({1}), {0}, {1})"
        template_paused = "Select(Gt($hx_paused_num, 1.001), {0})"
-        # template_paused = "{0}"
        template_fillnan = "FFillNan({0})"
        fields += [
            template_fillnan.format(template_paused.format("$close")),
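Review note: the nested `str.format` templates in `HighFreqGeneralHandler.get_feature_config` are easier to review once expanded. The sketch below mirrors the template strings from the diff in a standalone function; `build_price_feature` is a hypothetical name, and the snippet only builds expression strings without evaluating them in Qlib.

```python
def build_price_feature(price_field, day_length=240, shift=0):
    """Expand the nested templates into a single expression string."""
    template_if = "If(IsNull({1}), {0}, {1})"
    template_paused = f"Cut({{0}}, {day_length * 2}, None)"
    template_fillnan = "FFillNan({0})"
    if shift == 0:
        template_norm = f"{{0}}/DayLast(Ref({{1}}, {day_length * 2}))"
    else:
        template_norm = f"Ref({{0}}, {shift})/DayLast(Ref({{1}}, {day_length}))"

    # Order of composition: fall back to $close, normalize, forward-fill, cut.
    filled_price = template_if.format("$close", price_field)
    normalized = template_norm.format(filled_price, template_fillnan.format("$close"))
    return template_paused.format(template_fillnan.format(normalized))


today = build_price_feature("$open")
yesterday = build_price_feature("$open", shift=240)
print(today)
print(yesterday)
```

Doubled braces (`{{0}}`) survive the f-string pass as `{0}`, which is what lets the same string be filled in later by `.format`.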
--- a/qlib/contrib/data/highfreq_provider.py
+++ b/qlib/contrib/data/highfreq_provider.py
@@ -4,6 +4,7 @@ import datetime
 from typing import Optional

 import qlib
+from qlib import get_module_logger
 from qlib.data import D
 from qlib.config import REG_CN
 from qlib.utils import init_instance_by_config
@@ -12,7 +13,6 @@ from qlib.data.data import Cal
 from qlib.contrib.ops.high_freq import get_calendar_day, DayLast, FFillNan, BFillNan, Date, Select, IsNull, IsInf, Cut
 import pickle as pkl
 from joblib import Parallel, delayed
-from utilsd.logging import print_log


 class HighFreqProvider:
@@ -28,6 +28,7 @@ class HighFreqProvider:
        feature_conf: dict,
        label_conf: Optional[dict] = None,
        backtest_conf: dict = None,
+        freq: str = "1min",
        **kwargs,
    ) -> None:
        self.start_time = start_time
@@ -41,6 +42,8 @@ class HighFreqProvider:
        self.label_conf = label_conf
        self.backtest_conf = backtest_conf
        self.qlib_conf = qlib_conf
+        self.logger = get_module_logger("HighFreqProvider")
+        self.freq = freq

    def get_pre_datasets(self):
        """Generate the training, validation and test datasets for prediction
@@ -115,8 +118,8 @@ class HighFreqProvider:
        # This code used the copy-on-write feature of Linux
        # to avoid calculating the calendar multiple times in the subprocess.
        # This code may accelerate, but may be not useful on Windows and Mac Os
-        Cal.calendar(freq="1min")
-        get_calendar_day(freq="1min")
+        Cal.calendar(freq=self.freq)
+        get_calendar_day(freq=self.freq)

    def _gen_dataframe(self, config, datasets=["train", "valid", "test"]):
        try:
@@ -125,7 +128,7 @@ class HighFreqProvider:
            raise ValueError("Must specify the path to save the dataset.") from e
        if os.path.isfile(path):
            start = time.time()
-            print_log("Dataset exists, load from disk.", __name__)
+            self.logger.info("Dataset exists, load from disk.")

            # res = dataset.prepare(['train', 'valid', 'test'])
            with open(path, "rb") as f:
@@ -134,11 +137,11 @@ class HighFreqProvider:
                res = [data[i] for i in datasets]
            else:
                res = data.prepare(datasets)
-            print_log(f"Data loaded, time cost: {time.time() - start:.2f}", __name__)
+            self.logger.info(f"Data loaded, time cost: {time.time() - start:.2f}")
        else:
            if not os.path.exists(os.path.dirname(path)):
                os.makedirs(os.path.dirname(path))
-            print_log("Generating dataset", __name__)
+            self.logger.info("Generating dataset")
            start_time = time.time()
            self._prepare_calender_cache()
            dataset = init_instance_by_config(config)
@@ -157,7 +160,7 @@ class HighFreqProvider:
            with open(path[:-4] + "test.pkl", "wb") as f:
                pkl.dump(testset, f)
            res = [data[i] for i in datasets]
-            print_log(f"Data generated, time cost: {(time.time() - start_time):.2f}", __name__)
+            self.logger.info(f"Data generated, time cost: {(time.time() - start_time):.2f}")
        return res

    def _gen_data(self, config, datasets=["train", "valid", "test"]):
@@ -167,7 +170,7 @@ class HighFreqProvider:
            raise ValueError("Must specify the path to save the dataset.") from e
        if os.path.isfile(path):
            start = time.time()
-            print_log("Dataset exists, load from disk.", __name__)
+            self.logger.info("Dataset exists, load from disk.")

            # res = dataset.prepare(['train', 'valid', 'test'])
            with open(path, "rb") as f:
@@ -176,18 +179,18 @@ class HighFreqProvider:
                res = [data[i] for i in datasets]
            else:
                res = data.prepare(datasets)
-            print_log(f"Data loaded, time cost: {time.time() - start:.2f}", __name__)
+            self.logger.info(f"Data loaded, time cost: {time.time() - start:.2f}")
        else:
            if not os.path.exists(os.path.dirname(path)):
                os.makedirs(os.path.dirname(path))
-            print_log("Generating dataset", __name__)
+            self.logger.info("Generating dataset")
            start_time = time.time()
            self._prepare_calender_cache()
            dataset = init_instance_by_config(config)
            dataset.config(dump_all=True, recursive=True)
            dataset.to_pickle(path)
            res = dataset.prepare(datasets)
-            print_log(f"Data generated, time cost: {(time.time() - start_time):.2f}", __name__)
+            self.logger.info(f"Data generated, time cost: {(time.time() - start_time):.2f}")
        return res

    def _gen_dataset(self, config):
@@ -197,21 +200,21 @@ class HighFreqProvider:
            raise ValueError("Must specify the path to save the dataset.") from e
        if os.path.isfile(path):
            start = time.time()
-            print_log("Dataset exists, load from disk.", __name__)
+            self.logger.info("Dataset exists, load from disk.")

            with open(path, "rb") as f:
                dataset = pkl.load(f)
-            print_log(f"Data loaded, time cost: {time.time() - start:.2f}", __name__)
+            self.logger.info(f"Data loaded, time cost: {time.time() - start:.2f}")
        else:
            start = time.time()
            if not os.path.exists(os.path.dirname(path)):
                os.makedirs(os.path.dirname(path))
-            print_log("Generating dataset", __name__)
+            self.logger.info("Generating dataset")
            self._prepare_calender_cache()
            dataset = init_instance_by_config(config)
-            print_log(f"Dataset init, time cost: {time.time() - start:.2f}", __name__)
+            self.logger.info(f"Dataset init, time cost: {time.time() - start:.2f}")
            dataset.prepare(["train", "valid", "test"])
-            print_log(f"Dataset prepared, time cost: {time.time() - start:.2f}", __name__)
+            self.logger.info(f"Dataset prepared, time cost: {time.time() - start:.2f}")
            dataset.config(dump_all=True, recursive=True)
            dataset.to_pickle(path)
        return dataset
@@ -224,22 +227,22 @@ class HighFreqProvider:

        if os.path.isfile(path + "tmp_dataset.pkl"):
            start = time.time()
-            print_log("Dataset exists, load from disk.", __name__)
+            self.logger.info("Dataset exists, load from disk.")
        else:
            start = time.time()
            if not os.path.exists(os.path.dirname(path)):
                os.makedirs(os.path.dirname(path))
-            print_log("Generating dataset", __name__)
+            self.logger.info("Generating dataset")
            self._prepare_calender_cache()
            dataset = init_instance_by_config(config)
-            print_log(f"Dataset init, time cost: {time.time() - start:.2f}", __name__)
+            self.logger.info(f"Dataset init, time cost: {time.time() - start:.2f}")
            dataset.config(dump_all=False, recursive=True)
            dataset.to_pickle(path + "tmp_dataset.pkl")

        with open(path + "tmp_dataset.pkl", "rb") as f:
            new_dataset = pkl.load(f)

-        time_list = D.calendar(start_time=self.start_time, end_time=self.end_time, freq="1min")[::240]
+        time_list = D.calendar(start_time=self.start_time, end_time=self.end_time, freq=self.freq)[::240]

        def generate_dataset(times):
            if os.path.isfile(path + times.strftime("%Y-%m-%d") + ".pkl"):
@@ -265,15 +268,15 @@ class HighFreqProvider:

        if os.path.isfile(path + "tmp_dataset.pkl"):
            start = time.time()
-            print_log("Dataset exists, load from disk.", __name__)
+            self.logger.info("Dataset exists, load from disk.")
        else:
            start = time.time()
            if not os.path.exists(os.path.dirname(path)):
                os.makedirs(os.path.dirname(path))
-            print_log("Generating dataset", __name__)
+            self.logger.info("Generating dataset")
            self._prepare_calender_cache()
            dataset = init_instance_by_config(config)
-            print_log(f"Dataset init, time cost: {time.time() - start:.2f}", __name__)
+            self.logger.info(f"Dataset init, time cost: {time.time() - start:.2f}")
            dataset.config(dump_all=False, recursive=True)
            dataset.to_pickle(path + "tmp_dataset.pkl")

@@ -282,7 +285,7 @@ class HighFreqProvider:

        instruments = D.instruments(market="all")
        stock_list = D.list_instruments(
-            instruments=instruments, start_time=self.start_time, end_time=self.end_time, freq="1min", as_list=True
+            instruments=instruments, start_time=self.start_time, end_time=self.end_time, freq=self.freq, as_list=True
        )

        def generate_dataset(stock):
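Review note: the removed `print_log(msg, __name__)` calls passed the module name as a second argument. With a standard `logging.Logger`, positional arguments after the message are treated as %-format arguments, so a logger call should carry only the message plus arguments for any placeholders it contains. A minimal stdlib sketch, assuming the object returned by `get_module_logger` forwards `info` to a regular `logging.Logger`; `ListHandler` and the logger name are hypothetical:

```python
import logging


class ListHandler(logging.Handler):
    """Collects raw records so their message and args can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("HighFreqProviderDemo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
capture = ListHandler()
logger.addHandler(capture)

# Extra positional argument with no placeholder in the message.
logger.info("Dataset exists, load from disk.", "some.module")
# Argument consumed by the %-style placeholder.
logger.info("Data loaded, time cost: %.2f", 1.5)

extra_arg_record, formatted_record = capture.records

try:
    extra_arg_record.getMessage()
    format_error = None
except TypeError as exc:
    # The message has no placeholder to receive the argument.
    format_error = exc
```

A formatting handler would hit the same `TypeError` and report it as a logging error instead of emitting the message.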
--- a/qlib/contrib/evaluate.py
+++ b/qlib/contrib/evaluate.py
@@ -96,9 +96,11 @@ def indicator_analysis(df, method="mean"):
        index: Index(datetime)
    method : str, optional
        statistics method of pa/ffr, by default "mean"
+
        - if method is 'mean', count the mean statistical value of each trade indicator
        - if method is 'amount_weighted', count the deal_amount weighted mean statistical value of each trade indicator
        - if method is 'value_weighted', count the value weighted mean statistical value of each trade indicator
+
        Note: statistics method of pos is always "mean"

    Returns
@@ -154,6 +156,7 @@ def backtest_daily(
        E.g.

        .. code-block:: python
+
            # dict
            strategy = {
                "class": "TopkDropoutStrategy",
@@ -180,16 +183,19 @@ def backtest_daily(
            # 3) specify module path with class name
            #     - "a.b.c.ClassName" getattr(<a.b.c.module>, "ClassName")() will be used.

-
    executor : Union[str, dict, BaseExecutor]
        for initializing the outermost executor.
    benchmark: str
        the benchmark for reporting.
    account : Union[float, int, Position]
        information for describing how to creating the account
+
        For `float` or `int`:
+
            Using Account with only initial cash
+
        For `Position`:
+
            Using Account with a Position
    exchange_kwargs : dict
        the kwargs for initializing Exchange
@@ -283,8 +289,8 @@ def long_short_backtest(
                       NOTE: This will be faster with offline qlib.
    :return:            The result of backtest, it is represented by a dict.
                        { "long": long_returns(excess),
-                          "short": short_returns(excess),
-                          "long_short": long_short_returns}
+                        "short": short_returns(excess),
+                        "long_short": long_short_returns}
    """
    if get_level_index(pred, level="datetime") == 1:
        pred = pred.swaplevel().sort_index()
--- a/qlib/contrib/model/init.py
+++ b/qlib/contrib/model/init.py
@@ -4,7 +4,7 @@ try:
    from .catboost_model import CatBoostModel
 except ModuleNotFoundError:
    CatBoostModel = None
-    print("Please install necessary libs for CatBoostModel.")
+    print("ModuleNotFoundError. CatBoostModel is skipped. (optional: maybe installing catboost can fix it.)")
 try:
    from .double_ensemble import DEnsembleModel
    from .gbdt import LGBModel
--- a/qlib/contrib/model/double_ensemble.py
+++ b/qlib/contrib/model/double_ensemble.py
@@ -30,6 +30,7 @@ class DEnsembleModel(Model, FeatureInt):
        sample_ratios=None,
        sub_weights=None,
        epochs=100,
+        early_stopping_rounds=None,
        **kwargs
    ):
        self.base_model = base_model  # "gbm" or "mlp", specifically, we use lgbm for "gbm"
@@ -59,6 +60,7 @@ class DEnsembleModel(Model, FeatureInt):
        self.params = {"objective": loss}
        self.params.update(kwargs)
        self.loss = loss
+        self.early_stopping_rounds = early_stopping_rounds

    def fit(self, dataset: DatasetH):
        df_train, df_valid = dataset.prepare(
@@ -103,14 +105,19 @@ class DEnsembleModel(Model, FeatureInt):
    def train_submodel(self, df_train, df_valid, weights, features):
        dtrain, dvalid = self._prepare_data_gbm(df_train, df_valid, weights, features)
        evals_result = dict()
+
+        callbacks = [lgb.log_evaluation(20), lgb.record_evaluation(evals_result)]
+        if self.early_stopping_rounds:
+            callbacks.append(lgb.early_stopping(self.early_stopping_rounds))
+            self.logger.info("Training with early_stopping...")
+
        model = lgb.train(
            self.params,
            dtrain,
            num_boost_round=self.epochs,
            valid_sets=[dtrain, dvalid],
            valid_names=["train", "valid"],
-            verbose_eval=20,
-            evals_result=evals_result,
+            callbacks=callbacks,
        )
        evals_result["train"] = list(evals_result["train"].values())[0]
        evals_result["valid"] = list(evals_result["valid"].values())[0]
--- a/qlib/contrib/model/pytorch_adarnn.py
+++ b/qlib/contrib/model/pytorch_adarnn.py
@@ -28,7 +28,7 @@ class ADARNN(Model):
    d_feat : int
        input dimension for each time step
    metric: str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : str
@@ -56,7 +56,7 @@ class ADARNN(Model):
        n_splits=2,
        GPU=0,
        seed=None,
-        **kwargs
+        **_
    ):
        # Set logger.
        self.logger = get_module_logger("ADARNN")
@@ -81,7 +81,7 @@ class ADARNN(Model):
        self.optimizer = optimizer.lower()
        self.loss = loss
        self.n_splits = n_splits
-        self.device = torch.device("cuda:%d" % (GPU) if torch.cuda.is_available() and GPU >= 0 else "cpu")
+        self.device = torch.device("cuda:%d" % GPU if torch.cuda.is_available() and GPU >= 0 else "cpu")
        self.seed = seed

        self.logger.info(
@@ -213,7 +213,8 @@ class ADARNN(Model):
            weight_mat = self.transform_type(out_weight_list)
            return weight_mat, None

-    def calc_all_metrics(self, pred):
+    @staticmethod
+    def calc_all_metrics(pred):
        """pred is a pandas dataframe that has two attributes: score (pred) and label (real)"""
        res = {}
        ic = pred.groupby(level="datetime").apply(lambda x: x.label.corr(x.score))
@@ -259,8 +260,6 @@ class ADARNN(Model):

        save_path = get_or_create_path(save_path)
        stop_steps = 0
-        best_score = -np.inf
-        best_epoch = 0
        evals_result["train"] = []
        evals_result["valid"] = []

@@ -400,7 +399,7 @@ class AdaRNN(nn.Module):
        self.model_type = model_type
        self.trans_loss = trans_loss
        self.len_seq = len_seq
-        self.device = torch.device("cuda:%d" % (GPU) if torch.cuda.is_available() and GPU >= 0 else "cpu")
+        self.device = torch.device("cuda:%d" % GPU if torch.cuda.is_available() and GPU >= 0 else "cpu")
        in_size = self.n_input

        features = nn.ModuleList()
@@ -499,7 +498,8 @@ class AdaRNN(nn.Module):
        res = self.softmax(weight).squeeze()
        return res

-    def get_features(self, output_list):
+    @staticmethod
+    def get_features(output_list):
        fea_list_src, fea_list_tar = [], []
        for fea in output_list:
            fea_list_src.append(fea[0 : fea.size(0) // 2])
@@ -561,7 +561,7 @@ class TransferLoss:
        """
        self.loss_type = loss_type
        self.input_dim = input_dim
-        self.device = torch.device("cuda:%d" % (GPU) if torch.cuda.is_available() and GPU >= 0 else "cpu")
+        self.device = torch.device("cuda:%d" % GPU if torch.cuda.is_available() and GPU >= 0 else "cpu")

    def compute(self, X, Y):
        """Compute adaptation loss
@@ -676,7 +676,8 @@ class MMD_loss(nn.Module):
        self.fix_sigma = None
        self.kernel_type = kernel_type

-    def guassian_kernel(self, source, target, kernel_mul=2.0, kernel_num=5, fix_sigma=None):
+    @staticmethod
+    def guassian_kernel(source, target, kernel_mul=2.0, kernel_num=5, fix_sigma=None):
        n_samples = int(source.size()[0]) + int(target.size()[0])
        total = torch.cat([source, target], dim=0)
        total0 = total.unsqueeze(0).expand(int(total.size(0)), int(total.size(0)), int(total.size(1)))
@@ -691,7 +692,8 @@ class MMD_loss(nn.Module):
        kernel_val = [torch.exp(-L2_distance / bandwidth_temp) for bandwidth_temp in bandwidth_list]
        return sum(kernel_val)

-    def linear_mmd(self, X, Y):
+    @staticmethod
+    def linear_mmd(X, Y):
        delta = X.mean(axis=0) - Y.mean(axis=0)
        loss = delta.dot(delta.T)
        return loss
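Review note: converting `linear_mmd` and similar helpers to `@staticmethod` does not break existing `self.linear_mmd(...)` call sites. A minimal sketch, using plain Python lists instead of torch tensors (a swapped-in representation so the snippet runs with the stdlib only); `MMDLossSketch` is a hypothetical class:

```python
from statistics import fmean


class MMDLossSketch:
    """Hypothetical pure-Python stand-in for MMD_loss; inputs are lists of rows."""

    @staticmethod
    def linear_mmd(X, Y):
        # Difference of the column means, then its squared Euclidean norm.
        delta = [fmean(xs) - fmean(ys) for xs, ys in zip(zip(*X), zip(*Y))]
        return sum(d * d for d in delta)

    def forward(self, source, target):
        # Calling through `self` still works after the method became static.
        return self.linear_mmd(source, target)


X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[0.0, 0.0], [2.0, 2.0]]
via_instance = MMDLossSketch().forward(X, Y)
via_class = MMDLossSketch.linear_mmd(X, Y)
```

The static form also allows the helper to be called on the class directly, without constructing a module instance.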
--- a/qlib/contrib/model/pytorch_add.py
+++ b/qlib/contrib/model/pytorch_add.py
@@ -36,7 +36,7 @@ class ADD(Model):
     d_feat : int
         input dimensions for each time step
     metric : str
-         the evaluate metric used in early stop
+         the evaluation metric used in early stop
     optimizer : str
         optimizer name
     GPU : int
--- a/qlib/contrib/model/pytorch_alstm.py
+++ b/qlib/contrib/model/pytorch_alstm.py
@@ -30,7 +30,7 @@ class ALSTM(Model):
    d_feat : int
        input dimension for each time step
    metric: str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : int
--- a/qlib/contrib/model/pytorch_alstm_ts.py
+++ b/qlib/contrib/model/pytorch_alstm_ts.py
@@ -33,7 +33,7 @@ class ALSTM(Model):
    d_feat : int
        input dimension for each time step
    metric: str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : int
--- a/qlib/contrib/model/pytorch_gats.py
+++ b/qlib/contrib/model/pytorch_gats.py
@@ -33,7 +33,7 @@ class GATs(Model):
    d_feat : int
        input dimensions for each time step
    metric : str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : int
--- a/qlib/contrib/model/pytorch_gats_ts.py
+++ b/qlib/contrib/model/pytorch_gats_ts.py
@@ -50,7 +50,7 @@ class GATs(Model):
    d_feat : int
        input dimensions for each time step
    metric : str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : int
--- a/qlib/contrib/model/pytorch_gru.py
+++ b/qlib/contrib/model/pytorch_gru.py
@@ -30,7 +30,7 @@ class GRU(Model):
    d_feat : int
        input dimension for each time step
    metric: str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : str
--- a/qlib/contrib/model/pytorch_gru_ts.py
+++ b/qlib/contrib/model/pytorch_gru_ts.py
@@ -31,7 +31,7 @@ class GRU(Model):
    d_feat : int
        input dimension for each time step
    metric: str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : str
--- a/qlib/contrib/model/pytorch_hist.py
+++ b/qlib/contrib/model/pytorch_hist.py
@@ -34,7 +34,7 @@ class HIST(Model):
    d_feat : int
        input dimensions for each time step
    metric : str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : str
--- a/qlib/contrib/model/pytorch_igmtf.py
+++ b/qlib/contrib/model/pytorch_igmtf.py
@@ -32,7 +32,7 @@ class IGMTF(Model):
    d_feat : int
        input dimension for each time step
    metric: str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : str
--- a/qlib/contrib/model/pytorch_lstm.py
+++ b/qlib/contrib/model/pytorch_lstm.py
@@ -29,7 +29,7 @@ class LSTM(Model):
    d_feat : int
        input dimension for each time step
    metric: str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : str
--- a/qlib/contrib/model/pytorch_lstm_ts.py
+++ b/qlib/contrib/model/pytorch_lstm_ts.py
@@ -30,7 +30,7 @@ class LSTM(Model):
    d_feat : int
        input dimension for each time step
    metric: str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : str
--- a/qlib/contrib/model/pytorch_tcn.py
+++ b/qlib/contrib/model/pytorch_tcn.py
@@ -33,7 +33,7 @@ class TCN(Model):
    n_chans: int
        number of channels
    metric: str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : str
--- a/qlib/contrib/model/pytorch_tcn_ts.py
+++ b/qlib/contrib/model/pytorch_tcn_ts.py
@@ -30,7 +30,7 @@ class TCN(Model):
    d_feat : int
        input dimension for each time step
    metric: str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : str
--- a/qlib/contrib/model/pytorch_tcts.py
+++ b/qlib/contrib/model/pytorch_tcts.py
@@ -29,7 +29,7 @@ class TCTS(Model):
    d_feat : int
        input dimension for each time step
    metric: str
-        the evaluate metric used in early stop
+        the evaluation metric used in early stop
    optimizer : str
        optimizer name
    GPU : str
--- a/qlib/contrib/report/analysis_model/analysis_model_performance.py
+++ b/qlib/contrib/report/analysis_model/analysis_model_performance.py
@@ -1,5 +1,6 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT License.
+from functools import partial

 import pandas as pd

@@ -10,7 +11,11 @@ import matplotlib.pyplot as plt

 from scipy import stats

+from typing import Sequence
+from qlib.typehint import Literal
+
 from ..graph import ScatterGraph, SubplotsGraph, BarGraph, HeatmapGraph
+from ..utils import guess_plotly_rangebreaks


 def _group_return(pred_label: pd.DataFrame = None, reverse: bool = False, N: int = 5, **kwargs) -> tuple:
@@ -48,12 +53,13 @@ def _group_return(pred_label: pd.DataFrame = None, reverse: bool = False, N: int
    t_df["long-average"] = t_df["Group1"] - pred_label.groupby(level="datetime")["label"].mean()

    t_df = t_df.dropna(how="all")  # for days which does not contain label
-    # FIXME: support HIGH-FREQ
-    t_df.index = t_df.index.strftime("%Y-%m-%d")
    # Cumulative Return By Group
    group_scatter_figure = ScatterGraph(
        t_df.cumsum(),
-        layout=dict(title="Cumulative Return", xaxis=dict(type="category", tickangle=45)),
+        layout=dict(
+            title="Cumulative Return",
+            xaxis=dict(tickangle=45, rangebreaks=kwargs.get("rangebreaks", guess_plotly_rangebreaks(t_df.index))),
+        ),
    ).figure

    t_df = t_df.loc[:, ["long-short", "long-average"]]
@@ -110,22 +116,36 @@ def _plot_qq(data: pd.Series = None, dist=stats.norm) -> go.Figure:
    return fig


-def _pred_ic(pred_label: pd.DataFrame = None, rank: bool = False, **kwargs) -> tuple:
+def _pred_ic(
+    pred_label: pd.DataFrame = None, methods: Sequence[Literal["IC", "Rank IC"]] = ("IC", "Rank IC"), **kwargs
+) -> tuple:
    """

-    :param pred_label:
-    :param rank:
+    :param pred_label: pd.DataFrame
+    must contain one column of realized returns named `label` and one column of predicted scores named `score`.
+    :param methods: Sequence[Literal["IC", "Rank IC"]]
+    IC series to plot.
+    IC is the cross-sectional Pearson correlation between label and score.
+    Rank IC is the cross-sectional Spearman correlation between label and score.
+    For the Monthly IC, IC histogram, and IC Q-Q plot, only the first type of IC will be plotted.
    :return:
    """
-    if rank:
-        ic = pred_label.groupby(level="datetime").apply(
-            lambda x: x["label"].rank(pct=True).corr(x["score"].rank(pct=True))
-        )
-    else:
-        ic = pred_label.groupby(level="datetime").apply(lambda x: x["label"].corr(x["score"]))
+    _methods_mapping = {"IC": "pearson", "Rank IC": "spearman"}

-    _index = ic.index.get_level_values(0).astype("str").str.replace("-", "").str.slice(0, 6)
-    _monthly_ic = ic.groupby(_index).mean()
+    def _corr_series(x, method):
+        return x["label"].corr(x["score"], method=method)
+
+    ic_df = pd.concat(
+        [
+            pred_label.groupby(level="datetime").apply(partial(_corr_series, method=_methods_mapping[m])).rename(m)
+            for m in methods
+        ],
+        axis=1,
+    )
+    _ic = ic_df.iloc(axis=1)[0]
+
+    _index = _ic.index.get_level_values(0).astype("str").str.replace("-", "").str.slice(0, 6)
+    _monthly_ic = _ic.groupby(_index).mean()
    _monthly_ic.index = pd.MultiIndex.from_arrays(
        [_monthly_ic.index.str.slice(0, 4), _monthly_ic.index.str.slice(4, 6)],
        names=["year", "month"],
@@ -148,27 +168,27 @@ def _pred_ic(pred_label: pd.DataFrame = None, rank: bool = False, **kwargs) -> t

    _monthly_ic = _monthly_ic.reindex(fill_index)

-    _ic_df = ic.to_frame("ic")
-    ic_bar_figure = ic_figure(_ic_df, kwargs.get("show_nature_day", True))
+    ic_bar_figure = ic_figure(ic_df, kwargs.get("show_nature_day", False))

    ic_heatmap_figure = HeatmapGraph(
        _monthly_ic.unstack(),
-        layout=dict(title="Monthly IC", yaxis=dict(tickformat=",d")),
+        layout=dict(title="Monthly IC", xaxis=dict(dtick=1), yaxis=dict(tickformat="04d", dtick=1)),
        graph_kwargs=dict(xtype="array", ytype="array"),
    ).figure

    dist = stats.norm
-    _qqplot_fig = _plot_qq(ic, dist)
+    _qqplot_fig = _plot_qq(_ic, dist)

    if isinstance(dist, stats.norm.__class__):
        dist_name = "Normal"
    else:
        dist_name = "Unknown"

+    _ic_df = _ic.to_frame("IC")
    _bin_size = ((_ic_df.max() - _ic_df.min()) / 20).min()
    _sub_graph_data = [
        (
-            "ic",
+            "IC",
            dict(
                row=1,
                col=1,
@@ -202,12 +222,13 @@ def _pred_autocorr(pred_label: pd.DataFrame, lag=1, **kwargs) -> tuple:
    pred = pred_label.copy()
    pred["score_last"] = pred.groupby(level="instrument")["score"].shift(lag)
    ac = pred.groupby(level="datetime").apply(lambda x: x["score"].rank(pct=True).corr(x["score_last"].rank(pct=True)))
-    # FIXME: support HIGH-FREQ
    _df = ac.to_frame("value")
-    _df.index = _df.index.strftime("%Y-%m-%d")
    ac_figure = ScatterGraph(
        _df,
-        layout=dict(title="Auto Correlation", xaxis=dict(type="category", tickangle=45)),
+        layout=dict(
+            title="Auto Correlation",
+            xaxis=dict(tickangle=45, rangebreaks=kwargs.get("rangebreaks", guess_plotly_rangebreaks(_df.index))),
+        ),
    ).figure
    return (ac_figure,)

@@ -233,32 +254,33 @@ def _pred_turnover(pred_label: pd.DataFrame, N=5, lag=1, **kwargs) -> tuple:
            "Bottom": bottom,
        }
    )
-    # FIXME: support HIGH-FREQ
-    r_df.index = r_df.index.strftime("%Y-%m-%d")
    turnover_figure = ScatterGraph(
        r_df,
-        layout=dict(title="Top-Bottom Turnover", xaxis=dict(type="category", tickangle=45)),
+        layout=dict(
+            title="Top-Bottom Turnover",
+            xaxis=dict(tickangle=45, rangebreaks=kwargs.get("rangebreaks", guess_plotly_rangebreaks(r_df.index))),
+        ),
    ).figure
    return (turnover_figure,)


 def ic_figure(ic_df: pd.DataFrame, show_nature_day=True, **kwargs) -> go.Figure:
-    """IC figure
+    r"""IC figure

    :param ic_df: ic DataFrame
    :param show_nature_day: whether to display the abscissa of non-trading day
+    :param \*\*kwargs: contains some parameters to control plot style in plotly. Currently supported:
+       - `rangebreaks`: https://plotly.com/python/time-series/#Hiding-Weekends-and-Holidays
    :return: plotly.graph_objs.Figure
    """
    if show_nature_day:
        date_index = pd.date_range(ic_df.index.min(), ic_df.index.max())
        ic_df = ic_df.reindex(date_index)
-    # FIXME: support HIGH-FREQ
-    ic_df.index = ic_df.index.strftime("%Y-%m-%d")
    ic_bar_figure = BarGraph(
        ic_df,
        layout=dict(
            title="Information Coefficient (IC)",
-            xaxis=dict(type="category", tickangle=45),
+            xaxis=dict(tickangle=45, rangebreaks=kwargs.get("rangebreaks", guess_plotly_rangebreaks(ic_df.index))),
        ),
    ).figure
    return ic_bar_figure
@@ -272,12 +294,13 @@ def model_performance_graph(
    rank=False,
    graph_names: list = ["group_return", "pred_ic", "pred_autocorr"],
    show_notebook: bool = True,
-    show_nature_day=True,
+    show_nature_day: bool = False,
+    **kwargs,
 ) -> [list, tuple]:
-    """Model performance
+    r"""Model performance

-    :param pred_label: index is **pd.MultiIndex**, index name is **[instrument, datetime]**; columns names is **[score,
-    label]**. It is usually same as the label of model training(e.g. "Ref($close, -2)/Ref($close, -1) - 1").
+    :param pred_label: index is **pd.MultiIndex**, index names are **[instrument, datetime]**; column names are **[score, label]**.
+           The label is usually the same as the one used in model training (e.g. "Ref($close, -2)/Ref($close, -1) - 1").


            .. code-block:: python
@@ -297,17 +320,14 @@ def model_performance_graph(
    :param graph_names: graph names; default ['cumulative_return', 'pred_ic', 'pred_autocorr', 'pred_turnover'].
    :param show_notebook: whether to display graphics in notebook, the default is `True`.
    :param show_nature_day: whether to display the abscissa of non-trading day.
+    :param \*\*kwargs: contains some parameters to control plot style in plotly. Currently supported:
+       - `rangebreaks`: https://plotly.com/python/time-series/#Hiding-Weekends-and-Holidays
    :return: if show_notebook is True, display in notebook; else return `plotly.graph_objs.Figure` list.
    """
    figure_list = []
    for graph_name in graph_names:
        fun_res = eval(f"_{graph_name}")(
-            pred_label=pred_label,
-            lag=lag,
-            N=N,
-            reverse=reverse,
-            rank=rank,
-            show_nature_day=show_nature_day,
+            pred_label=pred_label, lag=lag, N=N, reverse=reverse, rank=rank, show_nature_day=show_nature_day, **kwargs
        )
        figure_list += fun_res

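Reviewer note: the reworked `_pred_ic` builds one column per requested method by grouping on `datetime` and correlating `label` with `score` inside each day. The sketch below reproduces that shape on made-up data. The toy frame and the helper `corr_per_day` are illustrative assumptions, not part of the patch. To avoid relying on `method="spearman"`, it ranks both columns and applies Pearson, which gives the same Spearman coefficient.

```python
from functools import partial

import pandas as pd

# Hypothetical toy data: 4 instruments on 2 dates, indexed like pred_label.
instruments = ["A", "B", "C", "D"]
dates = pd.to_datetime(["2023-01-03", "2023-01-04"])
rows = []
for d in dates:
    for i, inst in enumerate(instruments, start=1):
        score = float(i)
        # First day: label rises monotonically with score; second day: it falls.
        label = score**3 if d == dates[0] else -(score**3)
        rows.append((inst, d, score, label))
pred_label = pd.DataFrame(rows, columns=["instrument", "datetime", "score", "label"]).set_index(
    ["instrument", "datetime"]
)


def corr_per_day(x: pd.DataFrame, use_rank: bool) -> float:
    """Cross-sectional correlation for one day; ranking first yields Spearman."""
    if use_rank:
        return x["label"].rank().corr(x["score"].rank())
    return x["label"].corr(x["score"])  # Pearson is the pandas default


# Hypothetical mapping from column name to "rank first?" flag.
methods = {"IC": False, "Rank IC": True}
ic_df = pd.concat(
    [
        pred_label.groupby(level="datetime").apply(partial(corr_per_day, use_rank=r)).rename(m)
        for m, r in methods.items()
    ],
    axis=1,
)
print(ic_df)
```

On this data the label is a monotonic but non-linear function of the score, so Rank IC is exactly +1 or -1 on each day, while IC is slightly smaller in magnitude.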
--- a/qlib/contrib/report/analysis_position/cumulative_return.py
+++ b/qlib/contrib/report/analysis_position/cumulative_return.py
@@ -218,6 +218,7 @@ def cumulative_return_graph(


        Graph desc:
+
            - Axis X: Trading day.
            - Axis Y:
            - Above axis Y: `(((Ref($close, -1)/$close - 1) * weight).sum() / weight.sum()).cumsum()`.
@@ -242,7 +243,8 @@ def cumulative_return_graph(


    :param label_data: `D.features` result; index is `pd.MultiIndex`, index name is [`instrument`, `datetime`]; columns names is [`label`].
-    **The label T is the change from T to T+1**, it is recommended to use ``close``, example: `D.features(D.instruments('csi500'), ['Ref($close, -1)/$close-1'])`
+
+        **The label T is the change from T to T+1**, it is recommended to use ``close``, example: `D.features(D.instruments('csi500'), ['Ref($close, -1)/$close-1'])`


            .. code-block:: python
--- a/qlib/contrib/report/analysis_position/parse_position.py
+++ b/qlib/contrib/report/analysis_position/parse_position.py
@@ -39,6 +39,7 @@ def parse_position(position: dict = None) -> pd.DataFrame:

    result_df = pd.DataFrame()
    for _trading_date, _value in position.items():
+        _value = _value.position
        # pd_date type: pd.Timestamp
        _cash = _value.pop("cash")
        for _item in ["now_account_value"]:
--- a/qlib/contrib/report/analysis_position/rank_label.py
+++ b/qlib/contrib/report/analysis_position/rank_label.py
@@ -99,7 +99,8 @@ def rank_label_graph(

    :param position: position data; **qlib.backtest.backtest** result.
    :param label_data: **D.features** result; index is **pd.MultiIndex**, index name is **[instrument, datetime]**; columns names is **[label]**.
-    **The label T is the change from T to T+1**, it is recommended to use ``close``, example: `D.features(D.instruments('csi500'), ['Ref($close, -1)/$close-1'])`.
+
+        **The label T is the change from T to T+1**, it is recommended to use ``close``, example: `D.features(D.instruments('csi500'), ['Ref($close, -1)/$close-1'])`.


            .. code-block:: python
--- a/qlib/contrib/report/analysis_position/risk_analysis.py
+++ b/qlib/contrib/report/analysis_position/risk_analysis.py
@@ -119,7 +119,7 @@ def _get_risk_analysis_figure(analysis_df: pd.DataFrame) -> Iterable[py.Figure]:
    _figure = SubplotsGraph(
        _get_all_risk_analysis(analysis_df),
        kind_map=dict(kind="BarGraph", kwargs={}),
-        subplots_kwargs={"rows": 4, "cols": 1},
+        subplots_kwargs={"rows": 1, "cols": 4},
    ).figure
    return (_figure,)

--- a/qlib/contrib/report/analysis_position/score_ic.py
+++ b/qlib/contrib/report/analysis_position/score_ic.py
@@ -4,6 +4,7 @@
 import pandas as pd

 from ..graph import ScatterGraph
+from ..utils import guess_plotly_rangebreaks


 def _get_score_ic(pred_label: pd.DataFrame):
@@ -19,7 +20,7 @@ def _get_score_ic(pred_label: pd.DataFrame):
    return pd.DataFrame({"ic": _ic, "rank_ic": _rank_ic})


-def score_ic_graph(pred_label: pd.DataFrame, show_notebook: bool = True) -> [list, tuple]:
+def score_ic_graph(pred_label: pd.DataFrame, show_notebook: bool = True, **kwargs) -> [list, tuple]:
    """score IC

        Example:
@@ -53,11 +54,13 @@ def score_ic_graph(pred_label: pd.DataFrame, show_notebook: bool = True) -> [lis
    :return: if show_notebook is True, display in notebook; else return **plotly.graph_objs.Figure** list.
    """
    _ic_df = _get_score_ic(pred_label)
-    # FIXME: support HIGH-FREQ
-    _ic_df.index = _ic_df.index.strftime("%Y-%m-%d")
+
    _figure = ScatterGraph(
        _ic_df,
-        layout=dict(title="Score IC", xaxis=dict(type="category", tickangle=45)),
+        layout=dict(
+            title="Score IC",
+            xaxis=dict(tickangle=45, rangebreaks=kwargs.get("rangebreaks", guess_plotly_rangebreaks(_ic_df.index))),
+        ),
        graph_kwargs={"mode": "lines+markers"},
    ).figure
    if show_notebook:
--- a/qlib/contrib/report/data/ana.py
+++ b/qlib/contrib/report/data/ana.py
@@ -139,8 +139,8 @@ class FeaACAna(FeaAnalyser):

 class FeaSkewTurt(NumFeaAnalyser):
    def calc_stat_values(self):
-        self._skew = datetime_groupby_apply(self._dataset, "skew", skip_group=True)
-        self._kurt = datetime_groupby_apply(self._dataset, pd.DataFrame.kurt, skip_group=True)
+        self._skew = datetime_groupby_apply(self._dataset, "skew")
+        self._kurt = datetime_groupby_apply(self._dataset, pd.DataFrame.kurt)

    def plot_single(self, col, ax):
        self._skew[col].plot(ax=ax, label="skew")
--- a/qlib/contrib/report/utils.py
+++ b/qlib/contrib/report/utils.py
@@ -1,6 +1,7 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT License.
 import matplotlib.pyplot as plt
+import pandas as pd


 def sub_fig_generator(sub_fs=(3, 3), col_n=10, row_n=1, wspace=None, hspace=None, sharex=False, sharey=False):
@@ -43,3 +44,31 @@ def sub_fig_generator(sub_fs=(3, 3), col_n=10, row_n=1, wspace=None, hspace=None
                res = res.item()
            yield res
        plt.show()
+
+
+def guess_plotly_rangebreaks(dt_index: pd.DatetimeIndex):
+    """
+    This function `guesses` the rangebreaks required to remove gaps in a datetime index.
+    It basically calculates the difference between a `continuous` datetime index and the given index.
+
+    For more details on `rangebreaks` params in plotly, see
+    https://plotly.com/python/reference/layout/xaxis/#layout-xaxis-rangebreaks
+
+    Parameters
+    ----------
+    dt_index : pd.DatetimeIndex
+        The datetimes of the data.
+
+    Returns
+    -------
+    the `rangebreaks` to be passed into plotly axis.
+
+    """
+    dt_idx = dt_index.sort_values()
+    gaps = dt_idx[1:] - dt_idx[:-1]
+    min_gap = gaps.min()
+    gaps_to_break = {}
+    for gap, d in zip(gaps, dt_idx[:-1]):
+        if gap > min_gap:
+            gaps_to_break.setdefault(gap - min_gap, []).append(d + min_gap)
+    return [dict(values=v, dvalue=int(k.total_seconds() * 1000)) for k, v in gaps_to_break.items()]
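Reviewer note: a small worked example of what `guess_plotly_rangebreaks` returns. The function body is copied from the hunk above so that the snippet runs on its own. The three-day index (Friday, Monday, Tuesday) is made-up input.

```python
import pandas as pd


def guess_plotly_rangebreaks(dt_index: pd.DatetimeIndex):
    # Copied from the patch: every gap larger than the smallest gap becomes a break.
    dt_idx = dt_index.sort_values()
    gaps = dt_idx[1:] - dt_idx[:-1]
    min_gap = gaps.min()
    gaps_to_break = {}
    for gap, d in zip(gaps, dt_idx[:-1]):
        if gap > min_gap:
            # The break starts one "normal" step after d and lasts (gap - min_gap).
            gaps_to_break.setdefault(gap - min_gap, []).append(d + min_gap)
    return [dict(values=v, dvalue=int(k.total_seconds() * 1000)) for k, v in gaps_to_break.items()]


# Friday, Monday, Tuesday: the smallest step is one day, so the weekend is the only gap.
index = pd.DatetimeIndex(["2023-01-06", "2023-01-09", "2023-01-10"])
breaks = guess_plotly_rangebreaks(index)
print(breaks)
```

The single break starts on Saturday 2023-01-07 and has a `dvalue` of two days in milliseconds, which is the form plotly expects for `xaxis.rangebreaks`. Breaks of equal length are grouped into one entry because the dictionary is keyed by gap length.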
--- a/qlib/contrib/strategy/cost_control.py
+++ b/qlib/contrib/strategy/cost_control.py
@@ -25,12 +25,14 @@ class SoftTopkStrategy(WeightStrategyBase):
        common_infra=None,
        **kwargs,
    ):
-        """Parameter
+        """
+        Parameters
+        ----------
        topk : int
            top-N stocks to buy
        risk_degree : float
-            position percentage of total value
-            buy_method :
+            position percentage of total value
+        buy_method :
                rank_fill: assign the weight stocks that rank high first(1/topk max)
                average_fill: assign the weight to the stocks rank high averagely.
        """
@@ -51,12 +53,19 @@ class SoftTopkStrategy(WeightStrategyBase):
        return self.risk_degree

    def generate_target_weight_position(self, score, current, trade_start_time, trade_end_time):
-        """Parameter:
-        score : pred score for this trade date, pd.Series, index is stock_id, contain 'score' column
-        current : current position, use Position() class
-        trade_date : trade date
-        generate target position from score for this date and the current position
-        The cache is not considered in the position
+        """
+        Parameters
+        ----------
+        score:
+            pred score for this trade date, pd.Series, index is stock_id, contain 'score' column
+        current:
+            current position, use Position() class
+        trade_date:
+            trade date
+
+            generate target position from score for this date and the current position
+
+            The cash is not considered in the position
        """
        # TODO:
        # If the current stock list is more than topk(eg. The weights are modified
--- a/qlib/contrib/strategy/order_generator.py
+++ b/qlib/contrib/strategy/order_generator.py
@@ -33,10 +33,14 @@ class OrderGenerator:
        :type target_weight_position: dict
        :param risk_degree:
        :type risk_degree: float
-        :param pred_date: the date the score is predicted
-        :type pred_date: pd.Timestamp
-        :param trade_date: the date the stock is traded
-        :type trade_date: pd.Timestamp
+        :param pred_start_time:
+        :type pred_start_time: pd.Timestamp
+        :param pred_end_time:
+        :type pred_end_time: pd.Timestamp
+        :param trade_start_time:
+        :type trade_start_time: pd.Timestamp
+        :param trade_end_time:
+        :type trade_end_time: pd.Timestamp

        :rtype: list
        """
@@ -72,10 +76,14 @@ class OrderGenWInteract(OrderGenerator):
        :type target_weight_position: dict
        :param risk_degree:
        :type risk_degree: float
-        :param pred_date:
-        :type pred_date: pd.Timestamp
-        :param trade_date:
-        :type trade_date: pd.Timestamp
+        :param pred_start_time:
+        :type pred_start_time: pd.Timestamp
+        :param pred_end_time:
+        :type pred_end_time: pd.Timestamp
+        :param trade_start_time:
+        :type trade_start_time: pd.Timestamp
+        :param trade_end_time:
+        :type trade_end_time: pd.Timestamp

        :rtype: list
        """
@@ -147,9 +155,12 @@ class OrderGenWOInteract(OrderGenerator):
    ) -> list:
        """generate_order_list_from_target_weight_position

-        generate order list directly not using the information (e.g. whether can be traded, the accurate trade price) at trade date.
-        In target weight position, generating order list need to know the price of objective stock in trade date, but we cannot get that
-        value when do not interact with exchange, so we check the %close price at pred_date or price recorded in current position.
+        Generate the order list directly, without using information that is only available at the trade date
+        (e.g. whether a stock can be traded, the accurate trade price).
+        Under a target weight position, generating the order list requires the price of each
+        target stock on the trade date, but that value is unavailable without interacting with
+        the exchange, so we use the %close price at pred_date or the price recorded in the
+        current position.

        :param current:
        :type current: Position
@@ -159,10 +170,14 @@ class OrderGenWOInteract(OrderGenerator):
        :type target_weight_position: dict
        :param risk_degree:
        :type risk_degree: float
-        :param pred_date:
-        :type pred_date: pd.Timestamp
-        :param trade_date:
-        :type trade_date: pd.Timestamp
+        :param pred_start_time:
+        :type pred_start_time: pd.Timestamp
+        :param pred_end_time:
+        :type pred_end_time: pd.Timestamp
+        :param trade_start_time:
+        :type trade_start_time: pd.Timestamp
+        :param trade_end_time:
+        :type trade_end_time: pd.Timestamp

        :rtype: list of generated orders
        """
@@ -185,7 +200,8 @@ class OrderGenWOInteract(OrderGenerator):
                    * target_weight_position[stock_id]
                    / trade_exchange.get_close(stock_id, start_time=pred_start_time, end_time=pred_end_time)
                )
-                # TODO: Qlib use None to represent trading suspension. So last close price can't be the estimated trading price.
+                # TODO: Qlib use None to represent trading suspension.
+                #  So last close price can't be the estimated trading price.
                # Maybe a close price with forward fill will be a better solution.
            elif stock_id in current_stock:
                amount_dict[stock_id] = (
--- a/qlib/contrib/strategy/signal_strategy.py
+++ b/qlib/contrib/strategy/signal_strategy.py
@@ -7,6 +7,7 @@ import numpy as np
 import pandas as pd

 from typing import Dict, List, Text, Tuple, Union
+from abc import ABC

 from qlib.data import D
 from qlib.data.dataset import Dataset
@@ -17,11 +18,11 @@ from qlib.backtest.signal import Signal, create_signal_from
 from qlib.backtest.decision import Order, OrderDir, TradeDecisionWO
 from qlib.log import get_module_logger
 from qlib.utils import get_pre_trading_date, load_dataset
-from qlib.contrib.strategy.order_generator import OrderGenWOInteract
+from qlib.contrib.strategy.order_generator import OrderGenerator, OrderGenWOInteract
 from qlib.contrib.strategy.optimizer import EnhancedIndexingOptimizer


-class BaseSignalStrategy(BaseStrategy):
+class BaseSignalStrategy(BaseStrategy, ABC):
    def __init__(
        self,
        *,
@@ -47,7 +48,7 @@ class BaseSignalStrategy(BaseStrategy):
            - If `trade_exchange` is None, self.trade_exchange will be set with common_infra
            - It allowes different trade_exchanges is used in different executions.
            - For example:
-                - In daily execution, both daily exchange and minutely are usable, but the daily exchange is recommended because it run faster.
+                - In daily execution, both daily exchange and minutely are usable, but the daily exchange is recommended because it runs faster.
                - In minutely execution, the daily exchange is not usable, only the minutely exchange is recommended.

        """
@@ -64,7 +65,7 @@ class BaseSignalStrategy(BaseStrategy):

    def get_risk_degree(self, trade_step=None):
        """get_risk_degree
-        Return the proportion of your total value you will used in investment.
+        Return the proportion of your total value you will use in investment.
        Dynamically risk_degree will result in Market timing.
        """
        # It will use 95% amount of your total value by default
@@ -76,6 +77,7 @@ class TopkDropoutStrategy(BaseSignalStrategy):
    # 1. Supporting leverage the get_range_limit result from the decision
    # 2. Supporting alter_outer_trade_decision
    # 3. Supporting checking the availability of trade decision
+    # 4. Regenerate results with forbid_all_trade_at_limit set to false and flip the default to false, as it is consistent with reality.
    def __init__(
        self,
        *,
@@ -85,6 +87,7 @@ class TopkDropoutStrategy(BaseSignalStrategy):
        method_buy="top",
        hold_thresh=1,
        only_tradable=False,
+        forbid_all_trade_at_limit=True,
        **kwargs,
    ):
        """
@@ -103,10 +106,25 @@ class TopkDropoutStrategy(BaseSignalStrategy):
            before sell stock , will check current.get_stock_count(order.stock_id) >= self.hold_thresh.
        only_tradable : bool
            will the strategy only consider the tradable stock when buying and selling.
+
            if only_tradable:
+
                strategy will make decision with the tradable state of the stock info and avoid buy and sell them.
+
            else:
+
                strategy will make buy sell decision without checking the tradable state of the stock.
+        forbid_all_trade_at_limit : bool
+            whether to forbid all trades when limit_up or limit_down is reached.
+
+            if forbid_all_trade_at_limit:
+
+                strategy will not trade at all when the price reaches limit up/down; it will neither sell at limit
+                up nor buy at limit down, even though both are allowed in reality.
+
+            else:
+
+                strategy will sell at limit up and buy at limit down.
        """
        super().__init__(**kwargs)
        self.topk = topk
@@ -115,6 +133,7 @@ class TopkDropoutStrategy(BaseSignalStrategy):
        self.method_buy = method_buy
        self.hold_thresh = hold_thresh
        self.only_tradable = only_tradable
+        self.forbid_all_trade_at_limit = forbid_all_trade_at_limit

    def generate_trade_decision(self, execute_result=None):
        # get the number of trading step finished, trade_step can be [0, 1, 2, ..., trade_len - 1]
@@ -157,7 +176,7 @@ class TopkDropoutStrategy(BaseSignalStrategy):
                ]

        else:
-            # Otherwise, the stock will make decision with out the stock tradable info
+            # Otherwise, the stock will make decision without the stock tradable info
            def get_first_n(li, n):
                return list(li)[:n]

@@ -167,7 +186,7 @@ class TopkDropoutStrategy(BaseSignalStrategy):
            def filter_stock(li):
                return li

-        current_temp = copy.deepcopy(self.trade_position)
+        current_temp: Position = copy.deepcopy(self.trade_position)
        # generate order list for this adjust date
        sell_order_list = []
        buy_order_list = []
@@ -212,7 +231,10 @@ class TopkDropoutStrategy(BaseSignalStrategy):
        buy = today[: len(sell) + self.topk - len(last)]
        for code in current_stock_list:
            if not self.trade_exchange.is_stock_tradable(
-                stock_id=code, start_time=trade_start_time, end_time=trade_end_time
+                stock_id=code,
+                start_time=trade_start_time,
+                end_time=trade_end_time,
+                direction=None if self.forbid_all_trade_at_limit else OrderDir.SELL,
            ):
                continue
            if code in sell:
@@ -222,9 +244,6 @@ class TopkDropoutStrategy(BaseSignalStrategy):
                    continue
                # sell order
                sell_amount = current_temp.get_stock_amount(code=code)
-                factor = self.trade_exchange.get_factor(
-                    stock_id=code, start_time=trade_start_time, end_time=trade_end_time
-                )
                # sell_amount = self.trade_exchange.round_amount_by_trade_unit(sell_amount, factor)
                sell_order = Order(
                    stock_id=code,
@@ -243,7 +262,7 @@ class TopkDropoutStrategy(BaseSignalStrategy):
                    cash += trade_val - trade_cost
        # buy new stock
        # note the current has been changed
-        current_stock_list = current_temp.get_stock_list()
+        # current_stock_list = current_temp.get_stock_list()
        value = cash * self.risk_degree / len(buy) if len(buy) > 0 else 0

        # open_cost should be considered in the real trading environment, while the backtest in evaluate.py does not
@@ -252,7 +271,10 @@ class TopkDropoutStrategy(BaseSignalStrategy):
        for code in buy:
            # check is stock suspended
            if not self.trade_exchange.is_stock_tradable(
-                stock_id=code, start_time=trade_start_time, end_time=trade_end_time
+                stock_id=code,
+                start_time=trade_start_time,
+                end_time=trade_end_time,
+                direction=None if self.forbid_all_trade_at_limit else OrderDir.BUY,
            ):
                continue
            # buy order
@@ -290,31 +312,33 @@ class WeightStrategyBase(BaseSignalStrategy):
            the decision of the strategy will base on the given signal
        trade_exchange : Exchange
            exchange that provides market info, used to deal order and generate report
+
            - If `trade_exchange` is None, self.trade_exchange will be set with common_infra
            - It allowes different trade_exchanges is used in different executions.
            - For example:
-                - In daily execution, both daily exchange and minutely are usable, but the daily exchange is recommended because it run faster.
+
+                - In daily execution, both daily exchange and minutely are usable, but the daily exchange is recommended because it runs faster.
                - In minutely execution, the daily exchange is not usable, only the minutely exchange is recommended.
        """
        super().__init__(**kwargs)

        if isinstance(order_generator_cls_or_obj, type):
-            self.order_generator = order_generator_cls_or_obj()
+            self.order_generator: OrderGenerator = order_generator_cls_or_obj()
        else:
-            self.order_generator = order_generator_cls_or_obj
+            self.order_generator: OrderGenerator = order_generator_cls_or_obj

    def generate_target_weight_position(self, score, current, trade_start_time, trade_end_time):
        """
        Generate target position from score for this date and the current position.The cash is not considered in the position
+
        Parameters
        -----------
        score : pd.Series
            pred score for this trade date, index is stock_id, contain 'score' column.
        current : Position()
            current position.
-        trade_exchange : Exchange()
-        trade_date : pd.Timestamp
-            trade date.
+        trade_start_time: pd.Timestamp
+        trade_end_time: pd.Timestamp
        """
        raise NotImplementedError()

@@ -358,12 +382,14 @@ class EnhancedIndexingStrategy(WeightStrategyBase):

    Users need to prepare their risk model data like below:

-    ├── /path/to/riskmodel
-    ├──── 20210101
-    ├────── factor_exp.{csv|pkl|h5}
-    ├────── factor_cov.{csv|pkl|h5}
-    ├────── specific_risk.{csv|pkl|h5}
-    ├────── blacklist.{csv|pkl|h5}  # optional
+    .. code-block:: text
+
+        ├── /path/to/riskmodel
+        ├──── 20210101
+        ├────── factor_exp.{csv|pkl|h5}
+        ├────── factor_cov.{csv|pkl|h5}
+        ├────── specific_risk.{csv|pkl|h5}
+        ├────── blacklist.{csv|pkl|h5}  # optional

    The risk model data can be obtained from risk data provider. You can also use
    `qlib.model.riskmodel.structured.StructuredCovEstimator` to prepare these data.
@@ -422,7 +448,7 @@ class EnhancedIndexingStrategy(WeightStrategyBase):
        specific_risk = load_dataset(root + "/" + self.specific_risk_path, index_col=[0])

        if not factor_exp.index.equals(specific_risk.index):
-            # NOTE: for stocks missing specific_risk, we always assume it have the highest volatility
+            # NOTE: for stocks missing specific_risk, we always assume it has the highest volatility
            specific_risk = specific_risk.reindex(factor_exp.index, fill_value=specific_risk.max())

        universe = factor_exp.index.tolist()
--- a/qlib/data/base.py
+++ b/qlib/data/base.py
@@ -16,8 +16,10 @@ class Expression(abc.ABC):

    Expression is designed to handle the calculation of data with the format below
    data with two dimension for each instrument,
+
    - feature
    - time:  it  could be observation time or period time.
+
        - period time is designed for Point-in-time database.  For example, the period time maybe 2014Q4, its value can observed for multiple times(different value may be observed at different time due to amendment).
    """

@@ -142,9 +144,12 @@ class Expression(abc.ABC):
        This function is responsible for loading feature/expression based on the expression engine.

        The concrete implementation will be separated into two parts:
+
        1) caching data, handle errors.
+
            - This part is shared by all the expressions and implemented in Expression
        2) processing and calculating data based on the specific expression.
+
            - This part is different in each expression and implemented in each expression

        Expression Engine is shared by different data.
--- a/qlib/data/cache.py
+++ b/qlib/data/cache.py
@@ -141,8 +141,10 @@ class MemCache:

        Parameters
        ----------
-        mem_cache_size_limit: cache max size.
-        limit_type: length or sizeof; length(call fun: len), size(call fun: sys.getsizeof).
+        mem_cache_size_limit:
+            cache max size.
+        limit_type:
+            length or sizeof; length(call fun: len), size(call fun: sys.getsizeof).
        """

        size_limit = C.mem_cache_size_limit if mem_cache_size_limit is None else mem_cache_size_limit
@@ -394,7 +396,7 @@ class DatasetCache(BaseProviderCache):

        .. note:: The server use redis_lock to make sure
            read-write conflicts will not be triggered
-                but client readers are not considered.
+            but client readers are not considered.
        """
        if disk_cache == 0:
            # skip cache
@@ -471,7 +473,7 @@ class DatasetCache(BaseProviderCache):
        not_space_fields = remove_fields_space(fields)
        data = data.loc[:, not_space_fields]
        # set features fields
-        data.columns = list(fields)
+        data.columns = [str(i) for i in fields]
        return data

    @staticmethod
@@ -858,7 +860,7 @@ class DiskDatasetCache(DatasetCache):
        """gen_dataset_cache

        .. note:: This function does not consider the cache read write lock. Please
-        Acquire the lock outside this function
+            acquire the lock outside this function

        The format the cache contains 3 parts(followed by typical filename).

@@ -874,10 +876,10 @@ class DiskDatasetCache(DatasetCache):
                    1999-11-12 00:00:00     2   3
                    ...

-            .. note:: The start is closed. The end is open!!!!!
+                .. note:: The start is closed. The end is open!!!!!

            - Each line contains two element <start_index, end_index> with a timestamp as its index.
-            - It indicates the `start_index`(included) and `end_index`(excluded) of the data for `timestamp`
+            - It indicates the `start_index` (included) and `end_index` (excluded) of the data for `timestamp`

        - meta data: cache/d41366901e25de3ec47297f12e2ba11d.meta

--- a/qlib/data/data.py
+++ b/qlib/data/data.py
@@ -220,7 +220,8 @@ class InstrumentProvider(abc.ABC):
        ----------
        dict: if isinstance(market, str)
            dict of stockpool config.
-            {`market`=>base market name, `filter_pipe`=>list of filters}
+
+            {`market` => base market name, `filter_pipe` => list of filters}

            example :

@@ -432,9 +433,12 @@ class ExpressionProvider(abc.ABC):
            data of a certain expression

            The data has two types of format
+
            1) expression with datetime index
+
            2) expression with integer index
-            - because the datetime is not as good as
+
+                - because the datetime is not as good as
        """
        raise NotImplementedError("Subclass of ExpressionProvider must implement `Expression` method")

@@ -890,6 +894,7 @@ class LocalDatasetProvider(DatasetProvider):
            Will we align the time to calendar
            the frequency is flexible in some dataset and can't be aligned.
            For the data with fixed frequency with a shared calendar, the align data to the calendar will provides following benefits
+
            - Align queries to the same parameters, so the cache can be shared.
        """
        super().__init__()
@@ -1167,11 +1172,12 @@ class BaseProvider:
        inst_processors=[],
    ):
        """
-        Parameters:
-        -----------
+        Parameters
+        ----------
        disk_cache : int
            whether to skip(0)/use(1)/replace(2) disk_cache

+
        This function will try to use cache method which has a keyword `disk_cache`,
        and will use provider method if a type error is raised because the DatasetD instance
        is a provider class.
@@ -1221,10 +1227,12 @@ class ClientProvider(BaseProvider):
    """Client Provider

    Requesting data from server as a client. Can propose requests:
+
        - Calendar : Directly respond a list of calendars
        - Instruments (without filter): Directly respond a list/dict of instruments
        - Instruments (with filters):  Respond a list/dict of instruments
        - Features : Respond a cache uri
+
    The general workflow is described as follows:
    When the user use client provider to propose a request, the client provider will connect the server and send the request. The client will start to wait for the response. The response will be made instantly indicating whether the cache is available. The waiting procedure will terminate only when the client get the response saying `feature_available` is true.
    `BUG` : Everytime we make request for certain data we need to connect to the server, wait for the response and disconnect from it. We can't make a sequence of requests within one connection. You can refer to https://python-socketio.readthedocs.io/en/latest/client.html for documentation of python-socketIO client.
--- a/qlib/data/dataset/__init__.py
+++ b/qlib/data/dataset/__init__.py
@@ -82,7 +82,11 @@ class DatasetH(Dataset):
    """

    def __init__(
-        self, handler: Union[Dict, DataHandler], segments: Dict[Text, Tuple], fetch_kwargs: Dict = {}, **kwargs
+        self,
+        handler: Union[Dict, DataHandler],
+        segments: Dict[Text, Tuple],
+        fetch_kwargs: Dict = {},
+        **kwargs,
    ):
        """
        Setup the underlying data.
@@ -201,8 +205,9 @@ class DatasetH(Dataset):
        col_set : str
            The col_set will be passed to self.handler when fetching data.
            TODO: make it automatic:
-                - select DK_I for test data
-                - select DK_L for training data.
+
+            - select DK_I for test data
+            - select DK_L for training data.
        data_key : str
            The data to fetch:  DK_*
            Default is DK_I, which indicate fetching data for **inference**.
@@ -284,10 +289,69 @@ class TSDataSampler:
    - For performance issues, this Sampler will convert dataframe into arrays for better performance. This could result
      in a different data type

+
+    Indices design:
+        TSDataSampler has an index mechanism to help users query time-series data efficiently.
+
+        The definition of related variables:
+            data_arr: np.ndarray
+                The original data. It contains all the original data.
+                Queries are often for the time series of a specific stock.
+                To leverage this data characteristic and speed up querying, the multi-index of data_arr is rearranged in (instrument, datetime) order.
+
+            data_index: pd.MultiIndex with index order <instrument, datetime>
+                It has the same length as `idx_map`. Their elements are expected to be aligned.
+
+            idx_map: np.ndarray
+                It is the indexable data. It originates from data_arr and is then filtered by 1) `start` and `end`  2) `flt_data`
+                    The extra data in data_arr is useful in the following cases
+                    1) creating meaningful time series data before `start` instead of padding them with zeros
+                    2) some data are excluded by `flt_data` (e.g. no <X, y> sample pair for that index), but they are still used in the time series in X
+
+                Finally, it will look like this:
+
+                array([[  0,   0],
+                       [  1,   0],
+                       [  2,   0],
+                       ...,
+                       [241, 348],
+                       [242, 348],
+                       [243, 348]], dtype=int32)
+
+                It lists all indexable data (some data used only in historical time series may not be indexable); the values are the corresponding row and col in idx_df
+            idx_df: pd.DataFrame
+                It aims to map the <datetime, instrument> key to the original position in data_arr
+
+                For example, it may look like this (NOTE: the indices of an instrument's time series are contiguous in memory)
+
+                    instrument SH600000 SH600008 SH600009 SH600010 SH600011 SH600015  ...
+                    datetime
+                    2017-01-03        0      242      473      717      NaN      974  ...
+                    2017-01-04        1      243      474      718      NaN      975  ...
+                    2017-01-05        2      244      475      719      NaN      976  ...
+                    2017-01-06        3      245      476      720      NaN      977  ...
+
+            With these two indices(idx_map, idx_df) and original data(data_arr), we can make the following queries fast (implemented in __getitem__)
+            (1) Get the i-th indexable sample(time-series):   (indexable sample index) -> [idx_map] -> (row col) -> [idx_df] -> (index in data_arr)
+            (2) Get the specific sample by <datetime, instrument>:  (<datetime, instrument>, i.e. <row, col>) -> [idx_df] -> (index in data_arr)
+            (3) Get the index of a time-series data:   (get the <row, col>, refer to (1), (2)) -> [idx_df] -> (all indices in data_arr for time-series)
    """

+    # Please refer to the docstring of TSDataSampler for the definition of following attributes
+    data_arr: np.ndarray
+    data_index: pd.MultiIndex
+    idx_map: np.ndarray
+    idx_df: pd.DataFrame
+
    def __init__(
-        self, data: pd.DataFrame, start, end, step_len: int, fillna_type: str = "none", dtype=None, flt_data=None
+        self,
+        data: pd.DataFrame,
+        start,
+        end,
+        step_len: int,
+        fillna_type: str = "none",
+        dtype=None,
+        flt_data=None,
    ):
        """
        Build a dataset which looks like torch.data.utils.Dataset.
@@ -295,7 +359,7 @@ class TSDataSampler:
        Parameters
        ----------
        data : pd.DataFrame
-            The raw tabular data
+            The raw tabular data whose index order is <"datetime", "instrument">
        start :
            The indexable start time
        end :
@@ -311,7 +375,7 @@ class TSDataSampler:
            ffill+bfill:
                ffill with previous samples first and fill with later samples second
        flt_data : pd.Series
-            a column of data(True or False) to filter data.
+            a column of data(True or False) to filter data. Its index order is <"datetime", "instrument">
            None:
                kepp all data

@@ -321,7 +385,10 @@ class TSDataSampler:
        self.step_len = step_len
        self.fillna_type = fillna_type
        assert get_level_index(data, "datetime") == 0
-        self.data = lazy_sort_index(data)
+        self.data = data.swaplevel().sort_index().copy()
+        data.drop(
+            data.columns, axis=1, inplace=True
+        )  # data is no longer needed since it has been copied into a re-ordered one (self.data); hard code to free the memory of this dataframe to avoid three big dataframes in memory (data, self.data, self.data_arr)

        kwargs = {"object": self.data}
        if dtype is not None:
@@ -332,7 +399,9 @@ class TSDataSampler:
        # - append last line with full NaN for better performance in `__getitem__`
        # - Keep the same dtype will result in a better performance
        self.data_arr = np.append(
-            self.data_arr, np.full((1, self.data_arr.shape[1]), np.nan, dtype=self.data_arr.dtype), axis=0
+            self.data_arr,
+            np.full((1, self.data_arr.shape[1]), np.nan, dtype=self.data_arr.dtype),
+            axis=0,
        )
        self.nan_idx = -1  # The last line is all NaN

@@ -347,19 +416,36 @@ class TSDataSampler:
                flt_data = flt_data.iloc[:, 0]
            # NOTE: bool(np.nan) is True !!!!!!!!
            # make sure reindex comes first. Otherwise extra NaN may appear.
+            flt_data = flt_data.swaplevel()
            flt_data = flt_data.reindex(self.data_index).fillna(False).astype(np.bool)
            self.flt_data = flt_data.values
            self.idx_map = self.flt_idx_map(self.flt_data, self.idx_map)
            self.data_index = self.data_index[np.where(self.flt_data)[0]]
        self.idx_map = self.idx_map2arr(self.idx_map)
-
-        self.start_idx, self.end_idx = self.data_index.slice_locs(
-            start=time_to_slc_point(start), end=time_to_slc_point(end)
+        self.idx_map, self.data_index = self.slice_idx_map_and_data_index(
+            self.idx_map, self.idx_df, self.data_index, start, end
        )
-        self.idx_arr = np.array(self.idx_df.values, dtype=np.float64)  # for better performance

+        self.idx_arr = np.array(self.idx_df.values, dtype=np.float64)  # for better performance
        del self.data  # save memory

+    @staticmethod
+    def slice_idx_map_and_data_index(
+        idx_map,
+        idx_df,
+        data_index,
+        start,
+        end,
+    ):
+        assert (
+            len(idx_map) == data_index.shape[0]
+        )  # make sure idx_map and data_index have the same length so that indices of idx_map can be used on data_index
+
+        start_row_idx, end_row_idx = idx_df.index.slice_locs(start=time_to_slc_point(start), end=time_to_slc_point(end))
+
+        time_flter_idx = (idx_map[:, 0] < end_row_idx) & (idx_map[:, 0] >= start_row_idx)
+        return idx_map[time_flter_idx], data_index[time_flter_idx]
+
    @staticmethod
    def idx_map2arr(idx_map):
        # pytorch data sampler will have better memory control without large dict or list
@@ -394,7 +480,7 @@ class TSDataSampler:
        Get the pandas index of the data, it will be useful in following scenarios
        - Special sampler will be used (e.g. user want to sample day by day)
        """
-        return self.data_index[self.start_idx : self.end_idx]
+        return self.data_index.swaplevel()  # align with the multi-index order of the original data received by __init__

    def config(self, **kwargs):
        # Config the attributes
@@ -409,25 +495,33 @@ class TSDataSampler:
        Parameters
        ----------
        data : pd.DataFrame
-            The dataframe with <datetime, DataFrame>
+            A DataFrame with index in order <instrument, datetime>
+
+                                      RSQR5     RESI5     WVMA5    LABEL0
+            instrument datetime
+            SH600000   2017-01-03  0.016389  0.461632 -1.154788 -0.048056
+                       2017-01-04  0.884545 -0.110597 -1.059332 -0.030139
+                       2017-01-05  0.507540 -0.535493 -1.099665 -0.644983
+                       2017-01-06 -1.267771 -0.669685 -1.636733  0.295366
+                       2017-01-09  0.339346  0.074317 -0.984989  0.765540

        Returns
        -------
        Tuple[pd.DataFrame, dict]:
            1) the first element:  reshape the original index into a <datetime(row), instrument(column)> 2D dataframe
-                instrument SH600000 SH600004 SH600006 SH600007 SH600008 SH600009  ...
+                instrument SH600000 SH600008 SH600009 SH600010 SH600011 SH600015  ...
                datetime
-                2021-01-11        0        1        2        3        4        5  ...
-                2021-01-12     4146     4147     4148     4149     4150     4151  ...
-                2021-01-13     8293     8294     8295     8296     8297     8298  ...
-                2021-01-14    12441    12442    12443    12444    12445    12446  ...
+                2017-01-03        0      242      473      717      NaN      974  ...
+                2017-01-04        1      243      474      718      NaN      975  ...
+                2017-01-05        2      244      475      719      NaN      976  ...
+                2017-01-06        3      245      476      720      NaN      977  ...
            2) the second element:  {<original index>: <row, col>}
        """
        # object incase of pandas converting int to float
        idx_df = pd.Series(range(data.shape[0]), index=data.index, dtype=object)
        idx_df = lazy_sort_index(idx_df.unstack())
        # NOTE: the correctness of `__getitem__` depends on columns sorted here
-        idx_df = lazy_sort_index(idx_df, axis=1)
+        idx_df = lazy_sort_index(idx_df, axis=1).T

        idx_map = {}
        for i, (_, row) in enumerate(idx_df.iterrows()):
@@ -485,11 +579,11 @@ class TSDataSampler:
        """
        # The the right row number `i` and col number `j` in idx_df
        if isinstance(idx, (int, np.integer)):
-            real_idx = self.start_idx + idx
-            if self.start_idx <= real_idx < self.end_idx:
+            real_idx = idx
+            if 0 <= real_idx < len(self.idx_map):
                i, j = self.idx_map[real_idx]  # TODO: The performance of this line is not good
            else:
-                raise KeyError(f"{real_idx} is out of [{self.start_idx}, {self.end_idx})")
+                raise KeyError(f"{real_idx} is out of [0, {len(self.idx_map)})")
        elif isinstance(idx, tuple):
            # <TSDataSampler object>["datetime", "instruments"]
            date, inst = idx
@@ -532,7 +626,10 @@ class TSDataSampler:
        # precision problems. It will not cause any problems in my tests at least
        indices = np.nan_to_num(indices.astype(np.float64), nan=self.nan_idx).astype(int)

-        data = self.data_arr[indices]
+        if (np.diff(indices) == 1).all():  # slicing instead of indexing for speeding up.
+            data = self.data_arr[indices[0] : indices[-1] + 1]
+        else:
+            data = self.data_arr[indices]
        if isinstance(idx, mtit):
            # if we get multiple indexes, addition dimension should be added.
            # <sample_idx, step_idx, feature_idx>
@@ -540,7 +637,7 @@ class TSDataSampler:
        return data

    def __len__(self):
-        return self.end_idx - self.start_idx
+        return len(self.idx_map)


 class TSDatasetH(DatasetH):
@@ -611,8 +708,15 @@ class TSDatasetH(DatasetH):
        else:
            flt_data = None

-        tsds = TSDataSampler(data=data, start=start, end=end, step_len=self.step_len, dtype=dtype, flt_data=flt_data)
+        tsds = TSDataSampler(
+            data=data,
+            start=start,
+            end=end,
+            step_len=self.step_len,
+            dtype=dtype,
+            flt_data=flt_data,
+        )
        return tsds


-__all__ = ["Optional"]
+__all__ = ["Optional", "Dataset", "DatasetH"]
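The index mechanism described in the new TSDataSampler docstring can be sketched in isolation as below. This is a minimal illustration, not Qlib's implementation: the toy data, the `padded` array and the `get_window` helper are hypothetical names introduced here, and the `idx_map` construction uses `np.nonzero` rather than the dict-based loop in the diff. It shows the lookup chain (indexable sample index) -> `idx_map` -> (row, col) -> `idx_df` -> (positions in `data_arr`), the time filter on `idx_map`, and the slice-instead-of-index shortcut.

```python
import numpy as np
import pandas as pd

# Toy data in <instrument, datetime> order, mirroring the layout of data_arr.
index = pd.MultiIndex.from_tuples(
    [
        ("SH600000", "2017-01-03"),
        ("SH600000", "2017-01-04"),
        ("SH600000", "2017-01-05"),
        ("SH600008", "2017-01-04"),
        ("SH600008", "2017-01-05"),
    ],
    names=["instrument", "datetime"],
)
data_arr = np.arange(10, dtype=np.float64).reshape(5, 2)
# Append an all-NaN last row so that "missing" can be addressed as position -1.
padded = np.append(data_arr, np.full((1, data_arr.shape[1]), np.nan), axis=0)

# idx_df: <datetime (row), instrument (column)> -> position in data_arr (NaN if missing).
idx_df = pd.Series(np.arange(len(index), dtype=float), index=index).unstack("instrument")
idx_df = idx_df.sort_index().sort_index(axis=1)
idx_arr = idx_df.to_numpy(dtype=np.float64)

# idx_map: one <row, col> pair per existing sample, listed instrument by instrument.
inst_pos, date_pos = np.nonzero(~np.isnan(idx_arr.T))
idx_map = np.stack([date_pos, inst_pos], axis=1)

# Keep only samples whose row falls inside [start, end]; history before `start` stays in data_arr.
start_row, end_row = idx_df.index.slice_locs(start="2017-01-04", end="2017-01-05")
idx_map = idx_map[(idx_map[:, 0] >= start_row) & (idx_map[:, 0] < end_row)]


def get_window(i, step_len):
    """Return the last `step_len` rows of the time series ending at the i-th indexable sample."""
    row, col = idx_map[i]
    positions = idx_arr[max(row - step_len + 1, 0) : row + 1, col]
    # Left-pad with NaN when there is not enough history.
    positions = np.concatenate([np.full(step_len - len(positions), np.nan), positions])
    positions = np.nan_to_num(positions, nan=-1).astype(int)
    # Contiguous positions can be sliced, which avoids fancy indexing.
    if positions[0] >= 0 and (np.diff(positions) == 1).all():
        return padded[positions[0] : positions[-1] + 1]
    return padded[positions]
```

Here `get_window(0, 2)` takes the slicing path because SH600000 has consecutive positions, while `get_window(2, 2)` falls back to fancy indexing because SH600008 has no row for 2017-01-03 and the first step is read from the all-NaN row. The extra `positions[0] >= 0` guard is a choice made for this sketch; the diff itself only checks `np.diff(indices) == 1`.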
--- a/qlib/data/dataset/handler.py
+++ b/qlib/data/dataset/handler.py
@@ -35,7 +35,7 @@ class DataHandler(Serializable):
    Example of the data:
    The multi-index of the columns is optional.

-    .. code-block:: python
+    .. code-block:: text

                                feature                                                            label
                                $close     $volume  Ref($close, 1)  Mean($close, 3)  $high-$low  LABEL0
@@ -137,7 +137,7 @@ class DataHandler(Serializable):
        # Setup data.
        # _data may be with multiple column index level. The outer level indicates the feature set name
        with TimeInspector.logt("Loading data"):
-            # make sure the fetch method is based on a index-sorted pd.DataFrame
+            # make sure the fetch method is based on an index-sorted pd.DataFrame
            self._data = lazy_sort_index(self.data_loader.load(self.instruments, self.start_time, self.end_time))
        # TODO: cache

@@ -160,13 +160,17 @@ class DataHandler(Serializable):
        selector : Union[pd.Timestamp, slice, str]
            describe how to select data by index
            It can be categories as following
+
            - fetch single index
            - fetch a range of index
+
                - a slice range
                - pd.Index for specific indexes

-            Following conflictions may occurs
-            - Does [20200101", "20210101"] mean selecting this slice or these two days?
+            Following conflicts may occur
+
+            - Does ["20200101", "20210101"] mean selecting this slice or these two days?
+
                - slice have higher priorities

        level : Union[str, int]
@@ -178,7 +182,8 @@ class DataHandler(Serializable):

                select a set of meaningful, pd.Index columns.(e.g. features, columns)

-                if col_set == CS_RAW:
+                - if col_set == CS_RAW:
+
                    the raw dataset will be returned.

            - if isinstance(col_set, List[str]):
@@ -186,8 +191,10 @@ class DataHandler(Serializable):
                select several sets of meaningful columns, the returned data has multiple levels

        proc_func: Callable
+
            - Give a hook for processing data before fetching
            - An example to explain the necessity of the hook:
+
                - A Dataset learned some processors to process data which is related to data segmentation
                - It will apply them every time when preparing data.
                - The learned processor require the dataframe remains the same format when fitting and applying
@@ -222,7 +229,7 @@ class DataHandler(Serializable):
        # This method is extracted for sharing in subclasses
        from .storage import BaseHandlerStorage  # pylint: disable=C0415

-        # Following conflictions may occurs
+        # Following conflicts may occur
        # - Does [20200101", "20210101"] mean selecting this slice or these two days?
        # To solve this issue
        #   - slice have higher priorities (except when level is none)
@@ -306,7 +313,7 @@ class DataHandler(Serializable):
        self, periods: int, min_periods: Optional[int] = None, **kwargs
    ) -> Iterator[Tuple[pd.Timestamp, pd.DataFrame]]:
        """
-        get a iterator of sliced data with given periods
+        get an iterator of sliced data with given periods

        Args:
            periods (int): number of periods.
@@ -326,18 +333,23 @@ class DataHandlerLP(DataHandler):
    DataHandler with **(L)earnable (P)rocessor**

    This handler will produce three pieces of data in pd.DataFrame format.
+
    - DK_R / self._data: the raw data loaded from the loader
    - DK_I / self._infer: the data processed for inference
    - DK_L / self._learn: the data processed for learning model.

    The motivation of using different processor workflows for learning and inference
    Here are some examples.
+
    - The instrument universe for learning and inference may be different.
    - The processing of some samples may rely on label (for example, some samples hit the limit may need extra processing or be dropped).
-        These processors only apply to the learning phase.
+
+        - These processors only apply to the learning phase.

    Tips to improve the performance of data handler
+
    - To reduce the memory cost
+
        - `drop_raw=True`: this will modify the data inplace on raw data;
    """

@@ -400,13 +412,13 @@ class DataHandlerLP(DataHandler):
        process_type: str
            PTYPE_I = 'independent'

-            - self._infer will processed by infer_processors
+            - self._infer will be processed by infer_processors

            - self._learn will be processed by learn_processors

            PTYPE_A = 'append'

-            - self._infer will processed by infer_processors
+            - self._infer will be processed by infer_processors

            - self._learn will be processed by infer_processors + learn_processors

@@ -482,12 +494,18 @@ class DataHandlerLP(DataHandler):
        Notation: (data)  [processor]

        # data processing flow of self.process_type == DataHandlerLP.PTYPE_I
-        (self._data)-[shared_processors]-(_shared_df)-[learn_processors]-(_learn_df)
-                                               \
-                                                -[infer_processors]-(_infer_df)
+
+        .. code-block:: text
+
+            (self._data)-[shared_processors]-(_shared_df)-[learn_processors]-(_learn_df)
+                                                   \\
+                                                    -[infer_processors]-(_infer_df)

        # data processing flow of self.process_type == DataHandlerLP.PTYPE_A
-        (self._data)-[shared_processors]-(_shared_df)-[infer_processors]-(_infer_df)-[learn_processors]-(_learn_df)
+
+        .. code-block:: text
+
+            (self._data)-[shared_processors]-(_shared_df)-[infer_processors]-(_infer_df)-[learn_processors]-(_learn_df)

        Parameters
        ----------
@@ -653,7 +671,9 @@ class DataHandlerLP(DataHandler):
    def cast(cls, handler: "DataHandlerLP") -> "DataHandlerLP":
        """
        Motivation
-        - A user create a datahandler in his customized package. Then he want to share the processed handler to other users without introduce the package dependency and complicated data processing logic.
+
+        - A user creates a datahandler in his customized package. Then he wants to share the processed handler with
+          other users without introducing the package dependency and complicated data processing logic.
        - This class make it possible by casting the class to DataHandlerLP and only keep the processed data

        Parameters
@@ -667,7 +687,7 @@ class DataHandlerLP(DataHandler):
            the converted processed data
        """
        new_hd: DataHandlerLP = object.__new__(DataHandlerLP)
-        new_hd.from_cast = True  # add a mark for the casted instance
+        new_hd.from_cast = True  # add a mark for the cast instance

        for key in list(DataHandlerLP.ATTR_MAP.values()) + [
            "instruments",
--- a/qlib/data/dataset/loader.py
+++ b/qlib/data/dataset/loader.py
@@ -27,7 +27,7 @@ class DataLoader(abc.ABC):

        Example of the data (The multi-index of the columns is optional.):

-            .. code-block:: python
+            .. code-block:: text

                                        feature                                                             label
                                        $close     $volume     Ref($close, 1)  Mean($close, 3)  $high-$low  LABEL0
@@ -278,7 +278,9 @@ class DataLoaderDH(DataLoader):
    - If you just want to load data from single datahandler, you can write them in single data handler

    TODO: What make this module not that easy to use.
+
    - For online scenario
+
        - The underlayer data handler should be configured. But data loader doesn't provide such interface & hook.
    """

--- a/qlib/data/dataset/processor.py
+++ b/qlib/data/dataset/processor.py
@@ -211,16 +211,19 @@ class MinMaxNorm(Processor):
        self.min_val = np.nanmin(df[cols].values, axis=0)
        self.max_val = np.nanmax(df[cols].values, axis=0)
        self.ignore = self.min_val == self.max_val
+        # To improve the speed, we set `min_val` to `0` and `max_val` to `1` for the columns that do not need to be
+        # processed (those where `min_val == max_val`). With the uniform calculation
+        # `(x - min_val) / (max_val - min_val)`, such columns are computed as `(x - 0) / (1 - 0)`,
+        # so the columns that do not need to be processed are not affected.
+        for _i, _con in enumerate(self.ignore):
+            if _con:
+                self.min_val[_i] = 0
+                self.max_val[_i] = 1
        self.cols = cols

    def __call__(self, df):
-        def normalize(x, min_val=self.min_val, max_val=self.max_val, ignore=self.ignore):
-            if (~ignore).all():
-                return (x - min_val) / (max_val - min_val)
-            for i in range(ignore.size):
-                if not ignore[i]:
-                    x[i] = (x[i] - min_val) / (max_val - min_val)
-            return x
+        def normalize(x, min_val=self.min_val, max_val=self.max_val):
+            return (x - min_val) / (max_val - min_val)

        df.loc(axis=1)[self.cols] = normalize(df[self.cols].values)
        return df
@@ -242,16 +245,19 @@ class ZScoreNorm(Processor):
        self.mean_train = np.nanmean(df[cols].values, axis=0)
        self.std_train = np.nanstd(df[cols].values, axis=0)
        self.ignore = self.std_train == 0
+        # To improve the speed, we set `std_train` to `1` and `mean_train` to `0` for the columns that do not need
+        # to be processed (those where `std_train == 0`). With the uniform calculation
+        # `(x - mean_train) / std_train`, such columns are computed as `(x - 0) / 1`,
+        # so the columns that do not need to be processed are not affected.
+        for _i, _con in enumerate(self.ignore):
+            if _con:
+                self.std_train[_i] = 1
+                self.mean_train[_i] = 0
        self.cols = cols

    def __call__(self, df):
-        def normalize(x, mean_train=self.mean_train, std_train=self.std_train, ignore=self.ignore):
-            if (~ignore).all():
-                return (x - mean_train) / std_train
-            for i in range(ignore.size):
-                if not ignore[i]:
-                    x[i] = (x[i] - mean_train) / std_train
-            return x
+        def normalize(x, mean_train=self.mean_train, std_train=self.std_train):
+            return (x - mean_train) / std_train

        df.loc(axis=1)[self.cols] = normalize(df[self.cols].values)
        return df
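The two normalization hunks above share one idea: neutralize the fitted statistics for constant columns so that a single vectorized expression handles every column. A minimal NumPy sketch of that idea follows. The function names `fit_min_max` and `apply_min_max` are hypothetical, and `np.where` is used here in place of the `for` loop in the diff; the effect on the fitted values should be the same.

```python
import numpy as np


def fit_min_max(values):
    """Fit per-column min/max, neutralizing constant columns."""
    min_val = np.nanmin(values, axis=0)
    max_val = np.nanmax(values, axis=0)
    ignore = min_val == max_val  # constant columns would divide by zero
    # For ignored columns use min=0, max=1 so that (x - 0) / (1 - 0) == x.
    min_val = np.where(ignore, 0.0, min_val)
    max_val = np.where(ignore, 1.0, max_val)
    return min_val, max_val


def apply_min_max(values, min_val, max_val):
    """One vectorized expression for all columns, with no per-column branching."""
    return (values - min_val) / (max_val - min_val)


x = np.array([[1.0, 5.0], [3.0, 5.0], [2.0, 5.0]])
min_val, max_val = fit_min_max(x)
scaled = apply_min_max(x, min_val, max_val)
```

In this example the first column is rescaled to [0, 1] and the constant second column passes through unchanged.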
@@ -289,9 +295,9 @@ class RobustZScoreNorm(Processor):
        X = df[self.cols]
        X -= self.mean_train
        X /= self.std_train
-        df[self.cols] = X
        if self.clip_outlier:
-            df.clip(-3, 3, inplace=True)
+            X = np.clip(X, -3, 3)
+        df[self.cols] = X
        return df


@@ -313,7 +319,7 @@ class CSZScoreNorm(Processor):
            self.fields_group = [self.fields_group]
        for g in self.fields_group:
            cols = get_group_columns(df, g)
-            df[cols] = df[cols].groupby("datetime").apply(self.zscore_func)
+            df[cols] = df[cols].groupby("datetime", group_keys=False).apply(self.zscore_func)
        return df


@@ -361,7 +367,7 @@ class CSZFillna(Processor):

    def __call__(self, df):
        cols = get_group_columns(df, self.fields_group)
-        df[cols] = df[cols].groupby("datetime").apply(lambda x: x.fillna(x.mean()))
+        df[cols] = df[cols].groupby("datetime", group_keys=False).apply(lambda x: x.fillna(x.mean()))
        return df
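The `group_keys=False` change in CSZScoreNorm and CSZFillna matters because the result is assigned back into `df[cols]`, which requires the result to carry the same index as the input. The sketch below uses toy data and is not taken from Qlib. It assumes a pandas version where `groupby(..., group_keys=True).apply(...)` may prepend the group key as an extra index level for like-indexed results; with `group_keys=False` the original index is expected to be preserved.

```python
import numpy as np
import pandas as pd

# Toy frame indexed by <datetime, instrument>, as in the handlers above.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-01-03", "2017-01-04"]), ["SH600000", "SH600008"]],
    names=["datetime", "instrument"],
)
df = pd.DataFrame({"feat": [1.0, np.nan, 2.0, 4.0]}, index=index)

# Fill missing values with the cross-sectional mean of each datetime.
filled = df.groupby("datetime", group_keys=False).apply(lambda x: x.fillna(x.mean()))

# Because the index is unchanged, the result can be assigned back column-wise.
df["feat"] = filled["feat"]
```

The missing value on 2017-01-03 is filled with the mean of the other instrument on that date.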


--- a/qlib/data/dataset/storage.py
+++ b/qlib/data/dataset/storage.py
@@ -8,7 +8,8 @@ from .utils import get_level_index, fetch_df_by_index, fetch_df_by_col


 class BaseHandlerStorage:
-    """Base data storage for datahandler
+    """
+    Base data storage for datahandler
    - pd.DataFrame is the default data storage format in Qlib datahandler
    - If users want to use custom data storage, they should define subclass inherited BaseHandlerStorage, and implement the following method
    """
--- a/qlib/data/filter.py
+++ b/qlib/data/filter.py
@@ -272,8 +272,8 @@ class NameDFilter(SeriesDFilter):
    def __init__(self, name_rule_re, fstart_time=None, fend_time=None):
        """Init function for name filter class

-        params:
-        ------
+        Parameters
+        ----------
        name_rule_re: str
            regular expression for the name rule.
        """
@@ -325,8 +325,8 @@ class ExpressionDFilter(SeriesDFilter):
    def __init__(self, rule_expression, fstart_time=None, fend_time=None, keep=False):
        """Init function for expression filter class

-        params:
-        ------
+        Parameters
+        ----------
        fstart_time: str
            filter the feature starting from this time.
        fend_time: str