mirror of https://github.com/microsoft/qlib.git synced 2026-07-03 11:00:57 +08:00

release-0.5.0 (#1)

* init commit

* change the version number

* rich the docs&fix cache docs

* update index readme

* Modify cache class name

* Modify sharpe to information_ratio

* Modify Group- to Group

* add the description of graphical results & fix the backtest docs

* fix docs in details

* update docs

* Update introduction.rst

* Update README.md

* Update introduction.rst

* Update introduction.rst

* Update introduction.rst

* Update installation.rst

* Update installation.rst

* Update initialization.rst

* Update getdata.rst

* Update integration.rst

* Update initialization.rst

* Update getdata.rst

* Update estimator.rst

Modify some typos.

* Update README.md

Modify the typos.

* Update initialization.rst

* Update data.rst

* Update report.rst

* Update estimator.rst

* Update cumulative_return.py

* Update model.rst

* Update rank_label.py

* Update cumulative_return.py

* Update strategy.rst

* Update getdata.rst

* Update backtest.rst

* Update integration.rst

* Update getdata.rst

* Update introduction.rst

* Update introduction.rst

* Update README.md

* Update report.rst

* Update integration.rst

Fix typos

* Update installation.rst

Fix typos

* Update getdata.rst

* Update initialization.rst

Fix typos.

* add quick start docs&fix detials

* fix estimator docs & fix strategy docs

* fix the cahce in data.rst

* update documents

* Fix Corr && Rsquare

* fix data retrival example to csi300 & fix a data bug

* fix filter bug

* Fix data collector

* Modift model args

* add the log & fix README.md\quick.rst

* add enviroment depend & add intoduction of qlib-server online mode

* fix image center fomat & set log_only of docs is True

* fix README.md format

* update data preparation & readme logo image

* get_data support version

* Modify analysis names

* Modify analysis graph

* update report.rst & data.rst

* commmit estimator for merge

* minimal requirements

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update READEME.md

* Update READEME.md

* update estimator

* Fix doc urls

* fix get_data.py docstring

* update test_get_data.py

* Upate docs

* Upate docs

* Upate docs

Co-authored-by: bxdd <bxddream@gmail.com>
Co-authored-by: zhupr <zhu.pengrong@foxmail.com>
Co-authored-by: Wendi Li <wendili.academic@qq.com>
Co-authored-by: Dingsu Wang <dingsu.wang@gmail.com>
Co-authored-by: bxdd <45119470+bxdd@users.noreply.github.com>
Co-authored-by: cslwqxx <cslwqxx@users.noreply.github.com>
you-n-g
2020-09-23 23:01:39 -05:00
committed by GitHub
parent 99ebd87cba
commit de9e13b171
82 changed files with 1580 additions and 1145 deletions


@@ -7,7 +7,7 @@ Intraday Trading: Model&Strategy Testing
Introduction
===================
``Intraday Trading`` is designed to test models and strategies, which help users to check the performance of custom model/strategy.
``Intraday Trading`` is designed to test models and strategies, which help users to check the performance of a custom model/strategy.
.. note::
@@ -19,11 +19,11 @@ Introduction
Example
===========================
Users need to generate a prediction score(a pandas DataFrame) with MultiIndex<instrument, datetime> and a `score` column. And users need to assign a strategy used in backtest, if strategy is not assigned,
Users need to generate a `prediction score` (a pandas DataFrame) with MultiIndex<instrument, datetime> and a `score` column. Users also need to assign a strategy for the backtest; if no strategy is assigned,
a `TopkDropoutStrategy` strategy with `(topk=50, n_drop=5, risk_degree=0.95, limit_threshold=0.0095)` will be used.
If ``Strategy`` module is not user's interested part, `TopkDropoutStrategy` is enough.
If the ``Strategy`` module is not the part users are interested in, `TopkDropoutStrategy` is enough.
The simple example with default strategy is as follows.
The simple example of the default strategy is as follows.
.. code-block:: python
@@ -31,14 +31,14 @@ The simple example with default strategy is as follows.
# pred_score is the prediction score
report, positions = backtest(pred_score, topk=50, n_drop=5, verbose=False, limit_threshold=0.0095)
To know more about backtesting with specific strategy, please refer to `Strategy <strategy.html>`_.
To know more about backtesting with a specific strategy, please refer to `Strategy <strategy.html>`_.
To know more about the prediction score `pred_score` output by ``Model``, please refer to `Interday Model: Model Training & Prediction <model.html>`_.
Prediction Score
-----------------
The prediction score is a pandas DataFrame. Its index is <instrument(str), datetime(pd.Timestamp)> and it must
The `prediction score` is a pandas DataFrame. Its index is <instrument(str), datetime(pd.Timestamp)> and it must
contain a `score` column.
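As a minimal sketch of the shape described above, assuming only pandas, a frame with an <instrument, datetime> MultiIndex and a `score` column could be built as follows. The instrument codes, dates, and score values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical instruments, dates, and scores, chosen only for illustration.
index = pd.MultiIndex.from_product(
    [["SH600000", "SH600004"], pd.to_datetime(["2020-01-02", "2020-01-03"])],
    names=["instrument", "datetime"],
)
pred_score = pd.DataFrame({"score": [0.12, -0.05, 0.31, 0.08]}, index=index)

# Basic shape checks matching the description above.
assert list(pred_score.index.names) == ["instrument", "datetime"]
assert "score" in pred_score.columns
```

A frame built this way only demonstrates the expected layout; real scores would come from a trained model.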
A prediction sample is shown as follows.
@@ -67,37 +67,44 @@ The backtest results are in the following form:
.. code-block:: python
sub_bench mean 0.000662
std 0.004487
annual 0.166720
sharpe 2.340526
mdd -0.080516
sub_cost mean 0.000577
std 0.004482
annual 0.145392
sharpe 2.043494
mdd -0.083584
risk
excess_return_without_cost mean 0.000605
std 0.005481
annualized_return 0.152373
information_ratio 1.751319
max_drawdown -0.059055
excess_return_with_cost mean 0.000410
std 0.005478
annualized_return 0.103265
information_ratio 1.187411
max_drawdown -0.075024
- `sub_bench`
Returns of the portfolio without deduction of fees
- `sub_cost`
Returns of the portfolio with deduction of fees
- `mean`
Mean value of the returns sequence(difference sequence of assets).
- `excess_return_without_cost`
- `mean`
Mean value of the `CAR` (cumulative abnormal return) without cost
- `std`
The `Standard Deviation` of `CAR` (cumulative abnormal return) without cost.
- `annualized_return`
The `Annualized Rate` of `CAR` (cumulative abnormal return) without cost.
- `information_ratio`
The `Information Ratio` without cost. Please refer to `Information Ratio IR <https://www.investopedia.com/terms/i/informationratio.asp>`_.
- `max_drawdown`
The `Maximum Drawdown` of `CAR` (cumulative abnormal return) without cost, please refer to `Maximum Drawdown (MDD) <https://www.investopedia.com/terms/m/maximum-drawdown-mdd.asp>`_.
- `std`
Standard deviation of the returns sequence(difference sequence of assets).
- `excess_return_with_cost`
- `mean`
Mean value of the `CAR` (cumulative abnormal return) series with cost
- `std`
The `Standard Deviation` of `CAR` (cumulative abnormal return) series with cost.
- `annualized_return`
The `Annualized Rate` of `CAR` (cumulative abnormal return) with cost.
- `information_ratio`
The `Information Ratio` with cost. Please refer to `Information Ratio IR <https://www.investopedia.com/terms/i/informationratio.asp>`_.
- `max_drawdown`
The `Maximum Drawdown` of `CAR` (cumulative abnormal return) with cost, please refer to `Maximum Drawdown (MDD) <https://www.investopedia.com/terms/m/maximum-drawdown-mdd.asp>`_.
- `annual`
Average annualized returns of the portfolio.
- `ir`
Information Ratio, please refer to `Information Ratio IR <https://www.investopedia.com/terms/i/informationratio.asp>`_.
- `mdd`
Maximum Drawdown, please refer to `Maximum Drawdown (MDD) <https://www.investopedia.com/terms/m/maximum-drawdown-mdd.asp>`_.
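To make the indicators above concrete, the following sketch computes them from a per-period excess-return series. It is an illustration under stated assumptions, not necessarily how ``Qlib`` computes its report: returns are summed rather than compounded, a year is assumed to have 252 trading periods, and the function name `risk_summary` is hypothetical.

```python
import numpy as np
import pandas as pd


def risk_summary(excess_return: pd.Series, periods_per_year: int = 252) -> pd.Series:
    """Summarize a per-period excess-return series.

    Assumptions (not confirmed by this document): returns are summed rather
    than compounded, and 252 trading periods make up a year.
    """
    mean = excess_return.mean()
    std = excess_return.std()  # sample standard deviation (ddof=1)
    cumulative = excess_return.cumsum()
    # Drawdown: distance of the cumulative curve below its running maximum.
    max_drawdown = (cumulative - cumulative.cummax()).min()
    return pd.Series(
        {
            "mean": mean,
            "std": std,
            "annualized_return": mean * periods_per_year,
            "information_ratio": mean / std * np.sqrt(periods_per_year),
            "max_drawdown": max_drawdown,
        }
    )


# Hypothetical daily excess returns.
daily_excess = pd.Series([0.01, -0.02, 0.005, 0.015, 0.0])
summary = risk_summary(daily_excess)
```

Feeding in the series with cost and the series without cost separately would give the two groups of rows shown in the report.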
Reference


@@ -6,79 +6,106 @@ Data Layer: Data Framework&Usage
Introduction
============================
``Data Layer`` is designed to download raw data, retrieve data, construct datasets and get frequently-used data.
``Data Layer`` provides user-friendly APIs to manage and retrieve data. It provides high-performance data infrastructure.
Also, users can building formulaic alphas with ``Data Layer`` easliy. If users are interesting formulaic alphas, please refer to `Building Formulaic Alphas <../advanced/alpha.html>`_.
It is designed for quantitative investment. For example, users could build formulaic alphas with ``Data Layer`` easily. Please refer to `Building Formulaic Alphas <../advanced/alpha.html>`_ for more details.
The ``Data Layer`` framework includes four components as follows.
The introduction of ``Data Layer`` includes the following parts.
- Raw Data
- Data Preparation
- Data API
- Data Handler
- Cache
- Data and Cache File Structure
Raw Data
Data Preparation
============================
``Qlib`` provides the script ``scripts/get_data.py`` to download the raw data that will be used to initialize the qlib package, please refer to `Initialization <../start/initialization.rst>`_.
Qlib Format Data
------------------
When ``Qlib`` is initialized, users can choose china-stock mode or US-stock mode, please refer to `Initialization <../start/initialization.rst>`_.
We've specially designed a data structure to manage financial data, please refer to the `File storage design section in Qlib paper <https://arxiv.org/abs/2009.11189>`_ for detailed information.
Such data is stored with the filename suffix `.bin` (we call these `.bin` files, `.bin` format, or Qlib format). The `.bin` file is designed for scientific computing on financial data.
China-Stock Market Mode
Qlib Format Dataset
--------------------
``Qlib`` has provided an off-the-shelf dataset in `.bin` format, users could use the script ``scripts/get_data.py`` to download the dataset as follows.
.. code-block:: bash
python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
After running the above command, users can find china-stock data in Qlib format in the ``~/.qlib/qlib_data/cn_data`` directory.
``Qlib`` also provides the scripts in ``scripts/data_collector`` to help users crawl the latest data on the Internet and convert it to qlib format.
When ``Qlib`` is initialized with this dataset, users could build and evaluate their own models with it. Please refer to `Initialization <../start/initialization.html>`_ for more details.
Converting CSV Format into Qlib Format
-------------------------------------------
``Qlib`` has provided the script ``scripts/dump_bin.py`` to convert data in CSV format into `.bin` files(Qlib format).
Users can download the china-stock data in CSV format as follows for reference to the CSV format.
.. code-block:: bash
python scripts/get_data.py csv_data_cn --target_dir ~/.qlib/csv_data/cn_data
Suppose that users have prepared their CSV format data in the directory ``~/.qlib/csv_data/my_data``; they can run the following command to start the conversion.
.. code-block:: bash
python scripts/dump_bin.py dump --csv_path ~/.qlib/csv_data/my_data --qlib_dir ~/.qlib/qlib_data/my_data --include_fields open,close,high,low,volume,factor
After conversion, users can find their Qlib format data in the directory `~/.qlib/qlib_data/my_data`.
.. note::
The arguments of `--include_fields` should correspond to the column names of the CSV files. The column names of the dataset provided by ``Qlib`` include open, close, high, low, volume, and factor.
- `open`
The opening price
- `close`
The closing price
- `high`
The highest price
- `low`
The lowest price
- `volume`
The trading volume
- `factor`
The Restoration factor
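As a rough sketch of the correspondence between CSV columns and `--include_fields`, the snippet below writes a small CSV in memory and checks that every listed field appears in its header. The `date` column name and all values are assumptions made for illustration; the reference CSV downloaded by ``scripts/get_data.py`` remains the authority on the exact layout.

```python
import io

import pandas as pd

# Column names taken from the --include_fields list above.
FIELDS = ["open", "close", "high", "low", "volume", "factor"]

# Hypothetical rows for one instrument; the "date" column name and the
# values are assumptions made for this sketch only.
rows = pd.DataFrame(
    {
        "date": ["2020-01-02", "2020-01-03"],
        "open": [10.0, 10.2],
        "close": [10.1, 10.3],
        "high": [10.3, 10.4],
        "low": [9.9, 10.1],
        "volume": [120000, 98000],
        "factor": [1.0, 1.0],
    }
)

buffer = io.StringIO()
rows.to_csv(buffer, index=False)
header = buffer.getvalue().splitlines()[0].split(",")

# Every field passed to --include_fields should appear as a CSV column.
missing = [field for field in FIELDS if field not in header]
assert not missing, f"missing columns: {missing}"
```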
China-Stock Mode & US-Stock Mode
--------------------------------
If users use ``Qlib`` in china-stock mode, china-stock data is required. The script ``scripts/get_data.py`` can be used to download china-stock data. If users want to use ``Qlib`` in china-stock mode, they need to do as follows.
- If users use ``Qlib`` in china-stock mode, china-stock data is required. Users can use ``Qlib`` in china-stock mode according to the following steps:
- Download china-stock data in Qlib format, please refer to section `Qlib Format Dataset <#qlib-format-dataset>`_.
- Initialize ``Qlib`` in china-stock mode
Suppose that users have downloaded their Qlib format data into the directory ``~/.qlib/qlib_data/cn_data``. Users only need to initialize ``Qlib`` as follows.
.. code-block:: python
- Download data in qlib format
Run the following command to download china-stock data in csv format.
.. code-block:: bash
python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
Users can find china-stock data in qlib format in the'~/.qlib/csv_data/cn_data' directory.
- Initialize ``Qlib`` in china-stock mode
Users only need to initialize ``Qlib`` as follows.
.. code-block:: python
from qlib.config import REG_CN
qlib.init(provider_uri='~/.qlib/qlib_data/cn_data', region=REG_CN)
from qlib.config import REG_CN
qlib.init(provider_uri='~/.qlib/qlib_data/cn_data', region=REG_CN)
US-Stock Market Mode
-------------------------
If users use ``Qlib`` in US-stock mode, US-stock data is required. ``Qlib`` does not provide script to download US-stock data. If users want to use ``Qlib`` in US-stock market mode, they need to do as follows.
- Prepare data in csv format
Users need to prepare US-stock data in csv format by themselves, which is in the same format as the china-stock data in csv format. Please download the china-stock data in csv format as follows for reference of format.
.. code-block:: bash
python scripts/get_data.py csv_data_cn --target_dir ~/.qlib/csv_data/cn_data
- Convert data from csv format to ``Qlib`` format
``Qlib`` provides the script ``scripts/dump_bin.py`` to convert data from csv format to qlib format.
Assuming that the users store the US-stock data in csv format in path '~/.qlib/csv_data/us_data', they need to execute the following command to convert the data from csv format to ``Qlib`` format:
.. code-block:: bash
python scripts/dump_bin.py dump --csv_path ~/.qlib/csv_data/us_data --qlib_dir ~/.qlib/qlib_data/us_data --include_fields open,close,high,low,volume,factor
- Initialize ``Qlib`` in US-stock mode
Users only need to initialize ``Qlib`` as follows.
.. code-block:: python
from qlib.config import REG_US
qlib.init(provider_uri='~/.qlib/qlib_data/us_data', region=REG_US)
- If users use ``Qlib`` in US-stock mode, US-stock data is required. ``Qlib`` does not provide a script to download US-stock data. Users can use ``Qlib`` in US-stock mode according to the following steps:
- Prepare data in CSV format
- Convert data from CSV format to Qlib format, please refer to section `Converting CSV Format into Qlib Format <#converting-csv-format-into-qlib-format>`_.
- Initialize ``Qlib`` in US-stock mode
Suppose that users have prepared their Qlib format data in the directory ``~/.qlib/qlib_data/us_data``. Users only need to initialize ``Qlib`` as follows.
.. code-block:: python
Please refer to `Script API <../reference/api.html>`_ for more details.
from qlib.config import REG_US
qlib.init(provider_uri='~/.qlib/qlib_data/us_data', region=REG_US)
Data API
========================
@@ -90,10 +117,10 @@ Users can use APIs in ``qlib.data`` to retrieve data, please refer to `Data Retr
Feature
------------------
``Qlib`` provides `Feature` and `ExpressionOps` to fetch the features according to users' need.
``Qlib`` provides `Feature` and `ExpressionOps` to fetch the features according to users' needs.
- `Feature`
Load data from data provider.
Load data from the data provider. Users can get features such as `$high`, `$low`, `$open`, `$close`, etc., which should correspond to the arguments of `--include_fields`; please refer to section `Converting CSV Format into Qlib Format <#converting-csv-format-into-qlib-format>`_.
- `ExpressionOps`
`ExpressionOps` will use operator for feature construction.
@@ -103,7 +130,7 @@ To know more about ``Feature``, please refer to `Feature API <../reference/api.
Filter
-------------------
``Qlib`` provides `NameDFilter` and `ExpressionDFilter` to filter the instruments according to users' need.
``Qlib`` provides `NameDFilter` and `ExpressionDFilter` to filter the instruments according to users' needs.
- `NameDFilter`
Name dynamic instrument filter. Filter the instruments based on a regulated name format. A name rule regular expression is required.
@@ -121,14 +148,14 @@ To know more about ``Filter``, please refer to `Filter API <../reference/api.htm
API
-------------
To know more about ``Data Api``, please refer to `Data Api <../reference/api.html>`_.
To know more about ``Data API``, please refer to `Data API <../reference/api.html>`_.
Data Handler
=================
``Data Handler`` is a part of ``estimator`` and can also be used as a single module.
Users can use ``Data Handler`` in an automatic workflow by ``Estimator``, refer to `Estimator <estimator.html>`_ for more details.
``Data Handler`` can be used to load raw data, prepare features and label columns, preprocess data(standardization, remove NaN, etc.), split training, validation, and test sets. It is a subclass of ``qlib.contrib.estimator.handler.BaseDataHandler``, which provides some interfaces, for example:
Also, ``Data Handler`` can be used as an independent module, by which users can easily preprocess data(standardization, remove NaN, etc.) and build datasets. It is a subclass of ``qlib.contrib.estimator.handler.BaseDataHandler``, which provides some interfaces as follows.
Base Class & Interface
----------------------
@@ -139,20 +166,20 @@ Qlib provides a base class `qlib.contrib.estimator.BaseDataHandler <../reference
Implement the interface to load the data features.
- `setup_label`
Implement the interface to load the data labels and calculate user's labels.
Implement the interface to load the data labels and calculate the users' labels.
- `setup_processed_data`
Implement the interface for data preprocessing, such as preparing feature columns, discarding blank lines, and so on.
Qlib also provides two functions to help user init the data handler, user can override them for user's need.
Qlib also provides two functions to help users init the data handler, users can override them for users' needs.
- `_init_kwargs`
User can init the kwargs of the data handler in this function, some kwargs may be used when init the raw df.
Users can init the kwargs of the data handler in this function, some kwargs may be used when init the raw df.
Kwargs are the other attributes in data.args, like dropna_label, dropna_feature
- `_init_raw_df`
User can init the raw df, feature names and label names of data handler in this function.
If the index of feature df and label df are not same, user need to override this method to merge them (e.g. inner, left, right merge).
Users can init the raw df, feature names, and label names of data handler in this function.
If the index of feature df and label df are not same, users need to override this method to merge them (e.g. inner, left, right merge).
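The merge choices mentioned above can be illustrated with plain pandas, independent of any ``Qlib`` class. In this sketch the frames, column names, and index layout are hypothetical; it only shows how inner and left joins differ when the feature and label indexes do not match.

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
feature_index = pd.MultiIndex.from_product(
    [["SH600000"], dates], names=["instrument", "datetime"]
)
feature_df = pd.DataFrame({"feature_0": [0.1, 0.2, 0.3]}, index=feature_index)

# The label frame lacks the last date, so the two indexes differ.
label_df = pd.DataFrame({"label_0": [0.01, -0.02]}, index=feature_index[:2])

# Inner merge keeps only rows present in both frames.
inner = feature_df.join(label_df, how="inner")
# Left merge keeps every feature row and fills missing labels with NaN.
left = feature_df.join(label_df, how="left")
```

An inner merge avoids NaN labels at the cost of dropping rows, while a left merge keeps all feature rows and leaves the handling of missing labels to later preprocessing.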
If users want to load features and labels by config, they can inherit ``qlib.contrib.estimator.handler.ConfigDataHandler``; ``Qlib`` also provides some preprocessing methods in this subclass.
If users want to use qlib data, `QLibDataHandler` is recommended. Users can inherit their custom class from `QLibDataHandler`, which is also a subclass of `ConfigDataHandler`.
@@ -160,7 +187,8 @@ If users want to use qlib data, `QLibDataHandler` is recommended. Users can inhe
Usage
--------------
'Data Handler' can be used as a single module, which provides the following mehtod:
``Data Handler`` can be used as a single module, which provides the following methods:
- `get_split_data`
- According to the start and end dates, return features and labels of the pandas DataFrame type used for the 'Model'
@@ -178,21 +206,21 @@ Example
Know more about how to run ``Data Handler`` with ``estimator``, please refer to `Estimator <estimator.html#about-data>`_.
Qlib provides implemented data handler `QLibDataHandlerV1`. The following example shows how to run 'QLibDataHandlerV1' as a single module.
Qlib provides an implemented data handler, `QLibDataHandlerClose`. The following example shows how to run `QLibDataHandlerClose` as a single module.
.. note:: User needs to initialize ``Qlib`` with `qlib.init` first, please refer to `initialization <initialization.rst>`_.
.. note:: Users need to initialize ``Qlib`` with `qlib.init` first, please refer to `initialization <../start/initialization.html>`_.
.. code-block:: Python
from qlib.contrib.estimator.handler import QLibDataHandlerV1
from qlib.contrib.estimator.handler import QLibDataHandlerClose
from qlib.contrib.model.gbdt import LGBModel
DATA_HANDLER_CONFIG = {
"dropna_label": True,
"start_date": "2007-01-01",
"end_date": "2020-08-01",
"market": "csi500",
"market": "csi300",
}
TRAINER_CONFIG = {
@@ -204,7 +232,7 @@ Qlib provides implemented data handler `QLibDataHandlerV1`. The following exampl
"test_end_date": "2020-08-01",
}
exampleDataHandler = QLibDataHandlerV1(**DATA_HANDLER_CONFIG)
exampleDataHandler = QLibDataHandlerClose(**DATA_HANDLER_CONFIG)
# example of 'get_split_data'
x_train, y_train, x_validate, y_validate, x_test, y_test = exampleDataHandler.get_split_data(**TRAINER_CONFIG)
@@ -222,22 +250,17 @@ Also, the above example has been given in ``examples.estimator.train_backtest_an
API
---------
To know more abot ``Data Handler``, please refer to `Data Handler API <../reference/api.html#handler>`_.
To know more about ``Data Handler``, please refer to `Data Handler API <../reference/api.html#handler>`_.
Cache
==========
``Cache`` is an optional module that helps accelerate providing data by saving some frequently-used data as cache file.
``Cache`` is an optional module that helps accelerate providing data by saving some frequently-used data as cache file. ``Qlib`` provides a `Memcache` class to cache the most-frequently-used data in memory, an inheritable `ExpressionCache` class and an inheritable `DatasetCache` class.
Memory Cache
--------------
Global Memory Cache
---------------------
Base Class & Interface
~~~~~~~~~~~~~~~~~~~~~~~
``Qlib`` provides a `Memcache` class to cache the most-frequently-used data in memory, an inheritable `ExpressionCache` class, and an inheritable `DatasetCache` class.
`Memcache` is a memory cache mechanism that composes of three `MemCacheUnit` instances to cache **Calendar**, **Instruments**, and **Features**. The MemCache is defined globally in `cache.py` as `H`. User can use `H['c'], H['i'], H['f']` to get/set memcache.
`MemCache` is a global memory cache mechanism composed of three `MemCacheUnit` instances that cache **Calendar**, **Instruments**, and **Features**. `MemCache` is defined globally in `cache.py` as `H`. Users can use `H['c']`, `H['i']`, and `H['f']` to get or set the cached data.
.. autoclass:: qlib.data.cache.MemCacheUnit
:members:
@@ -246,60 +269,42 @@ Base Class & Interface
:members:
Disk Cache
--------------
Base Class & Interface
~~~~~~~~~~~~~~~~~~~~~~~
`ExpressionCache` is a disk cache mechanism that saves expressions such as **Mean($close, 5)**. Users can inherit this base class to define their own cache mechanism. Users need to override `self._uri` method to define how their cache file path is generated, `self._expression` method to define what data they want to cache and how to cache it.
`DatasetCache` is a disk cache mechanism that saves datasets. A certain dataset is regulated by a stockpool configuration (or a series of instruments, though not recommended), a list of expressions or static feature fields, the start time and end time for the collected features and the frequency. Users need to override `self._uri` method to define how their cache file path is generated, `self._expression` method to define what data they want to cache and how to cache it.
`ExpressionCache` and `DatasetCache` actually provides the same interfaces with `ExpressionProvider` and `DatasetProvider` so that the disk cache layer is transparent to users and will only be used if they want to define their own cache mechanism. The users can plug the cache mechanism into the server system by assigning the cache class they want to use in `config.py`:
.. code-block:: python
'ExpressionCache': 'ServerExpressionCache',
'DatasetCache': 'ServerDatasetCache',
Users can find the cache interface here.
ExpressionCache
^^^^^^^^^^^^^^^^^^^^
-----------------
`ExpressionCache` is a cache mechanism that saves expressions such as **Mean($close, 5)**. Users can inherit this base class to define their own cache mechanism that saves expressions according to the following steps.
- Override `self._uri` method to define how the cache file path is generated
- Override `self._expression` method to define what data will be cached and how to cache it.
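The two override points can be pictured with a standalone toy class. This is a sketch only: it does not inherit from or call ``Qlib``, the class name `ToyExpressionCache` and the simplified method signatures are hypothetical, and the hashing and pickle-based storage are assumptions chosen for brevity.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class ToyExpressionCache:
    """Standalone illustration; it does not inherit from or call qlib."""

    def __init__(self, cache_dir, compute):
        self.cache_dir = Path(cache_dir)
        self.compute = compute  # fallback that evaluates an expression
        self.misses = 0

    def _uri(self, instrument, field, freq):
        # Derive the cache file path from a hash of the request.
        key = f"{instrument}|{field}|{freq}".encode("utf-8")
        return self.cache_dir / instrument / hashlib.sha256(key).hexdigest()

    def _expression(self, instrument, field, freq):
        # Return the cached value if present; otherwise compute and store it.
        path = self._uri(instrument, field, freq)
        if path.exists():
            return pickle.loads(path.read_bytes())
        self.misses += 1
        value = self.compute(instrument, field, freq)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(pickle.dumps(value))
        return value


with tempfile.TemporaryDirectory() as tmp:
    # The lambda stands in for a real expression evaluator.
    cache = ToyExpressionCache(tmp, lambda inst, field, freq: [1.0, 2.0, 3.0])
    first = cache._expression("sh600000", "Mean($close, 5)", "day")
    second = cache._expression("sh600000", "Mean($close, 5)", "day")
    misses_after_two_calls = cache.misses
```

The second call is served from the file written by the first, so the evaluator runs only once. Keeping path generation in `_uri` separate from the caching logic in `_expression` lets either be changed without touching the other.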
The following shows the details about the interfaces:
.. autoclass:: qlib.data.cache.ExpressionCache
:members:
``Qlib`` has currently provided implemented disk cache `DiskExpressionCache` which inherits from `ExpressionCache` . The expressions data will be stored in the disk.
DatasetCache
^^^^^^^^^^^^^^^^^^^^
-----------------
`DatasetCache` is a cache mechanism that saves datasets. A certain dataset is regulated by a stock pool configuration (or a series of instruments, though not recommended), a list of expressions or static feature fields, the start time, and end time for the collected features and the frequency. Users can inherit this base class to define their own cache mechanism that saves datasets according to the following steps.
- Override `self._uri` method to define how their cache file path is generated
- Override `self._expression` method to define what data will be cached and how to cache it.
The following shows the details about the interfaces:
.. autoclass:: qlib.data.cache.DatasetCache
:members:
``Qlib`` has currently provided implemented disk cache `DiskDatasetCache` which inherits from `DatasetCache` . The datasets data will be stored in the disk.
Implemented Disk Cache
~~~~~~~~~~~~~~~~~~~~~~~
.. note::
If the user does not use QlibServer, please ignore the content of this section
Qlib has currently provided `ServerExpressionCache` class and `ServerDatasetCache` class as the cache mechanisms used for QlibServer. The class interface and file structure designed for server cache mechanism is listed below.
DiskExpressionCache
^^^^^^^^^^^^^^^^^^^^
.. autoclass:: qlib.data.cache.ServerExpressionCache
DiskDatasetCache
^^^^^^^^^^^^^^^^^^^^
.. autoclass:: qlib.data.cache.ServerDatasetCache
Data and Cache File Structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
==================================
We've specially designed a file structure to manage data and cache, please refer to the `File storage design section in Qlib paper <https://arxiv.org/abs/2009.11189>`_ for detailed information. The file structure of data and cache is listed as follows.
.. code-block:: json
@@ -317,7 +322,7 @@ Data and Cache File Structure
- close.day.bin
- ...
- ...
[cached data] updated by server when raw data is updated
[cached data] updated when raw data is updated
- calculated features/
- sh600000/
- [hash(instrument, field_expression, freq)]
@@ -331,3 +336,5 @@ Data and Cache File Structure
- .index : an assorted index file recording the line index of all calendars
- ...
.. TODO: refer to paper


@@ -7,10 +7,10 @@ Estimator: Workflow Management
Introduction
===================
The components in `Qlib Framework <../introduction/introduction.html#framework>`_ is designed in a loosely-coupled way. Users could build their own quant research workflow with these components like `Example <http://TODO_URL>`_
The components in `Qlib Framework <../introduction/introduction.html#framework>`_ are designed in a loosely-coupled way. Users could build their own Quant research workflow with these components like `Example <https://github.com/microsoft/qlib/blob/main/examples/train_and_backtest.py>`_
Besides, ``Qlib`` provides more user-friendly interfaces named ``Estimator`` to automatically run the whole workflow defined by a config. A concrete execution of the whole workflow is called an `experiment`.
Besides, ``Qlib`` provides more user-friendly interfaces named ``Estimator`` to automatically run the whole workflow defined by configuration. A concrete execution of the whole workflow is called an `experiment`.
With ``Estimator``, user can easily run an `experiment`, which includes the following steps:
- Data
@@ -22,18 +22,13 @@ With ``Estimator``, user can easily run an `experiment`, which includes the foll
- Saving & loading
- Evaluation(Back-testing)
For each `experiment`, ``Qlib`` will capture the details of model training, performance evalution results and basic infomation(e.g. names, ids). The captured data will be stored in backend-storge(disk or database).
For each `experiment`, ``Qlib`` will capture the model training details, performance evaluation results and basic information (e.g. names, ids). The captured data will be stored in backend-storage (disk or database).
Example
Complete Example
===================
The following is an example:
.. note:: Make sure install the latest version of `qlib`, please refer to `Qlib installation <../start/installation.html>`_.
If users want to use the models and data provided by `Qlib`, they only need to do as follows.
First, Write a simple configuration file as following,
Before getting into details, here is a complete example of ``Estimator``, which defines the workflow in typical Quant research.
Below is a typical config file of ``Estimator``.
.. code-block:: YAML
@@ -90,36 +85,37 @@ First, Write a simple configuration file as following,
provider_uri: "~/.qlib/qlib_data/cn_data"
region: "cn"
Then run the following command:
After saving the config into `configuration.yaml`, users could start the workflow and test their ideas with a single command below.
.. code-block:: bash
estimator -c configuration.yaml
.. note:: 'estimator' is a built-in command of our program.
.. note:: `estimator` will be placed in your $PATH directory when installing ``Qlib``.
Configuration File
===================
Before using ``estimator``, users need to prepare a configuration file. The following shows how to prepare each part of the configuration file.
Let's get into details of ``Estimator`` in this section.
Experiment Field
Before using ``estimator``, users need to prepare a configuration file. The following content shows how to prepare each part of the configuration file.
Experiment Section
--------------------
First, the configuration file needs to have a field about the experiment, whose key is `experiment`. This field and its contents determine how `estimator` tracks and persists this `experiment`. ``Qlib`` used `sacred`, a lightweight open-source tool designed to configure, organize, generate logs, and manage experiment results. The field `experiment` will determine the partial behavior of `sacred`.
First, the configuration file needs to contain a section named `experiment` that holds the basic information. This section describes how `estimator` tracks and persists the current `experiment`. ``Qlib`` uses `sacred`, a lightweight open-source tool, to configure, organize, generate logs for, and manage experiment results. Some behaviors of `sacred` depend on the `experiment` section.
Usually, in the running process of `estimator`, those following will be managed by `sacred`:
The following files will be saved by `sacred` after `estimator` finishes an `experiment`:
- `model.bin`, model binary file
- `pred.pkl`, model prediction result file
- `analysis.pkl`, backtest performance analysis file
- `positions.pkl`, backtest position record file
- `positions.pkl`, backtest position records file
- `run`, the experiment information object, usually contains some meta information such as the experiment name, experiment date, etc.
Usually, it should contain the following:
Here is the typical configuration of `experiment section`
.. code-block:: YAML
@@ -138,14 +134,14 @@ Usually, it should contain the following:
The meaning of each field is as follows:
- `name`
The experiment name, str type, `sacred` will use this experiment name as an identifier for some important internal processes. Usually, users can see this field in `sacred` by `run` object. The default value is `test_experiment`.
The experiment name, str type. `sacred <https://github.com/IDSIA/sacred>`_ uses this experiment name as an identifier for some important internal processes. Users can find this field in the `run` object of `sacred`. The default value is `test_experiment`.
- `observer_type`
Observer type, str type, there are two values which are `file_storage` and `mongo` respectively. If it is `file_storage`, all the above-mentioned managed contents will be stored in the `dir` directory, separated by the number of times of experiments as a subfolder. If it is `mongo`, the content will be stored in the database. The default is `file_storage`.
- `observer_type`
Observer type, str type. There are two choices: `file_storage` and `mongo`. If `file_storage` is selected, all the above-mentioned managed contents will be stored in the `dir` directory, with each experiment run stored in its own numbered subfolder. If `mongo` is selected, the contents will be stored in the database. The default is `file_storage`.
- For `file_storage` observer.
- `dir`
Directory url, str type, directory for `file_storage` observer type, files captures and managed by sacred with observer type of `file_storage` will be save to this directory, default is the directory of `config.json`.
- `dir`
Directory URL, str type, directory for `file_storage` observer type, files captured and managed by sacred with `file_storage` observer will be saved to this directory, which is the same directory as `config.json` by default.
- For `mongo` observer.
- `mongo_url`
@@ -155,15 +151,17 @@ The meaning of each field is as follows:
Database name, str type, required if the observer type is `mongo`.
- `finetune`
Estimator will produce a model based on this flag
How ``Estimator`` trains models depends on this flag.
If you just want to train models from scratch each time instead of based on existing models, please leave `finetune=false`. Otherwise please read the
details below.
The following table is the processing logic for different situations.
========== =========================================== ==================================== =========================================== ==========================================
. Static Rolling
. Finetune=True Finetune=False Finetune=True Finetune=False
. finetune:true finetune:false finetune:true finetune:false
========== =========================================== ==================================== =========================================== ==========================================
Train - Need to provide model(Static or Rolling) - No need to provide model - Need to provide model(Static or Rolling) - Need to provide model(Static or Rolling)
Train - Need to provide model (Static or Rolling) - No need to provide model - Need to provide model (Static or Rolling) - Need to provide model (Static or Rolling)
- The args in model section will be - The args in model section will be - The args in model section will be - The args in model section will be
used for finetuning used for training used for finetuning used for finetuning
- Update based on the provided model - Train model from scratch - Update based on the provided model - Based on the provided model update
@@ -185,34 +183,40 @@ The meaning of each field is as follows:
3. If `loader.model_index` is None:
- In 'Static Finetune=True', if provide 'Rolling', use the last model to update.
- For RollingTrainer with Finetune=Ture.
- For `RollingTrainer` with Finetune=True.
- If StaticTrainer is used in loader, the model will be used for initialization for finetuning.
- If `StaticTrainer` is used in loader, the model will be used for initialization for finetuning.
- If RollingTrainer is used in loader, the existing models will be used without any modification and the new models will be initialized with the model in the last period and finetune one by one.
- If `RollingTrainer` is used in loader, the existing models will be used without any modification, and the new models will be initialized with the model from the last period and finetuned one by one.
- `exp_info_path`
experiment info save path, str type, save the experiment info and model prediction score after the experiment is finished. Optional parameter, the default value is `config_file_dir/ex_name/exp_info.json`
Save path of the experiment info, str type. The experiment info and the model `prediction score` are saved here after the experiment is finished. Optional parameter, the default value is `<config_file_dir>/ex_name/exp_info.json`.
- `mode`
`train` or `test`, str type, if `mode` is test, it will load the model according to the parameters of `loader`. The default value is `train`.
Also note that when the load model failed, it will `fit` model.
`train` or `test`, str type.
- `test mode` is designed for inference. Under `test mode`, it will load the model according to the parameters of `loader` and skip model training.
- `train mode` is the default value. It will train new models by default.
Please note that when it fails to load the model, it will fall back to `fit` the model.
.. note::
if users choose `mode` test, they need to make sure:
If users choose `test mode`, they need to make sure:
- The `test_start_date` of the loader must be less than or equal to the current `test_start_date`.
- If other parameters of the `loader` model args are different, a warning will appear.
- `loader`
If the `mode` is `test` or `finetune` is `true`, it will be used.
If you just want to train models from scratch each time instead of based on existing models, please ignore `loader` section. Otherwise please read the
details below.
The `loader` section only works when the `mode` is `test` or `finetune` is `true`.
- `model_index`
Model index, int type. The index of the loaded model in loader_models (starting at 0) for the first `finetune`. The default value is None.
- `exp_info_path`
Loader model experiment info path, str type. If the field exists, the following parameters will be parsed from `exp_info_path`, and the following parameters will not work. This field and `id` must exist one.
Loader model experiment info path, str type. If the field exists, the following parameters will be parsed from `exp_info_path`, and the following parameters will not work. At least one of this field and `id` must exist.
- `id`
The experiment id of the model that needs to be loaded, int type. If the `mode` is `test`, this value is required. At least one of this field and `exp_info_path` must exist.
@@ -222,7 +226,8 @@ The meaning of each field is as follows:
- `observer_type`
The experiment observer type of the model that needs to be loaded, str type. The default value is the current experiment `observer_type`.
.. note:: The observer type is a concept of the `sacred` module, which determines how files, standard input and output which are managed by sacred are stored.
.. note:: The observer type is a concept of the `sacred` module, which determines how files, standard input, and output which are managed by sacred are stored.
- `file_storage`
@@ -249,11 +254,11 @@ The meaning of each field is as follows:
.. note::
If users choose mongo observer, they need to make sure:
- have an environment with the mongodb installed and a mongo database dedicated for storing the experiments results.
- The python environment(the version of python and package) to run the experiments and the one to fetch the results are consistent.
If users choose the mongo observer, they need to make sure:
- Have an environment with the mongodb installed and a mongo database dedicated to storing the results of the experiments.
- The python environment (the version of python and package) to run the experiments and the one to fetch the results are consistent.
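The rules for the `loader` section described above can be sketched as a small check. This is only an illustration: the `experiment` section is represented as a plain Python dict, and the helper names `loader_is_used` and `check_loader` are hypothetical and not part of ``Qlib``.

.. code-block:: python

    def loader_is_used(experiment):
        """Return True if the `loader` section takes effect for this config."""
        mode = experiment.get("mode", "train")        # documented default is `train`
        finetune = experiment.get("finetune", False)
        return mode == "test" or bool(finetune)


    def check_loader(experiment):
        """Raise ValueError if `loader` is needed but cannot locate a model."""
        if not loader_is_used(experiment):
            return  # `loader` is ignored, nothing to check
        loader = experiment.get("loader") or {}
        if "id" not in loader and "exp_info_path" not in loader:
            raise ValueError("`loader` needs at least one of `id` and `exp_info_path`")


    # Hypothetical configs, written as dicts instead of YAML.
    train_cfg = {"name": "test_experiment", "mode": "train", "finetune": False}
    test_cfg = {"name": "test_experiment", "mode": "test", "loader": {"id": 1}}
    bad_cfg = {"name": "test_experiment", "mode": "test", "loader": {}}

    print(loader_is_used(train_cfg))  # False
    print(loader_is_used(test_cfg))   # True

Calling `check_loader(bad_cfg)` raises `ValueError`, because the config asks for `test` mode without telling the loader which experiment to load.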
Model Field
Model Section
-----------------
Users can use a specified model by configuration with hyper-parameters.
@@ -261,7 +266,7 @@ Users can use a specified model by configuration with hyper-parameters.
Custom Models
~~~~~~~~~~~~~~~~~
Qlib support custom models, but it must be a subclass of the `qlib.contrib.model.Model`, the config for custom model may be as following.
Qlib supports custom models, but a custom model must be a subclass of `qlib.contrib.model.Model`. The config for a custom model may be as follows.
.. code-block:: YAML
@@ -274,12 +279,12 @@ Qlib support custom models, but it must be a subclass of the `qlib.contrib.model
The class `SomeModel` should be in the module `custom_model`, and ``Qlib`` could parse the `module_path` to load the class.
To Know more about ``Model``, please refer to `Model <model.html>`_.
To know more about ``Model``, please refer to `Model <model.html>`_.
Data Field
Data Section
-----------------
``Data Handler`` can be used to load raw data, prepare features and label columns, preprocess data(standardization, remove NaN, etc.), split training, validation, and test sets. It is a subclass of `qlib.contrib.estimator.handler.BaseDataHandler`.
``Data Handler`` can be used to load raw data, prepare features and label columns, preprocess data (standardization, remove NaN, etc.), split training, validation, and test sets. It is a subclass of `qlib.contrib.estimator.handler.BaseDataHandler`.
Users can use the specified data handler by config as follows.
@@ -310,32 +315,32 @@ Users can use the specified data handler by config as follows.
fend_time: 2018-12-11
- `class`
Data handler class, str type, which should be a subclass of `qlib.contrib.estimator.handler.BaseDataHandler`, and implements 5 important interfaces for loading features, loading raw data, preprocessing raw data, slicing train, validation, and test data. The default value is `ALPHA360`. If users want to write a data handler to retrieve the data in qlib, `QlibDataHandler` is suggested.
Data handler class, str type, which should be a subclass of `qlib.contrib.estimator.handler.BaseDataHandler`, and implements 5 important interfaces for loading features, loading raw data, preprocessing raw data, slicing train, validation, and test data. The default value is `ALPHA360`. If users want to write a data handler to retrieve the data in ``Qlib``, `QLibDataHandler` is suggested.
- `module_path`
The module path, str type, absolute url is also supported, indicates the path of the `class` implementation of data processor class. The default value is `qlib.contrib.estimator.handler`.
The module path, str type, absolute url is also supported, indicates the path of the `class` implementation of the data processor class. The default value is `qlib.contrib.estimator.handler`.
- `args`
Parameters used for ``Data Handler`` initialization.
- `train_start_date`
Training start time, str type, default value is `2005-01-01`.
Training start time, str type, the default value is `2005-01-01`.
- `start_date`
Data start date, str type.
- `end_date`
Data end date, str type. the data from start_date to end_date decides which part of data will be loaded in datahandler, users can only use these data in the following parts.
Data end date, str type. The data from `start_date` to `end_date` decides which part of the data will be loaded in `datahandler`; users can only use this data in the following parts.
- `dropna_feature` (Optional in args)
Drop Nan feature, bool type, default value is False.
Drop Nan feature, bool type, the default value is False.
- `dropna_label` (Optional in args)
Drop Nan label, bool type, default value is True. Some multi-label tasks will use this.
Drop Nan label, bool type, the default value is True. Some multi-label tasks will use this.
- `normalize_method` (Optional in args)
Normalzie data by given method. str type. ``Qlib`` give two normalize method, `MinMax` and `Std`.
If users wants to build their own method, please override `_process_normalize_feature`.
Normalize data by a given method. str type. ``Qlib`` gives two normalizing methods, `MinMax` and `Std`.
If users want to build their own method, please override `_process_normalize_feature`.
- `filter`
Dynamically filtering the stocks based on the filter pipeline.
@@ -353,7 +358,7 @@ Users can use the specified data handler by config as follows.
The module path, str type.
- `args`
The filter class parameters, this parameters are set according to the `class`, and all the parameters as kwargs to `class`.
The filter class parameters. These parameters are set according to the `class`, and all of them are passed as kwargs to `class`.
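To make the two `normalize_method` options above more concrete, here is a minimal sketch of what `MinMax` and `Std` normalization usually mean. The formulas are the common textbook definitions and are an assumption here; the exact behavior of ``Qlib`` (for example, sample versus population standard deviation, or how constant columns are handled) is not stated in this document. The function names are hypothetical.

.. code-block:: python

    import statistics


    def min_max_normalize(values):
        """Rescale values linearly to the [0, 1] range."""
        low, high = min(values), max(values)
        if high == low:
            return [0.0 for _ in values]  # constant column: avoid division by zero
        return [(v - low) / (high - low) for v in values]


    def std_normalize(values):
        """Shift to zero mean and scale to unit (population) standard deviation."""
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        if std == 0:
            return [0.0 for _ in values]
        return [(v - mean) / std for v in values]


    feature = [10.0, 12.0, 14.0, 18.0]  # hypothetical feature column
    print(min_max_normalize(feature))   # [0.0, 0.25, 0.5, 1.0]
    print(std_normalize(feature))

A custom method plugged in through `_process_normalize_feature` would follow the same pattern: take the feature values and return rescaled values of the same shape.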
Custom Data Handler
~~~~~~~~~~~~~~~~~~~~~~
@@ -371,15 +376,15 @@ Qlib support custom data handler, but it must be a subclass of the ``qlib.contri
The class `SomeDataHandler` should be in the module `custom_data_handler`, and ``Qlib`` could parse the `module_path` to load the class.
If users want to load features and labels by config, they can inherit ``qlib.contrib.estimator.handler.ConfigDataHandler``, ``Qlib`` also has provided some preprocess method in this subclass.
If users want to use qlib data, `QLibDataHandler` is recommended, from which users can inherit custom class. `QLibDataHandler` is also a subclass of `ConfigDataHandler`.
If users want to load features and labels by config, they can inherit ``qlib.contrib.estimator.handler.ConfigDataHandler``, ``Qlib`` also has provided some preprocess methods in this subclass.
If users want to use qlib data, `QLibDataHandler` is recommended, from which users can derive a custom class. `QLibDataHandler` is also a subclass of `ConfigDataHandler`.
To Know more about ``Data Handler``, please refer to `Data Framework&Usage <data.html>`_.
To know more about ``Data Handler``, please refer to `Data Framework&Usage <data.html>`_.
Trainer Field
Trainer Section
-----------------
Users can specify the trainer ``Trainer`` by the config file, which is subclass of ``qlib.contrib.estimator.trainer.BaseTrainer`` and implement three important interfaces for training the model, restoring the model, and getting model predictions as follows.
Users can specify the trainer ``Trainer`` by the config file, which is a subclass of ``qlib.contrib.estimator.trainer.BaseTrainer`` and implements three important interfaces for training the model, restoring the model, and getting model predictions as follows.
- `train`
Implement this interface to train the model.
@@ -447,7 +452,7 @@ Users can specify `trainer` with the configuration file:
Custom Trainer
~~~~~~~~~~~~~~~~~~
Qlib support custom trainer, but it must be a subclass of the `qlib.contrib.estimator.trainer.BaseTrainer`, the config for custom trainer may be as following,
Qlib supports custom trainers, but a custom trainer must be a subclass of `qlib.contrib.estimator.trainer.BaseTrainer`. The config for a custom trainer may be as follows:
.. code-block:: YAML
@@ -465,7 +470,7 @@ Qlib support custom trainer, but it must be a subclass of the `qlib.contrib.esti
The class `SomeTrainer` should be in the module `custom_trainer`, and ``Qlib`` could parse the `module_path` to load the class.
Strategy Field
Strategy Section
-----------------
Users can specify strategy through a config file, for example:
@@ -496,7 +501,7 @@ Users can specify strategy through a config file, for example:
Custom Strategy
^^^^^^^^^^^^^^^^^^^
Qlib support custom strategy, but it must be a subclass of the ``qlib.contrib.strategy.strategy.BaseStrategy``, the config for custom strategy may be as following,
Qlib supports custom strategies, but a custom strategy must be a subclass of ``qlib.contrib.strategy.strategy.BaseStrategy``. The config for a custom strategy may be as follows:
.. code-block:: YAML
@@ -507,9 +512,9 @@ Qlib support custom strategy, but it must be a subclass of the ``qlib.contrib.st
The class `SomeStrategy` should be in the module `custom_strategy`, and ``Qlib`` could parse the `module_path` to load the class.
To Know more about ``Strategy``, please refer to `Strategy <strategy.html>`_.
To know more about ``Strategy``, please refer to `Strategy <strategy.html>`_.
Backtest Field
Backtest Section
-----------------
Users can specify `backtest` through a config file, for example:
@@ -532,7 +537,7 @@ Users can specify `backtest` through a config file, for example:
Normal backtest parameters. All the parameters in this section will be passed to the ``qlib.contrib.evaluate.backtest`` function in the form of `**kwargs`.
- `benchmark`
Stock index symbol, str or list type, the default value is `None`.
Stock index symbol, str, or list type, the default value is `None`.
.. note::
@@ -556,7 +561,7 @@ Users can specify `backtest` through a config file, for example:
Subscribe quote fields, array type, the default value is [`deal_price`, $close, $change, $factor].
Qlib Data Field
Qlib Data Section
--------------------
The `qlib_data` field describes the parameters of qlib initialization.
@@ -574,65 +579,76 @@ The `qlib_data` field describes the parameters of qlib initialization.
- If region == ``qlib.config.REG_CN``, 'qlib' will be initialized in china-stock mode.
- If region == ``qlib.config.REG_US``, 'qlib' will be initialized in US-stock mode.
Please refer to `Initialization <../start/initialization.rst>`_.
Please refer to `Initialization <../start/initialization.html>`_.
Experiment Result
===================
Form of Experimental Result
----------------------------
The result of the experiment is the result of the backtest, please refer to `Backtest <backtest.html>`_.
The result of the experiment is also the result of the ``Interday Trading (Backtest)``, please refer to `Interday Trading <backtest.html>`_.
Get Experiment Result
----------------------------
Users can check the experiment results from file storage directly, or check the experiment results from database, or get the experiment results through two API of a module `fetcher` provided by ``Qlib``.
Base Class & Interface
~~~~~~~~~~~~~~~~~~~~~~~
- `get_experiments()`
The API takes two parameters. The first parameter is the experiment name. The default is all experiments. The second parameter is the observer type. Users can get the experiment name dictionary with a list of ids and test end date by the API as follows.
Users can check the experiment results from file storage directly, or check the experiment results from the database, or get the experiment results through two interfaces of a base class `Fetcher` provided by ``Qlib``.
.. code-block:: JSON
The `Fetcher` provides the following interfaces:
- `get_experiments(self, exp_name=None):`
The interface takes one parameter. The `exp_name` is the experiment name, the default is all experiments. Users can get the returned dictionary with a list of ids and test end date as follows.
{
"ex_a": [
{
"id": 1,
"test_end_date": "2017-01-01"
}
],
"ex_b": [
...
]
}
.. code-block:: JSON
{
"ex_a": [
{
"id": 1,
"test_end_date": "2017-01-01"
}
],
"ex_b": [
...
]
}
- `get_experiment(exp_name, exp_id, fields=None)`
The API takes three parameters, the first parameter is the experiment name, the second parameter is the experiment id, and the third parameter is field list.
If fields is None, will get all fields.
.. note::
Currently supported fields:
['model', 'analysis', 'positions', 'report_normal', 'pred', 'task_config', 'label']
- `get_experiment(exp_name, exp_id, fields=None)`
The interface takes three parameters. The first parameter is the experiment name, the second parameter is the experiment id, and the third parameter is a list of fields. The default value of `fields` is None, which means all fields.
.. code-block:: JSON
.. note::
Currently supported fields:
['model', 'analysis', 'positions', 'report_normal', 'pred', 'task_config', 'label']
{
'analysis': analysis_df,
'pred': pred_df,
'positions': positions_dic,
'report_normal': report_normal_df,
}
Users can get the returned dictionary as follows.
.. code-block:: JSON
Here is a simple example of `FileFetcher`, which could fetch files from `file_storage` observer.
{
'analysis': analysis_df,
'pred': pred_df,
'positions': positions_dic,
'report_normal': report_normal_df,
}
Implemented `Fetcher` s & Examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``Qlib`` provides two implemented `Fetcher` s as follows.
`FileFetcher`
^^^^^^^^^^^^^^^
The `FileFetcher` is a subclass of `Fetcher`, which could fetch files from `file_storage` observer. The following is an example:
.. code-block:: python
>>> from qlib.contrib.estimator.fetcher import FileFetcher
>>> f = FileFetcher(experiments_dir=r'./')
>>> print(f.get_experiments())
{
'test_experiment': [
{
@@ -649,23 +665,25 @@ Here is a simple example of `FileFetcher`, which could fetch files from `file_st
}
]
}
>>> print(f.get_experiment('test_experiment', '1'))
risk
excess_return_without_cost mean 0.000605
std 0.005481
annualized_return 0.152373
information_ratio 1.751319
max_drawdown -0.059055
excess_return_with_cost mean 0.000410
std 0.005478
annualized_return 0.103265
information_ratio 1.187411
max_drawdown -0.075024
risk
sub_bench mean 0.000662
std 0.004487
annual 0.166720
sharpe 2.340526
mdd -0.080516
sub_cost mean 0.000577
std 0.004482
annual 0.145392
sharpe 2.043494
mdd -0.083584
If users use mongo observer when training, they should initialize their fether with mongo_url
`MongoFetcher`
^^^^^^^^^^^^^^^
The `MongoFetcher` is a subclass of `Fetcher`, which could fetch files from `mongo` observer. Users should initialize the fetcher with `mongo_url`. The following is an example:
.. code-block:: python


@@ -6,14 +6,14 @@ Interday Model: Model Training & Prediction
Introduction
===================
``Interday Model`` is designed to make the prediction score about stocks. Users can use the ``Interday Model`` in an automatic workflow by ``Estimator``, please refer to `Estimator <estimator.html>`_.
``Interday Model`` is designed to make the `prediction score` about stocks. Users can use the ``Interday Model`` in an automatic workflow by ``Estimator``, please refer to `Estimator <estimator.html>`_.
Because the components in ``Qlib`` are designed in a loosely-coupled way, ``Interday Model`` can be used as a independent module also.
Because the components in ``Qlib`` are designed in a loosely-coupled way, ``Interday Model`` can be used as an independent module also.
Base Class & Interface
======================
``Qlib`` provides a base class `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_, which all models should inherit from.
``Qlib`` provides a base class `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_ from which all models should inherit.
The base class provides the following interfaces:
@@ -48,7 +48,7 @@ The base class provides the following interfaces:
.. note::
The number and names of the columns is determined by the data handler, please refer to `Data Handler <data.html#data-handler>`_ and `Estimator Data <estimator.html#about-data>`_.
The number and names of the columns are determined by the data handler, please refer to `Data Handler <data.html#data-handler>`_ and `Estimator Data <estimator.html#about-data>`_.
- `y_train`, pd.DataFrame type, train label
The following example explains the value of `y_train`:
@@ -73,7 +73,7 @@ The base class provides the following interfaces:
.. note::
The number and names of the columns is determined by the ``Data Handler``, please refer to `Data Handler <data.html#data-handler>`_.
The number and names of the columns are determined by the ``Data Handler``, please refer to `Data Handler <data.html#data-handler>`_.
- `x_valid`, pd.DataFrame type, validation feature
The format of `x_valid` is same as `x_train`
@@ -86,7 +86,7 @@ The base class provides the following interfaces:
`w_train` is a pandas DataFrame, whose shape and index is same as `x_train`. The float value in `w_train` represents the weight of the feature at the same position in `x_train`.
- `w_valid` (Optional args, default is None), pd.DataFrame type, validation weight
`w_train` is a pandas DataFrame, whose shape and index is same as `x_valid`. The float value in `w_train` represents the weight of the feature at the same position in `x_train`.
`w_valid` is a pandas DataFrame, whose shape and index are the same as `x_valid`. The float value in `w_valid` represents the weight of the feature at the same position in `x_valid`.
- `predict(self, x_test, **kwargs)`
- Predict test data 'x_test'
@@ -115,10 +115,10 @@ For other interfaces such as `save`, `load`, `finetune`, please refer to `Model
Example
==================
``Qlib`` provides ``LightGBM`` and ``DNN`` models as the baseline, the following steps shows how to run`` LightGBM`` as an independent module.
``Qlib`` provides ``LightGBM`` and ``DNN`` models as the baseline. The following steps show how to run ``LightGBM`` as an independent module.
- Initialize ``Qlib`` with `qlib.init` first, please refer to `initialization <initialization.rst>`_.
- Run the following code to get the prediction score `pred_score`
- Initialize ``Qlib`` with `qlib.init` first, please refer to `initialization <../start/initialization.html>`_.
- Run the following code to get the `prediction score` `pred_score`
.. code-block:: Python
from qlib.contrib.estimator.handler import QLibDataHandlerClose


@@ -6,7 +6,7 @@ Aanalysis: Evaluation & Results Analysis
Introduction
===================
``Aanalysis`` is designed to show the graphical reports of ``Intraday Trading`` , which helps users to evaluate and analyse investment portfolios visually. There are the following graphics to view:
``Analysis`` is designed to show the graphical reports of ``Intraday Trading``, which helps users to evaluate and analyse investment portfolios visually. The following are some graphics to view:
- analysis_position
- report_graph
@@ -26,8 +26,8 @@ Users can run the following code to get all supported reports.
.. code-block:: python
>>> import qlib.contrib.report as qcr
>>> print(qcr.GRAPH_NAME_LISt)
>> import qlib.contrib.report as qcr
>> print(qcr.GRAPH_NAME_LIST)
['analysis_position.report_graph', 'analysis_position.score_ic_graph', 'analysis_position.cumulative_return_graph', 'analysis_position.risk_analysis_graph', 'analysis_position.rank_label_graph', 'analysis_model.model_performance_graph']
.. note::
@@ -36,7 +36,7 @@ Users can run the following code to get all supported reports.
Usage&Example
Usage & Example
===================
Usage of `analysis_position.report`
@@ -54,9 +54,29 @@ Graphical Result
.. note::
- Axis X: Trading day
- Axis Y: Accumulated value
- The shaded part above: Maximum drawdown corresponding to `cum return`
- The shaded part below: Maximum drawdown corresponding to `cum ex return wo cost` %
- Axis Y:
- `cum bench`
Cumulative returns series of benchmark
- `cum return wo cost`
Cumulative returns series of portfolio without cost
- `cum return w cost`
Cumulative returns series of portfolio with cost
- `return wo mdd`
Maximum drawdown series of cumulative return without cost
- `return w cost mdd`:
Maximum drawdown series of cumulative return with cost
- `cum ex return wo cost`
The `CAR` (cumulative abnormal return) series of the portfolio compared to the benchmark without cost.
- `cum ex return w cost`
The `CAR` (cumulative abnormal return) series of the portfolio compared to the benchmark with cost.
- `turnover`
Turnover rate series
- `cum ex return wo cost mdd`
Drawdown series of `CAR` (cumulative abnormal return) without cost
- `cum ex return w cost mdd`
Drawdown series of `CAR` (cumulative abnormal return) with cost
- The shaded part above: Maximum drawdown corresponding to `cum return wo cost`
- The shaded part below: Maximum drawdown corresponding to `cum ex return wo cost`
.. image:: ../_static/img/analysis/report.png
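The series listed in the note above can be illustrated with a small sketch. It assumes that cumulative values are running sums of daily returns and that a drawdown series is the distance below the running maximum; these conventions, the helper names, and the sample numbers are assumptions for illustration and are not taken from the ``Qlib`` source.

.. code-block:: python

    from itertools import accumulate


    def cumulative(returns):
        """Running sum of daily returns (additive accumulation)."""
        return list(accumulate(returns))


    def drawdown(cum_series):
        """Distance of each point below the running maximum (always <= 0)."""
        running_max = list(accumulate(cum_series, max))
        return [c - m for c, m in zip(cum_series, running_max)]


    # Hypothetical daily returns, for illustration only.
    portfolio = [0.01, -0.02, 0.03, -0.01]
    benchmark = [0.00, -0.01, 0.01, 0.00]

    # Excess (abnormal) return of the portfolio over the benchmark, per day.
    excess = [p - b for p, b in zip(portfolio, benchmark)]
    cum_excess = cumulative(excess)          # analogous to `cum ex return wo cost`
    excess_drawdown = drawdown(cum_excess)   # analogous to `cum ex return wo cost mdd`

    print(cum_excess)
    print(excess_drawdown)

The "with cost" variants would be computed the same way after subtracting the daily trading cost from the portfolio return.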
@@ -77,7 +97,13 @@ Graphical Result
.. note::
- Axis X: Trading day
- Axis Y: `Ref($close, -1)/$close - 1` and `score` IC%
- Axis Y:
- `ic`
The `Pearson correlation coefficient` series between `label` and `prediction score`.
In the above example, the `label` is formulated as `Ref($close, -1)/$close - 1`. Please refer to `Data API Feature <data.html>`_ for more details.
- `rank_ic`
The `Spearman's rank correlation coefficient` series between `label` and `prediction score`.
.. image:: ../_static/img/analysis/score_ic.png
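As a rough illustration of the `ic` and `rank_ic` values in the note above, the following sketch computes both coefficients for one trading day from plain Python lists. The data and function names are hypothetical; repeating the calculation for every trading day yields the series shown in the graph.

.. code-block:: python

    import math


    def pearson(xs, ys):
        """Pearson correlation coefficient of two equally long sequences."""
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        return cov / math.sqrt(var_x * var_y)


    def average_ranks(values):
        """1-based ranks; tied values share the average of their positions."""
        order = sorted(range(len(values)), key=lambda i: values[i])
        ranks = [0.0] * len(values)
        start = 0
        while start < len(order):
            end = start
            while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
                end += 1
            shared = (start + end) / 2 + 1
            for k in range(start, end + 1):
                ranks[order[k]] = shared
            start = end + 1
        return ranks


    def spearman(xs, ys):
        """Spearman's rank correlation: Pearson correlation of the ranks."""
        return pearson(average_ranks(xs), average_ranks(ys))


    # Hypothetical one-day cross-section: one value per stock.
    score = [0.8, 0.1, 0.5, 0.3]
    label = [0.02, -0.01, 0.03, 0.00]

    ic = pearson(score, label)
    rank_ic = spearman(score, label)
    print(ic, rank_ic)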
@@ -96,14 +122,13 @@ Graphical Result
.. note::
- Cumulative return graphics.
- Axis X: Trading day
- Axis Y:
- Above axis Y: `(((Ref($close, -1)/$close - 1) * weight).sum() / weight.sum()).cumsum()`
- Below axis Y: Daily weight sum
- In the **sell** graph, `y < 0` stands for profit; in other cases, `y > 0` stands for profit.
- In the **buy_minus_sell** graph, the **y** value of the **weight** graph at the bottom is `buy_weight + sell_weight`.
- In each graph, the **red line** in the histogram on the right represents the average.%
- Axis X: Trading day
- Axis Y:
- Above axis Y: `(((Ref($close, -1)/$close - 1) * weight).sum() / weight.sum()).cumsum()`
- Below axis Y: Daily weight sum
- In the **sell** graph, `y < 0` stands for profit; in other cases, `y > 0` stands for profit.
- In the **buy_minus_sell** graph, the **y** value of the **weight** graph at the bottom is `buy_weight + sell_weight`.
- In each graph, the **red line** in the histogram on the right represents the average.
.. image:: ../_static/img/analysis/cumulative_return_buy.png
@@ -124,24 +149,76 @@ API
:members:
.. note::
- annual/mdd/sharpe/std graphics
- Axis X: Trading days are grouped by month
- Axis Y: monthly(trading date) value
Graphical Result
~~~~~~~~~~~~~~~~~
.. note::
- general graphics
- `std`
- `excess_return_without_cost`
The `Standard Deviation` of `CAR` (cumulative abnormal return) without cost.
- `excess_return_with_cost`
The `Standard Deviation` of `CAR` (cumulative abnormal return) with cost.
- `annualized_return`
- `excess_return_without_cost`
The `Annualized Rate` of `CAR` (cumulative abnormal return) without cost.
- `excess_return_with_cost`
The `Annualized Rate` of `CAR` (cumulative abnormal return) with cost.
- `information_ratio`
- `excess_return_without_cost`
The `Information Ratio` without cost.
- `excess_return_with_cost`
The `Information Ratio` with cost.
To know more about `Information Ratio`, please refer to `Information Ratio IR <https://www.investopedia.com/terms/i/informationratio.asp>`_.
- `max_drawdown`
- `excess_return_without_cost`
The `Maximum Drawdown` of `CAR` (cumulative abnormal return) without cost.
- `excess_return_with_cost`
The `Maximum Drawdown` of `CAR` (cumulative abnormal return) with cost.
.. image:: ../_static/img/analysis/risk_analysis_bar.png
:align: center
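The four indicators in the note above can be sketched from a series of daily excess returns. The conventions used here (252 trading days per year, sample standard deviation, additive accumulation for the drawdown) are assumptions for illustration; the function name `risk_summary` is hypothetical and is not the ``Qlib`` API.

.. code-block:: python

    import math
    import statistics
    from itertools import accumulate

    TRADING_DAYS = 252  # assumed annualization factor


    def risk_summary(daily_returns):
        """Summarize a daily return series with the indicators named above."""
        mean = statistics.fmean(daily_returns)
        std = statistics.stdev(daily_returns)  # sample standard deviation
        cum = list(accumulate(daily_returns))
        running_max = list(accumulate(cum, max))
        return {
            "mean": mean,
            "std": std,
            "annualized_return": mean * TRADING_DAYS,
            "information_ratio": mean / std * math.sqrt(TRADING_DAYS),
            "max_drawdown": min(c - m for c, m in zip(cum, running_max)),
        }


    # Hypothetical daily excess returns, for illustration only.
    summary = risk_summary([0.01, -0.02, 0.03, -0.01])
    for name, value in summary.items():
        print(f"{name:20s}{value: .6f}")

Applying the same function to each month of data separately would give monthly series like those in the per-indicator graphics.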
.. image:: ../_static/img/analysis/risk_analysis_annual.png
.. note::
.. image:: ../_static/img/analysis/risk_analysis_mdd.png
- annualized_return/max_drawdown/information_ratio/std graphics
- Axis X: Trading days grouped by month
- Axis Y:
- annualized_return graphics
- `excess_return_without_cost_annualized_return`
The `Annualized Rate` series of monthly `CAR` (cumulative abnormal return) without cost.
- `excess_return_with_cost_annualized_return`
The `Annualized Rate` series of monthly `CAR` (cumulative abnormal return) with cost.
- max_drawdown graphics
- `excess_return_without_cost_max_drawdown`
The `Maximum Drawdown` series of monthly `CAR` (cumulative abnormal return) without cost.
- `excess_return_with_cost_max_drawdown`
The `Maximum Drawdown` series of monthly `CAR` (cumulative abnormal return) with cost.
- information_ratio graphics
- `excess_return_without_cost_information_ratio`
The `Information Ratio` series of monthly `CAR` (cumulative abnormal return) without cost.
- `excess_return_with_cost_information_ratio`
The `Information Ratio` series of monthly `CAR` (cumulative abnormal return) with cost.
- std graphics
- `excess_return_without_cost_std`
The `Standard Deviation` series of monthly `CAR` (cumulative abnormal return) without cost.
- `excess_return_with_cost_std`
The `Standard Deviation` series of monthly `CAR` (cumulative abnormal return) with cost.
.. image:: ../_static/img/analysis/risk_analysis_sharpe.png
.. image:: ../_static/img/analysis/risk_analysis_annualized_return.png
:align: center
.. image:: ../_static/img/analysis/risk_analysis_max_drawdown.png
:align: center
.. image:: ../_static/img/analysis/risk_analysis_information_ratio.png
:align: center
.. image:: ../_static/img/analysis/risk_analysis_std.png
:align: center
Usage of `analysis_position.rank_label`
@@ -161,13 +238,22 @@ Graphical Result
- hold/sell/buy graphics:
- Axis X: Trading day
- Axis Y: Percentage of `'Ref($close, -1)/$close - 1'.rank(ascending=False) / (number of lines on the day) * 100` every trading day. (`ascending=False`: The higher the value, the higher the ranking)%
- Axis Y:
Average `ranking ratio` of `label` for stocks that are held/sold/bought on the trading day.
In the above example, the `label` is formulated as `Ref($close, -1)/$close - 1`. The `ranking ratio` can be formulated as follows.
.. math::
ranking\ ratio = \frac{Ascending\ Ranking\ of\ label}{Number\ of\ Stocks\ in\ the\ Portfolio}
.. image:: ../_static/img/analysis/rank_label_hold.png
:align: center
.. image:: ../_static/img/analysis/rank_label_buy.png
:align: center
.. image:: ../_static/img/analysis/rank_label_sell.png
:align: center
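The `ranking ratio` formula can be worked through with a small sketch. The helper names `ranking_ratio` and `average_ratio` and the sample labels are hypothetical; ties are broken by input order, which is an assumption rather than documented behaviour.

```python
def ranking_ratio(labels):
    """Map {stock: label} to {stock: ascending rank / number of stocks}."""
    ordered = sorted(labels, key=labels.get)  # lowest label gets rank 1
    n = len(ordered)
    return {stock: (i + 1) / n for i, stock in enumerate(ordered)}


def average_ratio(labels, stocks):
    """Average ranking ratio over a subset, e.g. the stocks held that day."""
    ratios = ranking_ratio(labels)
    return sum(ratios[s] for s in stocks) / len(stocks)


labels = {"A": 0.03, "B": -0.01, "C": 0.02, "D": 0.00}
held_average = average_ratio(labels, ["A", "C"])
```

Computing this average separately for the held, sold, and bought stocks on each trading day gives one point per day on the corresponding graphic.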
@@ -181,17 +267,74 @@ API
:members:
Graphical Result
~~~~~~~~~~~~~~~~~
Graphical Results
~~~~~~~~~~~~~~~~~~
.. note::
- cumulative return graphics
- `Group1`:
The `Cumulative Return` series of stocks group with (`ranking ratio` of label <= 20%)
- `Group2`:
The `Cumulative Return` series of stocks group with (20% < `ranking ratio` of label <= 40%)
- `Group3`:
The `Cumulative Return` series of stocks group with (40% < `ranking ratio` of label <= 60%)
- `Group4`:
The `Cumulative Return` series of stocks group with (60% < `ranking ratio` of label <= 80%)
- `Group5`:
The `Cumulative Return` series of stocks group with (80% < `ranking ratio` of label)
- `long-short`:
The Difference series between `Cumulative Return` of `Group1` and of `Group5`
- `long-average`
The Difference series between `Cumulative Return` of `Group1` and average `Cumulative Return` for all stocks.
The `ranking ratio` can be formulated as follows.
.. math::
ranking\ ratio = \frac{Ascending\ Ranking\ of\ label}{Number\ of\ Stocks\ in\ the\ Portfolio}
.. image:: ../_static/img/analysis/analysis_model_cumulative_return.png
:align: center
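The grouping described in the note can be sketched as follows. This is an illustration under stated assumptions, not Qlib's code: the function names are hypothetical, `Cumulative Return` is taken to be the running sum of each group's average daily return, and groups are assigned from the ascending `ranking ratio` exactly as the formula is written here.

```python
def daily_group_returns(returns, n_groups=5):
    """Average one day's {stock: return} within each ranking-ratio group."""
    ordered = sorted(returns, key=returns.get)  # ascending ranking
    n = len(ordered)
    buckets = {g: [] for g in range(1, n_groups + 1)}
    for rank, stock in enumerate(ordered, start=1):
        # ceil(rank / n * n_groups): ratio <= 20% -> Group1, ..., > 80% -> Group5
        group = (rank * n_groups + n - 1) // n
        buckets[group].append(returns[stock])
    return {g: sum(v) / len(v) for g, v in buckets.items() if v}


def cumulative_curves(days, n_groups=5):
    """Build the cumulative series, one dict per trading day."""
    totals = {f"Group{g}": 0.0 for g in range(1, n_groups + 1)}
    average_total = 0.0
    curves = []
    for returns in days:
        for g, r in daily_group_returns(returns, n_groups).items():
            totals[f"Group{g}"] += r
        average_total += sum(returns.values()) / len(returns)
        point = dict(totals)
        point["long-short"] = totals["Group1"] - totals[f"Group{n_groups}"]
        point["long-average"] = totals["Group1"] - average_total
        curves.append(point)
    return curves


day = {"a": 0.01, "b": 0.02, "c": 0.03, "d": 0.04, "e": 0.05}
curves = cumulative_curves([day, day])
```

The sketch assumes every day has at least as many stocks as groups; days with fewer stocks would leave some groups without a contribution.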
.. note::
- long-short/long-average
The distribution of long-short/long-average returns on each trading day
.. image:: ../_static/img/analysis/analysis_model_long_short.png
:align: center
.. TODO: ask xiao yang for detail
.. note::
- Information Coefficient
- The `Pearson correlation coefficient` series between `labels` and `prediction scores` of stocks in portfolio.
- The graphics reports can be used to evaluate the `prediction scores`.
.. image:: ../_static/img/analysis/analysis_model_IC.png
:align: center
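As a rough illustration of the `Information Coefficient`, the sketch below computes a per-day Pearson correlation between labels and prediction scores, plus a monthly average. The function names, the nested-dict data layout, and the ISO date strings are assumptions for the example, not Qlib's API.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def daily_ic(labels_by_day, scores_by_day):
    """Return {day: IC} using the stocks present in both inputs that day."""
    ic = {}
    for day, labels in labels_by_day.items():
        scores = scores_by_day[day]
        stocks = sorted(set(labels) & set(scores))
        ic[day] = pearson([labels[s] for s in stocks],
                          [scores[s] for s in stocks])
    return ic


def monthly_ic(ic_by_day):
    """Average the daily IC per month; days are 'YYYY-MM-DD' strings."""
    months = {}
    for day, value in ic_by_day.items():
        months.setdefault(day[:7], []).append(value)
    return {m: sum(v) / len(v) for m, v in months.items()}


labels = {"2020-01-02": {"A": 1, "B": 2, "C": 3},
          "2020-01-03": {"A": 1, "B": 2, "C": 3}}
scores = {"2020-01-02": {"A": 10, "B": 20, "C": 30},
          "2020-01-03": {"A": 30, "B": 20, "C": 10}}
ic = daily_ic(labels, scores)
```

A day on which the scores order the stocks exactly as the labels do yields an IC of 1, and a fully reversed ordering yields -1.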
.. note::
- Monthly IC
Monthly average of the `Information Coefficient`
.. image:: ../_static/img/analysis/analysis_model_monthly_IC.png
:align: center
.. note::
- IC
The distribution of the `Information Coefficient` on each trading day.
- IC Normal Dist. Q-Q
The `Quantile-Quantile Plot` is used to check whether the `Information Coefficient` on each trading day follows a normal distribution.
.. image:: ../_static/img/analysis/analysis_model_NDQ.png
:align: center
.. image:: ../_static/img/analysis/analysis_model_auto_correlation.png
.. note::
- Auto Correlation
- The `Pearson correlation coefficient` series between the latest `prediction scores` and the `prediction scores` `lag` days ago of stocks in portfolio on each trading day.
- The graphics reports can be used to estimate the turnover rate.
.. image:: ../_static/img/analysis/analysis_model_auto_correlation.png
:align: center
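The `Auto Correlation` series can be sketched in the same spirit. The function name `score_autocorrelation` and the data layout are hypothetical, and the correlation is taken over the stocks that have a score on both days, which is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def score_autocorrelation(scores_by_day, lag=1):
    """Correlate each day's scores with the scores `lag` trading days earlier."""
    days = sorted(scores_by_day)
    result = {}
    for i in range(lag, len(days)):
        now = scores_by_day[days[i]]
        past = scores_by_day[days[i - lag]]
        stocks = sorted(set(now) & set(past))
        result[days[i]] = pearson([now[s] for s in stocks],
                                  [past[s] for s in stocks])
    return result


scores = {
    "2020-01-02": {"A": 1, "B": 2, "C": 3},
    "2020-01-03": {"A": 2, "B": 4, "C": 6},
    "2020-01-06": {"A": 6, "B": 4, "C": 2},
}
autocorr = score_autocorrelation(scores, lag=1)
```

Values close to 1 mean the scores barely change from day to day, which suggests a low turnover rate; values far below 1 suggest that the portfolio would be reshuffled often.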

@@ -9,9 +9,9 @@ Introduction
``Interday Strategy`` is designed to adopt different trading strategies, which means that users can adopt different algorithms to generate investment portfolios based on the prediction scores of the ``Interday Model``. Users can use the ``Interday Strategy`` in an automatic workflow by ``Estimator``; please refer to `Estimator <estimator.html>`_.
Because the componets in ``Qlib`` are designed in a loosely-coupled way, ``Interday Strategy`` can be used as a independent module also.
Because the components in ``Qlib`` are designed in a loosely-coupled way, ``Interday Strategy`` can be used as an independent module also.
``Qlib`` provides several implemented trading strategy. Also, ``Qlib`` supports costom strategy, users can customize strategies according to their own needs.
``Qlib`` provides several implemented trading strategies. ``Qlib`` also supports custom strategies; users can customize strategies according to their own needs.
Base Class & Interface
======================
@@ -27,7 +27,7 @@ Qlib provides a base class ``qlib.contrib.strategy.BaseStrategy``. All strategy
- `generate_order_list`
Return the order list.
User can inherit `BaseStrategy` to costomize their strategy class.
Users can inherit `BaseStrategy` to customize their strategy class.
WeightStrategyBase
--------------------
@@ -49,19 +49,18 @@ Qlib alse provides a class ``qlib.contrib.strategy.WeightStrategyBase`` that is
- Generate the target amount of stocks from the target position.
- Generate the order list from the target amount
Users can inherit `WeightStrategyBase` and implement the inteface `generate_target_weight_position` to costomize their strategy class, which only focuses on the target positions.
Users can inherit `WeightStrategyBase` and implement the interface `generate_target_weight_position` to customize their strategy class, which only focuses on the target positions.
Implemented Strategy
====================
Qlib provides several implemented strategy classes `TopkDropoutStrategy`.
Qlib provides an implemented strategy class named `TopkDropoutStrategy`.
TopkDropoutStrategy
-------------------
`TopkDropoutStrategy` is a subclass of `BaseStrategy` and implements the interface `generate_order_list`, whose process is as follows.
- Adopt the the ``Topk-Drop`` algorithm to calculate the target amount of each stock
- Adopt the ``Topk-Drop`` algorithm to calculate the target amount of each stock
.. note::
``Topk-Drop`` algorithm
@@ -70,7 +69,7 @@ TopkDropoutStrategy
- `Drop`: The number of stocks sold on each trading day
Currently, the number of held stocks is `Topk`.
On each trading day, the `Drop` number of held stocks with worst prediction score will be sold, and the same number of unheld stocks with best prediction score will be bought.
On each trading day, the `Drop` number of held stocks with the worst `prediction score` will be sold, and the same number of unheld stocks with the best `prediction score` will be bought.
.. image:: ../_static/img/topk_drop.png
:alt: Topk-Drop
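One trading day of the ``Topk-Drop`` rule described in the note can be sketched as below. This follows the prose literally and is not the code of `TopkDropoutStrategy`; the function name `topk_drop` and its arguments are hypothetical.

```python
def topk_drop(held, scores, n_drop):
    """Return (sell, buy, new_holdings) for one trading day.

    held    -- list of currently held stocks (its length plays the role of Topk)
    scores  -- {stock: prediction score} for held and unheld stocks
    n_drop  -- number of stocks to replace (the Drop parameter)
    """
    # Sell the held stocks with the worst prediction scores.
    sell = sorted(held, key=lambda s: scores[s])[:n_drop]
    # Buy the same number of unheld stocks with the best prediction scores.
    unheld = [s for s in scores if s not in held]
    buy = sorted(unheld, key=lambda s: scores[s], reverse=True)[:len(sell)]
    new_holdings = [s for s in held if s not in sell] + buy
    return sell, buy, new_holdings


scores = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.8, "E": 0.3}
sell, buy, holdings = topk_drop(["A", "B", "C"], scores, n_drop=1)
```

Because the number bought equals the number sold, the holdings stay at `Topk` stocks, and `Drop` bounds how many positions can change in one day.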
@@ -103,17 +102,17 @@ Usage & Example
# custom Strategy, refer to: TODO: Strategy API url
strategy = TopkDropoutStrategy(**STRATEGY_CONFIG)
# pred_score is the prediction score output by Model
# pred_score is the `prediction score` output by Model
report_normal, positions_normal = backtest(
pred_score, strategy=strategy, **BACKTEST_CONFIG
)
Also, the above example has been given in ``examples\train_backtest_analyze.ipynb``.
To know more about the prediction score `pred_score` output by ``Interday Model``, please refer to `Interday Model: Model Training & Prediction <model.html>`_.
To know more about the `prediction score` `pred_score` output by ``Interday Model``, please refer to `Interday Model: Model Training & Prediction <model.html>`_.
To know more about ``Intraday Trading``, please refer to `Intraday Trading: Model&Strategy Testing <backtest.html>`_.
Reference
===================
TO konw more about ``Interday Strategy``, please refer to `Strategy API <../reference/api.html>`_.
To know more about ``Interday Strategy``, please refer to `Strategy API <../reference/api.html>`_.