mirror of
https://github.com/microsoft/qlib.git
synced 2026-06-10 16:03:53 +08:00
Compare commits
127 Commits
backtest_i
...
v0.8.3
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
97aa16a078 | ||
|
|
094be9be86 | ||
|
|
d9b9386032 | ||
|
|
b86a30aae7 | ||
|
|
2c5a4691f3 | ||
|
|
54344c4426 | ||
|
|
303cdb8ce3 | ||
|
|
1a0ac1ab6d | ||
|
|
a79e446724 | ||
|
|
bdf1fb29a6 | ||
|
|
86e1265f69 | ||
|
|
628eb7fa73 | ||
|
|
2a1b512cd2 | ||
|
|
50e7901e87 | ||
|
|
3ba54cd1ab | ||
|
|
483d01f0c1 | ||
|
|
61836cba3d | ||
|
|
aeb5e40c77 | ||
|
|
116f0fa7a7 | ||
|
|
5296cce725 | ||
|
|
292fcc9e98 | ||
|
|
d3fbf066cf | ||
|
|
52ecb79e0b | ||
|
|
59c52eac0a | ||
|
|
f455305a2a | ||
|
|
a67f67db6e | ||
|
|
5c2e99aee3 | ||
|
|
2bb8a4ce0e | ||
|
|
7f274b1e4e | ||
|
|
2aee9e0145 | ||
|
|
a62e2ec4de | ||
|
|
e7954bdb32 | ||
|
|
d6f69aefea | ||
|
|
1bebe9780e | ||
|
|
7a4a92bc69 | ||
|
|
271782c9dd | ||
|
|
d0113ea7df | ||
|
|
c3996955ef | ||
|
|
8261965015 | ||
|
|
6f71f8a46b | ||
|
|
edd8badeaf | ||
|
|
19689024d4 | ||
|
|
0304df0d5b | ||
|
|
181ee3c070 | ||
|
|
cf35562e84 | ||
|
|
184ce34a34 | ||
|
|
382ababc01 | ||
|
|
bcf18c14de | ||
|
|
6c1332f604 | ||
|
|
93088485c3 | ||
|
|
c633d3fec0 | ||
|
|
0b6d99bd38 | ||
|
|
03cce8c908 | ||
|
|
e76b409d9a | ||
|
|
3e79a088ef | ||
|
|
dfc0ed3c01 | ||
|
|
f59cfe51e0 | ||
|
|
1ecdfd45fe | ||
|
|
622303b83a | ||
|
|
6bafd0a09b | ||
|
|
aed9c09091 | ||
|
|
1b8f0b4575 | ||
|
|
4709909782 | ||
|
|
a0f49fe2e7 | ||
|
|
2840570dd3 | ||
|
|
00ad122175 | ||
|
|
3493f29e16 | ||
|
|
e33de44cb9 | ||
|
|
e843e021a2 | ||
|
|
5aa5a6f356 | ||
|
|
f490708025 | ||
|
|
41a5778684 | ||
|
|
ef161715f7 | ||
|
|
d087054a59 | ||
|
|
350fbe91c9 | ||
|
|
2aca74cd21 | ||
|
|
92ff3d20b9 | ||
|
|
0552120a2e | ||
|
|
3480fd932f | ||
|
|
957f9a18e9 | ||
|
|
6c83632fc4 | ||
|
|
125922b77a | ||
|
|
5e69d089c0 | ||
|
|
c10c349b20 | ||
|
|
7cb1f7cee0 | ||
|
|
d0ff5eea9d | ||
|
|
e99f00b445 | ||
|
|
e50ad4309e | ||
|
|
d89ae2370f | ||
|
|
ee3d4092ae | ||
|
|
ae83f9056f | ||
|
|
c276de4040 | ||
|
|
84103c7d43 | ||
|
|
6d6c586dc2 | ||
|
|
54ef18ec4e | ||
|
|
0dfbf8c413 | ||
|
|
f9c35284e1 | ||
|
|
3974bfe746 | ||
|
|
45ebb1d0e0 | ||
|
|
103f8579c1 | ||
|
|
654033733d | ||
|
|
d224ea447e | ||
|
|
9265b66e09 | ||
|
|
d4b56d97b5 | ||
|
|
0b3b95f22f | ||
|
|
0596174b94 | ||
|
|
779b1786bd | ||
|
|
007082a112 | ||
|
|
4e380b611e | ||
|
|
1e410c99be | ||
|
|
b223c4304d | ||
|
|
f2771f1beb | ||
|
|
01bdf6c1b1 | ||
|
|
9639a8cac9 | ||
|
|
cae4c9c924 | ||
|
|
a2be6e28e9 | ||
|
|
fdbc666678 | ||
|
|
7800dd4ec9 | ||
|
|
b18132dce1 | ||
|
|
16957176a9 | ||
|
|
f0b9a807ea | ||
|
|
5ee2d9496b | ||
|
|
7b15682c63 | ||
|
|
df36839a7f | ||
|
|
4cecaba618 | ||
|
|
63b823f343 | ||
|
|
e41c0ac90a |
1
.github/PULL_REQUEST_TEMPLATE.md
vendored
1
.github/PULL_REQUEST_TEMPLATE.md
vendored
@@ -8,6 +8,7 @@
|
||||
<!--- Why is this change required? What problem does it solve? -->
|
||||
|
||||
## How Has This Been Tested?
|
||||
<! --- Put an `x` in all the boxes that apply: --->
|
||||
- [ ] Pass the test by running: `pytest qlib/tests/test_all_pipeline.py` under upper directory of `qlib`.
|
||||
- [ ] If you are adding a new feature, test on your own test scripts.
|
||||
|
||||
|
||||
3
.github/workflows/python-publish.yml
vendored
3
.github/workflows/python-publish.yml
vendored
@@ -12,7 +12,8 @@ jobs:
|
||||
runs-on: ${{ matrix.os }}
|
||||
strategy:
|
||||
matrix:
|
||||
os: [windows-latest, macos-latest, macos-11]
|
||||
os: [windows-latest, macos-11]
|
||||
# FIXME: macos-latest will raise error now.
|
||||
# not supporting 3.6 due to annotations is not supported https://stackoverflow.com/a/52890129
|
||||
python-version: [3.7, 3.8]
|
||||
|
||||
|
||||
2
.github/workflows/test_macos.yml
vendored
2
.github/workflows/test_macos.yml
vendored
@@ -60,7 +60,7 @@ jobs:
|
||||
python -m pip install --upgrade cython
|
||||
python -m pip install numpy jupyter jupyter_contrib_nbextensions
|
||||
python -m pip install -U scipy scikit-learn # installing without this line will cause errors on GitHub Actions, while instsalling locally won't
|
||||
python setup.py install
|
||||
pip install -e .
|
||||
- name: Install test dependencies
|
||||
run: |
|
||||
python -m pip install --upgrade pip
|
||||
|
||||
@@ -17,5 +17,5 @@ python:
|
||||
version: 3.7
|
||||
install:
|
||||
- requirements: docs/requirements.txt
|
||||
- method: setuptools
|
||||
path: .
|
||||
- method: pip
|
||||
path: .
|
||||
|
||||
@@ -30,7 +30,7 @@ Version 0.2.1
|
||||
--------------------
|
||||
- Support registering user-defined ``Provider``.
|
||||
- Support use operators in string format, e.g. ``['Ref($close, 1)']`` is valid field format.
|
||||
- Support dynamic fields in ``$some_field`` format. And exising fields like ``Close()`` may be deprecated in the future.
|
||||
- Support dynamic fields in ``$some_field`` format. And existing fields like ``Close()`` may be deprecated in the future.
|
||||
|
||||
Version 0.2.2
|
||||
--------------------
|
||||
@@ -78,7 +78,7 @@ Version 0.3.5
|
||||
- Support multi-label training, you can provide multiple label in ``handler``. (But LightGBM doesn't support due to the algorithm itself)
|
||||
- Refactor ``handler`` code, dataset.py is no longer used, and you can deploy your own labels and features in ``feature_label_config``
|
||||
- Handler only offer DataFrame. Also, ``trainer`` and model.py only receive DataFrame
|
||||
- Change ``split_rolling_data``, we roll the data on market calender now, not on normal date
|
||||
- Change ``split_rolling_data``, we roll the data on market calendar now, not on normal date
|
||||
- Move some date config from ``handler`` to ``trainer``
|
||||
|
||||
Version 0.4.0
|
||||
@@ -167,11 +167,11 @@ Version 0.8.0
|
||||
- There are lots of changes for daily trading, it is hard to list all of them. But a few important changes could be noticed
|
||||
- The trading limitation is more accurate;
|
||||
- In `previous version <https://github.com/microsoft/qlib/blob/v0.7.2/qlib/contrib/backtest/exchange.py#L160>`_, longing and shorting actions share the same action.
|
||||
- In `current verison <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/backtest/exchange.py#L304>`_, the trading limitation is different between loging and shorting action.
|
||||
- In `current version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/backtest/exchange.py#L304>`_, the trading limitation is different between logging and shorting action.
|
||||
- The constant is different when calculating annualized metrics.
|
||||
- `Current version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/contrib/evaluate.py#L42>`_ uses more accurate constant than `previous version <https://github.com/microsoft/qlib/blob/v0.7.2/qlib/contrib/evaluate.py#L22>`_
|
||||
- `A new version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/tests/data.py#L17>`_ of data is released. Due to the unstability of Yahoo data source, the data may be different after downloading data again.
|
||||
- Users could chec kout the backtesting results between `Current version <https://github.com/microsoft/qlib/tree/7c31012b507a3823117bddcc693fc64899460b2a/examples/benchmarks>`_ and `previous version <https://github.com/microsoft/qlib/tree/v0.7.2/examples/benchmarks>`_
|
||||
- Users could check out the backtesting results between `Current version <https://github.com/microsoft/qlib/tree/7c31012b507a3823117bddcc693fc64899460b2a/examples/benchmarks>`_ and `previous version <https://github.com/microsoft/qlib/tree/v0.7.2/examples/benchmarks>`_
|
||||
|
||||
|
||||
Other Versions
|
||||
|
||||
144
README.md
144
README.md
@@ -11,16 +11,24 @@
|
||||
Recent released features
|
||||
| Feature | Status |
|
||||
| -- | ------ |
|
||||
|Temporal Routing Adaptor (TRA) | [Released](https://github.com/microsoft/qlib/pull/531) on July 30, 2021 |
|
||||
| Transformer & Localformer | [Released](https://github.com/microsoft/qlib/pull/508) on July 22, 2021 |
|
||||
| Release Qlib v0.7.0 | [Released](https://github.com/microsoft/qlib/releases/tag/v0.7.0) on July 12, 2021 |
|
||||
| TCTS Model | [Released](https://github.com/microsoft/qlib/pull/491) on July 1, 2021 |
|
||||
| Online serving and automatic model rolling | :star: [Released](https://github.com/microsoft/qlib/pull/290) on May 17, 2021 |
|
||||
| DoubleEnsemble Model | [Released](https://github.com/microsoft/qlib/pull/286) on Mar 2, 2021 |
|
||||
| High-frequency data processing example | [Released](https://github.com/microsoft/qlib/pull/257) on Feb 5, 2021 |
|
||||
| High-frequency trading example | [Part of code released](https://github.com/microsoft/qlib/pull/227) on Jan 28, 2021 |
|
||||
| High-frequency data(1min) | [Released](https://github.com/microsoft/qlib/pull/221) on Jan 27, 2021 |
|
||||
| Tabnet Model | [Released](https://github.com/microsoft/qlib/pull/205) on Jan 22, 2021 |
|
||||
| Arctic Provider Backend & Orderbook data example | :hammer: [Rleased](https://github.com/microsoft/qlib/pull/744) on Jan 17, 2022 |
|
||||
| Meta-Learning-based framework & DDG-DA | :chart_with_upwards_trend: :hammer: [Released](https://github.com/microsoft/qlib/pull/743) on Jan 10, 2022 |
|
||||
| Planning-based portfolio optimization | :hammer: [Released](https://github.com/microsoft/qlib/pull/754) on Dec 28, 2021 |
|
||||
| Release Qlib v0.8.0 | :octocat: [Released](https://github.com/microsoft/qlib/releases/tag/v0.8.0) on Dec 8, 2021 |
|
||||
| ADD model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/704) on Nov 22, 2021 |
|
||||
| ADARNN model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/689) on Nov 14, 2021 |
|
||||
| TCN model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/668) on Nov 4, 2021 |
|
||||
| Nested Decision Framework | :hammer: [Released](https://github.com/microsoft/qlib/pull/438) on Oct 1, 2021. [Example](https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution/workflow.py) and [Doc](https://qlib.readthedocs.io/en/latest/component/highfreq.html) |
|
||||
| Temporal Routing Adaptor (TRA) | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/531) on July 30, 2021 |
|
||||
| Transformer & Localformer | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/508) on July 22, 2021 |
|
||||
| Release Qlib v0.7.0 | :octocat: [Released](https://github.com/microsoft/qlib/releases/tag/v0.7.0) on July 12, 2021 |
|
||||
| TCTS Model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/491) on July 1, 2021 |
|
||||
| Online serving and automatic model rolling | :hammer: [Released](https://github.com/microsoft/qlib/pull/290) on May 17, 2021 |
|
||||
| DoubleEnsemble Model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/286) on Mar 2, 2021 |
|
||||
| High-frequency data processing example | :hammer: [Released](https://github.com/microsoft/qlib/pull/257) on Feb 5, 2021 |
|
||||
| High-frequency trading example | :chart_with_upwards_trend: [Part of code released](https://github.com/microsoft/qlib/pull/227) on Jan 28, 2021 |
|
||||
| High-frequency data(1min) | :rice: [Released](https://github.com/microsoft/qlib/pull/221) on Jan 27, 2021 |
|
||||
| Tabnet Model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/205) on Jan 22, 2021 |
|
||||
|
||||
Features released before 2021 are not listed here.
|
||||
|
||||
@@ -44,9 +52,12 @@ For more details, please refer to our paper ["Qlib: An AI-oriented Quantitative
|
||||
- [Data Preparation](#data-preparation)
|
||||
- [Auto Quant Research Workflow](#auto-quant-research-workflow)
|
||||
- [Building Customized Quant Research Workflow by Code](#building-customized-quant-research-workflow-by-code)
|
||||
- [**Quant Model(Paper) Zoo**](#quant-model-paper-zoo)
|
||||
- [Run a single model](#run-a-single-model)
|
||||
- [Run multiple models](#run-multiple-models)
|
||||
- [Main Challenges & Solutions in Quant Research](#main-challenges--solutions-in-quant-research)
|
||||
- [Forecasting: Finding Valuable Signals/Patterns](#forecasting-finding-valuable-signalspatterns)
|
||||
- [**Quant Model (Paper) Zoo**](#quant-model-paper-zoo)
|
||||
- [Run a Single Model](#run-a-single-model)
|
||||
- [Run Multiple Models](#run-multiple-models)
|
||||
- [Adapting to Market Dynamics](#adapting-to-market-dynamics)
|
||||
- [**Quant Dataset Zoo**](#quant-dataset-zoo)
|
||||
- [More About Qlib](#more-about-qlib)
|
||||
- [Offline Mode and Online Mode](#offline-mode-and-online-mode)
|
||||
@@ -61,11 +72,7 @@ New features under development(order by estimated release time).
|
||||
Your feedbacks about the features are very important.
|
||||
| Feature | Status |
|
||||
| -- | ------ |
|
||||
| Planning-based portfolio optimization | Under review: https://github.com/microsoft/qlib/pull/280 |
|
||||
| Fund data supporting and analysis | Under review: https://github.com/microsoft/qlib/pull/292 |
|
||||
| Point-in-Time database | Under review: https://github.com/microsoft/qlib/pull/343 |
|
||||
| High-frequency trading | Under review: https://github.com/microsoft/qlib/pull/408 |
|
||||
| Meta-Learning-based data selection | Initial opensource version under development |
|
||||
|
||||
# Framework of Qlib
|
||||
|
||||
@@ -79,7 +86,7 @@ At the module level, Qlib is a platform that consists of the above components. T
|
||||
| Name | Description |
|
||||
| ------ | ----- |
|
||||
| `Infrastructure` layer | `Infrastructure` layer provides underlying support for Quant research. `DataServer` provides a high-performance infrastructure for users to manage and retrieve raw data. `Trainer` provides a flexible interface to control the training process of models, which enable algorithms to control the training process. |
|
||||
| `Workflow` layer | `Workflow` layer covers the whole workflow of quantitative investment. `Information Extractor` extracts data for models. `Forecast Model` focuses on producing all kinds of forecast signals (e.g. _alpha_, risk) for other modules. With these signals `Portfolio Generator` will generate the target portfolio and produce orders to be executed by `Order Executor`. |
|
||||
| `Workflow` layer | `Workflow` layer covers the whole workflow of quantitative investment. `Information Extractor` extracts data for models. `Forecast Model` focuses on producing all kinds of forecast signals (e.g. _alpha_, risk) for other modules. With these signals `Decision Generator` will generate the target trading decisions(i.e. portfolio, orders) to be executed by `Execution Env` (i.e. the trading market). There may be multiple levels of `Trading Agent` and `Execution Env` (e.g. an _order executor trading agent and intraday order execution environment_ could behave like an interday trading environment and nested in _daily portfolio management trading agent and interday trading environment_ ) |
|
||||
| `Interface` layer | `Interface` layer tries to present a user-friendly interface for the underlying system. `Analyser` module will provide users detailed analysis reports of forecasting signals, portfolios and execution results |
|
||||
|
||||
* The modules with hand-drawn style are under development and will be released in the future.
|
||||
@@ -108,6 +115,7 @@ This table demonstrates the supported Python version of `Qlib`:
|
||||
1. **Conda** is suggested for managing your Python environment.
|
||||
1. Please pay attention that installing cython in Python 3.6 will raise some error when installing ``Qlib`` from source. If users use Python 3.6 on their machines, it is recommended to *upgrade* Python to version 3.7 or use `conda`'s Python to install ``Qlib`` from source.
|
||||
1. For Python 3.9, `Qlib` supports running workflows such as training models, doing backtest and plot most of the related figures (those included in [notebook](examples/workflow_by_code.ipynb)). However, plotting for the *model performance* is not supported for now and we will fix this when the dependent packages are upgraded in the future.
|
||||
1. `Qlib`Requires `tables` package, `hdf5` in tables does not support python3.9.
|
||||
|
||||
### Install with pip
|
||||
Users can easily install ``Qlib`` by pip according to the following command.
|
||||
@@ -129,17 +137,11 @@ Also, users can install the latest dev version ``Qlib`` by the source code accor
|
||||
```
|
||||
|
||||
* Clone the repository and install ``Qlib`` as follows.
|
||||
* If you haven't installed qlib by the command ``pip install pyqlib`` before:
|
||||
```bash
|
||||
git clone https://github.com/microsoft/qlib.git && cd qlib
|
||||
python setup.py install
|
||||
```
|
||||
* If you have already installed the stable version by the command ``pip install pyqlib``:
|
||||
```bash
|
||||
git clone https://github.com/microsoft/qlib.git && cd qlib
|
||||
pip install .
|
||||
```
|
||||
**Note**: **Only** the command ``pip install .`` **can** overwrite the stable version installed by ``pip install pyqlib``, while the command ``python setup.py install`` **can't**.
|
||||
**Note**: You can install Qlib with `python setup.py install` as well. But it is not the recommanded approach. It will skip `pip` and cause obscure problems. For example, **only** the command ``pip install .`` **can** overwrite the stable version installed by ``pip install pyqlib``, while the command ``python setup.py install`` **can't**.
|
||||
|
||||
**Tips**: If you fail to install `Qlib` or run the examples in your environment, comparing your steps and the [CI workflow](.github/workflows/test.yml) may help you find the problem.
|
||||
|
||||
@@ -156,15 +158,17 @@ Load and prepare data by running the following code:
|
||||
|
||||
This dataset is created by public data collected by [crawler scripts](scripts/data_collector/), which have been released in
|
||||
the same repository.
|
||||
Users could create the same dataset with it.
|
||||
Users could create the same dataset with it. [Description of dataset](https://github.com/microsoft/qlib/tree/main/scripts/data_collector#description-of-dataset)
|
||||
|
||||
*Please pay **ATTENTION** that the data is collected from [Yahoo Finance](https://finance.yahoo.com/lookup), and the data might not be perfect.
|
||||
We recommend users to prepare their own data if they have a high-quality dataset. For more information, users can refer to the [related document](https://qlib.readthedocs.io/en/latest/component/data.html#converting-csv-format-into-qlib-format)*.
|
||||
|
||||
### Automatic update of daily frequency data (from yahoo finance)
|
||||
> This step is *Optional* if users only want to try their models and strategies on history data.
|
||||
>
|
||||
> It is recommended that users update the data manually once (--trading_date 2021-05-25) and then set it to update automatically.
|
||||
|
||||
> For more information refer to: [yahoo collector](https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo#automatic-update-of-daily-frequency-datafrom-yahoo-finance)
|
||||
>
|
||||
> For more information, please refer to: [yahoo collector](https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo#automatic-update-of-daily-frequency-datafrom-yahoo-finance)
|
||||
|
||||
* Automatic update of data to the "qlib" directory each trading day(Linux)
|
||||
* use *crontab*: `crontab -e`
|
||||
@@ -189,7 +193,7 @@ We recommend users to prepare their own data if they have a high-quality dataset
|
||||
```python
|
||||
import qlib
|
||||
from qlib.data import D
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
|
||||
# Initialization
|
||||
mount_path = "~/.qlib/qlib_data/cn_data" # target_dir
|
||||
@@ -274,32 +278,45 @@ Qlib provides a tool named `qrun` to run the whole workflow automatically (inclu
|
||||
## Building Customized Quant Research Workflow by Code
|
||||
The automatic workflow may not suit the research workflow of all Quant researchers. To support a flexible Quant research workflow, Qlib also provides a modularized interface to allow researchers to build their own workflow by code. [Here](examples/workflow_by_code.ipynb) is a demo for customized Quant research workflow by code.
|
||||
|
||||
# Main Challenges & Solutions in Quant Research
|
||||
Quant investment is an very unique scenario with lots of key challenges to be solved.
|
||||
Currently, Qlib provides some solutions for several of them.
|
||||
|
||||
# [Quant Model (Paper) Zoo](examples/benchmarks)
|
||||
## Forecasting: Finding Valuable Signals/Patterns
|
||||
Accurate forecasting of the stock price trend is a very important part to construct profitable portfolios.
|
||||
However, huge amount of data with various formats in the financial market which make it challenging to build forecasting models.
|
||||
|
||||
An increasing number of SOTA Quant research works/papers, which focus on building forecasting models to mine valuable signals/patterns in complex financial data, are released in `Qlib`
|
||||
|
||||
|
||||
### [Quant Model (Paper) Zoo](examples/benchmarks)
|
||||
|
||||
Here is a list of models built on `Qlib`.
|
||||
- [GBDT based on XGBoost (Tianqi Chen, et al. KDD 2016)](qlib/contrib/model/xgboost.py)
|
||||
- [GBDT based on LightGBM (Guolin Ke, et al. NIPS 2017)](qlib/contrib/model/gbdt.py)
|
||||
- [GBDT based on Catboost (Liudmila Prokhorenkova, et al. NIPS 2018)](qlib/contrib/model/catboost_model.py)
|
||||
- [MLP based on pytorch](qlib/contrib/model/pytorch_nn.py)
|
||||
- [LSTM based on pytorch (Sepp Hochreiter, et al. Neural omputation 1997)](qlib/contrib/model/pytorch_lstm.py)
|
||||
- [GRU based on pytorch (Kyunghyun Cho, et al. 2014)](qlib/contrib/model/pytorch_gru.py)
|
||||
- [ALSTM based on pytorch (Yao Qin, et al. IJCAI 2017)](qlib/contrib/model/pytorch_alstm.py)
|
||||
- [GATs based on pytorch (Petar Velickovic, et al. 2017)](qlib/contrib/model/pytorch_gats.py)
|
||||
- [SFM based on pytorch (Liheng Zhang, et al. KDD 2017)](qlib/contrib/model/pytorch_sfm.py)
|
||||
- [TFT based on tensorflow (Bryan Lim, et al. International Journal of Forecasting 2019)](examples/benchmarks/TFT/tft.py)
|
||||
- [TabNet based on pytorch (Sercan O. Arik, et al. AAAI 2019)](qlib/contrib/model/pytorch_tabnet.py)
|
||||
- [DoubleEnsemble based on LightGBM (Chuheng Zhang, et al. ICDM 2020)](qlib/contrib/model/double_ensemble.py)
|
||||
- [TCTS based on pytorch (Xueqing Wu, et al. ICML 2021)](qlib/contrib/model/pytorch_tcts.py)
|
||||
- [Transformer based on pytorch (Ashish Vaswani, et al. NeurIPS 2017)](qlib/contrib/model/pytorch_transformer.py)
|
||||
- [Localformer based on pytorch (Juyong Jiang, et al.)](qlib/contrib/model/pytorch_localformer.py)
|
||||
- [TRA based on pytorch (Hengxu, Dong, et al. KDD 2021)](qlib/contrib/model/pytorch_tra.py)
|
||||
- [GBDT based on XGBoost (Tianqi Chen, et al. KDD 2016)](examples/benchmarks/XGBoost/)
|
||||
- [GBDT based on LightGBM (Guolin Ke, et al. NIPS 2017)](examples/benchmarks/LightGBM/)
|
||||
- [GBDT based on Catboost (Liudmila Prokhorenkova, et al. NIPS 2018)](examples/benchmarks/CatBoost/)
|
||||
- [MLP based on pytorch](examples/benchmarks/MLP/)
|
||||
- [LSTM based on pytorch (Sepp Hochreiter, et al. Neural computation 1997)](examples/benchmarks/LSTM/)
|
||||
- [GRU based on pytorch (Kyunghyun Cho, et al. 2014)](examples/benchmarks/GRU/)
|
||||
- [ALSTM based on pytorch (Yao Qin, et al. IJCAI 2017)](examples/benchmarks/ALSTM)
|
||||
- [GATs based on pytorch (Petar Velickovic, et al. 2017)](examples/benchmarks/GATs/)
|
||||
- [SFM based on pytorch (Liheng Zhang, et al. KDD 2017)](examples/benchmarks/SFM/)
|
||||
- [TFT based on tensorflow (Bryan Lim, et al. International Journal of Forecasting 2019)](examples/benchmarks/TFT/)
|
||||
- [TabNet based on pytorch (Sercan O. Arik, et al. AAAI 2019)](examples/benchmarks/TabNet/)
|
||||
- [DoubleEnsemble based on LightGBM (Chuheng Zhang, et al. ICDM 2020)](examples/benchmarks/DoubleEnsemble/)
|
||||
- [TCTS based on pytorch (Xueqing Wu, et al. ICML 2021)](examples/benchmarks/TCTS/)
|
||||
- [Transformer based on pytorch (Ashish Vaswani, et al. NeurIPS 2017)](examples/benchmarks/Transformer/)
|
||||
- [Localformer based on pytorch (Juyong Jiang, et al.)](examples/benchmarks/Localformer/)
|
||||
- [TRA based on pytorch (Hengxu, Dong, et al. KDD 2021)](examples/benchmarks/TRA/)
|
||||
- [TCN based on pytorch (Shaojie Bai, et al. 2018)](examples/benchmarks/TCN/)
|
||||
- [ADARNN based on pytorch (YunTao Du, et al. 2021)](examples/benchmarks/ADARNN/)
|
||||
- [ADD based on pytorch (Hongshun Tang, et al.2020)](examples/benchmarks/ADD/)
|
||||
|
||||
Your PR of new Quant models is highly welcomed.
|
||||
|
||||
The performance of each model on the `Alpha158` and `Alpha360` dataset can be found [here](examples/benchmarks/README.md).
|
||||
|
||||
## Run a single model
|
||||
### Run a single model
|
||||
All the models listed above are runnable with ``Qlib``. Users can find the config files we provide and some details about the model through the [benchmarks](examples/benchmarks) folder. More information can be retrieved at the model files listed above.
|
||||
|
||||
`Qlib` provides three different ways to run a single model, users can pick the one that fits their cases best:
|
||||
@@ -309,7 +326,7 @@ All the models listed above are runnable with ``Qlib``. Users can find the confi
|
||||
- Users can use the script [`run_all_model.py`](examples/run_all_model.py) listed in the `examples` folder to run a model. Here is an example of the specific shell command to be used: `python run_all_model.py run --models=lightgbm`, where the `--models` arguments can take any number of models listed above(the available models can be found in [benchmarks](examples/benchmarks/)). For more use cases, please refer to the file's [docstrings](examples/run_all_model.py).
|
||||
- **NOTE**: Each baseline has different environment dependencies, please make sure that your python version aligns with the requirements(e.g. TFT only supports Python 3.6~3.7 due to the limitation of `tensorflow==1.15.0`)
|
||||
|
||||
## Run multiple models
|
||||
### Run multiple models
|
||||
`Qlib` also provides a script [`run_all_model.py`](examples/run_all_model.py) which can run multiple models for several iterations. (**Note**: the script only support *Linux* for now. Other OS will be supported in the future. Besides, it doesn't support parallel running the same model for multiple times as well, and this will be fixed in the future development too.)
|
||||
|
||||
The script will create a unique virtual environment for each model, and delete the environments after training. Thus, only experiment results such as `IC` and `backtest` results will be generated and stored.
|
||||
@@ -321,6 +338,14 @@ python run_all_model.py run 10
|
||||
|
||||
It also provides the API to run specific models at once. For more use cases, please refer to the file's [docstrings](examples/run_all_model.py).
|
||||
|
||||
## [Adapting to Market Dynamics](examples/benchmarks_dynamic)
|
||||
|
||||
Due to the non-stationary nature of the environment of the financial market, the data distribution may change in different periods, which makes the performance of models build on training data decays in the future test data.
|
||||
So adapting the forecasting models/strategies to market dynamics is very important to the model/strategies' performance.
|
||||
|
||||
Here is a list of solutions built on `Qlib`.
|
||||
- [Rolling Retraining](examples/benchmarks_dynamic/baseline/)
|
||||
- [DDG-DA on pytorch (Wendi, et al. AAAI 2022)](examples/benchmarks_dynamic/DDG-DA/)
|
||||
|
||||
# Quant Dataset Zoo
|
||||
Dataset plays a very important role in Quant. Here is a list of the datasets built on `Qlib`:
|
||||
@@ -391,17 +416,36 @@ Join IM discussion groups:
|
||||
||
|
||||
|
||||
# Contributing
|
||||
We appreciate all contributions and thank all the contributors!
|
||||
<a href="https://github.com/microsoft/qlib/graphs/contributors"><img src="https://contrib.rocks/image?repo=microsoft/qlib" /></a>
|
||||
|
||||
Before we released Qlib as an open-source project on Github in Sep 2020, Qlib is an internal project in our group. Unfortunately, the internal commit history is not kept. A lot of members in our group have also contributed a lot to Qlib, which includes Ruihua Wang, Yinda Zhang, Haisu Yu, Shuyu Wang, Bochen Pang, and [Dong Zhou](https://github.com/evanzd/evanzd). Especially thanks to [Dong Zhou](https://github.com/evanzd/evanzd) due to his initial version of Qlib.
|
||||
|
||||
## Guidance
|
||||
|
||||
This project welcomes contributions and suggestions.
|
||||
**Here are some
|
||||
[code standards](docs/developer/code_standard.rst) when you submit a pull request.**
|
||||
[code standards](docs/developer/code_standard.rst) for submiting a pull request.**
|
||||
|
||||
If you want to contribute to Qlib's document, you can follow the steps in the figure below.
|
||||
Making contributions is not a hard thing. Solving an issue(maybe just answering a question raised in [issues list](https://github.com/microsoft/qlib/issues) or [gitter](https://gitter.im/Microsoft/qlib)), fixing/issuing a bug, improving the documents and even fixing a typo are important contributions to Qlib.
|
||||
|
||||
For example, if you want to contribute to Qlib's document/code, you can follow the steps in the figure below.
|
||||
<p align="center">
|
||||
<img src="https://github.com/demon143/qlib/blob/main/docs/_static/img/change%20doc.gif" />
|
||||
</p>
|
||||
|
||||
If you don't know how to start to contribute, you can refer to the following examples.
|
||||
| Type | Examples |
|
||||
| -- | -- |
|
||||
| Solving issues | [Answer a question](https://github.com/microsoft/qlib/issues/749); [issuing](https://github.com/microsoft/qlib/issues/765) or [fixing](https://github.com/microsoft/qlib/pull/792) a bug |
|
||||
| Docs | [Improve docs quality](https://github.com/microsoft/qlib/pull/797/files) ; [Fix a typo](https://github.com/microsoft/qlib/pull/774) |
|
||||
| Feature | Implement a [requested feature](https://github.com/microsoft/qlib/projects) like [this](https://github.com/microsoft/qlib/pull/754); [Refactor interfaces](https://github.com/microsoft/qlib/pull/539/files) |
|
||||
| Dataset | [Add a dataset](https://github.com/microsoft/qlib/pull/733) |
|
||||
| Models | [Implement a new model](https://github.com/microsoft/qlib/pull/689) |
|
||||
|
||||
If you would like to become one of Qlib's maintainers to contribute more (e.g. help merge PR, triage issues), please contact us by email([qlib@microsoft.com](mailto:qlib@microsoft.com)). We are glad to help you to set the right permission.
|
||||
|
||||
## Licence
|
||||
Most contributions require you to agree to a
|
||||
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
|
||||
the right to use your contribution. For details, visit https://cla.opensource.microsoft.com.
|
||||
|
||||
@@ -1 +0,0 @@
|
||||
0.7.2.99
|
||||
4
docs/_static/img/Task-Gen-Recorder-Collector.svg
vendored
Normal file
4
docs/_static/img/Task-Gen-Recorder-Collector.svg
vendored
Normal file
File diff suppressed because one or more lines are too long
|
After Width: | Height: | Size: 198 KiB |
@@ -11,7 +11,10 @@ Introduction
|
||||
|
||||
The `Workflow <../component/introduction.html>`_ part introduces how to run research workflow in a loosely-coupled way. But it can only execute one ``task`` when you use ``qrun``.
|
||||
To automatically generate and execute different tasks, ``Task Management`` provides a whole process including `Task Generating`_, `Task Storing`_, `Task Training`_ and `Task Collecting`_.
|
||||
With this module, users can run their ``task`` automatically at different periods, in different losses, or even by different models.
|
||||
With this module, users can run their ``task`` automatically at different periods, in different losses, or even by different models.The processes of task generation, model training and combine and collect data are shown in the following figure.
|
||||
|
||||
.. image:: ../_static/img/Task-Gen-Recorder-Collector.svg
|
||||
:align: center
|
||||
|
||||
This whole process can be used in `Online Serving <../component/online.html>`_.
|
||||
|
||||
@@ -74,6 +77,8 @@ If you do not want to use ``Task Manager`` to manage tasks, then use TrainerR to
|
||||
|
||||
Task Collecting
|
||||
===============
|
||||
Before collecting model training results, you need to use the ``qlib.init`` to specify the path of mlruns.
|
||||
|
||||
To collect the results of ``task`` after training, ``Qlib`` provides `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_ to collect the results in a readable, expandable and loosely-coupled way.
|
||||
|
||||
`Collector <../reference/api.html#Collector>`_ can collect objects from everywhere and process them such as merging, grouping, averaging and so on. It has 2 step action including ``collect`` (collect anything in a dict) and ``process_collect`` (process collected dict).
|
||||
@@ -82,8 +87,10 @@ To collect the results of ``task`` after training, ``Qlib`` provides `Collector
|
||||
For example: {(A,B,C1): object, (A,B,C2): object} ---``group``---> {(A,B): {C1: object, C2: object}} ---``reduce``---> {(A,B): object}
|
||||
|
||||
`Ensemble <../reference/api.html#Ensemble>`_ can merge the objects in an ensemble.
|
||||
For example: {C1: object, C2: object} ---``Ensemble``---> object
|
||||
For example: {C1: object, C2: object} ---``Ensemble``---> object.
|
||||
You can set the ensembles you want in the ``Collector``'s process_list.
|
||||
Common ensembles include ``AverageEnsemble`` and ``RollingEnsemble``. Average ensemble is used to ensemble the results of different models in the same time period. Rollingensemble is used to ensemble the results of different models in the same time period
|
||||
|
||||
So the hierarchy is ``Collector``'s second step corresponds to ``Group``. And ``Group``'s second step correspond to ``Ensemble``.
|
||||
|
||||
For more information, please see `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_, or the `example <https://github.com/microsoft/qlib/tree/main/examples/model_rolling/task_manager_rolling.py>`_.
|
||||
For more information, please see `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_, or the `example <https://github.com/microsoft/qlib/tree/main/examples/model_rolling/task_manager_rolling.py>`_.
|
||||
|
||||
@@ -1,114 +0,0 @@
|
||||
.. _backtest:
|
||||
|
||||
============================================
|
||||
Intraday Trading: Model&Strategy Testing
|
||||
============================================
|
||||
.. currentmodule:: qlib
|
||||
|
||||
Introduction
|
||||
===================
|
||||
|
||||
``Intraday Trading`` is designed to test models and strategies, which help users to check the performance of a custom model/strategy.
|
||||
|
||||
|
||||
.. note::
|
||||
|
||||
``Intraday Trading`` uses ``Order Executor`` to trade and execute orders output by ``Portfolio Strategy``. ``Order Executor`` is a component in `Qlib Framework <../introduction/introduction.html#framework>`_, which can execute orders. ``VWAP Executor`` and ``Close Executor`` is supported by ``Qlib`` now. In the future, ``Qlib`` will support ``HighFreq Executor`` also.
|
||||
|
||||
|
||||
|
||||
Example
|
||||
===========================
|
||||
|
||||
Users need to generate a `prediction score`(a pandas DataFrame) with MultiIndex<instrument, datetime> and a `score` column. And users need to assign a strategy used in backtest, if strategy is not assigned,
|
||||
a `TopkDropoutStrategy` strategy with `(topk=50, n_drop=5, risk_degree=0.95, limit_threshold=0.0095)` will be used.
|
||||
If ``Strategy`` module is not users' interested part, `TopkDropoutStrategy` is enough.
|
||||
|
||||
The simple example of the default strategy is as follows.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from qlib.contrib.evaluate import backtest
|
||||
# pred_score is the prediction score
|
||||
report, positions = backtest(pred_score, topk=50, n_drop=0.5, limit_threshold=0.0095)
|
||||
|
||||
To know more about backtesting with a specific ``Strategy``, please refer to `Portfolio Strategy <strategy.html>`_.
|
||||
|
||||
To know more about the prediction score `pred_score` output by ``Forecast Model``, please refer to `Forecast Model: Model Training & Prediction <model.html>`_.
|
||||
|
||||
Prediction Score
|
||||
-----------------
|
||||
|
||||
The `prediction score` is a pandas DataFrame. Its index is <datetime(pd.Timestamp), instrument(str)> and it must
|
||||
contains a `score` column.
|
||||
|
||||
A prediction sample is shown as follows.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
datetime instrument score
|
||||
2019-01-04 SH600000 -0.505488
|
||||
2019-01-04 SZ002531 -0.320391
|
||||
2019-01-04 SZ000999 0.583808
|
||||
2019-01-04 SZ300569 0.819628
|
||||
2019-01-04 SZ001696 -0.137140
|
||||
... ...
|
||||
2019-04-30 SZ000996 -1.027618
|
||||
2019-04-30 SH603127 0.225677
|
||||
2019-04-30 SH603126 0.462443
|
||||
2019-04-30 SH603133 -0.302460
|
||||
2019-04-30 SZ300760 -0.126383
|
||||
|
||||
``Forecast Model`` module can make predictions, please refer to `Forecast Model: Model Training & Prediction <model.html>`_.
|
||||
|
||||
Backtest Result
|
||||
------------------
|
||||
|
||||
The backtest results are in the following form:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
risk
|
||||
excess_return_without_cost mean 0.000605
|
||||
std 0.005481
|
||||
annualized_return 0.152373
|
||||
information_ratio 1.751319
|
||||
max_drawdown -0.059055
|
||||
excess_return_with_cost mean 0.000410
|
||||
std 0.005478
|
||||
annualized_return 0.103265
|
||||
information_ratio 1.187411
|
||||
max_drawdown -0.075024
|
||||
|
||||
|
||||
|
||||
- `excess_return_without_cost`
|
||||
- `mean`
|
||||
Mean value of the `CAR` (cumulative abnormal return) without cost
|
||||
- `std`
|
||||
The `Standard Deviation` of `CAR` (cumulative abnormal return) without cost.
|
||||
- `annualized_return`
|
||||
The `Annualized Rate` of `CAR` (cumulative abnormal return) without cost.
|
||||
- `information_ratio`
|
||||
The `Information Ratio` without cost. please refer to `Information Ratio – IR <https://www.investopedia.com/terms/i/informationratio.asp>`_.
|
||||
- `max_drawdown`
|
||||
The `Maximum Drawdown` of `CAR` (cumulative abnormal return) without cost, please refer to `Maximum Drawdown (MDD) <https://www.investopedia.com/terms/m/maximum-drawdown-mdd.asp>`_.
|
||||
|
||||
- `excess_return_with_cost`
|
||||
- `mean`
|
||||
Mean value of the `CAR` (cumulative abnormal return) series with cost
|
||||
- `std`
|
||||
The `Standard Deviation` of `CAR` (cumulative abnormal return) series with cost.
|
||||
- `annualized_return`
|
||||
The `Annualized Rate` of `CAR` (cumulative abnormal return) with cost.
|
||||
- `information_ratio`
|
||||
The `Information Ratio` with cost. please refer to `Information Ratio – IR <https://www.investopedia.com/terms/i/informationratio.asp>`_.
|
||||
- `max_drawdown`
|
||||
The `Maximum Drawdown` of `CAR` (cumulative abnormal return) with cost, please refer to `Maximum Drawdown (MDD) <https://www.investopedia.com/terms/m/maximum-drawdown-mdd.asp>`_.
|
||||
|
||||
|
||||
|
||||
Reference
|
||||
==============
|
||||
|
||||
To know more about ``Intraday Trading``, please refer to `Intraday Trading <../reference/api.html#module-qlib.contrib.evaluate>`_.
|
||||
@@ -21,6 +21,12 @@ The introduction of ``Data Layer`` includes the following parts.
|
||||
- Cache
|
||||
- Data and Cache File Structure
|
||||
|
||||
Here is a typical example of Qlib data workflow
|
||||
|
||||
- Users download data and converting data into Qlib format(with filename suffix `.bin`). In this step, typically only some basic data are stored on disk(such as OHLCV).
|
||||
- Creating some basic features based on Qlib's expression Engine(e.g. "Ref($close, 60) / $close", the return of last 60 trading days). Supported operators in the expression engine can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/ops.py>`_. This step is typically implemented in Qlib's `Data Loader <https://qlib.readthedocs.io/en/latest/component/data.html#data-loader>`_ which is a component of `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_ .
|
||||
- If users require more complicated data processing (e.g. data normalization), `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_ support user-customized processors to process data(some predefined processors can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/dataset/processor.py>`_). The processors are different from operators in expression engine. It is designed for some complicated data processing methods which is hard to supported in operators in expression engine.
|
||||
- At last, `Dataset <https://qlib.readthedocs.io/en/latest/component/data.html#dataset>`_ is responsible to prepare model-specific dataset from the processed data of Data Handler
|
||||
|
||||
Data Preparation
|
||||
============================
|
||||
@@ -46,6 +52,7 @@ Also, ``Qlib`` provides a high-frequency dataset. Users can run a high-frequency
|
||||
Qlib Format Dataset
|
||||
--------------------
|
||||
``Qlib`` has provided an off-the-shelf dataset in `.bin` format, users could use the script ``scripts/get_data.py`` to download the China-Stock dataset as follows.
|
||||
The price volume data look different from the actual dealling price because of they are **adjusted** (`adjusted price <https://www.investopedia.com/terms/a/adjusted_closing_price.asp>`_). And then you may find that the adjusted price may be different from different data sources. This is because different data sources may vary in the way of adjusting prices. Qlib normalize the price on first trading day of each stock to 1 when adjusting them.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
@@ -213,7 +220,7 @@ The `trade unit` defines the unit number of stocks can be used in a trade, and t
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
qlib.init(provider_uri='~/.qlib/qlib_data/cn_data', region=REG_CN)
|
||||
|
||||
|
||||
@@ -338,7 +345,7 @@ DataHandlerLP
|
||||
|
||||
In addition to use ``Data Handler`` in an automatic workflow with ``qrun``, ``Data Handler`` can be used as an independent module, by which users can easily preprocess data (standardization, remove NaN, etc.) and build datasets.
|
||||
|
||||
In order to achieve so, ``Qlib`` provides a base class `qlib.data.dataset.DataHandlerLP <../reference/api.html#qlib.data.dataset.handler.DataHandlerLP>`_. The core idea of this class is that: we will have some leanable ``Processors`` which can learn the parameters of data processing(e.g., parameters for zscore normalization). When new data comes in, these `trained` ``Processors`` can then process the new data and thus processing real-time data in an efficient way becomes possible. More information about ``Processors`` will be listed in the next subsection.
|
||||
In order to achieve so, ``Qlib`` provides a base class `qlib.data.dataset.DataHandlerLP <../reference/api.html#qlib.data.dataset.handler.DataHandlerLP>`_. The core idea of this class is that: we will have some learnable ``Processors`` which can learn the parameters of data processing(e.g., parameters for zscore normalization). When new data comes in, these `trained` ``Processors`` can then process the new data and thus processing real-time data in an efficient way becomes possible. More information about ``Processors`` will be listed in the next subsection.
|
||||
|
||||
|
||||
Interface
|
||||
|
||||
@@ -1,120 +1,31 @@
|
||||
.. _highfreq:
|
||||
|
||||
============================================
|
||||
Design of hierarchical order execution framework
|
||||
Design of Nested Decision Execution Framework for High-Frequency Trading
|
||||
============================================
|
||||
.. currentmodule:: qlib
|
||||
|
||||
Introduction
|
||||
===================
|
||||
|
||||
In order to support reinforcement learning algorithms for high-frequency trading, a corresponding framework is required. None of the publicly available high-frequency trading frameworks now consider multi-layer trading mechanisms, and the currently designed algorithms cannot directly use existing frameworks.
|
||||
In addition to supporting the basic intraday multi-layer trading, the linkage with the day-ahead strategy is also a factor that affects the performance evaluation of the strategy. Different day strategies generate different order distributions and different patterns on different stocks. To verify that high-frequency trading strategies perform well on real trading orders, it is necessary to support day-frequency and high-frequency multi-level linkage trading. In addition to more accurate backtesting of high-frequency trading algorithms, if the distribution of day-frequency orders is considered when training a high-frequency trading model, the algorithm can also be optimized more for product-specific day-frequency orders.
|
||||
Therefore, innovation in the high-frequency trading framework is necessary to solve the various problems mentioned above, for which we designed a hierarchical order execution framework that can link daily-frequency and intra-day trading at different granularities.
|
||||
Daily trading (e.g. portfolio management) and intraday trading (e.g. orders execution) are two hot topics in Quant investment and usually studied separately.
|
||||
|
||||
To get the join trading performance of daily and intraday trading, they must interact with each other and run backtest jointly.
|
||||
In order to support the joint backtest strategies in multiple levels, a corresponding framework is required. None of the publicly available high-frequency trading frameworks considers multi-level joint trading, which make the backtesting aforementioned inaccurate.
|
||||
|
||||
Besides backtesting, the optimization of strategies from different levels is not standalone and can be affected by each other.
|
||||
For example, the best portfolio management strategy may change with the performance of order executions(e.g. a portfolio with higher turnover may becomes a better choice when we improve the order execution strategies).
|
||||
To achieve the overall good performance , it is necessary to consider the interaction of strategies in different level.
|
||||
|
||||
Therefore, building a new framework for trading in multiple levels becomes necessary to solve the various problems mentioned above, for which we designed a nested decision execution framework that consider the interaction of strategies.
|
||||
|
||||
.. image:: ../_static/img/framework.svg
|
||||
|
||||
The design of the framework is shown in the figure above. At each layer consists of Trading Agent and Execution Env. The Trading Agent has its own data processing module (Information Extractor), forecasting module (Forecast Model) and decision generator (Decision Generator). The trading algorithm generates the corresponding decisions by the Decision Generator based on the forecast signals output by the Forecast Module, and the decisions generated by the trading algorithm are passed to the Execution Env, which returns the execution results. Here the frequency of trading algorithm, decision content and execution environment can be customized by users (e.g. intra-day trading, daily-frequency trading, weekly-frequency trading), and the execution environment can be nested with finer-grained trading algorithm and execution environment inside (i.e. sub-workflow in the figure, e.g. daily-frequency orders can be turned into finer-grained decisions by splitting orders within the day). The hierarchical order execution framework is user-defined in terms of hierarchy division and decision frequency, making it easy for users to explore the effects of combining different levels of trading algorithms and breaking down the barriers between different levels of trading algorithm optimization.
|
||||
In addition to the innovation in the framework, the hierarchical order execution framework also takes into account various details of the real backtesting environment, minimizing the differences with the final real environment as much as possible. At the same time, the framework is designed to unify the interface between online and offline (e.g. data pre-processing level supports using the same set of code to process both offline and online data) to reduce the cost of strategy go-live as much as possible.
|
||||
|
||||
Prepare Data
|
||||
===================
|
||||
.. _data:: ../../examples/highfreq/README.md
|
||||
The design of the framework is shown in the yellow part in the middle of the figure above. Each level consists of ``Trading Agent`` and ``Execution Env``. ``Trading Agent`` has its own data processing module (``Information Extractor``), forecasting module (``Forecast Model``) and decision generator (``Decision Generator``). The trading algorithm generates the decisions by the ``Decision Generator`` based on the forecast signals output by the ``Forecast Module``, and the decisions generated by the trading algorithm are passed to the ``Execution Env``, which returns the execution results.
|
||||
|
||||
The frequency of trading algorithm, decision content and execution environment can be customized by users (e.g. intraday trading, daily-frequency trading, weekly-frequency trading), and the execution environment can be nested with finer-grained trading algorithm and execution environment inside (i.e. sub-workflow in the figure, e.g. daily-frequency orders can be turned into finer-grained decisions by splitting orders within the day). The flexibility of nested decision execution framework makes it easy for users to explore the effects of combining different levels of trading strategies and break down the optimization barriers between different levels of trading algorithm.
|
||||
|
||||
Example
|
||||
===========================
|
||||
|
||||
Here is an example of highfreq execution.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import qlib
|
||||
# init qlib
|
||||
provider_uri_day = "~/.qlib/qlib_data/cn_data"
|
||||
provider_uri_1min = "~/.qlib/qlib_data/cn_data_1min"
|
||||
provider_uri_map = {"1min": provider_uri_1min, "day": provider_uri_day}
|
||||
qlib.init(provider_uri=provider_uri_day, expression_cache=None, dataset_cache=None)
|
||||
|
||||
# data freq and backtest time
|
||||
freq = "1min"
|
||||
inst_list = D.list_instruments(D.instruments("all"), as_list=True)
|
||||
start_time = "2020-01-01"
|
||||
start_time = "2020-01-31"
|
||||
|
||||
When initializing qlib, if the default data is used, then both daily and minute frequency data need to be passed in.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# random order strategy config
|
||||
strategy_config = {
|
||||
"class": "RandomOrderStrategy",
|
||||
"module_path": "qlib.contrib.strategy.rule_strategy",
|
||||
"kwargs": {
|
||||
"trade_range": TradeRangeByTime("9:30", "15:00"),
|
||||
"sample_ratio": 1.0,
|
||||
"volume_ratio": 0.01,
|
||||
"market": market,
|
||||
},
|
||||
}
|
||||
|
||||
.. code-block:: python
|
||||
# backtest config
|
||||
backtest_config = {
|
||||
"start_time": start_time,
|
||||
"end_time": end_time,
|
||||
"account": 100000000,
|
||||
"benchmark": None,
|
||||
"exchange_kwargs": {
|
||||
"freq": freq,
|
||||
"limit_threshold": 0.095,
|
||||
"deal_price": "close",
|
||||
"open_cost": 0.0005,
|
||||
"close_cost": 0.0015,
|
||||
"min_cost": 5,
|
||||
"codes": market,
|
||||
},
|
||||
"pos_type": "InfPosition", # Position with infinitive position
|
||||
}
|
||||
|
||||
please refer to "../../qlib/backtest".
|
||||
|
||||
.. code-block:: python
|
||||
# excutor config
|
||||
executor_config = {
|
||||
"class": "NestedExecutor",
|
||||
"module_path": "qlib.backtest.executor",
|
||||
"kwargs": {
|
||||
"time_per_step": "day",
|
||||
"inner_executor": {
|
||||
"class": "SimulatorExecutor",
|
||||
"module_path": "qlib.backtest.executor",
|
||||
"kwargs": {
|
||||
"time_per_step": freq,
|
||||
"generate_portfolio_metrics": True,
|
||||
"verbose": False,
|
||||
# "verbose": True,
|
||||
"indicator_config": {
|
||||
"show_indicator": False,
|
||||
},
|
||||
},
|
||||
},
|
||||
"inner_strategy": {
|
||||
"class": "TWAPStrategy",
|
||||
"module_path": "qlib.contrib.strategy.rule_strategy",
|
||||
},
|
||||
"track_data": True,
|
||||
"generate_portfolio_metrics": True,
|
||||
"indicator_config": {
|
||||
"show_indicator": True,
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
NestedExecutor represents not the innermost layer, the initialization parameters should contain inner_executor and inner_strategy. simulatorExecutor represents the current excutor is the innermost layer, the innermost strategy used here is the TWAP strategy, the framework currently also supports the VWAP strategy
|
||||
|
||||
.. code-block:: python
|
||||
# backtest
|
||||
portfolio_metrics_dict, indicator_dict = backtest(executor=executor_config, strategy=strategy_config, **backtest_config)
|
||||
|
||||
The metrics of backtest are included in the portfolio_metrics_dict and indicator_dict.
|
||||
An example of nested decision execution framework for high-frequency can be found `here <https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution/workflow.py>`_.
|
||||
68
docs/component/meta.rst
Normal file
68
docs/component/meta.rst
Normal file
@@ -0,0 +1,68 @@
|
||||
.. _meta:
|
||||
|
||||
=================================
|
||||
Meta Controller: Meta-Task & Meta-Dataset & Meta-Model
|
||||
=================================
|
||||
.. currentmodule:: qlib
|
||||
|
||||
|
||||
Introduction
|
||||
=============
|
||||
``Meta Controller`` provides guidance to ``Forecast Model``, which aims to learn regular patterns among a series of forecasting tasks and use learned patterns to guide forthcoming forecasting tasks. Users can implement their own meta-model instance based on ``Meta Controller`` module.
|
||||
|
||||
Meta Task
|
||||
=============
|
||||
|
||||
A `Meta Task` instance is the basic element in the meta-learning framework. It saves the data that can be used for the `Meta Model`. Multiple `Meta Task` instances may share the same `Data Handler`, controlled by `Meta Dataset`. Users should use `prepare_task_data()` to obtain the data that can be directly fed into the `Meta Model`.
|
||||
|
||||
.. autoclass:: qlib.model.meta.task.MetaTask
|
||||
:members:
|
||||
|
||||
Meta Dataset
|
||||
=============
|
||||
|
||||
`Meta Dataset` controls the meta-information generating process. It is on the duty of providing data for training the `Meta Model`. Users should use `prepare_tasks` to retrieve a list of `Meta Task` instances.
|
||||
|
||||
.. autoclass:: qlib.model.meta.dataset.MetaTaskDataset
|
||||
:members:
|
||||
|
||||
Meta Model
|
||||
=============
|
||||
|
||||
General Meta Model
|
||||
------------------
|
||||
`Meta Model` instance is the part that controls the workflow. The usage of the `Meta Model` includes:
|
||||
1. Users train their `Meta Model` with the `fit` function.
|
||||
2. The `Meta Model` instance guides the workflow by giving useful information via the `inference` function.
|
||||
|
||||
.. autoclass:: qlib.model.meta.model.MetaModel
|
||||
:members:
|
||||
|
||||
Meta Task Model
|
||||
------------------
|
||||
This type of meta-model may interact with task definitions directly. Then, the `Meta Task Model` is the class for them to inherit from. They guide the base tasks by modifying the base task definitions. The function `prepare_tasks` can be used to obtain the modified base task definitions.
|
||||
|
||||
.. autoclass:: qlib.model.meta.model.MetaTaskModel
|
||||
:members:
|
||||
|
||||
Meta Guide Model
|
||||
------------------
|
||||
This type of meta-model participates in the training process of the base forecasting model. The meta-model may guide the base forecasting models during their training to improve their performances.
|
||||
|
||||
.. autoclass:: qlib.model.meta.model.MetaGuideModel
|
||||
:members:
|
||||
|
||||
|
||||
Example
|
||||
=============
|
||||
``Qlib`` provides an implementation of ``Meta Model`` module, ``DDG-DA``,
|
||||
which adapts to the market dynamics.
|
||||
|
||||
``DDG-DA`` includes four steps:
|
||||
|
||||
1. Calculate meta-information and encapsulate it into ``Meta Task`` instances. All the meta-tasks form a ``Meta Dataset`` instance.
|
||||
2. Train ``DDG-DA`` based on the training data of the meta-dataset.
|
||||
3. Do the inference of the ``DDG-DA`` to get guide information.
|
||||
4. Apply guide information to the forecasting models to improve their performances.
|
||||
|
||||
The `above example <https://github.com/microsoft/qlib/tree/main/examples/benchmarks_dynamic/DDG-DA>`_ can be found in ``examples/benchmarks_dynamic/DDG-DA/workflow.py``.
|
||||
@@ -106,6 +106,9 @@ Example
|
||||
`SignalRecord` is the `Record Template` in ``Qlib``, please refer to `Workflow <recorder.html#record-template>`_.
|
||||
|
||||
Also, the above example has been given in ``examples/train_backtest_analyze.ipynb``.
|
||||
Technically, the meaning of the model prediction depends on the label setting designed by user.
|
||||
By default, the meaning of the score is normally the rating of the instruments by the forecasting model. The higher the score, the more profit the instruments.
|
||||
|
||||
|
||||
Custom Model
|
||||
===================
|
||||
|
||||
@@ -23,6 +23,10 @@ The `examples <https://github.com/microsoft/qlib/tree/main/examples/online_srv>`
|
||||
|
||||
**NOTE**: User should keep his data source updated to support online serving. For example, Qlib provides `a batch of scripts <https://github.com/microsoft/qlib/blob/main/scripts/data_collector/yahoo/README.md#automatic-update-of-daily-frequency-datafrom-yahoo-finance>`_ to help users update Yahoo daily data.
|
||||
|
||||
Known limitations currently
|
||||
- Currently, the daily updating prediction for the next trading day is supported. But generating orders for the next trading day is not supported due to the `limitations of public data <https://github.com/microsoft/qlib/issues/215#issuecomment-766293563>_`
|
||||
|
||||
|
||||
Online Manager
|
||||
=============
|
||||
|
||||
|
||||
@@ -37,7 +37,7 @@ Here is a general view of the structure of the system:
|
||||
|
||||
This experiment management system defines a set of interface and provided a concrete implementation ``MLflowExpManager``, which is based on the machine learning platform: ``MLFlow`` (`link <https://mlflow.org/>`_).
|
||||
|
||||
If users set the implementation of ``ExpManager`` to be ``MLflowExpManager``, they can use the command `mlflow ui` to visualize and check the experiment results. For more information, pleaes refer to the related documents `here <https://www.mlflow.org/docs/latest/cli.html#mlflow-ui>`_.
|
||||
If users set the implementation of ``ExpManager`` to be ``MLflowExpManager``, they can use the command `mlflow ui` to visualize and check the experiment results. For more information, please refer to the related documents `here <https://www.mlflow.org/docs/latest/cli.html#mlflow-ui>`_.
|
||||
|
||||
Qlib Recorder
|
||||
===================
|
||||
|
||||
@@ -8,11 +8,13 @@ Portfolio Strategy: Portfolio Management
|
||||
Introduction
|
||||
===================
|
||||
|
||||
``Portfolio Strategy`` is designed to adopt different portfolio strategies, which means that users can adopt different algorithms to generate investment portfolios based on the prediction scores of the ``Forecast Model``. Users can use the ``Portfolio Strategy`` in an automatic workflow by ``Workflow`` module, please refer to `Workflow: Workflow Management <workflow.html>`_.
|
||||
``Portfolio Strategy`` is designed to adopt different portfolio strategies, which means that users can adopt different algorithms to generate investment portfolios based on the prediction scores of the ``Forecast Model``. Users can use the ``Portfolio Strategy`` in an automatic workflow by ``Workflow`` module, please refer to `Workflow: Workflow Management <workflow.html>`_.
|
||||
|
||||
Because the components in ``Qlib`` are designed in a loosely-coupled way, ``Portfolio Strategy`` can be used as an independent module also.
|
||||
|
||||
``Qlib`` provides several implemented portfolio strategies. Also, ``Qlib`` supports custom strategy, users can customize strategies according to their own needs.
|
||||
``Qlib`` provides several implemented portfolio strategies. Also, ``Qlib`` supports custom strategy, users can customize strategies according to their own requirements.
|
||||
|
||||
After users specifying the models(forecasting signals) and strategies, running backtest will help users to check the performance of a custom model(forecasting signals)/strategy.
|
||||
|
||||
Base Class & Interface
|
||||
======================
|
||||
@@ -20,20 +22,22 @@ Base Class & Interface
|
||||
BaseStrategy
|
||||
------------------
|
||||
|
||||
Qlib provides a base class ``qlib.contrib.strategy.BaseStrategy``. All strategy classes need to inherit the base class and implement its interface.
|
||||
Qlib provides a base class ``qlib.strategy.base.BaseStrategy``. All strategy classes need to inherit the base class and implement its interface.
|
||||
|
||||
- `get_risk_degree`
|
||||
Return the proportion of your total value you will use in investment. Dynamically risk_degree will result in Market timing.
|
||||
|
||||
- `generate_order_list`
|
||||
Return the order list.
|
||||
Return the order list.
|
||||
The frequency to call this method depends on the executor frequency("time_per_step"="day" by default). But the trading frequency can be decided by users' implementation.
|
||||
For example, if the user wants to trading in weekly while the `time_per_step` is "day" in executor, user can return non-empty TradeDecision weekly(otherwise return empty like `this <https://github.com/microsoft/qlib/blob/main/qlib/contrib/strategy/signal_strategy.py#L132>`_ ).
|
||||
|
||||
Users can inherit `BaseStrategy` to customize their strategy class.
|
||||
|
||||
WeightStrategyBase
|
||||
--------------------
|
||||
|
||||
Qlib also provides a class ``qlib.contrib.strategy.WeightStrategyBase`` that is a subclass of `BaseStrategy`.
|
||||
Qlib also provides a class ``qlib.contrib.strategy.WeightStrategyBase`` that is a subclass of `BaseStrategy`.
|
||||
|
||||
`WeightStrategyBase` only focuses on the target positions, and automatically generates an order list based on positions. It provides the `generate_target_weight_position` interface.
|
||||
|
||||
@@ -69,51 +73,229 @@ TopkDropoutStrategy
|
||||
|
||||
- `Topk`: The number of stocks held
|
||||
- `Drop`: The number of stocks sold on each trading day
|
||||
|
||||
|
||||
Currently, the number of held stocks is `Topk`.
|
||||
On each trading day, the `Drop` number of held stocks with the worst `prediction score` will be sold, and the same number of unheld stocks with the best `prediction score` will be bought.
|
||||
|
||||
|
||||
.. image:: ../_static/img/topk_drop.png
|
||||
:alt: Topk-Drop
|
||||
|
||||
``TopkDrop`` algorithm sells `Drop` stocks every trading day, which guarantees a fixed turnover rate.
|
||||
|
||||
|
||||
- Generate the order list from the target amount
|
||||
|
||||
EnhancedIndexingStrategy
|
||||
------------------------
|
||||
`EnhancedIndexingStrategy` Enhanced indexing combines the arts of active management and passive management,
|
||||
with the aim of outperforming a benchmark index (e.g., S&P 500) in terms of portfolio return while controlling
|
||||
the risk exposure (a.k.a. tracking error).
|
||||
|
||||
For more information, please refer to `qlib.contrib.strategy.signal_strategy.EnhancedIndexingStrategy`
|
||||
and `qlib.contrib.strategy.optimizer.enhanced_indexing.EnhancedIndexingOptimizer`.
|
||||
|
||||
|
||||
Usage & Example
|
||||
====================
|
||||
``Portfolio Strategy`` can be specified in the ``Intraday Trading(Backtest)``, the example is as follows.
|
||||
|
||||
First, user can create a model to get trading signals(the variable name is ``pred_score`` in following cases).
|
||||
|
||||
Prediction Score
|
||||
-----------------
|
||||
|
||||
The `prediction score` is a pandas DataFrame. Its index is <datetime(pd.Timestamp), instrument(str)> and it must
|
||||
contains a `score` column.
|
||||
|
||||
A prediction sample is shown as follows.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from qlib.contrib.strategy.strategy import TopkDropoutStrategy
|
||||
from qlib.contrib.evaluate import backtest
|
||||
STRATEGY_CONFIG = {
|
||||
"topk": 50,
|
||||
"n_drop": 5,
|
||||
}
|
||||
BACKTEST_CONFIG = {
|
||||
"limit_threshold": 0.095,
|
||||
"account": 100000000,
|
||||
"benchmark": BENCHMARK,
|
||||
"deal_price": "close",
|
||||
"open_cost": 0.0005,
|
||||
"close_cost": 0.0015,
|
||||
"min_cost": 5,
|
||||
|
||||
}
|
||||
# use default strategy
|
||||
strategy = TopkDropoutStrategy(**STRATEGY_CONFIG)
|
||||
datetime instrument score
|
||||
2019-01-04 SH600000 -0.505488
|
||||
2019-01-04 SZ002531 -0.320391
|
||||
2019-01-04 SZ000999 0.583808
|
||||
2019-01-04 SZ300569 0.819628
|
||||
2019-01-04 SZ001696 -0.137140
|
||||
... ...
|
||||
2019-04-30 SZ000996 -1.027618
|
||||
2019-04-30 SH603127 0.225677
|
||||
2019-04-30 SH603126 0.462443
|
||||
2019-04-30 SH603133 -0.302460
|
||||
2019-04-30 SZ300760 -0.126383
|
||||
|
||||
# pred_score is the `prediction score` output by Model
|
||||
report_normal, positions_normal = backtest(
|
||||
pred_score, strategy=strategy, **BACKTEST_CONFIG
|
||||
)
|
||||
``Forecast Model`` module can make predictions, please refer to `Forecast Model: Model Training & Prediction <model.html>`_.
|
||||
|
||||
To know more about the `prediction score` `pred_score` output by ``Forecast Model``, please refer to `Forecast Model: Model Training & Prediction <model.html>`_.
|
||||
Normally, the prediction score is the output of the models. But some models are learned from a label with a different scale. So the scale of the prediction score may be different from your expectation(e.g. the return of instruments).
|
||||
|
||||
Qlib didn't add a step to scale the prediction score to a unified scale. Because not every trading strategy cares about the scale(e.g. TopkDropoutStrategy only cares about the order). So the strategy is responsible for rescaling the prediction score(e.g. some portfolio-optimization-based strategies may require a meaningful scale).
|
||||
|
||||
Running backtest
|
||||
-----------------
|
||||
|
||||
- In most cases, users could backtest their portfolio management strategy with ``backtest_daily``.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from pprint import pprint
|
||||
|
||||
import qlib
|
||||
import pandas as pd
|
||||
from qlib.utils.time import Freq
|
||||
from qlib.utils import flatten_dict
|
||||
from qlib.contrib.evaluate import backtest_daily
|
||||
from qlib.contrib.evaluate import risk_analysis
|
||||
from qlib.contrib.strategy import TopkDropoutStrategy
|
||||
|
||||
# init qlib
|
||||
qlib.init(provider_uri=<qlib data dir>)
|
||||
|
||||
CSI300_BENCH = "SH000300"
|
||||
STRATEGY_CONFIG = {
|
||||
"topk": 50,
|
||||
"n_drop": 5,
|
||||
# pred_score, pd.Series
|
||||
"signal": pred_score,
|
||||
}
|
||||
|
||||
|
||||
strategy_obj = TopkDropoutStrategy(**STRATEGY_CONFIG)
|
||||
report_normal, positions_normal = backtest_daily(
|
||||
start_time="2017-01-01", end_time="2020-08-01", strategy=strategy_obj
|
||||
)
|
||||
analysis = dict()
|
||||
analysis["excess_return_without_cost"] = risk_analysis(
|
||||
report_normal["return"] - report_normal["bench"], freq=analysis_freq
|
||||
)
|
||||
analysis["excess_return_with_cost"] = risk_analysis(
|
||||
report_normal["return"] - report_normal["bench"] - report_normal["cost"], freq=analysis_freq
|
||||
)
|
||||
|
||||
analysis_df = pd.concat(analysis) # type: pd.DataFrame
|
||||
pprint(analysis_df)
|
||||
|
||||
|
||||
|
||||
- If users would like to control their strategies in a more detailed(e.g. users have a more advanced version of executor), user could follow this example.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from pprint import pprint
|
||||
|
||||
import qlib
|
||||
import pandas as pd
|
||||
from qlib.utils.time import Freq
|
||||
from qlib.utils import flatten_dict
|
||||
from qlib.backtest import backtest, executor
|
||||
from qlib.contrib.evaluate import risk_analysis
|
||||
from qlib.contrib.strategy import TopkDropoutStrategy
|
||||
|
||||
# init qlib
|
||||
qlib.init(provider_uri=<qlib data dir>)
|
||||
|
||||
CSI300_BENCH = "SH000300"
|
||||
FREQ = "day"
|
||||
STRATEGY_CONFIG = {
|
||||
"topk": 50,
|
||||
"n_drop": 5,
|
||||
# pred_score, pd.Series
|
||||
"signal": pred_score,
|
||||
}
|
||||
|
||||
EXECUTOR_CONFIG = {
|
||||
"time_per_step": "day",
|
||||
"generate_portfolio_metrics": True,
|
||||
}
|
||||
|
||||
backtest_config = {
|
||||
"start_time": "2017-01-01",
|
||||
"end_time": "2020-08-01",
|
||||
"account": 100000000,
|
||||
"benchmark": CSI300_BENCH,
|
||||
"exchange_kwargs": {
|
||||
"freq": FREQ,
|
||||
"limit_threshold": 0.095,
|
||||
"deal_price": "close",
|
||||
"open_cost": 0.0005,
|
||||
"close_cost": 0.0015,
|
||||
"min_cost": 5,
|
||||
},
|
||||
}
|
||||
|
||||
# strategy object
|
||||
strategy_obj = TopkDropoutStrategy(**STRATEGY_CONFIG)
|
||||
# executor object
|
||||
executor_obj = executor.SimulatorExecutor(**EXECUTOR_CONFIG)
|
||||
# backtest
|
||||
portfolio_metric_dict, indicator_dict = backtest(executor=executor_obj, strategy=strategy_obj, **backtest_config)
|
||||
analysis_freq = "{0}{1}".format(*Freq.parse(FREQ))
|
||||
# backtest info
|
||||
report_normal, positions_normal = portfolio_metric_dict.get(analysis_freq)
|
||||
|
||||
# analysis
|
||||
analysis = dict()
|
||||
analysis["excess_return_without_cost"] = risk_analysis(
|
||||
report_normal["return"] - report_normal["bench"], freq=analysis_freq
|
||||
)
|
||||
analysis["excess_return_with_cost"] = risk_analysis(
|
||||
report_normal["return"] - report_normal["bench"] - report_normal["cost"], freq=analysis_freq
|
||||
)
|
||||
|
||||
analysis_df = pd.concat(analysis) # type: pd.DataFrame
|
||||
# log metrics
|
||||
analysis_dict = flatten_dict(analysis_df["risk"].unstack().T.to_dict())
|
||||
# print out results
|
||||
pprint(f"The following are analysis results of benchmark return({analysis_freq}).")
|
||||
pprint(risk_analysis(report_normal["bench"], freq=analysis_freq))
|
||||
pprint(f"The following are analysis results of the excess return without cost({analysis_freq}).")
|
||||
pprint(analysis["excess_return_without_cost"])
|
||||
pprint(f"The following are analysis results of the excess return with cost({analysis_freq}).")
|
||||
pprint(analysis["excess_return_with_cost"])
|
||||
|
||||
|
||||
Result
|
||||
------------------
|
||||
|
||||
The backtest results are in the following form:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
risk
|
||||
excess_return_without_cost mean 0.000605
|
||||
std 0.005481
|
||||
annualized_return 0.152373
|
||||
information_ratio 1.751319
|
||||
max_drawdown -0.059055
|
||||
excess_return_with_cost mean 0.000410
|
||||
std 0.005478
|
||||
annualized_return 0.103265
|
||||
information_ratio 1.187411
|
||||
max_drawdown -0.075024
|
||||
|
||||
|
||||
- `excess_return_without_cost`
|
||||
- `mean`
|
||||
Mean value of the `CAR` (cumulative abnormal return) without cost
|
||||
- `std`
|
||||
The `Standard Deviation` of `CAR` (cumulative abnormal return) without cost.
|
||||
- `annualized_return`
|
||||
The `Annualized Rate` of `CAR` (cumulative abnormal return) without cost.
|
||||
- `information_ratio`
|
||||
The `Information Ratio` without cost. please refer to `Information Ratio – IR <https://www.investopedia.com/terms/i/informationratio.asp>`_.
|
||||
- `max_drawdown`
|
||||
The `Maximum Drawdown` of `CAR` (cumulative abnormal return) without cost, please refer to `Maximum Drawdown (MDD) <https://www.investopedia.com/terms/m/maximum-drawdown-mdd.asp>`_.
|
||||
|
||||
- `excess_return_with_cost`
|
||||
- `mean`
|
||||
Mean value of the `CAR` (cumulative abnormal return) series with cost
|
||||
- `std`
|
||||
The `Standard Deviation` of `CAR` (cumulative abnormal return) series with cost.
|
||||
- `annualized_return`
|
||||
The `Annualized Rate` of `CAR` (cumulative abnormal return) with cost.
|
||||
- `information_ratio`
|
||||
The `Information Ratio` with cost. please refer to `Information Ratio – IR <https://www.investopedia.com/terms/i/informationratio.asp>`_.
|
||||
- `max_drawdown`
|
||||
The `Maximum Drawdown` of `CAR` (cumulative abnormal return) with cost, please refer to `Maximum Drawdown (MDD) <https://www.investopedia.com/terms/m/maximum-drawdown-mdd.asp>`_.
|
||||
|
||||
To know more about ``Intraday Trading``, please refer to `Intraday Trading: Model&Strategy Testing <backtest.html>`_.
|
||||
|
||||
Reference
|
||||
===================
|
||||
To know more about ``Portfolio Strategy``, please refer to `Strategy API <../reference/api.html#module-qlib.contrib.strategy.strategy>`_.
|
||||
To know more about the `prediction score` `pred_score` output by ``Forecast Model``, please refer to `Forecast Model: Model Training & Prediction <model.html>`_.
|
||||
|
||||
@@ -124,9 +124,47 @@ Configuration File
|
||||
===================
|
||||
|
||||
Let's get into details of ``qrun`` in this section.
|
||||
|
||||
Before using ``qrun``, users need to prepare a configuration file. The following content shows how to prepare each part of the configuration file.
|
||||
|
||||
The design logic of the configuration file is very simple. It predefines fixed workflows and provide this yaml interface to users to define how to initialize each component.
|
||||
It follow the design of `init_instance_by_config <https://github.com/microsoft/qlib/blob/2aee9e0145decc3e71def70909639b5e5a6f4b58/qlib/utils/__init__.py#L264>`_ . It defines the initialization of each component of Qlib, which typically include the class and the initialization arguments.
|
||||
|
||||
For example, the following yaml and code are equivalent.
|
||||
|
||||
.. code-block:: YAML
|
||||
|
||||
model:
|
||||
class: LGBModel
|
||||
module_path: qlib.contrib.model.gbdt
|
||||
kwargs:
|
||||
loss: mse
|
||||
colsample_bytree: 0.8879
|
||||
learning_rate: 0.0421
|
||||
subsample: 0.8789
|
||||
lambda_l1: 205.6999
|
||||
lambda_l2: 580.9768
|
||||
max_depth: 8
|
||||
num_leaves: 210
|
||||
num_threads: 20
|
||||
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from qlib.contrib.model.gbdt import LGBModel
|
||||
kwargs = {
|
||||
"loss": "mse" ,
|
||||
"colsample_bytree": 0.8879,
|
||||
"learning_rate": 0.0421,
|
||||
"subsample": 0.8789,
|
||||
"lambda_l1": 205.6999,
|
||||
"lambda_l2": 580.9768,
|
||||
"max_depth": 8,
|
||||
"num_leaves": 210,
|
||||
"num_threads": 20,
|
||||
}
|
||||
LGBModel(kwargs)
|
||||
|
||||
|
||||
Qlib Init Section
|
||||
--------------------
|
||||
|
||||
|
||||
@@ -31,7 +31,7 @@ Let's see an example,
|
||||
|
||||
First make sure you have the latest version of `qlib` installed.
|
||||
|
||||
Then, you need to privide a configuration to setup the experiment.
|
||||
Then, you need to provide a configuration to setup the experiment.
|
||||
We write a simple configuration example as following,
|
||||
|
||||
.. code-block:: YAML
|
||||
@@ -217,13 +217,13 @@ The tuner pipeline contains different tuners, and the `tuner` program will proce
|
||||
Each part represents a tuner, and its modules which are to be tuned. Space in each part is the hyper-parameters' space of a certain module, you need to create your searching space and modify it in `/qlib/contrib/tuner/space.py`. We use `hyperopt` package to help us to construct the space, you can see the detail of how to use it in https://github.com/hyperopt/hyperopt/wiki/FMin .
|
||||
|
||||
- model
|
||||
You need to provide the `class` and the `space` of the model. If the model is user's own implementation, you need to privide the `module_path`.
|
||||
You need to provide the `class` and the `space` of the model. If the model is user's own implementation, you need to provide the `module_path`.
|
||||
|
||||
- trainer
|
||||
You need to proveide the `class` of the trainer. If the trainer is user's own implementation, you need to privide the `module_path`.
|
||||
You need to provide the `class` of the trainer. If the trainer is user's own implementation, you need to provide the `module_path`.
|
||||
|
||||
- strategy
|
||||
You need to provide the `class` and the `space` of the strategy. If the strategy is user's own implementation, you need to privide the `module_path`.
|
||||
You need to provide the `class` and the `space` of the strategy. If the strategy is user's own implementation, you need to provide the `module_path`.
|
||||
|
||||
- data_label
|
||||
The label of the data, you can search which kinds of labels will lead to a better result. This part is optional, and you only need to provide `space`.
|
||||
@@ -273,7 +273,7 @@ You need to use the same dataset to evaluate your different `estimator` experime
|
||||
About the data and backtest
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
`data` and `backtest` are all same in the whole `tuner` experiment. Different `estimator` experiments must use the same data and backtest method. So, these two parts of config are same with that in `estimator` configuration. You can see the precise defination of these parts in `estimator` introduction. We only provide an example here.
|
||||
`data` and `backtest` are all same in the whole `tuner` experiment. Different `estimator` experiments must use the same data and backtest method. So, these two parts of config are same with that in `estimator` configuration. You can see the precise definition of these parts in `estimator` introduction. We only provide an example here.
|
||||
|
||||
.. code-block:: YAML
|
||||
|
||||
|
||||
@@ -36,10 +36,11 @@ Document Structure
|
||||
:caption: COMPONENTS:
|
||||
|
||||
Workflow: Workflow Management <component/workflow.rst>
|
||||
Data Layer: Data Framework&Usage <component/data.rst>
|
||||
Data Layer: Data Framework & Usage <component/data.rst>
|
||||
Forecast Model: Model Training & Prediction <component/model.rst>
|
||||
Strategy: Portfolio Management <component/strategy.rst>
|
||||
Intraday Trading: Model&Strategy Testing <component/backtest.rst>
|
||||
Portfolio Management and Backtest <component/strategy.rst>
|
||||
Nested Decision Execution: High-Frequency Trading <component/highfreq.rst>
|
||||
Meta Controller: Meta-Task & Meta-Dataset & Meta-Model <component/meta.rst>
|
||||
Qlib Recorder: Experiment Management <component/recorder.rst>
|
||||
Analysis: Evaluation & Results Analysis <component/report.rst>
|
||||
Online Serving: Online Management & Strategy & Tool <component/online.rst>
|
||||
|
||||
@@ -34,9 +34,14 @@ Name Description
|
||||
|
||||
`Workflow` layer `Workflow` layer covers the whole workflow of quantitative investment.
|
||||
`Information Extractor` extracts data for models. `Forecast Model` focuses
|
||||
on producing all kinds of forecast signals (e.g. _alpha_, risk) for other
|
||||
modules. With these signals `Portfolio Generator` will generate the target
|
||||
portfolio and produce orders to be executed by `Order Executor`.
|
||||
on producing all kinds of forecast signals (e.g. *alpha*, risk) for other
|
||||
modules. With these signals `Decision Generator` will generate the target
|
||||
trading decisions(i.e. portfolio, orders) to be executed by `Execution Env`
|
||||
(i.e. the trading market). There may be multiple levels of `Trading Agent`
|
||||
and `Execution Env` (e.g. an *order executor trading agent and intraday
|
||||
order execution environment* could behave like an interday trading
|
||||
environment and nested in *daily portfolio management trading agent and
|
||||
interday trading environment* )
|
||||
|
||||
`Interface` layer `Interface` layer tries to present a user-friendly interface for the underlying
|
||||
system. `Analyser` module will provide users detailed analysis reports of
|
||||
|
||||
@@ -31,7 +31,7 @@ Users can easily intsall ``Qlib`` according to the following steps:
|
||||
git clone https://github.com/microsoft/qlib.git && cd qlib
|
||||
python setup.py install
|
||||
|
||||
To kown more about `installation`, please refer to `Qlib Installation <../start/installation.html>`_.
|
||||
To known more about `installation`, please refer to `Qlib Installation <../start/installation.html>`_.
|
||||
|
||||
Prepare Data
|
||||
==============
|
||||
@@ -44,7 +44,7 @@ Load and prepare data by running the following code:
|
||||
|
||||
This dataset is created by public data collected by crawler scripts in ``scripts/data_collector/``, which have been released in the same repository. Users could create the same dataset with it.
|
||||
|
||||
To kown more about `prepare data`, please refer to `Data Preparation <../component/data.html#data-preparation>`_.
|
||||
To known more about `prepare data`, please refer to `Data Preparation <../component/data.html#data-preparation>`_.
|
||||
|
||||
Auto Quant Research Workflow
|
||||
====================================
|
||||
|
||||
@@ -3,3 +3,4 @@ cmake
|
||||
numpy
|
||||
scipy
|
||||
scikit-learn
|
||||
pandas
|
||||
|
||||
@@ -27,7 +27,7 @@ Initialize Qlib before calling other APIs: run following code in python.
|
||||
|
||||
import qlib
|
||||
# region in [REG_CN, REG_US]
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
provider_uri = "~/.qlib/qlib_data/cn_data" # target_dir
|
||||
qlib.init(provider_uri=provider_uri, region=REG_CN)
|
||||
|
||||
@@ -42,12 +42,13 @@ Besides `provider_uri` and `region`, `qlib.init` has other parameters. The follo
|
||||
- `provider_uri`
|
||||
Type: str. The URI of the Qlib data. For example, it could be the location where the data loaded by ``get_data.py`` are stored.
|
||||
- `region`
|
||||
Type: str, optional parameter(default: `qlib.config.REG_CN`).
|
||||
Currently: ``qlib.config.REG_US`` ('us') and ``qlib.config.REG_CN`` ('cn') is supported. Different value of `region` will result in different stock market mode.
|
||||
- ``qlib.config.REG_US``: US stock market.
|
||||
- ``qlib.config.REG_CN``: China stock market.
|
||||
Type: str, optional parameter(default: `qlib.constant.REG_CN`).
|
||||
Currently: ``qlib.constant.REG_US`` ('us') and ``qlib.constant.REG_CN`` ('cn') is supported. Different value of `region` will result in different stock market mode.
|
||||
- ``qlib.constant.REG_US``: US stock market.
|
||||
- ``qlib.constant.REG_CN``: China stock market.
|
||||
|
||||
Different modes will result in different trading limitations and costs.
|
||||
The region is just `shortcuts for defining a batch of configurations <https://github.com/microsoft/qlib/blob/main/qlib/config.py#L239>`_. Users can set the key configurations manually if the existing region setting can't meet their requirements.
|
||||
- `redis_host`
|
||||
Type: str, optional parameter(default: "127.0.0.1"), host of `redis`
|
||||
The lock and cache mechanism relies on redis.
|
||||
|
||||
4
examples/benchmarks/ADARNN/README.md
Normal file
4
examples/benchmarks/ADARNN/README.md
Normal file
@@ -0,0 +1,4 @@
|
||||
# AdaRNN
|
||||
* Code: [https://github.com/jindongwang/transferlearning/tree/master/code/deep/adarnn](https://github.com/jindongwang/transferlearning/tree/master/code/deep/adarnn)
|
||||
* Paper: [AdaRNN: Adaptive Learning and Forecasting for Time Series](https://arxiv.org/pdf/2108.04443.pdf).
|
||||
|
||||
4
examples/benchmarks/ADARNN/requirements.txt
Normal file
4
examples/benchmarks/ADARNN/requirements.txt
Normal file
@@ -0,0 +1,4 @@
|
||||
pandas==1.1.2
|
||||
numpy==1.21.0
|
||||
scikit_learn==0.23.2
|
||||
torch==1.7.0
|
||||
@@ -0,0 +1,88 @@
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
start_time: 2008-01-01
|
||||
end_time: 2020-08-01
|
||||
fit_start_time: 2008-01-01
|
||||
fit_end_time: 2014-12-31
|
||||
instruments: *market
|
||||
infer_processors:
|
||||
- class: RobustZScoreNorm
|
||||
kwargs:
|
||||
fields_group: feature
|
||||
clip_outlier: true
|
||||
- class: Fillna
|
||||
kwargs:
|
||||
fields_group: feature
|
||||
learn_processors:
|
||||
- class: DropnaLabel
|
||||
- class: CSRankNorm
|
||||
kwargs:
|
||||
fields_group: label
|
||||
label: ["Ref($close, -2) / Ref($close, -1) - 1"]
|
||||
port_analysis_config: &port_analysis_config
|
||||
strategy:
|
||||
class: TopkDropoutStrategy
|
||||
module_path: qlib.contrib.strategy
|
||||
kwargs:
|
||||
model: <MODEL>
|
||||
dataset: <DATASET>
|
||||
topk: 50
|
||||
n_drop: 5
|
||||
backtest:
|
||||
start_time: 2017-01-01
|
||||
end_time: 2020-08-01
|
||||
account: 100000000
|
||||
benchmark: *benchmark
|
||||
exchange_kwargs:
|
||||
limit_threshold: 0.095
|
||||
deal_price: close
|
||||
open_cost: 0.0005
|
||||
close_cost: 0.0015
|
||||
min_cost: 5
|
||||
task:
|
||||
model:
|
||||
class: ADARNN
|
||||
module_path: qlib.contrib.model.pytorch_adarnn
|
||||
kwargs:
|
||||
d_feat: 6
|
||||
hidden_size: 64
|
||||
num_layers: 2
|
||||
dropout: 0.0
|
||||
n_epochs: 200
|
||||
lr: 1e-3
|
||||
early_stop: 20
|
||||
batch_size: 800
|
||||
metric: loss
|
||||
loss: mse
|
||||
GPU: 0
|
||||
dataset:
|
||||
class: DatasetH
|
||||
module_path: qlib.data.dataset
|
||||
kwargs:
|
||||
handler:
|
||||
class: Alpha360
|
||||
module_path: qlib.contrib.data.handler
|
||||
kwargs: *data_handler_config
|
||||
segments:
|
||||
train: [2008-01-01, 2014-12-31]
|
||||
valid: [2015-01-01, 2016-12-31]
|
||||
test: [2017-01-01, 2020-08-01]
|
||||
record:
|
||||
- class: SignalRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
model: <MODEL>
|
||||
dataset: <DATASET>
|
||||
- class: SigAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
ana_long_short: False
|
||||
ann_scaler: 252
|
||||
- class: PortAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
config: *port_analysis_config
|
||||
3
examples/benchmarks/ADD/README.md
Normal file
3
examples/benchmarks/ADD/README.md
Normal file
@@ -0,0 +1,3 @@
|
||||
# ADD
|
||||
* Paper: [ADD: Augmented Disentanglement Distillation Framework for Improving Stock Trend Forecasting](https://arxiv.org/abs/2012.06289).
|
||||
|
||||
4
examples/benchmarks/ADD/requirements.txt
Normal file
4
examples/benchmarks/ADD/requirements.txt
Normal file
@@ -0,0 +1,4 @@
|
||||
numpy==1.21.0
|
||||
pandas==1.1.2
|
||||
scikit_learn==0.23.2
|
||||
torch==1.7.0
|
||||
94
examples/benchmarks/ADD/workflow_config_add_Alpha360.yaml
Normal file
94
examples/benchmarks/ADD/workflow_config_add_Alpha360.yaml
Normal file
@@ -0,0 +1,94 @@
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
start_time: 2008-01-01
|
||||
end_time: 2020-08-01
|
||||
fit_start_time: 2008-01-01
|
||||
fit_end_time: 2014-12-31
|
||||
instruments: *market
|
||||
infer_processors:
|
||||
- class: RobustZScoreNorm
|
||||
kwargs:
|
||||
fields_group: feature
|
||||
clip_outlier: true
|
||||
- class: Fillna
|
||||
kwargs:
|
||||
fields_group: feature
|
||||
learn_processors:
|
||||
- class: DropnaLabel
|
||||
- class: CSRankNorm
|
||||
kwargs:
|
||||
fields_group: label
|
||||
label: ["Ref($close, -2) / Ref($close, -1) - 1"]
|
||||
port_analysis_config: &port_analysis_config
|
||||
strategy:
|
||||
class: TopkDropoutStrategy
|
||||
module_path: qlib.contrib.strategy
|
||||
kwargs:
|
||||
signal:
|
||||
- <MODEL>
|
||||
- <DATASET>
|
||||
topk: 50
|
||||
n_drop: 5
|
||||
backtest:
|
||||
start_time: 2017-01-01
|
||||
end_time: 2020-08-01
|
||||
account: 100000000
|
||||
benchmark: *benchmark
|
||||
exchange_kwargs:
|
||||
limit_threshold: 0.095
|
||||
deal_price: close
|
||||
open_cost: 0.0005
|
||||
close_cost: 0.0015
|
||||
min_cost: 5
|
||||
task:
|
||||
model:
|
||||
class: ADD
|
||||
module_path: qlib.contrib.model.pytorch_add
|
||||
kwargs:
|
||||
d_feat: 6
|
||||
hidden_size: 64
|
||||
num_layers: 2
|
||||
dropout: 0.1
|
||||
dec_dropout: 0.0
|
||||
n_epochs: 200
|
||||
lr: 1e-3
|
||||
early_stop: 20
|
||||
batch_size: 5000
|
||||
metric: ic
|
||||
base_model: GRU
|
||||
gamma: 0.1
|
||||
gamma_clip: 0.2
|
||||
optimizer: adam
|
||||
mu: 0.2
|
||||
GPU: 0
|
||||
dataset:
|
||||
class: DatasetH
|
||||
module_path: qlib.data.dataset
|
||||
kwargs:
|
||||
handler:
|
||||
class: Alpha360
|
||||
module_path: qlib.contrib.data.handler
|
||||
kwargs: *data_handler_config
|
||||
segments:
|
||||
train: [2008-01-01, 2014-12-31]
|
||||
valid: [2015-01-01, 2016-12-31]
|
||||
test: [2017-01-01, 2020-08-01]
|
||||
record:
|
||||
- class: SignalRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
model: <MODEL>
|
||||
dataset: <DATASET>
|
||||
- class: SigAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
ana_long_short: False
|
||||
ann_scaler: 252
|
||||
- class: PortAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
config: *port_analysis_config
|
||||
@@ -1,4 +1,4 @@
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
pandas==1.1.2
|
||||
scikit_learn==0.23.2
|
||||
torch==1.7.0
|
||||
|
||||
@@ -1,3 +1,3 @@
|
||||
pandas==1.1.2
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
catboost==0.24.3
|
||||
|
||||
@@ -1,3 +1,3 @@
|
||||
pandas==1.1.2
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
lightgbm==3.1.0
|
||||
@@ -1,4 +1,4 @@
|
||||
pandas==1.1.2
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
scikit_learn==0.23.2
|
||||
torch==1.7.0
|
||||
|
||||
2
examples/benchmarks/GRU/README.md
Normal file
2
examples/benchmarks/GRU/README.md
Normal file
@@ -0,0 +1,2 @@
|
||||
# Gated Recurrent Unit (GRU)
|
||||
* Paper: [Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation](https://aclanthology.org/D14-1179.pdf).
|
||||
@@ -1,4 +1,4 @@
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
pandas==1.1.2
|
||||
scikit_learn==0.23.2
|
||||
torch==1.7.0
|
||||
|
||||
2
examples/benchmarks/LSTM/README.md
Normal file
2
examples/benchmarks/LSTM/README.md
Normal file
@@ -0,0 +1,2 @@
|
||||
# Long Short-Term Memory (LSTM)
|
||||
* Paper: [Long Short-Term Memory](https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory?redirectedFrom=fulltext).
|
||||
@@ -1,4 +1,4 @@
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
pandas==1.1.2
|
||||
scikit_learn==0.23.2
|
||||
torch==1.7.0
|
||||
|
||||
@@ -1,3 +1,3 @@
|
||||
pandas==1.1.2
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
lightgbm==3.1.0
|
||||
|
||||
@@ -22,7 +22,6 @@ data_handler_config: &data_handler_config
|
||||
- class: CSRankNorm
|
||||
kwargs:
|
||||
fields_group: label
|
||||
label: ["Ref($close, -2) / Ref($close, -1) - 1"]
|
||||
port_analysis_config: &port_analysis_config
|
||||
strategy:
|
||||
class: TopkDropoutStrategy
|
||||
|
||||
1
examples/benchmarks/Localformer/README.md
Normal file
1
examples/benchmarks/Localformer/README.md
Normal file
@@ -0,0 +1 @@
|
||||
# Localformer
|
||||
@@ -1,3 +1,3 @@
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
pandas==1.1.2
|
||||
torch==1.2.0
|
||||
1
examples/benchmarks/MLP/README.md
Normal file
1
examples/benchmarks/MLP/README.md
Normal file
@@ -0,0 +1 @@
|
||||
# Multi-Layer Perceptron (MLP)
|
||||
@@ -1,4 +1,4 @@
|
||||
pandas==1.1.2
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
scikit_learn==0.23.2
|
||||
torch==1.7.0
|
||||
|
||||
@@ -1,10 +1,15 @@
|
||||
# Benchmarks Performance
|
||||
This page lists a batch of methods designed for alpha seeking. Each method tries to give scores/predictions for all stocks each day(e.g. forecasting the future excess return of stocks). The scores/predictions of the models will be used as the mined alpha. Investing in stocks with higher scores is expected to yield more profit.
|
||||
|
||||
The alpha is evaluated in two ways.
|
||||
1. The correlation between the alpha and future return.
|
||||
1. Constructing portfolio based on the alpha and evaluating the final total return.
|
||||
|
||||
Here are the results of each benchmark model running on Qlib's `Alpha360` and `Alpha158` dataset with China's A shared-stock & CSI300 data respectively. The values of each metric are the mean and std calculated based on 20 runs with different random seeds.
|
||||
|
||||
The numbers shown below demonstrate the performance of the entire `workflow` of each model. We will update the `workflow` as well as models in the near future for better results.
|
||||
<!--
|
||||
> If you need to reproduce the results below, please use the **v1** dataset: `python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/qlib_cn_1d --region cn --version v1`
|
||||
> If you need to reproduce the results below, please use the **v1** dataset: `python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn --version v1`
|
||||
>
|
||||
> In the new version of qlib, the default dataset is **v2**. Since the data is collected from the YahooFinance API (which is not very stable), the results of *v2* and *v1* may differ -->
|
||||
|
||||
@@ -16,6 +21,7 @@ The numbers shown below demonstrate the performance of the entire `workflow` of
|
||||
|
||||
| Model Name | Dataset | IC | ICIR | Rank IC | Rank ICIR | Annualized Return | Information Ratio | Max Drawdown |
|
||||
|------------------------------------------|-------------------------------------|-------------|-------------|-------------|-------------|-------------------|-------------------|--------------|
|
||||
| TCN(Shaojie Bai, et al.) | Alpha158 | 0.0275±0.00 | 0.2157±0.01 | 0.0411±0.00 | 0.3379±0.01 | 0.0190±0.02 | 0.2887±0.27 | -0.1202±0.03 |
|
||||
| TabNet(Sercan O. Arik, et al.) | Alpha158 | 0.0204±0.01 | 0.1554±0.07 | 0.0333±0.00 | 0.2552±0.05 | 0.0227±0.04 | 0.3676±0.54 | -0.1089±0.08 |
|
||||
| Transformer(Ashish Vaswani, et al.) | Alpha158 | 0.0264±0.00 | 0.2053±0.02 | 0.0407±0.00 | 0.3273±0.02 | 0.0273±0.02 | 0.3970±0.26 | -0.1101±0.02 |
|
||||
| GRU(Kyunghyun Cho, et al.) | Alpha158(with selected 20 features) | 0.0315±0.00 | 0.2450±0.04 | 0.0428±0.00 | 0.3440±0.03 | 0.0344±0.02 | 0.5160±0.25 | -0.1017±0.02 |
|
||||
@@ -35,7 +41,6 @@ The numbers shown below demonstrate the performance of the entire `workflow` of
|
||||
| DoubleEnsemble(Chuheng Zhang, et al.) | Alpha158 | 0.0544±0.00 | 0.4340±0.00 | 0.0523±0.00 | 0.4284±0.01 | 0.1168±0.01 | 1.3384±0.12 | -0.1036±0.01 |
|
||||
|
||||
|
||||
|
||||
## Alpha360 dataset
|
||||
|
||||
| Model Name | Dataset | IC | ICIR | Rank IC | Rank ICIR | Annualized Return | Information Ratio | Max Drawdown |
|
||||
@@ -48,9 +53,12 @@ The numbers shown below demonstrate the performance of the entire `workflow` of
|
||||
| XGBoost(Tianqi Chen, et al.) | Alpha360 | 0.0394±0.00 | 0.2909±0.00 | 0.0448±0.00 | 0.3679±0.00 | 0.0344±0.00 | 0.4527±0.02 | -0.1004±0.00 |
|
||||
| DoubleEnsemble(Chuheng Zhang, et al.) | Alpha360 | 0.0404±0.00 | 0.3023±0.00 | 0.0495±0.00 | 0.3898±0.00 | 0.0468±0.01 | 0.6302±0.20 | -0.0860±0.01 |
|
||||
| LightGBM(Guolin Ke, et al.) | Alpha360 | 0.0400±0.00 | 0.3037±0.00 | 0.0499±0.00 | 0.4042±0.00 | 0.0558±0.00 | 0.7632±0.00 | -0.0659±0.00 |
|
||||
| TCN(Shaojie Bai, et al.) | Alpha360 | 0.0441±0.00 | 0.3301±0.02 | 0.0519±0.00 | 0.4130±0.01 | 0.0604±0.02 | 0.8295±0.34 | -0.1018±0.03 |
|
||||
| ALSTM (Yao Qin, et al.) | Alpha360 | 0.0497±0.00 | 0.3829±0.04 | 0.0599±0.00 | 0.4736±0.03 | 0.0626±0.02 | 0.8651±0.31 | -0.0994±0.03 |
|
||||
| LSTM(Sepp Hochreiter, et al.) | Alpha360 | 0.0448±0.00 | 0.3474±0.04 | 0.0549±0.00 | 0.4366±0.03 | 0.0647±0.03 | 0.8963±0.39 | -0.0875±0.02 |
|
||||
| ADD | Alpha360 | 0.0430±0.00 | 0.3188±0.04 | 0.0559±0.00 | 0.4301±0.03 | 0.0667±0.02 | 0.8992±0.34 | -0.0855±0.02 |
|
||||
| GRU(Kyunghyun Cho, et al.) | Alpha360 | 0.0493±0.00 | 0.3772±0.04 | 0.0584±0.00 | 0.4638±0.03 | 0.0720±0.02 | 0.9730±0.33 | -0.0821±0.02 |
|
||||
| AdaRNN(Yuntao Du, et al.) | Alpha360 | 0.0464±0.01 | 0.3619±0.08 | 0.0539±0.01 | 0.4287±0.06 | 0.0753±0.03 | 1.0200±0.40 | -0.0936±0.03 |
|
||||
| GATs (Petar Velickovic, et al.) | Alpha360 | 0.0476±0.00 | 0.3508±0.02 | 0.0598±0.00 | 0.4604±0.01 | 0.0824±0.02 | 1.1079±0.26 | -0.0894±0.03 |
|
||||
| TCTS(Xueqing Wu, et al.) | Alpha360 | 0.0508±0.00 | 0.3931±0.04 | 0.0599±0.00 | 0.4756±0.03 | 0.0893±0.03 | 1.2256±0.36 | -0.0857±0.02 |
|
||||
| TRA(Hengxu Lin, et al.) | Alpha360 | 0.0485±0.00 | 0.3787±0.03 | 0.0587±0.00 | 0.4756±0.03 | 0.0920±0.03 | 1.2789±0.42 | -0.0834±0.02 |
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
pandas==1.1.2
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
scikit_learn==0.23.2
|
||||
torch==1.7.0
|
||||
|
||||
4
examples/benchmarks/TCN/README.md
Normal file
4
examples/benchmarks/TCN/README.md
Normal file
@@ -0,0 +1,4 @@
|
||||
# TCN
|
||||
* Code: [https://github.com/locuslab/TCN](https://github.com/locuslab/TCN)
|
||||
* Paper: [An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling](https://arxiv.org/abs/1803.01271).
|
||||
|
||||
4
examples/benchmarks/TCN/requirements.txt
Normal file
4
examples/benchmarks/TCN/requirements.txt
Normal file
@@ -0,0 +1,4 @@
|
||||
numpy==1.21.0
|
||||
pandas==1.1.2
|
||||
scikit_learn==0.23.2
|
||||
torch==1.7.0
|
||||
100
examples/benchmarks/TCN/workflow_config_tcn_Alpha158.yaml
Executable file
100
examples/benchmarks/TCN/workflow_config_tcn_Alpha158.yaml
Executable file
@@ -0,0 +1,100 @@
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
start_time: 2008-01-01
|
||||
end_time: 2020-08-01
|
||||
fit_start_time: 2008-01-01
|
||||
fit_end_time: 2014-12-31
|
||||
instruments: *market
|
||||
infer_processors:
|
||||
- class: FilterCol
|
||||
kwargs:
|
||||
fields_group: feature
|
||||
col_list: ["RESI5", "WVMA5", "RSQR5", "KLEN", "RSQR10", "CORR5", "CORD5", "CORR10",
|
||||
"ROC60", "RESI10", "VSTD5", "RSQR60", "CORR60", "WVMA60", "STD5",
|
||||
"RSQR20", "CORD60", "CORD10", "CORR20", "KLOW"
|
||||
]
|
||||
- class: RobustZScoreNorm
|
||||
kwargs:
|
||||
fields_group: feature
|
||||
clip_outlier: true
|
||||
- class: Fillna
|
||||
kwargs:
|
||||
fields_group: feature
|
||||
learn_processors:
|
||||
- class: DropnaLabel
|
||||
- class: CSRankNorm
|
||||
kwargs:
|
||||
fields_group: label
|
||||
label: ["Ref($close, -2) / Ref($close, -1) - 1"]
|
||||
|
||||
port_analysis_config: &port_analysis_config
|
||||
strategy:
|
||||
class: TopkDropoutStrategy
|
||||
module_path: qlib.contrib.strategy
|
||||
kwargs:
|
||||
model: <MODEL>
|
||||
dataset: <DATASET>
|
||||
topk: 50
|
||||
n_drop: 5
|
||||
backtest:
|
||||
start_time: 2017-01-01
|
||||
end_time: 2020-08-01
|
||||
account: 100000000
|
||||
benchmark: *benchmark
|
||||
exchange_kwargs:
|
||||
limit_threshold: 0.095
|
||||
deal_price: close
|
||||
open_cost: 0.0005
|
||||
close_cost: 0.0015
|
||||
min_cost: 5
|
||||
task:
|
||||
model:
|
||||
class: TCN
|
||||
module_path: qlib.contrib.model.pytorch_tcn_ts
|
||||
kwargs:
|
||||
d_feat: 20
|
||||
num_layers: 5
|
||||
n_chans: 32
|
||||
kernel_size: 7
|
||||
dropout: 0.5
|
||||
n_epochs: 200
|
||||
lr: 1e-4
|
||||
early_stop: 20
|
||||
batch_size: 2000
|
||||
metric: loss
|
||||
loss: mse
|
||||
optimizer: adam
|
||||
n_jobs: 20
|
||||
GPU: 0
|
||||
dataset:
|
||||
class: TSDatasetH
|
||||
module_path: qlib.data.dataset
|
||||
kwargs:
|
||||
handler:
|
||||
class: Alpha158
|
||||
module_path: qlib.contrib.data.handler
|
||||
kwargs: *data_handler_config
|
||||
segments:
|
||||
train: [2008-01-01, 2014-12-31]
|
||||
valid: [2015-01-01, 2016-12-31]
|
||||
test: [2017-01-01, 2020-08-01]
|
||||
step_len: 20
|
||||
record:
|
||||
- class: SignalRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
model: <MODEL>
|
||||
dataset: <DATASET>
|
||||
- class: SigAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
ana_long_short: False
|
||||
ann_scaler: 252
|
||||
- class: PortAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
config: *port_analysis_config
|
||||
90
examples/benchmarks/TCN/workflow_config_tcn_Alpha360.yaml
Normal file
90
examples/benchmarks/TCN/workflow_config_tcn_Alpha360.yaml
Normal file
@@ -0,0 +1,90 @@
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
start_time: 2008-01-01
|
||||
end_time: 2020-08-01
|
||||
fit_start_time: 2008-01-01
|
||||
fit_end_time: 2014-12-31
|
||||
instruments: *market
|
||||
infer_processors:
|
||||
- class: RobustZScoreNorm
|
||||
kwargs:
|
||||
fields_group: feature
|
||||
clip_outlier: true
|
||||
- class: Fillna
|
||||
kwargs:
|
||||
fields_group: feature
|
||||
learn_processors:
|
||||
- class: DropnaLabel
|
||||
- class: CSRankNorm
|
||||
kwargs:
|
||||
fields_group: label
|
||||
label: ["Ref($close, -2) / Ref($close, -1) - 1"]
|
||||
port_analysis_config: &port_analysis_config
|
||||
strategy:
|
||||
class: TopkDropoutStrategy
|
||||
module_path: qlib.contrib.strategy
|
||||
kwargs:
|
||||
model: <MODEL>
|
||||
dataset: <DATASET>
|
||||
topk: 50
|
||||
n_drop: 5
|
||||
backtest:
|
||||
start_time: 2017-01-01
|
||||
end_time: 2020-08-01
|
||||
account: 100000000
|
||||
benchmark: *benchmark
|
||||
exchange_kwargs:
|
||||
limit_threshold: 0.095
|
||||
deal_price: close
|
||||
open_cost: 0.0005
|
||||
close_cost: 0.0015
|
||||
min_cost: 5
|
||||
task:
|
||||
model:
|
||||
class: TCN
|
||||
module_path: qlib.contrib.model.pytorch_tcn
|
||||
kwargs:
|
||||
d_feat: 6
|
||||
num_layers: 5
|
||||
n_chans: 128
|
||||
kernel_size: 3
|
||||
dropout: 0.5
|
||||
n_epochs: 200
|
||||
lr: 1e-3
|
||||
early_stop: 20
|
||||
batch_size: 2000
|
||||
metric: loss
|
||||
loss: mse
|
||||
optimizer: adam
|
||||
GPU: 0
|
||||
dataset:
|
||||
class: DatasetH
|
||||
module_path: qlib.data.dataset
|
||||
kwargs:
|
||||
handler:
|
||||
class: Alpha360
|
||||
module_path: qlib.contrib.data.handler
|
||||
kwargs: *data_handler_config
|
||||
segments:
|
||||
train: [2008-01-01, 2014-12-31]
|
||||
valid: [2015-01-01, 2016-12-31]
|
||||
test: [2017-01-01, 2020-08-01]
|
||||
record:
|
||||
- class: SignalRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
model: <MODEL>
|
||||
dataset: <DATASET>
|
||||
- class: SigAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
ana_long_short: False
|
||||
ann_scaler: 252
|
||||
- class: PortAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
config: *port_analysis_config
|
||||
@@ -1,4 +1,4 @@
|
||||
pandas==1.1.2
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
scikit_learn==0.23.2
|
||||
torch==1.7.0
|
||||
@@ -95,4 +95,4 @@ task:
|
||||
- class: PortAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
config: *port_analysis_config
|
||||
config: *port_analysis_config
|
||||
@@ -32,7 +32,7 @@ import abc
|
||||
import enum
|
||||
|
||||
|
||||
# Type defintions
|
||||
# Type definitions
|
||||
class DataTypes(enum.IntEnum):
|
||||
"""Defines numerical types of each column."""
|
||||
|
||||
|
||||
@@ -254,9 +254,9 @@ class DistributedHyperparamOptManager(HyperparamOptManager):
|
||||
param_ranges: Discrete hyperparameter range for random search.
|
||||
fixed_params: Fixed model parameters per experiment.
|
||||
root_model_folder: Folder to store optimisation artifacts.
|
||||
worker_number: Worker index definining which set of hyperparameters to
|
||||
worker_number: Worker index defining which set of hyperparameters to
|
||||
test.
|
||||
search_iterations: Maximum numer of random search iterations.
|
||||
search_iterations: Maximum number of random search iterations.
|
||||
num_iterations_per_worker: How many iterations are handled per worker.
|
||||
clear_serialised_params: Whether to regenerate hyperparameter
|
||||
combinations.
|
||||
@@ -330,7 +330,7 @@ class DistributedHyperparamOptManager(HyperparamOptManager):
|
||||
if os.path.exists(self.serialised_ranges_folder):
|
||||
df = pd.read_csv(self.serialised_ranges_path, index_col=0)
|
||||
else:
|
||||
print("Unable to load - regenerating serach ranges instead")
|
||||
print("Unable to load - regenerating search ranges instead")
|
||||
df = self.update_serialised_hyperparam_df()
|
||||
|
||||
return df
|
||||
|
||||
@@ -342,7 +342,7 @@ class TFTDataCache:
|
||||
|
||||
@classmethod
|
||||
def contains(cls, key):
|
||||
"""Retuns boolean indicating whether key is present in cache."""
|
||||
"""Returns boolean indicating whether key is present in cache."""
|
||||
|
||||
return key in cls._data_cache
|
||||
|
||||
@@ -1120,10 +1120,10 @@ class TemporalFusionTransformer:
|
||||
Args:
|
||||
df: Input dataframe
|
||||
return_targets: Whether to also return outputs aligned with predictions to
|
||||
faciliate evaluation
|
||||
facilitate evaluation
|
||||
|
||||
Returns:
|
||||
Input dataframe or tuple of (input dataframe, algined output dataframe).
|
||||
Input dataframe or tuple of (input dataframe, aligned output dataframe).
|
||||
"""
|
||||
|
||||
data = self._batch_data(df)
|
||||
|
||||
@@ -209,7 +209,6 @@ class TFTModel(ModelFT):
|
||||
fixed_params = self.data_formatter.get_experiment_params()
|
||||
params = self.data_formatter.get_default_model_params()
|
||||
|
||||
# Wendi: 合并调优的参数和非调优的参数
|
||||
params = {**params, **fixed_params}
|
||||
|
||||
if not os.path.exists(self.model_folder):
|
||||
@@ -295,7 +294,7 @@ class TFTModel(ModelFT):
|
||||
def to_pickle(self, path: Union[Path, str]):
|
||||
"""
|
||||
Tensorflow model can't be dumped directly.
|
||||
So the data should be save seperatedly
|
||||
So the data should be save separately
|
||||
|
||||
**TODO**: Please implement the function to load the files
|
||||
|
||||
|
||||
@@ -57,7 +57,7 @@ And here are two ways to run the model:
|
||||
python example.py --config_file configs/config_alstm.yaml
|
||||
```
|
||||
|
||||
Here we trained TRA on a pretrained backbone model. Therefore we run `*_init.yaml` before TRA's scipts.
|
||||
Here we trained TRA on a pretrained backbone model. Therefore we run `*_init.yaml` before TRA's scripts.
|
||||
|
||||
### Results
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
pandas==1.1.2
|
||||
numpy==1.17.4
|
||||
numpy==1.21.0
|
||||
scikit_learn==0.23.2
|
||||
torch==1.7.0
|
||||
seaborn
|
||||
|
||||
@@ -124,7 +124,7 @@ class TRAModel(Model):
|
||||
loss = (pred - label).pow(2).mean()
|
||||
|
||||
L = (all_preds.detach() - label[:, None]).pow(2)
|
||||
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure postive input
|
||||
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure positive input
|
||||
|
||||
data_set.assign_data(index, L) # save loss to memory
|
||||
|
||||
@@ -165,7 +165,7 @@ class TRAModel(Model):
|
||||
|
||||
L = (all_preds - label[:, None]).pow(2)
|
||||
|
||||
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure postive input
|
||||
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure positive input
|
||||
|
||||
data_set.assign_data(index, L) # save loss to memory
|
||||
|
||||
@@ -484,7 +484,7 @@ class TRA(nn.Module):
|
||||
|
||||
"""Temporal Routing Adaptor (TRA)
|
||||
|
||||
TRA takes historical prediction erros & latent representation as inputs,
|
||||
TRA takes historical prediction errors & latent representation as inputs,
|
||||
then routes the input sample to a specific predictor for training & inference.
|
||||
|
||||
Args:
|
||||
|
||||
3
examples/benchmarks/TabNet/README.md
Normal file
3
examples/benchmarks/TabNet/README.md
Normal file
@@ -0,0 +1,3 @@
|
||||
# TabNet
|
||||
* Code: [https://github.com/dreamquark-ai/tabnet](https://github.com/dreamquark-ai/tabnet)
|
||||
* Paper: [TabNet: Attentive Interpretable Tabular Learning](https://arxiv.org/pdf/1908.07442.pdf).
|
||||
3
examples/benchmarks/Transformer/README.md
Normal file
3
examples/benchmarks/Transformer/README.md
Normal file
@@ -0,0 +1,3 @@
|
||||
# Transformer
|
||||
* Code: [https://github.com/tensorflow/tensor2tensor](https://github.com/tensorflow/tensor2tensor)
|
||||
* Paper: [Attention is All you Need](https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf).
|
||||
30
examples/benchmarks_dynamic/DDG-DA/README.md
Normal file
30
examples/benchmarks_dynamic/DDG-DA/README.md
Normal file
@@ -0,0 +1,30 @@
|
||||
# Introduction
|
||||
This is the implementation of `DDG-DA` based on `Meta Controller` component provided by `Qlib`.
|
||||
|
||||
Please refer to the paper for more details: *DDG-DA: Data Distribution Generation for Predictable Concept Drift Adaptation* [[arXiv](https://arxiv.org/abs/2201.04038)]
|
||||
|
||||
|
||||
## Background
|
||||
In many real-world scenarios, we often deal with streaming data that is sequentially collected over time. Due to the non-stationary nature of the environment, the streaming data distribution may change in unpredictable ways, which is known as concept drift. To handle concept drift, previous methods first detect when/where the concept drift happens and then adapt models to fit the distribution of the latest data. However, there are still many cases that some underlying factors of environment evolution are predictable, making it possible to model the future concept drift trend of the streaming data, while such cases are not fully explored in previous work.
|
||||
|
||||
Therefore, we propose a novel method `DDG-DA`, that can effectively forecast the evolution of data distribution and improve the performance of models. Specifically, we first train a predictor to estimate the future data distribution, then leverage it to generate training samples, and finally train models on the generated data.
|
||||
|
||||
## Dataset
|
||||
The data in the paper are private. So we conduct experiments on Qlib's public dataset.
|
||||
Though the dataset is different, the conclusion remains the same. By applying `DDG-DA`, users can see rising trends at the test phase both in the proxy models' ICs and the performances of the forecasting models.
|
||||
|
||||
## Run the Code
|
||||
Users can try `DDG-DA` by running the following command:
|
||||
```bash
|
||||
python workflow.py run_all
|
||||
```
|
||||
|
||||
The default forecasting models are `Linear`. Users can choose other forecasting models by changing the `forecast_model` parameter when `DDG-DA` initializes. For example, users can try `LightGBM` forecasting models by running the following command:
|
||||
```bash
|
||||
python workflow.py --forecast_model="gbdt" run_all
|
||||
```
|
||||
|
||||
|
||||
## Results
|
||||
|
||||
The results of related methods in Qlib's public dataset can be found [here](../)
|
||||
1
examples/benchmarks_dynamic/DDG-DA/requirements.txt
Normal file
1
examples/benchmarks_dynamic/DDG-DA/requirements.txt
Normal file
@@ -0,0 +1 @@
|
||||
torch==1.10.0
|
||||
261
examples/benchmarks_dynamic/DDG-DA/workflow.py
Normal file
261
examples/benchmarks_dynamic/DDG-DA/workflow.py
Normal file
@@ -0,0 +1,261 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
from pathlib import Path
|
||||
from qlib.model.meta.task import MetaTask
|
||||
from qlib.contrib.meta.data_selection.model import MetaModelDS
|
||||
from qlib.contrib.meta.data_selection.dataset import InternalData, MetaDatasetDS
|
||||
from qlib.data.dataset.handler import DataHandlerLP
|
||||
|
||||
import pandas as pd
|
||||
import fire
|
||||
import sys
|
||||
from tqdm.auto import tqdm
|
||||
import yaml
|
||||
import pickle
|
||||
from qlib import auto_init
|
||||
from qlib.model.trainer import TrainerR, task_train
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.workflow.task.gen import RollingGen, task_generator
|
||||
from qlib.workflow import R
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
DIRNAME = Path(__file__).absolute().resolve().parent
|
||||
sys.path.append(str(DIRNAME.parent / "baseline"))
|
||||
from rolling_benchmark import RollingBenchmark # NOTE: sys.path is changed for import RollingBenchmark
|
||||
|
||||
|
||||
class DDGDA:
|
||||
"""
|
||||
please run `python workflow.py run_all` to run the full workflow of the experiment
|
||||
|
||||
**NOTE**
|
||||
before running the example, please clean your previous results with following command
|
||||
- `rm -r mlruns`
|
||||
"""
|
||||
|
||||
def __init__(self, sim_task_model="linear", forecast_model="linear"):
|
||||
self.step = 20
|
||||
# NOTE:
|
||||
# the horizon must match the meaning in the base task template
|
||||
self.horizon = 20
|
||||
self.meta_exp_name = "DDG-DA"
|
||||
self.sim_task_model = sim_task_model # The model to capture the distribution of data.
|
||||
self.forecast_model = forecast_model # downstream forecasting models' type
|
||||
|
||||
def get_feature_importance(self):
|
||||
# this must be lightGBM, because it needs to get the feature importance
|
||||
rb = RollingBenchmark(model_type="gbdt")
|
||||
task = rb.basic_task()
|
||||
|
||||
model = init_instance_by_config(task["model"])
|
||||
dataset = init_instance_by_config(task["dataset"])
|
||||
model.fit(dataset)
|
||||
|
||||
fi = model.get_feature_importance()
|
||||
|
||||
# Because the model use numpy instead of dataframe for training lightgbm
|
||||
# So the we must use following extra steps to get the right feature importance
|
||||
df = dataset.prepare(segments=slice(None), col_set="feature", data_key=DataHandlerLP.DK_R)
|
||||
cols = df.columns
|
||||
fi_named = {cols[int(k.split("_")[1])]: imp for k, imp in fi.to_dict().items()}
|
||||
|
||||
return pd.Series(fi_named)
|
||||
|
||||
def dump_data_for_proxy_model(self):
|
||||
"""
|
||||
Dump data for training meta model.
|
||||
The meta model will be trained upon the proxy forecasting model.
|
||||
This dataset is for the proxy forecasting model.
|
||||
"""
|
||||
topk = 30
|
||||
fi = self.get_feature_importance()
|
||||
col_selected = fi.nlargest(topk)
|
||||
|
||||
rb = RollingBenchmark(model_type=self.sim_task_model)
|
||||
task = rb.basic_task()
|
||||
dataset = init_instance_by_config(task["dataset"])
|
||||
prep_ds = dataset.prepare(slice(None), col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
|
||||
|
||||
feature_df = prep_ds["feature"]
|
||||
label_df = prep_ds["label"]
|
||||
|
||||
feature_selected = feature_df.loc[:, col_selected.index]
|
||||
|
||||
feature_selected = feature_selected.groupby("datetime").apply(lambda df: (df - df.mean()).div(df.std()))
|
||||
feature_selected = feature_selected.fillna(0.0)
|
||||
|
||||
df_all = {
|
||||
"label": label_df.reindex(feature_selected.index),
|
||||
"feature": feature_selected,
|
||||
}
|
||||
df_all = pd.concat(df_all, axis=1)
|
||||
df_all.to_pickle(DIRNAME / "fea_label_df.pkl")
|
||||
|
||||
# dump data in handler format for aligning the interface
|
||||
handler = DataHandlerLP(
|
||||
data_loader={
|
||||
"class": "qlib.data.dataset.loader.StaticDataLoader",
|
||||
"kwargs": {"config": DIRNAME / "fea_label_df.pkl"},
|
||||
}
|
||||
)
|
||||
handler.to_pickle(DIRNAME / "handler_proxy.pkl", dump_all=True)
|
||||
|
||||
@property
|
||||
def _internal_data_path(self):
|
||||
return DIRNAME / f"internal_data_s{self.step}.pkl"
|
||||
|
||||
def dump_meta_ipt(self):
|
||||
"""
|
||||
Dump data for training meta model.
|
||||
This function will dump the input data for meta model
|
||||
"""
|
||||
# According to the experiments, the choice of the model type is very important for achieving good results
|
||||
rb = RollingBenchmark(model_type=self.sim_task_model)
|
||||
sim_task = rb.basic_task()
|
||||
|
||||
if self.sim_task_model == "gbdt":
|
||||
sim_task["model"].setdefault("kwargs", {}).update({"early_stopping_rounds": None, "num_boost_round": 150})
|
||||
|
||||
exp_name_sim = f"data_sim_s{self.step}"
|
||||
|
||||
internal_data = InternalData(sim_task, self.step, exp_name=exp_name_sim)
|
||||
internal_data.setup(trainer=TrainerR)
|
||||
|
||||
with self._internal_data_path.open("wb") as f:
|
||||
pickle.dump(internal_data, f)
|
||||
|
||||
def train_meta_model(self):
|
||||
"""
|
||||
training a meta model based on a simplified linear proxy model;
|
||||
"""
|
||||
|
||||
# 1) leverage the simplified proxy forecasting model to train meta model.
|
||||
# - Only the dataset part is important, in current version of meta model will integrate the
|
||||
rb = RollingBenchmark(model_type=self.sim_task_model)
|
||||
sim_task = rb.basic_task()
|
||||
proxy_forecast_model_task = {
|
||||
# "model": "qlib.contrib.model.linear.LinearModel",
|
||||
"dataset": {
|
||||
"class": "qlib.data.dataset.DatasetH",
|
||||
"kwargs": {
|
||||
"handler": f"file://{(DIRNAME / 'handler_proxy.pkl').absolute()}",
|
||||
"segments": {
|
||||
"train": ("2008-01-01", "2010-12-31"),
|
||||
"test": ("2011-01-01", sim_task["dataset"]["kwargs"]["segments"]["test"][1]),
|
||||
},
|
||||
},
|
||||
},
|
||||
# "record": ["qlib.workflow.record_temp.SignalRecord"]
|
||||
}
|
||||
# the proxy_forecast_model_task will be used to create meta tasks.
|
||||
# The test date of first task will be 2011-01-01. Each test segment will be about 20days
|
||||
# The tasks include all training tasks and test tasks.
|
||||
|
||||
# 2) preparing meta dataset
|
||||
kwargs = dict(
|
||||
task_tpl=proxy_forecast_model_task,
|
||||
step=self.step,
|
||||
segments=0.62, # keep test period consistent with the dataset yaml
|
||||
trunc_days=1 + self.horizon,
|
||||
hist_step_n=30,
|
||||
fill_method="max",
|
||||
rolling_ext_days=0,
|
||||
)
|
||||
# NOTE:
|
||||
# the input of meta model (internal data) are shared between proxy model and final forecasting model
|
||||
# but their task test segment are not aligned! It worked in my previous experiment.
|
||||
# So the misalignment will not affect the effectiveness of the method.
|
||||
with self._internal_data_path.open("rb") as f:
|
||||
internal_data = pickle.load(f)
|
||||
md = MetaDatasetDS(exp_name=internal_data, **kwargs)
|
||||
|
||||
# 3) train and logging meta model
|
||||
with R.start(experiment_name=self.meta_exp_name):
|
||||
R.log_params(**kwargs)
|
||||
mm = MetaModelDS(step=self.step, hist_step_n=kwargs["hist_step_n"], lr=0.001, max_epoch=200, seed=43)
|
||||
mm.fit(md)
|
||||
R.save_objects(model=mm)
|
||||
|
||||
@property
|
||||
def _task_path(self):
|
||||
return DIRNAME / f"tasks_s{self.step}.pkl"
|
||||
|
||||
def meta_inference(self):
|
||||
"""
|
||||
Leverage meta-model for inference:
|
||||
- Given
|
||||
- baseline tasks
|
||||
- input for meta model(internal data)
|
||||
- meta model (its learnt knowledge on proxy forecasting model is expected to transfer to normal forecasting model)
|
||||
"""
|
||||
# 1) get meta model
|
||||
exp = R.get_exp(experiment_name=self.meta_exp_name)
|
||||
rec = exp.list_recorders(rtype=exp.RT_L)[0]
|
||||
meta_model: MetaModelDS = rec.load_object("model")
|
||||
|
||||
# 2)
|
||||
# we are transfer to knowledge of meta model to final forecasting tasks.
|
||||
# Create MetaTaskDataset for the final forecasting tasks
|
||||
# Aligning the setting of it to the MetaTaskDataset when training Meta model is necessary
|
||||
|
||||
# 2.1) get previous config
|
||||
param = rec.list_params()
|
||||
trunc_days = int(param["trunc_days"])
|
||||
step = int(param["step"])
|
||||
hist_step_n = int(param["hist_step_n"])
|
||||
fill_method = param.get("fill_method", "max")
|
||||
|
||||
rb = RollingBenchmark(model_type=self.forecast_model)
|
||||
task_l = rb.create_rolling_tasks()
|
||||
|
||||
# 2.2) create meta dataset for final dataset
|
||||
kwargs = dict(
|
||||
task_tpl=task_l,
|
||||
step=step,
|
||||
segments=0.0, # all the tasks are for testing
|
||||
trunc_days=trunc_days,
|
||||
hist_step_n=hist_step_n,
|
||||
fill_method=fill_method,
|
||||
task_mode=MetaTask.PROC_MODE_TRANSFER,
|
||||
)
|
||||
|
||||
with self._internal_data_path.open("rb") as f:
|
||||
internal_data = pickle.load(f)
|
||||
mds = MetaDatasetDS(exp_name=internal_data, **kwargs)
|
||||
|
||||
# 3) meta model make inference and get new qlib task
|
||||
new_tasks = meta_model.inference(mds)
|
||||
with self._task_path.open("wb") as f:
|
||||
pickle.dump(new_tasks, f)
|
||||
|
||||
def train_and_eval_tasks(self):
|
||||
"""
|
||||
Training the tasks generated by meta model
|
||||
Then evaluate it
|
||||
"""
|
||||
with self._task_path.open("rb") as f:
|
||||
tasks = pickle.load(f)
|
||||
rb = RollingBenchmark(rolling_exp="rolling_ds", model_type=self.forecast_model)
|
||||
rb.train_rolling_tasks(tasks)
|
||||
rb.ens_rolling()
|
||||
rb.update_rolling_rec()
|
||||
|
||||
def run_all(self):
|
||||
# 1) file: handler_proxy.pkl
|
||||
self.dump_data_for_proxy_model()
|
||||
# 2)
|
||||
# file: internal_data_s20.pkl
|
||||
# mlflow: data_sim_s20, models for calculating meta_ipt
|
||||
self.dump_meta_ipt()
|
||||
# 3) meta model will be stored in `DDG-DA`
|
||||
self.train_meta_model()
|
||||
# 4) new_tasks are saved in "tasks_s20.pkl" (reweighter is added)
|
||||
self.meta_inference()
|
||||
# 5) load the saved tasks and train model
|
||||
self.train_and_eval_tasks()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
GetData().qlib_data(exists_skip=True)
|
||||
auto_init()
|
||||
fire.Fire(DDGDA)
|
||||
18
examples/benchmarks_dynamic/README.md
Normal file
18
examples/benchmarks_dynamic/README.md
Normal file
@@ -0,0 +1,18 @@
|
||||
# Introduction
|
||||
Due to the non-stationary nature of the environment of the financial market, the data distribution may change in different periods, which makes the performance of models build on training data decays in the future test data.
|
||||
So adapting the forecasting models/strategies to market dynamics is very important to the model/strategies' performance.
|
||||
|
||||
The table below shows the performances of different solutions on different forecasting models.
|
||||
|
||||
## Alpha158 dataset
|
||||
|
||||
| Model Name | Dataset | IC | ICIR | Rank IC | Rank ICIR | Annualized Return | Information Ratio | Max Drawdown |
|
||||
|------------------|---------|----|------|---------|-----------|-------------------|-------------------|--------------|
|
||||
| RR[Linear] |Alpha158 |0.088|0.570|0.102 |0.622 |0.077 |1.175 |-0.086 |
|
||||
| DDG-DA[Linear] |Alpha158 |0.093|0.622|0.106 |0.670 |0.085 |1.213 |-0.093 |
|
||||
| RR[LightGBM] |Alpha158 |0.079|0.566|0.088 |0.592 |0.075 |1.226 |-0.096 |
|
||||
| DDG-DA[LightGBM] |Alpha158 |0.084|0.639|0.093 |0.664 |0.099 |1.442 |-0.071 |
|
||||
|
||||
- The label horizon of the `Alpha158` dataset is set to 20.
|
||||
- The rolling time intervals are set to 20 trading days.
|
||||
- The test rolling periods are from January 2017 to August 2020.
|
||||
15
examples/benchmarks_dynamic/baseline/README.md
Normal file
15
examples/benchmarks_dynamic/baseline/README.md
Normal file
@@ -0,0 +1,15 @@
|
||||
# Introduction
|
||||
|
||||
This is the framework of periodically Rolling Retrain (RR) forecasting models. RR adapts to market dynamics by utilizing the up-to-date data periodically.
|
||||
|
||||
## Run the Code
|
||||
Users can try RR by running the following command:
|
||||
```bash
|
||||
python rolling_benchmark.py run_all
|
||||
```
|
||||
|
||||
The default forecasting models are `Linear`. Users can choose other forecasting models by changing the `model_type` parameter.
|
||||
For example, users can try `LightGBM` forecasting models by running the following command:
|
||||
```bash
|
||||
python rolling_benchmark.py --model_type="gbdt" run_all
|
||||
```
|
||||
114
examples/benchmarks_dynamic/baseline/rolling_benchmark.py
Normal file
114
examples/benchmarks_dynamic/baseline/rolling_benchmark.py
Normal file
@@ -0,0 +1,114 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
from qlib.model.ens.ensemble import RollingEnsemble
|
||||
from qlib.utils import init_instance_by_config
|
||||
import fire
|
||||
import yaml
|
||||
from qlib import auto_init
|
||||
from pathlib import Path
|
||||
from tqdm.auto import tqdm
|
||||
from qlib.model.trainer import TrainerR
|
||||
from qlib.workflow import R
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
DIRNAME = Path(__file__).absolute().resolve().parent
|
||||
from qlib.workflow.task.gen import task_generator, RollingGen
|
||||
from qlib.workflow.task.collect import RecorderCollector
|
||||
from qlib.workflow.record_temp import PortAnaRecord, SigAnaRecord
|
||||
|
||||
|
||||
class RollingBenchmark:
|
||||
"""
|
||||
**NOTE**
|
||||
before running the example, please clean your previous results with following command
|
||||
- `rm -r mlruns`
|
||||
|
||||
"""
|
||||
|
||||
def __init__(self, rolling_exp="rolling_models", model_type="linear") -> None:
|
||||
self.step = 20
|
||||
self.horizon = 20
|
||||
self.rolling_exp = rolling_exp
|
||||
self.model_type = model_type
|
||||
|
||||
def basic_task(self):
|
||||
"""For fast training rolling"""
|
||||
if self.model_type == "gbdt":
|
||||
conf_path = DIRNAME.parent.parent / "benchmarks" / "LightGBM" / "workflow_config_lightgbm_Alpha158.yaml"
|
||||
# dump the processed data on to disk for later loading to speed up the processing
|
||||
h_path = DIRNAME / "lightgbm_alpha158_handler_horizon{}.pkl".format(self.horizon)
|
||||
elif self.model_type == "linear":
|
||||
conf_path = DIRNAME.parent.parent / "benchmarks" / "Linear" / "workflow_config_linear_Alpha158.yaml"
|
||||
h_path = DIRNAME / "linear_alpha158_handler_horizon{}.pkl".format(self.horizon)
|
||||
else:
|
||||
raise AssertionError("Model type is not supported!")
|
||||
with conf_path.open("r") as f:
|
||||
conf = yaml.safe_load(f)
|
||||
|
||||
# modify dataset horizon
|
||||
conf["task"]["dataset"]["kwargs"]["handler"]["kwargs"]["label"] = [
|
||||
"Ref($close, -{}) / Ref($close, -1) - 1".format(self.horizon + 1)
|
||||
]
|
||||
|
||||
task = conf["task"]
|
||||
|
||||
if not h_path.exists():
|
||||
h_conf = task["dataset"]["kwargs"]["handler"]
|
||||
h = init_instance_by_config(h_conf)
|
||||
h.to_pickle(h_path, dump_all=True)
|
||||
|
||||
task["dataset"]["kwargs"]["handler"] = f"file://{h_path}"
|
||||
task["record"] = ["qlib.workflow.record_temp.SignalRecord"]
|
||||
return task
|
||||
|
||||
def create_rolling_tasks(self):
|
||||
task = self.basic_task()
|
||||
task_l = task_generator(
|
||||
task, RollingGen(step=self.step, trunc_days=self.horizon + 1)
|
||||
) # the last two days should be truncated to avoid information leakage
|
||||
return task_l
|
||||
|
||||
def train_rolling_tasks(self, task_l=None):
|
||||
if task_l is None:
|
||||
task_l = self.create_rolling_tasks()
|
||||
trainer = TrainerR(experiment_name=self.rolling_exp)
|
||||
trainer(task_l)
|
||||
|
||||
COMB_EXP = "rolling"
|
||||
|
||||
def ens_rolling(self):
|
||||
rc = RecorderCollector(
|
||||
experiment=self.rolling_exp,
|
||||
artifacts_key=["pred", "label"],
|
||||
process_list=[RollingEnsemble()],
|
||||
# rec_key_func=lambda rec: (self.COMB_EXP, rec.info["id"]),
|
||||
artifacts_path={"pred": "pred.pkl", "label": "label.pkl"},
|
||||
)
|
||||
res = rc()
|
||||
with R.start(experiment_name=self.COMB_EXP):
|
||||
R.log_params(exp_name=self.rolling_exp)
|
||||
R.save_objects(**{"pred.pkl": res["pred"], "label.pkl": res["label"]})
|
||||
|
||||
def update_rolling_rec(self):
|
||||
"""
|
||||
Evaluate the combined rolling results
|
||||
"""
|
||||
for rid, rec in R.list_recorders(experiment_name=self.COMB_EXP).items():
|
||||
for rt_cls in SigAnaRecord, PortAnaRecord:
|
||||
rt = rt_cls(recorder=rec, skip_existing=True)
|
||||
rt.generate()
|
||||
print(f"Your evaluation results can be found in the experiment named `{self.COMB_EXP}`.")
|
||||
|
||||
def run_all(self):
|
||||
# the results will be save in mlruns.
|
||||
# 1) each rolling task is saved in rolling_models
|
||||
self.train_rolling_tasks()
|
||||
# 2) combined rolling tasks and evaluation results are saved in rolling
|
||||
self.ens_rolling()
|
||||
self.update_rolling_rec()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
GetData().qlib_data(exists_skip=True)
|
||||
auto_init()
|
||||
fire.Fire(RollingBenchmark)
|
||||
2
examples/data_demo/README.md
Normal file
2
examples/data_demo/README.md
Normal file
@@ -0,0 +1,2 @@
|
||||
# Introduction
|
||||
The examples in this folder try to demonstrate some common usage of data-related modules of Qlib
|
||||
53
examples/data_demo/data_cache_demo.py
Normal file
53
examples/data_demo/data_cache_demo.py
Normal file
@@ -0,0 +1,53 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
"""
|
||||
The motivation of this demo
|
||||
- To show the data modules of Qlib is Serializable, users can dump processed data to disk to avoid duplicated data preprocessing
|
||||
"""
|
||||
|
||||
from copy import deepcopy
|
||||
from pathlib import Path
|
||||
import pickle
|
||||
from pprint import pprint
|
||||
import subprocess
|
||||
import yaml
|
||||
from qlib.log import TimeInspector
|
||||
|
||||
from qlib import init
|
||||
from qlib.data.dataset.handler import DataHandlerLP
|
||||
from qlib.utils import init_instance_by_config
|
||||
|
||||
# For general purpose, we use relative path
|
||||
DIRNAME = Path(__file__).absolute().resolve().parent
|
||||
|
||||
if __name__ == "__main__":
|
||||
init()
|
||||
|
||||
config_path = DIRNAME.parent / "benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml"
|
||||
|
||||
# 1) show original time
|
||||
with TimeInspector.logt("The original time without handler cache:"):
|
||||
subprocess.run(f"qrun {config_path}", shell=True)
|
||||
|
||||
# 2) dump handler
|
||||
task_config = yaml.safe_load(config_path.open())
|
||||
hd_conf = task_config["task"]["dataset"]["kwargs"]["handler"]
|
||||
pprint(hd_conf)
|
||||
hd: DataHandlerLP = init_instance_by_config(hd_conf)
|
||||
hd_path = DIRNAME / "handler.pkl"
|
||||
hd.to_pickle(hd_path, dump_all=True)
|
||||
|
||||
# 3) create new task with handler cache
|
||||
new_task_config = deepcopy(task_config)
|
||||
new_task_config["task"]["dataset"]["kwargs"]["handler"] = f"file://{hd_path}"
|
||||
new_task_config["sys"] = {"path": [str(config_path.parent.resolve())]}
|
||||
new_task_path = DIRNAME / "new_task.yaml"
|
||||
print("The location of the new task", new_task_path)
|
||||
|
||||
# save new task
|
||||
with new_task_path.open("w") as f:
|
||||
yaml.safe_dump(new_task_config, f, indent=4, sort_keys=False)
|
||||
|
||||
# 4) train model with new task
|
||||
with TimeInspector.logt("The time for task with handler cache:"):
|
||||
subprocess.run(f"qrun {new_task_path}", shell=True)
|
||||
59
examples/data_demo/data_mem_resuse_demo.py
Normal file
59
examples/data_demo/data_mem_resuse_demo.py
Normal file
@@ -0,0 +1,59 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
"""
|
||||
The motivation of this demo
|
||||
- To show the data modules of Qlib is Serializable, users can dump processed data to disk to avoid duplicated data preprocessing
|
||||
"""
|
||||
|
||||
from copy import deepcopy
|
||||
from pathlib import Path
|
||||
import pickle
|
||||
from pprint import pprint
|
||||
import subprocess
|
||||
|
||||
import yaml
|
||||
|
||||
from qlib import init
|
||||
from qlib.data.dataset.handler import DataHandlerLP
|
||||
from qlib.log import TimeInspector
|
||||
from qlib.model.trainer import task_train
|
||||
from qlib.utils import init_instance_by_config
|
||||
|
||||
# For general purpose, we use relative path
|
||||
DIRNAME = Path(__file__).absolute().resolve().parent
|
||||
|
||||
if __name__ == "__main__":
|
||||
init()
|
||||
|
||||
repeat = 2
|
||||
exp_name = "data_mem_reuse_demo"
|
||||
|
||||
config_path = DIRNAME.parent / "benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml"
|
||||
task_config = yaml.safe_load(config_path.open())
|
||||
|
||||
# 1) without using processed data in memory
|
||||
with TimeInspector.logt("The original time without reusing processed data in memory:"):
|
||||
for i in range(repeat):
|
||||
task_train(task_config["task"], experiment_name=exp_name)
|
||||
|
||||
# 2) prepare processed data in memory.
|
||||
hd_conf = task_config["task"]["dataset"]["kwargs"]["handler"]
|
||||
pprint(hd_conf)
|
||||
hd: DataHandlerLP = init_instance_by_config(hd_conf)
|
||||
|
||||
# 3) with reusing processed data in memory
|
||||
new_task = deepcopy(task_config["task"])
|
||||
new_task["dataset"]["kwargs"]["handler"] = hd
|
||||
print(new_task)
|
||||
|
||||
with TimeInspector.logt("The time with reusing processed data in memory:"):
|
||||
# this will save the time to reload and process data from disk(in `DataHandlerLP`)
|
||||
# It still takes a lot of time in the backtest phase
|
||||
for i in range(repeat):
|
||||
task_train(new_task, experiment_name=exp_name)
|
||||
|
||||
# 4) User can change other parts exclude processed data in memory(handler)
|
||||
new_task = deepcopy(task_config["task"])
|
||||
new_task["dataset"]["kwargs"]["segments"]["train"] = ("20100101", "20131231")
|
||||
with TimeInspector.logt("The time with reusing processed data in memory:"):
|
||||
task_train(new_task, experiment_name=exp_name)
|
||||
@@ -1,15 +1,20 @@
|
||||
# High-Frequency Dataset
|
||||
# Introduction
|
||||
This folder contains 2 examples
|
||||
- A high-frequency dataset example
|
||||
- An example of predicting the price trend in high-frequency data
|
||||
|
||||
## High-Frequency Dataset
|
||||
|
||||
This dataset is an example for RL high frequency trading.
|
||||
|
||||
## Get High-Frequency Data
|
||||
### Get High-Frequency Data
|
||||
|
||||
Get high-frequency data by running the following command:
|
||||
```bash
|
||||
python workflow.py get_data
|
||||
```
|
||||
|
||||
## Dump & Reload & Reinitialize the Dataset
|
||||
### Dump & Reload & Reinitialize the Dataset
|
||||
|
||||
|
||||
The High-Frequency Dataset is implemented as `qlib.data.dataset.DatasetH` in the `workflow.py`. `DatatsetH` is the subclass of [`qlib.utils.serial.Serializable`](https://qlib.readthedocs.io/en/latest/advanced/serial.html), whose state can be dumped in or loaded from disk in `pickle` format.
|
||||
@@ -27,9 +32,10 @@ Run the example by running the following command:
|
||||
python workflow.py dump_and_load_dataset
|
||||
```
|
||||
|
||||
## Benchmarks Performance
|
||||
### Signal Test
|
||||
Here are the results of signal test for benchmark models. We will keep updating benchmark models in future.
|
||||
## Benchmarks Performance (predicting the price trend in high-frequency data)
|
||||
|
||||
Here are the results of models for predicting the price trend in high-frequency data. We will keep updating benchmark models in future.
|
||||
|
||||
| Model Name | Dataset | IC | ICIR | Rank IC | Rank ICIR | Long precision| Short Precision | Long-Short Average Return | Long-Short Average Sharpe |
|
||||
|---|---|---|---|---|---|---|---|---|---|
|
||||
| LightGBM | Alpha158 | 0.3042±0.00 | 1.5372±0.00| 0.3117±0.00 | 1.6258±0.00 | 0.6720±0.00 | 0.6870±0.00 | 0.000769±0.00 | 1.0190±0.00 |
|
||||
| LightGBM | Alpha158 | 0.0349±0.00 | 0.3805±0.00| 0.0435±0.00 | 0.4724±0.00 | 0.5111±0.00 | 0.5428±0.00 | 0.000074±0.00 | 0.2677±0.00 |
|
||||
|
||||
@@ -150,7 +150,7 @@ class Cut(ElemOperator):
|
||||
self.l = l
|
||||
self.r = r
|
||||
if (self.l is not None and self.l <= 0) or (self.r is not None and self.r >= 0):
|
||||
raise ValueError("Cut operator l shoud > 0 and r should < 0")
|
||||
raise ValueError("Cut operator l should > 0 and r should < 0")
|
||||
|
||||
super(Cut, self).__init__(feature)
|
||||
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from qlib.constant import EPS
|
||||
from qlib.data.dataset.processor import Processor
|
||||
from qlib.data.dataset.utils import fetch_df_by_index
|
||||
|
||||
@@ -27,7 +28,7 @@ class HighFreqNorm(Processor):
|
||||
part_values = np.log1p(part_values)
|
||||
self.feature_med[name] = np.nanmedian(part_values)
|
||||
part_values = part_values - self.feature_med[name]
|
||||
self.feature_std[name] = np.nanmedian(np.absolute(part_values)) * 1.4826 + 1e-12
|
||||
self.feature_std[name] = np.nanmedian(np.absolute(part_values)) * 1.4826 + EPS
|
||||
part_values = part_values / self.feature_std[name]
|
||||
self.feature_vmax[name] = np.nanmax(part_values)
|
||||
self.feature_vmin[name] = np.nanmin(part_values)
|
||||
|
||||
@@ -5,7 +5,8 @@ import fire
|
||||
|
||||
import qlib
|
||||
import pickle
|
||||
from qlib.config import REG_CN, HIGH_FREQ_CONFIG
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.config import HIGH_FREQ_CONFIG
|
||||
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.data.dataset.handler import DataHandlerLP
|
||||
@@ -82,7 +83,7 @@ class HighfreqWorkflow:
|
||||
|
||||
def _init_qlib(self):
|
||||
"""initialize qlib"""
|
||||
# use yahoo_cn_1min data
|
||||
# use cn_data_1min data
|
||||
QLIB_INIT_CONFIG = {**HIGH_FREQ_CONFIG, **self.SPEC_CONF}
|
||||
provider_uri = QLIB_INIT_CONFIG.get("provider_uri")
|
||||
GetData().qlib_data(target_dir=provider_uri, interval="1min", region=REG_CN, exists_skip=True)
|
||||
|
||||
@@ -59,7 +59,7 @@ task:
|
||||
record:
|
||||
- class: "SignalRecord"
|
||||
module_path: "qlib.workflow.record_temp"
|
||||
kwargs:
|
||||
kwargs: {}
|
||||
- class: "HFSignalRecord"
|
||||
module_path: "qlib.workflow.record_temp"
|
||||
kwargs: {}
|
||||
@@ -1,6 +1,6 @@
|
||||
import qlib
|
||||
import optuna
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.tests.config import CSI300_DATASET_CONFIG
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
import qlib
|
||||
import optuna
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.tests.data import GetData
|
||||
from qlib.tests.config import get_dataset_config, CSI300_MARKET, DATASET_ALPHA360_CLASS
|
||||
|
||||
@@ -3,7 +3,7 @@
|
||||
|
||||
|
||||
import qlib
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
@@ -11,13 +11,13 @@ from pprint import pprint
|
||||
|
||||
import fire
|
||||
import qlib
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.workflow import R
|
||||
from qlib.workflow.task.gen import RollingGen, task_generator
|
||||
from qlib.workflow.task.manage import TaskManager, run_task
|
||||
from qlib.workflow.task.collect import RecorderCollector
|
||||
from qlib.model.ens.group import RollingGroup
|
||||
from qlib.model.trainer import TrainerRM, task_train
|
||||
from qlib.model.trainer import TrainerR, TrainerRM, task_train
|
||||
from qlib.tests.config import CSI100_RECORD_LGB_TASK_CONFIG, CSI100_RECORD_XGBOOST_TASK_CONFIG
|
||||
|
||||
|
||||
@@ -29,7 +29,7 @@ class RollingTaskExample:
|
||||
task_url="mongodb://10.0.0.4:27017/",
|
||||
task_db_name="rolling_db",
|
||||
experiment_name="rolling_exp",
|
||||
task_pool="rolling_task",
|
||||
task_pool=None, # if user want to "rolling_task"
|
||||
task_config=None,
|
||||
rolling_step=550,
|
||||
rolling_type=RollingGen.ROLL_SD,
|
||||
@@ -43,14 +43,19 @@ class RollingTaskExample:
|
||||
}
|
||||
qlib.init(provider_uri=provider_uri, region=region, mongo=mongo_conf)
|
||||
self.experiment_name = experiment_name
|
||||
self.task_pool = task_pool
|
||||
if task_pool is None:
|
||||
self.trainer = TrainerR(experiment_name=self.experiment_name)
|
||||
else:
|
||||
self.task_pool = task_pool
|
||||
self.trainer = TrainerRM(self.experiment_name, self.task_pool)
|
||||
self.task_config = task_config
|
||||
self.rolling_gen = RollingGen(step=rolling_step, rtype=rolling_type)
|
||||
|
||||
# Reset all things to the first status, be careful to save important data
|
||||
def reset(self):
|
||||
print("========== reset ==========")
|
||||
TaskManager(task_pool=self.task_pool).remove()
|
||||
if isinstance(self.trainer, TrainerRM):
|
||||
TaskManager(task_pool=self.task_pool).remove()
|
||||
exp = R.get_exp(experiment_name=self.experiment_name)
|
||||
for rid in exp.list_recorders():
|
||||
exp.delete_recorder(rid)
|
||||
@@ -66,10 +71,10 @@ class RollingTaskExample:
|
||||
|
||||
def task_training(self, tasks):
|
||||
print("========== task_training ==========")
|
||||
trainer = TrainerRM(self.experiment_name, self.task_pool)
|
||||
trainer.train(tasks)
|
||||
self.trainer.train(tasks)
|
||||
|
||||
def worker(self):
|
||||
# NOTE: this is only used for TrainerRM
|
||||
# train tasks by other progress or machines for multiprocessing. It is same as TrainerRM.worker.
|
||||
print("========== worker ==========")
|
||||
run_task(task_train, self.task_pool, experiment_name=self.experiment_name)
|
||||
|
||||
@@ -1,10 +1,107 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
"""
|
||||
The expect result of `backtest` is following in current version
|
||||
|
||||
'The following are analysis results of benchmark return(1day).'
|
||||
risk
|
||||
mean 0.000651
|
||||
std 0.012472
|
||||
annualized_return 0.154967
|
||||
information_ratio 0.805422
|
||||
max_drawdown -0.160445
|
||||
'The following are analysis results of the excess return without cost(1day).'
|
||||
risk
|
||||
mean 0.001258
|
||||
std 0.007575
|
||||
annualized_return 0.299303
|
||||
information_ratio 2.561219
|
||||
max_drawdown -0.068386
|
||||
'The following are analysis results of the excess return with cost(1day).'
|
||||
risk
|
||||
mean 0.001110
|
||||
std 0.007575
|
||||
annualized_return 0.264280
|
||||
information_ratio 2.261392
|
||||
max_drawdown -0.071842
|
||||
[1706497:MainThread](2021-12-07 14:08:30,263) INFO - qlib.workflow - [record_temp.py:441] - Portfolio analysis record 'port_analysis_30minute.
|
||||
pkl' has been saved as the artifact of the Experiment 2
|
||||
'The following are analysis results of benchmark return(30minute).'
|
||||
risk
|
||||
mean 0.000078
|
||||
std 0.003646
|
||||
annualized_return 0.148787
|
||||
information_ratio 0.935252
|
||||
max_drawdown -0.142830
|
||||
('The following are analysis results of the excess return without '
|
||||
'cost(30minute).')
|
||||
risk
|
||||
mean 0.000174
|
||||
std 0.003343
|
||||
annualized_return 0.331867
|
||||
information_ratio 2.275019
|
||||
max_drawdown -0.074752
|
||||
'The following are analysis results of the excess return with cost(30minute).'
|
||||
risk
|
||||
mean 0.000155
|
||||
std 0.003343
|
||||
annualized_return 0.294536
|
||||
information_ratio 2.018860
|
||||
max_drawdown -0.075579
|
||||
[1706497:MainThread](2021-12-07 14:08:30,277) INFO - qlib.workflow - [record_temp.py:441] - Portfolio analysis record 'port_analysis_5minute.p
|
||||
kl' has been saved as the artifact of the Experiment 2
|
||||
'The following are analysis results of benchmark return(5minute).'
|
||||
risk
|
||||
mean 0.000015
|
||||
std 0.001460
|
||||
annualized_return 0.172170
|
||||
information_ratio 1.103439
|
||||
max_drawdown -0.144807
|
||||
'The following are analysis results of the excess return without cost(5minute).'
|
||||
risk
|
||||
mean 0.000028
|
||||
std 0.001412
|
||||
annualized_return 0.319771
|
||||
information_ratio 2.119563
|
||||
max_drawdown -0.077426
|
||||
'The following are analysis results of the excess return with cost(5minute).'
|
||||
risk
|
||||
mean 0.000025
|
||||
std 0.001412
|
||||
annualized_return 0.281536
|
||||
information_ratio 1.866091
|
||||
max_drawdown -0.078194
|
||||
[1706497:MainThread](2021-12-07 14:08:30,287) INFO - qlib.workflow - [record_temp.py:466] - Indicator analysis record 'indicator_analysis_1day
|
||||
.pkl' has been saved as the artifact of the Experiment 2
|
||||
'The following are analysis results of indicators(1day).'
|
||||
value
|
||||
ffr 0.945821
|
||||
pa 0.000324
|
||||
pos 0.542882
|
||||
[1706497:MainThread](2021-12-07 14:08:30,293) INFO - qlib.workflow - [record_temp.py:466] - Indicator analysis record 'indicator_analysis_30mi
|
||||
nute.pkl' has been saved as the artifact of the Experiment 2
|
||||
'The following are analysis results of indicators(30minute).'
|
||||
value
|
||||
ffr 0.982910
|
||||
pa 0.000037
|
||||
pos 0.500806
|
||||
[1706497:MainThread](2021-12-07 14:08:30,302) INFO - qlib.workflow - [record_temp.py:466] - Indicator analysis record 'indicator_analysis_5min
|
||||
ute.pkl' has been saved as the artifact of the Experiment 2
|
||||
'The following are analysis results of indicators(5minute).'
|
||||
value
|
||||
ffr 0.991017
|
||||
pa 0.000000
|
||||
pos 0.000000
|
||||
[1706497:MainThread](2021-12-07 14:08:30,627) INFO - qlib.timer - [log.py:113] - Time cost: 0.014s | waiting `async_log` Done
|
||||
"""
|
||||
|
||||
|
||||
from copy import deepcopy
|
||||
import qlib
|
||||
import fire
|
||||
from qlib.config import REG_CN, HIGH_FREQ_CONFIG
|
||||
import pandas as pd
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.config import HIGH_FREQ_CONFIG
|
||||
from qlib.data import D
|
||||
from qlib.utils import exists_qlib_data, init_instance_by_config, flatten_dict
|
||||
from qlib.workflow import R
|
||||
@@ -14,7 +111,6 @@ from qlib.backtest import collect_data
|
||||
|
||||
|
||||
class NestedDecisionExecutionWorkflow:
|
||||
|
||||
market = "csi300"
|
||||
benchmark = "SH000300"
|
||||
data_handler_config = {
|
||||
@@ -59,6 +155,8 @@ class NestedDecisionExecutionWorkflow:
|
||||
},
|
||||
}
|
||||
|
||||
exp_name = "nested"
|
||||
|
||||
port_analysis_config = {
|
||||
"executor": {
|
||||
"class": "NestedExecutor",
|
||||
@@ -134,7 +232,7 @@ class NestedDecisionExecutionWorkflow:
|
||||
qlib.init(provider_uri=provider_uri_map, dataset_cache=None, expression_cache=None)
|
||||
|
||||
def _train_model(self, model, dataset):
|
||||
with R.start(experiment_name="train"):
|
||||
with R.start(experiment_name=self.exp_name):
|
||||
R.log_params(**flatten_dict(self.task))
|
||||
model.fit(dataset)
|
||||
R.save_objects(**{"params.pkl": model})
|
||||
@@ -161,14 +259,11 @@ class NestedDecisionExecutionWorkflow:
|
||||
self.port_analysis_config["strategy"] = strategy_config
|
||||
self.port_analysis_config["backtest"]["benchmark"] = self.benchmark
|
||||
|
||||
with R.start(experiment_name="backtest"):
|
||||
|
||||
with R.start(experiment_name=self.exp_name, resume=True):
|
||||
recorder = R.get_recorder()
|
||||
par = PortAnaRecord(
|
||||
recorder,
|
||||
self.port_analysis_config,
|
||||
risk_analysis_freq=["day", "30min", "5min"],
|
||||
indicator_analysis_freq=["day", "30min", "5min"],
|
||||
indicator_analysis_method="value_weighted",
|
||||
)
|
||||
par.generate()
|
||||
@@ -199,6 +294,101 @@ class NestedDecisionExecutionWorkflow:
|
||||
for trade_decision in data_generator:
|
||||
print(trade_decision)
|
||||
|
||||
# the code below are for checking, users don't have to care about it
|
||||
# The tests can be categorized into 2 types
|
||||
# 1) comparing same backtest
|
||||
# - Basic test idea: the shared accumulated value are equal in multiple levels
|
||||
# - Aligning the profit calculation between multiple levels and single levels.
|
||||
# 2) comparing different backtest
|
||||
# - Basic test idea:
|
||||
# - the daily backtest will be similar as multi-level(the data quality makes this gap smaller)
|
||||
|
||||
def check_diff_freq(self):
|
||||
self._init_qlib()
|
||||
exp = R.get_exp(experiment_name="backtest")
|
||||
rec = next(iter(exp.list_recorders().values())) # assuming this will get the latest recorder
|
||||
for check_key in "account", "total_turnover", "total_cost":
|
||||
check_key = "total_cost"
|
||||
|
||||
acc_dict = {}
|
||||
for freq in ["30minute", "5minute", "1day"]:
|
||||
acc_dict[freq] = rec.load_object(f"portfolio_analysis/report_normal_{freq}.pkl")[check_key]
|
||||
acc_df = pd.DataFrame(acc_dict)
|
||||
acc_resam = acc_df.resample("1d").last().dropna()
|
||||
assert (acc_resam["30minute"] == acc_resam["1day"]).all()
|
||||
|
||||
def backtest_only_daily(self):
|
||||
"""
|
||||
This backtest is used for comparing the nested execution and single layer execution
|
||||
Due to the low quality daily-level and miniute-level data, they are hardly comparable.
|
||||
So it is used for detecting serious bugs which make the results different greatly.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
[1724971:MainThread](2021-12-07 16:24:31,156) INFO - qlib.workflow - [record_temp.py:441] - Portfolio analysis record 'port_analysis_1day.pkl'
|
||||
has been saved as the artifact of the Experiment 2
|
||||
'The following are analysis results of benchmark return(1day).'
|
||||
risk
|
||||
mean 0.000651
|
||||
std 0.012472
|
||||
annualized_return 0.154967
|
||||
information_ratio 0.805422
|
||||
max_drawdown -0.160445
|
||||
'The following are analysis results of the excess return without cost(1day).'
|
||||
risk
|
||||
mean 0.001375
|
||||
std 0.006103
|
||||
annualized_return 0.327204
|
||||
information_ratio 3.475016
|
||||
max_drawdown -0.024927
|
||||
'The following are analysis results of the excess return with cost(1day).'
|
||||
risk
|
||||
mean 0.001184
|
||||
std 0.006091
|
||||
annualized_return 0.281801
|
||||
information_ratio 2.998749
|
||||
max_drawdown -0.029568
|
||||
[1724971:MainThread](2021-12-07 16:24:31,170) INFO - qlib.workflow - [record_temp.py:466] - Indicator analysis record 'indicator_analysis_1day.
|
||||
pkl' has been saved as the artifact of the Experiment 2
|
||||
'The following are analysis results of indicators(1day).'
|
||||
value
|
||||
ffr 1.0
|
||||
pa 0.0
|
||||
pos 0.0
|
||||
[1724971:MainThread](2021-12-07 16:24:31,188) INFO - qlib.timer - [log.py:113] - Time cost: 0.007s | waiting `async_log` Done
|
||||
|
||||
"""
|
||||
self._init_qlib()
|
||||
model = init_instance_by_config(self.task["model"])
|
||||
dataset = init_instance_by_config(self.task["dataset"])
|
||||
self._train_model(model, dataset)
|
||||
strategy_config = {
|
||||
"class": "TopkDropoutStrategy",
|
||||
"module_path": "qlib.contrib.strategy.signal_strategy",
|
||||
"kwargs": {
|
||||
"signal": (model, dataset),
|
||||
"topk": 50,
|
||||
"n_drop": 5,
|
||||
},
|
||||
}
|
||||
pa_conf = deepcopy(self.port_analysis_config)
|
||||
pa_conf["strategy"] = strategy_config
|
||||
pa_conf["executor"] = {
|
||||
"class": "SimulatorExecutor",
|
||||
"module_path": "qlib.backtest.executor",
|
||||
"kwargs": {
|
||||
"time_per_step": "day",
|
||||
"generate_portfolio_metrics": True,
|
||||
"verbose": True,
|
||||
},
|
||||
}
|
||||
pa_conf["backtest"]["benchmark"] = self.benchmark
|
||||
|
||||
with R.start(experiment_name=self.exp_name, resume=True):
|
||||
recorder = R.get_recorder()
|
||||
par = PortAnaRecord(recorder, pa_conf)
|
||||
par.generate()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
fire.Fire(NestedDecisionExecutionWorkflow)
|
||||
|
||||
@@ -10,7 +10,7 @@ Next, we will finish updating online predictions.
|
||||
import copy
|
||||
import fire
|
||||
import qlib
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.model.trainer import task_train
|
||||
from qlib.workflow.online.utils import OnlineToolR
|
||||
from qlib.tests.config import CSI300_GBDT_TASK
|
||||
|
||||
52
examples/orderbook_data/README.md
Normal file
52
examples/orderbook_data/README.md
Normal file
@@ -0,0 +1,52 @@
|
||||
# Introduction
|
||||
|
||||
This example tries to demonstrate how Qlib supports data without fixed shared frequency.
|
||||
|
||||
For example,
|
||||
- Daily prices volume data are fixed-frequency data. The data comes in a fixed frequency (i.e. daily)
|
||||
- Orders are not fixed data and they may come at any time point
|
||||
|
||||
To support such non-fixed-frequency, Qlib implements an Arctic-based backend.
|
||||
Here is an example to import and query data based on this backend.
|
||||
|
||||
# Installation
|
||||
|
||||
Please refer to [the installation docs](https://docs.mongodb.com/manual/installation/) of mongodb.
|
||||
Current version of script with default value tries to connect localhost **via default port without authentication**.
|
||||
|
||||
Run following command to install necessary libraries
|
||||
```
|
||||
pip install pytest coverage
|
||||
pip install arctic # NOTE: pip may fail to resolve the right package dependency !!! Please make sure the dependency are satisfied.
|
||||
```
|
||||
|
||||
# Importing example data
|
||||
|
||||
|
||||
1. (Optional) Please follow the first part of [this section](https://github.com/microsoft/qlib#data-preparation) to **get 1min data** of Qlib.
|
||||
2. Please follow following steps to download example data
|
||||
```bash
|
||||
cd examples/orderbook_data/
|
||||
wget http://fintech.msra.cn/stock_data/downloads/highfreq_orderboook_example_data.tar.bz2
|
||||
tar xf highfreq_orderboook_example_data.tar.bz2
|
||||
```
|
||||
|
||||
3. Please import the example data to your mongo db
|
||||
```bash
|
||||
cd examples/orderbook_data/
|
||||
python create_dataset.py initialize_library # Initialization Libraries
|
||||
python create_dataset.py import_data # Initialization Libraries
|
||||
```
|
||||
|
||||
# Query Examples
|
||||
|
||||
After importing these data, you run `example.py` to create some high-frequency features.
|
||||
```bash
|
||||
cd examples/orderbook_data/
|
||||
pytest -s --disable-warnings example.py # If you want run all examples
|
||||
pytest -s --disable-warnings example.py::TestClass::test_exp_10 # If you want to run specific example
|
||||
```
|
||||
|
||||
|
||||
# Known limitations
|
||||
Expression computing between different frequencies are not supported yet
|
||||
315
examples/orderbook_data/create_dataset.py
Executable file
315
examples/orderbook_data/create_dataset.py
Executable file
@@ -0,0 +1,315 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
"""
|
||||
NOTE:
|
||||
- This scripts is a demo to import example data import Qlib
|
||||
- !!!!!!!!!!!!!!!TODO!!!!!!!!!!!!!!!!!!!:
|
||||
- Its structure is not well designed and very ugly, your contribution is welcome to make importing dataset easier
|
||||
"""
|
||||
from datetime import date, datetime as dt
|
||||
import os
|
||||
from pathlib import Path
|
||||
import random
|
||||
import shutil
|
||||
import time
|
||||
import traceback
|
||||
|
||||
from arctic import Arctic, chunkstore
|
||||
import arctic
|
||||
from arctic import Arctic, CHUNK_STORE
|
||||
from arctic.chunkstore.chunkstore import CHUNK_SIZE
|
||||
import fire
|
||||
from joblib import Parallel, delayed, parallel
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from pandas import DataFrame
|
||||
from pandas.core.indexes.datetimes import date_range
|
||||
from pymongo.mongo_client import MongoClient
|
||||
|
||||
DIRNAME = Path(__file__).absolute().resolve().parent
|
||||
|
||||
# CONFIG
|
||||
N_JOBS = -1 # leaving one kernel free
|
||||
LOG_FILE_PATH = DIRNAME / "log_file"
|
||||
DATA_PATH = DIRNAME / "raw_data"
|
||||
DATABASE_PATH = DIRNAME / "orig_data"
|
||||
DATA_INFO_PATH = DIRNAME / "data_info"
|
||||
DATA_FINISH_INFO_PATH = DIRNAME / "./data_finish_info"
|
||||
DOC_TYPE = ["Tick", "Order", "OrderQueue", "Transaction", "Day", "Minute"]
|
||||
MAX_SIZE = 3000 * 1024 * 1024 * 1024
|
||||
ALL_STOCK_PATH = DATABASE_PATH / "all.txt"
|
||||
ARCTIC_SRV = "127.0.0.1"
|
||||
|
||||
|
||||
def get_library_name(doc_type):
|
||||
if str.lower(doc_type) == str.lower("Tick"):
|
||||
return "ticks"
|
||||
else:
|
||||
return str.lower(doc_type)
|
||||
|
||||
|
||||
def is_stock(exchange_place, code):
|
||||
if exchange_place == "SH" and code[0] != "6":
|
||||
return False
|
||||
if exchange_place == "SZ" and code[0] != "0" and code[:2] != "30":
|
||||
return False
|
||||
return True
|
||||
|
||||
|
||||
def add_one_stock_daily_data(filepath, type, exchange_place, arc, date):
|
||||
"""
|
||||
exchange_place: "SZ" OR "SH"
|
||||
type: "tick", "orderbook", ...
|
||||
filepath: the path of csv
|
||||
arc: arclink created by a process
|
||||
"""
|
||||
code = os.path.split(filepath)[-1].split(".csv")[0]
|
||||
if exchange_place == "SH" and code[0] != "6":
|
||||
return
|
||||
if exchange_place == "SZ" and code[0] != "0" and code[:2] != "30":
|
||||
return
|
||||
|
||||
df = pd.read_csv(filepath, encoding="gbk", dtype={"code": str})
|
||||
code = os.path.split(filepath)[-1].split(".csv")[0]
|
||||
|
||||
def format_time(day, hms):
|
||||
day = str(day)
|
||||
hms = str(hms)
|
||||
if hms[0] == "1": # >=10,
|
||||
return (
|
||||
"-".join([day[0:4], day[4:6], day[6:8]]) + " " + ":".join([hms[:2], hms[2:4], hms[4:6] + "." + hms[6:]])
|
||||
)
|
||||
else:
|
||||
return (
|
||||
"-".join([day[0:4], day[4:6], day[6:8]]) + " " + ":".join([hms[:1], hms[1:3], hms[3:5] + "." + hms[5:]])
|
||||
)
|
||||
|
||||
## Discard the entire row if wrong data timestamp encoutered.
|
||||
timestamp = list(zip(list(df["date"]), list(df["time"])))
|
||||
error_index_list = []
|
||||
for index, t in enumerate(timestamp):
|
||||
try:
|
||||
pd.Timestamp(format_time(t[0], t[1]))
|
||||
except Exception:
|
||||
error_index_list.append(index) ## The row number of the error line
|
||||
|
||||
# to-do: writting to logs
|
||||
|
||||
if len(error_index_list) > 0:
|
||||
print("error: {}, {}".format(filepath, len(error_index_list)))
|
||||
|
||||
df = df.drop(error_index_list)
|
||||
timestamp = list(zip(list(df["date"]), list(df["time"]))) ## The cleaned timestamp
|
||||
# generate timestamp
|
||||
pd_timestamp = pd.DatetimeIndex(
|
||||
[pd.Timestamp(format_time(timestamp[i][0], timestamp[i][1])) for i in range(len(df["date"]))]
|
||||
)
|
||||
df = df.drop(columns=["date", "time", "name", "code", "wind_code"])
|
||||
# df = pd.DataFrame(data=df.to_dict("list"), index=pd_timestamp)
|
||||
df["date"] = pd.to_datetime(pd_timestamp)
|
||||
df.set_index("date", inplace=True)
|
||||
|
||||
if str.lower(type) == "orderqueue":
|
||||
## extract ab1~ab50
|
||||
df["ab"] = [
|
||||
",".join([str(int(row["ab" + str(i + 1)])) for i in range(0, row["ab_items"])])
|
||||
for timestamp, row in df.iterrows()
|
||||
]
|
||||
df = df.drop(columns=["ab" + str(i) for i in range(1, 51)])
|
||||
|
||||
type = get_library_name(type)
|
||||
# arc.initialize_library(type, lib_type=CHUNK_STORE)
|
||||
lib = arc[type]
|
||||
|
||||
symbol = "".join([exchange_place, code])
|
||||
if symbol in lib.list_symbols():
|
||||
print("update {0}, date={1}".format(symbol, date))
|
||||
if df.empty == True:
|
||||
return error_index_list
|
||||
lib.update(symbol, df, chunk_size="D")
|
||||
else:
|
||||
print("write {0}, date={1}".format(symbol, date))
|
||||
lib.write(symbol, df, chunk_size="D")
|
||||
return error_index_list
|
||||
|
||||
|
||||
def add_one_stock_daily_data_wrapper(filepath, type, exchange_place, index, date):
|
||||
pid = os.getpid()
|
||||
code = os.path.split(filepath)[-1].split(".csv")[0]
|
||||
arc = Arctic(ARCTIC_SRV)
|
||||
try:
|
||||
if index % 100 == 0:
|
||||
print("index = {}, filepath = {}".format(index, filepath))
|
||||
error_index_list = add_one_stock_daily_data(filepath, type, exchange_place, arc, date)
|
||||
if error_index_list is not None and len(error_index_list) > 0:
|
||||
f = open(os.path.join(LOG_FILE_PATH, "temp_timestamp_error_{0}_{1}_{2}.txt".format(pid, date, type)), "a+")
|
||||
f.write("{}, {}, {}\n".format(filepath, error_index_list, exchange_place + "_" + code))
|
||||
f.close()
|
||||
|
||||
except Exception as e:
|
||||
info = traceback.format_exc()
|
||||
print("error:" + str(e))
|
||||
f = open(os.path.join(LOG_FILE_PATH, "temp_fail_{0}_{1}_{2}.txt".format(pid, date, type)), "a+")
|
||||
f.write("fail:" + str(filepath) + "\n" + str(e) + "\n" + str(info) + "\n")
|
||||
f.close()
|
||||
|
||||
finally:
|
||||
arc.reset()
|
||||
|
||||
|
||||
def add_data(tick_date, doc_type, stock_name_dict):
|
||||
pid = os.getpid()
|
||||
|
||||
if doc_type not in DOC_TYPE:
|
||||
print("doc_type not in {}".format(DOC_TYPE))
|
||||
return
|
||||
try:
|
||||
begin_time = time.time()
|
||||
os.system(f"cp {DATABASE_PATH}/{tick_date + '_{}.tar.gz'.format(doc_type)} {DATA_PATH}/")
|
||||
|
||||
os.system(
|
||||
f"tar -xvzf {DATA_PATH}/{tick_date + '_{}.tar.gz'.format(doc_type)} -C {DATA_PATH}/ {tick_date + '_' + doc_type}/SH"
|
||||
)
|
||||
os.system(
|
||||
f"tar -xvzf {DATA_PATH}/{tick_date + '_{}.tar.gz'.format(doc_type)} -C {DATA_PATH}/ {tick_date + '_' + doc_type}/SZ"
|
||||
)
|
||||
os.system(f"chmod 777 {DATA_PATH}")
|
||||
os.system(f"chmod 777 {DATA_PATH}/{tick_date + '_' + doc_type}")
|
||||
os.system(f"chmod 777 {DATA_PATH}/{tick_date + '_' + doc_type}/SH")
|
||||
os.system(f"chmod 777 {DATA_PATH}/{tick_date + '_' + doc_type}/SZ")
|
||||
os.system(f"chmod 777 {DATA_PATH}/{tick_date + '_' + doc_type}/SH/{tick_date}")
|
||||
os.system(f"chmod 777 {DATA_PATH}/{tick_date + '_' + doc_type}/SZ/{tick_date}")
|
||||
|
||||
print("tick_date={}".format(tick_date))
|
||||
|
||||
temp_data_path_sh = os.path.join(DATA_PATH, tick_date + "_" + doc_type, "SH", tick_date)
|
||||
temp_data_path_sz = os.path.join(DATA_PATH, tick_date + "_" + doc_type, "SZ", tick_date)
|
||||
is_files_exist = {"sh": os.path.exists(temp_data_path_sh), "sz": os.path.exists(temp_data_path_sz)}
|
||||
|
||||
sz_files = (
|
||||
(
|
||||
set([i.split(".csv")[0] for i in os.listdir(temp_data_path_sz) if i[:2] == "30" or i[0] == "0"])
|
||||
& set(stock_name_dict["SZ"])
|
||||
)
|
||||
if is_files_exist["sz"]
|
||||
else set()
|
||||
)
|
||||
sz_file_nums = len(sz_files) if is_files_exist["sz"] else 0
|
||||
sh_files = (
|
||||
(
|
||||
set([i.split(".csv")[0] for i in os.listdir(temp_data_path_sh) if i[0] == "6"])
|
||||
& set(stock_name_dict["SH"])
|
||||
)
|
||||
if is_files_exist["sh"]
|
||||
else set()
|
||||
)
|
||||
sh_file_nums = len(sh_files) if is_files_exist["sh"] else 0
|
||||
print("sz_file_nums:{}, sh_file_nums:{}".format(sz_file_nums, sh_file_nums))
|
||||
|
||||
f = (DATA_INFO_PATH / "data_info_log_{}_{}".format(doc_type, tick_date)).open("w+")
|
||||
f.write("sz:{}, sh:{}, date:{}:".format(sz_file_nums, sh_file_nums, tick_date) + "\n")
|
||||
f.close()
|
||||
|
||||
if sh_file_nums > 0:
|
||||
# write is not thread-safe, update may be thread-safe
|
||||
Parallel(n_jobs=N_JOBS)(
|
||||
delayed(add_one_stock_daily_data_wrapper)(
|
||||
os.path.join(temp_data_path_sh, name + ".csv"), doc_type, "SH", index, tick_date
|
||||
)
|
||||
for index, name in enumerate(list(sh_files))
|
||||
)
|
||||
if sz_file_nums > 0:
|
||||
# write is not thread-safe, update may be thread-safe
|
||||
Parallel(n_jobs=N_JOBS)(
|
||||
delayed(add_one_stock_daily_data_wrapper)(
|
||||
os.path.join(temp_data_path_sz, name + ".csv"), doc_type, "SZ", index, tick_date
|
||||
)
|
||||
for index, name in enumerate(list(sz_files))
|
||||
)
|
||||
|
||||
os.system(f"rm -f {DATA_PATH}/{tick_date + '_{}.tar.gz'.format(doc_type)}")
|
||||
os.system(f"rm -rf {DATA_PATH}/{tick_date + '_' + doc_type}")
|
||||
total_time = time.time() - begin_time
|
||||
f = (DATA_FINISH_INFO_PATH / "data_info_finish_log_{}_{}".format(doc_type, tick_date)).open("w+")
|
||||
f.write("finish: date:{}, consume_time:{}, end_time: {}".format(tick_date, total_time, time.time()) + "\n")
|
||||
f.close()
|
||||
|
||||
except Exception as e:
|
||||
info = traceback.format_exc()
|
||||
print("date error:" + str(e))
|
||||
f = open(os.path.join(LOG_FILE_PATH, "temp_fail_{0}_{1}_{2}.txt".format(pid, tick_date, doc_type)), "a+")
|
||||
f.write("fail:" + str(tick_date) + "\n" + str(e) + "\n" + str(info) + "\n")
|
||||
f.close()
|
||||
|
||||
|
||||
class DSCreator:
|
||||
"""Dataset creator"""
|
||||
|
||||
def clear(self):
|
||||
client = MongoClient(ARCTIC_SRV)
|
||||
client.drop_database("arctic")
|
||||
|
||||
def initialize_library(self):
|
||||
arc = Arctic(ARCTIC_SRV)
|
||||
for doc_type in DOC_TYPE:
|
||||
arc.initialize_library(get_library_name(doc_type), lib_type=CHUNK_STORE)
|
||||
|
||||
def _get_empty_folder(self, fp: Path):
|
||||
fp = Path(fp)
|
||||
if fp.exists():
|
||||
shutil.rmtree(fp)
|
||||
fp.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
def import_data(self, doc_type_l=["Tick", "Transaction", "Order"]):
|
||||
# clear all the old files
|
||||
for fp in LOG_FILE_PATH, DATA_INFO_PATH, DATA_FINISH_INFO_PATH, DATA_PATH:
|
||||
self._get_empty_folder(fp)
|
||||
|
||||
arc = Arctic(ARCTIC_SRV)
|
||||
for doc_type in DOC_TYPE:
|
||||
# arc.initialize_library(get_library_name(doc_type), lib_type=CHUNK_STORE)
|
||||
arc.set_quota(get_library_name(doc_type), MAX_SIZE)
|
||||
arc.reset()
|
||||
|
||||
# doc_type = 'Day'
|
||||
for doc_type in doc_type_l:
|
||||
date_list = list(set([int(path.split("_")[0]) for path in os.listdir(DATABASE_PATH) if doc_type in path]))
|
||||
date_list.sort()
|
||||
date_list = [str(date) for date in date_list]
|
||||
|
||||
f = open(ALL_STOCK_PATH, "r")
|
||||
stock_name_list = [lines.split("\t")[0] for lines in f.readlines()]
|
||||
f.close()
|
||||
stock_name_dict = {
|
||||
"SH": [stock_name[2:] for stock_name in stock_name_list if "SH" in stock_name],
|
||||
"SZ": [stock_name[2:] for stock_name in stock_name_list if "SZ" in stock_name],
|
||||
}
|
||||
|
||||
lib_name = get_library_name(doc_type)
|
||||
a = Arctic(ARCTIC_SRV)
|
||||
# a.initialize_library(lib_name, lib_type=CHUNK_STORE)
|
||||
|
||||
stock_name_exist = a[lib_name].list_symbols()
|
||||
lib = a[lib_name]
|
||||
initialize_count = 0
|
||||
for stock_name in stock_name_list:
|
||||
if stock_name not in stock_name_exist:
|
||||
initialize_count += 1
|
||||
# A placeholder for stocks
|
||||
pdf = pd.DataFrame(index=[pd.Timestamp("1900-01-01")])
|
||||
pdf.index.name = "date" # an col named date is necessary
|
||||
lib.write(stock_name, pdf)
|
||||
print("initialize count: {}".format(initialize_count))
|
||||
print("tasks: {}".format(date_list))
|
||||
a.reset()
|
||||
|
||||
# date_list = [files.split("_")[0] for files in os.listdir("./raw_data_price") if "tar" in files]
|
||||
# print(len(date_list))
|
||||
date_list = ["20201231"] # for test
|
||||
Parallel(n_jobs=min(2, len(date_list)))(
|
||||
delayed(add_data)(date, doc_type, stock_name_dict) for date in date_list
|
||||
)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
fire.Fire(DSCreator)
|
||||
312
examples/orderbook_data/example.py
Normal file
312
examples/orderbook_data/example.py
Normal file
@@ -0,0 +1,312 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
|
||||
from arctic.arctic import Arctic
|
||||
import qlib
|
||||
from qlib.data import D
|
||||
import unittest
|
||||
|
||||
|
||||
class TestClass(unittest.TestCase):
|
||||
"""
|
||||
Useful commands
|
||||
- run all tests: pytest examples/orderbook_data/example.py
|
||||
- run a single test: pytest -s --pdb --disable-warnings examples/orderbook_data/example.py::TestClass::test_basic01
|
||||
"""
|
||||
|
||||
def setUp(self):
|
||||
"""
|
||||
Configure for arctic
|
||||
"""
|
||||
provider_uri = "~/.qlib/qlib_data/yahoo_cn_1min"
|
||||
qlib.init(
|
||||
provider_uri=provider_uri,
|
||||
mem_cache_size_limit=1024 ** 3 * 2,
|
||||
mem_cache_type="sizeof",
|
||||
kernels=1,
|
||||
expression_provider={"class": "LocalExpressionProvider", "kwargs": {"time2idx": False}},
|
||||
feature_provider={
|
||||
"class": "ArcticFeatureProvider",
|
||||
"module_path": "qlib.contrib.data.data",
|
||||
"kwargs": {"uri": "127.0.0.1"},
|
||||
},
|
||||
dataset_provider={
|
||||
"class": "LocalDatasetProvider",
|
||||
"kwargs": {
|
||||
"align_time": False, # Order book is not fixed, so it can't be align to a shared fixed frequency calendar
|
||||
},
|
||||
},
|
||||
)
|
||||
# self.stocks_list = ["SH600519"]
|
||||
self.stocks_list = ["SZ000725"]
|
||||
|
||||
def test_basic(self):
|
||||
# NOTE: this data contains a lot of zeros in $askX and $bidX
|
||||
df = D.features(
|
||||
self.stocks_list,
|
||||
fields=["$ask1", "$ask2", "$bid1", "$bid2"],
|
||||
freq="ticks",
|
||||
start_time="20201230",
|
||||
end_time="20210101",
|
||||
)
|
||||
print(df)
|
||||
|
||||
def test_basic_without_time(self):
|
||||
df = D.features(self.stocks_list, fields=["$ask1"], freq="ticks")
|
||||
print(df)
|
||||
|
||||
def test_basic01(self):
|
||||
df = D.features(
|
||||
self.stocks_list,
|
||||
fields=["TResample($ask1, '1min', 'last')"],
|
||||
freq="ticks",
|
||||
start_time="20201230",
|
||||
end_time="20210101",
|
||||
)
|
||||
print(df)
|
||||
|
||||
def test_basic02(self):
|
||||
df = D.features(
|
||||
self.stocks_list,
|
||||
fields=["$function_code"],
|
||||
freq="transaction",
|
||||
start_time="20201230",
|
||||
end_time="20210101",
|
||||
)
|
||||
print(df)
|
||||
|
||||
def test_basic03(self):
|
||||
df = D.features(
|
||||
self.stocks_list,
|
||||
fields=["$function_code"],
|
||||
freq="order",
|
||||
start_time="20201230",
|
||||
end_time="20210101",
|
||||
)
|
||||
print(df)
|
||||
|
||||
# Here are some popular expressions for high-frequency
|
||||
# 1) some shared expression
|
||||
expr_sum_buy_ask_1 = "(TResample($ask1, '1min', 'last') + TResample($bid1, '1min', 'last'))"
|
||||
total_volume = (
|
||||
"TResample("
|
||||
+ "+".join([f"${name}{i}" for i in range(1, 11) for name in ["asize", "bsize"]])
|
||||
+ ", '1min', 'sum')"
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def total_func(name, method):
|
||||
return "TResample(" + "+".join([f"${name}{i}" for i in range(1, 11)]) + ",'1min', '{}')".format(method)
|
||||
|
||||
def test_exp_01(self):
|
||||
exprs = []
|
||||
names = []
|
||||
for name in ["asize", "bsize"]:
|
||||
for i in range(1, 11):
|
||||
exprs.append(f"TResample(${name}{i}, '1min', 'mean') / ({self.total_volume})")
|
||||
names.append(f"v_{name}_{i}")
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="ticks")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
# 2) some often used papers;
|
||||
def test_exp_02(self):
|
||||
spread_func = (
|
||||
lambda index: f"2 * TResample($ask{index} - $bid{index}, '1min', 'last') / {self.expr_sum_buy_ask_1}"
|
||||
)
|
||||
mid_func = (
|
||||
lambda index: f"2 * TResample(($ask{index} + $bid{index})/2, '1min', 'last') / {self.expr_sum_buy_ask_1}"
|
||||
)
|
||||
|
||||
exprs = []
|
||||
names = []
|
||||
for i in range(1, 11):
|
||||
exprs.extend([spread_func(i), mid_func(i)])
|
||||
names.extend([f"p_spread_{i}", f"p_mid_{i}"])
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="ticks")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
def test_exp_03(self):
|
||||
expr3_func1 = (
|
||||
lambda name, index_left, index_right: f"2 * TResample(Abs(${name}{index_left} - ${name}{index_right}), '1min', 'last') / {self.expr_sum_buy_ask_1}"
|
||||
)
|
||||
for name in ["ask", "bid"]:
|
||||
for i in range(1, 10):
|
||||
exprs = [expr3_func1(name, i + 1, i)]
|
||||
names = [f"p_diff_{name}_{i}_{i+1}"]
|
||||
exprs.extend([expr3_func1("ask", 10, 1), expr3_func1("bid", 1, 10)])
|
||||
names.extend(["p_diff_ask_10_1", "p_diff_bid_1_10"])
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="ticks")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
def test_exp_04(self):
|
||||
exprs = []
|
||||
names = []
|
||||
for name in ["asize", "bsize"]:
|
||||
exprs.append(f"(({ self.total_func(name, 'mean')}) / 10) / {self.total_volume}")
|
||||
names.append(f"v_avg_{name}")
|
||||
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="ticks")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
def test_exp_05(self):
|
||||
exprs = [
|
||||
f"2 * Sub({ self.total_func('ask', 'last')}, {self.total_func('bid', 'last')})/{self.expr_sum_buy_ask_1}",
|
||||
f"Sub({ self.total_func('asize', 'mean')}, {self.total_func('bsize', 'mean')})/{self.total_volume}",
|
||||
]
|
||||
names = ["p_accspread", "v_accspread"]
|
||||
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="ticks")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
# (p|v)_diff_(ask|bid|asize|bsize)_(time_interval)
|
||||
def test_exp_06(self):
|
||||
t = 3
|
||||
expr6_price_func = (
|
||||
lambda name, index, method: f'2 * (TResample(${name}{index}, "{t}s", "{method}") - Ref(TResample(${name}{index}, "{t}s", "{method}"), 1)) / {t}'
|
||||
)
|
||||
exprs = []
|
||||
names = []
|
||||
for i in range(1, 11):
|
||||
for name in ["bid", "ask"]:
|
||||
exprs.append(
|
||||
f"TResample({expr6_price_func(name, i, 'last')}, '1min', 'mean') / {self.expr_sum_buy_ask_1}"
|
||||
)
|
||||
names.append(f"p_diff_{name}{i}_{t}s")
|
||||
|
||||
for i in range(1, 11):
|
||||
for name in ["asize", "bsize"]:
|
||||
exprs.append(f"TResample({expr6_price_func(name, i, 'mean')}, '1min', 'mean') / {self.total_volume}")
|
||||
names.append(f"v_diff_{name}{i}_{t}s")
|
||||
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="ticks")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
# TODOs:
|
||||
# Following expressions may be implemented in the future
|
||||
# expr7_2 = lambda funccode, bsflag, time_interval: \
|
||||
# "TResample(TRolling(TEq(@transaction.function_code, {}) & TEq(@transaction.bs_flag ,{}), '{}s', 'sum') / \
|
||||
# TRolling(@transaction.function_code, '{}s', 'count') , '1min', 'mean')".format(ord(funccode), bsflag,time_interval,time_interval)
|
||||
# create_dataset(7, "SH600000", [expr7_2("C")] + [expr7(funccode, ordercode) for funccode in ['B','S'] for ordercode in ['0','1']])
|
||||
# create_dataset(7, ["SH600000"], [expr7_2("C", 48)] )
|
||||
|
||||
@staticmethod
|
||||
def expr7_init(funccode, ordercode, time_interval):
|
||||
# NOTE: based on on order frequency (i.e. freq="order")
|
||||
return f"Rolling(Eq($function_code, {ord(funccode)}) & Eq($order_kind ,{ord(ordercode)}), '{time_interval}s', 'sum') / Rolling($function_code, '{time_interval}s', 'count')"
|
||||
|
||||
# (la|lb|ma|mb|ca|cb)_intensity_(time_interval)
|
||||
def test_exp_07_1(self):
|
||||
# NOTE: based on transaction frequency (i.e. freq="transaction")
|
||||
expr7_3 = (
|
||||
lambda funccode, code, time_interval: f"TResample(Rolling(Eq($function_code, {ord(funccode)}) & {code}($ask_order, $bid_order) , '{time_interval}s', 'sum') / Rolling($function_code, '{time_interval}s', 'count') , '1min', 'mean')"
|
||||
)
|
||||
|
||||
exprs = [expr7_3("C", "Gt", "3"), expr7_3("C", "Lt", "3")]
|
||||
names = ["ca_intensity_3s", "cb_intensity_3s"]
|
||||
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="transaction")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
trans_dict = {"B": "a", "S": "b", "0": "l", "1": "m"}
|
||||
|
||||
def test_exp_07_2(self):
|
||||
# NOTE: based on on order frequency
|
||||
expr7 = (
|
||||
lambda funccode, ordercode, time_interval: f"TResample({self.expr7_init(funccode, ordercode, time_interval)}, '1min', 'mean')"
|
||||
)
|
||||
|
||||
exprs = []
|
||||
names = []
|
||||
for funccode in ["B", "S"]:
|
||||
for ordercode in ["0", "1"]:
|
||||
exprs.append(expr7(funccode, ordercode, "3"))
|
||||
names.append(self.trans_dict[ordercode] + self.trans_dict[funccode] + "_intensity_3s")
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="transaction")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
@staticmethod
|
||||
def expr7_3_init(funccode, code, time_interval):
|
||||
# NOTE: It depends on transaction frequency
|
||||
return f"Rolling(Eq($function_code, {ord(funccode)}) & {code}($ask_order, $bid_order) , '{time_interval}s', 'sum') / Rolling($function_code, '{time_interval}s', 'count')"
|
||||
|
||||
# (la|lb|ma|mb|ca|cb)_relative_intensity_(time_interval_small)_(time_interval_big)
|
||||
def test_exp_08_1(self):
|
||||
expr8_1 = (
|
||||
lambda funccode, ordercode, time_interval_short, time_interval_long: f"TResample(Gt({self.expr7_init(funccode, ordercode, time_interval_short)},{self.expr7_init(funccode, ordercode, time_interval_long)}), '1min', 'mean')"
|
||||
)
|
||||
|
||||
exprs = []
|
||||
names = []
|
||||
for funccode in ["B", "S"]:
|
||||
for ordercode in ["0", "1"]:
|
||||
exprs.append(expr8_1(funccode, ordercode, "10", "900"))
|
||||
names.append(self.trans_dict[ordercode] + self.trans_dict[funccode] + "_relative_intensity_10s_900s")
|
||||
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="order")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
def test_exp_08_2(self):
|
||||
# NOTE: It depends on transaction frequency
|
||||
expr8_2 = (
|
||||
lambda funccode, ordercode, time_interval_short, time_interval_long: f"TResample(Gt({self.expr7_3_init(funccode, ordercode, time_interval_short)},{self.expr7_3_init(funccode, ordercode, time_interval_long)}), '1min', 'mean')"
|
||||
)
|
||||
|
||||
exprs = [expr8_2("C", "Gt", "10", "900"), expr8_2("C", "Lt", "10", "900")]
|
||||
names = ["ca_relative_intensity_10s_900s", "cb_relative_intensity_10s_900s"]
|
||||
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="transaction")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
## v9(la|lb|ma|mb|ca|cb)_diff_intensity_(time_interval1)_(time_interval2)
|
||||
# 1) calculating the original data
|
||||
# 2) Resample data to 3s and calculate the changing rate
|
||||
# 3) Resample data to 1min
|
||||
|
||||
def test_exp_09_trans(self):
|
||||
exprs = [
|
||||
f'TResample(Div(Sub(TResample({self.expr7_3_init("C", "Gt", "3")}, "3s", "last"), Ref(TResample({self.expr7_3_init("C", "Gt", "3")}, "3s","last"), 1)), 3), "1min", "mean")',
|
||||
f'TResample(Div(Sub(TResample({self.expr7_3_init("C", "Lt", "3")}, "3s", "last"), Ref(TResample({self.expr7_3_init("C", "Lt", "3")}, "3s","last"), 1)), 3), "1min", "mean")',
|
||||
]
|
||||
names = ["ca_diff_intensity_3s_3s", "cb_diff_intensity_3s_3s"]
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="transaction")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
def test_exp_09_order(self):
|
||||
exprs = []
|
||||
names = []
|
||||
for funccode in ["B", "S"]:
|
||||
for ordercode in ["0", "1"]:
|
||||
exprs.append(
|
||||
f'TResample(Div(Sub(TResample({self.expr7_init(funccode, ordercode, "3")}, "3s", "last"), Ref(TResample({self.expr7_init(funccode, ordercode, "3")},"3s", "last"), 1)), 3) ,"1min", "mean")'
|
||||
)
|
||||
names.append(self.trans_dict[ordercode] + self.trans_dict[funccode] + "_diff_intensity_3s_3s")
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="order")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
def test_exp_10(self):
|
||||
exprs = []
|
||||
names = []
|
||||
for i in [5, 10, 30, 60]:
|
||||
exprs.append(
|
||||
f'TResample(Ref(TResample($ask1 + $bid1, "1s", "ffill"), {-i}) / TResample($ask1 + $bid1, "1s", "ffill") - 1, "1min", "mean" )'
|
||||
)
|
||||
names.append(f"lag_{i}_change_rate" for i in [5, 10, 30, 60])
|
||||
df = D.features(self.stocks_list, fields=exprs, freq="ticks")
|
||||
df.columns = names
|
||||
print(df)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
46
examples/portfolio/README.md
Normal file
46
examples/portfolio/README.md
Normal file
@@ -0,0 +1,46 @@
|
||||
# Portfolio Optimization Strategy
|
||||
|
||||
## Introduction
|
||||
|
||||
In `qlib/examples/benchmarks` we have various **alpha** models that predict
|
||||
the stock returns. We also use a simple rule based `TopkDropoutStrategy` to
|
||||
evaluate the investing performance of these models. However, such a strategy
|
||||
is too simple to control the portfolio risk like correlation and volatility.
|
||||
|
||||
To this end, an optimization based strategy should be used to for the
|
||||
trade-off between return and risk. In this doc, we will show how to use
|
||||
`EnhancedIndexingStrategy` to maximize portfolio return while minimizing
|
||||
tracking error relative to a benchmark.
|
||||
|
||||
|
||||
## Preparation
|
||||
|
||||
We use China stock market data for our example.
|
||||
|
||||
1. Prepare CSI300 weight:
|
||||
|
||||
```bash
|
||||
wget http://fintech.msra.cn/stock_data/downloads/csi300_weight.zip
|
||||
unzip -d ~/.qlib/qlib_data/cn_data csi300_weight.zip
|
||||
rm -f csi300_weight.zip
|
||||
```
|
||||
|
||||
2. Prepare risk model data:
|
||||
|
||||
```bash
|
||||
python prepare_riskdata.py
|
||||
```
|
||||
|
||||
Here we use a **Statistical Risk Model** implemented in `qlib.model.riskmodel`.
|
||||
However users are strongly recommended to use other risk models for better quality:
|
||||
* **Fundamental Risk Model** like MSCI BARRA
|
||||
* [Deep Risk Model](https://arxiv.org/abs/2107.05201)
|
||||
|
||||
|
||||
## End-to-End Workflow
|
||||
|
||||
You can finish workflow with `EnhancedIndexingStrategy` by running
|
||||
`qrun config_enhanced_indexing.yaml`.
|
||||
|
||||
In this config, we mainly changed the strategy section compared to
|
||||
`qlib/examples/benchmarks/workflow_config_lightgbm_Alpha158.yaml`.
|
||||
71
examples/portfolio/config_enhanced_indexing.yaml
Normal file
71
examples/portfolio/config_enhanced_indexing.yaml
Normal file
@@ -0,0 +1,71 @@
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
start_time: 2008-01-01
|
||||
end_time: 2020-08-01
|
||||
fit_start_time: 2008-01-01
|
||||
fit_end_time: 2014-12-31
|
||||
instruments: *market
|
||||
port_analysis_config: &port_analysis_config
|
||||
strategy:
|
||||
class: EnhancedIndexingStrategy
|
||||
module_path: qlib.contrib.strategy
|
||||
kwargs:
|
||||
model: <MODEL>
|
||||
dataset: <DATASET>
|
||||
riskmodel_root: ./riskdata
|
||||
backtest:
|
||||
start_time: 2017-01-01
|
||||
end_time: 2020-08-01
|
||||
account: 100000000
|
||||
benchmark: *benchmark
|
||||
exchange_kwargs:
|
||||
limit_threshold: 0.095
|
||||
deal_price: close
|
||||
open_cost: 0.0005
|
||||
close_cost: 0.0015
|
||||
min_cost: 5
|
||||
task:
|
||||
model:
|
||||
class: LGBModel
|
||||
module_path: qlib.contrib.model.gbdt
|
||||
kwargs:
|
||||
loss: mse
|
||||
colsample_bytree: 0.8879
|
||||
learning_rate: 0.2
|
||||
subsample: 0.8789
|
||||
lambda_l1: 205.6999
|
||||
lambda_l2: 580.9768
|
||||
max_depth: 8
|
||||
num_leaves: 210
|
||||
num_threads: 20
|
||||
dataset:
|
||||
class: DatasetH
|
||||
module_path: qlib.data.dataset
|
||||
kwargs:
|
||||
handler:
|
||||
class: Alpha158
|
||||
module_path: qlib.contrib.data.handler
|
||||
kwargs: *data_handler_config
|
||||
segments:
|
||||
train: [2008-01-01, 2014-12-31]
|
||||
valid: [2015-01-01, 2016-12-31]
|
||||
test: [2017-01-01, 2020-08-01]
|
||||
record:
|
||||
- class: SignalRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
model: <MODEL>
|
||||
dataset: <DATASET>
|
||||
- class: SigAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
ana_long_short: False
|
||||
ann_scaler: 252
|
||||
- class: PortAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
config: *port_analysis_config
|
||||
55
examples/portfolio/prepare_riskdata.py
Normal file
55
examples/portfolio/prepare_riskdata.py
Normal file
@@ -0,0 +1,55 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
import os
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
|
||||
from qlib.data import D
|
||||
from qlib.model.riskmodel import StructuredCovEstimator
|
||||
|
||||
|
||||
def prepare_data(riskdata_root="./riskdata", T=240, start_time="2016-01-01"):
|
||||
|
||||
universe = D.features(D.instruments("csi300"), ["$close"], start_time=start_time).swaplevel().sort_index()
|
||||
|
||||
price_all = (
|
||||
D.features(D.instruments("all"), ["$close"], start_time=start_time).squeeze().unstack(level="instrument")
|
||||
)
|
||||
|
||||
# StructuredCovEstimator is a statistical risk model
|
||||
riskmodel = StructuredCovEstimator()
|
||||
|
||||
for i in range(T - 1, len(price_all)):
|
||||
|
||||
date = price_all.index[i]
|
||||
ref_date = price_all.index[i - T + 1]
|
||||
|
||||
print(date)
|
||||
|
||||
codes = universe.loc[date].index
|
||||
price = price_all.loc[ref_date:date, codes]
|
||||
|
||||
# calculate return and remove extreme return
|
||||
ret = price.pct_change()
|
||||
ret.clip(ret.quantile(0.025), ret.quantile(0.975), axis=1, inplace=True)
|
||||
|
||||
# run risk model
|
||||
F, cov_b, var_u = riskmodel.predict(ret, is_price=False, return_decomposed_components=True)
|
||||
|
||||
# save risk data
|
||||
root = riskdata_root + "/" + date.strftime("%Y%m%d")
|
||||
os.makedirs(root, exist_ok=True)
|
||||
|
||||
pd.DataFrame(F, index=codes).to_pickle(root + "/factor_exp.pkl")
|
||||
pd.DataFrame(cov_b).to_pickle(root + "/factor_cov.pkl")
|
||||
# for specific_risk we follow the convention to save volatility
|
||||
pd.Series(np.sqrt(var_u), index=codes).to_pickle(root + "/specific_risk.pkl")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
import qlib
|
||||
|
||||
qlib.init(provider_uri="~/.qlib/qlib_data/cn_data")
|
||||
|
||||
prepare_data()
|
||||
@@ -6,7 +6,7 @@ import fire
|
||||
import pickle
|
||||
|
||||
from datetime import datetime
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.data.dataset.handler import DataHandlerLP
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
@@ -20,7 +20,6 @@ from operator import xor
|
||||
from pprint import pprint
|
||||
|
||||
import qlib
|
||||
from qlib.config import REG_CN
|
||||
from qlib.workflow import R
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
@@ -187,7 +186,7 @@ def gen_and_save_md_table(metrics, dataset):
|
||||
# read yaml, remove seed kwargs of model, and then save file in the temp_dir
|
||||
def gen_yaml_file_without_seed_kwargs(yaml_path, temp_dir):
|
||||
with open(yaml_path, "r") as fp:
|
||||
config = yaml.load(fp)
|
||||
config = yaml.safe_load(fp)
|
||||
try:
|
||||
del config["task"]["model"]["kwargs"]["seed"]
|
||||
except KeyError:
|
||||
@@ -248,7 +247,7 @@ class ModelRunner:
|
||||
determines the dataset to be used for each model.
|
||||
qlib_uri : str
|
||||
the uri to install qlib with pip
|
||||
it could be url on the we or local path
|
||||
it could be url on the we or local path (NOTE: the local path must be a absolute path)
|
||||
exp_folder_name: str
|
||||
the name of the experiment folder
|
||||
wait_before_rm_env : bool
|
||||
|
||||
@@ -61,13 +61,7 @@
|
||||
"\n",
|
||||
"import qlib\n",
|
||||
"import pandas as pd\n",
|
||||
"from qlib.config import REG_CN\n",
|
||||
"from qlib.contrib.model.gbdt import LGBModel\n",
|
||||
"from qlib.contrib.data.handler import Alpha158\n",
|
||||
"from qlib.contrib.evaluate import (\n",
|
||||
" backtest as normal_backtest,\n",
|
||||
" risk_analysis,\n",
|
||||
")\n",
|
||||
"from qlib.constant import REG_CN\n",
|
||||
"from qlib.utils import exists_qlib_data, init_instance_by_config\n",
|
||||
"from qlib.workflow import R\n",
|
||||
"from qlib.workflow.record_temp import SignalRecord, PortAnaRecord\n",
|
||||
@@ -204,7 +198,7 @@
|
||||
" },\n",
|
||||
" \"strategy\": {\n",
|
||||
" \"class\": \"TopkDropoutStrategy\",\n",
|
||||
" \"module_path\": \"qlib.contrib.strategy.model_strategy\",\n",
|
||||
" \"module_path\": \"qlib.contrib.strategy.signal_strategy\",\n",
|
||||
" \"kwargs\": {\n",
|
||||
" \"model\": model,\n",
|
||||
" \"dataset\": dataset,\n",
|
||||
|
||||
@@ -2,10 +2,10 @@
|
||||
# Licensed under the MIT License.
|
||||
|
||||
import qlib
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.utils import init_instance_by_config, flatten_dict
|
||||
from qlib.workflow import R
|
||||
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord
|
||||
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord, SigAnaRecord
|
||||
from qlib.tests.data import GetData
|
||||
from qlib.tests.config import CSI300_BENCH, CSI300_GBDT_TASK
|
||||
|
||||
@@ -70,6 +70,10 @@ if __name__ == "__main__":
|
||||
sr = SignalRecord(model, dataset, recorder)
|
||||
sr.generate()
|
||||
|
||||
# Signal Analysis
|
||||
sar = SigAnaRecord(recorder)
|
||||
sar.generate()
|
||||
|
||||
# backtest. If users want to use backtest based on their own prediction,
|
||||
# please refer to https://qlib.readthedocs.io/en/latest/component/recorder.html#record-template.
|
||||
par = PortAnaRecord(recorder, port_analysis_config, "day")
|
||||
|
||||
@@ -2,8 +2,7 @@
|
||||
# Licensed under the MIT License.
|
||||
from pathlib import Path
|
||||
|
||||
_version_path = Path(__file__).absolute().parent / "VERSION.txt" # This file is copyed from setup.py
|
||||
__version__ = _version_path.read_text(encoding="utf-8").strip()
|
||||
__version__ = "0.8.3"
|
||||
__version__bak = __version__ # This version is backup for QlibConfig.reset_qlib_version
|
||||
import os
|
||||
from typing import Union
|
||||
@@ -13,9 +12,24 @@ import platform
|
||||
import subprocess
|
||||
from .log import get_module_logger
|
||||
|
||||
|
||||
# init qlib
|
||||
def init(default_conf="client", **kwargs):
|
||||
"""
|
||||
|
||||
Parameters
|
||||
----------
|
||||
default_conf: str
|
||||
the default value is client. Accepted values: client/server.
|
||||
**kwargs :
|
||||
clear_mem_cache: str
|
||||
the default value is True;
|
||||
Will the memory cache be clear.
|
||||
It is often used to improve performance when init will be called for multiple times
|
||||
skip_if_reg: bool: str
|
||||
the default value is True;
|
||||
When using the recorder, skip_if_reg can set to True to avoid loss of recorder.
|
||||
|
||||
"""
|
||||
from .config import C
|
||||
from .data.cache import H
|
||||
|
||||
@@ -29,7 +43,9 @@ def init(default_conf="client", **kwargs):
|
||||
logger.warning("Skip initialization because `skip_if_reg is True`")
|
||||
return
|
||||
|
||||
H.clear()
|
||||
clear_mem_cache = kwargs.pop("clear_mem_cache", True)
|
||||
if clear_mem_cache:
|
||||
H.clear()
|
||||
C.set(default_conf, **kwargs)
|
||||
|
||||
# mount nfs
|
||||
@@ -46,7 +62,7 @@ def init(default_conf="client", **kwargs):
|
||||
else:
|
||||
logger.warning(f"auto_path is False, please make sure {mount_path} is mounted")
|
||||
elif uri_type == C.NFS_URI:
|
||||
_mount_nfs_uri(provider_uri, mount_path, C["auto_mount"])
|
||||
_mount_nfs_uri(provider_uri, C.dpm.get_data_uri(_freq), C["auto_mount"])
|
||||
else:
|
||||
raise NotImplementedError(f"This type of URI is not supported")
|
||||
|
||||
@@ -79,7 +95,7 @@ def _mount_nfs_uri(provider_uri, mount_path, auto_mount: bool = False):
|
||||
sys_type = platform.system()
|
||||
if "win" in sys_type.lower():
|
||||
# system: window
|
||||
exec_result = os.popen("mount -o anon %s %s" % (provider_uri, mount_path + ":"))
|
||||
exec_result = os.popen(f"mount -o anon {provider_uri} {mount_path}")
|
||||
result = exec_result.read()
|
||||
if "85" in result:
|
||||
LOG.warning(f"{provider_uri} on Windows:{mount_path} is already mounted")
|
||||
@@ -169,7 +185,7 @@ def get_project_path(config_name="config.yaml", cur_path: Union[Path, str, None]
|
||||
- There is a file named `config.yaml` in qlib.
|
||||
|
||||
For example:
|
||||
If your project file system stucuture follows such a pattern
|
||||
If your project file system structure follows such a pattern
|
||||
|
||||
<project_path>/
|
||||
- config.yaml
|
||||
@@ -214,7 +230,7 @@ def auto_init(**kwargs):
|
||||
Here are two examples of the configuration
|
||||
|
||||
Example 1)
|
||||
If you want create a new project-specific config based on a shared configure, you can use `conf_type: ref`
|
||||
If you want to create a new project-specific config based on a shared configure, you can use `conf_type: ref`
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
@@ -230,7 +246,7 @@ def auto_init(**kwargs):
|
||||
default_exp_name: "Experiment"
|
||||
|
||||
Example 2)
|
||||
If you wan to create simple a stand alone config, you can use following config(a.k.a `conf_type: origin`)
|
||||
If you want to create simple a standalone config, you can use following config(a.k.a. `conf_type: origin`)
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
@@ -260,8 +276,8 @@ def auto_init(**kwargs):
|
||||
init_from_yaml_conf(conf_pp, **kwargs)
|
||||
elif conf_type == "ref":
|
||||
# This config type will be more convenient in following scenario
|
||||
# - There is a shared configure file and you don't want to edit it inplace.
|
||||
# - The shared configure may be updated later and you don't want to copy it.
|
||||
# - There is a shared configure file, and you don't want to edit it inplace.
|
||||
# - The shared configure may be updated later, and you don't want to copy it.
|
||||
# - You have some customized config.
|
||||
qlib_conf_path = conf.get("qlib_cfg", None)
|
||||
|
||||
|
||||
@@ -50,11 +50,12 @@ def get_exchange(
|
||||
subscribe_fields: list
|
||||
subscribe fields.
|
||||
open_cost : float
|
||||
open transaction cost.
|
||||
open transaction cost. It is a ratio. The cost is proportional to your order's deal amount.
|
||||
close_cost : float
|
||||
close transaction cost.
|
||||
close transaction cost. It is a ratio. The cost is proportional to your order's deal amount.
|
||||
min_cost : float
|
||||
min transaction cost.
|
||||
min transaction cost. It is an absolute amount of cost instead of a ratio of your order's deal amount.
|
||||
e.g. You must pay at least 5 yuan of commission regardless of your order's deal amount.
|
||||
trade_unit : int
|
||||
Included in kwargs. Please refer to the docs of `__init__` of `Exchange`
|
||||
deal_price: Union[str, Tuple[str], List[str]]
|
||||
@@ -185,8 +186,10 @@ def get_strategy_executor(
|
||||
trade_exchange = get_exchange(**exchange_kwargs)
|
||||
|
||||
common_infra = CommonInfrastructure(trade_account=trade_account, trade_exchange=trade_exchange)
|
||||
trade_strategy = init_instance_by_config(strategy, accept_types=BaseStrategy, common_infra=common_infra)
|
||||
trade_executor = init_instance_by_config(executor, accept_types=BaseExecutor, common_infra=common_infra)
|
||||
trade_strategy = init_instance_by_config(strategy, accept_types=BaseStrategy)
|
||||
trade_strategy.reset_common_infra(common_infra)
|
||||
trade_executor = init_instance_by_config(executor, accept_types=BaseExecutor)
|
||||
trade_executor.reset_common_infra(common_infra)
|
||||
|
||||
return trade_strategy, trade_executor
|
||||
|
||||
|
||||
@@ -29,7 +29,10 @@ rtn & earning in the Account
|
||||
|
||||
|
||||
class AccumulatedInfo:
|
||||
"""accumulated trading info, including accumulated return/cost/turnover"""
|
||||
"""
|
||||
accumulated trading info, including accumulated return/cost/turnover
|
||||
AccumulatedInfo should be shared across different levels
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
self.reset()
|
||||
@@ -62,6 +65,11 @@ class AccumulatedInfo:
|
||||
|
||||
|
||||
class Account:
|
||||
"""
|
||||
The correctness of the metrics of Account in nested execution depends on the shallow copy of `trade_account` in qlib/backtest/executor.py:NestedExecutor
|
||||
Different level of executor has different Account object when calculating metrics. But the position object is shared cross all the Account object.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
init_cash: float = 1e9,
|
||||
@@ -95,6 +103,8 @@ class Account:
|
||||
self.init_vars(init_cash, position_dict, freq, benchmark_config)
|
||||
|
||||
def init_vars(self, init_cash, position_dict, freq: str, benchmark_config: dict):
|
||||
# 1) the following variables are shared by multiple layers
|
||||
# - you will see a shallow copy instead of deepcopy in the NestedExecutor;
|
||||
self.init_cash = init_cash
|
||||
self.current_position: BasePosition = init_instance_by_config(
|
||||
{
|
||||
@@ -106,6 +116,9 @@ class Account:
|
||||
"module_path": "qlib.backtest.position",
|
||||
}
|
||||
)
|
||||
self.accum_info = AccumulatedInfo()
|
||||
|
||||
# 2) following variables are not shared between layers
|
||||
self.portfolio_metrics = None
|
||||
self.hist_positions = {}
|
||||
self.reset(freq=freq, benchmark_config=benchmark_config)
|
||||
@@ -119,7 +132,8 @@ class Account:
|
||||
def reset_report(self, freq, benchmark_config):
|
||||
# portfolio related metrics
|
||||
if self.is_port_metr_enabled():
|
||||
self.accum_info = AccumulatedInfo()
|
||||
# NOTE:
|
||||
# `accum_info` and `current_position` are shared here
|
||||
self.portfolio_metrics = PortfolioMetrics(freq, benchmark_config)
|
||||
self.hist_positions = {}
|
||||
|
||||
@@ -185,7 +199,7 @@ class Account:
|
||||
|
||||
# if stock is sold out, no stock price information in Position, then we should update account first, then update current position
|
||||
# if stock is bought, there is no stock in current position, update current, then update account
|
||||
# The cost will be substracted from the cash at last. So the trading logic can ignore the cost calculation
|
||||
# The cost will be subtracted from the cash at last. So the trading logic can ignore the cost calculation
|
||||
if order.direction == Order.SELL:
|
||||
# sell stock
|
||||
self._update_state_from_order(order, trade_val, cost, trade_price)
|
||||
@@ -364,7 +378,7 @@ class Account:
|
||||
)
|
||||
|
||||
def get_portfolio_metrics(self):
|
||||
"""get the history portfolio_metrics and postions instance"""
|
||||
"""get the history portfolio_metrics and positions instance"""
|
||||
if self.is_port_metr_enabled():
|
||||
_portfolio_metrics = self.portfolio_metrics.generate_portfolio_metrics_dataframe()
|
||||
_positions = self.get_hist_positions()
|
||||
|
||||
@@ -13,7 +13,7 @@ from tqdm.auto import tqdm
|
||||
|
||||
|
||||
def backtest_loop(start_time, end_time, trade_strategy: BaseStrategy, trade_executor: BaseExecutor):
|
||||
"""backtest funciton for the interaction of the outermost strategy and executor in the nested decision execution
|
||||
"""backtest function for the interaction of the outermost strategy and executor in the nested decision execution
|
||||
|
||||
please refer to the docs of `collect_data_loop`
|
||||
|
||||
|
||||
@@ -505,8 +505,8 @@ class BaseTradeDecision:
|
||||
`inner_trade_decision` will be changed **inplaced**.
|
||||
|
||||
Motivation of the `mod_inner_decision`
|
||||
- Leave a hook for outer decision to affact the decision generated by the inner strategy
|
||||
- e.g. the outmost strategy generate a time range for trading. But the upper layer can only affact the
|
||||
- Leave a hook for outer decision to affect the decision generated by the inner strategy
|
||||
- e.g. the outmost strategy generate a time range for trading. But the upper layer can only affect the
|
||||
nearest layer in the original design. With `mod_inner_decision`, the decision can passed through multiple
|
||||
layers
|
||||
|
||||
|
||||
@@ -14,7 +14,8 @@ import numpy as np
|
||||
import pandas as pd
|
||||
|
||||
from ..data.data import D
|
||||
from ..config import C, REG_CN
|
||||
from ..config import C
|
||||
from ..constant import REG_CN
|
||||
from ..log import get_module_logger
|
||||
from .decision import Order, OrderDir, OrderHelper
|
||||
from .high_performance_ds import BaseQuote, PandasQuote, NumpyQuote
|
||||
@@ -103,7 +104,7 @@ class Exchange:
|
||||
Necessary fields:
|
||||
$close is for calculating the total value at end of each day.
|
||||
Optional fields:
|
||||
$volume is only necessary when we limit the trade amount or caculate PA(vwap) indicator
|
||||
$volume is only necessary when we limit the trade amount or calculate PA(vwap) indicator
|
||||
$vwap is only necessary when we use the $vwap price as the deal price
|
||||
$factor is for rounding to the trading unit
|
||||
limit_sell will be set to False by default(False indicates we can sell this
|
||||
@@ -231,7 +232,7 @@ class Exchange:
|
||||
self.extra_quote["limit_buy"] = False
|
||||
self.logger.warning("No limit_buy set for extra_quote. All stock will be able to be bought.")
|
||||
assert set(self.extra_quote.columns) == set(self.quote_df.columns) - {"$change"}
|
||||
self.quote_df = pd.concat([self.quote_df, extra_quote], sort=False, axis=0)
|
||||
self.quote_df = pd.concat([self.quote_df, self.extra_quote], sort=False, axis=0)
|
||||
|
||||
LT_TP_EXP = "(exp)" # Tuple[str, str]
|
||||
LT_FLT = "float" # float
|
||||
@@ -401,9 +402,9 @@ class Exchange:
|
||||
def get_close(self, stock_id, start_time, end_time, method="ts_data_last"):
|
||||
return self.quote.get_data(stock_id, start_time, end_time, field="$close", method=method)
|
||||
|
||||
def get_volume(self, stock_id, start_time, end_time):
|
||||
def get_volume(self, stock_id, start_time, end_time, method="sum"):
|
||||
"""get the total deal volume of stock with `stock_id` between the time interval [start_time, end_time)"""
|
||||
return self.quote.get_data(stock_id, start_time, end_time, field="$volume", method="sum")
|
||||
return self.quote.get_data(stock_id, start_time, end_time, field="$volume", method=method)
|
||||
|
||||
def get_deal_price(self, stock_id, start_time, end_time, direction: OrderDir, method="ts_data_last"):
|
||||
if direction == OrderDir.SELL:
|
||||
@@ -505,7 +506,7 @@ class Exchange:
|
||||
Note: some future information is used in this function
|
||||
Parameter:
|
||||
target_position : dict { stock_id : amount }
|
||||
current_postion : dict { stock_id : amount}
|
||||
current_position : dict { stock_id : amount}
|
||||
trade_unit : trade_unit
|
||||
down sample : for amount 321 and trade_unit 100, deal_amount is 300
|
||||
deal order on trade_date
|
||||
@@ -535,7 +536,7 @@ class Exchange:
|
||||
deal_amount = self.get_real_deal_amount(current_amount, target_amount, factor)
|
||||
if deal_amount == 0:
|
||||
continue
|
||||
elif deal_amount > 0:
|
||||
if deal_amount > 0:
|
||||
# buy stock
|
||||
buy_order_list.append(
|
||||
Order(
|
||||
@@ -686,9 +687,7 @@ class Exchange:
|
||||
orig_deal_amount = order.deal_amount
|
||||
order.deal_amount = max(min(vol_limit_min, orig_deal_amount), 0)
|
||||
if vol_limit_min < orig_deal_amount:
|
||||
self.logger.debug(
|
||||
f"Order clipped due to volume limitation: {order}, {[(vol, rule) for vol, rule in zip(vol_limit_num, vol_limit)]}"
|
||||
)
|
||||
self.logger.debug(f"Order clipped due to volume limitation: {order}, {list(zip(vol_limit_num, vol_limit))}")
|
||||
|
||||
def _get_buy_amount_by_cash_limit(self, trade_price, cash, cost_ratio):
|
||||
"""return the real order amount after cash limit for buying.
|
||||
@@ -736,7 +735,11 @@ class Exchange:
|
||||
|
||||
# TODO: the adjusted cost ratio can be overestimated as deal_amount will be clipped in the next steps
|
||||
trade_val = order.deal_amount * trade_price
|
||||
adj_cost_ratio = self.impact_cost * (trade_val / total_trade_val) ** 2
|
||||
if not total_trade_val or np.isnan(total_trade_val):
|
||||
# TODO: assert trade_val == 0, f"trade_val != 0, total_trade_val: {total_trade_val}; order info: {order}"
|
||||
adj_cost_ratio = self.impact_cost
|
||||
else:
|
||||
adj_cost_ratio = self.impact_cost * (trade_val / total_trade_val) ** 2
|
||||
|
||||
if order.direction == Order.SELL:
|
||||
cost_ratio = self.close_cost + adj_cost_ratio
|
||||
|
||||
@@ -41,7 +41,7 @@ class BaseExecutor:
|
||||
Parameters
|
||||
----------
|
||||
time_per_step : str
|
||||
trade time per trading step, used for genreate the trade calendar
|
||||
trade time per trading step, used for generate the trade calendar
|
||||
show_indicator: bool, optional
|
||||
whether to show indicators, :
|
||||
- 'pa', the price advantage
|
||||
@@ -118,7 +118,7 @@ class BaseExecutor:
|
||||
self.dealt_order_amount = defaultdict(float)
|
||||
self.deal_day = None
|
||||
|
||||
def reset_common_infra(self, common_infra):
|
||||
def reset_common_infra(self, common_infra, copy_trade_account=False):
|
||||
"""
|
||||
reset infrastructure for trading
|
||||
- reset trade_account
|
||||
@@ -129,9 +129,14 @@ class BaseExecutor:
|
||||
self.common_infra.update(common_infra)
|
||||
|
||||
if common_infra.has("trade_account"):
|
||||
# NOTE: there is a trick in the code.
|
||||
# copy is used instead of deepcopy. So positions are shared
|
||||
self.trade_account: Account = copy.copy(common_infra.get("trade_account"))
|
||||
if copy_trade_account:
|
||||
# NOTE: there is a trick in the code.
|
||||
# shallow copy is used instead of deepcopy.
|
||||
# 1. So positions are shared
|
||||
# 2. Others are not shared, so each level has it own metrics (portfolio and trading metrics)
|
||||
self.trade_account: Account = copy.copy(common_infra.get("trade_account"))
|
||||
else:
|
||||
self.trade_account = common_infra.get("trade_account")
|
||||
self.trade_account.reset(freq=self.time_per_step, port_metr_enabled=self.generate_portfolio_metrics)
|
||||
|
||||
@property
|
||||
@@ -189,7 +194,7 @@ class BaseExecutor:
|
||||
return return_value.get("execute_result")
|
||||
|
||||
@abstractclassmethod
|
||||
def _collect_data(self, trade_decision: BaseTradeDecision, level: int = 0) -> Tuple[List[object], dict]:
|
||||
def _collect_data(cls, trade_decision: BaseTradeDecision, level: int = 0) -> Tuple[List[object], dict]:
|
||||
"""
|
||||
Please refer to the doc of collect_data
|
||||
The only difference between `_collect_data` and `collect_data` is that some common steps are moved into
|
||||
@@ -342,14 +347,18 @@ class NestedExecutor(BaseExecutor):
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
def reset_common_infra(self, common_infra):
|
||||
def reset_common_infra(self, common_infra, copy_trade_account=False):
|
||||
"""
|
||||
reset infrastructure for trading
|
||||
- reset inner_strategyand inner_executor common infra
|
||||
"""
|
||||
super(NestedExecutor, self).reset_common_infra(common_infra)
|
||||
# NOTE: please refer to the docs of BaseExecutor.reset_common_infra for the meaning of `copy_trade_account`
|
||||
|
||||
self.inner_executor.reset_common_infra(common_infra)
|
||||
# The first level follow the `copy_trade_account` from the upper level
|
||||
super(NestedExecutor, self).reset_common_infra(common_infra, copy_trade_account=copy_trade_account)
|
||||
|
||||
# The lower level have to copy the trade_account
|
||||
self.inner_executor.reset_common_infra(common_infra, copy_trade_account=True)
|
||||
self.inner_strategy.reset_common_infra(common_infra)
|
||||
|
||||
def _init_sub_trading(self, trade_decision):
|
||||
@@ -360,12 +369,12 @@ class NestedExecutor(BaseExecutor):
|
||||
self.inner_strategy.reset(level_infra=sub_level_infra, outer_trade_decision=trade_decision)
|
||||
|
||||
def _update_trade_decision(self, trade_decision: BaseTradeDecision) -> BaseTradeDecision:
|
||||
# outter strategy have chance to update decision each iterator
|
||||
# outer strategy have chance to update decision each iterator
|
||||
updated_trade_decision = trade_decision.update(self.inner_executor.trade_calendar)
|
||||
if updated_trade_decision is not None:
|
||||
trade_decision = updated_trade_decision
|
||||
# NEW UPDATE
|
||||
# create a hook for inner strategy to update outter decision
|
||||
# create a hook for inner strategy to update outer decision
|
||||
self.inner_strategy.alter_outer_trade_decision(trade_decision)
|
||||
return trade_decision
|
||||
|
||||
@@ -395,9 +404,25 @@ class NestedExecutor(BaseExecutor):
|
||||
if not self._align_range_limit or start_idx <= sub_cal.get_trade_step() <= end_idx:
|
||||
# if force align the range limit, skip the steps outside the decision range limit
|
||||
|
||||
_inner_trade_decision: BaseTradeDecision = self.inner_strategy.generate_trade_decision(
|
||||
_inner_execute_result
|
||||
)
|
||||
res = self.inner_strategy.generate_trade_decision(_inner_execute_result)
|
||||
|
||||
# NOTE: !!!!!
|
||||
# the two lines below is for a special case in RL
|
||||
# To solve the confliction below
|
||||
# - Normally, user will create a strategy and embed it into Qlib's executor and simulator interaction loop
|
||||
# For a _nested qlib example_, (Qlib Strategy) <=> (Qlib Executor[(inner Qlib Strategy) <=> (inner Qlib Executor)])
|
||||
# - However, RL-based framework has it's own script to run the loop
|
||||
# For an _RL learning example_, (RL Policy) <=> (RL Env[(inner Qlib Executor)])
|
||||
# To make it possible to run _nested qlib example_ and _RL learning example_ together, the solution below is proposed
|
||||
# - The entry script follow the example of _RL learning example_ to be compatible with all kinds of RL Framework
|
||||
# - Each step of (RL Env) will make (inner Qlib Executor) one step forward
|
||||
# - (inner Qlib Strategy) is a proxy strategy, it will give the program control right to (RL Env) by `yield from` and wait for the action from the policy
|
||||
# So the two lines below is the implementation of yielding control rights
|
||||
if isinstance(res, GeneratorType):
|
||||
res = yield from res
|
||||
|
||||
_inner_trade_decision: BaseTradeDecision = res
|
||||
|
||||
trade_decision.mod_inner_decision(_inner_trade_decision) # propagate part of decision information
|
||||
|
||||
# NOTE sub_cal.get_step_time() must be called before collect_data in case of step shifting
|
||||
@@ -407,6 +432,7 @@ class NestedExecutor(BaseExecutor):
|
||||
_inner_execute_result = yield from self.inner_executor.collect_data(
|
||||
trade_decision=_inner_trade_decision, level=level + 1
|
||||
)
|
||||
self.post_inner_exe_step(_inner_execute_result)
|
||||
execute_result.extend(_inner_execute_result)
|
||||
|
||||
inner_order_indicators.append(
|
||||
@@ -418,6 +444,17 @@ class NestedExecutor(BaseExecutor):
|
||||
|
||||
return execute_result, {"inner_order_indicators": inner_order_indicators, "decision_list": decision_list}
|
||||
|
||||
def post_inner_exe_step(self, inner_exe_res):
|
||||
"""
|
||||
A hook for doing sth after each step of inner strategy
|
||||
|
||||
Parameters
|
||||
----------
|
||||
inner_exe_res :
|
||||
the execution result of inner task
|
||||
"""
|
||||
pass
|
||||
|
||||
def get_all_executors(self):
|
||||
"""get all executors, including self and inner_executor.get_all_executors()"""
|
||||
return [self, *self.inner_executor.get_all_executors()]
|
||||
|
||||
@@ -400,7 +400,7 @@ class BaseOrderIndicator:
|
||||
indicators : List[BaseOrderIndicator]
|
||||
the list of all inner indicators.
|
||||
metrics : Union[str, List[str]]
|
||||
all metrics needs ot be sumed.
|
||||
all metrics needs to be sumed.
|
||||
fill_value : float, optional
|
||||
fill np.NaN with value. By default None.
|
||||
"""
|
||||
|
||||
@@ -20,7 +20,7 @@ class BasePosition:
|
||||
Please refer to the `Position` class for the position
|
||||
"""
|
||||
|
||||
def __init__(self, cash=0.0, *args, **kwargs):
|
||||
def __init__(self, *args, cash=0.0, **kwargs):
|
||||
self._settle_type = self.ST_NO
|
||||
|
||||
def skip_update(self) -> bool:
|
||||
@@ -152,7 +152,7 @@ class BasePosition:
|
||||
"""
|
||||
generate stock weight dict {stock_id : value weight of stock in the position}
|
||||
it is meaningful in the beginning or the end of each trade step
|
||||
- During execution of each trading step, the weight may be not consistant with the portfolio value
|
||||
- During execution of each trading step, the weight may be not consistent with the portfolio value
|
||||
|
||||
Parameters
|
||||
----------
|
||||
@@ -223,6 +223,12 @@ class BasePosition:
|
||||
"""
|
||||
raise NotImplementedError(f"Please implement the `settle_commit` method")
|
||||
|
||||
def __str__(self):
|
||||
return self.__dict__.__str__()
|
||||
|
||||
def __repr__(self):
|
||||
return self.__dict__.__repr__()
|
||||
|
||||
|
||||
class Position(BasePosition):
|
||||
"""Position
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user