mirror of
https://github.com/microsoft/qlib.git
synced 2026-06-29 17:11:20 +08:00
Compare commits
31 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | e7954bdb32 | | |
| | d6f69aefea | | |
| | 1bebe9780e | | |
| | 7a4a92bc69 | | |
| | 271782c9dd | | |
| | d0113ea7df | | |
| | c3996955ef | | |
| | 8261965015 | | |
| | 6f71f8a46b | | |
| | edd8badeaf | | |
| | 19689024d4 | | |
| | 0304df0d5b | | |
| | 181ee3c070 | | |
| | cf35562e84 | | |
| | 184ce34a34 | | |
| | 382ababc01 | | |
| | bcf18c14de | | |
| | 6c1332f604 | | |
| | 93088485c3 | | |
| | c633d3fec0 | | |
| | 0b6d99bd38 | | |
| | 03cce8c908 | | |
| | e76b409d9a | | |
| | 3e79a088ef | | |
| | dfc0ed3c01 | | |
| | f59cfe51e0 | | |
| | 1ecdfd45fe | | |
| | 622303b83a | | |
| | 6bafd0a09b | | |
| | aed9c09091 | | |
| | 1b8f0b4575 | | |
1
.github/PULL_REQUEST_TEMPLATE.md
vendored
@@ -8,6 +8,7 @@
|
||||
<!--- Why is this change required? What problem does it solve? -->
|
||||
|
||||
## How Has This Been Tested?
|
||||
<! --- Put an `x` in all the boxes that apply: --->
|
||||
- [ ] Pass the test by running: `pytest qlib/tests/test_all_pipeline.py` under upper directory of `qlib`.
|
||||
- [ ] If you are adding a new feature, test on your own test scripts.
|
||||
|
||||
|
||||
@@ -30,7 +30,7 @@ Version 0.2.1
|
||||
--------------------
|
||||
- Support registering user-defined ``Provider``.
|
||||
- Support use operators in string format, e.g. ``['Ref($close, 1)']`` is valid field format.
|
||||
- Support dynamic fields in ``$some_field`` format. And exising fields like ``Close()`` may be deprecated in the future.
|
||||
- Support dynamic fields in ``$some_field`` format. And existing fields like ``Close()`` may be deprecated in the future.
|
||||
|
||||
Version 0.2.2
|
||||
--------------------
|
||||
@@ -78,7 +78,7 @@ Version 0.3.5
|
||||
- Support multi-label training, you can provide multiple label in ``handler``. (But LightGBM doesn't support due to the algorithm itself)
|
||||
- Refactor ``handler`` code, dataset.py is no longer used, and you can deploy your own labels and features in ``feature_label_config``
|
||||
- Handler only offer DataFrame. Also, ``trainer`` and model.py only receive DataFrame
|
||||
- Change ``split_rolling_data``, we roll the data on market calender now, not on normal date
|
||||
- Change ``split_rolling_data``, we roll the data on market calendar now, not on normal date
|
||||
- Move some date config from ``handler`` to ``trainer``
|
||||
|
||||
Version 0.4.0
|
||||
@@ -167,11 +167,11 @@ Version 0.8.0
|
||||
- There are lots of changes for daily trading, it is hard to list all of them. But a few important changes could be noticed
|
||||
- The trading limitation is more accurate;
|
||||
- In `previous version <https://github.com/microsoft/qlib/blob/v0.7.2/qlib/contrib/backtest/exchange.py#L160>`_, longing and shorting actions share the same action.
|
||||
- In `current verison <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/backtest/exchange.py#L304>`_, the trading limitation is different between loging and shorting action.
|
||||
- In `current version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/backtest/exchange.py#L304>`_, the trading limitation is different between longing and shorting actions.
|
||||
- The constant is different when calculating annualized metrics.
|
||||
- `Current version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/contrib/evaluate.py#L42>`_ uses more accurate constant than `previous version <https://github.com/microsoft/qlib/blob/v0.7.2/qlib/contrib/evaluate.py#L22>`_
|
||||
- `A new version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/tests/data.py#L17>`_ of data is released. Due to the instability of the Yahoo data source, the data may differ after being downloaded again.
|
||||
- Users could chec kout the backtesting results between `Current version <https://github.com/microsoft/qlib/tree/7c31012b507a3823117bddcc693fc64899460b2a/examples/benchmarks>`_ and `previous version <https://github.com/microsoft/qlib/tree/v0.7.2/examples/benchmarks>`_
|
||||
- Users could check out the backtesting results between `Current version <https://github.com/microsoft/qlib/tree/7c31012b507a3823117bddcc693fc64899460b2a/examples/benchmarks>`_ and `previous version <https://github.com/microsoft/qlib/tree/v0.7.2/examples/benchmarks>`_
|
||||
|
||||
|
||||
Other Versions
|
||||
|
||||
51
README.md
@@ -11,6 +11,8 @@
|
||||
Recent released features
|
||||
| Feature | Status |
|
||||
| -- | ------ |
|
||||
| Meta-Learning-based framework & DDG-DA | [Released](https://github.com/microsoft/qlib/pull/743) on Jan 10, 2022 |
|
||||
| Planning-based portfolio optimization | [Released](https://github.com/microsoft/qlib/pull/754) on Dec 28, 2021 |
|
||||
| Release Qlib v0.8.0 | [Released](https://github.com/microsoft/qlib/releases/tag/v0.8.0) on Dec 8, 2021 |
|
||||
| ADD model | [Released](https://github.com/microsoft/qlib/pull/704) on Nov 22, 2021 |
|
||||
| ADARNN model | [Released](https://github.com/microsoft/qlib/pull/689) on Nov 14, 2021 |
|
||||
@@ -49,9 +51,12 @@ For more details, please refer to our paper ["Qlib: An AI-oriented Quantitative
|
||||
- [Data Preparation](#data-preparation)
|
||||
- [Auto Quant Research Workflow](#auto-quant-research-workflow)
|
||||
- [Building Customized Quant Research Workflow by Code](#building-customized-quant-research-workflow-by-code)
|
||||
- [**Quant Model(Paper) Zoo**](#quant-model-paper-zoo)
|
||||
- [Run a single model](#run-a-single-model)
|
||||
- [Run multiple models](#run-multiple-models)
|
||||
- [Main Challenges & Solutions in Quant Research](#main-challenges--solutions-in-quant-research)
|
||||
- [Forecasting: Finding Valuable Signals/Patterns](#forecasting-finding-valuable-signalspatterns)
|
||||
- [**Quant Model (Paper) Zoo**](#quant-model-paper-zoo)
|
||||
- [Run a Single Model](#run-a-single-model)
|
||||
- [Run Multiple Models](#run-multiple-models)
|
||||
- [Adapting to Market Dynamics](#adapting-to-market-dynamics)
|
||||
- [**Quant Dataset Zoo**](#quant-dataset-zoo)
|
||||
- [More About Qlib](#more-about-qlib)
|
||||
- [Offline Mode and Online Mode](#offline-mode-and-online-mode)
|
||||
@@ -66,10 +71,8 @@ New features under development(order by estimated release time).
|
||||
Your feedback on these features is very important.
|
||||
| Feature | Status |
|
||||
| -- | ------ |
|
||||
| Planning-based portfolio optimization | Under review: https://github.com/microsoft/qlib/pull/280 |
|
||||
| Fund data supporting and analysis | Under review: https://github.com/microsoft/qlib/pull/292 |
|
||||
| Point-in-Time database | Under review: https://github.com/microsoft/qlib/pull/343 |
|
||||
| Meta-Learning-based data selection | Initial opensource version under development |
|
||||
| Orderbook database | Under review: https://github.com/microsoft/qlib/pull/744 |
|
||||
|
||||
# Framework of Qlib
|
||||
|
||||
@@ -195,7 +198,7 @@ We recommend users to prepare their own data if they have a high-quality dataset
|
||||
```python
|
||||
import qlib
|
||||
from qlib.data import D
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
|
||||
# Initialization
|
||||
mount_path = "~/.qlib/qlib_data/cn_data" # target_dir
|
||||
@@ -280,8 +283,18 @@ Qlib provides a tool named `qrun` to run the whole workflow automatically (inclu
|
||||
## Building Customized Quant Research Workflow by Code
|
||||
The automatic workflow may not suit the research workflow of all Quant researchers. To support a flexible Quant research workflow, Qlib also provides a modularized interface to allow researchers to build their own workflow by code. [Here](examples/workflow_by_code.ipynb) is a demo for customized Quant research workflow by code.
|
||||
|
||||
# Main Challenges & Solutions in Quant Research
|
||||
Quant investment is a unique scenario with many key challenges to be solved.
|
||||
Currently, Qlib provides some solutions for several of them.
|
||||
|
||||
# [Quant Model (Paper) Zoo](examples/benchmarks)
|
||||
## Forecasting: Finding Valuable Signals/Patterns
|
||||
Accurate forecasting of the stock price trend is a very important part of constructing profitable portfolios.
However, the huge amount of data in various formats in the financial market makes it challenging to build forecasting models.

An increasing number of SOTA Quant research works/papers, which focus on building forecasting models to mine valuable signals/patterns in complex financial data, are released in `Qlib`.
|
||||
|
||||
|
||||
### [Quant Model (Paper) Zoo](examples/benchmarks)
|
||||
|
||||
Here is a list of models built on `Qlib`.
|
||||
- [GBDT based on XGBoost (Tianqi Chen, et al. KDD 2016)](examples/benchmarks/XGBoost/)
|
||||
@@ -308,7 +321,7 @@ Your PR of new Quant models is highly welcomed.
|
||||
|
||||
The performance of each model on the `Alpha158` and `Alpha360` dataset can be found [here](examples/benchmarks/README.md).
|
||||
|
||||
## Run a single model
|
||||
### Run a single model
|
||||
All the models listed above are runnable with ``Qlib``. Users can find the config files we provide and some details about the model through the [benchmarks](examples/benchmarks) folder. More information can be retrieved at the model files listed above.
|
||||
|
||||
`Qlib` provides three different ways to run a single model, users can pick the one that fits their cases best:
|
||||
@@ -318,7 +331,7 @@ All the models listed above are runnable with ``Qlib``. Users can find the confi
|
||||
- Users can use the script [`run_all_model.py`](examples/run_all_model.py) listed in the `examples` folder to run a model. Here is an example of the specific shell command to be used: `python run_all_model.py run --models=lightgbm`, where the `--models` arguments can take any number of models listed above(the available models can be found in [benchmarks](examples/benchmarks/)). For more use cases, please refer to the file's [docstrings](examples/run_all_model.py).
|
||||
- **NOTE**: Each baseline has different environment dependencies, please make sure that your python version aligns with the requirements(e.g. TFT only supports Python 3.6~3.7 due to the limitation of `tensorflow==1.15.0`)
|
||||
|
||||
## Run multiple models
|
||||
### Run multiple models
|
||||
`Qlib` also provides a script [`run_all_model.py`](examples/run_all_model.py) which can run multiple models for several iterations. (**Note**: the script only supports *Linux* for now. Other operating systems will be supported in the future. It also doesn't support running the same model multiple times in parallel; this will be fixed in future development.)
|
||||
|
||||
The script will create a unique virtual environment for each model, and delete the environments after training. Thus, only experiment results such as `IC` and `backtest` results will be generated and stored.
|
||||
@@ -330,6 +343,14 @@ python run_all_model.py run 10
|
||||
|
||||
It also provides the API to run specific models at once. For more use cases, please refer to the file's [docstrings](examples/run_all_model.py).
|
||||
|
||||
## [Adapting to Market Dynamics](examples/benchmarks_dynamic)
|
||||
|
||||
Due to the non-stationary nature of the financial market environment, the data distribution may change across periods, which makes the performance of models built on training data decay on future test data.
So adapting the forecasting models/strategies to market dynamics is very important to their performance.
|
||||
|
||||
Here is a list of solutions built on `Qlib`.
|
||||
- [Rolling Retraining](examples/benchmarks_dynamic/baseline/)
|
||||
- [DDG-DA on pytorch (Wendi, et al. AAAI 2022)](examples/benchmarks_dynamic/DDG-DA/)
|
||||
|
||||
# Quant Dataset Zoo
|
||||
Dataset plays a very important role in Quant. Here is a list of the datasets built on `Qlib`:
|
||||
@@ -418,6 +439,16 @@ For example, if you want to contribute to Qlib's document/code, you can follow t
|
||||
<img src="https://github.com/demon143/qlib/blob/main/docs/_static/img/change%20doc.gif" />
|
||||
</p>
|
||||
|
||||
If you don't know how to start to contribute, you can refer to the following examples.
|
||||
| Type | Examples |
|
||||
| -- | -- |
|
||||
| Solving issues | [Answer a question](https://github.com/microsoft/qlib/issues/749); [issuing](https://github.com/microsoft/qlib/issues/765) or [fixing](https://github.com/microsoft/qlib/pull/792) a bug |
|
||||
| Docs | [Improve docs quality](https://github.com/microsoft/qlib/pull/797/files) ; [Fix a typo](https://github.com/microsoft/qlib/pull/774) |
|
||||
| Feature | Implement a [requested feature](https://github.com/microsoft/qlib/projects) like [this](https://github.com/microsoft/qlib/pull/754); [Refactor interfaces](https://github.com/microsoft/qlib/pull/539/files) |
|
||||
| Dataset | [Add a dataset](https://github.com/microsoft/qlib/pull/733) |
|
||||
| Models | [Implement a new model](https://github.com/microsoft/qlib/pull/689) |
|
||||
|
||||
If you would like to become one of Qlib's maintainers to contribute more (e.g. help merge PR, triage issues), please contact us by email([qlib@microsoft.com](mailto:qlib@microsoft.com)). We are glad to help you to set the right permission.
|
||||
|
||||
## Licence
|
||||
Most contributions require you to agree to a
|
||||
|
||||
@@ -1 +0,0 @@
|
||||
0.8.0.99
|
||||
@@ -21,6 +21,12 @@ The introduction of ``Data Layer`` includes the following parts.
|
||||
- Cache
|
||||
- Data and Cache File Structure
|
||||
|
||||
Here is a typical example of Qlib data workflow
|
||||
|
||||
- Users download data and convert it into Qlib format (with filename suffix `.bin`). In this step, typically only some basic data (such as OHLCV) are stored on disk.
- Users create some basic features with Qlib's expression engine (e.g. "Ref($close, 60) / $close", the return of the last 60 trading days). Supported operators in the expression engine can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/ops.py>`_. This step is typically implemented in Qlib's `Data Loader <https://qlib.readthedocs.io/en/latest/component/data.html#data-loader>`_, which is a component of `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_.
- If users require more complicated data processing (e.g. data normalization), `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_ supports user-customized processors to process data (some predefined processors can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/dataset/processor.py>`_). Processors are different from operators in the expression engine: they are designed for complicated data processing methods that are hard to support with operators.
- Finally, `Dataset <https://qlib.readthedocs.io/en/latest/component/data.html#dataset>`_ is responsible for preparing a model-specific dataset from the data processed by the Data Handler.
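
As a rough illustration of the expression used in the workflow above, the sketch below evaluates a ``Ref($close, N) / $close``-style ratio on a plain Python list. It is not Qlib's expression engine: the helper names ``ref`` and ``ref_ratio`` are hypothetical, the sample prices are made up, and it assumes that a positive ``N`` in ``Ref`` looks ``N`` steps back in time.

```python
import math


def ref(series, n):
    """Return the value of `series` n steps earlier (n > 0).

    Positions without enough history are filled with NaN.
    Hypothetical helper, not part of Qlib.
    """
    return [series[i - n] if i >= n else math.nan for i in range(len(series))]


def ref_ratio(close, n):
    """Element-wise ref(close, n) / close, mimicking "Ref($close, n) / $close"."""
    return [past / now for past, now in zip(ref(close, n), close)]


# Made-up closing prices, with a 2-step window to keep the example short.
close = [10.0, 11.0, 12.1, 12.1]
ratio = ref_ratio(close, 2)
```

The first two entries of ``ratio`` are NaN because no price from two steps earlier exists; the remaining entries divide the earlier close by the current one.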
|
||||
|
||||
Data Preparation
|
||||
============================
|
||||
@@ -46,6 +52,7 @@ Also, ``Qlib`` provides a high-frequency dataset. Users can run a high-frequency
|
||||
Qlib Format Dataset
|
||||
--------------------
|
||||
``Qlib`` has provided an off-the-shelf dataset in `.bin` format, users could use the script ``scripts/get_data.py`` to download the China-Stock dataset as follows.
|
||||
The price-volume data look different from the actual dealing prices because they are **adjusted** (`adjusted price <https://www.investopedia.com/terms/a/adjusted_closing_price.asp>`_). You may also find that adjusted prices differ between data sources, because different sources may adjust prices in different ways. Qlib normalizes the price on the first trading day of each stock to 1 when adjusting them.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
@@ -213,7 +220,7 @@ The `trade unit` defines the unit number of stocks can be used in a trade, and t
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
qlib.init(provider_uri='~/.qlib/qlib_data/cn_data', region=REG_CN)
|
||||
|
||||
|
||||
|
||||
@@ -14,7 +14,7 @@ To get the join trading performance of daily and intraday trading, they must int
|
||||
In order to support joint backtesting of strategies at multiple levels, a corresponding framework is required. None of the publicly available high-frequency trading frameworks considers multi-level joint trading, which makes the aforementioned backtesting inaccurate.
|
||||
|
||||
Besides backtesting, the optimization of strategies from different levels is not standalone and can be affected by each other.
|
||||
For example, the best portfolio management strategy may change with the performance of order executions(e.g. a portfolio with higher turnover may becomes a better choice when we imporve the order execution strategies).
|
||||
For example, the best portfolio management strategy may change with the performance of order executions(e.g. a portfolio with higher turnover may becomes a better choice when we improve the order execution strategies).
|
||||
To achieve good overall performance, it is necessary to consider the interaction of strategies at different levels.
|
||||
|
||||
Therefore, building a new framework for trading at multiple levels becomes necessary to solve the various problems mentioned above, for which we designed a nested decision execution framework that considers the interaction of strategies.
|
||||
|
||||
68
docs/component/meta.rst
Normal file
@@ -0,0 +1,68 @@
|
||||
.. _meta:
|
||||
|
||||
======================================================
Meta Controller: Meta-Task & Meta-Dataset & Meta-Model
======================================================
|
||||
.. currentmodule:: qlib
|
||||
|
||||
|
||||
Introduction
|
||||
=============
|
||||
``Meta Controller`` provides guidance to ``Forecast Model``, which aims to learn regular patterns among a series of forecasting tasks and use learned patterns to guide forthcoming forecasting tasks. Users can implement their own meta-model instance based on ``Meta Controller`` module.
|
||||
|
||||
Meta Task
|
||||
=============
|
||||
|
||||
A `Meta Task` instance is the basic element in the meta-learning framework. It saves the data that can be used for the `Meta Model`. Multiple `Meta Task` instances may share the same `Data Handler`, controlled by `Meta Dataset`. Users should use `prepare_task_data()` to obtain the data that can be directly fed into the `Meta Model`.
|
||||
|
||||
.. autoclass:: qlib.model.meta.task.MetaTask
|
||||
:members:
|
||||
|
||||
Meta Dataset
|
||||
=============
|
||||
|
||||
`Meta Dataset` controls the meta-information generating process. It is responsible for providing data for training the `Meta Model`. Users should use `prepare_tasks` to retrieve a list of `Meta Task` instances.
|
||||
|
||||
.. autoclass:: qlib.model.meta.dataset.MetaTaskDataset
|
||||
:members:
|
||||
|
||||
Meta Model
|
||||
=============
|
||||
|
||||
General Meta Model
|
||||
------------------
|
||||
`Meta Model` instance is the part that controls the workflow. The usage of the `Meta Model` includes:
|
||||
1. Users train their `Meta Model` with the `fit` function.
|
||||
2. The `Meta Model` instance guides the workflow by giving useful information via the `inference` function.
|
||||
|
||||
.. autoclass:: qlib.model.meta.model.MetaModel
|
||||
:members:
|
||||
|
||||
Meta Task Model
|
||||
------------------
|
||||
This type of meta-model may interact with task definitions directly, so such models should inherit from the `Meta Task Model` class. They guide the base tasks by modifying the base task definitions. The function `prepare_tasks` can be used to obtain the modified base task definitions.
|
||||
|
||||
.. autoclass:: qlib.model.meta.model.MetaTaskModel
|
||||
:members:
|
||||
|
||||
Meta Guide Model
|
||||
------------------
|
||||
This type of meta-model participates in the training process of the base forecasting model. The meta-model may guide the base forecasting models during their training to improve their performances.
|
||||
|
||||
.. autoclass:: qlib.model.meta.model.MetaGuideModel
|
||||
:members:
|
||||
|
||||
|
||||
Example
|
||||
=============
|
||||
``Qlib`` provides an implementation of ``Meta Model`` module, ``DDG-DA``,
|
||||
which adapts to the market dynamics.
|
||||
|
||||
``DDG-DA`` includes four steps:
|
||||
|
||||
1. Calculate meta-information and encapsulate it into ``Meta Task`` instances. All the meta-tasks form a ``Meta Dataset`` instance.
|
||||
2. Train ``DDG-DA`` based on the training data of the meta-dataset.
|
||||
3. Do the inference of the ``DDG-DA`` to get guide information.
|
||||
4. Apply guide information to the forecasting models to improve their performances.
|
||||
|
||||
The `above example <https://github.com/microsoft/qlib/tree/main/examples/benchmarks_dynamic/DDG-DA>`_ can be found in ``examples/benchmarks_dynamic/DDG-DA/workflow.py``.
|
||||
@@ -37,7 +37,7 @@ Here is a general view of the structure of the system:
|
||||
|
||||
This experiment management system defines a set of interfaces and provides a concrete implementation ``MLflowExpManager``, which is based on the machine learning platform ``MLFlow`` (`link <https://mlflow.org/>`_).
|
||||
|
||||
If users set the implementation of ``ExpManager`` to be ``MLflowExpManager``, they can use the command `mlflow ui` to visualize and check the experiment results. For more information, pleaes refer to the related documents `here <https://www.mlflow.org/docs/latest/cli.html#mlflow-ui>`_.
|
||||
If users set the implementation of ``ExpManager`` to be ``MLflowExpManager``, they can use the command `mlflow ui` to visualize and check the experiment results. For more information, please refer to the related documents `here <https://www.mlflow.org/docs/latest/cli.html#mlflow-ui>`_.
|
||||
|
||||
Qlib Recorder
|
||||
===================
|
||||
|
||||
@@ -8,7 +8,7 @@ Portfolio Strategy: Portfolio Management
|
||||
Introduction
|
||||
===================
|
||||
|
||||
``Portfolio Strategy`` is designed to adopt different portfolio strategies, which means that users can adopt different algorithms to generate investment portfolios based on the prediction scores of the ``Forecast Model``. Users can use the ``Portfolio Strategy`` in an automatic workflow by ``Workflow`` module, please refer to `Workflow: Workflow Management <workflow.html>`_.
|
||||
``Portfolio Strategy`` is designed to adopt different portfolio strategies, which means that users can adopt different algorithms to generate investment portfolios based on the prediction scores of the ``Forecast Model``. Users can use the ``Portfolio Strategy`` in an automatic workflow by ``Workflow`` module, please refer to `Workflow: Workflow Management <workflow.html>`_.
|
||||
|
||||
Because the components in ``Qlib`` are designed in a loosely-coupled way, ``Portfolio Strategy`` can be used as an independent module also.
|
||||
|
||||
@@ -22,20 +22,20 @@ Base Class & Interface
|
||||
BaseStrategy
|
||||
------------------
|
||||
|
||||
Qlib provides a base class ``qlib.contrib.strategy.BaseStrategy``. All strategy classes need to inherit the base class and implement its interface.
|
||||
Qlib provides a base class ``qlib.strategy.base.BaseStrategy``. All strategy classes need to inherit the base class and implement its interface.
|
||||
|
||||
- `get_risk_degree`
|
||||
Return the proportion of your total value you will use in investment. A dynamic `risk_degree` results in market timing.
|
||||
|
||||
- `generate_order_list`
|
||||
Return the order list.
|
||||
Return the order list.
|
||||
|
||||
Users can inherit `BaseStrategy` to customize their strategy class.
|
||||
|
||||
WeightStrategyBase
|
||||
--------------------
|
||||
|
||||
Qlib also provides a class ``qlib.contrib.strategy.WeightStrategyBase`` that is a subclass of `BaseStrategy`.
|
||||
Qlib also provides a class ``qlib.contrib.strategy.WeightStrategyBase`` that is a subclass of `BaseStrategy`.
|
||||
|
||||
`WeightStrategyBase` only focuses on the target positions, and automatically generates an order list based on positions. It provides the `generate_target_weight_position` interface.
|
||||
|
||||
@@ -71,17 +71,27 @@ TopkDropoutStrategy
|
||||
|
||||
- `Topk`: The number of stocks held
|
||||
- `Drop`: The number of stocks sold on each trading day
|
||||
|
||||
|
||||
Currently, the number of held stocks is `Topk`.
|
||||
On each trading day, the `Drop` number of held stocks with the worst `prediction score` will be sold, and the same number of unheld stocks with the best `prediction score` will be bought.
|
||||
|
||||
|
||||
.. image:: ../_static/img/topk_drop.png
|
||||
:alt: Topk-Drop
|
||||
|
||||
``TopkDrop`` algorithm sells `Drop` stocks every trading day, which guarantees a fixed turnover rate.
|
||||
|
||||
|
||||
- Generate the order list from the target amount
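
The rebalancing rule described above can be sketched in a few lines of plain Python. This is a simplified illustration of the text only, not the code of Qlib's ``TopkDropoutStrategy``: the function name ``topk_drop_step``, its arguments, and the sample scores are all hypothetical, and it assumes the portfolio already holds `Topk` stocks.

```python
def topk_drop_step(held, scores, n_drop):
    """Apply one day of the simplified Topk-Drop rule.

    held:   set of instruments currently held
    scores: dict mapping every instrument to its prediction score
    n_drop: number of held instruments to replace (`Drop`)
    Returns (new_held, sold, bought).
    """
    # Sell the n_drop held instruments with the worst prediction scores.
    sold = sorted(held, key=lambda name: scores[name])[:n_drop]
    # Buy the same number of unheld instruments with the best scores.
    candidates = [name for name in scores if name not in held]
    bought = sorted(candidates, key=lambda name: scores[name], reverse=True)[: len(sold)]
    new_held = (set(held) - set(sold)) | set(bought)
    return new_held, sold, bought


# Made-up scores for five instruments; three are held (Topk = 3, Drop = 1).
scores = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.8, "E": 0.2}
new_held, sold, bought = topk_drop_step({"A", "B", "C"}, scores, n_drop=1)
```

Because each step sells and buys the same number of instruments, the number of holdings stays at `Topk` and at most `Drop` positions change per day, which is what keeps the turnover fixed.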
|
||||
|
||||
EnhancedIndexingStrategy
|
||||
------------------------
|
||||
`EnhancedIndexingStrategy`: enhanced indexing combines the arts of active management and passive management,
with the aim of outperforming a benchmark index (e.g., S&P 500) in terms of portfolio return while controlling
the risk exposure (a.k.a. tracking error).
|
||||
|
||||
For more information, please refer to `qlib.contrib.strategy.signal_strategy.EnhancedIndexingStrategy`
|
||||
and `qlib.contrib.strategy.optimizer.enhanced_indexing.EnhancedIndexingOptimizer`.
|
||||
|
||||
|
||||
Usage & Example
|
||||
====================
|
||||
|
||||
|
||||
@@ -31,7 +31,7 @@ Let's see an example,
|
||||
|
||||
First make sure you have the latest version of `qlib` installed.
|
||||
|
||||
Then, you need to privide a configuration to setup the experiment.
|
||||
Then, you need to provide a configuration to set up the experiment.
|
||||
We write a simple configuration example as follows,
|
||||
|
||||
.. code-block:: YAML
|
||||
@@ -217,13 +217,13 @@ The tuner pipeline contains different tuners, and the `tuner` program will proce
|
||||
Each part represents a tuner, and its modules which are to be tuned. Space in each part is the hyper-parameters' space of a certain module, you need to create your searching space and modify it in `/qlib/contrib/tuner/space.py`. We use `hyperopt` package to help us to construct the space, you can see the detail of how to use it in https://github.com/hyperopt/hyperopt/wiki/FMin .
|
||||
|
||||
- model
|
||||
You need to provide the `class` and the `space` of the model. If the model is user's own implementation, you need to privide the `module_path`.
|
||||
You need to provide the `class` and the `space` of the model. If the model is user's own implementation, you need to provide the `module_path`.
|
||||
|
||||
- trainer
|
||||
You need to proveide the `class` of the trainer. If the trainer is user's own implementation, you need to privide the `module_path`.
|
||||
You need to provide the `class` of the trainer. If the trainer is user's own implementation, you need to provide the `module_path`.
|
||||
|
||||
- strategy
|
||||
You need to provide the `class` and the `space` of the strategy. If the strategy is user's own implementation, you need to privide the `module_path`.
|
||||
You need to provide the `class` and the `space` of the strategy. If the strategy is user's own implementation, you need to provide the `module_path`.
|
||||
|
||||
- data_label
|
||||
The label of the data, you can search which kinds of labels will lead to a better result. This part is optional, and you only need to provide `space`.
|
||||
@@ -273,7 +273,7 @@ You need to use the same dataset to evaluate your different `estimator` experime
|
||||
About the data and backtest
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
`data` and `backtest` are all same in the whole `tuner` experiment. Different `estimator` experiments must use the same data and backtest method. So, these two parts of config are same with that in `estimator` configuration. You can see the precise defination of these parts in `estimator` introduction. We only provide an example here.
|
||||
`data` and `backtest` are all same in the whole `tuner` experiment. Different `estimator` experiments must use the same data and backtest method. So, these two parts of config are same with that in `estimator` configuration. You can see the precise definition of these parts in `estimator` introduction. We only provide an example here.
|
||||
|
||||
.. code-block:: YAML
|
||||
|
||||
|
||||
@@ -36,10 +36,11 @@ Document Structure
|
||||
:caption: COMPONENTS:
|
||||
|
||||
Workflow: Workflow Management <component/workflow.rst>
|
||||
Data Layer: Data Framework&Usage <component/data.rst>
|
||||
Data Layer: Data Framework & Usage <component/data.rst>
|
||||
Forecast Model: Model Training & Prediction <component/model.rst>
|
||||
Portfolio Management and Backtest <component/strategy.rst>
|
||||
Nested Decision Execution: High-Frequency Trading <component/highfreq.rst>
|
||||
Meta Controller: Meta-Task & Meta-Dataset & Meta-Model <component/meta.rst>
|
||||
Qlib Recorder: Experiment Management <component/recorder.rst>
|
||||
Analysis: Evaluation & Results Analysis <component/report.rst>
|
||||
Online Serving: Online Management & Strategy & Tool <component/online.rst>
|
||||
|
||||
@@ -31,7 +31,7 @@ Users can easily intsall ``Qlib`` according to the following steps:
|
||||
git clone https://github.com/microsoft/qlib.git && cd qlib
|
||||
python setup.py install
|
||||
|
||||
To kown more about `installation`, please refer to `Qlib Installation <../start/installation.html>`_.
|
||||
To know more about `installation`, please refer to `Qlib Installation <../start/installation.html>`_.
|
||||
|
||||
Prepare Data
|
||||
==============
|
||||
@@ -44,7 +44,7 @@ Load and prepare data by running the following code:
|
||||
|
||||
This dataset is created by public data collected by crawler scripts in ``scripts/data_collector/``, which have been released in the same repository. Users could create the same dataset with it.
|
||||
|
||||
To kown more about `prepare data`, please refer to `Data Preparation <../component/data.html#data-preparation>`_.
|
||||
To know more about `prepare data`, please refer to `Data Preparation <../component/data.html#data-preparation>`_.
|
||||
|
||||
Auto Quant Research Workflow
|
||||
====================================
|
||||
|
||||
@@ -3,3 +3,4 @@ cmake
|
||||
numpy
|
||||
scipy
|
||||
scikit-learn
|
||||
pandas
|
||||
|
||||
@@ -27,7 +27,7 @@ Initialize Qlib before calling other APIs: run following code in python.
|
||||
|
||||
import qlib
|
||||
# region in [REG_CN, REG_US]
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
provider_uri = "~/.qlib/qlib_data/cn_data" # target_dir
|
||||
qlib.init(provider_uri=provider_uri, region=REG_CN)
|
||||
|
||||
@@ -42,10 +42,10 @@ Besides `provider_uri` and `region`, `qlib.init` has other parameters. The follo
|
||||
- `provider_uri`
|
||||
Type: str. The URI of the Qlib data. For example, it could be the location where the data loaded by ``get_data.py`` are stored.
|
||||
- `region`
|
||||
Type: str, optional parameter(default: `qlib.config.REG_CN`).
|
||||
Currently: ``qlib.config.REG_US`` ('us') and ``qlib.config.REG_CN`` ('cn') is supported. Different value of `region` will result in different stock market mode.
|
||||
- ``qlib.config.REG_US``: US stock market.
|
||||
- ``qlib.config.REG_CN``: China stock market.
|
||||
Type: str, optional parameter(default: `qlib.constant.REG_CN`).
|
||||
Currently: ``qlib.constant.REG_US`` ('us') and ``qlib.constant.REG_CN`` ('cn') is supported. Different value of `region` will result in different stock market mode.
|
||||
- ``qlib.constant.REG_US``: US stock market.
|
||||
- ``qlib.constant.REG_CN``: China stock market.
|
||||
|
||||
Different modes will result in different trading limitations and costs.
|
||||
The region is just `shortcuts for defining a batch of configurations <https://github.com/microsoft/qlib/blob/main/qlib/config.py#L239>`_. Users can set the key configurations manually if the existing region setting can't meet their requirements.
|
||||
|
||||
@@ -22,7 +22,6 @@ data_handler_config: &data_handler_config
|
||||
- class: CSRankNorm
|
||||
kwargs:
|
||||
fields_group: label
|
||||
label: ["Ref($close, -2) / Ref($close, -1) - 1"]
|
||||
port_analysis_config: &port_analysis_config
|
||||
strategy:
|
||||
class: TopkDropoutStrategy
|
||||
|
||||
@@ -9,7 +9,7 @@ Here are the results of each benchmark model running on Qlib's `Alpha360` and `A
|
||||
|
||||
The numbers shown below demonstrate the performance of the entire `workflow` of each model. We will update the `workflow` as well as models in the near future for better results.
|
||||
<!--
|
||||
> If you need to reproduce the results below, please use the **v1** dataset: `python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/qlib_cn_1d --region cn --version v1`
|
||||
> If you need to reproduce the results below, please use the **v1** dataset: `python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn --version v1`
|
||||
>
|
||||
> In the new version of qlib, the default dataset is **v2**. Since the data is collected from the YahooFinance API (which is not very stable), the results of *v2* and *v1* may differ -->
|
||||
|
||||
|
||||
@@ -32,7 +32,7 @@ import abc
|
||||
import enum
|
||||
|
||||
|
||||
# Type defintions
|
||||
# Type definitions
|
||||
class DataTypes(enum.IntEnum):
|
||||
"""Defines numerical types of each column."""
|
||||
|
||||
|
||||
@@ -254,9 +254,9 @@ class DistributedHyperparamOptManager(HyperparamOptManager):
|
||||
param_ranges: Discrete hyperparameter range for random search.
|
||||
fixed_params: Fixed model parameters per experiment.
|
||||
root_model_folder: Folder to store optimisation artifacts.
|
||||
worker_number: Worker index definining which set of hyperparameters to
|
||||
worker_number: Worker index defining which set of hyperparameters to
|
||||
test.
|
||||
search_iterations: Maximum numer of random search iterations.
|
||||
search_iterations: Maximum number of random search iterations.
|
||||
num_iterations_per_worker: How many iterations are handled per worker.
|
||||
clear_serialised_params: Whether to regenerate hyperparameter
|
||||
combinations.
|
||||
@@ -330,7 +330,7 @@ class DistributedHyperparamOptManager(HyperparamOptManager):
|
||||
if os.path.exists(self.serialised_ranges_folder):
|
||||
df = pd.read_csv(self.serialised_ranges_path, index_col=0)
|
||||
else:
|
||||
print("Unable to load - regenerating serach ranges instead")
|
||||
print("Unable to load - regenerating search ranges instead")
|
||||
df = self.update_serialised_hyperparam_df()
|
||||
|
||||
return df
|
||||
|
||||
@@ -342,7 +342,7 @@ class TFTDataCache:
|
||||
|
||||
@classmethod
|
||||
def contains(cls, key):
|
||||
"""Retuns boolean indicating whether key is present in cache."""
|
||||
"""Returns boolean indicating whether key is present in cache."""
|
||||
|
||||
return key in cls._data_cache
|
||||
|
||||
@@ -1120,10 +1120,10 @@ class TemporalFusionTransformer:
|
||||
Args:
|
||||
df: Input dataframe
|
||||
return_targets: Whether to also return outputs aligned with predictions to
|
||||
faciliate evaluation
|
||||
facilitate evaluation
|
||||
|
||||
Returns:
|
||||
Input dataframe or tuple of (input dataframe, algined output dataframe).
|
||||
Input dataframe or tuple of (input dataframe, aligned output dataframe).
|
||||
"""
|
||||
|
||||
data = self._batch_data(df)
|
||||
|
||||
@@ -209,7 +209,6 @@ class TFTModel(ModelFT):
|
||||
fixed_params = self.data_formatter.get_experiment_params()
|
||||
params = self.data_formatter.get_default_model_params()
|
||||
|
||||
# Wendi: merge the tuned and untuned parameters
|
||||
params = {**params, **fixed_params}
|
||||
|
||||
if not os.path.exists(self.model_folder):
|
||||
@@ -295,7 +294,7 @@ class TFTModel(ModelFT):
|
||||
def to_pickle(self, path: Union[Path, str]):
|
||||
"""
|
||||
Tensorflow model can't be dumped directly.
|
||||
So the data should be save seperatedly
|
||||
So the data should be saved separately
|
||||
|
||||
**TODO**: Please implement the function to load the files
|
||||
|
||||
|
||||
@@ -57,7 +57,7 @@ And here are two ways to run the model:
|
||||
python example.py --config_file configs/config_alstm.yaml
|
||||
```
|
||||
|
||||
Here we trained TRA on a pretrained backbone model. Therefore we run `*_init.yaml` before TRA's scipts.
|
||||
Here we trained TRA on a pretrained backbone model. Therefore we run `*_init.yaml` before TRA's scripts.
|
||||
|
||||
### Results
|
||||
|
||||
|
||||
@@ -124,7 +124,7 @@ class TRAModel(Model):
|
||||
loss = (pred - label).pow(2).mean()
|
||||
|
||||
L = (all_preds.detach() - label[:, None]).pow(2)
|
||||
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure postive input
|
||||
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure positive input
|
||||
|
||||
data_set.assign_data(index, L) # save loss to memory
|
||||
|
||||
@@ -165,7 +165,7 @@ class TRAModel(Model):
|
||||
|
||||
L = (all_preds - label[:, None]).pow(2)
|
||||
|
||||
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure postive input
|
||||
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure positive input
|
||||
|
||||
data_set.assign_data(index, L) # save loss to memory
|
||||
|
||||
@@ -484,7 +484,7 @@ class TRA(nn.Module):
|
||||
|
||||
"""Temporal Routing Adaptor (TRA)
|
||||
|
||||
TRA takes historical prediction erros & latent representation as inputs,
|
||||
TRA takes historical prediction errors & latent representation as inputs,
|
||||
then routes the input sample to a specific predictor for training & inference.
|
||||
|
||||
Args:
|
||||
|
||||
27
examples/benchmarks_dynamic/DDG-DA/README.md
Normal file
@@ -0,0 +1,27 @@
|
||||
# Introduction
|
||||
This is the implementation of `DDG-DA` based on the `Meta Controller` component provided by `Qlib`.
|
||||
|
||||
## Background
|
||||
In many real-world scenarios, we often deal with streaming data that is sequentially collected over time. Due to the non-stationary nature of the environment, the streaming data distribution may change in unpredictable ways, which is known as concept drift. To handle concept drift, previous methods first detect when/where the concept drift happens and then adapt models to fit the distribution of the latest data. However, there are still many cases in which some underlying factors of environment evolution are predictable, making it possible to model the future concept drift trend of the streaming data, and such cases are not fully explored in previous work.
|
||||
|
||||
Therefore, we propose a novel method `DDG-DA`, that can effectively forecast the evolution of data distribution and improve the performance of models. Specifically, we first train a predictor to estimate the future data distribution, then leverage it to generate training samples, and finally train models on the generated data.
|
||||
|
||||
## Dataset
|
||||
The data in the paper are private. So we conduct experiments on Qlib's public dataset.
|
||||
Though the dataset is different, the conclusion remains the same. By applying `DDG-DA`, users can see rising trends at the test phase both in the proxy models' ICs and the performances of the forecasting models.
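
For readers unfamiliar with the metric, IC (information coefficient) is commonly computed as the per-day correlation between predicted scores and realized labels, and ICIR as the mean of the daily ICs divided by their standard deviation. The sketch below is a minimal plain-Python illustration under that assumption; the function names and the toy numbers are hypothetical, and it is not Qlib's evaluation code.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def daily_ic(preds_by_day, labels_by_day):
    """One correlation per day, computed across the stocks of that day."""
    return [pearson(p, l) for p, l in zip(preds_by_day, labels_by_day)]


def icir(ics):
    """Mean of the daily ICs divided by their sample standard deviation."""
    return mean(ics) / stdev(ics)


# Toy data: three days, three stocks per day (made-up numbers).
preds = [[0.1, 0.2, 0.3], [0.3, 0.1, 0.2], [0.2, 0.3, 0.1]]
labels = [[0.01, 0.02, 0.03], [0.03, 0.01, 0.02], [0.01, 0.03, 0.02]]
ics = daily_ic(preds, labels)
print(ics, icir(ics))
```

A rising IC over the test phase means the predicted ranking of stocks agrees more and more with the realized returns.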
|
||||
|
||||
## Run the Code
|
||||
Users can try `DDG-DA` by running the following command:
|
||||
```bash
|
||||
python workflow.py run_all
|
||||
```
|
||||
|
||||
The default forecasting models are `Linear`. Users can choose other forecasting models by changing the `forecast_model` parameter when `DDG-DA` initializes. For example, users can try `LightGBM` forecasting models by running the following command:
|
||||
```bash
|
||||
python workflow.py --forecast_model="gbdt" run_all
|
||||
```
|
||||
|
||||
|
||||
## Results
|
||||
|
||||
The results of related methods in Qlib's public dataset can be found [here](../)
|
||||
1
examples/benchmarks_dynamic/DDG-DA/requirements.txt
Normal file
@@ -0,0 +1 @@
|
||||
torch==1.10.0
|
||||
258
examples/benchmarks_dynamic/DDG-DA/workflow.py
Normal file
@@ -0,0 +1,258 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
from pathlib import Path
|
||||
from qlib.model.meta.task import MetaTask
|
||||
from qlib.contrib.meta.data_selection.model import MetaModelDS
|
||||
from qlib.contrib.meta.data_selection.dataset import InternalData, MetaDatasetDS
|
||||
from qlib.data.dataset.handler import DataHandlerLP
|
||||
|
||||
import pandas as pd
|
||||
import fire
|
||||
import sys
|
||||
from tqdm.auto import tqdm
|
||||
import yaml
|
||||
import pickle
|
||||
from qlib import auto_init
|
||||
from qlib.model.trainer import TrainerR, task_train
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.workflow.task.gen import RollingGen, task_generator
|
||||
from qlib.workflow import R
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
DIRNAME = Path(__file__).absolute().resolve().parent
|
||||
sys.path.append(str(DIRNAME.parent / "baseline"))
|
||||
from rolling_benchmark import RollingBenchmark # NOTE: sys.path is changed for import RollingBenchmark
|
||||
|
||||
|
||||
class DDGDA:
|
||||
"""
|
||||
please run `python workflow.py run_all` to run the full workflow of the experiment
|
||||
|
||||
**NOTE**
|
||||
before running the example, please clean your previous results with the following command
|
||||
- `rm -r mlruns`
|
||||
"""
|
||||
|
||||
def __init__(self, sim_task_model="linear", forecast_model="linear"):
|
||||
self.step = 20
|
||||
# NOTE:
|
||||
# the horizon must match the meaning in the base task template
|
||||
self.horizon = 20
|
||||
self.meta_exp_name = "DDG-DA"
|
||||
self.sim_task_model = sim_task_model # The model to capture the distribution of data.
|
||||
self.forecast_model = forecast_model # downstream forecasting models' type
|
||||
|
||||
def get_feature_importance(self):
|
||||
# this must be lightGBM, because it needs to get the feature importance
|
||||
rb = RollingBenchmark(model_type="gbdt")
|
||||
task = rb.basic_task()
|
||||
|
||||
model = init_instance_by_config(task["model"])
|
||||
dataset = init_instance_by_config(task["dataset"])
|
||||
model.fit(dataset)
|
||||
|
||||
fi = model.get_feature_importance()
|
||||
|
||||
# Because the model uses numpy instead of dataframe for training lightgbm
|
||||
# So we must use the following extra steps to get the right feature importance
|
||||
df = dataset.prepare(segments=slice(None), col_set="feature", data_key=DataHandlerLP.DK_R)
|
||||
cols = df.columns
|
||||
fi_named = {cols[int(k.split("_")[1])]: imp for k, imp in fi.to_dict().items()}
|
||||
|
||||
return pd.Series(fi_named)
|
||||
|
||||
def dump_data_for_proxy_model(self):
|
||||
"""
|
||||
Dump data for training meta model.
|
||||
The meta model will be trained upon the proxy forecasting model.
|
||||
This dataset is for the proxy forecasting model.
|
||||
"""
|
||||
topk = 30
|
||||
fi = self.get_feature_importance()
|
||||
col_selected = fi.nlargest(topk)
|
||||
|
||||
rb = RollingBenchmark(model_type=self.sim_task_model)
|
||||
task = rb.basic_task()
|
||||
dataset = init_instance_by_config(task["dataset"])
|
||||
prep_ds = dataset.prepare(slice(None), col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
|
||||
|
||||
feature_df = prep_ds["feature"]
|
||||
label_df = prep_ds["label"]
|
||||
|
||||
feature_selected = feature_df.loc[:, col_selected.index]
|
||||
|
||||
feature_selected = feature_selected.groupby("datetime").apply(lambda df: (df - df.mean()).div(df.std()))
|
||||
feature_selected = feature_selected.fillna(0.0)
|
||||
|
||||
df_all = {
|
||||
"label": label_df.reindex(feature_selected.index),
|
||||
"feature": feature_selected,
|
||||
}
|
||||
df_all = pd.concat(df_all, axis=1)
|
||||
df_all.to_pickle(DIRNAME / "fea_label_df.pkl")
|
||||
|
||||
# dump data in handler format for aligning the interface
|
||||
handler = DataHandlerLP(
|
||||
data_loader={
|
||||
"class": "qlib.data.dataset.loader.StaticDataLoader",
|
||||
"kwargs": {"config": DIRNAME / "fea_label_df.pkl"},
|
||||
}
|
||||
)
|
||||
handler.to_pickle(DIRNAME / "handler_proxy.pkl", dump_all=True)
|
||||
|
||||
@property
|
||||
def _internal_data_path(self):
|
||||
return DIRNAME / f"internal_data_s{self.step}.pkl"
|
||||
|
||||
def dump_meta_ipt(self):
|
||||
"""
|
||||
Dump data for training meta model.
|
||||
This function will dump the input data for meta model
|
||||
"""
|
||||
# According to the experiments, the choice of the model type is very important for achieving good results
|
||||
rb = RollingBenchmark(model_type=self.sim_task_model)
|
||||
sim_task = rb.basic_task()
|
||||
|
||||
if self.sim_task_model == "gbdt":
|
||||
sim_task["model"].setdefault("kwargs", {}).update({"early_stopping_rounds": None, "num_boost_round": 150})
|
||||
|
||||
exp_name_sim = f"data_sim_s{self.step}"
|
||||
|
||||
internal_data = InternalData(sim_task, self.step, exp_name=exp_name_sim)
|
||||
internal_data.setup(trainer=TrainerR)
|
||||
|
||||
with self._internal_data_path.open("wb") as f:
|
||||
pickle.dump(internal_data, f)
|
||||
|
||||
def train_meta_model(self):
|
||||
"""
|
||||
training a meta model based on a simplified linear proxy model;
|
||||
"""
|
||||
|
||||
# 1) leverage the simplified proxy forecasting model to train meta model.
|
||||
# - Only the dataset part is important, in current version of meta model will integrate the
|
||||
rb = RollingBenchmark(model_type=self.sim_task_model)
|
||||
sim_task = rb.basic_task()
|
||||
proxy_forecast_model_task = {
|
||||
# "model": "qlib.contrib.model.linear.LinearModel",
|
||||
"dataset": {
|
||||
"class": "qlib.data.dataset.DatasetH",
|
||||
"kwargs": {
|
||||
"handler": f"file://{(DIRNAME / 'handler_proxy.pkl').absolute()}",
|
||||
"segments": {
|
||||
"train": ("2008-01-01", "2010-12-31"),
|
||||
"test": ("2011-01-01", sim_task["dataset"]["kwargs"]["segments"]["test"][1]),
|
||||
},
|
||||
},
|
||||
},
|
||||
# "record": ["qlib.workflow.record_temp.SignalRecord"]
|
||||
}
|
||||
|
||||
# 2) preparing meta dataset
|
||||
kwargs = dict(
|
||||
task_tpl=proxy_forecast_model_task,
|
||||
step=self.step,
|
||||
segments=0.62, # keep test period consistent with the dataset yaml
|
||||
trunc_days=1 + self.horizon,
|
||||
hist_step_n=30,
|
||||
fill_method="max",
|
||||
rolling_ext_days=0,
|
||||
)
|
||||
# NOTE:
|
||||
# the input of meta model (internal data) are shared between proxy model and final forecasting model
|
||||
# but their task test segments are not aligned! It worked in my previous experiment.
|
||||
# So the misalignment will not affect the effectiveness of the method.
|
||||
with self._internal_data_path.open("rb") as f:
|
||||
internal_data = pickle.load(f)
|
||||
md = MetaDatasetDS(exp_name=internal_data, **kwargs)
|
||||
|
||||
# 3) train and logging meta model
|
||||
with R.start(experiment_name=self.meta_exp_name):
|
||||
R.log_params(**kwargs)
|
||||
mm = MetaModelDS(step=self.step, hist_step_n=kwargs["hist_step_n"], lr=0.001, max_epoch=200, seed=43)
|
||||
mm.fit(md)
|
||||
R.save_objects(model=mm)
|
||||
|
||||
@property
|
||||
def _task_path(self):
|
||||
return DIRNAME / f"tasks_s{self.step}.pkl"
|
||||
|
||||
def meta_inference(self):
|
||||
"""
|
||||
Leverage meta-model for inference:
|
||||
- Given
|
||||
- baseline tasks
|
||||
- input for meta model(internal data)
|
||||
- meta model (its learnt knowledge on proxy forecasting model is expected to transfer to normal forecasting model)
|
||||
"""
|
||||
# 1) get meta model
|
||||
exp = R.get_exp(experiment_name=self.meta_exp_name)
|
||||
rec = exp.list_recorders(rtype=exp.RT_L)[0]
|
||||
meta_model: MetaModelDS = rec.load_object("model")
|
||||
|
||||
# 2)
|
||||
# we transfer the knowledge of the meta model to the final forecasting tasks.
|
||||
# Create MetaTaskDataset for the final forecasting tasks
|
||||
# Aligning the setting of it to the MetaTaskDataset when training Meta model is necessary
|
||||
|
||||
# 2.1) get previous config
|
||||
param = rec.list_params()
|
||||
trunc_days = int(param["trunc_days"])
|
||||
step = int(param["step"])
|
||||
hist_step_n = int(param["hist_step_n"])
|
||||
fill_method = param.get("fill_method", "max")
|
||||
|
||||
rb = RollingBenchmark(model_type=self.forecast_model)
|
||||
task_l = rb.create_rolling_tasks()
|
||||
|
||||
# 2.2) create meta dataset for final dataset
|
||||
kwargs = dict(
|
||||
task_tpl=task_l,
|
||||
step=step,
|
||||
segments=0.0, # all the tasks are for testing
|
||||
trunc_days=trunc_days,
|
||||
hist_step_n=hist_step_n,
|
||||
fill_method=fill_method,
|
||||
task_mode=MetaTask.PROC_MODE_TRANSFER,
|
||||
)
|
||||
|
||||
with self._internal_data_path.open("rb") as f:
|
||||
internal_data = pickle.load(f)
|
||||
mds = MetaDatasetDS(exp_name=internal_data, **kwargs)
|
||||
|
||||
# 3) meta model make inference and get new qlib task
|
||||
new_tasks = meta_model.inference(mds)
|
||||
with self._task_path.open("wb") as f:
|
||||
pickle.dump(new_tasks, f)
|
||||
|
||||
def train_and_eval_tasks(self):
|
||||
"""
|
||||
Training the tasks generated by meta model
|
||||
Then evaluate it
|
||||
"""
|
||||
with self._task_path.open("rb") as f:
|
||||
tasks = pickle.load(f)
|
||||
rb = RollingBenchmark(rolling_exp="rolling_ds", model_type=self.forecast_model)
|
||||
rb.train_rolling_tasks(tasks)
|
||||
rb.ens_rolling()
|
||||
rb.update_rolling_rec()
|
||||
|
||||
def run_all(self):
|
||||
# 1) file: handler_proxy.pkl
|
||||
self.dump_data_for_proxy_model()
|
||||
# 2)
|
||||
# file: internal_data_s20.pkl
|
||||
# mlflow: data_sim_s20, models for calculating meta_ipt
|
||||
self.dump_meta_ipt()
|
||||
# 3) meta model will be stored in `DDG-DA`
|
||||
self.train_meta_model()
|
||||
# 4) new_tasks are saved in "tasks_s20.pkl" (reweighter is added)
|
||||
self.meta_inference()
|
||||
# 5) load the saved tasks and train model
|
||||
self.train_and_eval_tasks()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
GetData().qlib_data(exists_skip=True)
|
||||
auto_init()
|
||||
fire.Fire(DDGDA)
|
||||
18
examples/benchmarks_dynamic/README.md
Normal file
@@ -0,0 +1,18 @@
|
||||
# Introduction
|
||||
Due to the non-stationary nature of the financial market environment, the data distribution may change across periods, which causes the performance of models built on training data to decay on future test data.
|
||||
So adapting the forecasting models/strategies to market dynamics is very important to the model/strategies' performance.
|
||||
|
||||
The table below shows the performances of different solutions on different forecasting models.
|
||||
|
||||
## Alpha158 dataset
|
||||
|
||||
| Model Name | Dataset | IC | ICIR | Rank IC | Rank ICIR | Annualized Return | Information Ratio | Max Drawdown |
|
||||
|------------------|---------|----|------|---------|-----------|-------------------|-------------------|--------------|
|
||||
| RR[Linear] |Alpha158 |0.088|0.570|0.102 |0.622 |0.077 |1.175 |-0.086 |
|
||||
| DDG-DA[Linear] |Alpha158 |0.093|0.622|0.106 |0.670 |0.085 |1.213 |-0.093 |
|
||||
| RR[LightGBM] |Alpha158 |0.079|0.566|0.088 |0.592 |0.075 |1.226 |-0.096 |
|
||||
| DDG-DA[LightGBM] |Alpha158 |0.084|0.639|0.093 |0.664 |0.099 |1.442 |-0.071 |
|
||||
|
||||
- The label horizon of the `Alpha158` dataset is set to 20.
|
||||
- The rolling time intervals are set to 20 trading days.
|
||||
- The test rolling periods are from January 2017 to August 2020.
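
To make the label horizon concrete: with a horizon of 20, the label expression used by the baseline script is `Ref($close, -21) / Ref($close, -1) - 1`. Assuming that a negative offset in `Ref` refers to a future value, this is the return from the next day's close to the close 21 days ahead. The helper below is a hypothetical plain-Python sketch of that arithmetic on a list of prices, not Qlib's expression engine.

```python
def forward_return_labels(close, horizon=20):
    """label[t] = close[t + horizon + 1] / close[t + 1] - 1.

    Positions too close to the end of the series have no label (None).
    """
    labels = []
    for t in range(len(close)):
        start, end = t + 1, t + horizon + 1
        if end < len(close):
            labels.append(close[end] / close[start] - 1)
        else:
            labels.append(None)
    return labels


# Toy prices with a short horizon so the result is easy to check by hand.
toy_close = [10.0, 11.0, 12.0, 13.0, 14.0]
toy_labels = forward_return_labels(toy_close, horizon=2)
print(toy_labels)
```

Because the last `horizon + 1` positions have no label, any training window must stop that many days before the data it is evaluated on.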
|
||||
15
examples/benchmarks_dynamic/baseline/README.md
Normal file
@@ -0,0 +1,15 @@
|
||||
# Introduction
|
||||
|
||||
This is the framework of periodically Rolling Retrain (RR) forecasting models. RR adapts to market dynamics by utilizing the up-to-date data periodically.
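
The idea of periodic retraining can be sketched as a sequence of sliding train/test windows. The helper below is a hypothetical plain-Python illustration over integer trading-day indices, not Qlib's `RollingGen`. The `trunc_days` gap between the end of training and the start of testing is an assumption: it is meant to mirror how a forward-looking label forces the most recent training days to be dropped.

```python
def rolling_segments(n_days, train_len, test_start, step=20, trunc_days=21):
    """Return a list of {"train": (lo, hi), "test": (lo, hi)} index ranges.

    Each test window covers `step` days; the training window slides with it
    and ends `trunc_days` days before the test window starts.
    """
    segments = []
    start = test_start
    while start < n_days:
        test = (start, min(start + step, n_days) - 1)
        train = (max(0, start - train_len), start - 1 - trunc_days)
        segments.append({"train": train, "test": test})
        start += step
    return segments


# Toy calendar of 100 trading days, testing from day 60 onwards.
segs = rolling_segments(n_days=100, train_len=40, test_start=60)
for seg in segs:
    print(seg)
```

One model is then trained per segment, and the per-segment test predictions are concatenated into a single series for evaluation.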
|
||||
|
||||
## Run the Code
|
||||
Users can try RR by running the following command:
|
||||
```bash
|
||||
python rolling_benchmark.py run_all
|
||||
```
|
||||
|
||||
The default forecasting models are `Linear`. Users can choose other forecasting models by changing the `model_type` parameter.
|
||||
For example, users can try `LightGBM` forecasting models by running the following command:
|
||||
```bash
|
||||
python rolling_benchmark.py --model_type="gbdt" run_all
|
||||
```
|
||||
114
examples/benchmarks_dynamic/baseline/rolling_benchmark.py
Normal file
@@ -0,0 +1,114 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
from qlib.model.ens.ensemble import RollingEnsemble
|
||||
from qlib.utils import init_instance_by_config
|
||||
import fire
|
||||
import yaml
|
||||
from qlib import auto_init
|
||||
from pathlib import Path
|
||||
from tqdm.auto import tqdm
|
||||
from qlib.model.trainer import TrainerR
|
||||
from qlib.workflow import R
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
DIRNAME = Path(__file__).absolute().resolve().parent
|
||||
from qlib.workflow.task.gen import task_generator, RollingGen
|
||||
from qlib.workflow.task.collect import RecorderCollector
|
||||
from qlib.workflow.record_temp import PortAnaRecord, SigAnaRecord
|
||||
|
||||
|
||||
class RollingBenchmark:
|
||||
"""
|
||||
**NOTE**
|
||||
before running the example, please clean your previous results with the following command
|
||||
- `rm -r mlruns`
|
||||
|
||||
"""
|
||||
|
||||
def __init__(self, rolling_exp="rolling_models", model_type="linear") -> None:
|
||||
self.step = 20
|
||||
self.horizon = 20
|
||||
self.rolling_exp = rolling_exp
|
||||
self.model_type = model_type
|
||||
|
||||
def basic_task(self):
|
||||
"""For fast training rolling"""
|
||||
if self.model_type == "gbdt":
|
||||
conf_path = DIRNAME.parent.parent / "benchmarks" / "LightGBM" / "workflow_config_lightgbm_Alpha158.yaml"
|
||||
# dump the processed data on to disk for later loading to speed up the processing
|
||||
h_path = DIRNAME / "lightgbm_alpha158_handler_horizon{}.pkl".format(self.horizon)
|
||||
elif self.model_type == "linear":
|
||||
conf_path = DIRNAME.parent.parent / "benchmarks" / "Linear" / "workflow_config_linear_Alpha158.yaml"
|
||||
h_path = DIRNAME / "linear_alpha158_handler_horizon{}.pkl".format(self.horizon)
|
||||
else:
|
||||
raise AssertionError("Model type is not supported!")
|
||||
with conf_path.open("r") as f:
|
||||
conf = yaml.safe_load(f)
|
||||
|
||||
# modify dataset horizon
|
||||
conf["task"]["dataset"]["kwargs"]["handler"]["kwargs"]["label"] = [
|
||||
"Ref($close, -{}) / Ref($close, -1) - 1".format(self.horizon + 1)
|
||||
]
|
||||
|
||||
task = conf["task"]
|
||||
|
||||
if not h_path.exists():
|
||||
h_conf = task["dataset"]["kwargs"]["handler"]
|
||||
h = init_instance_by_config(h_conf)
|
||||
h.to_pickle(h_path, dump_all=True)
|
||||
|
||||
task["dataset"]["kwargs"]["handler"] = f"file://{h_path}"
|
||||
task["record"] = ["qlib.workflow.record_temp.SignalRecord"]
|
||||
return task
|
||||
|
||||
def create_rolling_tasks(self):
|
||||
task = self.basic_task()
|
||||
task_l = task_generator(
|
||||
task, RollingGen(step=self.step, trunc_days=self.horizon + 1)
|
||||
) # the last two days should be truncated to avoid information leakage
|
||||
return task_l
|
||||
|
||||
def train_rolling_tasks(self, task_l=None):
|
||||
if task_l is None:
|
||||
task_l = self.create_rolling_tasks()
|
||||
trainer = TrainerR(experiment_name=self.rolling_exp)
|
||||
trainer(task_l)
|
||||
|
||||
COMB_EXP = "rolling"
|
||||
|
||||
def ens_rolling(self):
|
||||
rc = RecorderCollector(
|
||||
experiment=self.rolling_exp,
|
||||
artifacts_key=["pred", "label"],
|
||||
process_list=[RollingEnsemble()],
|
||||
# rec_key_func=lambda rec: (self.COMB_EXP, rec.info["id"]),
|
||||
artifacts_path={"pred": "pred.pkl", "label": "label.pkl"},
|
||||
)
|
||||
res = rc()
|
||||
with R.start(experiment_name=self.COMB_EXP):
|
||||
R.log_params(exp_name=self.rolling_exp)
|
||||
R.save_objects(**{"pred.pkl": res["pred"], "label.pkl": res["label"]})
|
||||
|
||||
def update_rolling_rec(self):
|
||||
"""
|
||||
Evaluate the combined rolling results
|
||||
"""
|
||||
for rid, rec in R.list_recorders(experiment_name=self.COMB_EXP).items():
|
||||
for rt_cls in SigAnaRecord, PortAnaRecord:
|
||||
rt = rt_cls(recorder=rec, skip_existing=True)
|
||||
rt.generate()
|
||||
print(f"Your evaluation results can be found in the experiment named `{self.COMB_EXP}`.")
|
||||
|
||||
def run_all(self):
|
||||
# the results will be saved in mlruns.
|
||||
# 1) each rolling task is saved in rolling_models
|
||||
self.train_rolling_tasks()
|
||||
# 2) combined rolling tasks and evaluation results are saved in rolling
|
||||
self.ens_rolling()
|
||||
self.update_rolling_rec()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
GetData().qlib_data(exists_skip=True)
|
||||
auto_init()
|
||||
fire.Fire(RollingBenchmark)
|
||||
@@ -150,7 +150,7 @@ class Cut(ElemOperator):
|
||||
self.l = l
|
||||
self.r = r
|
||||
if (self.l is not None and self.l <= 0) or (self.r is not None and self.r >= 0):
|
||||
raise ValueError("Cut operator l shoud > 0 and r should < 0")
|
||||
raise ValueError("Cut operator l should > 0 and r should < 0")
|
||||
|
||||
super(Cut, self).__init__(feature)
|
||||
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from qlib.constant import EPS
|
||||
from qlib.data.dataset.processor import Processor
|
||||
from qlib.data.dataset.utils import fetch_df_by_index
|
||||
|
||||
@@ -27,7 +28,7 @@ class HighFreqNorm(Processor):
|
||||
part_values = np.log1p(part_values)
|
||||
self.feature_med[name] = np.nanmedian(part_values)
|
||||
part_values = part_values - self.feature_med[name]
|
||||
self.feature_std[name] = np.nanmedian(np.absolute(part_values)) * 1.4826 + 1e-12
|
||||
self.feature_std[name] = np.nanmedian(np.absolute(part_values)) * 1.4826 + EPS
|
||||
part_values = part_values / self.feature_std[name]
|
||||
self.feature_vmax[name] = np.nanmax(part_values)
|
||||
self.feature_vmin[name] = np.nanmin(part_values)
|
||||
|
||||
@@ -5,7 +5,8 @@ import fire
|
||||
|
||||
import qlib
|
||||
import pickle
|
||||
from qlib.config import REG_CN, HIGH_FREQ_CONFIG
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.config import HIGH_FREQ_CONFIG
|
||||
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.data.dataset.handler import DataHandlerLP
|
||||
@@ -82,7 +83,7 @@ class HighfreqWorkflow:
|
||||
|
||||
def _init_qlib(self):
|
||||
"""initialize qlib"""
|
||||
# use yahoo_cn_1min data
|
||||
# use cn_data_1min data
|
||||
QLIB_INIT_CONFIG = {**HIGH_FREQ_CONFIG, **self.SPEC_CONF}
|
||||
provider_uri = QLIB_INIT_CONFIG.get("provider_uri")
|
||||
GetData().qlib_data(target_dir=provider_uri, interval="1min", region=REG_CN, exists_skip=True)
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
import qlib
|
||||
import optuna
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.tests.config import CSI300_DATASET_CONFIG
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
import qlib
|
||||
import optuna
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.tests.data import GetData
|
||||
from qlib.tests.config import get_dataset_config, CSI300_MARKET, DATASET_ALPHA360_CLASS
|
||||
|
||||
@@ -3,7 +3,7 @@
|
||||
|
||||
|
||||
import qlib
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
@@ -11,7 +11,7 @@ from pprint import pprint
|
||||
|
||||
import fire
|
||||
import qlib
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.workflow import R
|
||||
from qlib.workflow.task.gen import RollingGen, task_generator
|
||||
from qlib.workflow.task.manage import TaskManager, run_task
|
||||
|
||||
@@ -100,7 +100,8 @@ from copy import deepcopy
|
||||
import qlib
|
||||
import fire
|
||||
import pandas as pd
|
||||
from qlib.config import REG_CN, HIGH_FREQ_CONFIG
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.config import HIGH_FREQ_CONFIG
|
||||
from qlib.data import D
|
||||
from qlib.utils import exists_qlib_data, init_instance_by_config, flatten_dict
|
||||
from qlib.workflow import R
|
||||
@@ -154,6 +155,8 @@ class NestedDecisionExecutionWorkflow:
|
||||
},
|
||||
}
|
||||
|
||||
exp_name = "nested"
|
||||
|
||||
port_analysis_config = {
|
||||
"executor": {
|
||||
"class": "NestedExecutor",
|
||||
@@ -229,7 +232,7 @@ class NestedDecisionExecutionWorkflow:
|
||||
qlib.init(provider_uri=provider_uri_map, dataset_cache=None, expression_cache=None)
|
||||
|
||||
def _train_model(self, model, dataset):
|
||||
with R.start(experiment_name="train"):
|
||||
with R.start(experiment_name=self.exp_name):
|
||||
R.log_params(**flatten_dict(self.task))
|
||||
model.fit(dataset)
|
||||
R.save_objects(**{"params.pkl": model})
|
||||
@@ -256,7 +259,7 @@ class NestedDecisionExecutionWorkflow:
|
||||
self.port_analysis_config["strategy"] = strategy_config
|
||||
self.port_analysis_config["backtest"]["benchmark"] = self.benchmark
|
||||
|
||||
with R.start(experiment_name="backtest"):
|
||||
with R.start(experiment_name=self.exp_name, resume=True):
|
||||
recorder = R.get_recorder()
|
||||
par = PortAnaRecord(
|
||||
recorder,
|
||||
@@ -298,7 +301,7 @@ class NestedDecisionExecutionWorkflow:
|
||||
# - Aligning the profit calculation between multiple levels and single levels.
|
||||
# 2) comparing different backtest
|
||||
# - Basic test idea:
|
||||
# - the daily backtest will be similar as multi-level(the data quality makes this gap samller)
|
||||
# - the daily backtest will be similar as multi-level(the data quality makes this gap smaller)
|
||||
|
||||
def check_diff_freq(self):
|
||||
self._init_qlib()
|
||||
@@ -381,7 +384,7 @@ class NestedDecisionExecutionWorkflow:
|
||||
}
|
||||
pa_conf["backtest"]["benchmark"] = self.benchmark
|
||||
|
||||
with R.start(experiment_name="backtest"):
|
||||
with R.start(experiment_name=self.exp_name, resume=True):
|
||||
recorder = R.get_recorder()
|
||||
par = PortAnaRecord(recorder, pa_conf)
|
||||
par.generate()
|
||||
|
||||
@@ -10,7 +10,7 @@ Next, we will finish updating online predictions.
|
||||
import copy
|
||||
import fire
|
||||
import qlib
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.model.trainer import task_train
|
||||
from qlib.workflow.online.utils import OnlineToolR
|
||||
from qlib.tests.config import CSI300_GBDT_TASK
|
||||
|
||||
46
examples/portfolio/README.md
Normal file
@@ -0,0 +1,46 @@
|
||||
# Portfolio Optimization Strategy
|
||||
|
||||
## Introduction
|
||||
|
||||
In `qlib/examples/benchmarks` we have various **alpha** models that predict
|
||||
the stock returns. We also use a simple rule based `TopkDropoutStrategy` to
|
||||
evaluate the investing performance of these models. However, such a strategy
|
||||
is too simple to control the portfolio risk like correlation and volatility.
|
||||
|
||||
To this end, an optimization-based strategy should be used for the
|
||||
trade-off between return and risk. In this doc, we will show how to use
|
||||
`EnhancedIndexingStrategy` to maximize portfolio return while minimizing
|
||||
tracking error relative to a benchmark.
|
||||
|
||||
|
||||
## Preparation
|
||||
|
||||
We use China stock market data for our example.
|
||||
|
||||
1. Prepare CSI300 weight:
|
||||
|
||||
```bash
|
||||
wget http://fintech.msra.cn/stock_data/downloads/csi300_weight.zip
|
||||
unzip -d ~/.qlib/qlib_data/cn_data csi300_weight.zip
|
||||
rm -f csi300_weight.zip
|
||||
```
|
||||
|
||||
2. Prepare risk model data:
|
||||
|
||||
```bash
|
||||
python prepare_riskdata.py
|
||||
```
|
||||
|
||||
Here we use a **Statistical Risk Model** implemented in `qlib.model.riskmodel`.
|
||||
However users are strongly recommended to use other risk models for better quality:
|
||||
* **Fundamental Risk Model** like MSCI BARRA
|
||||
* [Deep Risk Model](https://arxiv.org/abs/2107.05201)
|
||||
|
||||
|
||||
## End-to-End Workflow
|
||||
|
||||
You can finish workflow with `EnhancedIndexingStrategy` by running
|
||||
`qrun config_enhanced_indexing.yaml`.
|
||||
|
||||
In this config, we mainly changed the strategy section compared to
|
||||
`qlib/examples/benchmarks/workflow_config_lightgbm_Alpha158.yaml`.
|
||||
71
examples/portfolio/config_enhanced_indexing.yaml
Normal file
@@ -0,0 +1,71 @@
qlib_init:
    provider_uri: "~/.qlib/qlib_data/cn_data"
    region: cn
market: &market csi300
benchmark: &benchmark SH000300
data_handler_config: &data_handler_config
    start_time: 2008-01-01
    end_time: 2020-08-01
    fit_start_time: 2008-01-01
    fit_end_time: 2014-12-31
    instruments: *market
port_analysis_config: &port_analysis_config
    strategy:
        class: EnhancedIndexingStrategy
        module_path: qlib.contrib.strategy
        kwargs:
            model: <MODEL>
            dataset: <DATASET>
            riskmodel_root: ./riskdata
    backtest:
        start_time: 2017-01-01
        end_time: 2020-08-01
        account: 100000000
        benchmark: *benchmark
        exchange_kwargs:
            limit_threshold: 0.095
            deal_price: close
            open_cost: 0.0005
            close_cost: 0.0015
            min_cost: 5
task:
    model:
        class: LGBModel
        module_path: qlib.contrib.model.gbdt
        kwargs:
            loss: mse
            colsample_bytree: 0.8879
            learning_rate: 0.2
            subsample: 0.8789
            lambda_l1: 205.6999
            lambda_l2: 580.9768
            max_depth: 8
            num_leaves: 210
            num_threads: 20
    dataset:
        class: DatasetH
        module_path: qlib.data.dataset
        kwargs:
            handler:
                class: Alpha158
                module_path: qlib.contrib.data.handler
                kwargs: *data_handler_config
            segments:
                train: [2008-01-01, 2014-12-31]
                valid: [2015-01-01, 2016-12-31]
                test: [2017-01-01, 2020-08-01]
    record:
        - class: SignalRecord
          module_path: qlib.workflow.record_temp
          kwargs:
            model: <MODEL>
            dataset: <DATASET>
        - class: SigAnaRecord
          module_path: qlib.workflow.record_temp
          kwargs:
            ana_long_short: False
            ann_scaler: 252
        - class: PortAnaRecord
          module_path: qlib.workflow.record_temp
          kwargs:
            config: *port_analysis_config
55
examples/portfolio/prepare_riskdata.py
Normal file
@@ -0,0 +1,55 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import os
import numpy as np
import pandas as pd

from qlib.data import D
from qlib.model.riskmodel import StructuredCovEstimator


def prepare_data(riskdata_root="./riskdata", T=240, start_time="2016-01-01"):

    universe = D.features(D.instruments("csi300"), ["$close"], start_time=start_time).swaplevel().sort_index()

    price_all = (
        D.features(D.instruments("all"), ["$close"], start_time=start_time).squeeze().unstack(level="instrument")
    )

    # StructuredCovEstimator is a statistical risk model
    riskmodel = StructuredCovEstimator()

    for i in range(T - 1, len(price_all)):

        date = price_all.index[i]
        ref_date = price_all.index[i - T + 1]

        print(date)

        codes = universe.loc[date].index
        price = price_all.loc[ref_date:date, codes]

        # calculate return and remove extreme return
        ret = price.pct_change()
        ret.clip(ret.quantile(0.025), ret.quantile(0.975), axis=1, inplace=True)

        # run risk model
        F, cov_b, var_u = riskmodel.predict(ret, is_price=False, return_decomposed_components=True)

        # save risk data
        root = riskdata_root + "/" + date.strftime("%Y%m%d")
        os.makedirs(root, exist_ok=True)

        pd.DataFrame(F, index=codes).to_pickle(root + "/factor_exp.pkl")
        pd.DataFrame(cov_b).to_pickle(root + "/factor_cov.pkl")
        # for specific_risk we follow the convention to save volatility
        pd.Series(np.sqrt(var_u), index=codes).to_pickle(root + "/specific_risk.pkl")


if __name__ == "__main__":

    import qlib

    qlib.init(provider_uri="~/.qlib/qlib_data/cn_data")

    prepare_data()
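The three files written by `prepare_riskdata.py` hold factor exposures (`F`), the factor covariance (`cov_b`), and specific volatility (`sqrt(var_u)`). A minimal sketch of how such pieces are usually recombined, assuming the standard factor-model identity "covariance = F · cov_b · Fᵀ + diag(specific_vol²)". The helper names `asset_covariance` and `tracking_error` and all numbers are illustrative and not part of qlib; plain lists are used so the sketch has no dependencies.

```python
import math


def asset_covariance(F, cov_b, specific_vol):
    """Rebuild the N x N asset covariance from decomposed components.

    F: N x K factor exposures, cov_b: K x K factor covariance,
    specific_vol: length-N specific volatilities (not variances).
    """
    n, k = len(F), len(cov_b)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Common (factor-driven) part: row i of F, cov_b, row j of F.
            common = sum(F[i][a] * cov_b[a][b] * F[j][b] for a in range(k) for b in range(k))
            # Specific variance only appears on the diagonal.
            cov[i][j] = common + (specific_vol[i] ** 2 if i == j else 0.0)
    return cov


def tracking_error(weights, bench_weights, cov):
    """Volatility of the active weights (portfolio minus benchmark)."""
    active = [w - b for w, b in zip(weights, bench_weights)]
    n = len(active)
    var = sum(active[i] * cov[i][j] * active[j] for i in range(n) for j in range(n))
    return math.sqrt(var)


# Toy data: two stocks, one factor (made-up values).
F = [[1.0], [0.5]]
cov_b = [[0.04]]
specific_vol = [0.1, 0.2]

cov = asset_covariance(F, cov_b, specific_vol)
te = tracking_error([0.6, 0.4], [0.5, 0.5], cov)
```

Because the script stores volatility rather than variance for specific risk, the sketch squares it before adding it to the diagonal.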
@@ -6,7 +6,7 @@ import fire
|
||||
import pickle
|
||||
|
||||
from datetime import datetime
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.data.dataset.handler import DataHandlerLP
|
||||
from qlib.utils import init_instance_by_config
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
@@ -20,7 +20,6 @@ from operator import xor
|
||||
from pprint import pprint
|
||||
|
||||
import qlib
|
||||
from qlib.config import REG_CN
|
||||
from qlib.workflow import R
|
||||
from qlib.tests.data import GetData
|
||||
|
||||
|
||||
@@ -61,7 +61,7 @@
|
||||
"\n",
|
||||
"import qlib\n",
|
||||
"import pandas as pd\n",
|
||||
"from qlib.config import REG_CN\n",
|
||||
"from qlib.constant import REG_CN\n",
|
||||
"from qlib.utils import exists_qlib_data, init_instance_by_config\n",
|
||||
"from qlib.workflow import R\n",
|
||||
"from qlib.workflow.record_temp import SignalRecord, PortAnaRecord\n",
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
# Licensed under the MIT License.
|
||||
|
||||
import qlib
|
||||
from qlib.config import REG_CN
|
||||
from qlib.constant import REG_CN
|
||||
from qlib.utils import init_instance_by_config, flatten_dict
|
||||
from qlib.workflow import R
|
||||
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord, SigAnaRecord
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
# Licensed under the MIT License.
|
||||
from pathlib import Path
|
||||
|
||||
__version__ = "0.8.0.99"
|
||||
__version__ = "0.8.1"
|
||||
__version__bak = __version__ # This version is backup for QlibConfig.reset_qlib_version
|
||||
import os
|
||||
from typing import Union
|
||||
@@ -19,11 +19,17 @@ def init(default_conf="client", **kwargs):
|
||||
|
||||
Parameters
|
||||
----------
|
||||
default_conf: str
|
||||
the default value is client. Accepted values: client/server.
|
||||
**kwargs :
|
||||
clear_mem_cache: bool
|
||||
the default value is True;
|
||||
Whether to clear the memory cache.
|
||||
It is often used to improve performance when init is called multiple times
|
||||
skip_if_reg: bool
|
||||
the default value is True;
|
||||
When using the recorder, skip_if_reg can be set to True to avoid losing the recorder.
|
||||
|
||||
"""
|
||||
from .config import C
|
||||
from .data.cache import H
|
||||
@@ -180,7 +186,7 @@ def get_project_path(config_name="config.yaml", cur_path: Union[Path, str, None]
|
||||
- There is a file named `config.yaml` in qlib.
|
||||
|
||||
For example:
|
||||
If your project file system stucuture follows such a pattern
|
||||
If your project file system structure follows such a pattern
|
||||
|
||||
<project_path>/
|
||||
- config.yaml
|
||||
@@ -225,7 +231,7 @@ def auto_init(**kwargs):
|
||||
Here are two examples of the configuration
|
||||
|
||||
Example 1)
|
||||
If you want create a new project-specific config based on a shared configure, you can use `conf_type: ref`
|
||||
If you want to create a new project-specific config based on a shared configure, you can use `conf_type: ref`
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
@@ -241,7 +247,7 @@ def auto_init(**kwargs):
|
||||
default_exp_name: "Experiment"
|
||||
|
||||
Example 2)
|
||||
If you wan to create simple a stand alone config, you can use following config(a.k.a `conf_type: origin`)
|
||||
If you want to create a simple standalone config, you can use the following config (a.k.a. `conf_type: origin`)
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
@@ -271,8 +277,8 @@ def auto_init(**kwargs):
|
||||
init_from_yaml_conf(conf_pp, **kwargs)
|
||||
elif conf_type == "ref":
|
||||
# This config type will be more convenient in following scenario
|
||||
# - There is a shared configure file and you don't want to edit it inplace.
|
||||
# - The shared configure may be updated later and you don't want to copy it.
|
||||
# - There is a shared configure file, and you don't want to edit it inplace.
|
||||
# - The shared configure may be updated later, and you don't want to copy it.
|
||||
# - You have some customized config.
|
||||
qlib_conf_path = conf.get("qlib_cfg", None)
|
||||
|
||||
|
||||
@@ -31,7 +31,7 @@ rtn & earning in the Account
|
||||
class AccumulatedInfo:
|
||||
"""
|
||||
accumulated trading info, including accumulated return/cost/turnover
|
||||
AccumulatedInfo should be shared accross different levels
|
||||
AccumulatedInfo should be shared across different levels
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
@@ -199,7 +199,7 @@ class Account:
|
||||
|
||||
# if stock is sold out, no stock price information in Position, then we should update account first, then update current position
|
||||
# if stock is bought, there is no stock in current position, update current, then update account
|
||||
# The cost will be substracted from the cash at last. So the trading logic can ignore the cost calculation
|
||||
# The cost will be subtracted from the cash at last. So the trading logic can ignore the cost calculation
|
||||
if order.direction == Order.SELL:
|
||||
# sell stock
|
||||
self._update_state_from_order(order, trade_val, cost, trade_price)
|
||||
@@ -378,7 +378,7 @@ class Account:
|
||||
)
|
||||
|
||||
def get_portfolio_metrics(self):
|
||||
"""get the history portfolio_metrics and postions instance"""
|
||||
"""get the history portfolio_metrics and positions instance"""
|
||||
if self.is_port_metr_enabled():
|
||||
_portfolio_metrics = self.portfolio_metrics.generate_portfolio_metrics_dataframe()
|
||||
_positions = self.get_hist_positions()
|
||||
|
||||
@@ -13,7 +13,7 @@ from tqdm.auto import tqdm
|
||||
|
||||
|
||||
def backtest_loop(start_time, end_time, trade_strategy: BaseStrategy, trade_executor: BaseExecutor):
|
||||
"""backtest funciton for the interaction of the outermost strategy and executor in the nested decision execution
|
||||
"""backtest function for the interaction of the outermost strategy and executor in the nested decision execution
|
||||
|
||||
please refer to the docs of `collect_data_loop`
|
||||
|
||||
|
||||
@@ -505,8 +505,8 @@ class BaseTradeDecision:
|
||||
`inner_trade_decision` will be changed **inplaced**.
|
||||
|
||||
Motivation of the `mod_inner_decision`
|
||||
- Leave a hook for outer decision to affact the decision generated by the inner strategy
|
||||
- e.g. the outmost strategy generate a time range for trading. But the upper layer can only affact the
|
||||
- Leave a hook for outer decision to affect the decision generated by the inner strategy
|
||||
- e.g. the outmost strategy generate a time range for trading. But the upper layer can only affect the
|
||||
nearest layer in the original design. With `mod_inner_decision`, the decision can be passed through multiple
|
||||
layers
|
||||
|
||||
|
||||
@@ -14,7 +14,8 @@ import numpy as np
|
||||
import pandas as pd
|
||||
|
||||
from ..data.data import D
|
||||
from ..config import C, REG_CN
|
||||
from ..config import C
|
||||
from ..constant import REG_CN
|
||||
from ..log import get_module_logger
|
||||
from .decision import Order, OrderDir, OrderHelper
|
||||
from .high_performance_ds import BaseQuote, PandasQuote, NumpyQuote
|
||||
@@ -103,7 +104,7 @@ class Exchange:
|
||||
Necessary fields:
|
||||
$close is for calculating the total value at end of each day.
|
||||
Optional fields:
|
||||
$volume is only necessary when we limit the trade amount or caculate PA(vwap) indicator
|
||||
$volume is only necessary when we limit the trade amount or calculate PA(vwap) indicator
|
||||
$vwap is only necessary when we use the $vwap price as the deal price
|
||||
$factor is for rounding to the trading unit
|
||||
limit_sell will be set to False by default(False indicates we can sell this
|
||||
@@ -505,7 +506,7 @@ class Exchange:
|
||||
Note: some future information is used in this function
|
||||
Parameter:
|
||||
target_position : dict { stock_id : amount }
|
||||
current_postion : dict { stock_id : amount}
|
||||
current_position : dict { stock_id : amount}
|
||||
trade_unit : trade_unit
|
||||
down sample : for amount 321 and trade_unit 100, deal_amount is 300
|
||||
deal order on trade_date
|
||||
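The "down sample" note in the docstring above (amount 321 with trade_unit 100 gives a deal_amount of 300) amounts to rounding down to a whole number of trading units. A minimal sketch of that arithmetic; `round_down_to_unit` is a hypothetical helper written for illustration and is not the `Exchange` method itself.

```python
def round_down_to_unit(amount: float, trade_unit: int) -> float:
    """Round a share amount down to a multiple of the trading unit."""
    if trade_unit <= 0:
        raise ValueError("trade_unit must be positive")
    # Floor division drops the partial unit, e.g. 321 // 100 == 3.
    return (amount // trade_unit) * trade_unit


deal_amount = round_down_to_unit(321, 100)
small_order = round_down_to_unit(99, 100)
```

An order smaller than one trading unit rounds down to zero, which is why the surrounding code skips orders whose deal amount is 0.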
@@ -535,7 +536,7 @@ class Exchange:
|
||||
deal_amount = self.get_real_deal_amount(current_amount, target_amount, factor)
|
||||
if deal_amount == 0:
|
||||
continue
|
||||
elif deal_amount > 0:
|
||||
if deal_amount > 0:
|
||||
# buy stock
|
||||
buy_order_list.append(
|
||||
Order(
|
||||
@@ -686,9 +687,7 @@ class Exchange:
|
||||
orig_deal_amount = order.deal_amount
|
||||
order.deal_amount = max(min(vol_limit_min, orig_deal_amount), 0)
|
||||
if vol_limit_min < orig_deal_amount:
|
||||
self.logger.debug(
|
||||
f"Order clipped due to volume limitation: {order}, {[(vol, rule) for vol, rule in zip(vol_limit_num, vol_limit)]}"
|
||||
)
|
||||
self.logger.debug(f"Order clipped due to volume limitation: {order}, {list(zip(vol_limit_num, vol_limit))}")
|
||||
|
||||
def _get_buy_amount_by_cash_limit(self, trade_price, cash, cost_ratio):
|
||||
"""return the real order amount after cash limit for buying.
|
||||
|
||||
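The clipping expression in the hunk above, `max(min(vol_limit_min, orig_deal_amount), 0)`, caps an order at the tightest volume limit and never lets it go negative. A small sketch of the same expression on plain numbers; the function name `clip_deal_amount` and the sample limits are assumptions for illustration only.

```python
def clip_deal_amount(orig_deal_amount: float, vol_limits: list) -> float:
    """Cap an order amount by the smallest volume limit, floored at zero."""
    vol_limit_min = min(vol_limits)  # the most restrictive rule wins
    return max(min(vol_limit_min, orig_deal_amount), 0)


clipped = clip_deal_amount(500, [300, 800])    # limited by the 300 rule
unchanged = clip_deal_amount(200, [300, 800])  # already below every limit
floored = clip_deal_amount(100, [-50])         # a negative limit yields 0
```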
@@ -41,7 +41,7 @@ class BaseExecutor:
|
||||
Parameters
|
||||
----------
|
||||
time_per_step : str
|
||||
trade time per trading step, used for genreate the trade calendar
|
||||
trade time per trading step, used to generate the trade calendar
|
||||
show_indicator: bool, optional
|
||||
whether to show indicators, :
|
||||
- 'pa', the price advantage
|
||||
@@ -118,7 +118,7 @@ class BaseExecutor:
|
||||
self.dealt_order_amount = defaultdict(float)
|
||||
self.deal_day = None
|
||||
|
||||
def reset_common_infra(self, common_infra):
|
||||
def reset_common_infra(self, common_infra, copy_trade_account=False):
|
||||
"""
|
||||
reset infrastructure for trading
|
||||
- reset trade_account
|
||||
@@ -129,9 +129,14 @@ class BaseExecutor:
|
||||
self.common_infra.update(common_infra)
|
||||
|
||||
if common_infra.has("trade_account"):
|
||||
# NOTE: there is a trick in the code.
|
||||
# shallow copy is used instead of deepcopy. So positions are shared
|
||||
self.trade_account: Account = copy.copy(common_infra.get("trade_account"))
|
||||
if copy_trade_account:
|
||||
# NOTE: there is a trick in the code.
|
||||
# shallow copy is used instead of deepcopy.
|
||||
# 1. So positions are shared
|
||||
# 2. Others are not shared, so each level has its own metrics (portfolio and trading metrics)
|
||||
self.trade_account: Account = copy.copy(common_infra.get("trade_account"))
|
||||
else:
|
||||
self.trade_account = common_infra.get("trade_account")
|
||||
self.trade_account.reset(freq=self.time_per_step, port_metr_enabled=self.generate_portfolio_metrics)
|
||||
|
||||
@property
|
||||
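The NOTE in the hunk above relies on how `copy.copy` behaves: a shallow copy shares the objects its attributes point to, while attributes that are rebound afterwards become private to the copy. A minimal sketch with a toy class; `ToyAccount` and its fields are hypothetical stand-ins and do not mirror qlib's `Account`.

```python
import copy


class ToyAccount:
    """Hypothetical stand-in for an account object (not qlib's Account)."""

    def __init__(self):
        self.positions = {"cash": 100.0}  # mutable object, shared by shallow copies
        self.metrics = []                 # per-level records

    def reset_metrics(self):
        # Rebinding the attribute gives this instance its own list.
        self.metrics = []


outer = ToyAccount()
inner = copy.copy(outer)  # shallow copy: attributes reference the same objects
inner.reset_metrics()     # inner now owns a separate metrics list

inner.positions["cash"] -= 30.0     # visible through outer as well
inner.metrics.append("inner-step")  # only recorded on inner
```

With `copy.deepcopy` the `positions` dict would be duplicated too, so trades made at an inner level would no longer be reflected in the outer account.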
@@ -189,7 +194,7 @@ class BaseExecutor:
|
||||
return return_value.get("execute_result")
|
||||
|
||||
@abstractclassmethod
|
||||
def _collect_data(self, trade_decision: BaseTradeDecision, level: int = 0) -> Tuple[List[object], dict]:
|
||||
def _collect_data(cls, trade_decision: BaseTradeDecision, level: int = 0) -> Tuple[List[object], dict]:
|
||||
"""
|
||||
Please refer to the doc of collect_data
|
||||
The only difference between `_collect_data` and `collect_data` is that some common steps are moved into
|
||||
@@ -342,14 +347,18 @@ class NestedExecutor(BaseExecutor):
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
def reset_common_infra(self, common_infra):
|
||||
def reset_common_infra(self, common_infra, copy_trade_account=False):
|
||||
"""
|
||||
reset infrastructure for trading
|
||||
- reset inner_strategy and inner_executor common infra
|
||||
"""
|
||||
super(NestedExecutor, self).reset_common_infra(common_infra)
|
||||
# NOTE: please refer to the docs of BaseExecutor.reset_common_infra for the meaning of `copy_trade_account`
|
||||
|
||||
self.inner_executor.reset_common_infra(common_infra)
|
||||
# The first level follow the `copy_trade_account` from the upper level
|
||||
super(NestedExecutor, self).reset_common_infra(common_infra, copy_trade_account=copy_trade_account)
|
||||
|
||||
# The lower level have to copy the trade_account
|
||||
self.inner_executor.reset_common_infra(common_infra, copy_trade_account=True)
|
||||
self.inner_strategy.reset_common_infra(common_infra)
|
||||
|
||||
def _init_sub_trading(self, trade_decision):
|
||||
@@ -360,12 +369,12 @@ class NestedExecutor(BaseExecutor):
|
||||
self.inner_strategy.reset(level_infra=sub_level_infra, outer_trade_decision=trade_decision)
|
||||
|
||||
def _update_trade_decision(self, trade_decision: BaseTradeDecision) -> BaseTradeDecision:
|
||||
# outter strategy have chance to update decision each iterator
|
||||
# the outer strategy has a chance to update the decision in each iteration
|
||||
updated_trade_decision = trade_decision.update(self.inner_executor.trade_calendar)
|
||||
if updated_trade_decision is not None:
|
||||
trade_decision = updated_trade_decision
|
||||
# NEW UPDATE
|
||||
# create a hook for inner strategy to update outter decision
|
||||
# create a hook for inner strategy to update outer decision
|
||||
self.inner_strategy.alter_outer_trade_decision(trade_decision)
|
||||
return trade_decision
|
||||
|
||||
|
||||
@@ -400,7 +400,7 @@ class BaseOrderIndicator:
|
||||
indicators : List[BaseOrderIndicator]
|
||||
the list of all inner indicators.
|
||||
metrics : Union[str, List[str]]
|
||||
all metrics needs ot be sumed.
|
||||
all metrics that need to be summed.
|
||||
fill_value : float, optional
|
||||
fill np.NaN with value. By default None.
|
||||
"""
|
||||
|
||||
@@ -20,7 +20,7 @@ class BasePosition:
|
||||
Please refer to the `Position` class for the position
|
||||
"""
|
||||
|
||||
def __init__(self, cash=0.0, *args, **kwargs):
|
||||
def __init__(self, *args, cash=0.0, **kwargs):
|
||||
self._settle_type = self.ST_NO
|
||||
|
||||
def skip_update(self) -> bool:
|
||||
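Moving `cash` behind `*args` in the signature above makes it a keyword-only parameter, so a stray positional argument can no longer be silently taken as cash. A minimal sketch of the difference; `OldStyle` and `NewStyle` are illustrative classes, not the real `BasePosition`.

```python
class OldStyle:
    def __init__(self, cash=0.0, *args, **kwargs):
        self.cash = cash
        self.args = args


class NewStyle:
    def __init__(self, *args, cash=0.0, **kwargs):
        self.cash = cash
        self.args = args


old = OldStyle(5)          # the positional 5 is bound to cash
new = NewStyle(5)          # the positional 5 lands in args; cash keeps its default
new_kw = NewStyle(cash=5)  # cash must now be passed by name
```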
@@ -152,7 +152,7 @@ class BasePosition:
|
||||
"""
|
||||
generate stock weight dict {stock_id : value weight of stock in the position}
|
||||
it is meaningful in the beginning or the end of each trade step
|
||||
- During execution of each trading step, the weight may be not consistant with the portfolio value
|
||||
- During execution of each trading step, the weight may not be consistent with the portfolio value
|
||||
|
||||
Parameters
|
||||
----------
|
||||
|
||||
@@ -39,7 +39,7 @@ def get_benchmark_weight(
|
||||
if not path:
|
||||
path = Path(C.dpm.get_data_uri(freq)).expanduser() / "raw" / "AIndexMembers" / "weights.csv"
|
||||
# TODO: the storage of weights should be implemented in a more elegent way
|
||||
# TODO: The benchmark is not consistant with the filename in instruments.
|
||||
# TODO: The benchmark is not consistent with the filename in instruments.
|
||||
bench_weight_df = pd.read_csv(path, usecols=["code", "date", "index", "weight"])
|
||||
bench_weight_df = bench_weight_df[bench_weight_df["index"] == bench]
|
||||
bench_weight_df["date"] = pd.to_datetime(bench_weight_df["date"])
|
||||
@@ -156,16 +156,16 @@ def decompose_portofolio(stock_weight_df, stock_group_df, stock_ret_df):
|
||||
group_weight, stock_weight_in_group = decompose_portofolio_weight(stock_weight_df, stock_group_df)
|
||||
|
||||
group_ret = {}
|
||||
for group_key in stock_weight_in_group:
|
||||
stock_weight_in_group_start_date = min(stock_weight_in_group[group_key].index)
|
||||
stock_weight_in_group_end_date = max(stock_weight_in_group[group_key].index)
|
||||
for group_key, val in stock_weight_in_group.items():
|
||||
stock_weight_in_group_start_date = min(val.index)
|
||||
stock_weight_in_group_end_date = max(val.index)
|
||||
|
||||
temp_stock_ret_df = stock_ret_df[
|
||||
(stock_ret_df.index >= stock_weight_in_group_start_date)
|
||||
& (stock_ret_df.index <= stock_weight_in_group_end_date)
|
||||
]
|
||||
|
||||
group_ret[group_key] = (temp_stock_ret_df * stock_weight_in_group[group_key]).sum(axis=1)
|
||||
group_ret[group_key] = (temp_stock_ret_df * val).sum(axis=1)
|
||||
# If no weight is assigned, then the return of group will be np.nan
|
||||
group_ret[group_key][group_weight[group_key] == 0.0] = np.nan
|
||||
|
||||
|
||||
@@ -73,7 +73,7 @@ class PortfolioMetrics:
|
||||
self.init_bench(freq=freq, benchmark_config=benchmark_config)
|
||||
|
||||
def init_vars(self):
|
||||
self.accounts = OrderedDict() # account postion value for each trade time
|
||||
self.accounts = OrderedDict() # account position value for each trade time
|
||||
self.returns = OrderedDict() # daily return rate for each trade time
|
||||
self.total_turnovers = OrderedDict() # total turnover for each trade time
|
||||
self.turnovers = OrderedDict() # turnover for each trade time
|
||||
@@ -212,7 +212,8 @@ class PortfolioMetrics:
|
||||
path: str/ pathlib.Path()
|
||||
"""
|
||||
path = pathlib.Path(path)
|
||||
r = pd.read_csv(open(path, "rb"), index_col=0)
|
||||
with path.open("rb") as f:
|
||||
r = pd.read_csv(f, index_col=0)
|
||||
r.index = pd.DatetimeIndex(r.index)
|
||||
|
||||
index = r.index
|
||||
@@ -236,7 +237,7 @@ class Indicator:
|
||||
"""
|
||||
`Indicator` is implemented in a aggregate way.
|
||||
All the metrics are calculated aggregately.
|
||||
All the metrics are calculated for a seperated stock and in a specific step on a specific level.
|
||||
All the metrics are calculated for a separated stock and in a specific step on a specific level.
|
||||
|
||||
| indicator | desc. |
|
||||
|--------------+--------------------------------------------------------------|
|
||||
|
||||
@@ -93,7 +93,7 @@ class TradeCalendarManager:
|
||||
|
||||
About the endpoints:
|
||||
- Qlib uses the closed interval in time-series data selection, which has the same performance as pandas.Series.loc
|
||||
# - The returned right endpoints should minus 1 seconds becasue of the closed interval representation in Qlib.
|
||||
# - The returned right endpoints should minus 1 seconds because of the closed interval representation in Qlib.
|
||||
# Note: Qlib supports up to minutely decision execution, so 1 seconds is less than any trading time interval.
|
||||
|
||||
Parameters
|
||||
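The endpoint note above says that, because intervals are closed, the right endpoint of a step is moved back by one second so it does not overlap the next step. A minimal sketch using only the standard library; `step_range` and the sample calendar are assumptions for illustration and are not the `TradeCalendarManager` API.

```python
from datetime import datetime, timedelta


def step_range(calendar, step):
    """Return the closed interval [left, right] covered by one trading step.

    Assumes `calendar` holds the start time of every step plus one extra
    entry, so that `calendar[step + 1]` always exists.
    """
    left = calendar[step]
    # One second before the next step starts, so closed intervals never overlap.
    right = calendar[step + 1] - timedelta(seconds=1)
    return left, right


calendar = [datetime(2020, 1, 2), datetime(2020, 1, 3), datetime(2020, 1, 6)]
left, right = step_range(calendar, 0)
next_left, _ = step_range(calendar, 1)
```

One second is a safe offset here because, as the comment notes, the finest supported decision frequency is one minute.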
@@ -205,10 +205,7 @@ class BaseInfrastructure:
|
||||
warnings.warn(f"infra {infra_name} is not found!")
|
||||
|
||||
def has(self, infra_name):
|
||||
if infra_name in self.get_support_infra() and hasattr(self, infra_name):
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
return infra_name in self.get_support_infra() and hasattr(self, infra_name)
|
||||
|
||||
def update(self, other):
|
||||
support_infra = other.get_support_infra()
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
About the configs
|
||||
=================
|
||||
|
||||
The config will based on _default_config.
|
||||
The config will be based on _default_config.
|
||||
Two modes are supported
|
||||
- client
|
||||
- server
|
||||
@@ -22,13 +22,15 @@ from pathlib import Path
|
||||
from typing import Optional, Union
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
from qlib.constant import REG_CN, REG_US
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from qlib.utils.time import Freq
|
||||
|
||||
|
||||
class Config:
|
||||
def __init__(self, default_conf):
|
||||
self.__dict__["_default_config"] = copy.deepcopy(default_conf) # avoiding conflictions with __getattr__
|
||||
self.__dict__["_default_config"] = copy.deepcopy(default_conf) # avoiding conflicts with __getattr__
|
||||
self.reset()
|
||||
|
||||
def __getitem__(self, key):
|
||||
@@ -74,10 +76,6 @@ class Config:
|
||||
self.update(**config_c.__dict__["_config"])
|
||||
|
||||
|
||||
# REGION CONST
|
||||
REG_CN = "cn"
|
||||
REG_US = "us"
|
||||
|
||||
# pickle.dump protocol version: https://docs.python.org/3/library/pickle.html#data-stream-format
|
||||
PROTOCOL_VERSION = 4
|
||||
|
||||
@@ -240,7 +238,7 @@ MODE_CONF = {
|
||||
}
|
||||
|
||||
HIGH_FREQ_CONFIG = {
|
||||
"provider_uri": "~/.qlib/qlib_data/yahoo_cn_1min",
|
||||
"provider_uri": "~/.qlib/qlib_data/cn_data_1min",
|
||||
"dataset_cache": None,
|
||||
"expression_cache": "DiskExpressionCache",
|
||||
"region": REG_CN,
|
||||
@@ -271,7 +269,11 @@ class QlibConfig(Config):
|
||||
self._registered = False
|
||||
|
||||
class DataPathManager:
|
||||
def __init__(self, provider_uri: Union[str, Path, dict], mount_path: Union[str, Path, dict]):
|
||||
def __init__(
|
||||
self,
|
||||
provider_uri: Union[str, Path, dict],
|
||||
mount_path: Union[str, Path, dict],
|
||||
):
|
||||
self.provider_uri = provider_uri
|
||||
self.mount_path = mount_path
|
||||
|
||||
@@ -360,10 +362,10 @@ class QlibConfig(Config):
|
||||
"""
|
||||
configure qlib based on the input parameters
|
||||
|
||||
The configure will act like a dictionary.
|
||||
The configuration will act like a dictionary.
|
||||
|
||||
Normally, it literally replace the value according to the keys.
|
||||
However, sometimes it is hard for users to set the config when the configure is nested and complicated
|
||||
Normally, it literally replaces the values according to the keys.
|
||||
However, sometimes it is hard for users to set the config when the configuration is nested and complicated
|
||||
|
||||
So this API provides some special parameters for users to set the keys in a more convenient way.
|
||||
- region: REG_CN, REG_US
|
||||
|
||||
9
qlib/constant.py
Normal file
@@ -0,0 +1,9 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

# REGION CONST
REG_CN = "cn"
REG_US = "us"

# Epsilon for avoiding division by zero.
EPS = 1e-12
@@ -63,9 +63,7 @@ def _get_date_parse_fn(target):
|
||||
get_date_parse_fn('20120101')('2017-01-01') => '20170101'
|
||||
get_date_parse_fn(20120101)('2017-01-01') => 20170101
|
||||
"""
|
||||
if isinstance(target, pd.Timestamp):
|
||||
_fn = lambda x: pd.Timestamp(x) # Timestamp('2020-01-01')
|
||||
elif isinstance(target, int):
|
||||
if isinstance(target, int):
|
||||
_fn = lambda x: int(str(x).replace("-", "")[:8]) # 20200201
|
||||
elif isinstance(target, str) and len(target) == 8:
|
||||
_fn = lambda x: str(x).replace("-", "")[:8] # '20200201'
|
||||
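The docstring examples in the hunk above describe a function that returns a parser matching the format of `target`. A minimal standalone sketch of the two branches that remain after this change; the final fallback (returning the value unchanged) is an assumption, since that branch is not shown in the hunk.

```python
def get_date_parse_fn(target):
    """Return a function that converts a date into the same format as `target`."""
    if isinstance(target, int):
        # e.g. 20120101 -> parse '2017-01-01' into 20170101
        return lambda x: int(str(x).replace("-", "")[:8])
    if isinstance(target, str) and len(target) == 8:
        # e.g. '20120101' -> parse '2017-01-01' into '20170101'
        return lambda x: str(x).replace("-", "")[:8]
    # Assumption: any other target format leaves the value as it is.
    return lambda x: x


as_str = get_date_parse_fn("20120101")("2017-01-01")
as_int = get_date_parse_fn(20120101)("2017-01-01")
as_is = get_date_parse_fn("2012-01-01")("2017-01-01")
```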
@@ -158,7 +156,7 @@ class MTSDatasetH(DatasetH):
|
||||
try:
|
||||
df = self.handler._learn.copy() # use copy otherwise recorder will fail
|
||||
# FIXME: currently we cannot support switching from `_learn` to `_infer` for inference
|
||||
except:
|
||||
except Exception:
|
||||
warnings.warn("cannot access `_learn`, will load raw data")
|
||||
df = self.handler._data.copy()
|
||||
df.index = df.index.swaplevel()
|
||||
|
||||
@@ -126,9 +126,9 @@ class Alpha360(DataHandlerLP):
|
||||
fields += ["$vwap/$close"]
|
||||
names += ["VWAP0"]
|
||||
for i in range(59, 0, -1):
|
||||
fields += ["Ref($volume, %d)/$volume" % (i)]
|
||||
fields += ["Ref($volume, %d)/($volume+1e-12)" % (i)]
|
||||
names += ["VOLUME%d" % (i)]
|
||||
fields += ["$volume/$volume"]
|
||||
fields += ["$volume/($volume+1e-12)"]
|
||||
names += ["VOLUME0"]
|
||||
|
||||
return fields, names
|
||||
@@ -249,7 +249,7 @@ class Alpha158(DataHandlerLP):
|
||||
names += [field.upper() + str(d) for d in windows]
|
||||
if "volume" in config:
|
||||
windows = config["volume"].get("windows", range(5))
|
||||
fields += ["Ref($volume, %d)/$volume" % d if d != 0 else "$volume/$volume" for d in windows]
|
||||
fields += ["Ref($volume, %d)/($volume+1e-12)" % d if d != 0 else "$volume/($volume+1e-12)" for d in windows]
|
||||
names += ["VOLUME" + str(d) for d in windows]
|
||||
if "rolling" in config:
|
||||
windows = config["rolling"].get("windows", [5, 10, 20, 30, 60])
|
||||
|
||||
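The handler changes above add `1e-12` to the denominator of every volume ratio so that a zero volume (for example on a suspended day) does not break the division. A minimal sketch of the same guard on plain floats; `volume_ratio` is a hypothetical helper, and note that plain Python raises `ZeroDivisionError` on an unguarded `0.0 / 0.0`, whereas array libraries typically produce NaN or inf instead.

```python
EPS = 1e-12  # same magnitude as the constant used in the expressions above


def volume_ratio(past_volume: float, volume: float, eps: float = EPS) -> float:
    """Ratio of a past volume to the current volume, guarded against zero."""
    return past_volume / (volume + eps)


suspended = volume_ratio(0.0, 0.0)  # 0.0 instead of a division error
normal = volume_ratio(150.0, 100.0)  # practically unchanged by the tiny epsilon
```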
@@ -18,8 +18,8 @@ class SepDataFrame:
|
||||
"""
|
||||
(Sep)erate DataFrame
|
||||
We usually concat multiple dataframe to be processed together(Such as feature, label, weight, filter).
|
||||
However, they are usally be used seperately at last.
|
||||
This will result in extra cost for concating and spliting data(reshaping and copying data in the memory is very expensive)
|
||||
However, they are usually used separately in the end.
|
||||
This will result in extra cost for concatenating and splitting data(reshaping and copying data in the memory is very expensive)
|
||||
|
||||
SepDataFrame tries to act like a DataFrame whose column with multiindex
|
||||
"""
|
||||
|
||||
@@ -371,7 +371,7 @@ def long_short_backtest(
|
||||
|
||||
def t_run():
|
||||
pred_FN = "./check_pred.csv"
|
||||
pred = pd.read_csv(pred_FN)
|
||||
pred: pd.DataFrame = pd.read_csv(pred_FN)
|
||||
pred["datetime"] = pd.to_datetime(pred["datetime"])
|
||||
pred = pred.set_index([pred.columns[0], pred.columns[1]])
|
||||
pred = pred.iloc[:9000]
|
||||
|
||||
@@ -38,11 +38,11 @@ def _get_position_value_from_df(evaluate_date, position, close_data_df):
|
||||
def get_position_value(evaluate_date, position):
|
||||
"""sum of close*amount
|
||||
|
||||
get value of postion
|
||||
get value of position
|
||||
|
||||
use close price
|
||||
|
||||
postions:
|
||||
positions:
|
||||
{
|
||||
Timestamp('2016-01-05 00:00:00'):
|
||||
{
|
||||
|
||||
4
qlib/contrib/meta/__init__.py
Normal file
@@ -0,0 +1,4 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

from .data_selection import MetaTaskDS, MetaDatasetDS, MetaModelDS
5
qlib/contrib/meta/data_selection/__init__.py
Normal file
@@ -0,0 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

from .dataset import MetaDatasetDS, MetaTaskDS
from .model import MetaModelDS
325
qlib/contrib/meta/data_selection/dataset.py
Normal file
@@ -0,0 +1,325 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
from copy import deepcopy
from qlib.data.dataset.utils import init_task_handler
from qlib.utils.data import deepcopy_basic_type
from qlib.contrib.torch import data_to_tensor
from qlib.workflow.task.utils import TimeAdjuster
from qlib.model.meta.task import MetaTask
from typing import Dict, List, Union, Text, Tuple
from qlib.data.dataset.handler import DataHandler
from qlib.log import get_module_logger
from qlib.utils import auto_filter_kwargs, get_date_by_shift, init_instance_by_config
from qlib.workflow import R
from qlib.workflow.task.gen import RollingGen, task_generator
from joblib import Parallel, delayed
from qlib.model.meta.dataset import MetaTaskDataset
from qlib.model.trainer import task_train, TrainerR
from qlib.data.dataset import DatasetH
from tqdm.auto import tqdm
import pandas as pd
import numpy as np


class InternalData:
    def __init__(self, task_tpl: dict, step: int, exp_name: str):
        self.task_tpl = task_tpl
        self.step = step
        self.exp_name = exp_name

    def setup(self, trainer=TrainerR, trainer_kwargs={}):
        """
        After running this function, `self.data_ic_df` will be set.
        Each column represents a piece of data.
        Each row represents the timestamp at which the performance of that data is measured.
        For example,

        .. code-block:: python

                       2021-06-21 2021-06-04 2021-05-21 2021-05-07 2021-04-20 2021-04-06 2021-03-22 2021-03-08 ...
                       2021-07-02 2021-06-18 2021-06-03 2021-05-20 2021-05-06 2021-04-19 2021-04-02 2021-03-19 ...
            datetime                                                                                           ...
            2018-01-02   0.079782   0.115975   0.070866   0.028849  -0.081170   0.140380   0.063864   0.110987 ...
            2018-01-03   0.123386   0.107789   0.071037   0.045278  -0.060782   0.167446   0.089779   0.124476 ...
            2018-01-04   0.140775   0.097206   0.063702   0.042415  -0.078164   0.173218   0.098914   0.114389 ...
            2018-01-05   0.030320  -0.037209  -0.044536  -0.047267  -0.081888   0.045648   0.059947   0.047652 ...
            2018-01-08   0.107201   0.009219  -0.015995  -0.036594  -0.086633   0.108965   0.122164   0.108508 ...
            ...               ...        ...        ...        ...        ...        ...        ...        ... ...

        """

        # 1) prepare the prediction of proxy models
        perf_task_tpl = deepcopy(self.task_tpl)  # this task is supposed to contain no complicated objects

        trainer = auto_filter_kwargs(trainer)(experiment_name=self.exp_name, **trainer_kwargs)
        # NOTE:
        # The handler is initialized only once.
        if not trainer.has_worker():
            self.dh = init_task_handler(perf_task_tpl)
        else:
            self.dh = init_instance_by_config(perf_task_tpl["dataset"]["kwargs"]["handler"])

        seg = perf_task_tpl["dataset"]["kwargs"]["segments"]

        # We want to split the training time period into small segments.
        perf_task_tpl["dataset"]["kwargs"]["segments"] = {
            "train": (DatasetH.get_min_time(seg), DatasetH.get_max_time(seg)),
            "test": (None, None),
        }

        # NOTE:
        # we play a trick here
        # treat the training segments as test to create the rolling tasks
        rg = RollingGen(step=self.step, test_key="train", train_key=None, task_copy_func=deepcopy_basic_type)
        gen_task = task_generator(perf_task_tpl, [rg])

        recorders = R.list_recorders(experiment_name=self.exp_name)
        if len(gen_task) == len(recorders):
            get_module_logger("Internal Data").info("the data has been initialized")
        else:
            # train new models
            assert 0 == len(recorders), "An empty experiment is required for setup `InternalData`"
            trainer.train(gen_task)

        # 2) extract the similarity matrix
        label_df = self.dh.fetch(col_set="label")
        # for
        recorders = R.list_recorders(experiment_name=self.exp_name)

        key_l = []
        ic_l = []
        for _, rec in tqdm(recorders.items(), desc="calc"):
            pred = rec.load_object("pred.pkl")
            task = rec.load_object("task")
            data_key = task["dataset"]["kwargs"]["segments"]["train"]
            key_l.append(data_key)
            ic_l.append(delayed(self._calc_perf)(pred.iloc[:, 0], label_df.iloc[:, 0]))

        ic_l = Parallel(n_jobs=-1)(ic_l)
        self.data_ic_df = pd.DataFrame(dict(zip(key_l, ic_l)))
        self.data_ic_df = self.data_ic_df.sort_index().sort_index(axis=1)

        del self.dh  # handler is not useful now

    def _calc_perf(self, pred, label):
        df = pd.DataFrame({"pred": pred, "label": label})
        df = df.groupby("datetime").corr(method="spearman")
        corr = df.loc(axis=0)[:, "pred"]["label"].droplevel(axis=0, level=-1)
        return corr

    def update(self):
        """update the data for online trading"""
        # TODO:
        # when new data are totally(including label) available
        # - update the prediction
        # - update the data similarity map(if applied)


class MetaTaskDS(MetaTask):
    """Meta Task for Data Selection"""

    def __init__(self, task: dict, meta_info: pd.DataFrame, mode: str = MetaTask.PROC_MODE_FULL, fill_method="max"):
        """
        The description of the processed data

            time_perf: An array with shape <hist_step_n * step, data pieces> -> data piece performance

            time_belong: An array with shape <sample, data pieces> -> belong or not (1. or 0.)
            array([[1., 0., 0., ..., 0., 0., 0.],
                   [1., 0., 0., ..., 0., 0., 0.],
                   [1., 0., 0., ..., 0., 0., 0.],
                   ...,
                   [0., 0., 0., ..., 0., 0., 1.],
                   [0., 0., 0., ..., 0., 0., 1.],
                   [0., 0., 0., ..., 0., 0., 1.]])

        """
        super().__init__(task, meta_info)
        self.fill_method = fill_method

        time_perf = self._get_processed_meta_info()
        self.processed_meta_input = {"time_perf": time_perf}
        # FIXME: memory issue in this step
        if mode == MetaTask.PROC_MODE_FULL:
            # process metainfo_
            ds = self.get_dataset()

            # these three lines occupied 70% of the time of initializing MetaTaskDS
            d_train, d_test = ds.prepare(["train", "test"], col_set=["feature", "label"])
            prev_size = d_test.shape[0]
            d_train = d_train.dropna(axis=0)
            d_test = d_test.dropna(axis=0)
            if prev_size == 0 or d_test.shape[0] / prev_size <= 0.1:
                raise ValueError(f"Most of samples are dropped. Please check this task: {task}")

            assert (
                d_test.groupby("datetime").size().shape[0] >= 5
            ), "In this segment, the number of trading dates is less than 5, you'd better check the data."

            sample_time_belong = np.zeros((d_train.shape[0], time_perf.shape[1]))
            for i, col in enumerate(time_perf.columns):
                # these two lines of code occupied 20% of the time of initializing MetaTaskDS
                slc = slice(*d_train.index.slice_locs(start=col[0], end=col[1]))
                sample_time_belong[slc, i] = 1.0

            # If you want that last month also belongs to the last time_perf
            # Assumptions: the latest data has similar performance like the last month
            sample_time_belong[sample_time_belong.sum(axis=1) != 1, -1] = 1.0

            self.processed_meta_input.update(
                dict(
                    X=d_train["feature"],
                    y=d_train["label"].iloc[:, 0],
                    X_test=d_test["feature"],
                    y_test=d_test["label"].iloc[:, 0],
                    time_belong=sample_time_belong,
                    test_idx=d_test["label"].index,
                )
            )

        # TODO: set device: I think this is not necessary to converting data format.
        self.processed_meta_input = data_to_tensor(self.processed_meta_input)

    def _get_processed_meta_info(self):
        meta_info_norm = self.meta_info.sub(self.meta_info.mean(axis=1), axis=0)  # .fillna(0.)
        if self.fill_method == "max":
            meta_info_norm = meta_info_norm.T.fillna(
                meta_info_norm.max(axis=1)
            ).T  # fill it with row max to align with previous implementation
        elif self.fill_method == "zero":
            pass
        else:
            raise NotImplementedError(f"This type of input is not supported")
        meta_info_norm = meta_info_norm.fillna(0.0)  # always fill zero in case of NaN
        return meta_info_norm

    def get_meta_input(self):
        return self.processed_meta_input


class MetaDatasetDS(MetaTaskDataset):
    def __init__(
        self,
        *,
        task_tpl: Union[dict, list],
        step: int,
        trunc_days: int = None,
        rolling_ext_days: int = 0,
        exp_name: Union[str, InternalData],
        segments: Union[Dict[Text, Tuple], float],
        hist_step_n: int = 10,
        task_mode: str = MetaTask.PROC_MODE_FULL,
        fill_method: str = "max",
    ):
        """
        A dataset for meta model.

        Parameters
        ----------
        task_tpl : Union[dict, list]
            Decide what tasks are used.
            - dict : the task template, the prepared task is generated with `step`, `trunc_days` and `RollingGen`
            - list : when list, use the list of tasks directly
                     the list is supposed to be sorted according to the timeline
        step : int
            the rolling step
        trunc_days: int
            days to be truncated based on the test start
        rolling_ext_days: int
            sometimes users want to train meta models for a longer test period but with smaller rolling steps for more task samples.
            the total length of test periods will be `step + rolling_ext_days`

        exp_name : Union[str, InternalData]
            Decide what meta_info are used for prediction.
            - str: the name of the experiment to store the performance of data
            - InternalData: a prepared internal data
        segments: Union[Dict[Text, Tuple], float]
            the segments to divide data
            both left and right
            if segments is a float:
                the float represents the percentage of data for training
        hist_step_n: int
            length of historical steps for the meta information
        task_mode : str
            Please refer to the docs of MetaTask
        """
        super().__init__(segments=segments)
        if isinstance(exp_name, InternalData):
            self.internal_data = exp_name
        else:
            self.internal_data = InternalData(task_tpl, step=step, exp_name=exp_name)
            self.internal_data.setup()
        self.task_tpl = deepcopy(task_tpl)  # FIXME: if the handler is shared, how to avoid the explosion of the memory.
        self.trunc_days = trunc_days
        self.hist_step_n = hist_step_n
        self.step = step

        if isinstance(task_tpl, dict):
            rg = RollingGen(
                step=step, trunc_days=trunc_days, task_copy_func=deepcopy_basic_type
            )  # NOTE: trunc_days is very important !!!!
            task_iter = rg(task_tpl)
            if rolling_ext_days > 0:
                self.ta = TimeAdjuster(future=True)
                for t in task_iter:
                    t["dataset"]["kwargs"]["segments"]["test"] = self.ta.shift(
                        t["dataset"]["kwargs"]["segments"]["test"], step=rolling_ext_days, rtype=RollingGen.ROLL_EX
                    )
            if task_mode == MetaTask.PROC_MODE_FULL:
                # Only pre initializing the task when full task is req
                # initializing handler and share it.
                init_task_handler(task_tpl)
        else:
            assert isinstance(task_tpl, list)
            task_iter = task_tpl

        self.task_list = []
        self.meta_task_l = []
        logger = get_module_logger("MetaDatasetDS")
        logger.info(f"Example task for training meta model: {task_iter[0]}")
        for t in tqdm(task_iter, desc="creating meta tasks"):
            try:
                self.meta_task_l.append(
                    MetaTaskDS(t, meta_info=self._prepare_meta_ipt(t), mode=task_mode, fill_method=fill_method)
                )
                self.task_list.append(t)
            except ValueError as e:
                logger.warning(f"ValueError: {e}")
        assert len(self.meta_task_l) > 0, "No meta tasks found. Please check the data and setting"

    def _prepare_meta_ipt(self, task):
        ic_df = self.internal_data.data_ic_df

        segs = task["dataset"]["kwargs"]["segments"]
        end = max([segs[k][1] for k in ("train", "valid") if k in segs])
        ic_df_avail = ic_df.loc[:end, pd.IndexSlice[:, :end]]

        # meta data set focus on the **information** instead of preprocess
        # 1) filter the future info
        def mask_future(s):
            """mask future information"""
            # from qlib.utils import get_date_by_shift
            start, end = s.name
            end = get_date_by_shift(trading_date=end, shift=self.trunc_days - 1, future=True)
            return s.mask((s.index >= start) & (s.index <= end))

        ic_df_avail = ic_df_avail.apply(mask_future)  # apply to each col

        # 2) filter the info with too long periods
        total_len = self.step * self.hist_step_n
        if ic_df_avail.shape[0] >= total_len:
            return ic_df_avail.iloc[-total_len:]
        else:
            raise ValueError("the history of distribution data is not long enough.")

    def _prepare_seg(self, segment: Text) -> List[MetaTask]:
        if isinstance(self.segments, float):
            train_task_n = int(len(self.meta_task_l) * self.segments)
            if segment == "train":
                return self.meta_task_l[:train_task_n]
            elif segment == "test":
                return self.meta_task_l[train_task_n:]
            else:
                raise NotImplementedError(f"This type of input is not supported")
        else:
            raise NotImplementedError(f"This type of input is not supported")
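The `time_belong` matrix built in `MetaTaskDS.__init__` can be hard to picture from the slicing code alone. The following is a minimal pure-Python sketch of the same idea, not code from this commit: `build_time_belong`, `sample_dates` and `segments` are hypothetical names, and plain lists stand in for the pandas index and NumPy array used in `dataset.py`.

```python
from datetime import date


def build_time_belong(sample_dates, segments):
    """Return a len(sample_dates) x len(segments) matrix of 0.0/1.0 flags.

    Entry [i][j] is 1.0 when sample_dates[i] lies inside segments[j]
    (both ends inclusive). Rows whose flags do not sum to exactly 1 get
    their last column set to 1.0, mirroring the fallback in MetaTaskDS
    that assigns the latest samples to the last data piece.
    """
    matrix = [[0.0] * len(segments) for _ in sample_dates]
    for j, (start, end) in enumerate(segments):
        for i, d in enumerate(sample_dates):
            if start <= d <= end:
                matrix[i][j] = 1.0
    for row in matrix:
        if sum(row) != 1:
            row[-1] = 1.0
    return matrix


# Hypothetical data: two monthly data pieces and three samples,
# the last of which falls after every segment.
segments = [
    (date(2021, 1, 1), date(2021, 1, 31)),
    (date(2021, 2, 1), date(2021, 2, 28)),
]
samples = [date(2021, 1, 15), date(2021, 2, 10), date(2021, 3, 5)]
belong = build_time_belong(samples, segments)
```

Multiplying such a matrix by a vector of per-segment weights yields one weight per sample, which appears to be how `time_belong @ weights` is used in `net.py`.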
182
qlib/contrib/meta/data_selection/model.py
Normal file
@@ -0,0 +1,182 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

from qlib.log import get_module_logger
import pandas as pd
import numpy as np
from qlib.model.meta.task import MetaTask
import torch
from torch import nn
from torch import optim
from tqdm.auto import tqdm
import collections
import copy
from typing import Union, List, Tuple, Dict

from ....data.dataset.weight import Reweighter
from ....model.meta.dataset import MetaTaskDataset
from ....model.meta.model import MetaModel, MetaTaskModel
from ....workflow import R

from .utils import ICLoss
from .dataset import MetaDatasetDS
from qlib.contrib.meta.data_selection.net import PredNet
from qlib.data.dataset.weight import Reweighter
from qlib.log import get_module_logger

logger = get_module_logger("data selection")


class TimeReweighter(Reweighter):
    def __init__(self, time_weight: pd.Series):
        self.time_weight = time_weight

    def reweight(self, data: Union[pd.DataFrame, pd.Series]):
        # TODO: handling TSDataSampler
        w_s = pd.Series(1.0, index=data.index)
        for k, w in self.time_weight.items():
            w_s.loc[slice(*k)] = w
        logger.info(f"Reweighting result: {w_s}")
        return w_s


class MetaModelDS(MetaTaskModel):
    """
    The meta-model for meta-learning-based data selection.
    """

    def __init__(
        self,
        step,
        hist_step_n,
        clip_method="tanh",
        clip_weight=2.0,
        criterion="ic_loss",
        lr=0.0001,
        max_epoch=100,
        seed=43,
    ):
        self.step = step
        self.hist_step_n = hist_step_n
        self.clip_method = clip_method
        self.clip_weight = clip_weight
        self.criterion = criterion
        self.lr = lr
        self.max_epoch = max_epoch
        self.fitted = False
        torch.manual_seed(seed)

    def run_epoch(self, phase, task_list, epoch, opt, loss_l, ignore_weight=False):
        if phase == "train":
            self.tn.train()
            torch.set_grad_enabled(True)
        else:
            self.tn.eval()
            torch.set_grad_enabled(False)
        running_loss = 0.0
        pred_y_all = []
        for task in tqdm(task_list, desc=f"{phase} Task", leave=False):
            meta_input = task.get_meta_input()
            pred, weights = self.tn(
                meta_input["X"],
                meta_input["y"],
                meta_input["time_perf"],
                meta_input["time_belong"],
                meta_input["X_test"],
                ignore_weight=ignore_weight,
            )
            if self.criterion == "mse":
                criterion = nn.MSELoss()
                loss = criterion(pred, meta_input["y_test"])
            elif self.criterion == "ic_loss":
                criterion = ICLoss()
                try:
                    loss = criterion(pred, meta_input["y_test"], meta_input["test_idx"], skip_size=50)
                except ValueError as e:
                    get_module_logger("MetaModelDS").warning(f"Exception `{e}` when calculating IC loss")
                    continue

            assert not np.isnan(loss.detach().item()), "NaN loss!"

            if phase == "train":
                opt.zero_grad()
                norm_loss = nn.MSELoss()
                loss.backward()
                opt.step()
            elif phase == "test":
                pass

            pred_y_all.append(
                pd.DataFrame(
                    {
                        "pred": pd.Series(pred.detach().cpu().numpy(), index=meta_input["test_idx"]),
                        "label": pd.Series(meta_input["y_test"].detach().cpu().numpy(), index=meta_input["test_idx"]),
                    }
                )
            )
            running_loss += loss.detach().item()
        running_loss = running_loss / len(task_list)
        loss_l.setdefault(phase, []).append(running_loss)

        pred_y_all = pd.concat(pred_y_all)
        ic = pred_y_all.groupby("datetime").apply(lambda df: df["pred"].corr(df["label"], method="spearman")).mean()

        R.log_metrics(**{f"loss/{phase}": running_loss, "step": epoch})
        R.log_metrics(**{f"ic/{phase}": ic, "step": epoch})

    def fit(self, meta_dataset: MetaDatasetDS):
        """
        The meta-learning-based data selection interacts directly with meta-dataset due to the closed-form proxy measurement.

        Parameters
        ----------
        meta_dataset : MetaDatasetDS
            The meta-model takes the meta-dataset for its training process.
        """

        if not self.fitted:
            for k in set(["lr", "step", "hist_step_n", "clip_method", "clip_weight", "criterion", "max_epoch"]):
                R.log_params(**{k: getattr(self, k)})

        # FIXME: get test tasks for just checking the performance
        phases = ["train", "test"]
        meta_tasks_l = meta_dataset.prepare_tasks(phases)

        if len(meta_tasks_l[1]):
            R.log_params(
                **dict(proxy_test_begin=meta_tasks_l[1][0].task["dataset"]["kwargs"]["segments"]["test"])
            )  # debug: record when the test phase starts

        self.tn = PredNet(
            step=self.step, hist_step_n=self.hist_step_n, clip_weight=self.clip_weight, clip_method=self.clip_method
        )

        opt = optim.Adam(self.tn.parameters(), lr=self.lr)

        # run once with no weight and once with the initial weights
        for phase, task_list in zip(phases, meta_tasks_l):
            self.run_epoch(f"{phase}_noweight", task_list, 0, opt, {}, ignore_weight=True)
            self.run_epoch(f"{phase}_init", task_list, 0, opt, {})

        # run training
        loss_l = {}
        for epoch in tqdm(range(self.max_epoch), desc="epoch"):
            for phase, task_list in zip(phases, meta_tasks_l):
                self.run_epoch(phase, task_list, epoch, opt, loss_l)
            R.save_objects(**{"model.pkl": self.tn})
        self.fitted = True

    def _prepare_task(self, task: MetaTask) -> dict:
        meta_ipt = task.get_meta_input()
        weights = self.tn.twm(meta_ipt["time_perf"])

        weight_s = pd.Series(weights.detach().cpu().numpy(), index=task.meta_info.columns)
        task = copy.copy(task.task)  # NOTE: this is a shallow copy.
        task["reweighter"] = TimeReweighter(weight_s)
        return task

    def inference(self, meta_dataset: MetaTaskDataset) -> List[dict]:
        res = []
        for mt in meta_dataset.prepare_tasks("test"):
            res.append(self._prepare_task(mt))
        return res
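`TimeReweighter.reweight` starts every sample at weight 1.0 and then overwrites the weight of each sample that falls inside a `(start, end)` segment. A minimal pure-Python sketch of that behaviour, under the assumption that dates are ISO-formatted strings (so string comparison matches chronological order), could look like the following. The name `reweight_by_time` and the sample data are hypothetical and are not part of this commit.

```python
def reweight_by_time(sample_dates, time_weight):
    """Assign one weight per sample from a {(start, end): weight} mapping.

    Samples outside every segment keep the default weight of 1.0; both
    segment ends are treated as inclusive, as with label-based `.loc`
    slicing in pandas.
    """
    weights = [1.0] * len(sample_dates)
    for (start, end), w in time_weight.items():
        for i, d in enumerate(sample_dates):
            if start <= d <= end:
                weights[i] = w
    return weights


# Hypothetical weights for two historical segments.
time_weight = {
    ("2021-01-01", "2021-01-31"): 0.5,
    ("2021-02-01", "2021-02-28"): 2.0,
}
dates = ["2021-01-10", "2021-02-15", "2021-03-01"]
sample_weights = reweight_by_time(dates, time_weight)
```

The resulting list plays the role of the `pd.Series` returned by `reweight`, which the models changed later in this diff pass on as per-sample training weights.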
68
qlib/contrib/meta/data_selection/net.py
Normal file
@@ -0,0 +1,68 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

import pandas as pd
import numpy as np
import torch
from torch import nn

from .utils import preds_to_weight_with_clamp, SingleMetaBase


class TimeWeightMeta(SingleMetaBase):
    def __init__(self, hist_step_n, clip_weight=None, clip_method="clamp"):
        # clip_method includes "tanh" or "clamp"
        super().__init__(hist_step_n, clip_weight, clip_method)
        self.linear = nn.Linear(hist_step_n, 1)
        self.k = nn.Parameter(torch.Tensor([8.0]))

    def forward(self, time_perf, time_belong=None, return_preds=False):
        hist_step_n = self.linear.in_features
        # NOTE: the reshape order is very important
        time_perf = time_perf.reshape(hist_step_n, time_perf.shape[0] // hist_step_n, *time_perf.shape[1:])
        time_perf = torch.mean(time_perf, dim=1, keepdim=False)

        preds = []
        for i in range(time_perf.shape[1]):
            preds.append(self.linear(time_perf[:, i]))
        preds = torch.cat(preds)
        preds = preds - torch.mean(preds)  # avoid using future information
        preds = preds * self.k
        if return_preds:
            if time_belong is None:
                return preds
            else:
                return time_belong @ preds
        else:
            weights = preds_to_weight_with_clamp(preds, self.clip_weight, self.clip_method)
            if time_belong is None:
                return weights
            else:
                return time_belong @ weights


class PredNet(nn.Module):
    def __init__(self, step, hist_step_n, clip_weight=None, clip_method="tanh"):
        super().__init__()
        self.step = step
        self.twm = TimeWeightMeta(hist_step_n=hist_step_n, clip_weight=clip_weight, clip_method=clip_method)
        self.init_paramters(hist_step_n)

    def get_sample_weights(self, X, time_perf, time_belong, ignore_weight=False):
        weights = torch.from_numpy(np.ones(X.shape[0])).float().to(X.device)
        if not ignore_weight:
            if time_perf is not None:
                weights_t = self.twm(time_perf, time_belong)
                weights = weights * weights_t
        return weights

    def forward(self, X, y, time_perf, time_belong, X_test, ignore_weight=False):
        """Please refer to the docs of MetaTaskDS for the description of the variables"""
        weights = self.get_sample_weights(X, time_perf, time_belong, ignore_weight=ignore_weight)
        X_w = X.T * weights.view(1, -1)
        theta = torch.inverse(X_w @ X) @ X_w @ y
        return X_test @ theta, weights

    def init_paramters(self, hist_step_n):
        self.twm.linear.weight.data = 1.0 / hist_step_n + self.twm.linear.weight.data * 0.01
        self.twm.linear.bias.data.fill_(0.0)
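`PredNet.forward` fits the proxy model in closed form with weighted least squares, i.e. `theta = (X^T W X)^{-1} X^T W y` with `W` the diagonal matrix of sample weights. The sketch below works through the single-feature special case in pure Python, where the matrix inverse reduces to a scalar division. It is an illustration only: `weighted_ls_slope` is a hypothetical helper, and the real code operates on full feature matrices with `torch.inverse`.

```python
def weighted_ls_slope(x, y, w):
    """Closed-form weighted least squares for one feature and no intercept.

    Minimises sum_i w[i] * (y[i] - theta * x[i]) ** 2, whose solution is
    theta = sum(w * x * y) / sum(w * x * x).
    """
    num = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    den = sum(wi * xi * xi for xi, wi in zip(x, w))
    if den == 0.0:
        raise ValueError("weighted design is singular (all weights or features are zero)")
    return num / den


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]

# With uniform weights every sample contributes equally.
theta_uniform = weighted_ls_slope(x, y, [1.0, 1.0, 1.0])

# Setting the last weight to zero removes that sample from the fit,
# so the slope is determined by the first two points alone.
theta_masked = weighted_ls_slope(x, y, [1.0, 1.0, 0.0])
```

Because the solution is a differentiable function of the weights, a loss computed on the test predictions can be backpropagated to whatever produced the weights, which seems to be what lets `MetaModelDS` train `TimeWeightMeta` through this step.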
98
qlib/contrib/meta/data_selection/utils.py
Normal file
@@ -0,0 +1,98 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

import pandas as pd
import numpy as np
import torch
from torch import nn
from qlib.contrib.torch import data_to_tensor


class ICLoss(nn.Module):
    def forward(self, pred, y, idx, skip_size=50):
        """forward.

        :param pred:
        :param y:
        :param idx: Assume the level of the idx is (date, inst), and it is sorted
        """
        prev = None
        diff_point = []
        for i, (date, inst) in enumerate(idx):
            if date != prev:
                diff_point.append(i)
            prev = date
        diff_point.append(None)

        ic_all = 0.0
        skip_n = 0
        for start_i, end_i in zip(diff_point, diff_point[1:]):
            pred_focus = pred[start_i:end_i]  # TODO: just for fake
            if pred_focus.shape[0] < skip_size:
                # skip days that have very few stocks.
                skip_n += 1
                continue
            y_focus = y[start_i:end_i]
            ic_day = torch.dot(
                (pred_focus - pred_focus.mean()) / np.sqrt(pred_focus.shape[0]) / pred_focus.std(),
                (y_focus - y_focus.mean()) / np.sqrt(y_focus.shape[0]) / y_focus.std(),
            )
            ic_all += ic_day
        if len(diff_point) - 1 - skip_n <= 0:
            raise ValueError("Not enough data for calculating IC")
        ic_mean = ic_all / (len(diff_point) - 1 - skip_n)
        return -ic_mean  # ic loss


def preds_to_weight_with_clamp(preds, clip_weight=None, clip_method="tanh"):
    """
    Clip the weights.

    Parameters
    ----------
    clip_weight: float
        The clip threshold.
    clip_method: str
        The clip method. Currently available: "clamp", "tanh", and "sigmoid".
    """
    if clip_weight is not None:
        if clip_method == "clamp":
            weights = torch.exp(preds)
            weights = weights.clamp(1.0 / clip_weight, clip_weight)
        elif clip_method == "tanh":
            weights = torch.exp(torch.tanh(preds) * np.log(clip_weight))
        elif clip_method == "sigmoid":
            # intuitively assume its sum is 1
            if clip_weight == 0.0:
                weights = torch.ones_like(preds)
            else:
                sm = nn.Sigmoid()
                weights = sm(preds) * clip_weight  # TODO: The clip_weight is useless here.
                weights = weights / torch.sum(weights) * weights.numel()
        else:
            raise ValueError("Unknown clip_method")
    else:
        weights = torch.exp(preds)
    return weights


class SingleMetaBase(nn.Module):
    def __init__(self, hist_n, clip_weight=None, clip_method="clamp"):
        # method can be tanh or clamp
        super().__init__()
        self.clip_weight = clip_weight
        if clip_method in ["tanh", "clamp"]:
            if self.clip_weight is not None and self.clip_weight < 1.0:
                self.clip_weight = 1 / self.clip_weight
        self.clip_method = clip_method

    def is_enabled(self):
        if self.clip_weight is None:
            return True
        if self.clip_method == "sigmoid":
            if self.clip_weight > 0.0:
                return True
        else:
            if self.clip_weight > 1.0:
                return True
        return False
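The "clamp" and "tanh" branches of `preds_to_weight_with_clamp` both keep a weight inside `[1 / clip_weight, clip_weight]`, but they get there differently: "clamp" cuts `exp(pred)` off hard at the bounds, while "tanh" squashes the prediction smoothly before exponentiating. The scalar sketch below uses only the standard `math` module to make that difference visible. `pred_to_weight` is a hypothetical stand-in for the tensor version above, and the "sigmoid" branch is deliberately left out.

```python
import math


def pred_to_weight(pred, clip_weight=None, clip_method="tanh"):
    """Map one raw prediction to a positive sample weight."""
    if clip_weight is None:
        # No clipping requested: plain exponential.
        return math.exp(pred)
    if clip_method == "clamp":
        # Hard cut-off at the two bounds.
        return min(max(math.exp(pred), 1.0 / clip_weight), clip_weight)
    if clip_method == "tanh":
        # tanh maps pred into (-1, 1), so the exponent stays within
        # (-log(clip_weight), log(clip_weight)).
        return math.exp(math.tanh(pred) * math.log(clip_weight))
    raise ValueError("Unknown clip_method")


clip = 2.0
preds = [-10.0, -1.0, 0.0, 1.0, 10.0]
clamped = [pred_to_weight(p, clip, "clamp") for p in preds]
squashed = [pred_to_weight(p, clip, "tanh") for p in preds]
```

With "clamp", any prediction whose exponential falls outside the bounds maps to a flat region where the gradient with respect to the prediction is zero, whereas "tanh" remains strictly increasing everywhere. That may be one reason `MetaModelDS` defaults to `clip_method="tanh"`, though the commit itself does not state the motivation.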
@@ -11,6 +11,7 @@ from ...model.base import Model
|
||||
from ...data.dataset import DatasetH
|
||||
from ...data.dataset.handler import DataHandlerLP
|
||||
from ...model.interpret.base import FeatureInt
|
||||
from ...data.dataset.weight import Reweighter
|
||||
|
||||
|
||||
class CatBoostModel(Model, FeatureInt):
|
||||
@@ -31,6 +32,7 @@ class CatBoostModel(Model, FeatureInt):
|
||||
early_stopping_rounds=50,
|
||||
verbose_eval=20,
|
||||
evals_result=dict(),
|
||||
reweighter=None,
|
||||
**kwargs
|
||||
):
|
||||
df_train, df_valid = dataset.prepare(
|
||||
@@ -49,8 +51,17 @@ class CatBoostModel(Model, FeatureInt):
|
||||
else:
|
||||
raise ValueError("CatBoost doesn't support multi-label training")
|
||||
|
||||
train_pool = Pool(data=x_train, label=y_train_1d)
|
||||
valid_pool = Pool(data=x_valid, label=y_valid_1d)
|
||||
if reweighter is None:
|
||||
w_train = None
|
||||
w_valid = None
|
||||
elif isinstance(reweighter, Reweighter):
|
||||
w_train = reweighter.reweight(df_train).values
|
||||
w_valid = reweighter.reweight(df_valid).values
|
||||
else:
|
||||
raise ValueError("Unsupported reweighter type.")
|
||||
|
||||
train_pool = Pool(data=x_train, label=y_train_1d, weight=w_train)
|
||||
valid_pool = Pool(data=x_valid, label=y_valid_1d, weight=w_valid)
|
||||
|
||||
# Initialize the catboost model
|
||||
self._params["iterations"] = num_boost_round
|
||||
|
||||
@@ -4,59 +4,73 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import lightgbm as lgb
|
||||
from typing import Text, Union
|
||||
from typing import List, Text, Tuple, Union
|
||||
from ...model.base import ModelFT
|
||||
from ...data.dataset import DatasetH
|
||||
from ...data.dataset.handler import DataHandlerLP
|
||||
from ...model.interpret.base import LightGBMFInt
|
||||
from ...data.dataset.weight import Reweighter
|
||||
|
||||
|
||||
class LGBModel(ModelFT, LightGBMFInt):
|
||||
"""LightGBM Model"""
|
||||
|
||||
def __init__(self, loss="mse", early_stopping_rounds=50, **kwargs):
|
||||
def __init__(self, loss="mse", early_stopping_rounds=50, num_boost_round=1000, **kwargs):
|
||||
if loss not in {"mse", "binary"}:
|
||||
raise NotImplementedError
|
||||
self.params = {"objective": loss, "verbosity": -1}
|
||||
self.params.update(kwargs)
|
||||
self.early_stopping_rounds = early_stopping_rounds
|
||||
self.num_boost_round = num_boost_round
|
||||
self.model = None
|
||||
|
||||
def _prepare_data(self, dataset: DatasetH):
|
||||
df_train, df_valid = dataset.prepare(
|
||||
["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
|
||||
)
|
||||
if df_train.empty or df_valid.empty:
|
||||
raise ValueError("Empty data from dataset, please check your dataset config.")
|
||||
x_train, y_train = df_train["feature"], df_train["label"]
|
||||
x_valid, y_valid = df_valid["feature"], df_valid["label"]
|
||||
def _prepare_data(self, dataset: DatasetH, reweighter=None) -> List[Tuple[lgb.Dataset, str]]:
|
||||
"""
|
||||
The motivation of current version is to make validation optional
|
||||
- train segment is necessary;
|
||||
"""
|
||||
ds_l = []
|
||||
assert "train" in dataset.segments
|
||||
for key in ["train", "valid"]:
|
||||
if key in dataset.segments:
|
||||
df = dataset.prepare(key, col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
|
||||
if df.empty:
|
||||
raise ValueError("Empty data from dataset, please check your dataset config.")
|
||||
x, y = df["feature"], df["label"]
|
||||
|
||||
# Lightgbm need 1D array as its label
|
||||
if y_train.values.ndim == 2 and y_train.values.shape[1] == 1:
|
||||
y_train, y_valid = np.squeeze(y_train.values), np.squeeze(y_valid.values)
|
||||
else:
|
||||
raise ValueError("LightGBM doesn't support multi-label training")
|
||||
# Lightgbm need 1D array as its label
|
||||
if y.values.ndim == 2 and y.values.shape[1] == 1:
|
||||
y = np.squeeze(y.values)
|
||||
else:
|
||||
raise ValueError("LightGBM doesn't support multi-label training")
|
||||
|
||||
dtrain = lgb.Dataset(x_train, label=y_train)
|
||||
dvalid = lgb.Dataset(x_valid, label=y_valid)
|
||||
return dtrain, dvalid
|
||||
if reweighter is None:
|
||||
w = None
|
||||
elif isinstance(reweighter, Reweighter):
|
||||
w = reweighter.reweight(df)
|
||||
else:
|
||||
raise ValueError("Unsupported reweighter type.")
|
||||
ds_l.append((lgb.Dataset(x.values, label=y, weight=w), key))
|
||||
return ds_l
|
||||
|
||||
def fit(
|
||||
self,
|
||||
dataset: DatasetH,
|
||||
num_boost_round=1000,
|
||||
num_boost_round=None,
|
||||
early_stopping_rounds=None,
|
||||
verbose_eval=20,
|
||||
evals_result=dict(),
|
||||
reweighter=None,
|
||||
**kwargs
|
||||
):
|
||||
dtrain, dvalid = self._prepare_data(dataset)
|
||||
ds_l = self._prepare_data(dataset, reweighter)
|
||||
ds, names = list(zip(*ds_l))
|
||||
self.model = lgb.train(
|
||||
self.params,
|
||||
dtrain,
|
||||
num_boost_round=num_boost_round,
|
||||
valid_sets=[dtrain, dvalid],
|
||||
valid_names=["train", "valid"],
|
||||
ds[0], # training dataset
|
||||
num_boost_round=self.num_boost_round if num_boost_round is None else num_boost_round,
|
||||
valid_sets=ds,
|
||||
valid_names=names,
|
||||
early_stopping_rounds=(
|
||||
self.early_stopping_rounds if early_stopping_rounds is None else early_stopping_rounds
|
||||
),
|
||||
@@ -64,8 +78,8 @@ class LGBModel(ModelFT, LightGBMFInt):
|
||||
evals_result=evals_result,
|
||||
**kwargs
|
||||
)
|
||||
evals_result["train"] = list(evals_result["train"].values())[0]
|
||||
evals_result["valid"] = list(evals_result["valid"].values())[0]
|
||||
for k in names:
|
||||
evals_result[k] = list(evals_result[k].values())[0]
|
||||
|
||||
def predict(self, dataset: DatasetH, segment: Union[Text, slice] = "test"):
|
||||
if self.model is None:
|
||||
@@ -73,7 +87,7 @@ class LGBModel(ModelFT, LightGBMFInt):
|
||||
x_test = dataset.prepare(segment, col_set="feature", data_key=DataHandlerLP.DK_I)
|
||||
return pd.Series(self.model.predict(x_test.values), index=x_test.index)
|
||||
|
||||
def finetune(self, dataset: DatasetH, num_boost_round=10, verbose_eval=20):
|
||||
def finetune(self, dataset: DatasetH, num_boost_round=10, verbose_eval=20, reweighter=None):
|
||||
"""
|
||||
finetune model
|
||||
|
||||
@@ -87,7 +101,7 @@ class LGBModel(ModelFT, LightGBMFInt):
|
||||
verbose level
|
||||
"""
|
||||
# Based on existing model and finetune by train more rounds
|
||||
dtrain, _ = self._prepare_data(dataset)
|
||||
dtrain, _ = self._prepare_data(dataset, reweighter)
|
||||
if dtrain.empty:
|
||||
raise ValueError("Empty data from dataset, please check your dataset config.")
|
||||
self.model = lgb.train(
|
||||
|
||||
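The new `fit` path above unpacks a list of `(dataset, name)` pairs with `zip(*ds_l)` and then flattens every entry of `evals_result`. The following is a minimal sketch of those two steps in plain Python. Strings stand in for the `lgb.Dataset` objects, and the nested shape of `evals_result`, the metric name `l2`, and the numbers are assumptions made up for illustration.

```python
# Stand-ins for the (lgb.Dataset, name) pairs returned by _prepare_data.
# Plain strings are used so the sketch runs without LightGBM installed.
ds_l = [("train_dataset", "train"), ("valid_dataset", "valid")]

# zip(*pairs) transposes the list of pairs into two parallel tuples.
ds, names = list(zip(*ds_l))

# Assumed shape after training: {dataset name: {metric name: [value per round]}}.
evals_result = {
    "train": {"l2": [0.9, 0.7, 0.6]},
    "valid": {"l2": [1.0, 0.8, 0.75]},
}

# Keep only the first metric's history for each named dataset,
# mirroring `list(evals_result[k].values())[0]` in the hunk above.
for k in names:
    evals_result[k] = list(evals_result[k].values())[0]
```

Because the loop runs over `names`, it keeps working if `_prepare_data` returns only a training set or more than two datasets, which the hard-coded `"train"` / `"valid"` keys did not allow.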
@@ -56,7 +56,7 @@ class HFLGBModel(ModelFT, LightGBMFInt):
|
||||
|
||||
def hf_signal_test(self, dataset: DatasetH, threhold=0.2):
|
||||
"""
|
||||
Test the sigal in high frequency test set
|
||||
Test the signal in high frequency test set
|
||||
"""
|
||||
if self.model == None:
|
||||
raise ValueError("Model hasn't been trained yet")
|
||||
@@ -86,7 +86,7 @@ class HFLGBModel(ModelFT, LightGBMFInt):
|
||||
raise ValueError("Empty data from dataset, please check your dataset config.")
|
||||
|
||||
x_train, y_train = df_train["feature"], df_train["label"]
|
||||
x_valid, y_valid = df_train["feature"], df_valid["label"]
|
||||
x_valid, y_valid = df_valid["feature"], df_valid["label"]
|
||||
if y_train.values.ndim == 2 and y_train.values.shape[1] == 1:
|
||||
l_name = df_train["label"].columns[0]
|
||||
# Convert label into alpha
|
||||
|
||||
@@ -4,6 +4,7 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from typing import Text, Union
|
||||
from qlib.data.dataset.weight import Reweighter
|
||||
from scipy.optimize import nnls
|
||||
from sklearn.linear_model import LinearRegression, Ridge, Lasso
|
||||
|
||||
@@ -49,33 +50,40 @@ class LinearModel(Model):
|
||||
|
||||
self.coef_ = None
|
||||
|
||||
def fit(self, dataset: DatasetH):
|
||||
def fit(self, dataset: DatasetH, reweighter: Reweighter = None):
|
||||
df_train = dataset.prepare("train", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
|
||||
if df_train.empty:
|
||||
raise ValueError("Empty data from dataset, please check your dataset config.")
|
||||
if reweighter is not None:
|
||||
w: pd.Series = reweighter.reweight(df_train)
|
||||
w = w.values
|
||||
else:
|
||||
w = None
|
||||
X, y = df_train["feature"].values, np.squeeze(df_train["label"].values)
|
||||
|
||||
if self.estimator in [self.OLS, self.RIDGE, self.LASSO]:
|
||||
self._fit(X, y)
|
||||
self._fit(X, y, w)
|
||||
elif self.estimator == self.NNLS:
|
||||
self._fit_nnls(X, y)
|
||||
self._fit_nnls(X, y, w)
|
||||
else:
|
||||
raise ValueError(f"unknown estimator `{self.estimator}`")
|
||||
|
||||
return self
|
||||
|
||||
def _fit(self, X, y):
|
||||
def _fit(self, X, y, w):
|
||||
if self.estimator == self.OLS:
|
||||
model = LinearRegression(fit_intercept=self.fit_intercept, copy_X=False)
|
||||
else:
|
||||
model = {self.RIDGE: Ridge, self.LASSO: Lasso}[self.estimator](
|
||||
alpha=self.alpha, fit_intercept=self.fit_intercept, copy_X=False
|
||||
)
|
||||
model.fit(X, y)
|
||||
model.fit(X, y, sample_weight=w)
|
||||
self.coef_ = model.coef_
|
||||
self.intercept_ = model.intercept_
|
||||
|
||||
def _fit_nnls(self, X, y):
|
||||
def _fit_nnls(self, X, y, w=None):
|
||||
if w is not None:
|
||||
raise NotImplementedError("TODO: support nnls with weight") # TODO
|
||||
if self.fit_intercept:
|
||||
X = np.c_[X, np.ones(len(X))] # NOTE: mem copy
|
||||
coef = nnls(X, y)[0]
|
||||
|
||||
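The hunk above forwards `sample_weight=w` to the scikit-learn estimators and leaves weighted NNLS as a TODO. As background, the sketch below shows the identity that weighted least squares equals ordinary least squares on rows scaled by `sqrt(w)`, for a single feature with no intercept. It is only an illustration of that identity, not something the diff implements, and the helper names and data are hypothetical.

```python
import math


def ols_slope(x, y):
    """Least-squares slope for y ~ beta * x (single feature, no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def weighted_slope(x, y, w):
    """Slope that minimises sum_i w_i * (y_i - beta * x_i) ** 2."""
    num = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    den = sum(wi * xi * xi for xi, wi in zip(x, w))
    return num / den


def sqrt_scaled(values, w):
    """Multiply each value by the square root of its sample weight."""
    return [math.sqrt(wi) * v for v, wi in zip(values, w)]


x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
w = [1.0, 1.0, 4.0, 0.25]

direct = weighted_slope(x, y, w)
via_scaling = ols_slope(sqrt_scaled(x, w), sqrt_scaled(y, w))
```

Both routes give the same slope up to floating-point rounding. The same row scaling is one common way to feed weights to a solver that has no weight argument, which may be relevant to the NNLS TODO, though that is a suggestion rather than the authors' stated plan.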
@@ -554,7 +554,7 @@ class AdaRNN(nn.Module):
|
||||
return fc_out
|
||||
|
||||
|
||||
class TransferLoss(object):
|
||||
class TransferLoss:
|
||||
def __init__(self, loss_type="cosine", input_dim=512):
|
||||
"""
|
||||
Supported loss_type: mmd(mmd_lin), mmd_rbf, coral, cosine, kl, js, mine, adv
|
||||
|
||||
@@ -22,6 +22,8 @@ from .pytorch_utils import count_parameters
|
||||
from ...model.base import Model
|
||||
from ...data.dataset import DatasetH, TSDatasetH
|
||||
from ...data.dataset.handler import DataHandlerLP
|
||||
from ...model.utils import ConcatDataset
|
||||
from ...data.dataset.weight import Reweighter
|
||||
|
||||
|
||||
class ALSTM(Model):
|
||||
@@ -139,15 +141,18 @@ class ALSTM(Model):
|
||||
def use_gpu(self):
|
||||
return self.device != torch.device("cpu")
|
||||
|
||||
def mse(self, pred, label):
|
||||
loss = (pred - label) ** 2
|
||||
def mse(self, pred, label, weight):
|
||||
loss = weight * (pred - label) ** 2
|
||||
return torch.mean(loss)
|
||||
|
||||
def loss_fn(self, pred, label):
|
||||
def loss_fn(self, pred, label, weight=None):
|
||||
mask = ~torch.isnan(label)
|
||||
|
||||
if weight is None:
|
||||
weight = torch.ones_like(label)
|
||||
|
||||
if self.loss == "mse":
|
||||
return self.mse(pred[mask], label[mask])
|
||||
return self.mse(pred[mask], label[mask], weight[mask])
|
||||
|
||||
raise ValueError("unknown loss `%s`" % self.loss)
|
||||
|
||||
@@ -164,12 +169,12 @@ class ALSTM(Model):
|
||||
|
||||
self.ALSTM_model.train()
|
||||
|
||||
for data in data_loader:
|
||||
for (data, weight) in data_loader:
|
||||
feature = data[:, :, 0:-1].to(self.device)
|
||||
label = data[:, -1, -1].to(self.device)
|
||||
|
||||
pred = self.ALSTM_model(feature.float())
|
||||
loss = self.loss_fn(pred, label)
|
||||
loss = self.loss_fn(pred, label, weight.to(self.device))
|
||||
|
||||
self.train_optimizer.zero_grad()
|
||||
loss.backward()
|
||||
@@ -183,7 +188,7 @@ class ALSTM(Model):
|
||||
scores = []
|
||||
losses = []
|
||||
|
||||
for data in data_loader:
|
||||
for (data, weight) in data_loader:
|
||||
|
||||
feature = data[:, :, 0:-1].to(self.device)
|
||||
# feature[torch.isnan(feature)] = 0
|
||||
@@ -191,7 +196,7 @@ class ALSTM(Model):
|
||||
|
||||
with torch.no_grad():
|
||||
pred = self.ALSTM_model(feature.float())
|
||||
loss = self.loss_fn(pred, label)
|
||||
loss = self.loss_fn(pred, label, weight.to(self.device))
|
||||
losses.append(loss.item())
|
||||
|
||||
score = self.metric_fn(pred, label)
|
||||
@@ -204,6 +209,7 @@ class ALSTM(Model):
|
||||
dataset,
|
||||
evals_result=dict(),
|
||||
save_path=None,
|
||||
reweighter=None,
|
||||
):
|
||||
dl_train = dataset.prepare("train", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
|
||||
dl_valid = dataset.prepare("valid", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
|
||||
@@ -213,11 +219,28 @@ class ALSTM(Model):
|
||||
dl_train.config(fillna_type="ffill+bfill") # process nan brought by dataloader
|
||||
dl_valid.config(fillna_type="ffill+bfill") # process nan brought by dataloader
|
||||
|
||||
if reweighter is None:
|
||||
wl_train = np.ones(len(dl_train))
|
||||
wl_valid = np.ones(len(dl_valid))
|
||||
elif isinstance(reweighter, Reweighter):
|
||||
wl_train = reweighter.reweight(dl_train)
|
||||
wl_valid = reweighter.reweight(dl_valid)
|
||||
else:
|
||||
raise ValueError("Unsupported reweighter type.")
|
||||
|
||||
train_loader = DataLoader(
|
||||
dl_train, batch_size=self.batch_size, shuffle=True, num_workers=self.n_jobs, drop_last=True
|
||||
ConcatDataset(dl_train, wl_train),
|
||||
batch_size=self.batch_size,
|
||||
shuffle=True,
|
||||
num_workers=self.n_jobs,
|
||||
drop_last=True,
|
||||
)
|
||||
valid_loader = DataLoader(
|
||||
dl_valid, batch_size=self.batch_size, shuffle=False, num_workers=self.n_jobs, drop_last=True
|
||||
ConcatDataset(dl_valid, wl_valid),
|
||||
batch_size=self.batch_size,
|
||||
shuffle=False,
|
||||
num_workers=self.n_jobs,
|
||||
drop_last=True,
|
||||
)
|
||||
|
||||
save_path = get_or_create_path(save_path)
|
||||
|
||||
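The `loss_fn` change above masks out NaN labels and then averages `weight * (pred - label) ** 2`. The sketch below reproduces that arithmetic in plain Python so the effect of the weights is easy to check by hand. The function name and the sample numbers are hypothetical.

```python
import math


def weighted_masked_mse(pred, label, weight=None):
    """Mean of weight * squared error over samples whose label is not NaN."""
    if weight is None:
        # Same default as the diff: every sample counts equally.
        weight = [1.0] * len(label)
    kept = [(p, l, w) for p, l, w in zip(pred, label, weight) if not math.isnan(l)]
    if not kept:
        # The mean of an empty selection is undefined; return NaN here.
        return float("nan")
    # Note: the divisor is the number of kept samples, not the sum of weights.
    return sum(w * (p - l) ** 2 for p, l, w in kept) / len(kept)


pred = [1.0, 2.0, 3.0]
label = [1.5, float("nan"), 2.0]
weight = [2.0, 5.0, 1.0]

unweighted = weighted_masked_mse(pred, label)        # (0.25 + 1.0) / 2
weighted = weighted_masked_mse(pred, label, weight)  # (2 * 0.25 + 1 * 1.0) / 2
```

Since the mean divides by the sample count, multiplying all weights by a constant scales the loss by the same constant. A reweighter whose weights do not average to roughly one therefore changes the effective learning rate.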
@@ -260,7 +260,7 @@ class GATs(Model):
|
||||
|
||||
if self.model_path is not None:
|
||||
self.logger.info("Loading pretrained model...")
|
||||
pretrained_model.load_state_dict(torch.load(self.model_path))
|
||||
pretrained_model.load_state_dict(torch.load(self.model_path, map_location=self.device))
|
||||
|
||||
model_dict = self.GAT_model.state_dict()
|
||||
pretrained_dict = {k: v for k, v in pretrained_model.state_dict().items() if k in model_dict}
|
||||
|
||||
@@ -276,7 +276,7 @@ class GATs(Model):
|
||||
|
||||
if self.model_path is not None:
|
||||
self.logger.info("Loading pretrained model...")
|
||||
pretrained_model.load_state_dict(torch.load(self.model_path))
|
||||
pretrained_model.load_state_dict(torch.load(self.model_path, map_location=self.device))
|
||||
|
||||
model_dict = self.GAT_model.state_dict()
|
||||
pretrained_dict = {k: v for k, v in pretrained_model.state_dict().items() if k in model_dict}
|
||||
|
||||
@@ -21,6 +21,8 @@ from .pytorch_utils import count_parameters
|
||||
from ...model.base import Model
|
||||
from ...data.dataset import DatasetH, TSDatasetH
|
||||
from ...data.dataset.handler import DataHandlerLP
|
||||
from ...model.utils import ConcatDataset
|
||||
from ...data.dataset.weight import Reweighter
|
||||
|
||||
|
||||
class GRU(Model):
|
||||
@@ -138,15 +140,18 @@ class GRU(Model):
|
||||
def use_gpu(self):
|
||||
return self.device != torch.device("cpu")
|
||||
|
||||
def mse(self, pred, label):
|
||||
loss = (pred - label) ** 2
|
||||
def mse(self, pred, label, weight):
|
||||
loss = weight * (pred - label) ** 2
|
||||
return torch.mean(loss)
|
||||
|
||||
def loss_fn(self, pred, label):
|
||||
def loss_fn(self, pred, label, weight=None):
|
||||
mask = ~torch.isnan(label)
|
||||
|
||||
if weight is None:
|
||||
weight = torch.ones_like(label)
|
||||
|
||||
if self.loss == "mse":
|
||||
return self.mse(pred[mask], label[mask])
|
||||
return self.mse(pred[mask], label[mask], weight[mask])
|
||||
|
||||
raise ValueError("unknown loss `%s`" % self.loss)
|
||||
|
||||
@@ -163,12 +168,12 @@ class GRU(Model):
|
||||
|
||||
self.GRU_model.train()
|
||||
|
||||
for data in data_loader:
|
||||
for (data, weight) in data_loader:
|
||||
feature = data[:, :, 0:-1].to(self.device)
|
||||
label = data[:, -1, -1].to(self.device)
|
||||
|
||||
pred = self.GRU_model(feature.float())
|
||||
loss = self.loss_fn(pred, label)
|
||||
loss = self.loss_fn(pred, label, weight.to(self.device))
|
||||
|
||||
self.train_optimizer.zero_grad()
|
||||
loss.backward()
|
||||
@@ -182,7 +187,7 @@ class GRU(Model):
|
||||
scores = []
|
||||
losses = []
|
||||
|
||||
for data in data_loader:
|
||||
for (data, weight) in data_loader:
|
||||
|
||||
feature = data[:, :, 0:-1].to(self.device)
|
||||
# feature[torch.isnan(feature)] = 0
|
||||
@@ -190,7 +195,7 @@ class GRU(Model):
|
||||
|
||||
with torch.no_grad():
|
||||
pred = self.GRU_model(feature.float())
|
||||
loss = self.loss_fn(pred, label)
|
||||
loss = self.loss_fn(pred, label, weight.to(self.device))
|
||||
losses.append(loss.item())
|
||||
|
||||
score = self.metric_fn(pred, label)
|
||||
@@ -203,6 +208,7 @@ class GRU(Model):
|
||||
dataset,
|
||||
evals_result=dict(),
|
||||
save_path=None,
|
||||
reweighter=None,
|
||||
):
|
||||
dl_train = dataset.prepare("train", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
|
||||
dl_valid = dataset.prepare("valid", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
|
||||
@@ -212,11 +218,28 @@ class GRU(Model):
|
||||
dl_train.config(fillna_type="ffill+bfill") # process nan brought by dataloader
|
||||
dl_valid.config(fillna_type="ffill+bfill") # process nan brought by dataloader
|
||||
|
||||
if reweighter is None:
|
||||
wl_train = np.ones(len(dl_train))
|
||||
wl_valid = np.ones(len(dl_valid))
|
||||
elif isinstance(reweighter, Reweighter):
|
||||
wl_train = reweighter.reweight(dl_train)
|
||||
wl_valid = reweighter.reweight(dl_valid)
|
||||
else:
|
||||
raise ValueError("Unsupported reweighter type.")
|
||||
|
||||
train_loader = DataLoader(
|
||||
dl_train, batch_size=self.batch_size, shuffle=True, num_workers=self.n_jobs, drop_last=True
|
||||
ConcatDataset(dl_train, wl_train),
|
||||
batch_size=self.batch_size,
|
||||
shuffle=True,
|
||||
num_workers=self.n_jobs,
|
||||
drop_last=True,
|
||||
)
|
||||
valid_loader = DataLoader(
|
||||
dl_valid, batch_size=self.batch_size, shuffle=False, num_workers=self.n_jobs, drop_last=True
|
||||
ConcatDataset(dl_valid, wl_valid),
|
||||
batch_size=self.batch_size,
|
||||
shuffle=False,
|
||||
num_workers=self.n_jobs,
|
||||
drop_last=True,
|
||||
)
|
||||
|
||||
save_path = get_or_create_path(save_path)
|
||||
|
||||
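The loaders above iterate over `ConcatDataset(dl_train, wl_train)` and unpack each batch as `(data, weight)`. The sketch below is a hypothetical stand-in, named `PairedDataset` here and not qlib's actual class. It assumes the wrapper returns the i-th item of every wrapped sequence as a tuple and reports the shortest wrapped length.

```python
class PairedDataset:
    """Index several equally ordered sequences together (illustrative only)."""

    def __init__(self, *datasets):
        if not datasets:
            raise ValueError("at least one dataset is required")
        self.datasets = datasets

    def __len__(self):
        # Use the shortest length so every index is valid for all members.
        return min(len(d) for d in self.datasets)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # One tuple per sample: (item from dataset 0, item from dataset 1, ...).
        return tuple(d[i] for d in self.datasets)


samples = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
weights = [1.0, 0.5, 2.0]

paired = PairedDataset(samples, weights)
data, weight = paired[1]
```

Pairing by index keeps each weight attached to its sample even when the loader shuffles, which is why the weights travel through the dataset rather than being passed separately.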
@@ -20,6 +20,8 @@ from torch.utils.data import DataLoader
|
||||
from ...model.base import Model
|
||||
from ...data.dataset import DatasetH, TSDatasetH
|
||||
from ...data.dataset.handler import DataHandlerLP
|
||||
from ...model.utils import ConcatDataset
|
||||
from ...data.dataset.weight import Reweighter
|
||||
|
||||
|
||||
class LSTM(Model):
|
||||
@@ -134,15 +136,18 @@ class LSTM(Model):
|
||||
def use_gpu(self):
|
||||
return self.device != torch.device("cpu")
|
||||
|
||||
def mse(self, pred, label):
|
||||
loss = (pred - label) ** 2
|
||||
def mse(self, pred, label, weight):
|
||||
loss = weight * (pred - label) ** 2
|
||||
return torch.mean(loss)
|
||||
|
||||
def loss_fn(self, pred, label):
|
||||
mask = ~torch.isnan(label)
|
||||
|
||||
if weight is None:
|
||||
weight = torch.ones_like(label)
|
||||
|
||||
if self.loss == "mse":
|
||||
return self.mse(pred[mask], label[mask])
|
||||
return self.mse(pred[mask], label[mask], weight[mask])
|
||||
|
||||
raise ValueError("unknown loss `%s`" % self.loss)
|
||||
|
||||
@@ -159,12 +164,12 @@ class LSTM(Model):
|
||||
|
||||
self.LSTM_model.train()
|
||||
|
||||
for data in data_loader:
|
||||
for (data, weight) in data_loader:
|
||||
feature = data[:, :, 0:-1].to(self.device)
|
||||
label = data[:, -1, -1].to(self.device)
|
||||
|
||||
pred = self.LSTM_model(feature.float())
|
||||
loss = self.loss_fn(pred, label)
|
||||
loss = self.loss_fn(pred, label, weight.to(self.device))
|
||||
|
||||
self.train_optimizer.zero_grad()
|
||||
loss.backward()
|
||||
@@ -178,14 +183,14 @@ class LSTM(Model):
|
||||
scores = []
|
||||
losses = []
|
||||
|
||||
for data in data_loader:
|
||||
for (data, weight) in data_loader:
|
||||
|
||||
feature = data[:, :, 0:-1].to(self.device)
|
||||
# feature[torch.isnan(feature)] = 0
|
||||
label = data[:, -1, -1].to(self.device)
|
||||
|
||||
pred = self.LSTM_model(feature.float())
|
||||
loss = self.loss_fn(pred, label)
|
||||
loss = self.loss_fn(pred, label, weight.to(self.device))
|
||||
losses.append(loss.item())
|
||||
|
||||
score = self.metric_fn(pred, label)
|
||||
@@ -198,6 +203,7 @@ class LSTM(Model):
|
||||
dataset,
|
||||
evals_result=dict(),
|
||||
save_path=None,
|
||||
reweighter=None,
|
||||
):
|
||||
dl_train = dataset.prepare("train", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
|
||||
dl_valid = dataset.prepare("valid", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
|
||||
@@ -207,11 +213,28 @@ class LSTM(Model):
|
||||
dl_train.config(fillna_type="ffill+bfill") # process nan brought by dataloader
|
||||
dl_valid.config(fillna_type="ffill+bfill") # process nan brought by dataloader
|
||||
|
||||
if reweighter is None:
|
||||
wl_train = np.ones(len(dl_train))
|
||||
wl_valid = np.ones(len(dl_valid))
|
||||
elif isinstance(reweighter, Reweighter):
|
||||
wl_train = reweighter.reweight(dl_train)
|
||||
wl_valid = reweighter.reweight(dl_valid)
|
||||
else:
|
||||
raise ValueError("Unsupported reweighter type.")
|
||||
|
||||
train_loader = DataLoader(
|
||||
dl_train, batch_size=self.batch_size, shuffle=True, num_workers=self.n_jobs, drop_last=True
|
||||
ConcatDataset(dl_train, wl_train),
|
||||
batch_size=self.batch_size,
|
||||
shuffle=True,
|
||||
num_workers=self.n_jobs,
|
||||
drop_last=True,
|
||||
)
|
||||
valid_loader = DataLoader(
|
||||
dl_valid, batch_size=self.batch_size, shuffle=False, num_workers=self.n_jobs, drop_last=True
|
||||
ConcatDataset(dl_valid, wl_valid),
|
||||
batch_size=self.batch_size,
|
||||
shuffle=False,
|
||||
num_workers=self.n_jobs,
|
||||
drop_last=True,
|
||||
)
|
||||
|
||||
save_path = get_or_create_path(save_path)
|
||||
|
||||
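The `fit` methods above all repeat the same three-way branch: no reweighter means unit weights, a `Reweighter` instance supplies the weights, and anything else raises. The sketch below factors that branch into one helper. `Reweighter` here is a hypothetical stand-in for `qlib.data.dataset.weight.Reweighter`, and `RecencyReweighter` and `resolve_weights` are invented for illustration.

```python
class Reweighter:
    """Hypothetical stand-in for the base class imported in the diff."""

    def reweight(self, data):
        raise NotImplementedError


class RecencyReweighter(Reweighter):
    """Illustrative reweighter: later samples get linearly larger weights."""

    def reweight(self, data):
        n = len(data)
        return [(i + 1) / n for i in range(n)]


def resolve_weights(data, reweighter=None):
    """Return one weight per sample, following the branch used in the diff."""
    if reweighter is None:
        return [1.0] * len(data)
    if isinstance(reweighter, Reweighter):
        return list(reweighter.reweight(data))
    raise ValueError("Unsupported reweighter type.")


rows = [10, 20, 30, 40]
uniform = resolve_weights(rows)
recent = resolve_weights(rows, RecencyReweighter())
```

A shared helper of this kind would avoid the copy-pasted block in each model, at the cost of one more import per module.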
@@ -19,6 +19,7 @@ from .pytorch_utils import count_parameters
|
||||
from ...model.base import Model
|
||||
from ...data.dataset import DatasetH
|
||||
from ...data.dataset.handler import DataHandlerLP
|
||||
from ...data.dataset.weight import Reweighter
|
||||
from ...utils import unpack_archive_with_buffer, save_multiple_parts_file, get_or_create_path
|
||||
from ...log import get_module_logger
|
||||
from ...workflow import R
|
||||
@@ -97,7 +98,6 @@ class DNNModelPytorch(Model):
|
||||
"\nlr_decay_steps : {}"
|
||||
"\noptimizer : {}"
|
||||
"\nloss_type : {}"
|
||||
"\neval_steps : {}"
|
||||
"\nseed : {}"
|
||||
"\ndevice : {}"
|
||||
"\nuse_GPU : {}"
|
||||
@@ -112,7 +112,6 @@ class DNNModelPytorch(Model):
|
||||
lr_decay_steps,
|
||||
optimizer,
|
||||
loss,
|
||||
eval_steps,
|
||||
seed,
|
||||
self.device,
|
||||
self.use_gpu,
|
||||
@@ -166,18 +165,22 @@ class DNNModelPytorch(Model):
|
||||
evals_result=dict(),
|
||||
verbose=True,
|
||||
save_path=None,
|
||||
reweighter=None,
|
||||
):
|
||||
df_train, df_valid = dataset.prepare(
|
||||
["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
|
||||
)
|
||||
x_train, y_train = df_train["feature"], df_train["label"]
|
||||
x_valid, y_valid = df_valid["feature"], df_valid["label"]
|
||||
try:
|
||||
wdf_train, wdf_valid = dataset.prepare(["train", "valid"], col_set=["weight"], data_key=DataHandlerLP.DK_L)
|
||||
w_train, w_valid = wdf_train["weight"], wdf_valid["weight"]
|
||||
except KeyError as e:
|
||||
|
||||
if reweighter is None:
|
||||
w_train = pd.DataFrame(np.ones_like(y_train.values), index=y_train.index)
|
||||
w_valid = pd.DataFrame(np.ones_like(y_valid.values), index=y_valid.index)
|
||||
elif isinstance(reweighter, Reweighter):
|
||||
w_train = pd.DataFrame(reweighter.reweight(df_train))
|
||||
w_valid = pd.DataFrame(reweighter.reweight(df_valid))
|
||||
else:
|
||||
raise ValueError("Unsupported reweighter type.")
|
||||
|
||||
save_path = get_or_create_path(save_path)
|
||||
stop_steps = 0
|
||||
@@ -257,7 +260,7 @@ class DNNModelPytorch(Model):
|
||||
self.scheduler.step(cur_loss_val)
|
||||
|
||||
# restore the optimal parameters after training
|
||||
self.dnn_model.load_state_dict(torch.load(save_path))
|
||||
self.dnn_model.load_state_dict(torch.load(save_path, map_location=self.device))
|
||||
if self.use_gpu:
|
||||
torch.cuda.empty_cache()
|
||||
|
||||
@@ -296,7 +299,7 @@ class DNNModelPytorch(Model):
|
||||
]
|
||||
_model_path = os.path.join(model_dir, _model_name)
|
||||
# Load model
|
||||
self.dnn_model.load_state_dict(torch.load(_model_path))
|
||||
self.dnn_model.load_state_dict(torch.load(_model_path, map_location=self.device))
|
||||
self.fitted = True
|
||||
|
||||
|
||||
@@ -326,8 +329,8 @@ class Net(nn.Module):
|
||||
dnn_layers = []
|
||||
drop_input = nn.Dropout(0.05)
|
||||
dnn_layers.append(drop_input)
|
||||
for i, (input_dim, hidden_units) in enumerate(zip(layers[:-1], layers[1:])):
|
||||
fc = nn.Linear(input_dim, hidden_units)
|
||||
for i, (_input_dim, hidden_units) in enumerate(zip(layers[:-1], layers[1:])):
|
||||
fc = nn.Linear(_input_dim, hidden_units)
|
||||
activation = nn.LeakyReLU(negative_slope=0.1, inplace=False)
|
||||
bn = nn.BatchNorm1d(hidden_units)
|
||||
seq = nn.Sequential(fc, bn, activation)
|
||||
|
||||
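The `Net` constructor above builds one linear layer per consecutive pair in `layers` using `zip(layers[:-1], layers[1:])`. The sketch below shows which `(input, output)` sizes that idiom produces. It assumes `layers` is the input dimension followed by the hidden sizes, which the hunk does not show, and the helper name and sizes are hypothetical.

```python
def layer_shapes(input_dim, hidden_units):
    """Return (in_features, out_features) for each consecutive layer pair."""
    # Assumption: the full list starts with the input dimension.
    layers = [input_dim] + list(hidden_units)
    # Pair every size with its successor: (l0, l1), (l1, l2), ...
    return list(zip(layers[:-1], layers[1:]))


shapes = layer_shapes(360, (256, 128, 64))
```

Renaming the loop variable to `_input_dim`, as the diff does, keeps the loop from shadowing the constructor's own `input_dim` argument.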
@@ -160,7 +160,7 @@ class TabnetModel(Model):
|
||||
self.logger.info("Pretrain...")
|
||||
self.pretrain_fn(dataset, self.pretrain_file)
|
||||
self.logger.info("Load Pretrain model")
|
||||
self.tabnet_model.load_state_dict(torch.load(self.pretrain_file))
|
||||
self.tabnet_model.load_state_dict(torch.load(self.pretrain_file, map_location=self.device))
|
||||
|
||||
# adding one more linear layer to fit the final output dimension
|
||||
self.tabnet_model = FinetuneModel(self.out_dim, self.final_out_dim, self.tabnet_model).to(self.device)
|
||||
@@ -446,7 +446,7 @@ class TabNet(nn.Module):
|
||||
Args:
|
||||
n_d: dimension of the features used to calculate the final results
|
||||
n_a: dimension of the features input to the attention transformer of the next step
|
||||
n_shared: numbr of shared steps in feature transfomer(optional)
|
||||
n_shared: number of shared steps in feature transformer (optional)
|
||||
n_ind: number of independent steps in feature transformer
|
||||
n_steps: number of steps to pass through TabNet
|
||||
relax coefficient:
|
||||
@@ -479,7 +479,7 @@ class TabNet(nn.Module):
|
||||
out = torch.zeros(x.size(0), self.n_d).to(x.device)
|
||||
for step in self.steps:
|
||||
x_te, l = step(x, x_a, priors)
|
||||
out += F.relu(x_te[:, : self.n_d]) # split the feautre from feat_transformer
|
||||
out += F.relu(x_te[:, : self.n_d]) # split the feature from feat_transformer
|
||||
x_a = x_te[:, self.n_d :]
|
||||
sparse_loss.append(l)
|
||||
return self.fc(out), sum(sparse_loss)
|
||||
|
||||
@@ -56,6 +56,7 @@ class TCTS(Model):
|
||||
loss="mse",
|
||||
fore_optimizer="adam",
|
||||
weight_optimizer="adam",
|
||||
input_dim=360,
|
||||
output_dim=5,
|
||||
fore_lr=5e-7,
|
||||
weight_lr=5e-7,
|
||||
@@ -83,6 +84,7 @@ class TCTS(Model):
|
||||
self.device = torch.device("cuda:%d" % (GPU) if torch.cuda.is_available() else "cpu")
|
||||
self.use_gpu = torch.cuda.is_available()
|
||||
self.seed = seed
|
||||
self.input_dim = input_dim
|
||||
self.output_dim = output_dim
|
||||
self.fore_lr = fore_lr
|
||||
self.weight_lr = weight_lr
|
||||
@@ -139,7 +141,6 @@ class TCTS(Model):
|
||||
raise NotImplementedError("mode {} is not supported!".format(self.mode))
|
||||
|
||||
def train_epoch(self, x_train, y_train, x_valid, y_valid):
|
||||
|
||||
x_train_values = x_train.values
|
||||
y_train_values = np.squeeze(y_train.values)
|
||||
|
||||
@@ -297,7 +298,7 @@ class TCTS(Model):
|
||||
dropout=self.dropout,
|
||||
)
|
||||
self.weight_model = MLPModel(
|
||||
d_feat=360 + 3 * self.output_dim + 1,
|
||||
d_feat=self.input_dim + 3 * self.output_dim + 1,
|
||||
hidden_size=self.hidden_size,
|
||||
num_layers=self.num_layers,
|
||||
dropout=self.dropout,
|
||||
@@ -350,9 +351,9 @@ class TCTS(Model):
|
||||
break
|
||||
|
||||
print("best loss:", best_loss, "@", best_epoch)
|
||||
best_param = torch.load(save_path + "_fore_model.bin")
|
||||
best_param = torch.load(save_path + "_fore_model.bin", map_location=self.device)
|
||||
self.fore_model.load_state_dict(best_param)
|
||||
best_param = torch.load(save_path + "_weight_model.bin")
|
||||
best_param = torch.load(save_path + "_weight_model.bin", map_location=self.device)
|
||||
self.weight_model.load_state_dict(best_param)
|
||||
self.fitted = True
|
||||
|
||||
|
||||
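The TCTS hunk above replaces the hard-coded `360` in the weight model's `d_feat` with `self.input_dim`. The short sketch below shows why that matters when the feature count is not 360. The helper name is hypothetical and the feature count of 158 is only an illustrative value.

```python
def weight_model_input_dim(input_dim, output_dim):
    """Input width of the weight model, as computed in the diff."""
    return input_dim + 3 * output_dim + 1


default_dim = weight_model_input_dim(360, 5)  # matches the old hard-coded value
custom_dim = weight_model_input_dim(158, 5)   # a dataset with fewer features
old_hardcoded = 360 + 3 * 5 + 1               # what the previous code always used
```

With the old expression, any dataset whose feature count differs from 360 would give the weight model a first layer of the wrong width.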
@@ -19,12 +19,13 @@ import torch.nn.functional as F
|
||||
|
||||
try:
|
||||
from torch.utils.tensorboard import SummaryWriter
|
||||
except:
|
||||
except ImportError:
|
||||
SummaryWriter = None
|
||||
|
||||
from tqdm import tqdm
|
||||
|
||||
from qlib.utils import get_or_create_path
|
||||
from qlib.constant import EPS
|
||||
from qlib.log import get_module_logger
|
||||
from qlib.model.base import Model
|
||||
from qlib.contrib.data.dataset import MTSDatasetH
|
||||
@@ -232,7 +233,7 @@ class TRAModel(Model):
|
||||
choice_all.append(pd.DataFrame(choice.detach().cpu().numpy(), index=index))
|
||||
decay = self.rho ** (self.global_step // 100) # decay every 100 steps
|
||||
lamb = 0 if is_pretrain else self.lamb * decay
|
||||
reg = prob.log().mul(P).sum(dim=1).mean() # train router to predict OT assignment
|
||||
reg = prob.log().mul(P).sum(dim=1).mean() # train router to predict OT assignment
|
||||
if self._writer is not None and not is_pretrain:
|
||||
self._writer.add_scalar("training/router_loss", -reg.item(), self.global_step)
|
||||
self._writer.add_scalar("training/reg_loss", loss.item(), self.global_step)
|
||||
@@ -256,7 +257,7 @@ class TRAModel(Model):
|
||||
total_loss += loss.item()
|
||||
total_count += 1
|
||||
|
||||
if self.use_daily_transport and len(P_all):
|
||||
if self.use_daily_transport and len(P_all) > 0:
|
||||
P_all = pd.concat(P_all, axis=0)
|
||||
prob_all = pd.concat(prob_all, axis=0)
|
||||
choice_all = pd.concat(choice_all, axis=0)
|
||||
@@ -663,7 +664,7 @@ class TRA(nn.Module):
|
||||
|
||||
"""Temporal Routing Adaptor (TRA)
|
||||
|
||||
TRA takes historical prediction erros & latent representation as inputs,
|
||||
TRA takes historical prediction errors & latent representation as inputs,
|
||||
then routes the input sample to a specific predictor for training & inference.
|
||||
|
||||
Args:
|
||||
@@ -791,7 +792,7 @@ def minmax_norm(x):
|
||||
xmin = x.min(dim=-1, keepdim=True).values
|
||||
xmax = x.max(dim=-1, keepdim=True).values
|
||||
mask = (xmin == xmax).squeeze()
|
||||
x = (x - xmin) / (xmax - xmin + 1e-12)
|
||||
x = (x - xmin) / (xmax - xmin + EPS)
|
||||
x[mask] = 1
|
||||
return x
|
||||
|
||||
|
||||
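The `minmax_norm` hunk above swaps the literal `1e-12` for the shared `EPS` constant. The sketch below applies the same normalisation to one row in plain Python, including the special case where every value in the row is equal. The value of `EPS` is an assumption taken from the literal the diff replaces, and the helper name is hypothetical.

```python
EPS = 1e-12  # assumed value; the diff replaces the literal 1e-12 with qlib.constant.EPS


def minmax_norm_row(row):
    """Scale one row to [0, 1]; constant rows map to all ones, as in the diff."""
    lo, hi = min(row), max(row)
    if lo == hi:
        # Mirrors `x[mask] = 1` for rows where xmin == xmax.
        return [1.0] * len(row)
    # EPS guards the division against a vanishing range.
    return [(v - lo) / (hi - lo + EPS) for v in row]


spread = minmax_norm_row([0.0, 5.0, 10.0])
constant = minmax_norm_row([2.0, 2.0])
```

A single named constant keeps the tolerance consistent across modules and makes it easier to change in one place.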
@@ -33,5 +33,5 @@ def count_parameters(models_or_parameters, unit="m"):
|
||||
elif unit == "gb" or unit == "g":
|
||||
counts /= 2 ** 30
|
||||
elif unit is not None:
|
||||
raise ValueError("Unknow unit: {:}".format(unit))
|
||||
raise ValueError("Unknown unit: {:}".format(unit))
|
||||
return counts
|
||||
|
||||
@@ -9,6 +9,7 @@ from ...model.base import Model
|
||||
from ...data.dataset import DatasetH
|
||||
from ...data.dataset.handler import DataHandlerLP
|
||||
from ...model.interpret.base import FeatureInt
|
||||
from ...data.dataset.weight import Reweighter
|
||||
|
||||
|
||||
class XGBModel(Model, FeatureInt):
|
||||
@@ -26,6 +27,7 @@ class XGBModel(Model, FeatureInt):
|
||||
early_stopping_rounds=50,
|
||||
verbose_eval=20,
|
||||
evals_result=dict(),
|
||||
reweighter=None,
|
||||
**kwargs
|
||||
):
|
||||
|
||||
@@ -43,8 +45,17 @@ class XGBModel(Model, FeatureInt):
|
||||
else:
|
||||
raise ValueError("XGBoost doesn't support multi-label training")
|
||||
|
||||
dtrain = xgb.DMatrix(x_train, label=y_train_1d)
|
||||
dvalid = xgb.DMatrix(x_valid, label=y_valid_1d)
|
||||
if reweighter is None:
|
||||
w_train = None
|
||||
w_valid = None
|
||||
elif isinstance(reweighter, Reweighter):
|
||||
w_train = reweighter.reweight(df_train)
|
||||
w_valid = reweighter.reweight(df_valid)
|
||||
else:
|
||||
raise ValueError("Unsupported reweighter type.")
|
||||
|
||||
dtrain = xgb.DMatrix(x_train.values, label=y_train_1d, weight=w_train)
|
||||
dvalid = xgb.DMatrix(x_valid.values, label=y_valid_1d, weight=w_valid)
|
||||
self.model = xgb.train(
|
||||
self._params,
|
||||
dtrain=dtrain,
|
||||
|
||||
@@ -36,7 +36,7 @@ def save_instance(instance, file_path):
|
||||
save(dump) an instance to a pickle file
|
||||
Parameter
|
||||
instance :
|
||||
data to te dumped
|
||||
data to be dumped
|
||||
file_path : string / pathlib.Path()
|
||||
path of file to be dumped
|
||||
"""
|
||||
|
||||
@@ -15,7 +15,6 @@ from plotly.figure_factory import create_distplot
|
||||
|
||||
|
||||
class BaseGraph:
|
||||
""" """
|
||||
|
||||
_name = None
|
||||
|
||||
@@ -297,8 +296,8 @@ class SubplotsGraph:
|
||||
|
||||
:return:
|
||||
"""
|
||||
self._sub_graph_data = list()
|
||||
self._subplot_titles = list()
|
||||
self._sub_graph_data = []
|
||||
self._subplot_titles = []
|
||||
|
||||
for i, column_name in enumerate(self._df.columns):
|
||||
row = math.ceil((i + 1) / self.__cols)
|
||||
|
||||
@@ -5,6 +5,7 @@
|
||||
from .signal_strategy import (
|
||||
TopkDropoutStrategy,
|
||||
WeightStrategyBase,
|
||||
EnhancedIndexingStrategy,
|
||||
)
|
||||
|
||||
from .rule_strategy import (
|
||||
|
||||
@@ -47,7 +47,7 @@ class SoftTopkStrategy(WeightStrategyBase):
|
||||
Return the proportion of your total value you will use in investment.
|
||||
A dynamic risk_degree will result in market timing
|
||||
"""
|
||||
# It will use 95% amoutn of your total value by default
|
||||
# It will use 95% amount of your total value by default
|
||||
return self.risk_degree
|
||||
|
||||
def generate_target_weight_position(self, score, current, trade_start_time, trade_end_time):
|
||||
|
||||
203
qlib/contrib/strategy/optimizer/enhanced_indexing.py
Normal file
@@ -0,0 +1,203 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
|
||||
import numpy as np
|
||||
import cvxpy as cp
|
||||
import pandas as pd
|
||||
|
||||
from typing import Union, Optional, Dict, Any, List
|
||||
|
||||
from qlib.log import get_module_logger
|
||||
from .base import BaseOptimizer
|
||||
|
||||
|
||||
logger = get_module_logger("EnhancedIndexingOptimizer")
|
||||
|
||||
|
||||
class EnhancedIndexingOptimizer(BaseOptimizer):
|
||||
"""
|
||||
Portfolio Optimizer for Enhanced Indexing
|
||||
|
||||
Notations:
|
||||
w0: current holding weights
|
||||
wb: benchmark weight
|
||||
r: expected return
|
||||
F: factor exposure
|
||||
cov_b: factor covariance
|
||||
var_u: residual variance (diagonal)
|
||||
lamb: risk aversion parameter
|
||||
delta: total turnover limit
|
||||
b_dev: benchmark deviation limit
|
||||
f_dev: factor deviation limit
|
||||
|
||||
Also denote:
|
||||
d = w - wb: benchmark deviation
|
||||
v = d @ F: factor deviation
|
||||
|
||||
The optimization problem for enhanced indexing:
|
||||
max_w d @ r - lamb * (v @ cov_b @ v + var_u @ d**2)
|
||||
s.t. w >= 0
|
||||
sum(w) == 1
|
||||
sum(|w - w0|) <= delta
|
||||
d >= -b_dev
|
||||
d <= b_dev
|
||||
v >= -f_dev
|
||||
v <= f_dev
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
lamb: float = 1,
|
||||
delta: Optional[float] = 0.2,
|
||||
b_dev: Optional[float] = 0.01,
|
||||
f_dev: Optional[Union[List[float], np.ndarray]] = None,
|
||||
scale_return: bool = True,
|
||||
epsilon: float = 5e-5,
|
||||
solver_kwargs: Optional[Dict[str, Any]] = {},
|
||||
):
|
||||
"""
|
||||
Args:
|
||||
lamb (float): risk aversion parameter (larger `lamb` means more focus on risk)
|
||||
delta (float): total turnover limit
|
||||
b_dev (float): benchmark deviation limit
|
||||
f_dev (list): factor deviation limit
|
||||
scale_return (bool): whether to scale returns to match estimated volatility
|
||||
epsilon (float): minimum weight
|
||||
solver_kwargs (dict): kwargs for cvxpy solver
|
||||
"""
|
||||
|
||||
assert lamb >= 0, "risk aversion parameter `lamb` should be non-negative"
|
||||
self.lamb = lamb
|
||||
|
||||
assert delta is None or delta >= 0, "turnover limit `delta` should be non-negative"
|
||||
self.delta = delta
|
||||
|
||||
assert b_dev is None or b_dev >= 0, "benchmark deviation limit `b_dev` should be non-negative"
|
||||
self.b_dev = b_dev
|
||||
|
||||
if isinstance(f_dev, float):
|
||||
assert f_dev >= 0, "factor deviation limit `f_dev` should be non-negative"
|
||||
elif f_dev is not None:
|
||||
f_dev = np.array(f_dev)
|
||||
assert all(f_dev >= 0), "factor deviation limit `f_dev` should be non-negative"
|
||||
self.f_dev = f_dev
|
||||
|
||||
self.scale_return = scale_return
|
||||
self.epsilon = epsilon
|
||||
self.solver_kwargs = solver_kwargs
|
||||
|
||||
def __call__(
|
||||
self,
|
||||
r: np.ndarray,
|
||||
F: np.ndarray,
|
||||
cov_b: np.ndarray,
|
||||
var_u: np.ndarray,
|
||||
w0: np.ndarray,
|
||||
wb: np.ndarray,
|
||||
mfh: Optional[np.ndarray] = None,
|
||||
mfs: Optional[np.ndarray] = None,
|
||||
) -> np.ndarray:
|
||||
"""
|
||||
Args:
|
||||
r (np.ndarray): expected returns
|
||||
F (np.ndarray): factor exposure
|
||||
cov_b (np.ndarray): factor covariance
|
||||
var_u (np.ndarray): residual variance
|
||||
w0 (np.ndarray): current holding weights
|
||||
wb (np.ndarray): benchmark weights
|
||||
mfh (np.ndarray): mask force holding
|
||||
mfs (np.ndarray): mask force selling
|
||||
|
||||
Returns:
|
||||
np.ndarray: optimized portfolio allocation
|
||||
"""
|
||||
# scale return to match volatility
|
||||
if self.scale_return:
|
||||
r = r / r.std()
|
||||
r *= np.sqrt(np.mean(np.diag(F @ cov_b @ F.T) + var_u))
|
||||
|
||||
# target weight
|
||||
w = cp.Variable(len(r), nonneg=True)
|
||||
w.value = wb # for warm start
|
||||
|
||||
# precompute exposure
|
||||
d = w - wb # benchmark exposure
|
||||
v = d @ F # factor exposure
|
||||
|
||||
# objective
|
||||
ret = d @ r # excess return
|
||||
risk = cp.quad_form(v, cov_b) + var_u @ (d ** 2) # tracking error
|
||||
obj = cp.Maximize(ret - self.lamb * risk)
|
||||
|
||||
# weight bounds
|
||||
lb = np.zeros_like(wb)
|
||||
ub = np.ones_like(wb)
|
||||
|
||||
# bench bounds
|
||||
if self.b_dev is not None:
|
||||
lb = np.maximum(lb, wb - self.b_dev)
|
||||
ub = np.minimum(ub, wb + self.b_dev)
|
||||
|
||||
# force holding
|
||||
if mfh is not None:
|
||||
lb[mfh] = w0[mfh]
|
||||
ub[mfh] = w0[mfh]
|
||||
|
||||
# force selling
|
||||
# NOTE: this will override mfh
|
||||
if mfs is not None:
|
||||
lb[mfs] = 0
|
||||
ub[mfs] = 0
|
||||
|
||||
# constraints
|
||||
# TODO: currently we assume the portfolio is fully invested in stocks,
|
||||
# in the future we should support holding cash as an asset
|
||||
cons = [cp.sum(w) == 1, w >= lb, w <= ub]
|
||||
|
||||
# factor deviation
|
||||
if self.f_dev is not None:
|
||||
cons.extend([v >= -self.f_dev, v <= self.f_dev])
|
||||
|
||||
# total turnover constraint
|
||||
t_cons = []
|
||||
if self.delta is not None:
|
||||
if w0 is not None and w0.sum() > 0:
|
||||
t_cons.extend([cp.norm(w - w0, 1) <= self.delta])
|
||||
|
||||
# optimize
|
||||
# trial 1: use all constraints
|
||||
success = False
|
||||
try:
|
||||
prob = cp.Problem(obj, cons + t_cons)
|
||||
prob.solve(solver=cp.ECOS, warm_start=True, **self.solver_kwargs)
|
||||
assert prob.status == "optimal"
|
||||
success = True
|
||||
except Exception as e:
|
||||
logger.warning(f"trial 1 failed {e} (status: {prob.status})")
|
||||
|
||||
# trial 2: remove turnover constraint
|
||||
if not success and len(t_cons):
|
||||
logger.info("try removing turnover constraint as the last optimization failed")
|
||||
try:
|
||||
w.value = wb
|
||||
prob = cp.Problem(obj, cons)
|
||||
prob.solve(solver=cp.ECOS, warm_start=True, **self.solver_kwargs)
|
||||
assert prob.status in ["optimal", "optimal_inaccurate"]
|
||||
success = True
|
||||
except Exception as e:
|
||||
logger.warning(f"trial 2 failed {e} (status: {prob.status})")
|
||||
|
||||
# return current weight if not success
|
||||
if not success:
|
||||
logger.warning("optimization failed, will return current holding weight")
|
||||
return w0
|
||||
|
||||
if prob.status == "optimal_inaccurate":
|
||||
logger.warning("the optimization is inaccurate")
|
||||
|
||||
# remove small weight
|
||||
w = np.asarray(w.value)
|
||||
w[w < self.epsilon] = 0
|
||||
w /= w.sum()
|
||||
|
||||
return w
|
||||
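The `__call__` hunk above builds per-stock weight bounds, measures tracking error through the factor model, and prunes tiny weights after solving. A minimal NumPy-only sketch of those three steps follows. The helper names (`build_bounds`, `tracking_error_variance`, `prune_small_weights`), the default `epsilon`, and the demo numbers are illustrative assumptions, not part of qlib, and the cvxpy solve itself is left out.

```python
import numpy as np


def build_bounds(wb, w0, b_dev=None, mfh=None, mfs=None):
    """Lower/upper weight bounds, mirroring the order used in the hunk above."""
    lb = np.zeros_like(wb, dtype=float)
    ub = np.ones_like(wb, dtype=float)
    if b_dev is not None:  # stay within b_dev of the benchmark weight
        lb = np.maximum(lb, wb - b_dev)
        ub = np.minimum(ub, wb + b_dev)
    if mfh is not None:  # force holding: pin weight to the current holding
        lb[mfh] = w0[mfh]
        ub[mfh] = w0[mfh]
    if mfs is not None:  # force selling: applied last, so it overrides mfh
        lb[mfs] = 0.0
        ub[mfs] = 0.0
    return lb, ub


def tracking_error_variance(w, wb, F, cov_b, var_u):
    """Factor-model variance of the active position d = w - wb."""
    d = w - wb  # benchmark-relative exposure
    v = d @ F  # factor exposure of the active position
    return float(v @ cov_b @ v + var_u @ (d ** 2))


def prune_small_weights(w, epsilon=5e-5):
    """Zero out weights below epsilon and renormalize to sum to one."""
    w = np.array(w, dtype=float)
    w[w < epsilon] = 0.0
    return w / w.sum()


# hypothetical three-stock, two-factor example
wb = np.array([0.5, 0.3, 0.2])
w0 = np.array([0.4, 0.4, 0.2])
mfh = np.array([True, False, True])
mfs = np.array([False, False, True])
F = np.array([[1.0, 0.2], [0.8, -0.1], [1.2, 0.5]])
cov_b = np.array([[0.04, 0.01], [0.01, 0.09]])
var_u = np.array([0.02, 0.03, 0.05])

lb, ub = build_bounds(wb, w0, b_dev=0.1, mfh=mfh, mfs=mfs)
te_same = tracking_error_variance(wb, wb, F, cov_b, var_u)
pruned = prune_small_weights([0.5, 0.499999, 0.000001])
```

The third stock is flagged in both masks, and because the force-selling mask is applied after the force-holding mask, its bounds end up pinned at zero rather than at its current holding.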
@@ -8,7 +8,7 @@ import pandas as pd
|
||||
import scipy.optimize as so
|
||||
from typing import Optional, Union, Callable, List
|
||||
|
||||
from qlib.portfolio.optimizer import BaseOptimizer
|
||||
from .base import BaseOptimizer
|
||||
|
||||
|
||||
class PortfolioOptimizer(BaseOptimizer):
|
||||
@@ -35,7 +35,7 @@ class PortfolioOptimizer(BaseOptimizer):
|
||||
lamb: float = 0,
|
||||
delta: float = 0,
|
||||
alpha: float = 0.0,
|
||||
scale_alpha: bool = True,
|
||||
scale_return: bool = True,
|
||||
tol: float = 1e-8,
|
||||
):
|
||||
"""
|
||||
@@ -44,7 +44,7 @@ class PortfolioOptimizer(BaseOptimizer):
|
||||
lamb (float): risk aversion parameter (larger `lamb` means more focus on risk)
|
||||
delta (float): turnover rate limit
|
||||
alpha (float): l2 norm regularizer
|
||||
scale_alpha (bool): if to scale alpha to match the volatility of the covariance matrix
|
||||
scale_return (bool): whether to scale the expected return to match the volatility of the covariance matrix
|
||||
tol (float): tolerance for optimization termination
|
||||
"""
|
||||
assert method in [self.OPT_GMV, self.OPT_MVO, self.OPT_RP, self.OPT_INV], f"method `{method}` is not supported"
|
||||
@@ -60,18 +60,18 @@ class PortfolioOptimizer(BaseOptimizer):
|
||||
self.alpha = alpha
|
||||
|
||||
self.tol = tol
|
||||
self.scale_alpha = scale_alpha
|
||||
self.scale_return = scale_return
|
||||
|
||||
def __call__(
|
||||
self,
|
||||
S: Union[np.ndarray, pd.DataFrame],
|
||||
u: Optional[Union[np.ndarray, pd.Series]] = None,
|
||||
r: Optional[Union[np.ndarray, pd.Series]] = None,
|
||||
w0: Optional[Union[np.ndarray, pd.Series]] = None,
|
||||
) -> Union[np.ndarray, pd.Series]:
|
||||
"""
|
||||
Args:
|
||||
S (np.ndarray or pd.DataFrame): covariance matrix
|
||||
u (np.ndarray or pd.Series): expected returns (a.k.a., alpha)
|
||||
r (np.ndarray or pd.Series): expected return
|
||||
w0 (np.ndarray or pd.Series): initial weights (for turnover control)
|
||||
|
||||
Returns:
|
||||
@@ -83,12 +83,12 @@ class PortfolioOptimizer(BaseOptimizer):
|
||||
index = S.index
|
||||
S = S.values
|
||||
|
||||
# transform alpha
|
||||
if u is not None:
|
||||
assert len(u) == len(S), "`u` has mismatched shape"
|
||||
if isinstance(u, pd.Series):
|
||||
assert u.index.equals(index), "`u` has mismatched index"
|
||||
u = u.values
|
||||
# transform return
|
||||
if r is not None:
|
||||
assert len(r) == len(S), "`r` has mismatched shape"
|
||||
if isinstance(r, pd.Series):
|
||||
assert r.index.equals(index), "`r` has mismatched index"
|
||||
r = r.values
|
||||
|
||||
# transform initial weights
|
||||
if w0 is not None:
|
||||
@@ -97,13 +97,13 @@ class PortfolioOptimizer(BaseOptimizer):
|
||||
assert w0.index.equals(index), "`w0` has mismatched index"
|
||||
w0 = w0.values
|
||||
|
||||
# scale alpha to match volatility
|
||||
if u is not None and self.scale_alpha:
|
||||
u = u / u.std()
|
||||
u *= np.mean(np.diag(S)) ** 0.5
|
||||
# scale return to match volatility
|
||||
if r is not None and self.scale_return:
|
||||
r = r / r.std()
|
||||
r *= np.sqrt(np.mean(np.diag(S)))
|
||||
|
||||
# optimize
|
||||
w = self._optimize(S, u, w0)
|
||||
w = self._optimize(S, r, w0)
|
||||
|
||||
# restore index if needed
|
||||
if index is not None:
|
||||
@@ -111,30 +111,30 @@ class PortfolioOptimizer(BaseOptimizer):
|
||||
|
||||
return w
|
||||
|
||||
def _optimize(self, S: np.ndarray, u: Optional[np.ndarray] = None, w0: Optional[np.ndarray] = None) -> np.ndarray:
|
||||
def _optimize(self, S: np.ndarray, r: Optional[np.ndarray] = None, w0: Optional[np.ndarray] = None) -> np.ndarray:
|
||||
|
||||
# inverse volatility
|
||||
if self.method == self.OPT_INV:
|
||||
if u is not None:
|
||||
warnings.warn("`u` is set but will not be used for `inv` portfolio")
|
||||
if r is not None:
|
||||
warnings.warn("`r` is set but will not be used for `inv` portfolio")
|
||||
if w0 is not None:
|
||||
warnings.warn("`w0` is set but will not be used for `inv` portfolio")
|
||||
return self._optimize_inv(S)
|
||||
|
||||
# global minimum variance
|
||||
if self.method == self.OPT_GMV:
|
||||
if u is not None:
|
||||
warnings.warn("`u` is set but will not be used for `gmv` portfolio")
|
||||
if r is not None:
|
||||
warnings.warn("`r` is set but will not be used for `gmv` portfolio")
|
||||
return self._optimize_gmv(S, w0)
|
||||
|
||||
# mean-variance
|
||||
if self.method == self.OPT_MVO:
|
||||
return self._optimize_mvo(S, u, w0)
|
||||
return self._optimize_mvo(S, r, w0)
|
||||
|
||||
# risk parity
|
||||
if self.method == self.OPT_RP:
|
||||
if u is not None:
|
||||
warnings.warn("`u` is set but will not be used for `rp` portfolio")
|
||||
if r is not None:
|
||||
warnings.warn("`r` is set but will not be used for `rp` portfolio")
|
||||
return self._optimize_rp(S, w0)
|
||||
|
||||
def _optimize_inv(self, S: np.ndarray) -> np.ndarray:
|
||||
@@ -155,17 +155,17 @@ class PortfolioOptimizer(BaseOptimizer):
|
||||
return self._solve(len(S), self._get_objective_gmv(S), *self._get_constrains(w0))
|
||||
|
||||
def _optimize_mvo(
|
||||
self, S: np.ndarray, u: Optional[np.ndarray] = None, w0: Optional[np.ndarray] = None
|
||||
self, S: np.ndarray, r: Optional[np.ndarray] = None, w0: Optional[np.ndarray] = None
|
||||
) -> np.ndarray:
|
||||
"""optimize mean-variance portfolio
|
||||
|
||||
This method solves the following optimization problem
|
||||
min_w - w' u + lamb * w' S w
|
||||
min_w - w' r + lamb * w' S w
|
||||
s.t. w >= 0, sum(w) == 1
|
||||
where `S` is the covariance matrix, `r` is the expected returns,
|
||||
and `lamb` is the risk aversion parameter.
|
||||
"""
|
||||
return self._solve(len(S), self._get_objective_mvo(S, u), *self._get_constrains(w0))
|
||||
return self._solve(len(S), self._get_objective_mvo(S, r), *self._get_constrains(w0))
|
||||
|
||||
def _optimize_rp(self, S: np.ndarray, w0: Optional[np.ndarray] = None) -> np.ndarray:
|
||||
"""optimize risk parity portfolio
|
||||
@@ -189,16 +189,16 @@ class PortfolioOptimizer(BaseOptimizer):
|
||||
|
||||
return func
|
||||
|
||||
def _get_objective_mvo(self, S: np.ndarray, u: np.ndarray = None) -> Callable:
|
||||
def _get_objective_mvo(self, S: np.ndarray, r: np.ndarray = None) -> Callable:
|
||||
"""mean-variance optimization objective
|
||||
|
||||
Optimization objective
|
||||
min_w - w' u + lamb * w' S w
|
||||
min_w - w' r + lamb * w' S w
|
||||
"""
|
||||
|
||||
def func(x):
|
||||
risk = x @ S @ x
|
||||
ret = x @ u
|
||||
ret = x @ r
|
||||
return -ret + self.lamb * risk
|
||||
|
||||
return func
|
||||
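The renamed `r` argument flows through two steps in the hunks above: optional rescaling to the covariance's average volatility, then the mean-variance objective `-w' r + lamb * w' S w`. A small sketch of both, assuming standalone helper names (`scale_return`, `mvo_objective`) and made-up inputs that are not part of qlib:

```python
import numpy as np


def scale_return(r, S):
    """Rescale r so its standard deviation equals sqrt(mean(diag(S)))."""
    r = r / r.std()
    return r * np.sqrt(np.mean(np.diag(S)))


def mvo_objective(S, r, lamb):
    """Return the function minimized by mean-variance optimization."""

    def func(x):
        risk = x @ S @ x
        ret = x @ r
        return -ret + lamb * risk

    return func


# hypothetical two-asset inputs
S = np.diag([0.04, 0.09])
r = np.array([0.01, 0.03])
x = np.array([0.5, 0.5])

scaled = scale_return(r, S)
f_no_penalty = mvo_objective(S, scaled, lamb=0.0)
f_penalized = mvo_objective(S, scaled, lamb=2.0)
risk_penalty = f_penalized(x) - f_no_penalty(x)  # equals lamb * x' S x
```

Since `lamb` multiplies the variance term, raising it makes the minimizer weigh risk more heavily relative to return.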
@@ -24,7 +24,7 @@ class TWAPStrategy(BaseStrategy):
|
||||
|
||||
NOTE:
|
||||
- This TWAP strategy will ceiling round when trading. This will make the TWAP trading strategy produce the order
|
||||
ealier when the total trade unit of amount is less than the trading step
|
||||
earlier when the total trade unit of amount is less than the trading step
|
||||
"""
|
||||
|
||||
def reset(self, outer_trade_decision: BaseTradeDecision = None, **kwargs):
|
||||
@@ -43,8 +43,8 @@ class TWAPStrategy(BaseStrategy):
|
||||
def generate_trade_decision(self, execute_result=None):
|
||||
# NOTE: corner cases!!!
|
||||
# - If using upperbound round, please don't sell the amount which should in next step
|
||||
# - the coordinate of the amount between steps is hard to be dealed between steps in the same level. It
|
||||
# is easier to be dealed in upper steps
|
||||
# - the coordinate of the amount between steps is hard to be dealt between steps in the same level. It
|
||||
# is easier to be dealt in upper steps
|
||||
|
||||
# strategy is not available. Give an empty decision
|
||||
if len(self.outer_trade_decision.get_decision()) == 0:
|
||||
|
||||
@@ -1,70 +1,49 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
import os
|
||||
import copy
|
||||
from qlib.backtest.signal import Signal, create_signal_from
|
||||
from typing import Dict, List, Text, Tuple, Union
|
||||
from qlib.data.dataset import Dataset
|
||||
from qlib.model.base import BaseModel
|
||||
from qlib.backtest.position import Position
|
||||
import warnings
|
||||
import cvxpy as cp
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
|
||||
from ...utils.resam import resam_ts_data
|
||||
from ...strategy.base import BaseStrategy
|
||||
from ...backtest.decision import Order, BaseTradeDecision, OrderDir, TradeDecisionWO
|
||||
from typing import Dict, List, Text, Tuple, Union
|
||||
|
||||
from .order_generator import OrderGenWInteract
|
||||
from qlib.data import D
|
||||
from qlib.data.dataset import Dataset
|
||||
from qlib.model.base import BaseModel
|
||||
from qlib.strategy.base import BaseStrategy
|
||||
from qlib.backtest.position import Position
|
||||
from qlib.backtest.signal import Signal, create_signal_from
|
||||
from qlib.backtest.decision import Order, BaseTradeDecision, OrderDir, TradeDecisionWO
|
||||
from qlib.log import get_module_logger
|
||||
from qlib.utils import get_pre_trading_date, load_dataset
|
||||
from qlib.utils.resam import resam_ts_data
|
||||
from qlib.contrib.strategy.order_generator import OrderGenWInteract, OrderGenWOInteract
|
||||
from qlib.contrib.strategy.optimizer import EnhancedIndexingOptimizer
|
||||
|
||||
|
||||
class TopkDropoutStrategy(BaseStrategy):
|
||||
# TODO:
|
||||
# 1. Supporting leverage the get_range_limit result from the decision
|
||||
# 2. Supporting alter_outer_trade_decision
|
||||
# 3. Supporting checking the availability of trade decision
|
||||
class BaseSignalStrategy(BaseStrategy):
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
topk,
|
||||
n_drop,
|
||||
signal: Union[Signal, Tuple[BaseModel, Dataset], List, Dict, Text, pd.Series, pd.DataFrame] = None,
|
||||
method_sell="bottom",
|
||||
method_buy="top",
|
||||
risk_degree=0.95,
|
||||
hold_thresh=1,
|
||||
only_tradable=False,
|
||||
model=None,
|
||||
dataset=None,
|
||||
risk_degree: float = 0.95,
|
||||
trade_exchange=None,
|
||||
level_infra=None,
|
||||
common_infra=None,
|
||||
model=None,
|
||||
dataset=None,
|
||||
**kwargs,
|
||||
):
|
||||
"""
|
||||
Parameters
|
||||
-----------
|
||||
topk : int
|
||||
the number of stocks in the portfolio.
|
||||
n_drop : int
|
||||
number of stocks to be replaced in each trading date.
|
||||
signal :
|
||||
the information to describe a signal. Please refer to the docs of `qlib.backtest.signal.create_signal_from`
|
||||
the decision of the strategy will base on the given signal
|
||||
method_sell : str
|
||||
dropout method_sell, random/bottom.
|
||||
method_buy : str
|
||||
dropout method_buy, random/top.
|
||||
risk_degree : float
|
||||
position percentage of total value.
|
||||
hold_thresh : int
|
||||
minimum holding days
|
||||
before sell stock , will check current.get_stock_count(order.stock_id) >= self.hold_thresh.
|
||||
only_tradable : bool
|
||||
will the strategy only consider the tradable stock when buying and selling.
|
||||
if only_tradable:
|
||||
strategy will make buy sell decision without checking the tradable state of the stock.
|
||||
else:
|
||||
strategy will make decision with the tradable state of the stock info and avoid buy and sell them.
|
||||
trade_exchange : Exchange
|
||||
exchange that provides market info, used to deal order and generate report
|
||||
- If `trade_exchange` is None, self.trade_exchange will be set with common_infra
|
||||
@@ -74,16 +53,9 @@ class TopkDropoutStrategy(BaseStrategy):
|
||||
- In minutely execution, the daily exchange is not usable, only the minutely exchange is recommended.
|
||||
|
||||
"""
|
||||
super(TopkDropoutStrategy, self).__init__(
|
||||
level_infra=level_infra, common_infra=common_infra, trade_exchange=trade_exchange, **kwargs
|
||||
)
|
||||
self.topk = topk
|
||||
self.n_drop = n_drop
|
||||
self.method_sell = method_sell
|
||||
self.method_buy = method_buy
|
||||
super().__init__(level_infra=level_infra, common_infra=common_infra, trade_exchange=trade_exchange, **kwargs)
|
||||
|
||||
self.risk_degree = risk_degree
|
||||
self.hold_thresh = hold_thresh
|
||||
self.only_tradable = only_tradable
|
||||
|
||||
# This is trying to be compatible with previous version of qlib task config
|
||||
if model is not None and dataset is not None:
|
||||
@@ -97,15 +69,65 @@ class TopkDropoutStrategy(BaseStrategy):
|
||||
Return the proportion of your total value you will used in investment.
|
||||
Dynamically risk_degree will result in Market timing.
|
||||
"""
|
||||
# It will use 95% amoutn of your total value by default
|
||||
# It will use 95% amount of your total value by default
|
||||
return self.risk_degree
|
||||
|
||||
|
||||
class TopkDropoutStrategy(BaseSignalStrategy):
|
||||
# TODO:
|
||||
# 1. Supporting leverage the get_range_limit result from the decision
|
||||
# 2. Supporting alter_outer_trade_decision
|
||||
# 3. Supporting checking the availability of trade decision
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
topk,
|
||||
n_drop,
|
||||
method_sell="bottom",
|
||||
method_buy="top",
|
||||
hold_thresh=1,
|
||||
only_tradable=False,
|
||||
**kwargs,
|
||||
):
|
||||
"""
|
||||
Parameters
|
||||
-----------
|
||||
topk : int
|
||||
the number of stocks in the portfolio.
|
||||
n_drop : int
|
||||
number of stocks to be replaced in each trading date.
|
||||
method_sell : str
|
||||
dropout method_sell, random/bottom.
|
||||
method_buy : str
|
||||
dropout method_buy, random/top.
|
||||
hold_thresh : int
|
||||
minimum holding days
|
||||
before sell stock , will check current.get_stock_count(order.stock_id) >= self.hold_thresh.
|
||||
only_tradable : bool
|
||||
will the strategy only consider the tradable stock when buying and selling.
|
||||
if only_tradable:
|
||||
strategy will make decision with the tradable state of the stock info and avoid buy and sell them.
|
||||
else:
|
||||
strategy will make buy sell decision without checking the tradable state of the stock.
|
||||
"""
|
||||
super().__init__(**kwargs)
|
||||
self.topk = topk
|
||||
self.n_drop = n_drop
|
||||
self.method_sell = method_sell
|
||||
self.method_buy = method_buy
|
||||
self.hold_thresh = hold_thresh
|
||||
self.only_tradable = only_tradable
|
||||
|
||||
def generate_trade_decision(self, execute_result=None):
|
||||
# get the number of trading step finished, trade_step can be [0, 1, 2, ..., trade_len - 1]
|
||||
trade_step = self.trade_calendar.get_trade_step()
|
||||
trade_start_time, trade_end_time = self.trade_calendar.get_step_time(trade_step)
|
||||
pred_start_time, pred_end_time = self.trade_calendar.get_step_time(trade_step, shift=1)
|
||||
pred_score = self.signal.get_signal(start_time=pred_start_time, end_time=pred_end_time)
|
||||
# NOTE: the current version of topk dropout strategy can't handle pd.DataFrame(multiple signal)
|
||||
# So it only leverages the first column of the signal
|
||||
if isinstance(pred_score, pd.DataFrame):
|
||||
pred_score = pred_score.iloc[:, 0]
|
||||
if pred_score is None:
|
||||
return TradeDecisionWO([], self)
|
||||
if self.only_tradable:
|
||||
@@ -253,7 +275,7 @@ class TopkDropoutStrategy(BaseStrategy):
|
||||
return TradeDecisionWO(sell_order_list + buy_order_list, self)
|
||||
|
||||
|
||||
class WeightStrategyBase(BaseStrategy):
|
||||
class WeightStrategyBase(BaseSignalStrategy):
|
||||
# TODO:
|
||||
# 1. Supporting leverage the get_range_limit result from the decision
|
||||
# 2. Supporting alter_outer_trade_decision
|
||||
@@ -261,11 +283,7 @@ class WeightStrategyBase(BaseStrategy):
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
signal: Union[Signal, Tuple[BaseModel, Dataset], List, Dict, Text, pd.Series, pd.DataFrame],
|
||||
order_generator_cls_or_obj=OrderGenWInteract,
|
||||
trade_exchange=None,
|
||||
level_infra=None,
|
||||
common_infra=None,
|
||||
order_generator_cls_or_obj=OrderGenWOInteract,
|
||||
**kwargs,
|
||||
):
|
||||
"""
|
||||
@@ -280,24 +298,13 @@ class WeightStrategyBase(BaseStrategy):
|
||||
- In daily execution, both daily exchange and minutely are usable, but the daily exchange is recommended because it run faster.
|
||||
- In minutely execution, the daily exchange is not usable, only the minutely exchange is recommended.
|
||||
"""
|
||||
super(WeightStrategyBase, self).__init__(
|
||||
level_infra=level_infra, common_infra=common_infra, trade_exchange=trade_exchange, **kwargs
|
||||
)
|
||||
super().__init__(**kwargs)
|
||||
|
||||
if isinstance(order_generator_cls_or_obj, type):
|
||||
self.order_generator = order_generator_cls_or_obj()
|
||||
else:
|
||||
self.order_generator = order_generator_cls_or_obj
|
||||
|
||||
self.signal: Signal = create_signal_from(signal)
|
||||
|
||||
def get_risk_degree(self, trade_step=None):
|
||||
"""get_risk_degree
|
||||
Return the proportion of your total value you will used in investment.
|
||||
Dynamically risk_degree will result in Market timing.
|
||||
"""
|
||||
# It will use 95% amoutn of your total value by default
|
||||
return 0.95
|
||||
|
||||
def generate_target_weight_position(self, score, current, trade_start_time, trade_end_time):
|
||||
"""
|
||||
Generate target position from score for this date and the current position.The cash is not considered in the position
|
||||
@@ -341,3 +348,154 @@ class WeightStrategyBase(BaseStrategy):
|
||||
trade_end_time=trade_end_time,
|
||||
)
|
||||
return TradeDecisionWO(order_list, self)
|
||||
|
||||
|
||||
class EnhancedIndexingStrategy(WeightStrategyBase):
|
||||
|
||||
"""Enhanced Indexing Strategy
|
||||
|
||||
Enhanced indexing combines the arts of active management and passive management,
|
||||
with the aim of outperforming a benchmark index (e.g., S&P 500) in terms of
|
||||
portfolio return while controlling the risk exposure (a.k.a. tracking error).
|
||||
|
||||
Users need to prepare their risk model data like below:
|
||||
|
||||
├── /path/to/riskmodel
|
||||
├──── 20210101
|
||||
├────── factor_exp.{csv|pkl|h5}
|
||||
├────── factor_cov.{csv|pkl|h5}
|
||||
├────── specific_risk.{csv|pkl|h5}
|
||||
├────── blacklist.{csv|pkl|h5} # optional
|
||||
|
||||
The risk model data can be obtained from risk data provider. You can also use
|
||||
`qlib.model.riskmodel.structured.StructuredCovEstimator` to prepare these data.
|
||||
|
||||
Args:
|
||||
riskmodel_path (str): risk model path
|
||||
name_mapping (dict): alternative file names
|
||||
"""
|
||||
|
||||
FACTOR_EXP_NAME = "factor_exp.pkl"
|
||||
FACTOR_COV_NAME = "factor_cov.pkl"
|
||||
SPECIFIC_RISK_NAME = "specific_risk.pkl"
|
||||
BLACKLIST_NAME = "blacklist.pkl"
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
riskmodel_root,
|
||||
market="csi500",
|
||||
turn_limit=None,
|
||||
name_mapping={},
|
||||
optimizer_kwargs={},
|
||||
verbose=False,
|
||||
**kwargs,
|
||||
):
|
||||
super().__init__(**kwargs)
|
||||
|
||||
self.logger = get_module_logger("EnhancedIndexingStrategy")
|
||||
|
||||
self.riskmodel_root = riskmodel_root
|
||||
self.market = market
|
||||
self.turn_limit = turn_limit
|
||||
|
||||
self.factor_exp_path = name_mapping.get("factor_exp", self.FACTOR_EXP_NAME)
|
||||
self.factor_cov_path = name_mapping.get("factor_cov", self.FACTOR_COV_NAME)
|
||||
self.specific_risk_path = name_mapping.get("specific_risk", self.SPECIFIC_RISK_NAME)
|
||||
self.blacklist_path = name_mapping.get("blacklist", self.BLACKLIST_NAME)
|
||||
|
||||
self.optimizer = EnhancedIndexingOptimizer(**optimizer_kwargs)
|
||||
|
||||
self.verbose = verbose
|
||||
|
||||
self._riskdata_cache = {}
|
||||
|
||||
def get_risk_data(self, date):
|
||||
|
||||
if date in self._riskdata_cache:
|
||||
return self._riskdata_cache[date]
|
||||
|
||||
root = self.riskmodel_root + "/" + date.strftime("%Y%m%d")
|
||||
if not os.path.exists(root):
|
||||
return None
|
||||
|
||||
factor_exp = load_dataset(root + "/" + self.factor_exp_path, index_col=[0])
|
||||
factor_cov = load_dataset(root + "/" + self.factor_cov_path, index_col=[0])
|
||||
specific_risk = load_dataset(root + "/" + self.specific_risk_path, index_col=[0])
|
||||
|
||||
if not factor_exp.index.equals(specific_risk.index):
|
||||
# NOTE: for stocks missing specific_risk, we always assume they have the highest volatility
|
||||
specific_risk = specific_risk.reindex(factor_exp.index, fill_value=specific_risk.max())
|
||||
|
||||
universe = factor_exp.index.tolist()
|
||||
|
||||
blacklist = []
|
||||
if os.path.exists(root + "/" + self.blacklist_path):
|
||||
blacklist = load_dataset(root + "/" + self.blacklist_path).index.tolist()
|
||||
|
||||
self._riskdata_cache[date] = factor_exp.values, factor_cov.values, specific_risk.values, universe, blacklist
|
||||
|
||||
return self._riskdata_cache[date]
|
||||
|
||||
def generate_target_weight_position(self, score, current, trade_start_time, trade_end_time):
|
||||
|
||||
trade_date = trade_start_time
|
||||
pre_date = get_pre_trading_date(trade_date, future=True) # previous trade date
|
||||
|
||||
# load risk data
|
||||
outs = self.get_risk_data(pre_date)
|
||||
if outs is None:
|
||||
self.logger.warning(f"no risk data for {pre_date:%Y-%m-%d}, skip optimization")
|
||||
return None
|
||||
factor_exp, factor_cov, specific_risk, universe, blacklist = outs
|
||||
|
||||
# transform score
|
||||
# NOTE: for stocks missing score, we always assume they have the lowest score
|
||||
score = score.reindex(universe).fillna(score.min()).values
|
||||
|
||||
# get current weight
|
||||
# NOTE: if a stock is not in universe, its current weight will be zero
|
||||
cur_weight = current.get_stock_weight_dict(only_stock=False)
|
||||
cur_weight = np.array([cur_weight.get(stock, 0) for stock in universe])
|
||||
assert all(cur_weight >= 0), "current weight has negative values"
|
||||
cur_weight = cur_weight / self.get_risk_degree(trade_date) # sum of weight should be risk_degree
|
||||
if cur_weight.sum() > 1 and self.verbose:
|
||||
self.logger.warning(f"previous total holdings exceed risk degree (current: {cur_weight.sum()})")
|
||||
|
||||
# load bench weight
|
||||
bench_weight = D.features(
|
||||
D.instruments("all"), [f"${self.market}_weight"], start_time=pre_date, end_time=pre_date
|
||||
).squeeze()
|
||||
bench_weight.index = bench_weight.index.droplevel(level="datetime")
|
||||
bench_weight = bench_weight.reindex(universe).fillna(0).values
|
||||
|
||||
# whether stock tradable
|
||||
# NOTE: currently we use last day volume to check whether tradable
|
||||
tradable = D.features(D.instruments("all"), ["$volume"], start_time=pre_date, end_time=pre_date).squeeze()
|
||||
tradable.index = tradable.index.droplevel(level="datetime")
|
||||
tradable = tradable.reindex(universe).gt(0).values
|
||||
mask_force_hold = ~tradable
|
||||
|
||||
# mask force sell
|
||||
mask_force_sell = np.array([stock in blacklist for stock in universe], dtype=bool)
|
||||
|
||||
# optimize
|
||||
weight = self.optimizer(
|
||||
r=score,
|
||||
F=factor_exp,
|
||||
cov_b=factor_cov,
|
||||
var_u=specific_risk ** 2,
|
||||
w0=cur_weight,
|
||||
wb=bench_weight,
|
||||
mfh=mask_force_hold,
|
||||
mfs=mask_force_sell,
|
||||
)
|
||||
|
||||
target_weight_position = {stock: weight for stock, weight in zip(universe, weight) if weight > 0}
|
||||
|
||||
if self.verbose:
|
||||
self.logger.info("trade date: {:%Y-%m-%d}".format(trade_date))
|
||||
self.logger.info("number of holding stocks: {}".format(len(target_weight_position)))
|
||||
self.logger.info("total holding weight: {:.6f}".format(weight.sum()))
|
||||
|
||||
return target_weight_position
|
||||
|
||||
31
qlib/contrib/torch.py
Normal file
@@ -0,0 +1,31 @@
|
||||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT License.
|
||||
"""
|
||||
This module is not a necessary part of Qlib.
|
||||
They are just some tools for convenience
|
||||
It should not be imported into the core part of qlib
|
||||
"""
|
||||
import torch
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
|
||||
|
||||
def data_to_tensor(data, device="cpu", raise_error=False):
|
||||
if isinstance(data, torch.Tensor):
|
||||
if device == "cpu":
|
||||
return data.cpu()
|
||||
else:
|
||||
return data.to(device)
|
||||
if isinstance(data, (pd.DataFrame, pd.Series)):
|
||||
return data_to_tensor(torch.from_numpy(data.values).float(), device)
|
||||
elif isinstance(data, np.ndarray):
|
||||
return data_to_tensor(torch.from_numpy(data).float(), device)
|
||||
elif isinstance(data, (tuple, list)):
|
||||
return [data_to_tensor(i, device) for i in data]
|
||||
elif isinstance(data, dict):
|
||||
return {k: data_to_tensor(v, device) for k, v in data.items()}
|
||||
else:
|
||||
if raise_error:
|
||||
raise ValueError(f"Unsupported data type: {type(data)}.")
|
||||
else:
|
||||
return data
|
||||
@@ -90,7 +90,7 @@ class QLibTuner(Tuner):
|
||||
|
||||
def objective(self, params):
|
||||
|
||||
# 1. Setup an config for a spcific estimator process
|
||||
# 1. Setup an config for a specific estimator process
|
||||
estimator_path = self.setup_estimator_config(params)
|
||||
self.logger.info("Searching params: {} ".format(params))
|
||||
|
||||
|
||||
@@ -359,7 +359,7 @@ class ExpressionCache(BaseProviderCache):
|
||||
def update(self, cache_uri: Union[str, Path], freq: str = "day"):
|
||||
"""Update expression cache to latest calendar.
|
||||
|
||||
Overide this method to define how to update expression cache corresponding to users' own cache mechanism.
|
||||
Override this method to define how to update expression cache corresponding to users' own cache mechanism.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
@@ -445,7 +445,7 @@ class DatasetCache(BaseProviderCache):
|
||||
def update(self, cache_uri: Union[str, Path], freq: str = "day"):
|
||||
"""Update dataset cache to latest calendar.
|
||||
|
||||
Overide this method to define how to update dataset cache corresponding to users' own cache mechanism.
|
||||
Override this method to define how to update dataset cache corresponding to users' own cache mechanism.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
@@ -543,7 +543,7 @@ class DiskExpressionCache(ExpressionCache):
|
||||
# instance
|
||||
series = self.provider.expression(instrument, field, _calendar[0], _calendar[-1], freq)
|
||||
if not series.empty:
|
||||
# This expresion is empty, we don't generate any cache for it.
|
||||
# This expression is empty, we don't generate any cache for it.
|
||||
with CacheUtils.writer_lock(self.r, f"{str(C.dpm.get_data_uri(freq))}:expression-{_cache_uri}"):
|
||||
self.gen_expression_cache(
|
||||
expression_data=series,
|
||||
@@ -858,7 +858,7 @@ class DiskDatasetCache(DatasetCache):
|
||||
"""gen_dataset_cache
|
||||
|
||||
.. note:: This function does not consider the cache read write lock. Please
|
||||
Aquire the lock outside this function
|
||||
Acquire the lock outside this function
|
||||
|
||||
The format the cache contains 3 parts(followed by typical filename).
|
||||
|
||||
@@ -1035,7 +1035,7 @@ class DiskDatasetCache(DatasetCache):
|
||||
# FIXME:
|
||||
# Because the feature cache are stored as .bin file.
|
||||
# So the series read from features are all float32.
|
||||
# However, the first dataset cache is calulated based on the
|
||||
# However, the first dataset cache is calculated based on the
|
||||
# raw data. So the data type may be float64.
|
||||
# Different data type will result in failure of appending data
|
||||
if "/{}".format(DatasetCache.HDF_KEY) in store.keys():
|
||||
|
||||
@@ -58,7 +58,7 @@ class Client:
|
||||
msg_proc_func : func
|
||||
the function to process the message when receiving response, should have arg `*args`.
|
||||
msg_queue: Queue
|
||||
The queue to pass the messsage after callback.
|
||||
The queue to pass the message after callback.
|
||||
"""
|
||||
head_info = {"version": qlib.__version__}
|
||||
|
||||
|
||||
@@ -16,7 +16,7 @@ from multiprocessing import Pool
|
||||
from typing import Iterable, Union
|
||||
from typing import List, Union
|
||||
|
||||
# For supporting multiprocessing in outter code, joblib is used
|
||||
# For supporting multiprocessing in outer code, joblib is used
|
||||
from joblib import delayed
|
||||
|
||||
from .cache import H
|
||||
@@ -55,15 +55,6 @@ class ProviderBackendMixin:
|
||||
def backend_obj(self, **kwargs):
|
||||
backend = self.backend if self.backend else self.get_default_backend()
|
||||
backend = copy.deepcopy(backend)
|
||||
|
||||
# set default storage kwargs
|
||||
# NOTE: provider_uri priority:
|
||||
# 1. backend_config: backend_obj["kwargs"]["provider_uri"]
|
||||
# 2. qlib.init: provider_uri
|
||||
backend_kwargs = backend.setdefault("kwargs", {})
|
||||
provider_uri = backend_kwargs.get("provider_uri", None)
|
||||
provider_uri = C.dpm.provider_uri if provider_uri is None else C.dpm.format_provider_uri(provider_uri)
|
||||
backend_kwargs["provider_uri"] = provider_uri
|
||||
backend.setdefault("kwargs", {}).update(**kwargs)
|
||||
return init_instance_by_config(backend)
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
from ...utils.serial import Serializable
|
||||
from typing import Union, List, Tuple, Dict, Text, Optional
|
||||
from typing import Callable, Union, List, Tuple, Dict, Text, Optional
|
||||
from ...utils import init_instance_by_config, np_ffill, time_to_slc_point
|
||||
from ...log import get_module_logger
|
||||
from .handler import DataHandler, DataHandlerLP
|
||||
@@ -235,6 +235,28 @@ class DatasetH(Dataset):
|
||||
else:
|
||||
raise NotImplementedError(f"This type of input is not supported")
|
||||
|
||||
# helper functions
|
||||
@staticmethod
|
||||
def get_min_time(segments):
|
||||
return DatasetH._get_extrema(segments, 0, (lambda a, b: a > b))
|
||||
|
||||
@staticmethod
|
||||
def get_max_time(segments):
|
||||
return DatasetH._get_extrema(segments, 1, (lambda a, b: a < b))
|
||||
|
||||
@staticmethod
|
||||
def _get_extrema(segments, idx: int, cmp: Callable, key_func=pd.Timestamp):
|
||||
"""it will act like sort and return the max value or None"""
|
||||
candidate = None
|
||||
for k, seg in segments.items():
|
||||
point = seg[idx]
|
||||
if point is None:
|
||||
# None indicates unbounded, return directly
|
||||
return None
|
||||
elif candidate is None or cmp(key_func(candidate), key_func(point)):
|
||||
candidate = point
|
||||
return candidate
|
||||
|
||||
|
||||
class TSDataSampler:
|
||||
"""
|
||||
@@ -392,7 +414,7 @@ class TSDataSampler:
|
||||
2021-01-14 12441 12442 12443 12444 12445 12446 ...
|
||||
2) the second element: {<original index>: <row, col>}
|
||||
"""
|
||||
# object incase of pandas converting int to flaot
|
||||
# object incase of pandas converting int to float
|
||||
idx_df = pd.Series(range(data.shape[0]), index=data.index, dtype=object)
|
||||
idx_df = lazy_sort_index(idx_df.unstack())
|
||||
# NOTE: the correctness of `__getitem__` depends on columns sorted here
|
||||
@@ -572,7 +594,7 @@ class TSDatasetH(DatasetH):
|
||||
flt_kwargs = deepcopy(kwargs)
|
||||
if flt_col is not None:
|
||||
flt_kwargs["col_set"] = flt_col
|
||||
flt_data = self._prepare_seg(ext_slice, **flt_kwargs)
|
||||
flt_data = super()._prepare_seg(ext_slice, **flt_kwargs)
|
||||
assert len(flt_data.columns) == 1
|
||||
else:
|
||||
flt_data = None
|
||||
|
||||
Some files were not shown because too many files have changed in this diff.