1
0
mirror of https://github.com/microsoft/qlib.git synced 2026-06-06 05:51:17 +08:00
Files
qlib/docs/hidden/tuner.rst
wangwenxi-handsome 3760a18a8d Merge nested main (#597)
* MVP for Indian Stocks in qlib using yahooquery

* cleaned with black

* cleaned with black

* add YahooNormalizeIN and YahooNormalizeIN1d

* cleaned the code

* added 1min for IN and also updated readme

* update comments

* fix comments

* recorder support upload both raw file and directory

* fix comments

* Update README.md

* Fix docs of QlibRecorder

* sort index after loader (#538)

make sure the fetch method is based on a index-sorted pd.DataFrame

* refactor online serving rolling api

* refactor TRA

* format by black

* fix horizon

* fix TRA when use single head

* clean up

* improve pretrain

* update README

* fix tra when logdir is None

* fix tra when logdir is None

* Update strategy.py

* Update README.md

* Update README.md

* Conda Suggestion

* code standard docs

* Update ensemble.py (#560)

* Fix CI  Bug (#575)


Co-authored-by: yuxwang <anduinnn@foxmail.com>

* Update gen.py (#576)

* Fix multi-process loop calls (#574)

* check lexsort in the 'lazy_sort_index' function (#566)

* check lexsort

* check lexsort

* lexsort comment

* lexsort comment

* Delete .DS_Store

* Update README.md

* bug fix & use oracle transport pretrain

* mend

* Add `backend_freq_config` parameter, support multi-freq uri

* Add sample_config to QlibDataLoader, support multi-freq

* add multi-freq example

* get_cls_kwargs renamed get_callable_kwargs

* support multi-freq uri

* Add inst_processors to D.features

* Fix typo

* Fix the index type of the multi-freq example

* Fix duplicate mlflow directories in tests

* Add DataPathManager to QlibConfig && modify inst_processors to supports list only

* Modify the default value in the multi_freq example

* Modify client-server mode and dataset-cache to disable inst_processor

* Add wheel package to github CI

* fix comment

* Update FAQ.rst

* Update README.md

Fix wrong link

* Update the docs of TaskManager (#586)

* Update manage.py

* update yaml

* update run_all_model

* Modify the Feature to be case sensitive (#589)

* update README

* remove verbose

* fix spell bug

* fix typos (#592)

* Update Release Note

* fix portfolio bug

* Add calendar support for resample

* add freq kwargs

* test.yml: Remove redundant code (#595)

* Supporting shared processor (#596)

* Supporting shared processor

* fix readonly reverse bug

* remove pytests dependency

* with fit bug

* fix parameter error

* fix comments

* Fix undefined names in Python code (#599)

* Update pytorch_tabnet.py

$ `flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics`
```
./qlib/qlib/contrib/model/pytorch_tabnet.py:567:38: F821 undefined name 'inp'
            self.independ.append(GLU(inp, out_dim, vbs=vbs))
                                     ^
./qlib/examples/model_rolling/task_manager_rolling.py:75:18: F821 undefined name 'task_train'
        run_task(task_train, self.task_pool, experiment_name=self.experiment_name)
                 ^
2     F821 undefined name 'task_train'
2
```

* Fix undefined names in Python code

* from qlib.model.trainer import task_train

* update seed

* fix some docstring

* add comments

* Fix SimpleDatasetCache

* Update setup.py

updated classifiers

* Update setup.py

change to matplotlib==3.3

* Update python-publish.yml

added python 3.9

* updategrade version number

* Update model list

* fix the type of filter_pipe

* fix comment

* fix record_temp

* update cvxpy version

* Update code_standard.rst (#587)

* Update code_standard.rst

* Update docs/developer/code_standard.rst

Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>

Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>

* Add file lock for MLflowExpManager (#619)

* fix torch version

* Share version number (#620)

* Update initialization.rst (#622)

* Update initialization.rst

* Update docs/start/initialization.rst

Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>

* Update docs/start/initialization.rst

Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>

Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>

* fix bugs for running previous exmaple

* fix deal amount bug

* update change doc (#623)

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Delete change doc.gif

* Add files via upload

* Update README.md

* Delete change doc.gif

* Add files via upload

* Delete change doc.gif

* Add files via upload

* Update README.md

Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>

Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>

* update doc

* simplify run all model

* fix run all model bug

* Fix Models (#483)

* fix gat dataset

* fix tft model

* Update tft.py

* Fix tft.py

Co-authored-by: Pengrong Zhu <zhu.pengrong@foxmail.com>

* type and skip empty exp

* fix model yaml config

* fix tft import bug

* skip empty result

* fix model and yaml bug

* fix wrong generate parameter

* Modify multi-freq example (#626)

* modify the example of multi-freq

* add Copyright

* add a comment to average_ops.py

* modify the example of multi-freq

* add comment to multi_freq_handler.py

* add the Ref expression description to multi_freq_handler.py

* add expression description to multi_freq_handler.py

* update images

* fix workflow and update framework

Co-authored-by: Gaurav <2796gaurav@gmail.com>
Co-authored-by: 2796gaurav <17353992+2796gaurav@users.noreply.github.com>
Co-authored-by: bxdd <bxd98@126.com>
Co-authored-by: Young <afe.young@gmail.com>
Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>
Co-authored-by: Dong Zhou <Zhou.Dong@microsoft.com>
Co-authored-by: ZhangTP1996 <ztp18@mails.tsinghua.edu.cn>
Co-authored-by: demon143 <59681577+demon143@users.noreply.github.com>
Co-authored-by: Wangwuyi123 <51237097+Wangwuyi123@users.noreply.github.com>
Co-authored-by: yuxwang <anduinnn@foxmail.com>
Co-authored-by: Pengrong Zhu <zhu.pengrong@foxmail.com>
Co-authored-by: Mark Zhao <50850474+markzhao98@users.noreply.github.com>
Co-authored-by: cslwqxx <cslwqxx@users.noreply.github.com>
Co-authored-by: Dong Zhou <evanzd@users.noreply.github.com>
Co-authored-by: SaintMalik <37118134+saintmalik@users.noreply.github.com>
Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: Anurag Kumar <mailanu98@gmail.com>
Co-authored-by: demon143 <785696300@qq.com>
2021-10-01 02:15:30 +08:00

326 lines
14 KiB
ReStructuredText

.. _tuner:
Tuner
===================
.. currentmodule:: qlib
Introduction
-------------------
Welcome to use Tuner, this document is based on that you can use Estimator proficiently and correctly.
You can find the optimal hyper-parameters and combinations of models, trainers, strategies and data labels.
The usage of program `tuner` is similar with `estimator`, you need provide the URL of the configuration file.
The `tuner` will do the following things:
- Construct tuner pipeline
- Search and save best hyper-parameters of one tuner
- Search next tuner in pipeline
- Save the global best hyper-parameters and combination
Each tuner is consisted with a kind of combination of modules, and its goal is searching the optimal hyper-parameters of this combination.
The pipeline is consisted with different tuners, it is aim at finding the optimal combination of modules.
The result will be printed on screen and saved in file, you can check the result in your experiment saving files.
Example
~~~~~~~
Let's see an example,
First make sure you have the latest version of `qlib` installed.
Then, you need to privide a configuration to setup the experiment.
We write a simple configuration example as following,
.. code-block:: YAML
experiment:
name: tuner_experiment
tuner_class: QLibTuner
qlib_client:
auto_mount: False
logging_level: INFO
optimization_criteria:
report_type: model
report_factor: model_score
optim_type: max
tuner_pipeline:
-
model:
class: SomeModel
space: SomeModelSpace
trainer:
class: RollingTrainer
strategy:
class: TopkAmountStrategy
space: TopkAmountStrategySpace
max_evals: 2
time_period:
rolling_period: 360
train_start_date: 2005-01-01
train_end_date: 2014-12-31
validate_start_date: 2015-01-01
validate_end_date: 2016-06-30
test_start_date: 2016-07-01
test_end_date: 2018-04-30
data:
class: ALPHA360
provider_uri: /data/qlib
args:
start_date: 2005-01-01
end_date: 2018-04-30
dropna_label: True
dropna_feature: True
filter:
market: csi500
filter_pipeline:
-
class: NameDFilter
module_path: qlib.data.filter
args:
name_rule_re: S(?!Z3)
fstart_time: 2018-01-01
fend_time: 2018-12-11
-
class: ExpressionDFilter
module_path: qlib.data.filter
args:
rule_expression: $open/$factor<=45
fstart_time: 2018-01-01
fend_time: 2018-12-11
backtest:
normal_backtest_args:
limit_threshold: 0.095
account: 500000
benchmark: SH000905
deal_price: vwap
long_short_backtest_args:
topk: 50
Next, we run the following command, and you can see:
.. code-block:: bash
~/v-yindzh/Qlib/cfg$ tuner -c tuner_config.yaml
Searching params: {'model_space': {'colsample_bytree': 0.8870905643607678, 'lambda_l1': 472.3188735122233, 'lambda_l2': 92.75390994877243, 'learning_rate': 0.09741751430635413, 'loss': 'mse', 'max_depth': 8, 'num_leaves': 160, 'num_threads': 20, 'subsample': 0.7536051584789751}, 'strategy_space': {'buffer_margin': 250, 'topk': 40}}
...
(Estimator experiment screen log)
...
Searching params: {'model_space': {'colsample_bytree': 0.6667379039007301, 'lambda_l1': 382.10698024977904, 'lambda_l2': 117.02506488151757, 'learning_rate': 0.18514539615228137, 'loss': 'mse', 'max_depth': 6, 'num_leaves': 200, 'num_threads': 12, 'subsample': 0.9449255686969292}, 'strategy_space': {'buffer_margin': 200, 'topk': 30}}
...
(Estimator experiment screen log)
...
Local best params: {'model_space': {'colsample_bytree': 0.6667379039007301, 'lambda_l1': 382.10698024977904, 'lambda_l2': 117.02506488151757, 'learning_rate': 0.18514539615228137, 'loss': 'mse', 'max_depth': 6, 'num_leaves': 200, 'num_threads': 12, 'subsample': 0.9449255686969292}, 'strategy_space': {'buffer_margin': 200, 'topk': 30}}
Time cost: 489.87220 | Finished searching best parameters in Tuner 0.
Time cost: 0.00069 | Finished saving local best tuner parameters to: tuner_experiment/estimator_experiment/estimator_experiment_0/local_best_params.json .
Searching params: {'data_label_space': {'labels': ('Ref($vwap, -2)/Ref($vwap, -1) - 2',)}, 'model_space': {'input_dim': 158, 'lr': 0.001, 'lr_decay': 0.9100529502185579, 'lr_decay_steps': 162.48901403763966, 'optimizer': 'gd', 'output_dim': 1}, 'strategy_space': {'buffer_margin': 300, 'topk': 35}}
...
(Estimator experiment screen log)
...
Searching params: {'data_label_space': {'labels': ('Ref($vwap, -2)/Ref($vwap, -1) - 1',)}, 'model_space': {'input_dim': 158, 'lr': 0.1, 'lr_decay': 0.9882802970847494, 'lr_decay_steps': 164.76742865207729, 'optimizer': 'adam', 'output_dim': 1}, 'strategy_space': {'buffer_margin': 250, 'topk': 35}}
...
(Estimator experiment screen log)
...
Local best params: {'data_label_space': {'labels': ('Ref($vwap, -2)/Ref($vwap, -1) - 1',)}, 'model_space': {'input_dim': 158, 'lr': 0.1, 'lr_decay': 0.9882802970847494, 'lr_decay_steps': 164.76742865207729, 'optimizer': 'adam', 'output_dim': 1}, 'strategy_space': {'buffer_margin': 250, 'topk': 35}}
Time cost: 550.74039 | Finished searching best parameters in Tuner 1.
Time cost: 0.00023 | Finished saving local best tuner parameters to: tuner_experiment/estimator_experiment/estimator_experiment_1/local_best_params.json .
Time cost: 1784.14691 | Finished tuner pipeline.
Time cost: 0.00014 | Finished save global best tuner parameters.
Best Tuner id: 0.
You can check the best parameters at tuner_experiment/global_best_params.json.
Finally, you can check the results of your experiment in the given path.
Configuration file
------------------
Before using `tuner`, you need to prepare a configuration file. Next we will show you how to prepare each part of the configuration file.
About the experiment
~~~~~~~~~~~~~~~~~~~~
First, your configuration file needs to have a field about the experiment, whose key is `experiment`, this field and its contents determine the saving path and tuner class.
Usually it should contain the following content:
.. code-block:: YAML
experiment:
name: tuner_experiment
tuner_class: QLibTuner
Also, there are some optional fields. The meaning of each field is as follows:
- `name`
The experiment name, str type, the program will use this experiment name to construct a directory to save the process of the whole experiment and the results. The default value is `tuner_experiment`.
- `dir`
The saving path, str type, the program will construct the experiment directory in this path. The default value is the path where configuration locate.
- `tuner_class`
The class of tuner, str type, must be an already implemented model, such as `QLibTuner` in `qlib`, or a custom tuner, but it must be a subclass of `qlib.contrib.tuner.Tuner`, the default value is `QLibTuner`.
- `tuner_module_path`
The module path, str type, absolute url is also supported, indicates the path of the implementation of tuner. The default value is `qlib.contrib.tuner.tuner`
About the optimization criteria
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You need to designate a factor to optimize, for tuner need a factor to decide which case is better than other cases.
Usually, we use the result of `estimator`, such as backtest results and the score of model.
This part needs contain these fields:
.. code-block:: YAML
optimization_criteria:
report_type: model
report_factor: model_pearsonr
optim_type: max
- `report_type`
The type of the report, str type, determines which kind of report you want to use. If you want to use the backtest result type, you can choose `pred_long`, `pred_long_short`, `pred_short`, `excess_return_without_cost` and `excess_return_with_cost`. If you want to use the model result type, you can only choose `model`.
- `report_factor`
The factor you want to use in the report, str type, determines which factor you want to optimize. If your `report_type` is backtest result type, you can choose `annualized_return`, `information_ratio`, `max_drawdown`, `mean` and `std`. If your `report_type` is model result type, you can choose `model_score` and `model_pearsonr`.
- `optim_type`
The optimization type, str type, determines what kind of optimization you want to do. you can minimize the factor or maximize the factor, so you can choose `max`, `min` or `correlation` at this field.
Note: `correlation` means the factor's best value is 1, such as `model_pearsonr` (a corraltion coefficient).
If you want to process the factor or you want fetch other kinds of factor, you can override the `objective` method in your own tuner.
About the tuner pipeline
~~~~~~~~~~~~~~~~~~~~~~~~
The tuner pipeline contains different tuners, and the `tuner` program will process each tuner in pipeline. Each tuner will get an optimal hyper-parameters of its specific combination of modules. The pipeline will contrast the results of each tuner, and get the best combination and its optimal hyper-parameters. So, you need to configurate the pipeline and each tuner, here is an example:
.. code-block:: YAML
tuner_pipeline:
-
model:
class: SomeModel
space: SomeModelSpace
trainer:
class: RollingTrainer
strategy:
class: TopkAmountStrategy
space: TopkAmountStrategySpace
max_evals: 2
Each part represents a tuner, and its modules which are to be tuned. Space in each part is the hyper-parameters' space of a certain module, you need to create your searching space and modify it in `/qlib/contrib/tuner/space.py`. We use `hyperopt` package to help us to construct the space, you can see the detail of how to use it in https://github.com/hyperopt/hyperopt/wiki/FMin .
- model
You need to provide the `class` and the `space` of the model. If the model is user's own implementation, you need to privide the `module_path`.
- trainer
You need to proveide the `class` of the trainer. If the trainer is user's own implementation, you need to privide the `module_path`.
- strategy
You need to provide the `class` and the `space` of the strategy. If the strategy is user's own implementation, you need to privide the `module_path`.
- data_label
The label of the data, you can search which kinds of labels will lead to a better result. This part is optional, and you only need to provide `space`.
- max_evals
Allow up to this many function evaluations in this tuner. The default value is 10.
If you don't want to search some modules, you can fix their spaces in `space.py`. We will not give the default module.
About the time period
~~~~~~~~~~~~~~~~~~~~~
You need to use the same dataset to evaluate your different `estimator` experiments in `tuner` experiment. Two experiments using different dataset are uncomparable. You can specify `time_period` through the configuration file:
.. code-block:: YAML
time_period:
rolling_period: 360
train_start_date: 2005-01-01
train_end_date: 2014-12-31
validate_start_date: 2015-01-01
validate_end_date: 2016-06-30
test_start_date: 2016-07-01
test_end_date: 2018-04-30
- `rolling_period`
The rolling period, integer type, indicates how many time steps need rolling when rolling the data. The default value is `60`. If you use `RollingTrainer`, this config will be used, or it will be ignored.
- `train_start_date`
Training start time, str type.
- `train_end_date`
Training end time, str type.
- `validate_start_date`
Validation start time, str type.
- `validate_end_date`
Validation end time, str type.
- `test_start_date`
Test start time, str type.
- `test_end_date`
Test end time, str type. If `test_end_date` is `-1` or greater than the last date of the data, the last date of the data will be used as `test_end_date`.
About the data and backtest
~~~~~~~~~~~~~~~~~~~~~~~~~~~
`data` and `backtest` are all same in the whole `tuner` experiment. Different `estimator` experiments must use the same data and backtest method. So, these two parts of config are same with that in `estimator` configuration. You can see the precise defination of these parts in `estimator` introduction. We only provide an example here.
.. code-block:: YAML
data:
class: ALPHA360
provider_uri: /data/qlib
args:
start_date: 2005-01-01
end_date: 2018-04-30
dropna_label: True
dropna_feature: True
feature_label_config: /home/v-yindzh/v-yindzh/QLib/cfg/feature_config.yaml
filter:
market: csi500
filter_pipeline:
-
class: NameDFilter
module_path: qlib.filter
args:
name_rule_re: S(?!Z3)
fstart_time: 2018-01-01
fend_time: 2018-12-11
-
class: ExpressionDFilter
module_path: qlib.filter
args:
rule_expression: $open/$factor<=45
fstart_time: 2018-01-01
fend_time: 2018-12-11
backtest:
normal_backtest_args:
limit_threshold: 0.095
account: 500000
benchmark: SH000905
deal_price: vwap
long_short_backtest_args:
topk: 50
Experiment Result
-----------------
All the results are stored in experiment file directly, you can check them directly in the corresponding files.
What we save are as following:
- Global optimal parameters
- Local optimal parameters of each tuner
- Config file of this `tuner` experiment
- Every `estimator` experiments result in the process