mirror of
https://github.com/microsoft/qlib.git
synced 2026-06-06 05:51:17 +08:00
* MVP for Indian Stocks in qlib using yahooquery * cleaned with black * cleaned with black * add YahooNormalizeIN and YahooNormalizeIN1d * cleaned the code * added 1min for IN and also updated readme * update comments * fix comments * recorder support upload both raw file and directory * fix comments * Update README.md * Fix docs of QlibRecorder * sort index after loader (#538) make sure the fetch method is based on a index-sorted pd.DataFrame * refactor online serving rolling api * refactor TRA * format by black * fix horizon * fix TRA when use single head * clean up * improve pretrain * update README * fix tra when logdir is None * fix tra when logdir is None * Update strategy.py * Update README.md * Update README.md * Conda Suggestion * code standard docs * Update ensemble.py (#560) * Fix CI Bug (#575) Co-authored-by: yuxwang <anduinnn@foxmail.com> * Update gen.py (#576) * Fix multi-process loop calls (#574) * check lexsort in the 'lazy_sort_index' function (#566) * check lexsort * check lexsort * lexsort comment * lexsort comment * Delete .DS_Store * Update README.md * bug fix & use oracle transport pretrain * mend * Add `backend_freq_config` parameter, support multi-freq uri * Add sample_config to QlibDataLoader, support multi-freq * add multi-freq example * get_cls_kwargs renamed get_callable_kwargs * support multi-freq uri * Add inst_processors to D.features * Fix typo * Fix the index type of the multi-freq example * Fix duplicate mlflow directories in tests * Add DataPathManager to QlibConfig && modify inst_processors to supports list only * Modify the default value in the multi_freq example * Modify client-server mode and dataset-cache to disable inst_processor * Add wheel package to github CI * fix comment * Update FAQ.rst * Update README.md Fix wrong link * Update the docs of TaskManager (#586) * Update manage.py * update yaml * update run_all_model * Modify the Feature to be case sensitive (#589) * update README * remove verbose * fix spell bug * fix typos (#592) * Update Release Note * fix portfolio bug * Add calendar support for resample * add freq kwargs * test.yml: Remove redundant code (#595) * Supporting shared processor (#596) * Supporting shared processor * fix readonly reverse bug * remove pytests dependency * with fit bug * fix parameter error * fix comments * Fix undefined names in Python code (#599) * Update pytorch_tabnet.py $ `flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics` ``` ./qlib/qlib/contrib/model/pytorch_tabnet.py:567:38: F821 undefined name 'inp' self.independ.append(GLU(inp, out_dim, vbs=vbs)) ^ ./qlib/examples/model_rolling/task_manager_rolling.py:75:18: F821 undefined name 'task_train' run_task(task_train, self.task_pool, experiment_name=self.experiment_name) ^ 2 F821 undefined name 'task_train' 2 ``` * Fix undefined names in Python code * from qlib.model.trainer import task_train * update seed * fix some docstring * add comments * Fix SimpleDatasetCache * Update setup.py updated classifiers * Update setup.py change to matplotlib==3.3 * Update python-publish.yml added python 3.9 * updategrade version number * Update model list * fix the type of filter_pipe * fix comment * fix record_temp * update cvxpy version * Update code_standard.rst (#587) * Update code_standard.rst * Update docs/developer/code_standard.rst Co-authored-by: you-n-g <you-n-g@users.noreply.github.com> Co-authored-by: you-n-g <you-n-g@users.noreply.github.com> * Add file lock for MLflowExpManager (#619) * fix torch version * Share version number (#620) * Update initialization.rst (#622) * Update initialization.rst * Update docs/start/initialization.rst Co-authored-by: you-n-g <you-n-g@users.noreply.github.com> * Update docs/start/initialization.rst Co-authored-by: you-n-g <you-n-g@users.noreply.github.com> Co-authored-by: you-n-g <you-n-g@users.noreply.github.com> * fix bugs for running previous exmaple * fix deal amount bug * update change doc (#623) * Add files via upload * Update README.md * Update README.md * Update README.md * Delete change doc.gif * Add files via upload * Update README.md * Delete change doc.gif * Add files via upload * Delete change doc.gif * Add files via upload * Update README.md Co-authored-by: you-n-g <you-n-g@users.noreply.github.com> Co-authored-by: you-n-g <you-n-g@users.noreply.github.com> * update doc * simplify run all model * fix run all model bug * Fix Models (#483) * fix gat dataset * fix tft model * Update tft.py * Fix tft.py Co-authored-by: Pengrong Zhu <zhu.pengrong@foxmail.com> * type and skip empty exp * fix model yaml config * fix tft import bug * skip empty result * fix model and yaml bug * fix wrong generate parameter * Modify multi-freq example (#626) * modify the example of multi-freq * add Copyright * add a comment to average_ops.py * modify the example of multi-freq * add comment to multi_freq_handler.py * add the Ref expression description to multi_freq_handler.py * add expression description to multi_freq_handler.py * update images * fix workflow and update framework Co-authored-by: Gaurav <2796gaurav@gmail.com> Co-authored-by: 2796gaurav <17353992+2796gaurav@users.noreply.github.com> Co-authored-by: bxdd <bxd98@126.com> Co-authored-by: Young <afe.young@gmail.com> Co-authored-by: you-n-g <you-n-g@users.noreply.github.com> Co-authored-by: Dong Zhou <Zhou.Dong@microsoft.com> Co-authored-by: ZhangTP1996 <ztp18@mails.tsinghua.edu.cn> Co-authored-by: demon143 <59681577+demon143@users.noreply.github.com> Co-authored-by: Wangwuyi123 <51237097+Wangwuyi123@users.noreply.github.com> Co-authored-by: yuxwang <anduinnn@foxmail.com> Co-authored-by: Pengrong Zhu <zhu.pengrong@foxmail.com> Co-authored-by: Mark Zhao <50850474+markzhao98@users.noreply.github.com> Co-authored-by: cslwqxx <cslwqxx@users.noreply.github.com> Co-authored-by: Dong Zhou <evanzd@users.noreply.github.com> Co-authored-by: SaintMalik <37118134+saintmalik@users.noreply.github.com> Co-authored-by: Christian Clauss <cclauss@me.com> Co-authored-by: Anurag Kumar <mailanu98@gmail.com> Co-authored-by: demon143 <785696300@qq.com>
326 lines
14 KiB
ReStructuredText
326 lines
14 KiB
ReStructuredText
.. _tuner:
|
|
|
|
Tuner
|
|
===================
|
|
.. currentmodule:: qlib
|
|
|
|
Introduction
|
|
-------------------
|
|
|
|
Welcome to use Tuner, this document is based on that you can use Estimator proficiently and correctly.
|
|
|
|
You can find the optimal hyper-parameters and combinations of models, trainers, strategies and data labels.
|
|
|
|
The usage of program `tuner` is similar with `estimator`, you need provide the URL of the configuration file.
|
|
The `tuner` will do the following things:
|
|
|
|
- Construct tuner pipeline
|
|
- Search and save best hyper-parameters of one tuner
|
|
- Search next tuner in pipeline
|
|
- Save the global best hyper-parameters and combination
|
|
|
|
Each tuner is consisted with a kind of combination of modules, and its goal is searching the optimal hyper-parameters of this combination.
|
|
The pipeline is consisted with different tuners, it is aim at finding the optimal combination of modules.
|
|
|
|
The result will be printed on screen and saved in file, you can check the result in your experiment saving files.
|
|
|
|
Example
|
|
~~~~~~~
|
|
|
|
Let's see an example,
|
|
|
|
First make sure you have the latest version of `qlib` installed.
|
|
|
|
Then, you need to privide a configuration to setup the experiment.
|
|
We write a simple configuration example as following,
|
|
|
|
.. code-block:: YAML
|
|
|
|
experiment:
|
|
name: tuner_experiment
|
|
tuner_class: QLibTuner
|
|
qlib_client:
|
|
auto_mount: False
|
|
logging_level: INFO
|
|
optimization_criteria:
|
|
report_type: model
|
|
report_factor: model_score
|
|
optim_type: max
|
|
tuner_pipeline:
|
|
-
|
|
model:
|
|
class: SomeModel
|
|
space: SomeModelSpace
|
|
trainer:
|
|
class: RollingTrainer
|
|
strategy:
|
|
class: TopkAmountStrategy
|
|
space: TopkAmountStrategySpace
|
|
max_evals: 2
|
|
|
|
time_period:
|
|
rolling_period: 360
|
|
train_start_date: 2005-01-01
|
|
train_end_date: 2014-12-31
|
|
validate_start_date: 2015-01-01
|
|
validate_end_date: 2016-06-30
|
|
test_start_date: 2016-07-01
|
|
test_end_date: 2018-04-30
|
|
data:
|
|
class: ALPHA360
|
|
provider_uri: /data/qlib
|
|
args:
|
|
start_date: 2005-01-01
|
|
end_date: 2018-04-30
|
|
dropna_label: True
|
|
dropna_feature: True
|
|
filter:
|
|
market: csi500
|
|
filter_pipeline:
|
|
-
|
|
class: NameDFilter
|
|
module_path: qlib.data.filter
|
|
args:
|
|
name_rule_re: S(?!Z3)
|
|
fstart_time: 2018-01-01
|
|
fend_time: 2018-12-11
|
|
-
|
|
class: ExpressionDFilter
|
|
module_path: qlib.data.filter
|
|
args:
|
|
rule_expression: $open/$factor<=45
|
|
fstart_time: 2018-01-01
|
|
fend_time: 2018-12-11
|
|
backtest:
|
|
normal_backtest_args:
|
|
limit_threshold: 0.095
|
|
account: 500000
|
|
benchmark: SH000905
|
|
deal_price: vwap
|
|
long_short_backtest_args:
|
|
topk: 50
|
|
|
|
Next, we run the following command, and you can see:
|
|
|
|
.. code-block:: bash
|
|
|
|
~/v-yindzh/Qlib/cfg$ tuner -c tuner_config.yaml
|
|
|
|
Searching params: {'model_space': {'colsample_bytree': 0.8870905643607678, 'lambda_l1': 472.3188735122233, 'lambda_l2': 92.75390994877243, 'learning_rate': 0.09741751430635413, 'loss': 'mse', 'max_depth': 8, 'num_leaves': 160, 'num_threads': 20, 'subsample': 0.7536051584789751}, 'strategy_space': {'buffer_margin': 250, 'topk': 40}}
|
|
...
|
|
(Estimator experiment screen log)
|
|
...
|
|
Searching params: {'model_space': {'colsample_bytree': 0.6667379039007301, 'lambda_l1': 382.10698024977904, 'lambda_l2': 117.02506488151757, 'learning_rate': 0.18514539615228137, 'loss': 'mse', 'max_depth': 6, 'num_leaves': 200, 'num_threads': 12, 'subsample': 0.9449255686969292}, 'strategy_space': {'buffer_margin': 200, 'topk': 30}}
|
|
...
|
|
(Estimator experiment screen log)
|
|
...
|
|
Local best params: {'model_space': {'colsample_bytree': 0.6667379039007301, 'lambda_l1': 382.10698024977904, 'lambda_l2': 117.02506488151757, 'learning_rate': 0.18514539615228137, 'loss': 'mse', 'max_depth': 6, 'num_leaves': 200, 'num_threads': 12, 'subsample': 0.9449255686969292}, 'strategy_space': {'buffer_margin': 200, 'topk': 30}}
|
|
Time cost: 489.87220 | Finished searching best parameters in Tuner 0.
|
|
Time cost: 0.00069 | Finished saving local best tuner parameters to: tuner_experiment/estimator_experiment/estimator_experiment_0/local_best_params.json .
|
|
Searching params: {'data_label_space': {'labels': ('Ref($vwap, -2)/Ref($vwap, -1) - 2',)}, 'model_space': {'input_dim': 158, 'lr': 0.001, 'lr_decay': 0.9100529502185579, 'lr_decay_steps': 162.48901403763966, 'optimizer': 'gd', 'output_dim': 1}, 'strategy_space': {'buffer_margin': 300, 'topk': 35}}
|
|
...
|
|
(Estimator experiment screen log)
|
|
...
|
|
Searching params: {'data_label_space': {'labels': ('Ref($vwap, -2)/Ref($vwap, -1) - 1',)}, 'model_space': {'input_dim': 158, 'lr': 0.1, 'lr_decay': 0.9882802970847494, 'lr_decay_steps': 164.76742865207729, 'optimizer': 'adam', 'output_dim': 1}, 'strategy_space': {'buffer_margin': 250, 'topk': 35}}
|
|
...
|
|
(Estimator experiment screen log)
|
|
...
|
|
Local best params: {'data_label_space': {'labels': ('Ref($vwap, -2)/Ref($vwap, -1) - 1',)}, 'model_space': {'input_dim': 158, 'lr': 0.1, 'lr_decay': 0.9882802970847494, 'lr_decay_steps': 164.76742865207729, 'optimizer': 'adam', 'output_dim': 1}, 'strategy_space': {'buffer_margin': 250, 'topk': 35}}
|
|
Time cost: 550.74039 | Finished searching best parameters in Tuner 1.
|
|
Time cost: 0.00023 | Finished saving local best tuner parameters to: tuner_experiment/estimator_experiment/estimator_experiment_1/local_best_params.json .
|
|
Time cost: 1784.14691 | Finished tuner pipeline.
|
|
Time cost: 0.00014 | Finished save global best tuner parameters.
|
|
Best Tuner id: 0.
|
|
You can check the best parameters at tuner_experiment/global_best_params.json.
|
|
|
|
|
|
Finally, you can check the results of your experiment in the given path.
|
|
|
|
Configuration file
|
|
------------------
|
|
|
|
Before using `tuner`, you need to prepare a configuration file. Next we will show you how to prepare each part of the configuration file.
|
|
|
|
About the experiment
|
|
~~~~~~~~~~~~~~~~~~~~
|
|
|
|
First, your configuration file needs to have a field about the experiment, whose key is `experiment`, this field and its contents determine the saving path and tuner class.
|
|
|
|
Usually it should contain the following content:
|
|
|
|
.. code-block:: YAML
|
|
|
|
experiment:
|
|
name: tuner_experiment
|
|
tuner_class: QLibTuner
|
|
|
|
Also, there are some optional fields. The meaning of each field is as follows:
|
|
|
|
- `name`
|
|
The experiment name, str type, the program will use this experiment name to construct a directory to save the process of the whole experiment and the results. The default value is `tuner_experiment`.
|
|
|
|
- `dir`
|
|
The saving path, str type, the program will construct the experiment directory in this path. The default value is the path where configuration locate.
|
|
|
|
- `tuner_class`
|
|
The class of tuner, str type, must be an already implemented model, such as `QLibTuner` in `qlib`, or a custom tuner, but it must be a subclass of `qlib.contrib.tuner.Tuner`, the default value is `QLibTuner`.
|
|
|
|
- `tuner_module_path`
|
|
The module path, str type, absolute url is also supported, indicates the path of the implementation of tuner. The default value is `qlib.contrib.tuner.tuner`
|
|
|
|
About the optimization criteria
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
You need to designate a factor to optimize, for tuner need a factor to decide which case is better than other cases.
|
|
Usually, we use the result of `estimator`, such as backtest results and the score of model.
|
|
|
|
This part needs contain these fields:
|
|
|
|
.. code-block:: YAML
|
|
|
|
optimization_criteria:
|
|
report_type: model
|
|
report_factor: model_pearsonr
|
|
optim_type: max
|
|
|
|
- `report_type`
|
|
The type of the report, str type, determines which kind of report you want to use. If you want to use the backtest result type, you can choose `pred_long`, `pred_long_short`, `pred_short`, `excess_return_without_cost` and `excess_return_with_cost`. If you want to use the model result type, you can only choose `model`.
|
|
|
|
- `report_factor`
|
|
The factor you want to use in the report, str type, determines which factor you want to optimize. If your `report_type` is backtest result type, you can choose `annualized_return`, `information_ratio`, `max_drawdown`, `mean` and `std`. If your `report_type` is model result type, you can choose `model_score` and `model_pearsonr`.
|
|
|
|
- `optim_type`
|
|
The optimization type, str type, determines what kind of optimization you want to do. you can minimize the factor or maximize the factor, so you can choose `max`, `min` or `correlation` at this field.
|
|
Note: `correlation` means the factor's best value is 1, such as `model_pearsonr` (a corraltion coefficient).
|
|
|
|
If you want to process the factor or you want fetch other kinds of factor, you can override the `objective` method in your own tuner.
|
|
|
|
About the tuner pipeline
|
|
~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The tuner pipeline contains different tuners, and the `tuner` program will process each tuner in pipeline. Each tuner will get an optimal hyper-parameters of its specific combination of modules. The pipeline will contrast the results of each tuner, and get the best combination and its optimal hyper-parameters. So, you need to configurate the pipeline and each tuner, here is an example:
|
|
|
|
.. code-block:: YAML
|
|
|
|
tuner_pipeline:
|
|
-
|
|
model:
|
|
class: SomeModel
|
|
space: SomeModelSpace
|
|
trainer:
|
|
class: RollingTrainer
|
|
strategy:
|
|
class: TopkAmountStrategy
|
|
space: TopkAmountStrategySpace
|
|
max_evals: 2
|
|
|
|
Each part represents a tuner, and its modules which are to be tuned. Space in each part is the hyper-parameters' space of a certain module, you need to create your searching space and modify it in `/qlib/contrib/tuner/space.py`. We use `hyperopt` package to help us to construct the space, you can see the detail of how to use it in https://github.com/hyperopt/hyperopt/wiki/FMin .
|
|
|
|
- model
|
|
You need to provide the `class` and the `space` of the model. If the model is user's own implementation, you need to privide the `module_path`.
|
|
|
|
- trainer
|
|
You need to proveide the `class` of the trainer. If the trainer is user's own implementation, you need to privide the `module_path`.
|
|
|
|
- strategy
|
|
You need to provide the `class` and the `space` of the strategy. If the strategy is user's own implementation, you need to privide the `module_path`.
|
|
|
|
- data_label
|
|
The label of the data, you can search which kinds of labels will lead to a better result. This part is optional, and you only need to provide `space`.
|
|
|
|
- max_evals
|
|
Allow up to this many function evaluations in this tuner. The default value is 10.
|
|
|
|
If you don't want to search some modules, you can fix their spaces in `space.py`. We will not give the default module.
|
|
|
|
About the time period
|
|
~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
You need to use the same dataset to evaluate your different `estimator` experiments in `tuner` experiment. Two experiments using different dataset are uncomparable. You can specify `time_period` through the configuration file:
|
|
|
|
.. code-block:: YAML
|
|
|
|
time_period:
|
|
rolling_period: 360
|
|
train_start_date: 2005-01-01
|
|
train_end_date: 2014-12-31
|
|
validate_start_date: 2015-01-01
|
|
validate_end_date: 2016-06-30
|
|
test_start_date: 2016-07-01
|
|
test_end_date: 2018-04-30
|
|
|
|
- `rolling_period`
|
|
The rolling period, integer type, indicates how many time steps need rolling when rolling the data. The default value is `60`. If you use `RollingTrainer`, this config will be used, or it will be ignored.
|
|
|
|
- `train_start_date`
|
|
Training start time, str type.
|
|
|
|
- `train_end_date`
|
|
Training end time, str type.
|
|
|
|
- `validate_start_date`
|
|
Validation start time, str type.
|
|
|
|
- `validate_end_date`
|
|
Validation end time, str type.
|
|
|
|
- `test_start_date`
|
|
Test start time, str type.
|
|
|
|
- `test_end_date`
|
|
Test end time, str type. If `test_end_date` is `-1` or greater than the last date of the data, the last date of the data will be used as `test_end_date`.
|
|
|
|
About the data and backtest
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
`data` and `backtest` are all same in the whole `tuner` experiment. Different `estimator` experiments must use the same data and backtest method. So, these two parts of config are same with that in `estimator` configuration. You can see the precise defination of these parts in `estimator` introduction. We only provide an example here.
|
|
|
|
.. code-block:: YAML
|
|
|
|
data:
|
|
class: ALPHA360
|
|
provider_uri: /data/qlib
|
|
args:
|
|
start_date: 2005-01-01
|
|
end_date: 2018-04-30
|
|
dropna_label: True
|
|
dropna_feature: True
|
|
feature_label_config: /home/v-yindzh/v-yindzh/QLib/cfg/feature_config.yaml
|
|
filter:
|
|
market: csi500
|
|
filter_pipeline:
|
|
-
|
|
class: NameDFilter
|
|
module_path: qlib.filter
|
|
args:
|
|
name_rule_re: S(?!Z3)
|
|
fstart_time: 2018-01-01
|
|
fend_time: 2018-12-11
|
|
-
|
|
class: ExpressionDFilter
|
|
module_path: qlib.filter
|
|
args:
|
|
rule_expression: $open/$factor<=45
|
|
fstart_time: 2018-01-01
|
|
fend_time: 2018-12-11
|
|
backtest:
|
|
normal_backtest_args:
|
|
limit_threshold: 0.095
|
|
account: 500000
|
|
benchmark: SH000905
|
|
deal_price: vwap
|
|
long_short_backtest_args:
|
|
topk: 50
|
|
|
|
Experiment Result
|
|
-----------------
|
|
|
|
All the results are stored in experiment file directly, you can check them directly in the corresponding files.
|
|
What we save are as following:
|
|
|
|
- Global optimal parameters
|
|
- Local optimal parameters of each tuner
|
|
- Config file of this `tuner` experiment
|
|
- Every `estimator` experiments result in the process
|
|
|