.. _workflow:

=================================
Workflow: Workflow Management
=================================

.. currentmodule:: qlib

Introduction
===================

The components in `Qlib Framework <../introduction/introduction.html#framework>`_ are designed in a loosely-coupled way. Users can build their own Quant research workflow with these components like `Example `_.

In addition, ``Qlib`` provides a more user-friendly interface named ``qrun`` that automatically runs the whole workflow defined by a configuration file. Running the whole workflow is called an `execution`.
With ``qrun``, users can easily start an `execution`, which includes the following steps:

- Data

  - Loading
  - Processing
  - Slicing

- Model

  - Training and inference
  - Saving & loading

- Evaluation

  - Forecast signal analysis
  - Backtest

For each `execution`, ``Qlib`` has a complete system for tracking all the information and artifacts generated during the training, inference and evaluation phases. For more information about how ``Qlib`` handles this, please refer to the related document: `Recorder: Experiment Management <../component/recorder.html>`_.

Complete Example
===================

Before getting into details, here is a complete example of ``qrun``, which defines the workflow of typical Quant research.
Below is a typical config file of ``qrun``.

.. code-block:: YAML

    qlib_init:
        provider_uri: "~/.qlib/qlib_data/cn_data"
        region: cn
    market: &market csi300
    benchmark: &benchmark SH000300
    data_handler_config: &data_handler_config
        start_time: 2008-01-01
        end_time: 2020-08-01
        fit_start_time: 2008-01-01
        fit_end_time: 2014-12-31
        instruments: *market
    port_analysis_config: &port_analysis_config
        strategy:
            class: TopkDropoutStrategy
            module_path: qlib.contrib.strategy.strategy
            kwargs:
                topk: 50
                n_drop: 5
        backtest:
            verbose: False
            limit_threshold: 0.095
            account: 100000000
            benchmark: *benchmark
            deal_price: close
            open_cost: 0.0005
            close_cost: 0.0015
            min_cost: 5
    task:
        model:
            class: LGBModel
            module_path: qlib.contrib.model.gbdt
            kwargs:
                loss: mse
                colsample_bytree: 0.8879
                learning_rate: 0.0421
                subsample: 0.8789
                lambda_l1: 205.6999
                lambda_l2: 580.9768
                max_depth: 8
                num_leaves: 210
                num_threads: 20
        dataset:
            class: DatasetH
            module_path: qlib.data.dataset
            kwargs:
                handler:
                    class: Alpha158
                    module_path: qlib.contrib.data.handler
                    kwargs: *data_handler_config
                segments:
                    train: [2008-01-01, 2014-12-31]
                    valid: [2015-01-01, 2016-12-31]
                    test: [2017-01-01, 2020-08-01]
        record:
            - class: SignalRecord
              module_path: qlib.workflow.record_temp
              kwargs: {}
            - class: PortAnaRecord
              module_path: qlib.workflow.record_temp
              kwargs:
                  config: *port_analysis_config

After saving the config into `configuration.yaml`, users can start the workflow and test their ideas with the single command below.

.. code-block:: bash

    qrun configuration.yaml

To run ``qrun`` in debug mode, use the following command:

.. code-block:: bash

    python -m pdb qlib/workflow/cli.py examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml

.. note::

    `qrun` will be placed in a directory on your $PATH when installing ``Qlib``.

.. note::

    The symbol `&` in a `yaml` file marks an anchor on a field, and the symbol `*` refers back to that anchor. This is useful when other fields include the anchored value as part of their own value. Taking the configuration file above as an example, users can change the value of `market` and `benchmark` in one place without traversing the entire configuration file.

Configuration File
===================

This section describes ``qrun`` in detail.
Before using ``qrun``, users need to prepare a configuration file. The following content shows how to prepare each part of the configuration file.

Qlib Init Section
--------------------

First, the configuration file needs to contain several basic parameters that are used for qlib initialization.

.. code-block:: YAML

    provider_uri: "~/.qlib/qlib_data/cn_data"
    region: cn

The meaning of each field is as follows:

- `provider_uri`
    Type: str. The URI of the Qlib data. For example, it could be the location where the data loaded by ``get_data.py`` are stored.

- `region`
    - If `region` == "us", ``Qlib`` will be initialized in US-stock mode.
    - If `region` == "cn", ``Qlib`` will be initialized in China-stock mode.

.. note::

    The value of `region` should be aligned with the data stored in `provider_uri`.

Task Section
--------------------

The `task` field in the configuration corresponds to a `task`, which contains the parameters of three different subsections: `Model`, `Dataset` and `Record`.

Model Section
~~~~~~~~~~~~~~~~~~~~

In the `task` field, the `model` section describes the parameters of the model to be used for training and inference. For more information about the base ``Model`` class, please refer to `Qlib Model <../component/model.html>`_.

.. code-block:: YAML

    model:
        class: LGBModel
        module_path: qlib.contrib.model.gbdt
        kwargs:
            loss: mse
            colsample_bytree: 0.8879
            learning_rate: 0.0421
            subsample: 0.8789
            lambda_l1: 205.6999
            lambda_l2: 580.9768
            max_depth: 8
            num_leaves: 210
            num_threads: 20

The meaning of each field is as follows:

- `class`
    Type: str. The name of the model class.

- `module_path`
    Type: str. The path of the module in qlib that contains the model.

- `kwargs`
    The keyword arguments for the model. Please refer to the specific model implementation for more information: `models `_.

.. note::

    ``Qlib`` provides a utility named ``init_instance_by_config`` to initialize any class inside ``Qlib`` from a configuration that includes the fields `class`, `module_path` and `kwargs`.

Dataset Section
~~~~~~~~~~~~~~~~~~~~

The `dataset` field describes the parameters for the ``Dataset`` module in ``Qlib`` as well as those for the ``DataHandler`` module. For more information about the ``Dataset`` module, please refer to `Qlib Data <../component/data.html#dataset>`_.

The keyword arguments configuration of the ``DataHandler`` is as follows:

.. code-block:: YAML

    data_handler_config: &data_handler_config
        start_time: 2008-01-01
        end_time: 2020-08-01
        fit_start_time: 2008-01-01
        fit_end_time: 2014-12-31
        instruments: *market

Users can refer to the document of `DataHandler <../component/data.html#datahandler>`_ for more information about the meaning of each field in the configuration.

Here is the configuration for the ``Dataset`` module, which takes care of data preprocessing and slicing during the training and testing phases.

.. code-block:: YAML

    dataset:
        class: DatasetH
        module_path: qlib.data.dataset
        kwargs:
            handler:
                class: Alpha158
                module_path: qlib.contrib.data.handler
                kwargs: *data_handler_config
            segments:
                train: [2008-01-01, 2014-12-31]
                valid: [2015-01-01, 2016-12-31]
                test: [2017-01-01, 2020-08-01]

Record Section
~~~~~~~~~~~~~~~~~~~~

The `record` field describes the parameters of the ``Record`` module in ``Qlib``. ``Record`` is responsible for tracking the training process and results such as `Information Coefficient (IC)` and `backtest` in a standard format.

The following script is the configuration of `backtest` and the `strategy` used in `backtest`:

.. code-block:: YAML

    port_analysis_config: &port_analysis_config
        strategy:
            class: TopkDropoutStrategy
            module_path: qlib.contrib.strategy.strategy
            kwargs:
                topk: 50
                n_drop: 5
        backtest:
            verbose: False
            limit_threshold: 0.095
            account: 100000000
            benchmark: *benchmark
            deal_price: close
            open_cost: 0.0005
            close_cost: 0.0015
            min_cost: 5

For more information about the meaning of each field in the configuration of `strategy` and `backtest`, users can look up the documents: `Strategy <../component/strategy.html>`_ and `Backtest <../component/backtest.html>`_.

Here are the configuration details of different `Record Template` classes such as ``SignalRecord`` and ``PortAnaRecord``:

.. code-block:: YAML

    record:
        - class: SignalRecord
          module_path: qlib.workflow.record_temp
          kwargs: {}
        - class: PortAnaRecord
          module_path: qlib.workflow.record_temp
          kwargs:
              config: *port_analysis_config

For more information about the ``Record`` module in ``Qlib``, users can refer to the related document: `Record <../component/recorder.html#record-template>`_.
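
Every component in the configuration above (model, dataset, handler, strategy and record templates) is described by the same three fields: `class`, `module_path` and `kwargs`. The following is a minimal sketch of how such an entry could be turned into an object, using only the Python standard library. The helper ``init_from_config`` is a hypothetical name introduced here for illustration. It is not ``Qlib``'s ``init_instance_by_config``, whose actual behavior may differ. The demo configuration points at a standard-library class so that the sketch runs without ``Qlib`` installed.

```python
import importlib


def init_from_config(config):
    """Instantiate the class described by a {class, module_path, kwargs} mapping.

    Hypothetical helper for illustration only; not Qlib's implementation.
    """
    # Import the module named by `module_path`, then look up `class` inside it.
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    # `kwargs` may be missing or empty; fall back to no keyword arguments.
    kwargs = config.get("kwargs") or {}
    return cls(**kwargs)


# Stand-in configuration (an assumption for this demo): a standard-library
# class takes the place of a Qlib model such as LGBModel.
demo_config = {
    "class": "Fraction",
    "module_path": "fractions",
    "kwargs": {"numerator": 3, "denominator": 4},
}

instance = init_from_config(demo_config)
print(type(instance).__name__, instance)
```

Describing components this way keeps the workflow declarative: swapping a model or strategy would, under this scheme, only require editing the `class`, `module_path` and `kwargs` values in the configuration file rather than any Python code.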