.. _task_management:

=================================
Task Management
=================================

.. currentmodule:: qlib


Introduction
=============

The `Workflow <../component/introduction.html>`_ part introduces how to run a research workflow in a loosely-coupled way, but ``qrun`` can only execute one ``task`` at a time.
To automatically generate and execute different tasks, ``Task Management`` provides a complete process consisting of `Task Generating`_, `Task Storing`_, `Task Training`_ and `Task Collecting`_.
With this module, users can run their ``task`` automatically over different periods, with different losses, or even with different models.

This whole process can be used in `Online Serving <../component/online.html>`_.

An example of the entire process is shown `here <https://github.com/microsoft/qlib/tree/main/examples/model_rolling/task_manager_rolling.py>`_.

Task Generating
===============
A ``task`` consists of `Model`, `Dataset`, `Record`, or anything added by users.
The specific task template can be viewed in the
`Task Section <../component/workflow.html#task-section>`_.
Even though the task template is fixed, users can customize their ``TaskGen`` to generate different ``task`` from the task template.

Here is the base class of ``TaskGen``:

.. autoclass:: qlib.workflow.task.gen.TaskGen
    :members:

``Qlib`` provides a class `RollingGen <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_ to generate a list of ``task`` whose datasets cover different date segments.
This class allows users to verify the effect of data from different periods on the model in one experiment. More information is available `here <../reference/api.html#TaskGen>`_.

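
The idea of generating several tasks from one template by rolling its date segments can be illustrated without ``Qlib`` at all. The following is a minimal sketch under stated assumptions: ``roll_tasks``, the simplified ``TEMPLATE`` layout and the calendar-day ``step_days`` are hypothetical names chosen for illustration, not the ``RollingGen`` API, which may roll and truncate segments differently.

```python
import copy
from datetime import date, timedelta

# Hypothetical, heavily simplified task template (an assumption for this sketch).
TEMPLATE = {
    "model": {"class": "LGBModel"},
    "dataset": {
        "kwargs": {
            "segments": {
                "train": (date(2015, 1, 1), date(2016, 12, 31)),
                "valid": (date(2017, 1, 1), date(2017, 6, 30)),
                "test": (date(2017, 7, 1), date(2017, 12, 31)),
            }
        }
    },
}


def roll_tasks(template, step_days, last_day):
    """Return copies of `template` with all segments shifted by multiples of `step_days`.

    Rolling stops as soon as the shifted test segment would end after `last_day`.
    """
    tasks = []
    shift = timedelta(0)
    while True:
        task = copy.deepcopy(template)  # never mutate the template itself
        old = task["dataset"]["kwargs"]["segments"]
        new = {name: (start + shift, end + shift) for name, (start, end) in old.items()}
        if new["test"][1] > last_day:
            break
        task["dataset"]["kwargs"]["segments"] = new
        tasks.append(task)
        shift += timedelta(days=step_days)
    return tasks


tasks = roll_tasks(TEMPLATE, step_days=182, last_day=date(2019, 1, 1))
for task in tasks:
    print(task["dataset"]["kwargs"]["segments"]["test"])
```

Each generated task is an independent deep copy, so the tasks can be stored and trained separately while the template stays unchanged.
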
Task Storing
===============
To achieve higher efficiency and make cluster operation possible, ``Task Manager`` stores all tasks in `MongoDB <https://www.mongodb.com/>`_.
``TaskManager`` can fetch undone tasks automatically and manage the lifecycle of a set of tasks with error handling.
Users **MUST** finish the configuration of `MongoDB <https://www.mongodb.com/>`_ when using this module.

Users need to provide the MongoDB URL and database name for ``TaskManager`` in `initialization <../start/initialization.html#Parameters>`_ or set them directly like this.

.. code-block:: python

    from qlib.config import C
    C["mongo"] = {
        "task_url" : "mongodb://localhost:27017/", # your MongoDB url
        "task_db_name" : "rolling_db" # database name
    }

.. autoclass:: qlib.workflow.task.manage.TaskManager
    :members:

More information about ``Task Manager`` can be found `here <../reference/api.html#TaskManager>`_.

Task Training
===============
After generating and storing those ``task``, it's time to run the ``task`` that are in the *WAITING* status.
``Qlib`` provides a method called ``run_task`` to run those ``task`` in the task pool; however, users can also customize how tasks are executed.
An easy way to get the ``task_func`` is to use ``qlib.model.trainer.task_train`` directly.
It will run the whole workflow defined by ``task``, which includes *Model*, *Dataset*, *Record*.

.. autofunction:: qlib.workflow.task.manage.run_task

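
The lifecycle described above (fetch a *WAITING* task, run a ``task_func`` on it, record the result, handle errors) can be sketched with an in-memory pool. This is only an illustration under assumptions: ``InMemoryTaskPool``, ``run_pool`` and the status strings are hypothetical stand-ins, not the MongoDB-backed ``TaskManager`` or the real ``run_task``, and resetting failed tasks to *WAITING* is just one possible error-handling policy.

```python
WAITING, RUNNING, DONE = "waiting", "running", "done"  # assumed status names


class InMemoryTaskPool:
    """Hypothetical stand-in for a task pool that would normally live in MongoDB."""

    def __init__(self, task_defs):
        self.tasks = [{"def": d, "status": WAITING, "res": None} for d in task_defs]

    def fetch_waiting(self):
        """Claim the next WAITING task by marking it RUNNING, or return None."""
        for task in self.tasks:
            if task["status"] == WAITING:
                task["status"] = RUNNING
                return task
        return None


def run_pool(task_func, pool):
    """Run `task_func` on every WAITING task and return how many finished."""
    finished = 0
    failed = []
    while True:
        task = pool.fetch_waiting()
        if task is None:
            break
        try:
            task["res"] = task_func(task["def"])
        except Exception:
            failed.append(task)  # stays RUNNING for now so it is not re-fetched
            continue
        task["status"] = DONE
        finished += 1
    for task in failed:
        task["status"] = WAITING  # make failed tasks available to a later run
    return finished


def toy_train(task_def):
    """Toy task function: pretend to train, fail on a deliberately broken task."""
    if task_def["model"] == "broken":
        raise ValueError("cannot train this task")
    return "trained " + task_def["model"]


pool = InMemoryTaskPool([{"model": "a"}, {"model": "broken"}, {"model": "b"}])
finished = run_pool(toy_train, pool)
statuses = [task["status"] for task in pool.tasks]
print(finished, statuses)
```

Because the status lives with the task rather than with the worker, several workers could in principle share one pool, which is the motivation for keeping tasks in a database.
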
Meanwhile, ``Qlib`` provides a module called ``Trainer``.

.. autoclass:: qlib.model.trainer.Trainer
    :members:

``Trainer`` will train a list of tasks and return a list of model recorders.
``Qlib`` offers two kinds of Trainer: ``TrainerR`` is the simplest way, and ``TrainerRM`` is based on ``TaskManager`` to help manage the lifecycle of tasks automatically.
If you do not want to use ``Task Manager`` to manage tasks, using ``TrainerR`` to train a list of tasks generated by ``TaskGen`` is enough.
`Here <../reference/api.html#Trainer>`_ are the details about the different ``Trainer`` classes.

Task Collecting
===============
To collect the results of ``task`` after training, ``Qlib`` provides `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_ to collect the results in a readable, extensible and loosely-coupled way.

`Collector <../reference/api.html#Collector>`_ can collect objects from anywhere and process them, for example by merging, grouping or averaging. It works in two steps: ``collect`` (collect anything into a dict) and ``process_collect`` (process the collected dict).

`Group <../reference/api.html#Group>`_ also works in two steps: ``group`` (group a set of objects based on ``group_func`` and turn them into a dict) and ``reduce`` (turn a dict into an ensemble based on some rule).
For example: {(A,B,C1): object, (A,B,C2): object} ---``group``---> {(A,B): {C1: object, C2: object}} ---``reduce``---> {(A,B): object}

`Ensemble <../reference/api.html#Ensemble>`_ can merge the objects in an ensemble.
For example: {C1: object, C2: object} ---``Ensemble``---> object

So the hierarchy is as follows: ``Collector``'s second step corresponds to ``Group``, and ``Group``'s second step corresponds to ``Ensemble``.

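
The ``group`` and ``reduce`` steps and the role of an ensemble can be traced on plain dictionaries. The sketch below is an illustration only: ``group``, ``reduce_groups`` and ``average_ensemble`` are hypothetical helper names, and averaging floats is an assumed toy rule, not the behaviour of the ``Group`` or ``Ensemble`` classes.

```python
from collections import defaultdict


def group(collected, group_func):
    """Split each key into (outer key, inner key) and nest the objects accordingly."""
    grouped = defaultdict(dict)
    for key, obj in collected.items():
        outer, inner = group_func(key)
        grouped[outer][inner] = obj
    return dict(grouped)


def reduce_groups(grouped, ensemble):
    """Apply `ensemble` to every inner dict so that each group becomes one object."""
    return {outer: ensemble(inner) for outer, inner in grouped.items()}


def average_ensemble(objects):
    """Toy ensemble rule: average the numeric values of a dict."""
    return sum(objects.values()) / len(objects)


# Keys have the form (A, B, C); the first two parts identify the group.
collected = {("A", "B", "C1"): 1.0, ("A", "B", "C2"): 3.0, ("A", "D", "C1"): 5.0}
grouped = group(collected, lambda key: (key[:2], key[2]))
reduced = reduce_groups(grouped, average_ensemble)
print(grouped)
print(reduced)
```

Keeping the grouping function and the ensemble rule as separate arguments mirrors the loosely-coupled design described above: either one can be swapped without touching the other.
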
For more information, please see `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_, or the `example <https://github.com/microsoft/qlib/tree/main/examples/model_rolling/task_manager_rolling.py>`_.