update docstring and document

2026-07-21 11:17:34 +08:00 · 2021-03-15 03:50:43 +00:00
parent 9d84d389ab
commit 646d899f8d
7 changed files with 106 additions and 57 deletions
--- a/docs/advanced/task_management.rst
+++ b/docs/advanced/task_management.rst
@@ -9,50 +9,52 @@ Task Management
 Introduction
 =============

-The `Workflow <../component/introduction.html>`_ part introduce how to run research workflow in a loosely-coupled way. But it can only execute one ``task`` when you use ``qrun``. To automatically generate and execute different tasks, Task Management module provide a whole process including `Task Generating`_, `Task Storing`_, `Task Running`_ and `Task Collecting`_. 
-With this module, users can run their ``task`` automatically at different periods, in different losses or even by different models.
+The `Workflow <../component/introduction.html>`_ part introduces how to run research workflow in a loosely-coupled way. But it can only execute one ``task`` when you use ``qrun``.
+To automatically generate and execute different tasks, ``Task Management`` provides a whole process including `Task Generating`_, `Task Storing`_, `Task Running`_ and `Task Collecting`_. 
+With this module, users can run their ``task`` automatically at different periods, in different losses, or even by different models.

-An example of the entire process is shown `here <>`_.
+An example of the entire process is shown `here <https://github.com/microsoft/qlib/tree/main/examples/taskmanager/task_manager_rolling.py>`_.

 Task Generating
 ===============
 A ``task`` consists of `Model`, `Dataset`, `Record` or anything added by users. 
-The specific task template can be viewed in 
+The specific task template (also called the task definition or config) can be viewed in 
 `Task Section <../component/workflow.html#task-section>`_.
-Even though the task template is fixed, Users can use ``TaskGen`` to generate different ``task`` by task template.
+Even though the task template is fixed, users can customize their ``TaskGen`` to generate different ``task`` from the task template.

-Here is the base class of TaskGen:
+Here is the base class of ``TaskGen``:

 .. autoclass:: qlib.workflow.task.gen.TaskGen
    :members:

-``Qlib`` provider a class `RollingGen<https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_ to generate a list of ``task`` of dataset in different date segments.
-This allows users to verify the effect of data from different periods on the model in one experiment.
+``Qlib`` provides a class `RollingGen <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_ to generate a list of ``task`` of the dataset in different date segments.
+This class allows users to verify the effect of data from different periods on the model in one experiment.

 Task Storing
 ===============
-In order to achieve higher efficiency and the possibility of cluster operation, ``Task Manager`` will store all tasks in `MongoDB <https://www.mongodb.com/>`_.
+To achieve higher efficiency and the possibility of cluster operation, ``Task Manager`` will store all tasks in `MongoDB <https://www.mongodb.com/>`_.
 Users **MUST** finished the configuration of `MongoDB <https://www.mongodb.com/>`_ when using this module.

-Users need to provide the url and database of ``task`` storing like this.
+Users need to provide the URL and database name of ``task`` storing like this.

    .. code-block:: python

        from qlib.config import C
        C["mongo"] = {
-            "task_url" : "mongodb://localhost:27017/", # maybe you need to change it to your url
-            "task_db_name" : "rolling_db" # you can custom database name
+            "task_url" : "mongodb://localhost:27017/", # your MongoDB url
+            "task_db_name" : "rolling_db" # database name
        }

-The CRUD methods of ``task`` can be found in TaskManager. More methods can be seen in the `Github<https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/manage.py>`_.
+The CRUD methods of ``task`` can be found in TaskManager. 
+More methods can be seen in the `Github <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/manage.py>`_.

 .. autoclass:: qlib.workflow.task.manage.TaskManager
    :members:

 Task Running
 ===============
-After generating and storing those ``task``, it's time to run the ``task`` in the *WAITING* status.
-``qlib`` provide a method to run those ``task`` in task pool, however users can also customize how tasks are executed.
+After generating and storing those ``task``, it's time to run the ``task`` which are in the *WAITING* status.
+``Qlib`` provides a method called ``run_task`` to run those ``task`` in the task pool. However, users can also customize how tasks are executed.
 An easy way to get the ``task_func`` is using ``qlib.model.trainer.task_train`` directly.
 It will run the whole workflow defined by ``task``, which includes *Model*, *Dataset*, *Record*.

@@ -60,8 +62,12 @@ It will run the whole workflow defined by ``task``, which includes *Model*, *Dat

 Task Collecting
 ===============
-To see the results of ``task`` after running, ``Qlib`` provide a task collector to collect the tasks by filter condition (optional).
-The collector will return a dict of filtered key (users defined by task config) and value (predict scores from ``pred.pkl``).
+To see the results of ``task`` after running or to update something, ``Qlib`` provides a ``TaskCollector`` to collect the tasks by filter condition (optional).
+Here are some methods in this class.

 .. autoclass:: qlib.workflow.task.collect.TaskCollector
-    :members:
+    :members:
+
+``Qlib`` provides a concrete `example <https://github.com/microsoft/qlib/tree/main/examples/taskmanager/task_manager_rolling_with_updating.py>`_, including a whole process of `Task Generating`_ (using `RollingGen <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_), `Task Storing`_, `Task Running`_ and `Task Collecting`_.
+Besides, the `example <https://github.com/microsoft/qlib/tree/main/examples/taskmanager/task_manager_rolling_with_updating.py>`_ uses a ``ModelUpdater`` inherited from ``TaskCollector``, which can update the inferences and retrain the model if it is out of date.
+Actually, the model updating can be viewed as a subset of ``Online Serving``.
--- a/docs/reference/api.rst
+++ b/docs/reference/api.rst
@@ -155,6 +155,35 @@ Record Template
    :members:


+Task Management
+====================
+
+
+RollingGen
+--------------------
+.. autoclass:: qlib.workflow.task.gen.RollingGen
+    :members:
+
+TaskManager
+--------------------
+.. autoclass:: qlib.workflow.task.manage.TaskManager
+    :members:
+
+TaskCollector
+--------------------
+.. autoclass:: qlib.workflow.task.collect.TaskCollector
+    :members:
+
+ModelUpdater
+--------------------
+.. autoclass:: qlib.workflow.task.update.ModelUpdater
+    :members:
+
+TimeAdjuster
+--------------------
+.. autoclass:: qlib.workflow.task.utils.TimeAdjuster
+    :members:
+
 Utils
 ====================

--- a/qlib/workflow/task/collect.py
+++ b/qlib/workflow/task/collect.py
@@ -18,8 +18,8 @@ class TaskCollector:

    def list_recorders(self, rec_filter_func=None, task_filter_func=None, only_finished=True, only_have_task=False):
        """
-        Return a dict of {rid:Recorder} by recorder filter and task filter. It is not necessary to use those filter.
-        If you don't train with "task_train", then there is no "task" which includes the task config.
+        Return a dict of {rid: Recorder} by recorder filter and task filter. It is not necessary to use these filters.
+        If you don't train with "task_train", then there is no "task" (a file in mlruns/artifacts) which includes the task config.
        If there is a "task", then it will become rec.task which can be get simply.

        Parameters
@@ -36,12 +36,8 @@ class TaskCollector:
        Returns
        -------
        dict
-            a dict of {rid:Recorder}
+            a dict of {rid: Recorder}

-        Raises
-        ------
-        OSError
-            if you use a task filter, but there is no "task" which includes the task config
        """
        recs = self.exp.list_recorders()
        recs_flt = {}
@@ -69,13 +65,14 @@ class TaskCollector:
        task_filter_func=None,
    ):
        """
+        Collect predictions using a filter and a key function.

        Parameters
        ----------
        experiment_name : str
-        get_key_func : function(task: dict) -> Union[Number, str, tuple]
+        get_key_func : Callable[[dict], Union[Number, str, tuple]]
            get the key of a task when collect it
-        filter_func : function(task: dict) -> bool
+        filter_func : Callable[[dict], bool]
            to judge a task will be collected or not

        Returns
@@ -108,6 +105,18 @@ class TaskCollector:
        self,
        task_filter_func=None,
    ):
+        """Collect latest recorders using a filter.
+
+        Parameters
+        ----------
+        task_filter_func : Callable[[dict], bool], optional
+            decides whether a task will be collected, by default None
+
+        Returns
+        -------
+        dict, tuple
+            a dict of recorders and a tuple of test segments
+        """
        recs_flt = self.list_recorders(task_filter_func=task_filter_func, only_have_task=True)

        if len(recs_flt) == 0:
--- a/qlib/workflow/task/gen.py
+++ b/qlib/workflow/task/gen.py
@@ -130,30 +130,32 @@ class RollingGen(TaskGen):
        task : dict
            A dict describing a task. For example.

-            DEFAULT_TASK = {
-                "model": {
-                    "class": "LGBModel",
-                    "module_path": "qlib.contrib.model.gbdt",
-                },
-                "dataset": {
-                    "class": "DatasetH",
-                    "module_path": "qlib.data.dataset",
-                    "kwargs": {
-                        "handler": {
-                            "class": "Alpha158",
-                            "module_path": "qlib.contrib.data.handler",
-                            "kwargs": data_handler_config,
-                        },
-                        "segments": {
-                            "train": ("2008-01-01", "2014-12-31"),
-                            "valid": ("2015-01-01", "2016-12-20"),  # Please avoid leaking the future test data into validation
-                            "test": ("2017-01-01", "2020-08-01"),
+            .. code-block:: python
+
+                DEFAULT_TASK = {
+                    "model": {
+                        "class": "LGBModel",
+                        "module_path": "qlib.contrib.model.gbdt",
+                    },
+                    "dataset": {
+                        "class": "DatasetH",
+                        "module_path": "qlib.data.dataset",
+                        "kwargs": {
+                            "handler": {
+                                "class": "Alpha158",
+                                "module_path": "qlib.contrib.data.handler",
+                                "kwargs": data_handler_config,
+                            },
+                            "segments": {
+                                "train": ("2008-01-01", "2014-12-31"),
+                                "valid": ("2015-01-01", "2016-12-20"),  # Please avoid leaking the future test data into validation
+                                "test": ("2017-01-01", "2020-08-01"),
+                            },
                        },
                    },
-                },
-                # You shoud record the data in specific sequence
-                # "record": ['SignalRecord', 'SigAnaRecord', 'PortAnaRecord'],
-            }
+                    # You should record the data in specific sequence
+                    # "record": ['SignalRecord', 'SigAnaRecord', 'PortAnaRecord'],
+                }
        """
        res = []

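The rolling idea described for ``RollingGen`` (one task template, many tasks over different date segments) can be sketched without qlib as below. This is only an illustration under stated assumptions: ``roll_segments`` is a hypothetical helper, not the ``RollingGen`` implementation, and it shifts every segment by plain calendar days, whereas the real class may step differently (for example along the trading calendar).

```python
import copy
from datetime import date, timedelta


def roll_segments(task, step_days, n_rolls):
    """Return n_rolls copies of task; copy i has all segments shifted by i * step_days.

    Hypothetical helper for illustration only. It assumes the task layout shown
    in the docstring above: task["dataset"]["kwargs"]["segments"] maps a segment
    name to a (start, end) pair of ISO date strings.
    """
    tasks = []
    for i in range(n_rolls):
        rolled = copy.deepcopy(task)  # never mutate the template
        segments = rolled["dataset"]["kwargs"]["segments"]
        offset = timedelta(days=step_days * i)
        for name, (start, end) in list(segments.items()):
            segments[name] = (
                (date.fromisoformat(start) + offset).isoformat(),
                (date.fromisoformat(end) + offset).isoformat(),
            )
        tasks.append(rolled)
    return tasks


template = {
    "dataset": {
        "kwargs": {
            "segments": {
                "train": ("2008-01-01", "2014-12-31"),
                "valid": ("2015-01-01", "2016-12-20"),
                "test": ("2017-01-01", "2020-08-01"),
            }
        }
    }
}
rolled_tasks = roll_segments(template, step_days=7, n_rolls=3)
```

The first generated task keeps the template's segments unchanged; each later one moves train, valid and test forward together, so the relative layout of the three segments stays the same.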
--- a/qlib/workflow/task/manage.py
+++ b/qlib/workflow/task/manage.py
@@ -18,13 +18,12 @@ import concurrent
 import pymongo
 from qlib.config import C
 from .utils import get_mongodb
-from qlib import auto_init
 from qlib import get_module_logger


 class TaskManager:
    """TaskManager
-    here is the what will a task looks like
+    Here is what a task looks like when it is created by TaskManager

    .. code-block:: python

@@ -40,7 +39,7 @@ class TaskManager:

    .. note::

-        assumption: the data in MongoDB was encoded and the data out of MongoDB was decoded
+        Assumption: the data in MongoDB was encoded and the data out of MongoDB was decoded
    """

    STATUS_WAITING = "waiting"
@@ -118,6 +117,7 @@ class TaskManager:
        Parameters
        ----------
        task_def: dict
+            the task definition
        task_pool: str
            the name of Collection in MongoDB

--- a/qlib/workflow/task/update.py
+++ b/qlib/workflow/task/update.py
@@ -110,7 +110,7 @@ class ModelUpdater(TaskCollector):
    def update_all_pred(self, rec_filter_func=None):
        """update all predictions in this experiment after filter.

-            An example of filter function:
+        An example of filter function:

            .. code-block:: python

--- a/qlib/workflow/task/utils.py
+++ b/qlib/workflow/task/utils.py
@@ -107,11 +107,14 @@ class TimeAdjuster:
        align the given date to trade date

        for example:
-            input: {'train': ('2008-01-01', '2014-12-31'), 'valid': ('2015-01-01', '2016-12-31'), 'test': ('2017-01-01', '2020-08-01')}

-            output: {'train': (Timestamp('2008-01-02 00:00:00'), Timestamp('2014-12-31 00:00:00')),
-                    'valid': (Timestamp('2015-01-05 00:00:00'), Timestamp('2016-12-30 00:00:00')),
-                    'test': (Timestamp('2017-01-03 00:00:00'), Timestamp('2020-07-31 00:00:00'))}
+            .. code-block:: python
+
+                input: {'train': ('2008-01-01', '2014-12-31'), 'valid': ('2015-01-01', '2016-12-31'), 'test': ('2017-01-01', '2020-08-01')}
+
+                output: {'train': (Timestamp('2008-01-02 00:00:00'), Timestamp('2014-12-31 00:00:00')),
+                        'valid': (Timestamp('2015-01-05 00:00:00'), Timestamp('2016-12-30 00:00:00')),
+                        'test': (Timestamp('2017-01-03 00:00:00'), Timestamp('2020-07-31 00:00:00'))}

        Parameters
        ----------
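
The alignment shown in the ``TimeAdjuster`` docstring example above (segment start moved forward, segment end moved backward to a trade date) can be sketched with the standard library. This is a minimal sketch, not the ``TimeAdjuster`` code: ``align_segment`` and the tiny ``calendar`` list are hypothetical, and the sketch assumes the calendar is a sorted list of trading dates.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def align_segment(calendar, start, end):
    """Snap start forward and end backward to dates in a sorted trading calendar.

    Hypothetical helper for illustration only.
    """
    left = bisect_left(calendar, start)       # index of first trading day >= start
    right = bisect_right(calendar, end) - 1   # index of last trading day <= end
    if left >= len(calendar) or right < 0 or left > right:
        raise ValueError("no trading day inside the requested segment")
    return calendar[left], calendar[right]


# Assumed calendar: 2008-01-01 is a holiday, 2008-01-05 and 2008-01-06 are a weekend.
calendar = [date(2008, 1, 2), date(2008, 1, 3), date(2008, 1, 4), date(2008, 1, 7)]
aligned = align_segment(calendar, date(2008, 1, 1), date(2008, 1, 6))
```

With this calendar the segment from 2008-01-01 to 2008-01-06 shrinks to 2008-01-02 through 2008-01-04, which mirrors how the docstring example turns ``'2008-01-01'`` into ``Timestamp('2008-01-02 00:00:00')``.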