diff --git a/docs/advanced/task_management.rst b/docs/advanced/task_management.rst index 78ac62410..230a4e9d1 100644 --- a/docs/advanced/task_management.rst +++ b/docs/advanced/task_management.rst @@ -9,50 +9,52 @@ Task Management Introduction ============= -The `Workflow <../component/introduction.html>`_ part introduce how to run research workflow in a loosely-coupled way. But it can only execute one ``task`` when you use ``qrun``. To automatically generate and execute different tasks, Task Management module provide a whole process including `Task Generating`_, `Task Storing`_, `Task Running`_ and `Task Collecting`_. -With this module, users can run their ``task`` automatically at different periods, in different losses or even by different models. +The `Workflow <../component/introduction.html>`_ part introduces how to run research workflow in a loosely-coupled way. But it can only execute one ``task`` when you use ``qrun``. +To automatically generate and execute different tasks, ``Task Management`` provides a whole process including `Task Generating`_, `Task Storing`_, `Task Running`_ and `Task Collecting`_. +With this module, users can run their ``task`` automatically at different periods, in different losses, or even by different models. -An example of the entire process is shown `here <>`_. +An example of the entire process is shown `here `_. Task Generating =============== A ``task`` consists of `Model`, `Dataset`, `Record` or anything added by users. -The specific task template can be viewed in +The specific task template(/definition/config) can be viewed in `Task Section <../component/workflow.html#task-section>`_. -Even though the task template is fixed, Users can use ``TaskGen`` to generate different ``task`` by task template. +Even though the task template is fixed, users can customize their ``TaskGen`` to generate different ``task`` by task template. -Here is the base class of TaskGen: +Here is the base class of ``TaskGen``: .. 
autoclass:: qlib.workflow.task.gen.TaskGen :members: -``Qlib`` provider a class `RollingGen`_ to generate a list of ``task`` of dataset in different date segments. -This allows users to verify the effect of data from different periods on the model in one experiment. +``Qlib`` provides a class `RollingGen `_ to generate a list of ``task`` of the dataset in different date segments. +This class allows users to verify the effect of data from different periods on the model in one experiment. Task Storing =============== -In order to achieve higher efficiency and the possibility of cluster operation, ``Task Manager`` will store all tasks in `MongoDB `_. +To achieve higher efficiency and the possibility of cluster operation, ``Task Manager`` will store all tasks in `MongoDB `_. Users **MUST** finished the configuration of `MongoDB `_ when using this module. -Users need to provide the url and database of ``task`` storing like this. +Users need to provide the URL and database name for storing ``task`` like this. .. code-block:: python from qlib.config import C C["mongo"] = { - "task_url" : "mongodb://localhost:27017/", # maybe you need to change it to your url - "task_db_name" : "rolling_db" # you can custom database name + "task_url" : "mongodb://localhost:27017/", # your MongoDB URL + "task_db_name" : "rolling_db" # database name } -The CRUD methods of ``task`` can be found in TaskManager. More methods can be seen in the `Github`_. +The CRUD methods of ``task`` can be found in TaskManager. +More methods can be seen in the `GitHub `_. .. autoclass:: qlib.workflow.task.manage.TaskManager :members: Task Running =============== -After generating and storing those ``task``, it's time to run the ``task`` in the *WAITING* status. -``qlib`` provide a method to run those ``task`` in task pool, however users can also customize how tasks are executed. +After generating and storing those ``task``, it's time to run the ``task`` which are in the *WAITING* status. 
+``Qlib`` provides a method called ``run_task`` to run those ``task`` in the task pool; however, users can also customize how tasks are executed. An easy way to get the ``task_func`` is using ``qlib.model.trainer.task_train`` directly. It will run the whole workflow defined by ``task``, which includes *Model*, *Dataset*, *Record*. @@ -60,8 +62,12 @@ It will run the whole workflow defined by ``task``, which includes *Model*, *Dat Task Collecting =============== -To see the results of ``task`` after running, ``Qlib`` provide a task collector to collect the tasks by filter condition (optional). -The collector will return a dict of filtered key (users defined by task config) and value (predict scores from ``pred.pkl``). +To see the results of ``task`` after running or to update something, ``Qlib`` provides a ``TaskCollector`` to collect the tasks by filter condition (optional). +Here are some methods in this class. .. autoclass:: qlib.workflow.task.collect.TaskCollector - :members: \ No newline at end of file + :members: + +``Qlib`` provides a concrete `example `_, including a whole process of `Task Generating`_ (using `RollingGen `_), `Task Storing`_, `Task Running`_ and `Task Collecting`_. +Besides, the `example `_ uses a ``ModelUpdater`` inherited from ``TaskCollector``, which can update the inferences and retrain the model if it is out of date. +In fact, model updating can be viewed as a subset of ``Online Serving``. \ No newline at end of file diff --git a/docs/reference/api.rst b/docs/reference/api.rst index 3167d8a62..691dff703 100644 --- a/docs/reference/api.rst +++ b/docs/reference/api.rst @@ -155,6 +155,35 @@ Record Template :members: +Task Management +==================== + + +RollingGen +-------------------- +.. autoclass:: qlib.workflow.task.gen.RollingGen + :members: + +TaskManager +-------------------- +.. autoclass:: qlib.workflow.task.manage.TaskManager + :members: + +TaskCollector +-------------------- +.. 
autoclass:: qlib.workflow.task.collect.TaskCollector + :members: + +ModelUpdater +-------------------- +.. autoclass:: qlib.workflow.task.update.ModelUpdater + :members: + +TimeAdjuster +-------------------- +.. autoclass:: qlib.workflow.task.utils.TimeAdjuster + :members: + Utils ==================== diff --git a/qlib/workflow/task/collect.py b/qlib/workflow/task/collect.py index 45e51da36..5d81864cc 100644 --- a/qlib/workflow/task/collect.py +++ b/qlib/workflow/task/collect.py @@ -18,8 +18,8 @@ class TaskCollector: def list_recorders(self, rec_filter_func=None, task_filter_func=None, only_finished=True, only_have_task=False): """ - Return a dict of {rid:Recorder} by recorder filter and task filter. It is not necessary to use those filter. - If you don't train with "task_train", then there is no "task" which includes the task config. + Return a dict of {rid: Recorder} by recorder filter and task filter. Both filters are optional. + If you don't train with "task_train", then there is no "task" (a file in mlruns/artifacts) which includes the task config. If there is a "task", then it will become rec.task which can be get simply. Parameters @@ -36,12 +36,8 @@ class TaskCollector: Returns ------- dict - a dict of {rid:Recorder} + a dict of {rid: Recorder} - Raises - ------ - OSError - if you use a task filter, but there is no "task" which includes the task config """ recs = self.exp.list_recorders() recs_flt = {} @@ -69,13 +65,14 @@ class TaskCollector: task_filter_func=None, ): """ + Collect predictions using a filter and a key function. 
Parameters ---------- experiment_name : str - get_key_func : function(task: dict) -> Union[Number, str, tuple] + get_key_func : Callable[[dict], Union[Number, str, tuple]] get the key of a task when collect it - filter_func : function(task: dict) -> bool + filter_func : Callable[[dict], bool] to judge a task will be collected or not Returns @@ -108,6 +105,18 @@ class TaskCollector: self, task_filter_func=None, ): + """Collect latest recorders using a filter. + + Parameters + ---------- + task_filter_func : Callable[[dict], bool], optional + decides whether a task will be collected, by default None + + Returns + ------- + dict, tuple + a dict of recorders and a tuple of test segments + """ recs_flt = self.list_recorders(task_filter_func=task_filter_func, only_have_task=True) if len(recs_flt) == 0: diff --git a/qlib/workflow/task/gen.py b/qlib/workflow/task/gen.py index 19793c485..96448cefe 100644 --- a/qlib/workflow/task/gen.py +++ b/qlib/workflow/task/gen.py @@ -130,30 +130,32 @@ class RollingGen(TaskGen): task : dict A dict describing a task. For example. - DEFAULT_TASK = { - "model": { - "class": "LGBModel", - "module_path": "qlib.contrib.model.gbdt", - }, - "dataset": { - "class": "DatasetH", - "module_path": "qlib.data.dataset", - "kwargs": { - "handler": { - "class": "Alpha158", - "module_path": "qlib.contrib.data.handler", - "kwargs": data_handler_config, - }, - "segments": { - "train": ("2008-01-01", "2014-12-31"), - "valid": ("2015-01-01", "2016-12-20"), # Please avoid leaking the future test data into validation - "test": ("2017-01-01", "2020-08-01"), + .. 
code-block:: python + + DEFAULT_TASK = { + "model": { + "class": "LGBModel", + "module_path": "qlib.contrib.model.gbdt", + }, + "dataset": { + "class": "DatasetH", + "module_path": "qlib.data.dataset", + "kwargs": { + "handler": { + "class": "Alpha158", + "module_path": "qlib.contrib.data.handler", + "kwargs": data_handler_config, + }, + "segments": { + "train": ("2008-01-01", "2014-12-31"), + "valid": ("2015-01-01", "2016-12-20"), # Please avoid leaking the future test data into validation + "test": ("2017-01-01", "2020-08-01"), + }, }, }, - }, - # You shoud record the data in specific sequence - # "record": ['SignalRecord', 'SigAnaRecord', 'PortAnaRecord'], - } + # You should record the data in a specific sequence + # "record": ['SignalRecord', 'SigAnaRecord', 'PortAnaRecord'], + } """ res = [] diff --git a/qlib/workflow/task/manage.py b/qlib/workflow/task/manage.py index a5741e3ed..e97fdb774 100644 --- a/qlib/workflow/task/manage.py +++ b/qlib/workflow/task/manage.py @@ -18,13 +18,12 @@ import concurrent import pymongo from qlib.config import C from .utils import get_mongodb -from qlib import auto_init from qlib import get_module_logger class TaskManager: """TaskManager - here is the what will a task looks like + Here is what a task looks like when it is created by TaskManager .. code-block:: python @@ -40,7 +39,7 @@ class TaskManager: .. 
note:: - assumption: the data in MongoDB was encoded and the data out of MongoDB was decoded + Assumption: the data in MongoDB was encoded and the data out of MongoDB was decoded """ STATUS_WAITING = "waiting" @@ -118,6 +117,7 @@ class TaskManager: Parameters ---------- task_def: dict + the task definition task_pool: str the name of Collection in MongoDB diff --git a/qlib/workflow/task/update.py b/qlib/workflow/task/update.py index bafb6561b..73c2f7241 100644 --- a/qlib/workflow/task/update.py +++ b/qlib/workflow/task/update.py @@ -110,7 +110,7 @@ class ModelUpdater(TaskCollector): def update_all_pred(self, rec_filter_func=None): """update all predictions in this experiment after filter. - An example of filter function: + An example of filter function: .. code-block:: python diff --git a/qlib/workflow/task/utils.py b/qlib/workflow/task/utils.py index 091952b81..272f219ec 100644 --- a/qlib/workflow/task/utils.py +++ b/qlib/workflow/task/utils.py @@ -107,11 +107,14 @@ class TimeAdjuster: align the given date to trade date for example: - input: {'train': ('2008-01-01', '2014-12-31'), 'valid': ('2015-01-01', '2016-12-31'), 'test': ('2017-01-01', '2020-08-01')} - output: {'train': (Timestamp('2008-01-02 00:00:00'), Timestamp('2014-12-31 00:00:00')), - 'valid': (Timestamp('2015-01-05 00:00:00'), Timestamp('2016-12-30 00:00:00')), - 'test': (Timestamp('2017-01-03 00:00:00'), Timestamp('2020-07-31 00:00:00'))} + .. code-block:: python + + input: {'train': ('2008-01-01', '2014-12-31'), 'valid': ('2015-01-01', '2016-12-31'), 'test': ('2017-01-01', '2020-08-01')} + + output: {'train': (Timestamp('2008-01-02 00:00:00'), Timestamp('2014-12-31 00:00:00')), + 'valid': (Timestamp('2015-01-05 00:00:00'), Timestamp('2016-12-30 00:00:00')), + 'test': (Timestamp('2017-01-03 00:00:00'), Timestamp('2020-07-31 00:00:00'))} Parameters ----------