mirror of https://github.com/microsoft/qlib.git synced 2026-06-06 05:51:17 +08:00

update docstring and document

This commit is contained in:
lzh222333
2021-03-15 03:50:43 +00:00
parent 9d84d389ab
commit 646d899f8d
7 changed files with 106 additions and 57 deletions


@@ -9,50 +9,52 @@ Task Management
Introduction
=============
The `Workflow <../component/introduction.html>`_ part introduce how to run research workflow in a loosely-coupled way. But it can only execute one ``task`` when you use ``qrun``. To automatically generate and execute different tasks, Task Management module provide a whole process including `Task Generating`_, `Task Storing`_, `Task Running`_ and `Task Collecting`_.
With this module, users can run their ``task`` automatically at different periods, in different losses or even by different models.
The `Workflow <../component/introduction.html>`_ part introduces how to run a research workflow in a loosely coupled way. However, it can only execute one ``task`` when you use ``qrun``.
To automatically generate and execute different tasks, ``Task Management`` provides a complete process, including `Task Generating`_, `Task Storing`_, `Task Running`_ and `Task Collecting`_.
With this module, users can run their ``task`` automatically over different periods, with different losses, or even with different models.
An example of the entire process is shown `here <>`_.
An example of the entire process is shown `here <https://github.com/microsoft/qlib/tree/main/examples/taskmanager/task_manager_rolling.py>`_.
Task Generating
===============
A ``task`` consists of `Model`, `Dataset`, `Record` or anything added by users.
The specific task template can be viewed in
The specific task template (i.e., its definition or config) can be viewed in
`Task Section <../component/workflow.html#task-section>`_.
Even though the task template is fixed, Users can use ``TaskGen`` to generate different ``task`` by task template.
Although the task template is fixed, users can customize their ``TaskGen`` to generate different ``task`` instances from the template.
Here is the base class of TaskGen:
Here is the base class of ``TaskGen``:
.. autoclass:: qlib.workflow.task.gen.TaskGen
:members:
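Here is a minimal sketch of a custom generator. The hypothetical ``LossGen`` below turns one template into one ``task`` per loss function. It is plain Python and does not subclass the real ``TaskGen``. The ``generate`` method name and the ``"loss"`` key under ``model.kwargs`` are assumptions made for illustration.

```python
import copy
from typing import List


class LossGen:
    """Hypothetical generator: one task per loss function (not qlib's TaskGen)."""

    def __init__(self, losses: List[str]):
        self.losses = losses

    def generate(self, task: dict) -> List[dict]:
        tasks = []
        for loss in self.losses:
            new_task = copy.deepcopy(task)  # never mutate the shared template
            # Assumed location of the loss setting: task["model"]["kwargs"]["loss"]
            new_task["model"].setdefault("kwargs", {})["loss"] = loss
            tasks.append(new_task)
        return tasks


template = {
    "model": {"class": "LGBModel", "module_path": "qlib.contrib.model.gbdt", "kwargs": {"loss": "mse"}},
    "dataset": {"class": "DatasetH", "module_path": "qlib.data.dataset"},
}
tasks = LossGen(["mse", "binary"]).generate(template)
print([t["model"]["kwargs"]["loss"] for t in tasks])  # ['mse', 'binary']
```

Deep-copying the template keeps every generated ``task`` independent. Editing one task later therefore cannot leak into the others.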
``Qlib`` provider a class `RollingGen<https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_ to generate a list of ``task`` of dataset in different date segments.
This allows users to verify the effect of data from different periods on the model in one experiment.
``Qlib`` provides a class `RollingGen <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_ to generate a list of ``task`` whose datasets cover different date segments.
This class allows users to verify the effect of data from different periods on the model in one experiment.
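The rolling idea can be sketched without ``Qlib``. The hypothetical ``rolling_tasks`` function below shifts every segment of a template forward by ``step`` trade dates. It produces one ``task`` per window until the test segment would run past the end of the calendar. The sketch makes three assumptions:

- a sliding window with fixed segment lengths;
- a plain list of trade-date strings as the calendar;
- segment boundaries that are already trade dates.

It is not the ``RollingGen`` implementation.

```python
import copy
from typing import List


def rolling_tasks(task: dict, calendar: List[str], step: int) -> List[dict]:
    """Hypothetical sketch: slide all segments forward by `step` trade dates."""
    if step <= 0:
        raise ValueError("step must be positive")
    segments = task["dataset"]["kwargs"]["segments"]
    # Work with calendar positions so every shifted boundary is still a trade date.
    pos = {k: (calendar.index(s), calendar.index(e)) for k, (s, e) in segments.items()}
    tasks, shift = [], 0
    while pos["test"][1] + shift < len(calendar):
        new_task = copy.deepcopy(task)
        new_task["dataset"]["kwargs"]["segments"] = {
            k: (calendar[s + shift], calendar[e + shift]) for k, (s, e) in pos.items()
        }
        tasks.append(new_task)
        shift += step
    return tasks


calendar = [f"2020-01-{d:02d}" for d in range(1, 11)]  # ten toy trade dates
base = {"dataset": {"kwargs": {"segments": {
    "train": ("2020-01-01", "2020-01-04"),
    "valid": ("2020-01-05", "2020-01-06"),
    "test": ("2020-01-07", "2020-01-08"),
}}}}
rolled = rolling_tasks(base, calendar, step=2)
for t in rolled:
    print(t["dataset"]["kwargs"]["segments"]["test"])
# ('2020-01-07', '2020-01-08')
# ('2020-01-09', '2020-01-10')
```

Each generated task trains on a later window and tests on the following ``step`` dates. This is what lets one experiment compare models fitted on data from different periods.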
Task Storing
===============
In order to achieve higher efficiency and the possibility of cluster operation, ``Task Manager`` will store all tasks in `MongoDB <https://www.mongodb.com/>`_.
To achieve higher efficiency and enable cluster operation, ``Task Manager`` stores all tasks in `MongoDB <https://www.mongodb.com/>`_.
Users **MUST** finish configuring `MongoDB <https://www.mongodb.com/>`_ before using this module.
Users need to provide the url and database of ``task`` storing like this.
Users need to provide the MongoDB URL and the database name used for ``task`` storage, like this:
.. code-block:: python
from qlib.config import C
C["mongo"] = {
"task_url" : "mongodb://localhost:27017/", # maybe you need to change it to your url
"task_db_name" : "rolling_db" # you can custom database name
"task_url" : "mongodb://localhost:27017/", # your MongoDB url
"task_db_name" : "rolling_db" # database name
}
The CRUD methods of ``task`` can be found in TaskManager. More methods can be seen in the `Github<https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/manage.py>`_.
The CRUD methods of ``task`` can be found in ``TaskManager``.
More methods can be found on `GitHub <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/manage.py>`_.
.. autoclass:: qlib.workflow.task.manage.TaskManager
:members:
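Here is a hypothetical in-memory stand-in for a MongoDB task collection that makes the CRUD idea concrete. Only the ``"waiting"`` status value comes from the ``TaskManager`` source. The following are all assumptions, not the ``TaskManager`` API:

- the ``"running"`` and ``"done"`` values;
- the ``{"def", "status", "res"}`` document layout;
- the method names.

```python
import itertools
from typing import Dict, List, Optional

STATUS_WAITING = "waiting"  # value shown in the TaskManager source
STATUS_RUNNING = "running"  # assumed
STATUS_DONE = "done"        # assumed


class InMemoryTaskPool:
    """Hypothetical stand-in for a MongoDB task collection (CRUD only)."""

    def __init__(self):
        self._docs: Dict[int, dict] = {}
        self._ids = itertools.count()

    def create(self, task_def: dict) -> int:
        # Re-submitting an identical definition returns the existing id (design choice).
        for _id, doc in self._docs.items():
            if doc["def"] == task_def:
                return _id
        _id = next(self._ids)
        self._docs[_id] = {"def": task_def, "status": STATUS_WAITING, "res": None}
        return _id

    def query(self, status: Optional[str] = None) -> List[int]:
        return [i for i, d in self._docs.items() if status is None or d["status"] == status]

    def update_status(self, _id: int, status: str) -> None:
        self._docs[_id]["status"] = status

    def delete(self, _id: int) -> None:
        del self._docs[_id]


pool = InMemoryTaskPool()
a = pool.create({"model": "A"})
b = pool.create({"model": "B"})
dup = pool.create({"model": "A"})  # same definition: no new document
pool.update_status(a, STATUS_DONE)
print(dup == a, pool.query(STATUS_WAITING))  # True [1]
```

A real deployment keeps these documents in MongoDB so that several workers can share one task pool.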
Task Running
===============
After generating and storing those ``task``, it's time to run the ``task`` in the *WAITING* status.
``qlib`` provide a method to run those ``task`` in task pool, however users can also customize how tasks are executed.
After generating and storing the tasks, it's time to run those in the *WAITING* status.
``Qlib`` provides a method called ``run_task`` to run the ``task`` entries in a task pool; however, users can also customize how tasks are executed.
An easy way to get the ``task_func`` is to use ``qlib.model.trainer.task_train`` directly.
It runs the whole workflow defined by ``task``, including *Model*, *Dataset*, and *Record*.
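A run loop over a task pool can be sketched as follows. ``run_waiting_tasks`` is a hypothetical helper. It:

1. picks up only *WAITING* tasks;
2. marks each one as running;
3. calls ``task_func`` on it (in ``Qlib`` this role could be played by ``task_train``);
4. stores the result and marks the task as done.

The status strings other than ``"waiting"`` are assumptions. Putting a failed task back to *WAITING* is a design choice of this sketch.

```python
from typing import Callable, List

STATUS_WAITING, STATUS_RUNNING, STATUS_DONE = "waiting", "running", "done"  # RUNNING/DONE assumed


def run_waiting_tasks(pool: List[dict], task_func: Callable[[dict], object]) -> int:
    """Hypothetical sketch of a run loop over an in-memory task pool.

    Each entry is {"def": task_definition, "status": ..., "res": ...}.
    Returns the number of tasks that finished successfully.
    """
    finished = 0
    for doc in pool:
        if doc["status"] != STATUS_WAITING:
            continue  # only WAITING tasks are picked up
        doc["status"] = STATUS_RUNNING
        try:
            doc["res"] = task_func(doc["def"])
        except Exception:
            doc["status"] = STATUS_WAITING  # hand the task back so it can be retried
            raise
        doc["status"] = STATUS_DONE
        finished += 1
    return finished


pool = [
    {"def": {"x": 1}, "status": STATUS_WAITING, "res": None},
    {"def": {"x": 2}, "status": STATUS_DONE, "res": "old"},
    {"def": {"x": 3}, "status": STATUS_WAITING, "res": None},
]
n = run_waiting_tasks(pool, lambda d: d["x"] * 10)
print(n, [d["res"] for d in pool])  # 2 [10, 'old', 30]
```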
@@ -60,8 +62,12 @@ It will run the whole workflow defined by ``task``, which includes *Model*, *Dat
Task Collecting
===============
To see the results of ``task`` after running, ``Qlib`` provide a task collector to collect the tasks by filter condition (optional).
The collector will return a dict of filtered key (users defined by task config) and value (predict scores from ``pred.pkl``).
To see the results of ``task`` after running, or to update them, ``Qlib`` provides a ``TaskCollector`` that collects tasks by an optional filter condition.
Here are some methods in this class.
.. autoclass:: qlib.workflow.task.collect.TaskCollector
:members:
``Qlib`` provides a concrete `example <https://github.com/microsoft/qlib/tree/main/examples/taskmanager/task_manager_rolling_with_updating.py>`_ covering the whole process of `Task Generating`_ (using `RollingGen <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_), `Task Storing`_, `Task Running`_ and `Task Collecting`_.
In addition, the `example <https://github.com/microsoft/qlib/tree/main/examples/taskmanager/task_manager_rolling_with_updating.py>`_ uses a ``ModelUpdater``, a subclass of ``TaskCollector``, which can update the inferences and retrain the model when it is out of date.
Model updating can be viewed as a subset of ``Online Serving``.
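The collecting step boils down to filtering finished tasks and keying their predictions. The hypothetical ``collect_predictions`` below mirrors the ``get_key_func``/``filter_func`` pair documented for ``TaskCollector``. Unlike ``TaskCollector``, it takes plain ``(task, prediction)`` pairs instead of recorders. Raising on duplicate keys is an assumption of this sketch.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple


def collect_predictions(
    finished: Iterable[Tuple[dict, object]],
    get_key_func: Callable[[dict], Hashable],
    filter_func: Optional[Callable[[dict], bool]] = None,
) -> Dict[Hashable, object]:
    """Hypothetical sketch: map key(task) -> prediction for tasks passing the filter."""
    result = {}
    for task, pred in finished:
        if filter_func is not None and not filter_func(task):
            continue  # task rejected by the filter
        key = get_key_func(task)
        if key in result:
            raise ValueError(f"duplicate key {key!r}; make get_key_func more specific")
        result[key] = pred
    return result


finished = [
    ({"model": {"class": "LGBModel"}, "test": ("2017-01-01", "2017-06-30")}, "pred_a"),
    ({"model": {"class": "LGBModel"}, "test": ("2017-07-01", "2017-12-31")}, "pred_b"),
    ({"model": {"class": "LinearModel"}, "test": ("2017-01-01", "2017-06-30")}, "pred_c"),
]
preds = collect_predictions(
    finished,
    get_key_func=lambda t: t["test"],                        # key each prediction by its test segment
    filter_func=lambda t: t["model"]["class"] == "LGBModel",  # keep only LightGBM tasks
)
print(preds)
```

Keying by test segment is one natural choice for rolling tasks, because each window then contributes exactly one prediction.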


@@ -155,6 +155,35 @@ Record Template
:members:
Task Management
====================
RollingGen
--------------------
.. autoclass:: qlib.workflow.task.gen.RollingGen
:members:
TaskManager
--------------------
.. autoclass:: qlib.workflow.task.manage.TaskManager
:members:
TaskCollector
--------------------
.. autoclass:: qlib.workflow.task.collect.TaskCollector
:members:
ModelUpdater
--------------------
.. autoclass:: qlib.workflow.task.update.ModelUpdater
:members:
TimeAdjuster
--------------------
.. autoclass:: qlib.workflow.task.utils.TimeAdjuster
:members:
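The ``TimeAdjuster`` docstring shows the alignment rule: start dates move forward to the next trade date, and end dates move back to the previous one. This can be sketched with ``bisect``. The ``align_seg`` function and the toy calendar are illustrative assumptions, not ``TimeAdjuster``'s implementation. The toy calendar holds only the trade dates needed to reproduce the docstring example.

```python
import bisect
from datetime import date
from typing import Dict, List, Tuple


def align_seg(segments: Dict[str, Tuple[date, date]], calendar: List[date]) -> Dict[str, Tuple[date, date]]:
    """Hypothetical sketch: snap each (start, end) pair onto a sorted trade calendar.

    start -> first trade date on or after start
    end   -> last trade date on or before end
    """
    aligned = {}
    for name, (start, end) in segments.items():
        i = bisect.bisect_left(calendar, start)     # first index with calendar[i] >= start
        j = bisect.bisect_right(calendar, end) - 1  # last index with calendar[j] <= end
        if i >= len(calendar) or j < 0 or i > j:
            raise ValueError(f"segment {name!r} contains no trade date")
        aligned[name] = (calendar[i], calendar[j])
    return aligned


# Toy calendar: only the trade dates needed for the docstring example.
calendar = [date(2008, 1, 2), date(2014, 12, 31), date(2015, 1, 5),
            date(2016, 12, 30), date(2017, 1, 3), date(2020, 7, 31)]
segments = {
    "train": (date(2008, 1, 1), date(2014, 12, 31)),
    "valid": (date(2015, 1, 1), date(2016, 12, 31)),
    "test": (date(2017, 1, 1), date(2020, 8, 1)),
}
aligned = align_seg(segments, calendar)
print(aligned["valid"])  # (datetime.date(2015, 1, 5), datetime.date(2016, 12, 30))
```

Snapping inward (start forward, end backward) keeps every aligned segment inside the requested range. This avoids leaking dates from a neighbouring segment.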
Utils
====================


@@ -18,8 +18,8 @@ class TaskCollector:
def list_recorders(self, rec_filter_func=None, task_filter_func=None, only_finished=True, only_have_task=False):
"""
Return a dict of {rid:Recorder} by recorder filter and task filter. It is not necessary to use those filter.
If you don't train with "task_train", then there is no "task" which includes the task config.
Return a dict of {rid: Recorder} filtered by a recorder filter and a task filter. Both filters are optional.
If you don't train with "task_train", there is no "task" (a file in mlruns/artifacts) that holds the task config.
If there is a "task", it becomes rec.task, which can be accessed directly.
Parameters
@@ -36,12 +36,8 @@ class TaskCollector:
Returns
-------
dict
a dict of {rid:Recorder}
a dict of {rid: Recorder}
Raises
------
OSError
if you use a task filter, but there is no "task" which includes the task config
"""
recs = self.exp.list_recorders()
recs_flt = {}
@@ -69,13 +65,14 @@ class TaskCollector:
task_filter_func=None,
):
"""
Collect predictions using a filter and a key function.
Parameters
----------
experiment_name : str
get_key_func : function(task: dict) -> Union[Number, str, tuple]
get_key_func : Callable[[dict], Union[Number, str, tuple]]
get the key of a task when collecting it
filter_func : function(task: dict) -> bool
filter_func : Callable[[dict], bool]
judge whether a task will be collected or not
Returns
@@ -108,6 +105,18 @@ class TaskCollector:
self,
task_filter_func=None,
):
"""Collect latest recorders using a filter.
Parameters
----------
task_filter_func : Callable[[dict], bool], optional
judge whether a task will be collected or not, by default None
Returns
-------
dict, tuple
a dict of recorders and a tuple of test segments
"""
recs_flt = self.list_recorders(task_filter_func=task_filter_func, only_have_task=True)
if len(recs_flt) == 0:

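The filtering described in the ``list_recorders`` docstring can be sketched on plain dicts. In the hypothetical ``filter_recorders`` below, each recorder is a dict with a ``"status"`` and an optional ``"task"``. Two details are assumptions, not the behavior of ``TaskCollector``: the ``"FINISHED"`` status string, and dropping task-less recorders whenever a task filter is given.

```python
from typing import Callable, Dict, Optional

STATUS_FINISHED = "FINISHED"  # assumed recorder status string


def filter_recorders(
    recs: Dict[str, dict],
    rec_filter_func: Optional[Callable[[dict], bool]] = None,
    task_filter_func: Optional[Callable[[dict], bool]] = None,
    only_finished: bool = True,
    only_have_task: bool = False,
) -> Dict[str, dict]:
    """Hypothetical sketch of {rid: recorder} filtering; recorders are plain dicts here."""
    out = {}
    for rid, rec in recs.items():
        if only_finished and rec["status"] != STATUS_FINISHED:
            continue
        if rec_filter_func is not None and not rec_filter_func(rec):
            continue
        task = rec.get("task")  # None when the run was not trained via task_train
        if task is None:
            if only_have_task or task_filter_func is not None:
                continue  # nothing to filter on, so the recorder is dropped
        elif task_filter_func is not None and not task_filter_func(task):
            continue
        out[rid] = rec
    return out


recs = {
    "r1": {"status": "FINISHED", "task": {"model": "LGBModel"}},
    "r2": {"status": "RUNNING", "task": {"model": "LGBModel"}},
    "r3": {"status": "FINISHED", "task": None},
    "r4": {"status": "FINISHED", "task": {"model": "LinearModel"}},
}
print(sorted(filter_recorders(recs)))                                                   # ['r1', 'r3', 'r4']
print(sorted(filter_recorders(recs, task_filter_func=lambda t: t["model"] == "LGBModel")))  # ['r1']
print(sorted(filter_recorders(recs, only_have_task=True)))                              # ['r1', 'r4']
```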

@@ -130,30 +130,32 @@ class RollingGen(TaskGen):
task : dict
A dict describing a task. For example:
DEFAULT_TASK = {
"model": {
"class": "LGBModel",
"module_path": "qlib.contrib.model.gbdt",
},
"dataset": {
"class": "DatasetH",
"module_path": "qlib.data.dataset",
"kwargs": {
"handler": {
"class": "Alpha158",
"module_path": "qlib.contrib.data.handler",
"kwargs": data_handler_config,
},
"segments": {
"train": ("2008-01-01", "2014-12-31"),
"valid": ("2015-01-01", "2016-12-20"), # Please avoid leaking the future test data into validation
"test": ("2017-01-01", "2020-08-01"),
.. code-block:: python
DEFAULT_TASK = {
"model": {
"class": "LGBModel",
"module_path": "qlib.contrib.model.gbdt",
},
"dataset": {
"class": "DatasetH",
"module_path": "qlib.data.dataset",
"kwargs": {
"handler": {
"class": "Alpha158",
"module_path": "qlib.contrib.data.handler",
"kwargs": data_handler_config,
},
"segments": {
"train": ("2008-01-01", "2014-12-31"),
"valid": ("2015-01-01", "2016-12-20"), # Please avoid leaking the future test data into validation
"test": ("2017-01-01", "2020-08-01"),
},
},
},
},
# You shoud record the data in specific sequence
# "record": ['SignalRecord', 'SigAnaRecord', 'PortAnaRecord'],
}
# You should record the data in a specific sequence
# "record": ['SignalRecord', 'SigAnaRecord', 'PortAnaRecord'],
}
"""
res = []


@@ -18,13 +18,12 @@ import concurrent
import pymongo
from qlib.config import C
from .utils import get_mongodb
from qlib import auto_init
from qlib import get_module_logger
class TaskManager:
"""TaskManager
here is the what will a task looks like
Here is what a task looks like when it is created by TaskManager
.. code-block:: python
@@ -40,7 +39,7 @@ class TaskManager:
.. note::
assumption: the data in MongoDB was encoded and the data out of MongoDB was decoded
Assumption: data stored in MongoDB is encoded, and data read out of MongoDB is decoded
"""
STATUS_WAITING = "waiting"
@@ -118,6 +117,7 @@ class TaskManager:
Parameters
----------
task_def: dict
the task definition
task_pool: str
the name of Collection in MongoDB


@@ -110,7 +110,7 @@ class ModelUpdater(TaskCollector):
def update_all_pred(self, rec_filter_func=None):
Update all predictions in this experiment after filtering.
An example of filter function:
.. code-block:: python


@@ -107,11 +107,14 @@ class TimeAdjuster:
Align the given dates to trade dates.
for example:
input: {'train': ('2008-01-01', '2014-12-31'), 'valid': ('2015-01-01', '2016-12-31'), 'test': ('2017-01-01', '2020-08-01')}
output: {'train': (Timestamp('2008-01-02 00:00:00'), Timestamp('2014-12-31 00:00:00')),
'valid': (Timestamp('2015-01-05 00:00:00'), Timestamp('2016-12-30 00:00:00')),
'test': (Timestamp('2017-01-03 00:00:00'), Timestamp('2020-07-31 00:00:00'))}
.. code-block:: python
input: {'train': ('2008-01-01', '2014-12-31'), 'valid': ('2015-01-01', '2016-12-31'), 'test': ('2017-01-01', '2020-08-01')}
output: {'train': (Timestamp('2008-01-02 00:00:00'), Timestamp('2014-12-31 00:00:00')),
'valid': (Timestamp('2015-01-05 00:00:00'), Timestamp('2016-12-30 00:00:00')),
'test': (Timestamp('2017-01-03 00:00:00'), Timestamp('2020-07-31 00:00:00'))}
Parameters
----------