mirror of
https://github.com/microsoft/qlib.git
synced 2026-06-06 05:51:17 +08:00
update docstring and document
@@ -9,50 +9,52 @@ Task Management
|
||||
Introduction
|
||||
=============
|
||||
|
||||
The `Workflow <../component/introduction.html>`_ part introduce how to run research workflow in a loosely-coupled way. But it can only execute one ``task`` when you use ``qrun``. To automatically generate and execute different tasks, Task Management module provide a whole process including `Task Generating`_, `Task Storing`_, `Task Running`_ and `Task Collecting`_.
|
||||
With this module, users can run their ``task`` automatically at different periods, in different losses or even by different models.
|
||||
The `Workflow <../component/introduction.html>`_ part introduces how to run a research workflow in a loosely-coupled way. However, ``qrun`` can execute only one ``task`` per run.
|
||||
To automatically generate and execute different tasks, ``Task Management`` provides a whole process including `Task Generating`_, `Task Storing`_, `Task Running`_ and `Task Collecting`_.
|
||||
With this module, users can run their ``task`` automatically over different periods, with different losses, or even with different models.
|
||||
|
||||
An example of the entire process is shown `here <>`_.
|
||||
An example of the entire process is shown `here <https://github.com/microsoft/qlib/tree/main/examples/taskmanager/task_manager_rolling.py>`_.
|
||||
|
||||
Task Generating
|
||||
===============
|
||||
A ``task`` consists of `Model`, `Dataset`, `Record` or anything added by users.
|
||||
The specific task template can be viewed in
|
||||
The specific task template (also called the task definition or config) can be viewed in
|
||||
`Task Section <../component/workflow.html#task-section>`_.
|
||||
Even though the task template is fixed, Users can use ``TaskGen`` to generate different ``task`` by task template.
|
||||
Even though the task template is fixed, users can customize their ``TaskGen`` to generate different ``task`` from the task template.
|
||||
|
||||
Here is the base class of TaskGen:
|
||||
Here is the base class of ``TaskGen``:
|
||||
|
||||
.. autoclass:: qlib.workflow.task.gen.TaskGen
|
||||
:members:
|
||||
|
||||
``Qlib`` provider a class `RollingGen<https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_ to generate a list of ``task`` of dataset in different date segments.
|
||||
This allows users to verify the effect of data from different periods on the model in one experiment.
|
||||
``Qlib`` provides a class `RollingGen <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_ to generate a list of ``task`` whose datasets cover different date segments.
|
||||
This class allows users to verify the effect of data from different periods on the model in one experiment.
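The rolling idea can be illustrated with a minimal, self-contained sketch. This is not the ``RollingGen`` implementation: the helper ``rolling_tasks`` and its parameters ``step_days`` and ``n_rolls`` are hypothetical, and it shifts segments by calendar days for simplicity, whereas the real class may work on trading days and handle segment boundaries differently.

```python
import copy
from datetime import date, timedelta


def rolling_tasks(task_template, step_days, n_rolls):
    """Return n_rolls copies of task_template with shifted date segments.

    The i-th copy has every (start, end) pair moved i * step_days
    calendar days later. Hypothetical helper, for illustration only.
    """
    base = task_template["dataset"]["kwargs"]["segments"]
    tasks = []
    for i in range(n_rolls):
        delta = timedelta(days=i * step_days)
        task = copy.deepcopy(task_template)  # never mutate the template
        task["dataset"]["kwargs"]["segments"] = {
            name: (start + delta, end + delta) for name, (start, end) in base.items()
        }
        tasks.append(task)
    return tasks


template = {
    "model": {"class": "LGBModel", "module_path": "qlib.contrib.model.gbdt"},
    "dataset": {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "segments": {
                "train": (date(2008, 1, 1), date(2014, 12, 31)),
                "valid": (date(2015, 1, 1), date(2016, 12, 20)),
                "test": (date(2017, 1, 1), date(2020, 8, 1)),
            }
        },
    },
}

tasks = rolling_tasks(template, step_days=30, n_rolls=3)
```

Each element of ``tasks`` is an independent task dict, so the same model can be trained and evaluated on successive periods within one experiment.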
|
||||
|
||||
Task Storing
|
||||
===============
|
||||
In order to achieve higher efficiency and the possibility of cluster operation, ``Task Manager`` will store all tasks in `MongoDB <https://www.mongodb.com/>`_.
|
||||
To achieve higher efficiency and the possibility of cluster operation, ``Task Manager`` will store all tasks in `MongoDB <https://www.mongodb.com/>`_.
|
||||
Users **MUST** finish the configuration of `MongoDB <https://www.mongodb.com/>`_ before using this module.
|
||||
|
||||
Users need to provide the url and database of ``task`` storing like this.
|
||||
Users need to provide the MongoDB URL and the database name used for storing ``task``, like this.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from qlib.config import C
|
||||
C["mongo"] = {
|
||||
"task_url" : "mongodb://localhost:27017/", # maybe you need to change it to your url
|
||||
"task_db_name" : "rolling_db" # you can custom database name
|
||||
"task_url" : "mongodb://localhost:27017/", # your MongoDB url
|
||||
"task_db_name" : "rolling_db" # database name
|
||||
}
|
||||
|
||||
The CRUD methods of ``task`` can be found in TaskManager. More methods can be seen in the `Github<https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/manage.py>`_.
|
||||
The CRUD methods of ``task`` can be found in TaskManager.
|
||||
More methods can be seen on `GitHub <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/manage.py>`_.
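To make the create/read/update/delete life cycle concrete, here is a small in-memory sketch. It is not the ``TaskManager`` API and does not talk to MongoDB: the class ``InMemoryTaskPool``, its method names, and the status names ``"running"`` and ``"done"`` are assumptions made for illustration; only the ``"waiting"`` status appears in the source.

```python
class InMemoryTaskPool:
    """Toy stand-in for a task collection; NOT the qlib TaskManager API."""

    STATUS_WAITING = "waiting"  # this value appears in the source
    STATUS_RUNNING = "running"  # assumed name
    STATUS_DONE = "done"  # assumed name

    def __init__(self):
        self._tasks = {}
        self._next_id = 0

    def create(self, task_def):
        """Store a new task in the waiting status and return its id."""
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = {"def": task_def, "status": self.STATUS_WAITING, "res": None}
        return task_id

    def query(self, status=None):
        """Return the ids of all tasks, optionally restricted to one status."""
        return [tid for tid, t in self._tasks.items() if status is None or t["status"] == status]

    def update(self, task_id, status, res=None):
        """Overwrite the status and result of an existing task."""
        task = self._tasks[task_id]
        task["status"] = status
        task["res"] = res

    def remove(self, task_id):
        """Delete a task from the pool."""
        del self._tasks[task_id]


pool = InMemoryTaskPool()
first = pool.create({"model": {"class": "LGBModel"}})
second = pool.create({"model": {"class": "LGBModel"}})
pool.update(first, InMemoryTaskPool.STATUS_DONE, res="pred.pkl")
pool.remove(second)
```

A database-backed store follows the same shape, with the dictionary replaced by a collection so that several workers can share one pool.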
|
||||
|
||||
.. autoclass:: qlib.workflow.task.manage.TaskManager
|
||||
:members:
|
||||
|
||||
Task Running
|
||||
===============
|
||||
After generating and storing those ``task``, it's time to run the ``task`` in the *WAITING* status.
|
||||
``qlib`` provide a method to run those ``task`` in task pool, however users can also customize how tasks are executed.
|
||||
After generating and storing those ``task``, it's time to run the ``task`` that are in the *WAITING* status.
|
||||
``Qlib`` provides a method called ``run_task`` to run those ``task`` in the task pool; however, users can also customize how tasks are executed.
|
||||
An easy way to get the ``task_func`` is using ``qlib.model.trainer.task_train`` directly.
|
||||
It will run the whole workflow defined by ``task``, which includes *Model*, *Dataset*, *Record*.
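The role of ``task_func`` can be sketched with a plain loop over a list of task records. This is only an illustration under stated assumptions, not the ``run_task`` implementation: the function ``run_waiting_tasks``, the record layout (``def``, ``status``, ``res``), and the ``"running"`` and ``"done"`` status names are hypothetical.

```python
def run_waiting_tasks(pool, task_func):
    """Apply task_func to every waiting task in pool; return how many finished.

    pool is a list of dicts with the keys "def", "status" and "res"
    (a hypothetical layout chosen for this sketch).
    """
    finished = 0
    for task in pool:
        if task["status"] != "waiting":
            continue  # skip tasks that are already running or done
        task["status"] = "running"
        try:
            task["res"] = task_func(task["def"])
        except Exception:
            task["status"] = "waiting"  # hand the task back so it can be retried
            raise
        task["status"] = "done"
        finished += 1
    return finished


def fake_train(task_def):
    """Stand-in for a real training function; returns a label instead of a model."""
    return "trained " + task_def["model"]["class"]


pool = [
    {"def": {"model": {"class": "LGBModel"}}, "status": "waiting", "res": None},
    {"def": {"model": {"class": "LGBModel"}}, "status": "done", "res": "cached"},
]
n_finished = run_waiting_tasks(pool, fake_train)
```

Passing the training function as an argument is what lets users swap in their own execution logic without changing the loop.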
|
||||
|
||||
@@ -60,8 +62,12 @@ It will run the whole workflow defined by ``task``, which includes *Model*, *Dat
|
||||
|
||||
Task Collecting
|
||||
===============
|
||||
To see the results of ``task`` after running, ``Qlib`` provide a task collector to collect the tasks by filter condition (optional).
|
||||
The collector will return a dict of filtered key (users defined by task config) and value (predict scores from ``pred.pkl``).
|
||||
To see the results of ``task`` after running or to update something, ``Qlib`` provides a ``TaskCollector`` to collect the tasks by filter condition (optional).
|
||||
Here are some methods in this class.
|
||||
|
||||
.. autoclass:: qlib.workflow.task.collect.TaskCollector
|
||||
:members:
|
||||
:members:
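The filter-then-key pattern used when collecting can be sketched as follows. The function ``collect`` and the ``(task, prediction)`` pairs are hypothetical stand-ins and do not reproduce the ``TaskCollector`` methods; they only show how a filter function and a key function combine into a result dict.

```python
def collect(records, get_key_func, filter_func=None):
    """Build {key: prediction} from (task, prediction) pairs.

    get_key_func maps a task dict to a hashable key; filter_func, if given,
    decides whether a task is kept. Hypothetical helper for illustration.
    """
    collected = {}
    for task, pred in records:
        if filter_func is not None and not filter_func(task):
            continue
        collected[get_key_func(task)] = pred
    return collected


records = [
    ({"model": {"class": "LGBModel"}, "test_start": "2017-01-01"}, [0.1, 0.2]),
    ({"model": {"class": "LGBModel"}, "test_start": "2017-02-01"}, [0.3, 0.4]),
    ({"model": {"class": "Linear"}, "test_start": "2017-01-01"}, [0.5, 0.6]),
]

result = collect(
    records,
    get_key_func=lambda task: (task["model"]["class"], task["test_start"]),
    filter_func=lambda task: task["model"]["class"] == "LGBModel",
)
```

Using a tuple as the key keeps results from different models and periods distinct in one dict.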
|
||||
|
||||
``Qlib`` provides a concrete `example <https://github.com/microsoft/qlib/tree/main/examples/taskmanager/task_manager_rolling_with_updating.py>`_, including a whole process of `Task Generating`_ (using `RollingGen <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_), `Task Storing`_, `Task Running`_ and `Task Collecting`_.
|
||||
Besides, the `example <https://github.com/microsoft/qlib/tree/main/examples/taskmanager/task_manager_rolling_with_updating.py>`_ uses a ``ModelUpdater`` inherited from ``TaskCollector``, which can update the inferences and retrain the model if it is out of date.
|
||||
Actually, the model updating can be viewed as a subset of ``Online Serving``.
|
||||
@@ -155,6 +155,35 @@ Record Template
|
||||
:members:
|
||||
|
||||
|
||||
Task Management
|
||||
====================
|
||||
|
||||
|
||||
RollingGen
|
||||
--------------------
|
||||
.. autoclass:: qlib.workflow.task.gen.RollingGen
|
||||
:members:
|
||||
|
||||
TaskManager
|
||||
--------------------
|
||||
.. autoclass:: qlib.workflow.task.manage.TaskManager
|
||||
:members:
|
||||
|
||||
TaskCollector
|
||||
--------------------
|
||||
.. autoclass:: qlib.workflow.task.collect.TaskCollector
|
||||
:members:
|
||||
|
||||
ModelUpdater
|
||||
--------------------
|
||||
.. autoclass:: qlib.workflow.task.update.ModelUpdater
|
||||
:members:
|
||||
|
||||
TimeAdjuster
|
||||
--------------------
|
||||
.. autoclass:: qlib.workflow.task.utils.TimeAdjuster
|
||||
:members:
|
||||
|
||||
Utils
|
||||
====================
|
||||
|
||||
|
||||
@@ -18,8 +18,8 @@ class TaskCollector:
|
||||
|
||||
def list_recorders(self, rec_filter_func=None, task_filter_func=None, only_finished=True, only_have_task=False):
|
||||
"""
|
||||
Return a dict of {rid:Recorder} by recorder filter and task filter. It is not necessary to use those filter.
|
||||
If you don't train with "task_train", then there is no "task" which includes the task config.
|
||||
Return a dict of {rid: Recorder} by recorder filter and task filter. It is not necessary to use those filters.
|
||||
If you don't train with "task_train", then there is no "task" (a file in mlruns/artifacts) which includes the task config.
|
||||
If there is a "task", then it will become rec.task, which can be accessed directly.
|
||||
|
||||
Parameters
|
||||
@@ -36,12 +36,8 @@ class TaskCollector:
|
||||
Returns
|
||||
-------
|
||||
dict
|
||||
a dict of {rid:Recorder}
|
||||
a dict of {rid: Recorder}
|
||||
|
||||
Raises
|
||||
------
|
||||
OSError
|
||||
if you use a task filter, but there is no "task" which includes the task config
|
||||
"""
|
||||
recs = self.exp.list_recorders()
|
||||
recs_flt = {}
|
||||
@@ -69,13 +65,14 @@ class TaskCollector:
|
||||
task_filter_func=None,
|
||||
):
|
||||
"""
|
||||
Collect predictions using a filter and a key function.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
experiment_name : str
|
||||
get_key_func : function(task: dict) -> Union[Number, str, tuple]
|
||||
get_key_func : Callable[[dict], Union[Number, str, tuple]]
|
||||
get the key of a task when collecting it
|
||||
filter_func : function(task: dict) -> bool
|
||||
filter_func : Callable[[dict], bool]
|
||||
to judge whether a task will be collected or not
|
||||
|
||||
Returns
|
||||
@@ -108,6 +105,18 @@ class TaskCollector:
|
||||
self,
|
||||
task_filter_func=None,
|
||||
):
|
||||
"""Collect latest recorders using a filter.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
task_filter_func : Callable[[dict], bool], optional
|
||||
to judge whether a task will be collected or not, by default None
|
||||
|
||||
Returns
|
||||
-------
|
||||
dict, tuple
|
||||
a dict of recorders and a tuple of test segments
|
||||
"""
|
||||
recs_flt = self.list_recorders(task_filter_func=task_filter_func, only_have_task=True)
|
||||
|
||||
if len(recs_flt) == 0:
|
||||
|
||||
@@ -130,30 +130,32 @@ class RollingGen(TaskGen):
|
||||
task : dict
|
||||
A dict describing a task. For example.
|
||||
|
||||
DEFAULT_TASK = {
|
||||
"model": {
|
||||
"class": "LGBModel",
|
||||
"module_path": "qlib.contrib.model.gbdt",
|
||||
},
|
||||
"dataset": {
|
||||
"class": "DatasetH",
|
||||
"module_path": "qlib.data.dataset",
|
||||
"kwargs": {
|
||||
"handler": {
|
||||
"class": "Alpha158",
|
||||
"module_path": "qlib.contrib.data.handler",
|
||||
"kwargs": data_handler_config,
|
||||
},
|
||||
"segments": {
|
||||
"train": ("2008-01-01", "2014-12-31"),
|
||||
"valid": ("2015-01-01", "2016-12-20"), # Please avoid leaking the future test data into validation
|
||||
"test": ("2017-01-01", "2020-08-01"),
|
||||
.. code-block:: python
|
||||
|
||||
DEFAULT_TASK = {
|
||||
"model": {
|
||||
"class": "LGBModel",
|
||||
"module_path": "qlib.contrib.model.gbdt",
|
||||
},
|
||||
"dataset": {
|
||||
"class": "DatasetH",
|
||||
"module_path": "qlib.data.dataset",
|
||||
"kwargs": {
|
||||
"handler": {
|
||||
"class": "Alpha158",
|
||||
"module_path": "qlib.contrib.data.handler",
|
||||
"kwargs": data_handler_config,
|
||||
},
|
||||
"segments": {
|
||||
"train": ("2008-01-01", "2014-12-31"),
|
||||
"valid": ("2015-01-01", "2016-12-20"), # Please avoid leaking the future test data into validation
|
||||
"test": ("2017-01-01", "2020-08-01"),
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
# You shoud record the data in specific sequence
|
||||
# "record": ['SignalRecord', 'SigAnaRecord', 'PortAnaRecord'],
|
||||
}
|
||||
# You should record the data in a specific sequence
|
||||
# "record": ['SignalRecord', 'SigAnaRecord', 'PortAnaRecord'],
|
||||
}
|
||||
"""
|
||||
res = []
|
||||
|
||||
|
||||
@@ -18,13 +18,12 @@ import concurrent
|
||||
import pymongo
|
||||
from qlib.config import C
|
||||
from .utils import get_mongodb
|
||||
from qlib import auto_init
|
||||
from qlib import get_module_logger
|
||||
|
||||
|
||||
class TaskManager:
|
||||
"""TaskManager
|
||||
here is the what will a task looks like
|
||||
Here is what a task looks like when it is created by TaskManager
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
@@ -40,7 +39,7 @@ class TaskManager:
|
||||
|
||||
.. note::
|
||||
|
||||
assumption: the data in MongoDB was encoded and the data out of MongoDB was decoded
|
||||
Assumption: the data in MongoDB was encoded and the data out of MongoDB was decoded
|
||||
"""
|
||||
|
||||
STATUS_WAITING = "waiting"
|
||||
@@ -118,6 +117,7 @@ class TaskManager:
|
||||
Parameters
|
||||
----------
|
||||
task_def: dict
|
||||
the task definition
|
||||
task_pool: str
|
||||
the name of Collection in MongoDB
|
||||
|
||||
|
||||
@@ -110,7 +110,7 @@ class ModelUpdater(TaskCollector):
|
||||
def update_all_pred(self, rec_filter_func=None):
|
||||
"""update all predictions in this experiment after filter.
|
||||
|
||||
An example of filter function:
|
||||
An example of filter function:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
|
||||
@@ -107,11 +107,14 @@ class TimeAdjuster:
|
||||
align the given date to trade date
|
||||
|
||||
for example:
|
||||
input: {'train': ('2008-01-01', '2014-12-31'), 'valid': ('2015-01-01', '2016-12-31'), 'test': ('2017-01-01', '2020-08-01')}
|
||||
|
||||
output: {'train': (Timestamp('2008-01-02 00:00:00'), Timestamp('2014-12-31 00:00:00')),
|
||||
'valid': (Timestamp('2015-01-05 00:00:00'), Timestamp('2016-12-30 00:00:00')),
|
||||
'test': (Timestamp('2017-01-03 00:00:00'), Timestamp('2020-07-31 00:00:00'))}
|
||||
.. code-block:: python
|
||||
|
||||
input: {'train': ('2008-01-01', '2014-12-31'), 'valid': ('2015-01-01', '2016-12-31'), 'test': ('2017-01-01', '2020-08-01')}
|
||||
|
||||
output: {'train': (Timestamp('2008-01-02 00:00:00'), Timestamp('2014-12-31 00:00:00')),
|
||||
'valid': (Timestamp('2015-01-05 00:00:00'), Timestamp('2016-12-30 00:00:00')),
|
||||
'test': (Timestamp('2017-01-03 00:00:00'), Timestamp('2020-07-31 00:00:00'))}
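The alignment shown in this example can be sketched with a binary search over a sorted trading calendar. This is a minimal illustration, not the ``TimeAdjuster`` implementation: the function ``align_segment`` and the four-day ``calendar`` are assumptions made for the example.

```python
import bisect
from datetime import date


def align_segment(calendar, start, end):
    """Snap start forward and end backward to the nearest trading days.

    calendar must be a sorted list of trading days. Hypothetical helper.
    """
    i = bisect.bisect_left(calendar, start)  # first trading day >= start
    j = bisect.bisect_right(calendar, end) - 1  # last trading day <= end
    if i >= len(calendar) or j < 0 or i > j:
        raise ValueError("segment contains no trading day")
    return calendar[i], calendar[j]


# A tiny made-up calendar around the 2016/2017 year boundary.
calendar = [date(2016, 12, 29), date(2016, 12, 30), date(2017, 1, 3), date(2017, 1, 4)]

valid_tail = align_segment(calendar, date(2016, 12, 25), date(2016, 12, 31))
test_head = align_segment(calendar, date(2016, 12, 31), date(2017, 1, 4))
```

Snapping the start forward and the end backward keeps the aligned segment inside the requested range, so adjacent segments do not overlap after alignment.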
|
||||
|
||||
Parameters
|
||||
----------
|
||||
|
||||