mirror of
https://github.com/microsoft/qlib.git
synced 2026-06-06 05:51:17 +08:00
Update docs
@@ -29,7 +29,18 @@ Qlib Format Data
|
||||
------------------
|
||||
|
||||
We've specially designed a data structure to manage financial data; please refer to the `File storage design section in Qlib paper <https://arxiv.org/abs/2009.11189>`_ for detailed information.
|
||||
Such data will be stored with filename suffix `.bin` (We'll call them `.bin` file, `.bin` format, or qlib format). `.bin` file is designed for scientific computing on finance data
|
||||
Such data is stored with the filename suffix `.bin` (we call it a `.bin` file, `.bin` format, or qlib format). The `.bin` file is designed for scientific computing on financial data.
|
||||
|
||||
``Qlib`` provides two different off-the-shelf datasets, which can be accessed through this `link <https://github.com/microsoft/qlib/blob/main/qlib/contrib/data/handler.py>`_:
|
||||
|
||||
======================== ================= ================
Dataset                  US Market         China Market
======================== ================= ================
Alpha360                 √                 √
Alpha158                 √                 √
======================== ================= ================
|
||||
|
||||
|
||||
Qlib Format Dataset
|
||||
--------------------
|
||||
@@ -45,7 +56,7 @@ In addition to China-Stock data, ``Qlib`` also includes a US-Stock dataset, whic
|
||||
|
||||
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/us_data --region us
|
||||
|
||||
After running the above command, users can find china-stock and us-stock data in Qlib format in the ``~/.qlib/csv_data/cn_data`` directory and ``~/.qlib/csv_data/us_data`` directory respectively.
|
||||
After running the above command, users can find china-stock and us-stock data in ``Qlib`` format in the ``~/.qlib/qlib_data/cn_data`` directory and ``~/.qlib/qlib_data/us_data`` directory respectively.
|
||||
|
||||
``Qlib`` also provides the scripts in ``scripts/data_collector`` to help users crawl the latest data on the Internet and convert it to qlib format.
|
||||
|
||||
@@ -54,8 +65,7 @@ When ``Qlib`` is initialized with this dataset, users could build and evaluate t
|
||||
Converting CSV Format into Qlib Format
|
||||
-------------------------------------------
|
||||
|
||||
``Qlib`` has provided the script ``scripts/dump_bin.py`` to convert data in CSV format into `.bin` files (Qlib format).
|
||||
|
||||
``Qlib`` provides the script ``scripts/dump_bin.py`` to convert **any** data in CSV format into `.bin` files (``Qlib`` format), as long as the data is in the correct format.
|
||||
|
||||
Users can download the demo china-stock data in CSV format as follows for reference to the CSV format.
|
||||
|
||||
@@ -130,9 +140,21 @@ After conversion, users can find their Qlib format data in the directory `~/.qli
|
||||
|
||||
In the convention of `Qlib` data processing, `open, close, high, low, volume, money and factor` will be set to NaN if the stock is suspended.
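As a rough illustration of this convention, the sketch below skips suspended days when averaging close prices. The records and the helper ``mean_close`` are hypothetical and use only the Python standard library; they are not part of ``Qlib``.

```python
import math

# Hypothetical daily records; the second day mimics a suspended stock,
# where the price and volume fields are NaN by convention.
records = [
    {"date": "2020-01-02", "close": 10.0, "volume": 1000.0},
    {"date": "2020-01-03", "close": math.nan, "volume": math.nan},
    {"date": "2020-01-06", "close": 11.0, "volume": 1200.0},
]


def mean_close(rows):
    """Average the close price over the days on which the stock traded."""
    traded = [r["close"] for r in rows if not math.isnan(r["close"])]
    return sum(traded) / len(traded) if traded else math.nan


avg = mean_close(records)
print(avg)
```

Because NaN propagates through arithmetic, filtering suspended days before aggregating keeps the result meaningful.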
|
||||
|
||||
China-Stock Mode & US-Stock Mode
|
||||
Multiple Stock Modes
|
||||
--------------------------------
|
||||
|
||||
``Qlib`` now provides two different stock modes for users: China-Stock Mode & US-Stock Mode. Here are some different settings of these two modes:
|
||||
|
||||
============== ================= ================
Region         Trade Unit        Limit Threshold
============== ================= ================
China          100               0.099
US             1                 None
============== ================= ================
|
||||
|
||||
The `trade unit` defines the unit number of stocks that can be used in a trade, and the `limit threshold` defines the bound set on the percentage rise or fall of a stock's price.
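To make the two settings concrete, here is a minimal sketch that applies the values from the table above. The dictionary layout and the helpers ``round_to_trade_unit`` and ``hits_limit`` are assumptions made for illustration; they do not reflect how ``Qlib`` implements these checks internally.

```python
# Settings copied from the table above; the dict layout is hypothetical.
REGION_SETTINGS = {
    "cn": {"trade_unit": 100, "limit_threshold": 0.099},
    "us": {"trade_unit": 1, "limit_threshold": None},
}


def round_to_trade_unit(amount, region):
    """Round an order amount down to a multiple of the region's trade unit."""
    unit = REGION_SETTINGS[region]["trade_unit"]
    return int(amount // unit) * unit


def hits_limit(prev_close, price, region):
    """Return True if the price change reaches the region's limit threshold."""
    threshold = REGION_SETTINGS[region]["limit_threshold"]
    if threshold is None:  # no price limit in this mode
        return False
    change = price / prev_close - 1
    return abs(change) >= threshold


cn_amount = round_to_trade_unit(250, "cn")
us_amount = round_to_trade_unit(250, "us")
print(cn_amount, us_amount)
```

In this sketch, an order for 250 shares is trimmed to a whole number of lots in China-Stock mode and left unchanged in US-Stock mode.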
|
||||
|
||||
- If users use ``Qlib`` in china-stock mode, china-stock data is required. Users can use ``Qlib`` in china-stock mode according to the following steps:
|
||||
- Download china-stock data in qlib format; please refer to the section `Qlib Format Dataset <#qlib-format-dataset>`_.
|
||||
- Initialize ``Qlib`` in china-stock mode
|
||||
@@ -208,13 +230,19 @@ QlibDataLoader
|
||||
|
||||
The ``QlibDataLoader`` class in ``Qlib`` is such an interface that allows users to load raw data from the ``Qlib`` data source.
|
||||
|
||||
StaticDataLoader
|
||||
----------------
|
||||
|
||||
The ``StaticDataLoader`` class in ``Qlib`` is an interface that allows users to load raw data from a file or from data provided directly.
|
||||
|
||||
|
||||
Interface
|
||||
------------
|
||||
|
||||
Here are some interfaces of the ``QlibDataLoader`` class:
|
||||
|
||||
.. autoclass:: qlib.data.dataset.loader.QlibDataLoader
|
||||
:members: load, load_group_df
|
||||
.. autoclass:: qlib.data.dataset.loader.DataLoader
|
||||
:members:
|
||||
|
||||
API
|
||||
-----------
|
||||
|
||||
@@ -18,45 +18,10 @@ Base Class & Interface
|
||||
|
||||
The base class provides the following interfaces:
|
||||
|
||||
- `__init__(**kwargs)`
|
||||
- Initialization.
|
||||
|
||||
- `fit(self, dataset, **kwargs)`
|
||||
- Train model.
|
||||
- Parameter:
|
||||
- `dataset`, ``Qlib``'s ``DatasetH`` type. For more information about ``DatasetH``, users can refer to the related document: `Qlib Dataset <../component/data.html#dataset>`_.
|
||||
The `dataset` is passed into the `model`'s method because there are some unique data preprocessing procedures for each, we want to give each model maximum flexibility to handle the data that is suitable for their own.
|
||||
The following code example shows how to retrieve `x_train`, `y_train` and `w_train` from the `dataset`:
|
||||
|
||||
.. code-block:: Python
|
||||
|
||||
# get features and labels
|
||||
df_train, df_valid = dataset.prepare(
|
||||
["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
|
||||
)
|
||||
x_train, y_train = df_train["feature"], df_train["label"]
|
||||
x_valid, y_valid = df_valid["feature"], df_valid["label"]
|
||||
|
||||
# get weights
|
||||
try:
|
||||
wdf_train, wdf_valid = dataset.prepare(["train", "valid"], col_set=["weight"], data_key=DataHandlerLP.DK_L)
|
||||
w_train, w_valid = wdf_train["weight"], wdf_valid["weight"]
|
||||
except KeyError as e:
|
||||
w_train = pd.DataFrame(np.ones_like(y_train.values), index=y_train.index)
|
||||
w_valid = pd.DataFrame(np.ones_like(y_valid.values), index=y_valid.index)
|
||||
|
||||
- `predict(self, dataset, **kwargs)`
|
||||
- Predict test data.
|
||||
- Parameter:
|
||||
- `dataset`, ``Qlib``'s ``DatasetH`` type. The usage is similar to the example above.
|
||||
- Returns:
|
||||
- Predic results with type: `pandas.Series`.
|
||||
|
||||
- `finetune(self, dataset, **kwargs)`
|
||||
- Finetune the model.
|
||||
- Parameter:
|
||||
- `dataset`, ``Qlib``'s ``DatasetH`` type. The usage is similar to the example above.
|
||||
.. autoclass:: qlib.model.base.Model
|
||||
:members:
|
||||
|
||||
``Qlib`` also provides a base class `qlib.model.base.ModelFT <../reference/api.html#qlib.model.base.ModelFT>`_, which includes the method for finetuning the model.
|
||||
|
||||
For other interfaces such as `finetune`, please refer to `Model API <../reference/api.html#module-qlib.model.base>`_.
|
||||
|
||||
|
||||
@@ -72,6 +72,8 @@ The ``Experiment`` class is solely responsible for a single experiment, and it w
|
||||
|
||||
For other interfaces such as `search_records`, `delete_recorder`, please refer to `Experiment API <../reference/api.html#experiment>`_.
|
||||
|
||||
``Qlib`` also provides a default ``Experiment``, which is created and used in certain situations when users call APIs such as `log_metrics` or `get_exp`. When the default ``Experiment`` is used, ``Qlib`` logs related information while running. Users can change the name of the default ``Experiment``, which defaults to '`Experiment`', in the config file of ``Qlib`` or during ``Qlib``'s `initialization <../start/initialization.html#parameters>`_.
|
||||
|
||||
Recorder
|
||||
===================
|
||||
|
||||
|
||||
@@ -11,8 +11,8 @@ Introduction
|
||||
The components in `Qlib Framework <../introduction/introduction.html#framework>`_ are designed in a loosely-coupled way. Users could build their own Quant research workflow with these components like `Example <https://github.com/microsoft/qlib/blob/main/examples/workflow_by_code.py>`_.
|
||||
|
||||
|
||||
Besides, ``Qlib`` provides more user-friendly interfaces named ``qrun`` to automatically run the whole workflow defined by configuration. A concrete execution of the whole workflow is called an `experiment`.
|
||||
With ``qrun``, user can easily run an `experiment`, which includes the following steps:
|
||||
Besides, ``Qlib`` provides a more user-friendly interface named ``qrun`` to automatically run the whole workflow defined by a configuration. Running the whole workflow is called an `execution`.
|
||||
With ``qrun``, users can easily start an `execution`, which includes the following steps:
|
||||
|
||||
- Data
|
||||
- Loading
|
||||
@@ -25,7 +25,7 @@ With ``qrun``, user can easily run an `experiment`, which includes the following
|
||||
- Forecast signal analysis
|
||||
- Backtest
|
||||
|
||||
For each `experiment`, ``Qlib`` has a complete system to tracking all the information as well as artifacts generated during training, inference and evaluation phase. For more information about how Qlib handles `experiment`, please refer to the related document: `Recorder: Experiment Management <../component/recorder.html>`_.
|
||||
For each `execution`, ``Qlib`` has a complete system to track all the information as well as the artifacts generated during the training, inference and evaluation phases. For more information about how ``Qlib`` handles this, please refer to the related document: `Recorder: Experiment Management <../component/recorder.html>`_.
|
||||
|
||||
Complete Example
|
||||
===================
|
||||
@@ -35,8 +35,9 @@ Below is a typical config file of ``qrun``.
|
||||
|
||||
.. code-block:: YAML
|
||||
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
@@ -100,12 +101,16 @@ After saving the config into `configuration.yaml`, users could start the workflo
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
qrun -c configuration.yaml
|
||||
qrun configuration.yaml
|
||||
|
||||
.. note::
|
||||
|
||||
`qrun` will be placed in your $PATH directory when installing ``Qlib``.
|
||||
|
||||
.. note::
|
||||
|
||||
The symbol `&` in a `yaml` file stands for an anchor of a field, which is useful when other fields include this parameter as part of their value. Taking the configuration file above as an example, users can directly change the value of `market` and `benchmark` without traversing the entire configuration file.
|
||||
|
||||
|
||||
Configuration File
|
||||
===================
|
||||
@@ -114,17 +119,15 @@ Let's get into details of ``qrun`` in this section.
|
||||
|
||||
Before using ``qrun``, users need to prepare a configuration file. The following content shows how to prepare each part of the configuration file.
|
||||
|
||||
Qlib Data Section
|
||||
Qlib Init Section
|
||||
--------------------
|
||||
|
||||
At first, the configuration file needs to contain several basic parameters about the data, which will be used for qlib initialization, data handling and backtest.
|
||||
First, the configuration file needs to contain several basic parameters, which will be used for qlib initialization.
|
||||
|
||||
.. code-block:: YAML
|
||||
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
|
||||
The meaning of each field is as follows:
|
||||
|
||||
@@ -139,34 +142,14 @@ The meaning of each field is as follows:
|
||||
|
||||
The value of `region` should be aligned with the data stored in `provider_uri`.
|
||||
|
||||
- `market`
|
||||
Type: str. Index name, the default value is `csi500`.
|
||||
|
||||
- `benchmark`
|
||||
Type: str, list or pandas.Series. Stock index symbol, the default value is `SH000905`.
|
||||
Task Section
|
||||
--------------------
|
||||
|
||||
.. note::
|
||||
|
||||
* If `benchmark` is str, it will use the daily change as the 'bench'.
|
||||
|
||||
* If `benchmark` is list, it will use the daily average change of the stock pool in the list as the 'bench'.
|
||||
|
||||
* If `benchmark` is pandas.Series, whose `index` is trading date and the value T is the change from T-1 to T, it will be directly used as the 'bench'. An example is as following:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
print(D.features(D.instruments('csi500'), ['$close/Ref($close, 1)-1'])['$close/Ref($close, 1)-1'].head())
|
||||
2017-01-04 0.011693
|
||||
2017-01-05 0.000721
|
||||
2017-01-06 -0.004322
|
||||
2017-01-09 0.006874
|
||||
2017-01-10 -0.003350
|
||||
.. note::
|
||||
|
||||
The symbol `&` in `yaml` file stands for an anchor of a field, which is useful when another fields include this parameter as part of the value. Taking the configuration file above as an example, users can directly change the value of `market` and `benchmark` without traversing the entire configuration file.
|
||||
The `task` field in the configuration corresponds to a `task`, which contains the parameters of three different subsections: `Model`, `Dataset` and `Record`.
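The split between the `qlib_init` and `task` sections can be sketched in plain Python as follows. The dictionary mirrors the layout of the YAML file after parsing; ``init_stub`` and ``split_config`` are hypothetical stand-ins used only for illustration, and the empty `model` and `dataset` entries are placeholders rather than real settings.

```python
# A dict mirroring the parsed configuration file; values are placeholders.
config = {
    "qlib_init": {"provider_uri": "~/.qlib/qlib_data/cn_data", "region": "cn"},
    "task": {
        "model": {},    # placeholder for the model's class, module_path, kwargs
        "dataset": {},  # placeholder for the dataset's class, module_path, kwargs
        "record": [
            {"class": "SignalRecord", "module_path": "qlib.workflow.record_temp", "kwargs": {}},
        ],
    },
}


def init_stub(provider_uri, region, **kwargs):
    """Hypothetical stand-in for initialization; it only echoes its arguments."""
    return {"provider_uri": provider_uri, "region": region, **kwargs}


def split_config(cfg):
    """Separate the initialization parameters from the task description."""
    return cfg["qlib_init"], cfg["task"]


init_kwargs, task_config = split_config(config)
initialized = init_stub(**init_kwargs)  # the whole section is unpacked as keyword arguments
record_classes = [r["class"] for r in task_config["record"]]
print(initialized["region"], record_classes)
```

Keeping the initialization parameters in their own section means the whole section can be passed on as keyword arguments, while the `task` section is handed to the training routine unchanged.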
|
||||
|
||||
Model Section
|
||||
--------------------
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
In the `task` field, the `model` section describes the parameters of the model to be used for training and inference. For more information about the base ``Model`` class, please refer to `Qlib Model <../component/model.html>`_.
|
||||
|
||||
@@ -202,7 +185,7 @@ The meaning of each field is as follows:
|
||||
``Qlib`` provides a util named: ``init_instance_by_config`` to initialize any class inside ``Qlib`` with the configuration includes the fields: `class`, `module_path` and `kwargs`.
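The idea behind such a utility can be sketched with the standard library alone. The function ``init_by_config_sketch`` below is a simplified, hypothetical stand-in and not ``Qlib``'s actual implementation; it is demonstrated with a standard-library class so that it runs without ``Qlib`` installed.

```python
import importlib


def init_by_config_sketch(config):
    """Import `module_path`, look up `class` in it, and call it with `kwargs`."""
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    return cls(**config.get("kwargs", {}))


# The three fields follow the configuration layout described above.
frac = init_by_config_sketch(
    {
        "class": "Fraction",
        "module_path": "fractions",
        "kwargs": {"numerator": 3, "denominator": 4},
    }
)
print(frac)
```

Describing an object by these three fields lets a configuration file choose the class to build without any change to the calling code.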
|
||||
|
||||
Dataset Section
|
||||
--------------------
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The `dataset` field describes the parameters for the ``Dataset`` module in ``Qlib`` as well as those for the module ``DataHandler``. For more information about the ``Dataset`` module, please refer to `Qlib Dataset <../component/data.html#dataset>`_.
|
||||
|
||||
@@ -237,9 +220,9 @@ Here is the configuration for the ``Dataset`` module which will take care of dat
|
||||
test: [2017-01-01, 2020-08-01]
|
||||
|
||||
Record Section
|
||||
--------------------
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The `record` field is about the parameters the ``Record`` module in ``Qlib``. ``Record`` is responsible for generating certain analysis and evaluation results such as `prediction`, `information Coefficient (IC)` and `backtest`.
|
||||
The `record` field describes the parameters of the ``Record`` module in ``Qlib``. ``Record`` is responsible for tracking the training process and results such as `information Coefficient (IC)` and `backtest` in a standard format.
|
||||
|
||||
The following script is the configuration of `backtest` and the `strategy` used in `backtest`:
|
||||
|
||||
|
||||
@@ -19,8 +19,8 @@ The Custom models need to inherit `qlib.model.base.Model <../reference/api.html#
|
||||
|
||||
- Override the `__init__` method
|
||||
- ``Qlib`` passes the initialized parameters to the \_\_init\_\_ method.
|
||||
- The parameter must be consistent with the hyperparameters in the configuration file.
|
||||
- Code Example: In the following example, the hyperparameter filed of the configuration file should contain parameters such as `loss:mse`.
|
||||
- The hyperparameters of model in the configuration must be consistent with those defined in the `__init__` method.
|
||||
- Code Example: In the following example, the hyperparameters of model in the configuration file should contain parameters such as `loss:mse`.
|
||||
.. code-block:: Python
|
||||
|
||||
def __init__(self, loss='mse', **kwargs):
|
||||
@@ -31,9 +31,9 @@ The Custom models need to inherit `qlib.model.base.Model <../reference/api.html#
|
||||
self._model = None
|
||||
|
||||
- Override the `fit` method
|
||||
- ``Qlib`` calls the fit method to train the model
|
||||
- The parameters must include training feature `dataset`.
|
||||
- The parameters could include some optional parameters with default values, such as `num_boost_round = 1000` for `GBDT`.
|
||||
- ``Qlib`` calls the fit method to train the model.
|
||||
- The parameters must include training feature `dataset`, which is designed in the interface.
|
||||
- The parameters could include some `optional` parameters with default values, such as `num_boost_round = 1000` for `GBDT`.
|
||||
- Code Example: In the following example, `num_boost_round = 1000` is an optional parameter.
|
||||
.. code-block:: Python
|
||||
|
||||
@@ -69,7 +69,7 @@ The Custom models need to inherit `qlib.model.base.Model <../reference/api.html#
|
||||
)
|
||||
|
||||
- Override the `predict` method
|
||||
- The parameters must include training feature `dataset`, which will be userd to get the test dataset.
|
||||
- The parameters must include the parameter `dataset`, which will be used to get the test dataset.
|
||||
- Return the `prediction score`.
|
||||
- Please refer to `Model API <../reference/api.html#module-qlib.model.base>`_ for the parameter types of the fit method.
|
||||
- Code Example: In the following example, users need to use `LightGBM` to predict the label(such as `preds`) of test data `x_test` and return it.
|
||||
@@ -81,8 +81,9 @@ The Custom models need to inherit `qlib.model.base.Model <../reference/api.html#
|
||||
x_test = dataset.prepare("test", col_set="feature", data_key=DataHandlerLP.DK_I)
|
||||
return pd.Series(self.model.predict(x_test.values), index=x_test.index)
|
||||
|
||||
- Override the `finetune` method
|
||||
- The parameters must include training feature `dataset`.
|
||||
- Override the `finetune` method (Optional)
|
||||
- This method is optional. Users who want to use it on their own models should inherit the ``ModelFT`` base class, which includes the interface of `finetune`.
|
||||
- The parameters must include the parameter `dataset`.
|
||||
- Code Example: In the following example, users will use `LightGBM` as the model and finetune it.
|
||||
.. code-block:: Python
|
||||
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
|
||||
@@ -1,7 +1,8 @@
|
||||
sys:
|
||||
rel_path: .
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
@@ -46,6 +47,11 @@ task:
|
||||
- class: SignalRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs: {}
|
||||
- class: SigAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
ana_long_short: False
|
||||
ann_scaler: 252
|
||||
- class: PortAnaRecord
|
||||
module_path: qlib.workflow.record_temp
|
||||
kwargs:
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
qlib_init:
|
||||
provider_uri: "~/.qlib/qlib_data/cn_data"
|
||||
region: cn
|
||||
market: &market csi300
|
||||
benchmark: &benchmark SH000300
|
||||
data_handler_config: &data_handler_config
|
||||
|
||||
@@ -27,13 +27,32 @@ class Model(BaseModel):
|
||||
|
||||
.. note::
|
||||
|
||||
The the attribute names of learned model should `not` start with '_'. So that the model could be
|
||||
The attribute names of the learned model should `not` start with '_', so that the model can be
|
||||
dumped to disk.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
dataset : Dataset
|
||||
dataset will generate the processed data for model training.
|
||||
|
||||
The following code example shows how to retrieve `x_train`, `y_train` and `w_train` from the `dataset`:
|
||||
|
||||
.. code-block:: Python
|
||||
|
||||
# get features and labels
|
||||
df_train, df_valid = dataset.prepare(
|
||||
["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
|
||||
)
|
||||
x_train, y_train = df_train["feature"], df_train["label"]
|
||||
x_valid, y_valid = df_valid["feature"], df_valid["label"]
|
||||
|
||||
# get weights
|
||||
try:
|
||||
wdf_train, wdf_valid = dataset.prepare(["train", "valid"], col_set=["weight"], data_key=DataHandlerLP.DK_L)
|
||||
w_train, w_valid = wdf_train["weight"], wdf_valid["weight"]
|
||||
except KeyError as e:
|
||||
w_train = pd.DataFrame(np.ones_like(y_train.values), index=y_train.index)
|
||||
w_valid = pd.DataFrame(np.ones_like(y_valid.values), index=y_valid.index)
|
||||
"""
|
||||
raise NotImplementedError()
|
||||
|
||||
@@ -45,6 +64,10 @@ class Model(BaseModel):
|
||||
----------
|
||||
dataset : Dataset
|
||||
dataset will generate the processed dataset for model training.
|
||||
|
||||
Returns
|
||||
-------
|
||||
Prediction results with certain type such as `pandas.Series`.
|
||||
"""
|
||||
raise NotImplementedError()
|
||||
|
||||
|
||||
@@ -6,29 +6,29 @@ from qlib.workflow import R
|
||||
from qlib.workflow.record_temp import SignalRecord
|
||||
|
||||
|
||||
def task_train(config: dict, experiment_name):
|
||||
def task_train(task_config: dict, experiment_name):
|
||||
"""
|
||||
task based training
|
||||
|
||||
Parameters
|
||||
----------
|
||||
config : dict
|
||||
A dict describing the training process
|
||||
task_config : dict
|
||||
A dict describing a task setting.
|
||||
"""
|
||||
|
||||
# model initialization
|
||||
model = init_instance_by_config(config.get("task")["model"])
|
||||
dataset = init_instance_by_config(config.get("task")["dataset"])
|
||||
model = init_instance_by_config(task_config["model"])
|
||||
dataset = init_instance_by_config(task_config["dataset"])
|
||||
|
||||
# start exp
|
||||
with R.start(experiment_name=experiment_name):
|
||||
# train model
|
||||
R.log_params(**flatten_dict(config.get("task")))
|
||||
R.log_params(**flatten_dict(task_config))
|
||||
model.fit(dataset)
|
||||
recorder = R.get_recorder()
|
||||
|
||||
# generate records: prediction, backtest, and analysis
|
||||
for record in config.get("task")["record"]:
|
||||
for record in task_config["record"]:
|
||||
if record["class"] == SignalRecord.__name__:
|
||||
srconf = {"model": model, "dataset": dataset, "recorder": recorder}
|
||||
record["kwargs"].update(srconf)
|
||||
|
||||
@@ -90,7 +90,11 @@ class QlibRecorder:
|
||||
|
||||
def search_records(self, experiment_ids, **kwargs):
|
||||
"""
|
||||
Get a pandas DataFrame of records that fit the search criteria. Here is the example code of the method:
|
||||
Get a pandas DataFrame of records that fit the search criteria.
|
||||
|
||||
The arguments of this function are not fixed, and they will differ between implementations of
|
||||
``ExpManager`` in ``Qlib``. ``Qlib`` now provides an implementation of ``ExpManager`` with mlflow, and here is the
|
||||
example code of this method with the ``MLflowExpManager``:
|
||||
|
||||
.. code-block:: Python
|
||||
|
||||
@@ -139,7 +143,8 @@ class QlibRecorder:
|
||||
|
||||
If user doesn't provide the id or name of the experiment, this method will try to retrieve the default experiment and
|
||||
list all the recorders of the default experiment. If the default experiment doesn't exist, the method will first
|
||||
create the default experiment, and then create a new recorder under it.
|
||||
create the default experiment, and then create a new recorder under it. (More information about the default experiment
|
||||
can be found `here <../component/recorder.html#qlib.workflow.exp.Experiment>`_).
|
||||
|
||||
Here is the example code:
|
||||
|
||||
@@ -168,27 +173,27 @@ class QlibRecorder:
|
||||
|
||||
- If '`create`' is True:
|
||||
|
||||
- If ``R``'s running:
|
||||
- If `active experiment` exists:
|
||||
|
||||
- no id or name specified, return the active experiment.
|
||||
|
||||
- if id or name is specified, return the specified experiment. If no such exp found, create a new experiment with given id or name, and the experiment is set to be running.
|
||||
- if id or name is specified, return the specified experiment. If no such exp found, create a new experiment with given id or name, and the experiment is set to be active.
|
||||
|
||||
- If ``R``'s not running:
|
||||
- If `active experiment` does not exist:
|
||||
|
||||
- no id or name specified, create a default experiment, and the experiment is set to be running.
|
||||
- no id or name specified, create a default experiment, and the experiment is set to be active.
|
||||
|
||||
- if id or name is specified, return the specified experiment. If no such exp found, create a new experiment with given name or the default experiment, and the experiment is set to be running.
|
||||
- if id or name is specified, return the specified experiment. If no such exp found, create a new experiment with given name or the default experiment, and the experiment is set to be active.
|
||||
|
||||
- Else If '`create`' is False:
|
||||
|
||||
- If ``R``'s running:
|
||||
- If `active experiment` exists:
|
||||
|
||||
- no id or name specified, return the active experiment.
|
||||
|
||||
- if id or name is specified, return the specified experiment. If no such exp found, raise Error.
|
||||
|
||||
- If ``R``'s not running:
|
||||
- If `active experiment` does not exist:
|
||||
|
||||
- no id or name specified. If the default experiment exists, return it, otherwise, raise Error.
|
||||
|
||||
@@ -272,13 +277,13 @@ class QlibRecorder:
|
||||
"""
|
||||
Method for retrieving a recorder.
|
||||
|
||||
- If ``R``'s running:
|
||||
- If `active recorder` exists:
|
||||
|
||||
- no id or name specified, return the active recorder.
|
||||
|
||||
- if id or name is specified, return the specified recorder.
|
||||
|
||||
- If ``R``'s not running:
|
||||
- If `active recorder` does not exist:
|
||||
|
||||
- no id or name specified, raise Error.
|
||||
|
||||
@@ -351,8 +356,8 @@ class QlibRecorder:
|
||||
from a local file/directory, or directly saving objects. User can use valid python's keywords arguments
|
||||
to specify the object to be saved as well as its name (name: value).
|
||||
|
||||
- If R's running: it will save the objects through the running recorder.
|
||||
- If R's not running: the system will create a default experiment, and a new recorder and save objects under it.
|
||||
- If `active recorder` exists: it will save the objects through the active recorder.
|
||||
- If `active recorder` does not exist: the system will create a default experiment and a new recorder, and save objects under it.
|
||||
|
||||
.. note::
|
||||
|
||||
@@ -384,8 +389,8 @@ class QlibRecorder:
|
||||
"""
|
||||
Method for logging parameters during an experiment. In addition to using ``R``, one can also log to a specific recorder after getting it with `get_recorder` API.
|
||||
|
||||
- If R's running: it will log parameters through the running recorder.
|
||||
- If R's not running: the system will create a default experiment as well as a new recorder, and log parameters under it.
|
||||
- If `active recorder` exists: it will log parameters through the active recorder.
|
||||
- If `active recorder` does not exist: the system will create a default experiment as well as a new recorder, and log parameters under it.
|
||||
|
||||
Here are some use cases:
|
||||
|
||||
@@ -409,8 +414,8 @@ class QlibRecorder:
|
||||
"""
|
||||
Method for logging metrics during an experiment. In addition to using ``R``, one can also log to a specific recorder after getting it with `get_recorder` API.
|
||||
|
||||
- If R's running: it will log metrics through the running recorder.
|
||||
- If R's not running: the system will create a default experiment as well as a new recorder, and log metrics under it.
|
||||
- If `active recorder` exists: it will log metrics through the active recorder.
|
||||
- If `active recorder` does not exist: the system will create a default experiment as well as a new recorder, and log metrics under it.
|
||||
|
||||
Here are some use cases:
|
||||
|
||||
@@ -434,8 +439,8 @@ class QlibRecorder:
|
||||
"""
|
||||
Method for setting tags for a recorder. In addition to using ``R``, one can also set the tag to a specific recorder after getting it with `get_recorder` API.
|
||||
|
||||
- If R's running: it will set tags through the running recorder.
|
||||
- If R's not running: the system will create a default experiment as well as a new recorder, and set the tags under it.
|
||||
- If `active recorder` exists: it will set tags through the active recorder.
|
||||
- If `active recorder` does not exist: the system will create a default experiment as well as a new recorder, and set the tags under it.
|
||||
|
||||
Here are some use cases:
|
||||
|
||||
|
||||
@@ -49,13 +49,11 @@ def workflow(config_path, experiment_name="workflow", uri_folder="mlruns"):
|
||||
# config the `sys` section
|
||||
sys_config(config, config_path)
|
||||
|
||||
provider_uri = config.get("provider_uri")
|
||||
region = config.get("region")
|
||||
exp_manager = C["exp_manager"]
|
||||
exp_manager["kwargs"]["uri"] = "file:" + str(Path(os.getcwd()).resolve() / uri_folder)
|
||||
qlib.init(provider_uri=provider_uri, region=region, exp_manager=exp_manager)
|
||||
qlib.init(**config.get("qlib_init"), exp_manager=exp_manager)
|
||||
|
||||
task_train(config, experiment_name=experiment_name)
|
||||
task_train(config.get("task"), experiment_name=experiment_name)
|
||||
|
||||
|
||||
# function to run workflow by config
|
||||
|
||||
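The hunk above replaces explicit `provider_uri` and `region` lookups with a single `qlib_init` section that is unpacked into `qlib.init`, and passes only the `task` section to `task_train`. The following is a minimal sketch of that unpacking pattern, not Qlib's implementation: `init_stub` is a hypothetical stand-in for `qlib.init`, and the config values (including the path and the `SomeModel` placeholder) are illustrative assumptions.

```python
# Hypothetical config with the two sections the new code reads.
config = {
    "qlib_init": {
        "provider_uri": "~/.qlib/qlib_data/cn_data",  # illustrative path
        "region": "cn",
    },
    "task": {"model": {"class": "SomeModel"}},  # placeholder task body
}


def init_stub(provider_uri=None, region=None, exp_manager=None, **kwargs):
    """Stand-in for qlib.init: returns the arguments it received."""
    return {
        "provider_uri": provider_uri,
        "region": region,
        "exp_manager": exp_manager,
        **kwargs,
    }


# Illustrative experiment-manager settings; the real code takes these from C.
exp_manager = {"kwargs": {"uri": "file:/tmp/mlruns"}}

# Every key under "qlib_init" becomes a keyword argument, so new init
# options can be added to the config without touching the calling code.
received = init_stub(**config.get("qlib_init"), exp_manager=exp_manager)

# Only the "task" section is handed on to training.
task_config = config.get("task")
```

One consequence of this pattern is that a config file without a `qlib_init` section makes `config.get("qlib_init")` return `None`, and unpacking `None` with `**` raises a `TypeError`.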
@@ -114,24 +114,24 @@ class Experiment:
|
||||
|
||||
* If `create` is True:
|
||||
|
||||
* If R's running:
|
||||
* If `active recorder` exists:
|
||||
|
||||
* no id or name specified, return the active recorder.
|
||||
* if id or name is specified, return the specified recorder. If no such exp found, create a new recorder with given id or name, and the recorder should be running.
|
||||
* if id or name is specified, return the specified recorder. If no such exp found, create a new recorder with given id or name, and the recorder should be active.
|
||||
|
||||
* If R's not running:
|
||||
* If `active recorder` does not exist:
|
||||
|
||||
* no id or name specified, create a new recorder.
|
||||
* if id or name is specified, return the specified recorder. If no such exp found, create a new recorder with given id or name, and the recorder should be running.
|
||||
* if id or name is specified, return the specified recorder. If no such exp found, create a new recorder with given id or name, and the recorder should be active.
|
||||
|
||||
* Else If `create` is False:
|
||||
|
||||
* If R's running:
|
||||
* If `active recorder` exists:
|
||||
|
||||
* no id or name specified, return the active recorder.
|
||||
* if id or name is specified, return the specified recorder. If no such exp found, raise Error.
|
||||
|
||||
* If R's not running:
|
||||
* If `active recorder` does not exist:
|
||||
|
||||
* no id or name specified, raise Error.
|
||||
* if id or name is specified, return the specified recorder. If no such exp found, raise Error.
|
||||
|
||||
@@ -23,12 +23,12 @@ class ExpManager:
|
||||
def __init__(self, uri, default_exp_name):
|
||||
self.uri = uri
|
||||
self.default_exp_name = default_exp_name
|
||||
self.active_experiment = None # only one experiment can be running at a time
|
||||
self.active_experiment = None # only one experiment can be active at a time
|
||||
|
||||
def start_exp(self, experiment_name=None, recorder_name=None, uri=None, **kwargs):
|
||||
"""
|
||||
Start an experiment. This method includes first get_or_create an experiment, and then
|
||||
set it to be running.
|
||||
set it to be active.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
@@ -47,7 +47,7 @@ class ExpManager:
|
||||
|
||||
def end_exp(self, recorder_status: str = Recorder.STATUS_S, **kwargs):
|
||||
"""
|
||||
End a running experiment.
|
||||
End an active experiment.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
@@ -90,7 +90,7 @@ class ExpManager:
|
||||
def get_exp(self, experiment_id=None, experiment_name=None, create: bool = True):
|
||||
"""
|
||||
Retrieve an experiment. This method includes getting an active experiment, and get_or_create a specific experiment.
|
||||
The returned experiment will be running.
|
||||
The returned experiment will be active.
|
||||
|
||||
When the user specifies an experiment id or name, the method will try to return that specific experiment.
|
||||
When the user does not provide an experiment id or name, the method will try to return the current active experiment.
|
||||
@@ -99,24 +99,24 @@ class ExpManager:
|
||||
|
||||
* If `create` is True:
|
||||
|
||||
* If R's running:
|
||||
* If `active experiment` exists:
|
||||
|
||||
* no id or name specified, return the active experiment.
|
||||
* if id or name is specified, return the specified experiment. If no such exp found, create a new experiment with given id or name, and the experiment is set to be running.
|
||||
* if id or name is specified, return the specified experiment. If no such exp found, create a new experiment with given id or name, and the experiment is set to be active.
|
||||
|
||||
* If R's not running:
|
||||
* If `active experiment` does not exist:
|
||||
|
||||
* no id or name specified, create a default experiment.
|
||||
* if id or name is specified, return the specified experiment. If no such exp found, create a new experiment with given id or name, and the experiment is set to be running.
|
||||
* if id or name is specified, return the specified experiment. If no such exp found, create a new experiment with given id or name, and the experiment is set to be active.
|
||||
|
||||
* Else If `create` is False:
|
||||
|
||||
* If R's running:
|
||||
* If `active experiment` exists:
|
||||
|
||||
* no id or name specified, return the active experiment.
|
||||
* if id or name is specified, return the specified experiment. If no such exp found, raise Error.
|
||||
|
||||
* If R's not running:
|
||||
* If `active experiment` does not exist:
|
||||
|
||||
* no id or name specified. If the default experiment exists, return it, otherwise, raise Error.
|
||||
* if id or name is specified, return the specified experiment. If no such exp found, raise Error.
|
||||
|
||||
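The `get_exp` rules listed above can be read as a small decision table over three inputs: whether an active experiment exists, whether a name was given, and the value of `create`. The sketch below is a toy model of that table only, not Qlib's `ExpManager`. The class `ToyExpManager`, its dictionary-based experiments, and the choice to mark only newly created experiments as active are assumptions made for illustration; lookup by id is omitted.

```python
class ToyExpManager:
    """Toy model of the lookup rules; hypothetical, not Qlib's ExpManager."""

    def __init__(self, default_exp_name="Experiment"):
        self.default_exp_name = default_exp_name
        self.experiments = {}          # name -> experiment (a plain dict here)
        self.active_experiment = None  # at most one active experiment

    def _create(self, name):
        # Assumption: a newly created experiment becomes the active one.
        exp = {"name": name}
        self.experiments[name] = exp
        self.active_experiment = exp
        return exp

    def get_exp(self, experiment_name=None, create=True):
        if experiment_name is None:
            # No name given: prefer the active experiment.
            if self.active_experiment is not None:
                return self.active_experiment
            # No active experiment: fall back to the default experiment.
            if self.default_exp_name in self.experiments:
                return self.experiments[self.default_exp_name]
            if create:
                return self._create(self.default_exp_name)
            raise ValueError("no active or default experiment")
        # A name was given: return it if known, else create or fail.
        if experiment_name in self.experiments:
            return self.experiments[experiment_name]
        if create:
            return self._create(experiment_name)
        raise ValueError(f"no experiment named {experiment_name!r}")


manager = ToyExpManager()
exp_a = manager.get_exp("a")           # unknown name, create=True: created
same = manager.get_exp()               # no name: the active experiment
found = manager.get_exp("a", create=False)  # known name: returned as-is
```

With `create=False`, both the unknown-name case and the case with no name, no active experiment, and no default experiment raise an error in this toy, matching the "raise Error" branches of the list.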