From d8da94de108bcc53d751c7796a8090075b861775 Mon Sep 17 00:00:00 2001
From: bxdd
Date: Wed, 3 Feb 2021 15:46:54 +0000
Subject: [PATCH] update docs

---
 docs/advanced/serial.rst    | 42 +++++++++++++++++++++++++++++++++++++
 docs/reference/api.rst      | 10 +++++++++
 examples/highfreq/README.md |  4 ++--
 3 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 docs/advanced/serial.rst

diff --git a/docs/advanced/serial.rst b/docs/advanced/serial.rst
new file mode 100644
index 000000000..a0e6480b9
--- /dev/null
+++ b/docs/advanced/serial.rst
@@ -0,0 +1,42 @@
+.. _serial:
+
+=================================
+Serialization
+=================================
+.. currentmodule:: qlib
+
+Introduction
+===================
+``Qlib`` supports dumping the state of ``DataHandler``, ``DataSet``, ``Processor``, ``Model``, etc. to disk and reloading it.
+
+Serializable Class
+========================
+
+``Qlib`` provides a base class ``qlib.utils.serial.Serializable``, whose state can be dumped to or loaded from disk in `pickle` format.
+When users dump the state of a ``Serializable`` instance, the attributes of the instance whose names **do not** start with `_` will be saved on the disk.
+
+Example
+==========================
+``Qlib``'s serializable classes include ``DataHandler``, ``DataSet``, ``Processor``, ``Model``, etc., which are subclasses of ``qlib.utils.serial.Serializable``.
+Specifically, ``qlib.data.dataset.DatasetH`` is one of them. Users can serialize ``DatasetH`` as follows.
+
+.. code-block:: Python
+
+    ##=============dump dataset=============
+    dataset.to_pickle(path="dataset.pkl") # dataset is an instance of qlib.data.dataset.DatasetH
+
+    ##=============reload dataset=============
+    with open("dataset.pkl", "rb") as file_dataset:
+        dataset = pickle.load(file_dataset)
+
+.. note::
+    Only the state of ``DatasetH`` should be saved on the disk, such as the `mean` and `variance` used for data normalization, etc.
+
+    After reloading the ``DatasetH``, users need to reinitialize it. This means that users can reset some states of ``DatasetH`` or ``QlibDataHandler``, such as `instruments`, `start_time`, `end_time` and `segments`, etc., and generate new data according to those states (data is not state and should not be saved on the disk).
+
+A more detailed example is in this `link `_.
+
+
+API
+===================
+Please refer to `Serializable API <../reference/api.html#module-qlib.utils.serial.Serializable>`_.
diff --git a/docs/reference/api.rst b/docs/reference/api.rst
index f21a9f518..3167d8a62 100644
--- a/docs/reference/api.rst
+++ b/docs/reference/api.rst
@@ -152,4 +152,14 @@ Recorder
 Record Template
 --------------------
 .. automodule:: qlib.workflow.record_temp
+    :members:
+
+
+Utils
+====================
+
+Serializable
+--------------------
+
+.. automodule:: qlib.utils.serial.Serializable
     :members:
\ No newline at end of file
diff --git a/examples/highfreq/README.md b/examples/highfreq/README.md
index 067db4318..30c2e19db 100644
--- a/examples/highfreq/README.md
+++ b/examples/highfreq/README.md
@@ -12,11 +12,11 @@ Get high-frequency data by running the following command:
 
 ## Dump & Reload & Reinitialize the Dataset
 
-The High-Frequency Dataset is implemented as `qlib.data.dataset.DatasetH` in the `workflow.py`. `DatatsetH` is the subclass of `qlib.utils.serial.Serializable`, which supports being dumped in or loaded from disk in `pickle` format.
+The High-Frequency Dataset is implemented as `qlib.data.dataset.DatasetH` in `workflow.py`. `DatasetH` is a subclass of [`qlib.utils.serial.Serializable`](https://qlib.readthedocs.io/en/latest/advanced/serial.html), whose state can be dumped to or loaded from disk in `pickle` format.
 
 ### About Reinitialization
 
-After reloading `Dataset` from disk, `Qlib` also support reinitializing the dataset. It means that users can reset some config of `Dataset` or `DataHandler` such as `instruments`, `start_time`, `end_time` and `segmens`, etc.
+After reloading `Dataset` from disk, `Qlib` also supports reinitializing the dataset. This means that users can reset some states of `Dataset` or `DataHandler`, such as `instruments`, `start_time`, `end_time` and `segments`, etc., and generate new data according to those states.
 
 The example is given in `workflow.py`, users can run the code as follows.
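
Note on the documented behaviour: the patch describes two rules, namely that attributes whose names start with `_` are not dumped, and that a reloaded dataset is reinitialized so data can be regenerated from its state. The sketch below illustrates both rules with the standard library only. `DatasetSketch`, its `state` method and its fields are hypothetical stand-ins, not Qlib code, and it pickles only the filtered attribute dict, which is a simplification and not necessarily how `qlib.utils.serial.Serializable` is implemented.

```python
import os
import pickle
import tempfile


class DatasetSketch:
    """Hypothetical stand-in for a Serializable subclass; not Qlib code."""

    def __init__(self, segments, mean):
        self.segments = segments      # state: kept when dumping
        self.mean = mean              # state: kept when dumping
        self._data = [1.0, 2.0, 3.0]  # generated data: skipped when dumping

    def state(self):
        # Keep only attributes whose names do not start with "_".
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

    def to_pickle(self, path):
        # Simplification: write the filtered state dict, not the object itself.
        with open(path, "wb") as f:
            pickle.dump(self.state(), f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.pkl")
    original = DatasetSketch({"train": ("2020-01-01", "2020-06-30")}, mean=0.5)
    original.to_pickle(path)
    with open(path, "rb") as f:
        restored_state = pickle.load(f)

# Reinitialize: reset one piece of state, then rebuild so data is regenerated.
restored_state["segments"] = {"train": ("2020-07-01", "2020-12-31")}
dataset = DatasetSketch(**restored_state)
```

Keeping the normalization statistic (`mean`) while dropping `_data` mirrors the note in `serial.rst` that data is not state and should not be saved on the disk.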