|
|
|
|
@@ -24,8 +24,8 @@ The introduction of ``Data Layer`` includes the following parts.
|
|
|
|
|
Here is a typical example of Qlib data workflow
|
|
|
|
|
|
|
|
|
|
- Users download data and converting data into Qlib format(with filename suffix `.bin`). In this step, typically only some basic data are stored on disk(such as OHLCV).
|
|
|
|
|
- Creating some basic features based on Qlib's expression Engine(e.g. "Ref($close, 60) / $close", the return of last 60 trading days). Supported operators in the expression engine can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/ops.py>`_. This step is typically implemented in Qlib's `Data Loader <https://qlib.readthedocs.io/en/latest/component/data.html#data-loader>`_ which is a component of `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_ .
|
|
|
|
|
- If users require more complicated data processing (e.g. data normalization), `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_ support user-customized processors to process data(some predefined processors can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/dataset/processor.py>`_). The processors are different from operators in expression engine. It is designed for some complicated data processing methods which is hard to supported in operators in expression engine.
|
|
|
|
|
- Creating some basic features based on Qlib's expression Engine(e.g. "Ref($close, 60) / $close", the return of last 60 trading days). Supported operators in the expression engine can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/ops.py>`__. This step is typically implemented in Qlib's `Data Loader <https://qlib.readthedocs.io/en/latest/component/data.html#data-loader>`_ which is a component of `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_ .
|
|
|
|
|
- If users require more complicated data processing (e.g. data normalization), `Data Handler <https://qlib.readthedocs.io/en/latest/component/data.html#data-handler>`_ support user-customized processors to process data(some predefined processors can be found `here <https://github.com/microsoft/qlib/blob/main/qlib/data/dataset/processor.py>`__). The processors are different from operators in expression engine. It is designed for some complicated data processing methods which is hard to supported in operators in expression engine.
|
|
|
|
|
- At last, `Dataset <https://qlib.readthedocs.io/en/latest/component/data.html#dataset>`_ is responsible to prepare model-specific dataset from the processed data of Data Handler
|
|
|
|
|
|
|
|
|
|
Data Preparation
|
|
|
|
|
@@ -37,7 +37,7 @@ Qlib Format Data
|
|
|
|
|
We've specially designed a data structure to manage financial data, please refer to the `File storage design section in Qlib paper <https://arxiv.org/abs/2009.11189>`_ for detailed information.
|
|
|
|
|
Such data will be stored with filename suffix `.bin` (We'll call them `.bin` file, `.bin` format, or qlib format). `.bin` file is designed for scientific computing on finance data.
|
|
|
|
|
|
|
|
|
|
``Qlib`` provides two different off-the-shelf datasets, which can be accessed through this `link <https://github.com/microsoft/qlib/blob/main/qlib/contrib/data/handler.py>`_:
|
|
|
|
|
``Qlib`` provides two different off-the-shelf datasets, which can be accessed through this `link <https://github.com/microsoft/qlib/blob/main/qlib/contrib/data/handler.py>`__:
|
|
|
|
|
|
|
|
|
|
======================== ================= ================
|
|
|
|
|
Dataset US Market China Market
|
|
|
|
|
@@ -47,7 +47,7 @@ Alpha360 √ √
|
|
|
|
|
Alpha158 √ √
|
|
|
|
|
======================== ================= ================
|
|
|
|
|
|
|
|
|
|
Also, ``Qlib`` provides a high-frequency dataset. Users can run a high-frequency dataset example through this `link <https://github.com/microsoft/qlib/tree/main/examples/highfreq>`_.
|
|
|
|
|
Also, ``Qlib`` provides a high-frequency dataset. Users can run a high-frequency dataset example through this `link <https://github.com/microsoft/qlib/tree/main/examples/highfreq>`__.
|
|
|
|
|
|
|
|
|
|
Qlib Format Dataset
|
|
|
|
|
-------------------
|
|
|
|
|
@@ -512,7 +512,7 @@ Data and Cache File Structure
|
|
|
|
|
|
|
|
|
|
We've specially designed a file structure to manage data and cache, please refer to the `File storage design section in Qlib paper <https://arxiv.org/abs/2009.11189>`_ for detailed information. The file structure of data and cache is listed as follows.
|
|
|
|
|
|
|
|
|
|
.. code-block:: json
|
|
|
|
|
.. code-block::
|
|
|
|
|
|
|
|
|
|
- data/
|
|
|
|
|
[raw data] updated by data providers
|
|
|
|
|
|