1
0
mirror of https://github.com/microsoft/qlib.git synced 2026-07-03 11:00:57 +08:00

Update docs

This commit is contained in:
Jactus
2020-11-30 18:54:31 +08:00
committed by you-n-g
parent 1877ad8c39
commit 29f12e857f
22 changed files with 180 additions and 159 deletions

View File

@@ -29,7 +29,18 @@ Qlib Format Data
------------------
We've specially designed a data structure to manage financial data, please refer to the `File storage design section in Qlib paper <https://arxiv.org/abs/2009.11189>`_ for detailed information.
Such data will be stored with filename suffix `.bin` (We'll call them `.bin` file, `.bin` format, or qlib format). `.bin` file is designed for scientific computing on finance data
Such data will be stored with filename suffix `.bin` (We'll call them `.bin` file, `.bin` format, or qlib format). `.bin` file is designed for scientific computing on finance data.
``Qlib`` provides two different off-the-shelf dataset, which can be accessed through this `link <https://github.com/microsoft/qlib/blob/main/qlib/contrib/data/handler.py>`_:
======================== ================= ================
Dataset US Market China Market
======================== ================= ================
Alpha360 √ √
Alpha158 √ √
======================== ================= ================
Qlib Format Dataset
--------------------
@@ -45,7 +56,7 @@ In addition to China-Stock data, ``Qlib`` also includes a US-Stock dataset, whic
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/us_data --region us
After running the above command, users can find china-stock and us-stock data in Qlib format in the ``~/.qlib/csv_data/cn_data`` directory and ``~/.qlib/csv_data/us_data`` directory respectively.
After running the above command, users can find china-stock and us-stock data in ``Qlib`` format in the ``~/.qlib/csv_data/cn_data`` directory and ``~/.qlib/csv_data/us_data`` directory respectively.
``Qlib`` also provides the scripts in ``scripts/data_collector`` to help users crawl the latest data on the Internet and convert it to qlib format.
@@ -54,8 +65,7 @@ When ``Qlib`` is initialized with this dataset, users could build and evaluate t
Converting CSV Format into Qlib Format
-------------------------------------------
``Qlib`` has provided the script ``scripts/dump_bin.py`` to convert data in CSV format into `.bin` files (Qlib format).
``Qlib`` has provided the script ``scripts/dump_bin.py`` to convert **any** data in CSV format into `.bin` files (``Qlib`` format) as long as they are in the correct format.
Users can download the demo china-stock data in CSV format as follows for reference to the CSV format.
@@ -130,9 +140,21 @@ After conversion, users can find their Qlib format data in the directory `~/.qli
In the convention of `Qlib` data processing, `open, close, high, low, volume, money and factor` will be set to NaN if the stock is suspended.
China-Stock Mode & US-Stock Mode
Multiple Stock Modes
--------------------------------
``Qlib`` now provides two different stock modes for users: China-Stock Mode & US-Stock Mode. Here are some different settings of these two modes:
============== ================= ================
Region Trade Unit Limit Threshold
============== ================= ================
China 100 0.099
US 1 None
============== ================= ================
The `trade unit` defines the unit number of stocks can be used in a trade, and the `limit threshold` defines the bound set to the percentage of ups and downs of a stock.
- If users use ``Qlib`` in china-stock mode, china-stock data is required. Users can use ``Qlib`` in china-stock mode according to the following steps:
- Download china-stock in qlib format, please refer to section `Qlib Format Dataset <#qlib-format-dataset>`_.
- Initialize ``Qlib`` in china-stock mode
@@ -208,13 +230,19 @@ QlibDataLoader
The ``QlibDataLoader`` class in ``Qlib`` is such an interface that allows users to load raw data from the ``Qlib`` data source.
StaticDataLoader
---------------
The ``StaticDataLoader`` class in ``Qlib`` is such an interface that allows users to load raw data from file or as provided.
Interface
------------
Here are some interfaces of the ``QlibDataLoader`` class:
.. autoclass:: qlib.data.dataset.loader.QlibDataLoader
:members: load, load_group_df
.. autoclass:: qlib.data.dataset.loader.DataLoader
:members:
API
-----------

View File

@@ -18,45 +18,10 @@ Base Class & Interface
The base class provides the following interfaces:
- `__init__(**kwargs)`
- Initialization.
- `fit(self, dataset, **kwargs)`
- Train model.
- Parameter:
- `dataset`, ``Qlib``'s ``DatasetH`` type. For more information about ``DatasetH``, users can refer to the related document: `Qlib Dataset <../component/data.html#dataset>`_.
The `dataset` is passed into the `model`'s method because there are some unique data preprocessing procedures for each, we want to give each model maximum flexibility to handle the data that is suitable for their own.
The following code example shows how to retrieve `x_train`, `y_train` and `w_train` from the `dataset`:
.. code-block:: Python
# get features and labels
df_train, df_valid = dataset.prepare(
["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
)
x_train, y_train = df_train["feature"], df_train["label"]
x_valid, y_valid = df_valid["feature"], df_valid["label"]
# get weights
try:
wdf_train, wdf_valid = dataset.prepare(["train", "valid"], col_set=["weight"], data_key=DataHandlerLP.DK_L)
w_train, w_valid = wdf_train["weight"], wdf_valid["weight"]
except KeyError as e:
w_train = pd.DataFrame(np.ones_like(y_train.values), index=y_train.index)
w_valid = pd.DataFrame(np.ones_like(y_valid.values), index=y_valid.index)
- `predict(self, dataset, **kwargs)`
- Predict test data.
- Parameter:
- `dataset`, ``Qlib``'s ``DatasetH`` type. The usage is similar to the example above.
- Returns:
- Predic results with type: `pandas.Series`.
- `finetune(self, dataset, **kwargs)`
- Finetune the model.
- Parameter:
- `dataset`, ``Qlib``'s ``DatasetH`` type. The usage is similar to the example above.
.. autoclass:: qlib.model.base.Model
:members:
``Qlib`` also provides a base class `qlib.model.base.ModelFT <../reference/api.html#qlib.model.base.ModelFT>`_, which includes the method for finetuning the model.
For other interfaces such as `finetune`, please refer to `Model API <../reference/api.html#module-qlib.model.base>`_.

View File

@@ -72,6 +72,8 @@ The ``Experiment`` class is solely responsible for a single experiment, and it w
For other interfaces such as `search_records`, `delete_recorder`, please refer to `Experiment API <../reference/api.html#experiment>`_.
``Qlib`` also provides a default ``Experiment``, which will be created and used under certain situations when users use the APIs such as `log_metrics` or `get_exp`. If the default ``Experiment`` is used, there will be related logged information when running ``Qlib``. Users are able to change the name of the default ``Experiment`` in the config file of ``Qlib`` or during ``Qlib``'s `initialization <../start/initialization.html#parameters>`_, which is set to be '`Experiment`'.
Recorder
===================

View File

@@ -11,8 +11,8 @@ Introduction
The components in `Qlib Framework <../introduction/introduction.html#framework>`_ are designed in a loosely-coupled way. Users could build their own Quant research workflow with these components like `Example <https://github.com/microsoft/qlib/blob/main/examples/workflow_by_code.py>`_.
Besides, ``Qlib`` provides more user-friendly interfaces named ``qrun`` to automatically run the whole workflow defined by configuration. A concrete execution of the whole workflow is called an `experiment`.
With ``qrun``, user can easily run an `experiment`, which includes the following steps:
Besides, ``Qlib`` provides more user-friendly interfaces named ``qrun`` to automatically run the whole workflow defined by configuration. Running the whole workflow is called an `execution`.
With ``qrun``, user can easily start an `execution`, which includes the following steps:
- Data
- Loading
@@ -25,7 +25,7 @@ With ``qrun``, user can easily run an `experiment`, which includes the following
- Forecast signal analysis
- Backtest
For each `experiment`, ``Qlib`` has a complete system to tracking all the information as well as artifacts generated during training, inference and evaluation phase. For more information about how Qlib handles `experiment`, please refer to the related document: `Recorder: Experiment Management <../component/recorder.html>`_.
For each `execution`, ``Qlib`` has a complete system to tracking all the information as well as artifacts generated during training, inference and evaluation phase. For more information about how ``Qlib`` handles this, please refer to the related document: `Recorder: Experiment Management <../component/recorder.html>`_.
Complete Example
===================
@@ -35,8 +35,9 @@ Below is a typical config file of ``qrun``.
.. code-block:: YAML
provider_uri: "~/.qlib/qlib_data/cn_data"
region: cn
qlib_init:
provider_uri: "~/.qlib/qlib_data/cn_data"
region: cn
market: &market csi300
benchmark: &benchmark SH000300
data_handler_config: &data_handler_config
@@ -100,12 +101,16 @@ After saving the config into `configuration.yaml`, users could start the workflo
.. code-block:: bash
qrun -c configuration.yaml
qrun configuration.yaml
.. note::
`qrun` will be placed in your $PATH directory when installing ``Qlib``.
.. note::
The symbol `&` in `yaml` file stands for an anchor of a field, which is useful when another fields include this parameter as part of the value. Taking the configuration file above as an example, users can directly change the value of `market` and `benchmark` without traversing the entire configuration file.
Configuration File
===================
@@ -114,17 +119,15 @@ Let's get into details of ``qrun`` in this section.
Before using ``qrun``, users need to prepare a configuration file. The following content shows how to prepare each part of the configuration file.
Qlib Data Section
Qlib Init Section
--------------------
At first, the configuration file needs to contain several basic parameters about the data, which will be used for qlib initialization, data handling and backtest.
At first, the configuration file needs to contain several basic parameters which will be used for qlib initialization.
.. code-block:: YAML
provider_uri: "~/.qlib/qlib_data/cn_data"
region: cn
market: &market csi300
benchmark: &benchmark SH000300
The meaning of each field is as follows:
@@ -139,34 +142,14 @@ The meaning of each field is as follows:
The value of `region` should be aligned with the data stored in `provider_uri`.
- `market`
Type: str. Index name, the default value is `csi500`.
- `benchmark`
Type: str, list or pandas.Series. Stock index symbol, the default value is `SH000905`.
Task Section
--------------------
.. note::
* If `benchmark` is str, it will use the daily change as the 'bench'.
* If `benchmark` is list, it will use the daily average change of the stock pool in the list as the 'bench'.
* If `benchmark` is pandas.Series, whose `index` is trading date and the value T is the change from T-1 to T, it will be directly used as the 'bench'. An example is as following:
.. code-block:: python
print(D.features(D.instruments('csi500'), ['$close/Ref($close, 1)-1'])['$close/Ref($close, 1)-1'].head())
2017-01-04 0.011693
2017-01-05 0.000721
2017-01-06 -0.004322
2017-01-09 0.006874
2017-01-10 -0.003350
.. note::
The symbol `&` in `yaml` file stands for an anchor of a field, which is useful when another fields include this parameter as part of the value. Taking the configuration file above as an example, users can directly change the value of `market` and `benchmark` without traversing the entire configuration file.
The `task` field in the configuration corresponds to a `task`, which contains the parameters of three different subsections: `Model`, `Dataset` and `Record`.
Model Section
--------------------
~~~~~~~~~~~~~~~~~~~~
In the `task` field, the `model` section describes the parameters of the model to be used for training and inference. For more information about the base ``Model`` class, please refer to `Qlib Model <../component/model.html>`_.
@@ -202,7 +185,7 @@ The meaning of each field is as follows:
``Qlib`` provides a util named: ``init_instance_by_config`` to initialize any class inside ``Qlib`` with the configuration includes the fields: `class`, `module_path` and `kwargs`.
Dataset Section
--------------------
~~~~~~~~~~~~~~~~~~~~
The `dataset` field describes the parameters for the ``Dataset`` module in ``Qlib`` as well those for the module ``DataHandler``. For more information about the ``Dataset`` module, please refer to `Qlib Model <../component/data.html#dataset>`_.
@@ -237,9 +220,9 @@ Here is the configuration for the ``Dataset`` module which will take care of dat
test: [2017-01-01, 2020-08-01]
Record Section
--------------------
~~~~~~~~~~~~~~~~~~~~
The `record` field is about the parameters the ``Record`` module in ``Qlib``. ``Record`` is responsible for generating certain analysis and evaluation results such as `prediction`, `information Coefficient (IC)` and `backtest`.
The `record` field is about the parameters the ``Record`` module in ``Qlib``. ``Record`` is responsible for tracking training process and results such as `information Coefficient (IC)` and `backtest` in a standard format.
The following script is the configuration of `backtest` and the `strategy` used in `backtest`: