mirror of
https://github.com/microsoft/qlib.git
synced 2026-07-03 11:00:57 +08:00
Update docs
@@ -29,7 +29,18 @@ Qlib Format Data
------------------

We've specially designed a data structure to manage financial data; please refer to the `File storage design section in the Qlib paper <https://arxiv.org/abs/2009.11189>`_ for detailed information.

Such data will be stored with the filename suffix ``.bin`` (we'll call them ``.bin`` files, ``.bin`` format, or ``Qlib`` format). The ``.bin`` file is designed for scientific computing on financial data.

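The sketch below writes one feature series to such a file and reads it back, to show roughly what the file contains. It assumes the layout we understand ``scripts/dump_bin.py`` to use: a single little-endian float32 array whose first element is the index of the series' first date in the trading calendar, followed by one value per trading day. The helper names and the file name ``close.day.bin`` are hypothetical. Check the script in your ``Qlib`` version before relying on this layout.

.. code-block:: python

    import os
    import tempfile

    import numpy as np


    def write_feature_bin(path, start_index, values):
        """Write [start_index, v_0, v_1, ...] as little-endian float32 (assumed layout)."""
        arr = np.hstack([[start_index], np.asarray(values, dtype=float)]).astype("<f4")
        arr.tofile(path)


    def read_feature_bin(path):
        """Read a file written by write_feature_bin and return (start_index, values)."""
        raw = np.fromfile(path, dtype="<f4")
        return int(raw[0]), raw[1:]


    # Usage: a hypothetical daily close series that starts at calendar index 5.
    path = os.path.join(tempfile.mkdtemp(), "close.day.bin")
    write_feature_bin(path, 5, [10.0, 10.5, np.nan, 11.0])
    start, values = read_feature_bin(path)
    print(start, values)

All values, including the start index, share one fixed-width dtype. A reader can therefore locate any trading day with simple offset arithmetic instead of parsing the file.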
``Qlib`` provides two different off-the-shelf datasets, which can be accessed through this `link <https://github.com/microsoft/qlib/blob/main/qlib/contrib/data/handler.py>`_:

======================== ================= ================
Dataset                  US Market         China Market
======================== ================= ================
Alpha360                 √                 √
Alpha158                 √                 √
======================== ================= ================

Qlib Format Dataset
--------------------
@@ -45,7 +56,7 @@ In addition to China-Stock data, ``Qlib`` also includes a US-Stock dataset, whic

    python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/us_data --region us

After running the above command, users can find China-stock and US-stock data in ``Qlib`` format in the ``~/.qlib/qlib_data/cn_data`` directory and the ``~/.qlib/qlib_data/us_data`` directory, respectively.

``Qlib`` also provides the scripts in ``scripts/data_collector`` to help users crawl the latest data from the Internet and convert it to ``Qlib`` format.

@@ -54,8 +65,7 @@ When ``Qlib`` is initialized with this dataset, users could build and evaluate t
Converting CSV Format into Qlib Format
-------------------------------------------

``Qlib`` provides the script ``scripts/dump_bin.py`` to convert **any** data in CSV format into ``.bin`` files (``Qlib`` format), as long as the data is in the correct format.

Users can download the demo China-stock data in CSV format as follows, as a reference for the expected CSV format.

@@ -130,9 +140,21 @@ After conversion, users can find their Qlib format data in the directory `~/.qli

By the convention of ``Qlib`` data processing, `open`, `close`, `high`, `low`, `volume`, `money` and `factor` are set to NaN if the stock is suspended.

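If you prepare your own CSV data before converting it, you can apply the same convention with ``pandas``. This is only an illustration. It assumes you already have a boolean ``suspended`` series aligned with your rows; how suspension is detected depends on your data source. The helper ``apply_suspension_convention`` is hypothetical.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Fields that the convention above sets to NaN on suspended days.
    SUSPENDED_NAN_FIELDS = ["open", "close", "high", "low", "volume", "money", "factor"]


    def apply_suspension_convention(df, suspended):
        """Return a copy of `df` with the fields above set to NaN where `suspended` is True."""
        out = df.copy()
        cols = [c for c in SUSPENDED_NAN_FIELDS if c in out.columns]
        out[cols] = out[cols].astype(float)  # NaN requires a float dtype
        out.loc[suspended.to_numpy(), cols] = np.nan
        return out


    # Usage with a toy three-day frame for a single stock (hypothetical data).
    df = pd.DataFrame(
        {
            "date": ["2020-01-02", "2020-01-03", "2020-01-06"],
            "close": [10.0, 10.0, 10.3],
            "volume": [1200, 0, 900],
        }
    )
    suspended = pd.Series([False, True, False])
    clean = apply_suspension_convention(df, suspended)
    print(clean)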
Multiple Stock Modes
--------------------------------

``Qlib`` now provides two different stock modes for users: China-Stock Mode & US-Stock Mode. Here are some of the settings that differ between these two modes:

============== ================= ================
Region         Trade Unit        Limit Threshold
============== ================= ================
China          100               0.099
US             1                 None
============== ================= ================

The `trade unit` defines the number of shares in one trading unit, so the amount of stock in a trade is a multiple of it. The `limit threshold` defines the bound on the percentage rise or fall of a stock's price.

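The sketch below shows how a backtest could apply the two settings. The function names are hypothetical, and the logic is a simplified illustration rather than ``Qlib``'s exchange implementation. An order amount is rounded down to a multiple of the trade unit. A price change beyond the limit threshold is treated as hitting the price limit, and ``None`` disables that check.

.. code-block:: python

    import math


    def round_to_trade_unit(amount, trade_unit):
        """Round an order amount (in shares) down to a multiple of `trade_unit`."""
        return math.floor(amount / trade_unit) * trade_unit


    def hits_price_limit(change, limit_threshold):
        """True if the price change exceeds the limit threshold; None means no limit."""
        if limit_threshold is None:
            return False
        return abs(change) > limit_threshold


    # China-stock mode: trade unit 100, limit threshold 0.099.
    print(round_to_trade_unit(1234, 100))   # 1200
    print(hits_price_limit(0.0998, 0.099))  # True
    # US-stock mode: trade unit 1, no limit threshold.
    print(round_to_trade_unit(1234, 1))     # 1234
    print(hits_price_limit(0.25, None))     # False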
- If users use ``Qlib`` in China-stock mode, China-stock data is required. Users can use ``Qlib`` in China-stock mode according to the following steps:
    - Download China-stock data in ``Qlib`` format; please refer to section `Qlib Format Dataset <#qlib-format-dataset>`_.
    - Initialize ``Qlib`` in China-stock mode
@@ -208,13 +230,19 @@ QlibDataLoader

The ``QlibDataLoader`` class in ``Qlib`` is such an interface that allows users to load raw data from the ``Qlib`` data source.

StaticDataLoader
----------------

The ``StaticDataLoader`` class in ``Qlib`` is such an interface that allows users to load raw data from a file or from data provided directly.

Interface
------------

Here are some interfaces of the ``QlibDataLoader`` class and of the ``DataLoader`` base class:

.. autoclass:: qlib.data.dataset.loader.QlibDataLoader
    :members: load, load_group_df

.. autoclass:: qlib.data.dataset.loader.DataLoader
    :members:

API
-----------

@@ -18,45 +18,10 @@ Base Class & Interface

The base class provides the following interfaces:

- `__init__(**kwargs)`
    - Initialization.

- `fit(self, dataset, **kwargs)`
    - Train the model.
    - Parameter:
        - `dataset`, ``Qlib``'s ``DatasetH`` type. For more information about ``DatasetH``, users can refer to the related document: `Qlib Dataset <../component/data.html#dataset>`_.
          The `dataset` is passed into the model's methods because each model may have its own data preprocessing procedures; this gives each model maximum flexibility to handle the data in the way that suits it.
          The following code example shows how to retrieve `x_train`, `y_train` and `w_train` from the `dataset`:

          .. code-block:: Python

              import numpy as np
              import pandas as pd
              from qlib.data.dataset.handler import DataHandlerLP

              # `dataset` is the DatasetH instance passed into `fit`
              # get features and labels
              df_train, df_valid = dataset.prepare(
                  ["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
              )
              x_train, y_train = df_train["feature"], df_train["label"]
              x_valid, y_valid = df_valid["feature"], df_valid["label"]

              # get weights; fall back to uniform weights if there is no "weight" column set
              try:
                  wdf_train, wdf_valid = dataset.prepare(["train", "valid"], col_set=["weight"], data_key=DataHandlerLP.DK_L)
                  w_train, w_valid = wdf_train["weight"], wdf_valid["weight"]
              except KeyError:
                  w_train = pd.DataFrame(np.ones_like(y_train.values), index=y_train.index)
                  w_valid = pd.DataFrame(np.ones_like(y_valid.values), index=y_valid.index)

- `predict(self, dataset, **kwargs)`
    - Predict test data.
    - Parameter:
        - `dataset`, ``Qlib``'s ``DatasetH`` type. The usage is similar to the example above.
    - Returns:
        - Prediction results with type: `pandas.Series`.

- `finetune(self, dataset, **kwargs)`
    - Finetune the model.
    - Parameter:
        - `dataset`, ``Qlib``'s ``DatasetH`` type. The usage is similar to the example above.

.. autoclass:: qlib.model.base.Model
    :members:

``Qlib`` also provides a base class `qlib.model.base.ModelFT <../reference/api.html#qlib.model.base.ModelFT>`_, which includes the method for finetuning the model.

For other interfaces such as `finetune`, please refer to `Model API <../reference/api.html#module-qlib.model.base>`_.

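The sketch below illustrates the `fit`/`predict` contract described above without a ``Qlib`` installation. It is a simplified stand-in, not ``Qlib`` code. ``ToyDataset`` is a hypothetical object that only mimics the ``prepare(segments, col_set=...)`` call used in the example above, with two-level `feature`/`label` columns. ``LinearToyModel`` does not subclass ``qlib.model.base.Model``. A real model would subclass that base class and receive a ``DatasetH``.

.. code-block:: python

    import numpy as np
    import pandas as pd


    class ToyDataset:
        """Hypothetical stand-in for DatasetH: one DataFrame per segment, with
        two-level columns such as ("feature", "f0") and ("label", "LABEL0")."""

        def __init__(self, segments):
            self.segments = segments  # e.g. {"train": df_train, "test": df_test}

        def prepare(self, segments, col_set="__all", data_key=None):
            # `data_key` is accepted only to mirror the call shape; it is ignored here.
            def pick(name):
                df = self.segments[name]
                return df if col_set == "__all" else df[col_set]

            if isinstance(segments, str):
                return pick(segments)
            return [pick(name) for name in segments]


    class LinearToyModel:
        """Least-squares model exposing the fit/predict interface described above."""

        def __init__(self, **kwargs):
            self.coef_ = None

        def fit(self, dataset, **kwargs):
            df_train = dataset.prepare("train", col_set=["feature", "label"])
            x = df_train["feature"].to_numpy()
            y = df_train["label"].to_numpy().ravel()
            x1 = np.hstack([x, np.ones((len(x), 1))])  # append an intercept column
            self.coef_, *_ = np.linalg.lstsq(x1, y, rcond=None)
            return self

        def predict(self, dataset, **kwargs):
            x = dataset.prepare("test", col_set="feature")
            x1 = np.hstack([x.to_numpy(), np.ones((len(x), 1))])
            return pd.Series(x1 @ self.coef_, index=x.index)  # predictions as a pandas.Series


    # Usage with hypothetical data whose label is an exact linear function of the features.
    cols = pd.MultiIndex.from_tuples([("feature", "f0"), ("feature", "f1"), ("label", "LABEL0")])
    x = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0], [5.0, 5.0]])
    y = 2 * x[:, 0] - x[:, 1] + 0.5
    train = pd.DataFrame(np.column_stack([x, y]), columns=cols)
    test = pd.DataFrame([[1.0, 1.0, np.nan], [10.0, 2.0, np.nan]], columns=cols)
    dataset = ToyDataset({"train": train, "test": test})
    pred = LinearToyModel().fit(dataset).predict(dataset)
    print(pred)

Passing the whole dataset object, rather than ready-made arrays, lets each model choose which segments and column sets it prepares.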
@@ -72,6 +72,8 @@ The ``Experiment`` class is solely responsible for a single experiment, and it w

For other interfaces such as `search_records` and `delete_recorder`, please refer to `Experiment API <../reference/api.html#experiment>`_.

``Qlib`` also provides a default ``Experiment``, which will be created and used in certain situations, such as when users call APIs like `log_metrics` or `get_exp`. If the default ``Experiment`` is used, related information will be logged when running ``Qlib``. The default ``Experiment`` is named '`Experiment`'; users can change this name in the config file of ``Qlib`` or during ``Qlib``'s `initialization <../start/initialization.html#parameters>`_.

Recorder
===================

@@ -11,8 +11,8 @@ Introduction
The components in `Qlib Framework <../introduction/introduction.html#framework>`_ are designed in a loosely coupled way. Users can build their own Quant research workflow with these components, as shown in this `Example <https://github.com/microsoft/qlib/blob/main/examples/workflow_by_code.py>`_.

Besides, ``Qlib`` provides a more user-friendly interface named ``qrun`` to automatically run the whole workflow defined by a configuration file. Running the whole workflow is called an `execution`.
With ``qrun``, users can easily start an `execution`, which includes the following steps:

- Data
    - Loading
@@ -25,7 +25,7 @@ With ``qrun``, user can easily run an `experiment`, which includes the following
    - Forecast signal analysis
    - Backtest

For each `execution`, ``Qlib`` has a complete system for tracking all the information as well as the artifacts generated during the training, inference and evaluation phases. For more information about how ``Qlib`` handles this, please refer to the related document: `Recorder: Experiment Management <../component/recorder.html>`_.

Complete Example
===================
@@ -35,8 +35,9 @@ Below is a typical config file of ``qrun``.

.. code-block:: YAML

    qlib_init:
        provider_uri: "~/.qlib/qlib_data/cn_data"
        region: cn
    market: &market csi300
    benchmark: &benchmark SH000300
    data_handler_config: &data_handler_config
@@ -100,12 +101,16 @@ After saving the config into `configuration.yaml`, users could start the workflo

.. code-block:: bash

    qrun configuration.yaml

.. note::

    ``qrun`` will be placed in a directory on your ``$PATH`` when ``Qlib`` is installed.

.. note::

    The symbol ``&`` in a YAML file marks an anchor for a field's value. This is useful when other fields reuse that value as part of their own, by referring to it with an alias of the form ``*name``. Taking the configuration file above as an example, users can change the values of `market` and `benchmark` in one place without traversing the entire configuration file.

Configuration File
===================
@@ -114,17 +119,15 @@ Let's get into details of ``qrun`` in this section.

Before using ``qrun``, users need to prepare a configuration file. The following content shows how to prepare each part of the configuration file.

Qlib Init Section
--------------------

First, the configuration file needs to contain several basic parameters, which will be used for ``Qlib`` initialization, data handling and backtest.

.. code-block:: YAML

    provider_uri: "~/.qlib/qlib_data/cn_data"
    region: cn
    market: &market csi300
    benchmark: &benchmark SH000300

The meaning of each field is as follows:

@@ -139,34 +142,14 @@ The meaning of each field is as follows:

The value of `region` should be aligned with the data stored in `provider_uri`.

- `market`
    Type: str. Index name; the default value is `csi500`.

- `benchmark`
    Type: str, list or pandas.Series. Stock index symbol; the default value is `SH000905`.

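Whichever form `benchmark` takes, the 'bench' is a series of daily changes indexed by trading date. The sketch below is a rough illustration that does not use ``Qlib``. It computes such a series with ``pandas`` from close prices. The single-index case mirrors the expression ``$close/Ref($close, 1)-1``. The stock-pool case takes an equal-weighted average of the daily changes, which is our assumption about how the average is formed. Both helper names and the price data are hypothetical.

.. code-block:: python

    import pandas as pd


    def daily_change(close):
        """Change from T-1 to T for a date-indexed close series, like `$close/Ref($close, 1)-1`."""
        return close / close.shift(1) - 1


    def pool_daily_change(close_df):
        """Equal-weighted average daily change of a pool (one close-price column per stock)."""
        return (close_df / close_df.shift(1) - 1).mean(axis=1)


    dates = pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"])
    index_close = pd.Series([100.0, 101.0, 99.99], index=dates)
    pool_close = pd.DataFrame(
        {"SH600000": [10.0, 11.0, 11.0], "SZ000001": [20.0, 18.0, 18.9]}, index=dates
    )

    bench_from_index = daily_change(index_close)  # the first value is NaN (no previous day)
    bench_from_pool = pool_daily_change(pool_close)
    print(bench_from_index)
    print(bench_from_pool)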
.. note::

    * If `benchmark` is str, it will use the daily change of that index as the 'bench'.

    * If `benchmark` is list, it will use the daily average change of the stock pool in the list as the 'bench'.

    * If `benchmark` is pandas.Series, whose `index` is the trading date and whose value at T is the change from T-1 to T, it will be used directly as the 'bench'. An example is as follows:

      .. code-block:: python

          import qlib
          from qlib.data import D

          qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region="cn")
          print(D.features(D.instruments('csi500'), ['$close/Ref($close, 1)-1'])['$close/Ref($close, 1)-1'].head())
          # 2017-01-04    0.011693
          # 2017-01-05    0.000721
          # 2017-01-06   -0.004322
          # 2017-01-09    0.006874
          # 2017-01-10   -0.003350

Task Section
--------------------

The `task` field in the configuration corresponds to a `task`, which contains the parameters of three different subsections: `Model`, `Dataset` and `Record`.

Model Section
~~~~~~~~~~~~~~~~~~~~

In the `task` field, the `model` section describes the parameters of the model to be used for training and inference. For more information about the base ``Model`` class, please refer to `Qlib Model <../component/model.html>`_.

@@ -202,7 +185,7 @@ The meaning of each field is as follows:
``Qlib`` provides a utility named ``init_instance_by_config`` to initialize any class inside ``Qlib`` from a configuration that includes the fields `class`, `module_path` and `kwargs`.

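The idea behind such a utility can be sketched in a few lines: import the module named by `module_path`, look up `class` in it, and call it with `kwargs`. The function below is a simplified, hypothetical re-implementation for illustration only. ``Qlib``'s own ``init_instance_by_config`` accepts more input forms and options. The example runs it on a standard-library class, so it works without ``Qlib`` installed.

.. code-block:: python

    import importlib


    def build_from_config(config):
        """Instantiate config["class"] from config["module_path"] with config["kwargs"]."""
        module = importlib.import_module(config["module_path"])
        cls = getattr(module, config["class"])
        return cls(**config.get("kwargs", {}))


    # A config with the same three fields as the `model` section, pointing at a stdlib class.
    config = {
        "class": "Fraction",
        "module_path": "fractions",
        "kwargs": {"numerator": 3, "denominator": 4},
    }
    obj = build_from_config(config)
    print(obj)  # 3/4

Describing components by module path and class name keeps the YAML file free of Python code. It also allows a user-defined model or dataset to be swapped in by editing two strings.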
Dataset Section
~~~~~~~~~~~~~~~~~~~~

The `dataset` field describes the parameters for the ``Dataset`` module in ``Qlib`` as well as those for the ``DataHandler`` module. For more information about the ``Dataset`` module, please refer to `Qlib Dataset <../component/data.html#dataset>`_.

@@ -237,9 +220,9 @@ Here is the configuration for the ``Dataset`` module which will take care of dat
test: [2017-01-01, 2020-08-01]

Record Section
~~~~~~~~~~~~~~~~~~~~

The `record` field is about the parameters of the ``Record`` module in ``Qlib``. ``Record`` is responsible for tracking the training process and results, such as the `information coefficient (IC)` and `backtest`, in a standard format.

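One of these results, the information coefficient, is commonly computed for each trading day as the correlation between the predictions and the labels across instruments. The sketch below does this with ``pandas``. It assumes both inputs are Series sharing a (`datetime`, `instrument`) MultiIndex. The helper ``daily_ic`` and the sample data are hypothetical and are not ``Qlib``'s ``Record`` implementation. Passing ``rank=True`` correlates within-day ranks instead, which gives a rank IC.

.. code-block:: python

    import pandas as pd


    def daily_ic(pred, label, rank=False):
        """Per-date Pearson correlation between predictions and labels (rank IC if rank=True)."""
        df = pd.concat({"pred": pred, "label": label}, axis=1).dropna()
        if rank:
            # Rank within each date so that only the cross-sectional ordering matters.
            df = df.groupby(level="datetime").rank()
        return df.groupby(level="datetime").apply(lambda g: g["pred"].corr(g["label"]))


    idx = pd.MultiIndex.from_product(
        [pd.to_datetime(["2020-01-02", "2020-01-03"]), ["SH600000", "SH600004", "SH600009"]],
        names=["datetime", "instrument"],
    )
    pred = pd.Series([0.1, 0.2, 0.3, 0.3, 0.1, 0.2], index=idx)
    label = pd.Series([0.01, 0.02, 0.05, -0.01, 0.02, 0.00], index=idx)

    ic = daily_ic(pred, label)
    rank_ic = daily_ic(pred, label, rank=True)
    print(ic.mean(), rank_ic.mean())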
The following script is the configuration of `backtest` and the `strategy` used in `backtest`: