mirror of https://github.com/microsoft/qlib.git synced 2026-06-06 05:51:17 +08:00

init commit

This commit is contained in:
Young
2020-09-22 01:43:21 +00:00
parent aa51e5aad3
commit 99ebd87cba
131 changed files with 20218 additions and 0 deletions

docs/start/getdata.rst Normal file

@@ -0,0 +1,137 @@
.. _getdata:

=============================
Data Retrieval
=============================

.. currentmodule:: qlib

Introduction
====================

Users can get stock data with ``Qlib``. The following examples demonstrate the basic user interface.

Examples
====================

``Qlib`` initialization:

.. note:: In order to get the data, users need to initialize ``Qlib`` with ``qlib.init`` first. Please refer to `initialization <initialization.html>`_.

It is recommended to use the following code to initialize ``Qlib``:

.. code-block:: python

    >>> import qlib
    >>> qlib.init(provider_uri='~/.qlib/qlib_data/cn_data')

Load the trading calendar for a given time range and frequency:

.. code-block:: python

    >>> from qlib.data import D
    >>> D.calendar(start_time='2010-01-01', end_time='2017-12-31', freq='day')[:2]
    [Timestamp('2010-01-04 00:00:00'), Timestamp('2010-01-05 00:00:00')]
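
Conceptually, a calendar query returns the trading days that fall inside the requested range. The sketch below illustrates this with the standard-library ``bisect`` module on a small, made-up list of trading days. It is not how ``Qlib`` implements ``D.calendar``, and treating both ends of the range as inclusive is an assumption.

.. code-block:: python

    import bisect
    from datetime import date

    # Hypothetical, made-up trading calendar (sorted); 2010-01-01 is not a trading day.
    TRADING_DAYS = [
        date(2009, 12, 30), date(2009, 12, 31),
        date(2010, 1, 4), date(2010, 1, 5), date(2010, 1, 6),
    ]

    def calendar_slice(days, start, end):
        """Return the trading days d with start <= d <= end from a sorted list."""
        lo = bisect.bisect_left(days, start)   # index of the first day >= start
        hi = bisect.bisect_right(days, end)    # index one past the last day <= end
        return days[lo:hi]

    print(calendar_slice(TRADING_DAYS, date(2010, 1, 1), date(2017, 12, 31))[:2])

Binary search keeps each lookup logarithmic in the calendar length, which matters when the same calendar is sliced many times.
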

Parse a given market name into a stock pool config:

.. code-block:: python

    >>> from qlib.data import D
    >>> D.instruments(market='all')
    {'market': 'all', 'filter_pipe': []}

Load the instruments of a given stock pool within a time range:

.. code-block:: python

    >>> from qlib.data import D
    >>> instruments = D.instruments(market='csi300')
    >>> D.list_instruments(instruments=instruments, start_time='2010-01-01', end_time='2017-12-31', as_list=True)[:6]

Load dynamic instruments from a base market according to a name filter:

.. code-block:: python

    >>> from qlib.data import D
    >>> from qlib.data.filter import NameDFilter
    >>> nameDFilter = NameDFilter(name_rule_re='SH[0-9]{4}55')
    >>> instruments = D.instruments(market='csi300', filter_pipe=[nameDFilter])
    >>> D.list_instruments(instruments=instruments, start_time='2015-01-01', end_time='2016-02-15', as_list=True)
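
The name filter keeps the instruments whose code matches the regular expression ``name_rule_re``. The plain-Python sketch below shows which codes the pattern ``SH[0-9]{4}55`` selects. The code list is hypothetical, and matching from the start of the string (``re.match``) is an assumption about how the rule is applied, not a description of ``NameDFilter`` internals.

.. code-block:: python

    import re

    # Hypothetical instrument codes; only the pattern comes from the example above.
    CODES = ["SH600000", "SH600655", "SH601155", "SZ000055"]

    def name_filter(codes, name_rule_re):
        """Keep the codes that match the regex at the start of the string."""
        pattern = re.compile(name_rule_re)
        return [code for code in codes if pattern.match(code)]

    print(name_filter(CODES, r"SH[0-9]{4}55"))
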

Load dynamic instruments from a base market according to an expression filter:

.. code-block:: python

    >>> from qlib.data import D
    >>> from qlib.data.filter import ExpressionDFilter
    >>> expressionDFilter = ExpressionDFilter(rule_expression='$close>100')
    >>> instruments = D.instruments(market='csi300', filter_pipe=[expressionDFilter])
    >>> D.list_instruments(instruments=instruments, start_time='2015-01-01', end_time='2016-02-15', as_list=True)

To learn more about how to use filters or how to build your own filter, see the API reference: `filter API <../reference/api.html#filter>`_.
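
An expression filter is dynamic: an instrument can enter and leave the pool as the value of the rule changes from day to day. One way to picture the result is as a list of date spans during which the rule holds. The sketch below turns a daily boolean series into such spans. The helper ``true_spans`` and the prices are hypothetical, and the sketch illustrates the idea only; it is not ``Qlib``'s filter code.

.. code-block:: python

    from datetime import date

    def true_spans(days, flags):
        """Group consecutive days on which the rule holds into (start, end) spans.

        days  -- sorted trading days
        flags -- booleans; flags[i] is the value of the rule on days[i]
        """
        spans = []
        start = prev = None
        for day, ok in zip(days, flags):
            if ok and start is None:
                start = day                   # a new span begins
            if not ok and start is not None:
                spans.append((start, prev))   # the span ended on the previous day
                start = None
            prev = day
        if start is not None:
            spans.append((start, prev))       # close a span that runs to the last day
        return spans

    # Hypothetical closing prices for one instrument, tested against the rule $close > 100
    days = [date(2015, 6, d) for d in (15, 16, 17, 18, 19)]
    closes = [98.0, 101.5, 103.2, 99.9, 100.4]
    print(true_spans(days, [c > 100 for c in closes]))
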

Load features of given instruments within a time range:

.. note:: This is not a recommended way to get features.

.. code-block:: python

    >>> from qlib.data import D
    >>> instruments = ['SH600000']
    >>> fields = ['$close', '$volume', 'Ref($close, 1)', 'Mean($close, 3)', '$high-$low']
    >>> D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day').head()
                              $close     $volume  Ref($close,1)  Mean($close,3)  \
    instrument datetime
    SH600000   2010-01-04  81.809998  17144536.0            NaN       81.809998
               2010-01-05  82.419998  29827816.0      81.809998       82.114998
               2010-01-06  80.800003  25070040.0      82.419998       81.676666
               2010-01-07  78.989998  22077858.0      80.800003       80.736666
               2010-01-08  79.879997  17019168.0      78.989998       79.889999

                           Sub($high,$low)
    instrument datetime
    SH600000   2010-01-04         2.741158
               2010-01-05         3.049736
               2010-01-06         1.621399
               2010-01-07         2.856926
               2010-01-08         1.930397
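
The expression fields in the output above can be checked by hand. ``Ref($close, 1)`` is the previous day's close, so the first row is ``NaN``. The output also suggests that ``Mean($close, 3)`` averages whatever values are available at the start of the series: the first row equals the close and the second row is the mean of two closes. The sketch below reproduces these two columns in plain Python from the closes shown. The helpers ``ref`` and ``mean`` are hypothetical illustrations, not ``Qlib`` operators.

.. code-block:: python

    # Closing prices of SH600000 copied from the output above (2010-01-04 .. 2010-01-08)
    closes = [81.809998, 82.419998, 80.800003, 78.989998, 79.879997]

    def ref(series, n):
        """Value n periods earlier; None where no earlier value exists."""
        return [series[i - n] if i >= n else None for i in range(len(series))]

    def mean(series, window):
        """Rolling mean; the first rows average only the values seen so far."""
        out = []
        for i in range(len(series)):
            chunk = series[max(0, i - window + 1): i + 1]
            out.append(sum(chunk) / len(chunk))
        return out

    print(ref(closes, 1))
    print([round(v, 6) for v in mean(closes, 3)])
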

Load features of a given stock pool within a time range:

.. note:: Because the server needs to cache all-time data for the requested stock pool and fields, the first request may take longer to process. Subsequent requests are processed much faster, even if the time span changes.

.. code-block:: python

    >>> from qlib.data import D
    >>> from qlib.data.filter import NameDFilter, ExpressionDFilter
    >>> nameDFilter = NameDFilter(name_rule_re='SH[0-9]{4}55')
    >>> expressionDFilter = ExpressionDFilter(rule_expression='($close/$factor)>100')
    >>> instruments = D.instruments(market='csi300', filter_pipe=[nameDFilter, expressionDFilter])
    >>> fields = ['$close', '$volume', 'Ref($close, 1)', 'Mean($close, 3)', '$high-$low']
    >>> D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day').head()
                                $close        $volume  Ref($close, 1)  \
    instrument datetime
    SH600655   2015-06-15  4342.160156  258706.359375     4530.459961
               2015-06-16  4409.270020  257349.718750     4342.160156
               2015-06-17  4312.330078  235214.890625     4409.270020
               2015-06-18  4086.729980  196772.859375     4312.330078
               2015-06-19  3678.250000  182916.453125     4086.729980

                           Mean($close, 3)  $high-$low
    instrument datetime
    SH600655   2015-06-15      4480.743327  285.251465
               2015-06-16      4427.296712  298.301270
               2015-06-17      4354.586751  356.098145
               2015-06-18      4269.443359  363.554932
               2015-06-19      4025.770020  368.954346

.. note:: When calling ``D.features()`` on the client, use the parameter ``disk_cache=0`` to skip the dataset cache, or ``disk_cache=1`` to generate and use the dataset cache. In addition, when calling it on the server, you can use ``disk_cache=2`` to update the dataset cache.
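
The notes above describe the dataset cache only in terms of its behaviour. All-time data is stored per stock pool and field list, later requests only slice out their time span, and ``disk_cache`` selects between skipping, using, or refreshing the cache. The toy in-memory class below is a hypothetical illustration of that behaviour. It is not ``Qlib``'s cache implementation, and it does not model the client/server split.

.. code-block:: python

    class ToyDatasetCache:
        """Illustrative cache keyed by (instruments, fields) that stores all-time rows.

        disk_cache: 0 = skip the cache, 1 = generate it if missing and then use it,
        2 = regenerate (update) it. Here every mode is allowed from any caller.
        """

        def __init__(self, load_all_time):
            self._load_all_time = load_all_time  # callable(instruments, fields) -> {day: value}
            self._store = {}
            self.loads = 0                       # number of expensive full loads

        def _full(self, instruments, fields):
            self.loads += 1
            return self._load_all_time(instruments, fields)

        def features(self, instruments, fields, start, end, disk_cache=1):
            key = (tuple(instruments), tuple(fields))
            if disk_cache == 0:
                data = self._full(instruments, fields)           # bypass the cache
            elif disk_cache == 1:
                if key not in self._store:
                    self._store[key] = self._full(instruments, fields)
                data = self._store[key]
            elif disk_cache == 2:
                self._store[key] = self._full(instruments, fields)  # refresh the cached copy
                data = self._store[key]
            else:
                raise ValueError("disk_cache must be 0, 1 or 2")
            # The cache holds all-time data; only the requested span is returned.
            return {day: v for day, v in data.items() if start <= day <= end}


    def fake_loader(instruments, fields):
        return {"2015-06-15": 1.0, "2015-06-16": 2.0, "2015-06-17": 3.0}

    cache = ToyDatasetCache(fake_loader)
    cache.features(["SH600655"], ["$close"], "2015-06-15", "2015-06-16")
    cache.features(["SH600655"], ["$close"], "2015-06-16", "2015-06-17")  # served from the cache
    print(cache.loads)

Because the cached entry covers all dates, changing the time span does not trigger another full load. This matches the note that repeated requests are fast even when the time span changes.
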

API
====================

To learn more about how to use the data interface, see the API reference: `Data API <../reference/api.html#Data>`_.


@@ -0,0 +1,60 @@
.. _initialization:

====================
Qlib Initialization
====================

.. currentmodule:: qlib

Initialization
=========================

Follow the steps below to initialize ``Qlib``.

- Download and prepare the data: execute the following command to download the stock data.

  .. code-block:: bash

      python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data

  To learn more about how to use ``get_data.py``, refer to `Raw Data <../advanced/data.html#raw-data>`_.

- Run the initialization code: run the following code in Python:

  .. code-block:: Python

      import qlib
      # region in [REG_CN, REG_US]
      from qlib.config import REG_CN
      provider_uri = "~/.qlib/qlib_data/cn_data"  # target_dir
      qlib.init(provider_uri=provider_uri, region=REG_CN)

Parameters
-------------------

Besides `provider_uri` and `region`, `qlib.init` accepts other parameters. The following are all the parameters of `qlib.init`:

- `provider_uri`
    Type: str. The local directory where the data downloaded by ``get_data.py`` is stored.
- `region`
    Type: str, optional parameter (default: ``qlib.config.REG_CN``).
    Currently, ``qlib.config.REG_US`` ('us') and ``qlib.config.REG_CN`` ('cn') are supported. Different values of ``region`` result in different stock market modes.

    - ``qlib.config.REG_US``: US stock market.
    - ``qlib.config.REG_CN``: China stock market.

- `redis_host`
    Type: str, optional parameter (default: "127.0.0.1"), host of `redis`.
    The lock and cache mechanism relies on redis.
- `redis_port`
    Type: int, optional parameter (default: 6379), port of `redis`.
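
The parameter list above can be summarized as a small validation helper. The sketch below is hypothetical, since ``build_init_kwargs`` is not part of ``Qlib``. It only mirrors the defaults and the two region strings listed above, and the returned dictionary is meant to be passed to ``qlib.init`` as keyword arguments.

.. code-block:: python

    REG_CN, REG_US = "cn", "us"  # the string values listed in the parameter description

    def build_init_kwargs(provider_uri, region=REG_CN, redis_host="127.0.0.1", redis_port=6379):
        """Hypothetical helper: validate and collect keyword arguments for qlib.init."""
        if region not in (REG_CN, REG_US):
            raise ValueError(f"unsupported region: {region!r}")
        if not isinstance(redis_port, int):
            raise TypeError("redis_port must be an int")
        return {
            "provider_uri": provider_uri,
            "region": region,
            "redis_host": redis_host,
            "redis_port": redis_port,
        }

    kwargs = build_init_kwargs("~/.qlib/qlib_data/cn_data")
    print(kwargs)
    # With Qlib installed, the result could be used as: qlib.init(**kwargs)
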

.. note::

    The value of `region` should be aligned with the data stored in `provider_uri`. Currently, ``scripts/get_data.py`` only provides China stock market data. If users want to use US stock market data, they should prepare their own US stock data in `provider_uri` and switch to US stock mode.

.. note::

    If the connection to redis at `redis_host` and `redis_port` fails, the cache will not be used. Please refer to `Cache <../advanced/cache.html>`_.
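
Because a failed redis connection disables the cache, it can help to check reachability before calling ``qlib.init``. The standard-library sketch below uses the hypothetical helper ``redis_reachable``. It only tests whether a TCP port accepts connections and does not speak the redis protocol, so any service listening on that port would also pass.

.. code-block:: python

    import socket

    def redis_reachable(host="127.0.0.1", port=6379, timeout=1.0):
        """Return True if a TCP connection to host:port can be opened."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    if not redis_reachable():
        print("redis is not reachable; Qlib will run without the cache")
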


@@ -0,0 +1,43 @@
.. _installation:

====================
Installation
====================

.. currentmodule:: qlib

How to Install ``Qlib``
========================

``Qlib`` supports only Python 3, up to Python 3.8.
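
The version requirement can be checked before installing. The sketch below uses the hypothetical helper ``python_supported`` and encodes the stated range (Python 3.x with x at most 8) using ``sys.version_info``. It encodes only that one sentence and makes no other claim about compatibility.

.. code-block:: python

    import sys

    def python_supported(version_info=sys.version_info):
        """True for Python 3.x with x <= 8, per the requirement stated above."""
        major, minor = version_info[0], version_info[1]
        return major == 3 and minor <= 8

    if not python_supported():
        print("This Python version is outside the range supported by Qlib:", sys.version.split()[0])
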

Follow the steps below to install ``Qlib``:

- Change to the ``Qlib`` directory, which contains the file ``setup.py``.
- Then, execute the following commands:

  .. code-block:: bash

      $ pip install numpy
      $ pip install --upgrade cython
      $ python setup.py install

.. note::

    It's recommended to use Anaconda/Miniconda to set up the environment. ``Qlib`` needs the ``lightgbm`` and ``tensorflow`` packages; use pip to install them.

.. note::

    Do not import ``qlib`` from within the repository folder that contains the ``qlib`` source directory; otherwise, errors may occur.

Use the following code to confirm that the installation succeeded:

.. code-block:: python

    >>> import qlib
    >>> qlib.__version__
    <LATEST VERSION>

docs/start/integration.rst Normal file

@@ -0,0 +1,146 @@
=========================================
Custom Model Integration
=========================================

Introduction
===================

``Qlib`` provides ``LightGBM`` and ``DNN`` models as baselines of the ``Interday Model``. In addition to the default models, users can integrate their own custom models into ``Qlib``.

Users can integrate their own custom models with the following steps:

- Define a custom model class, which should be a subclass of `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_.
- Write a configuration file that describes the path and parameters of the custom model.
- Test the custom model.

Custom Model Class
===========================

Custom models need to inherit from `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_ and override its methods.

- Override the ``__init__`` method

  - ``Qlib`` passes the initialization parameters to the ``__init__`` method.
  - The parameters must be consistent with the hyperparameters in the configuration file.
  - Code Example: In the following example, the hyperparameter field of the configuration file should contain parameters such as ``loss: mse``.

    .. code-block:: Python

        from sklearn.metrics import mean_squared_error, roc_auc_score

        def __init__(self, loss='mse', **kwargs):
            if loss not in {'mse', 'binary'}:
                raise NotImplementedError
            # choose the evaluation metric that matches the training objective
            self._scorer = mean_squared_error if loss == 'mse' else roc_auc_score
            # every remaining hyperparameter is forwarded to LightGBM
            self._params = dict(objective=loss, **kwargs)
            self._model = None

- Override the ``fit`` method

  - ``Qlib`` calls the ``fit`` method to train the model.
  - The parameters must include at least the training features ``x_train``, training labels ``y_train``, validation features ``x_valid`` and validation labels ``y_valid``.
  - The parameters may include optional parameters with default values, such as the training weights ``w_train``, validation weights ``w_valid`` and ``num_boost_round=1000``.
  - Code Example: In the following example, ``num_boost_round=1000`` is an optional parameter.

    .. code-block:: Python

        import lightgbm as lgb
        import numpy as np
        import pandas as pd

        def fit(self, x_train: pd.DataFrame, y_train: pd.DataFrame, x_valid: pd.DataFrame, y_valid: pd.DataFrame,
                w_train: pd.DataFrame = None, w_valid: pd.DataFrame = None, num_boost_round=1000, **kwargs):
            # LightGBM needs a 1D array as its label
            if y_train.values.ndim == 2 and y_train.values.shape[1] == 1:
                y_train_1d, y_valid_1d = np.squeeze(y_train.values), np.squeeze(y_valid.values)
            else:
                raise ValueError("LightGBM doesn't support multi-label training")
            w_train_weight = None if w_train is None else w_train.values
            w_valid_weight = None if w_valid is None else w_valid.values
            dtrain = lgb.Dataset(x_train.values, label=y_train_1d, weight=w_train_weight)
            dvalid = lgb.Dataset(x_valid.values, label=y_valid_1d, weight=w_valid_weight)
            self._model = lgb.train(
                self._params,
                dtrain,
                num_boost_round=num_boost_round,
                valid_sets=[dtrain, dvalid],
                valid_names=['train', 'valid'],
                **kwargs
            )

- Override the ``predict`` method

  - The parameters include the test features.
  - Return the prediction scores.
  - Please refer to `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_ for the parameter types of the ``predict`` method.
  - Code Example: In the following example, the fitted model predicts the labels of the test data ``x_test`` and returns them.

    .. code-block:: Python

        import numpy as np
        import pandas as pd

        def predict(self, x_test: pd.DataFrame, **kwargs) -> np.ndarray:
            if self._model is None:
                raise ValueError('model is not fitted yet!')
            return self._model.predict(x_test.values)

- Override the ``score`` method

  - The parameters include the test features and test labels.
  - Return the evaluation score of the model. It's recommended to use the loss between the labels and the prediction scores.
  - Code Example: In the following example, the weighted loss is calculated from the test data ``x_test``, the test labels ``y_test`` and the weights ``w_test``.

    .. code-block:: Python

        def score(self, x_test: pd.DataFrame, y_test: pd.DataFrame, w_test: pd.DataFrame = None) -> float:
            # Remove the rows of x, y and w that contain NaN in any column of y_test
            x_test, y_test, w_test = drop_nan_by_y_index(x_test, y_test, w_test)
            preds = self.predict(x_test)
            w_test_weight = None if w_test is None else w_test.values
            # reuse the metric selected in __init__ (MSE or ROC AUC)
            return self._scorer(y_test.values, preds, sample_weight=w_test_weight)

- Override the ``save`` and ``load`` methods

  - The ``save`` method takes a ``filename`` that represents an absolute path; the model should be saved to that path.
  - The ``load`` method takes a ``buffer`` read from the file written by ``save``; the model should be loaded from that ``buffer``.
  - Code Example:

    .. code-block:: Python

        import lightgbm as lgb

        def save(self, filename):
            if self._model is None:
                raise ValueError('model is not fitted yet!')
            self._model.save_model(filename)

        def load(self, buffer):
            # buffer holds the bytes of the text file written by save_model
            self._model = lgb.Booster(model_str=buffer.decode('utf-8'))
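
The method contracts above can be exercised end to end without LightGBM. The sketch below is a toy, standard-library-only model built on several assumptions. ``Model`` is a minimal stand-in defined locally, not the real ``qlib.contrib.model.base.Model``. Plain lists replace the pandas DataFrames that ``Qlib`` passes in. ``MeanModel`` is a hypothetical name for a model that predicts the weighted mean training label, which keeps the focus on the ``fit``/``predict``/``score``/``save``/``load`` contract.

.. code-block:: python

    import json
    import os
    import tempfile


    class Model:
        """Local stand-in for qlib.contrib.model.base.Model so the sketch runs without Qlib."""

        def fit(self, x_train, y_train, x_valid, y_valid, w_train=None, w_valid=None, **kwargs):
            raise NotImplementedError

        def predict(self, x_test, **kwargs):
            raise NotImplementedError

        def score(self, x_test, y_test, w_test=None):
            raise NotImplementedError

        def save(self, filename):
            raise NotImplementedError

        def load(self, buffer):
            raise NotImplementedError


    class MeanModel(Model):
        """Toy model: predicts the (weighted) mean training label for every row."""

        def __init__(self, loss="mse", **kwargs):
            if loss != "mse":
                raise NotImplementedError("this toy model only supports mse")
            self._params = dict(objective=loss, **kwargs)  # extra hyperparameters are kept
            self._mean = None

        def fit(self, x_train, y_train, x_valid, y_valid, w_train=None, w_valid=None, **kwargs):
            w = [1.0] * len(y_train) if w_train is None else w_train
            self._mean = sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)

        def predict(self, x_test, **kwargs):
            if self._mean is None:
                raise ValueError("model is not fitted yet!")
            return [self._mean] * len(x_test)

        def score(self, x_test, y_test, w_test=None):
            preds = self.predict(x_test)
            w = [1.0] * len(y_test) if w_test is None else w_test
            # weighted mean squared error between labels and predictions
            return sum(wi * (yi - pi) ** 2 for wi, yi, pi in zip(w, y_test, preds)) / sum(w)

        def save(self, filename):
            if self._mean is None:
                raise ValueError("model is not fitted yet!")
            with open(filename, "w", encoding="utf-8") as f:
                json.dump({"mean": self._mean, "params": self._params}, f)

        def load(self, buffer):
            state = json.loads(buffer.decode("utf-8"))  # buffer: bytes read from the saved file
            self._mean, self._params = state["mean"], state["params"]


    model = MeanModel(loss="mse", num_leaves=210)
    model.fit([[0], [1], [2]], [1.0, 2.0, 3.0], [[3]], [2.5])
    print(model.predict([[4], [5]]), model.score([[4], [5]], [1.0, 3.0]))

    path = os.path.join(tempfile.mkdtemp(), "mean_model.json")
    model.save(path)
    restored = MeanModel()
    with open(path, "rb") as f:
        restored.load(f.read())
    print(restored.predict([[6]]))

As in the LightGBM example, ``load`` receives raw bytes rather than a path. The round trip therefore reads the saved file in binary mode before handing it over.
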

Configuration File
=======================

The configuration file is described in detail in the `estimator <../advanced/estimator.html#Example>`_ document. In order to integrate the custom model into ``Qlib``, you need to modify the ``model`` field in the configuration file.

- Example: The following example describes the ``model`` field of the configuration file for the custom LightGBM model above. ``module_path`` is the module path, ``class`` is the class name, and ``args`` holds the hyperparameters passed into the ``__init__`` method. Except for ``loss: mse``, all parameters in the field are passed to ``self._params`` through ``**kwargs`` in ``__init__``.

  .. code-block:: YAML

      model:
          class: LGBModel
          module_path: qlib.contrib.model.gbdt
          args:
              loss: mse
              colsample_bytree: 0.8879
              learning_rate: 0.0421
              subsample: 0.8789
              lambda_l1: 205.6999
              lambda_l2: 580.9768
              max_depth: 8
              num_leaves: 210
              num_threads: 20

The configuration files of the baseline ``Model`` can be found in ``qlib/examples/estimator/estimator_config.yaml`` and ``qlib/examples/estimator/estimator_config_dnn.yaml``.
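
The ``module_path``/``class``/``args`` triple follows a common dynamic-loading pattern: import the module, look up the class by name, and call it with ``args`` as keyword arguments. The sketch below shows one way to do this with ``importlib``. ``load_from_config`` is a hypothetical helper, not the loader that ``estimator`` uses. A standard-library class stands in for ``LGBModel`` so that the sketch runs without ``Qlib``, and the field is written as a Python dict because parsing YAML would require a third-party package.

.. code-block:: python

    import importlib

    def load_from_config(config):
        """Hypothetical loader: import module_path, look up class, call it with args."""
        module = importlib.import_module(config["module_path"])
        cls = getattr(module, config["class"])
        return cls(**config.get("args", {}))

    # Same shape as the "model" field above; fractions.Fraction replaces LGBModel here.
    model_field = {
        "class": "Fraction",
        "module_path": "fractions",
        "args": {"numerator": 3, "denominator": 4},
    }
    print(load_from_config(model_field))

With ``Qlib`` installed, the YAML example above would correspond to calling ``LGBModel(loss='mse', colsample_bytree=0.8879, ...)``. ``__init__`` consumes ``loss``, and the remaining entries arrive in ``**kwargs``.
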

Model Testing
=====================

Assuming that the configuration file is ``examples/estimator/estimator_config.yaml``, users can run the following command to test the custom model:

.. code-block:: bash

    cd examples  # avoid running the program from a directory that contains `qlib`
    estimator -c estimator/estimator_config.yaml

.. note:: ``estimator`` is a built-in command of ``Qlib``.

``Model`` can also be tested as a standalone module. An example is given in ``examples.estimator.train_backtest_analyze.ipynb``.

Reference
=====================

To learn more about ``Model``, please refer to `Interday Model: Model Training & Prediction <../advanced/model.html>`_ and `Model API <../reference/api.html#module-qlib.contrib.model.base>`_.