mirror of
https://github.com/microsoft/qlib.git
synced 2026-06-06 05:51:17 +08:00
init commit
docs/start/getdata.rst
.. _getdata:

=============================
Data Retrieval
=============================

.. currentmodule:: qlib


Introduction
====================

Users can retrieve stock data with ``Qlib``. The following examples demonstrate the basic user interface.

Examples
====================

``Qlib`` Initialization:

.. note:: To get the data, users need to initialize ``Qlib`` with ``qlib.init`` first. Please refer to `initialization <initialization.rst>`_.

It is recommended to use the following code to initialize ``Qlib``:

.. code-block:: python

    >>> import qlib
    >>> qlib.init(provider_uri='~/.qlib/qlib_data/cn_data')

Load the trading calendar for a given time range and frequency:

.. code-block:: python

    >>> from qlib.data import D
    >>> D.calendar(start_time='2010-01-01', end_time='2017-12-31', freq='day')[:2]
    [Timestamp('2010-01-04 00:00:00'), Timestamp('2010-01-05 00:00:00')]

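As a conceptual illustration only: a trading calendar can be thought of as a sorted list of trading days, and a time-range query keeps the days that fall within ``[start_time, end_time]``. The sketch below shows that idea with the standard library. The ``slice_calendar`` helper and the sample dates are hypothetical, and this is not how ``D.calendar`` is implemented.

.. code-block:: python

    import bisect
    from datetime import date


    def slice_calendar(trading_days, start, end):
        """Return the trading days d with start <= d <= end.

        Hypothetical helper: `trading_days` must be sorted in ascending order.
        """
        lo = bisect.bisect_left(trading_days, start)   # index of first day >= start
        hi = bisect.bisect_right(trading_days, end)    # one past the last day <= end
        return trading_days[lo:hi]


    # Illustrative sample calendar: no trading days between 2009-12-31 and 2010-01-04.
    days = [date(2009, 12, 31), date(2010, 1, 4), date(2010, 1, 5), date(2010, 1, 6)]
    print(slice_calendar(days, date(2010, 1, 1), date(2017, 12, 31))[:2])
    # [datetime.date(2010, 1, 4), datetime.date(2010, 1, 5)]

Binary search keeps the range query logarithmic in the calendar length, which matters little for daily data but more for finer frequencies.
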
Parse a given market name into a stockpool config:

.. code-block:: python

    >>> from qlib.data import D
    >>> D.instruments(market='all')
    {'market': 'all', 'filter_pipe': []}

Load the instruments of a given stockpool within a time range:

.. code-block:: python

    >>> from qlib.data import D
    >>> instruments = D.instruments(market='csi300')
    >>> D.list_instruments(instruments=instruments, start_time='2010-01-01', end_time='2017-12-31', as_list=True)[:6]

Load dynamic instruments from a base market according to a name filter:

.. code-block:: python

    >>> from qlib.data import D
    >>> from qlib.data.filter import NameDFilter
    >>> nameDFilter = NameDFilter(name_rule_re='SH[0-9]{4}55')
    >>> instruments = D.instruments(market='csi300', filter_pipe=[nameDFilter])
    >>> D.list_instruments(instruments=instruments, start_time='2015-01-01', end_time='2016-02-15', as_list=True)

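The name filter keeps instruments whose code matches the regular expression ``name_rule_re``. To show which codes ``SH[0-9]{4}55`` selects, the stdlib sketch below applies ``re.match`` to a few sample codes. The helper name and the sample list are hypothetical, and the assumption that ``NameDFilter`` uses ``re.match`` semantics internally is not confirmed by this document.

.. code-block:: python

    import re

    # Hypothetical sample of instrument codes; not real csi300 membership data.
    codes = ["SH600000", "SH600655", "SH601155", "SZ000655"]


    def match_names(codes, name_rule_re):
        """Keep codes whose beginning matches the rule (re.match semantics, an assumption)."""
        pattern = re.compile(name_rule_re)
        return [c for c in codes if pattern.match(c)]


    print(match_names(codes, r"SH[0-9]{4}55"))  # ['SH600655', 'SH601155']
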
Load dynamic instruments from a base market according to an expression filter:

.. code-block:: python

    >>> from qlib.data import D
    >>> from qlib.data.filter import ExpressionDFilter
    >>> expressionDFilter = ExpressionDFilter(rule_expression='$close>100')
    >>> instruments = D.instruments(market='csi300', filter_pipe=[expressionDFilter])
    >>> D.list_instruments(instruments=instruments, start_time='2015-01-01', end_time='2016-02-15', as_list=True)

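Roughly speaking, an expression filter evaluates a boolean rule such as ``$close>100`` for each instrument on each trading day, and the instrument is kept where the rule holds. The plain-Python sketch below mimics only that per-day thresholding on made-up close prices; the data, the helper name and the day-level semantics are illustrative assumptions rather than ``Qlib``'s implementation.

.. code-block:: python

    from datetime import date

    # Made-up close prices per instrument and day (illustrative only).
    closes = {
        "SH600000": {date(2015, 1, 5): 98.0, date(2015, 1, 6): 101.5},
        "SH600655": {date(2015, 1, 5): 120.0, date(2015, 1, 6): 130.0},
    }


    def days_passing(closes, threshold=100.0):
        """Map each instrument to the sorted days on which close > threshold."""
        result = {}
        for inst, series in closes.items():
            days = sorted(d for d, c in series.items() if c > threshold)
            if days:  # drop instruments that never satisfy the rule
                result[inst] = days
        return result


    print(days_passing(closes))
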
For more about using filters or building your own filter, see the API reference: `filter API <../reference/api.html#filter>`_

Load features of certain instruments in a given time range:

.. note:: This is not a recommended way to get features.

.. code-block:: python

    >>> from qlib.data import D
    >>> instruments = ['SH600000']
    >>> fields = ['$close', '$volume', 'Ref($close, 1)', 'Mean($close, 3)', '$high-$low']
    >>> D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day').head()
                             $close     $volume  Ref($close,1)  Mean($close,3)  \
    instrument datetime
    SH600000   2010-01-04  81.809998  17144536.0            NaN       81.809998
               2010-01-05  82.419998  29827816.0      81.809998       82.114998
               2010-01-06  80.800003  25070040.0      82.419998       81.676666
               2010-01-07  78.989998  22077858.0      80.800003       80.736666
               2010-01-08  79.879997  17019168.0      78.989998       79.889999

                           Sub($high,$low)
    instrument datetime
    SH600000   2010-01-04         2.741158
               2010-01-05         3.049736
               2010-01-06         1.621399
               2010-01-07         2.856926
               2010-01-08         1.930397

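Judging from the output above, ``Ref($close, 1)`` is the previous day's close (``NaN`` on the first day), and ``Mean($close, 3)`` averages the current close and up to two previous closes, using a shorter window at the start of the series. The plain-Python sketch below recomputes both columns from the close prices shown above, rounded to two decimals. The helper names are hypothetical; the sketch only mirrors the displayed numbers and is not ``Qlib``'s operator implementation.

.. code-block:: python

    import math

    # Close prices of SH600000 from the output above, rounded to two decimals.
    closes = [81.81, 82.42, 80.80, 78.99, 79.88]


    def ref(values, n):
        """Value n steps earlier; NaN where no earlier value exists."""
        return [values[i - n] if i >= n else math.nan for i in range(len(values))]


    def mean(values, window):
        """Trailing mean over up to `window` values (shorter window at the start)."""
        out = []
        for i in range(len(values)):
            chunk = values[max(0, i - window + 1): i + 1]
            out.append(sum(chunk) / len(chunk))
        return out


    print(ref(closes, 1))
    print([round(m, 4) for m in mean(closes, 3)])
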
Load features of a certain stockpool in a given time range:

.. note:: Because the server needs to cache all-time data for the requested stockpool and fields, the first request may take longer to process than usual. Subsequent requests are answered much faster, even if the time span changes.

.. code-block:: python

    >>> from qlib.data import D
    >>> from qlib.data.filter import NameDFilter, ExpressionDFilter
    >>> nameDFilter = NameDFilter(name_rule_re='SH[0-9]{4}55')
    >>> expressionDFilter = ExpressionDFilter(rule_expression='($close/$factor)>100')
    >>> instruments = D.instruments(market='csi300', filter_pipe=[nameDFilter, expressionDFilter])
    >>> fields = ['$close', '$volume', 'Ref($close, 1)', 'Mean($close, 3)', '$high-$low']
    >>> D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day').head()
                                 $close        $volume  Ref($close, 1)  \
    instrument datetime
    SH600655   2015-06-15  4342.160156  258706.359375     4530.459961
               2015-06-16  4409.270020  257349.718750     4342.160156
               2015-06-17  4312.330078  235214.890625     4409.270020
               2015-06-18  4086.729980  196772.859375     4312.330078
               2015-06-19  3678.250000  182916.453125     4086.729980

                           Mean($close, 3)  $high-$low
    instrument datetime
    SH600655   2015-06-15      4480.743327  285.251465
               2015-06-16      4427.296712  298.301270
               2015-06-17      4354.586751  356.098145
               2015-06-18      4269.443359  363.554932
               2015-06-19      4025.770020  368.954346

.. note:: When calling ``D.features()`` on the client, pass ``disk_cache=0`` to skip the dataset cache, or ``disk_cache=1`` to generate and use the dataset cache. In addition, when calling on the server, ``disk_cache=2`` can be used to update the dataset cache.

API
====================

For more about data retrieval, see the API reference: `Data API <../reference/api.html#Data>`_

docs/start/initialization.rst
.. _initialization:

====================
Qlib Initialization
====================

.. currentmodule:: qlib


Initialization
=========================

Follow the steps below to initialize ``Qlib``.

- Download and prepare the data: run the following command to download the stock data.

  .. code-block:: bash

      python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data

  To learn more about how to use ``get_data.py``, refer to `Raw Data <../advanced/data.html#raw-data>`_.

- Run the initialization code: run the following code in Python:

  .. code-block:: Python

      import qlib
      # region in [REG_CN, REG_US]
      from qlib.config import REG_CN
      provider_uri = "~/.qlib/qlib_data/cn_data"  # target_dir
      qlib.init(provider_uri=provider_uri, region=REG_CN)

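Before calling ``qlib.init``, it can help to check that the data directory exists and that the region value is one of the supported ones. The helper below is a hypothetical pre-flight check written with the standard library only. It hard-codes the ``'cn'``/``'us'`` values of ``REG_CN``/``REG_US`` listed under Parameters as an assumption, and it does not call ``Qlib`` itself.

.. code-block:: python

    import os

    # Values of qlib.config.REG_CN / REG_US as documented under Parameters (assumed here).
    SUPPORTED_REGIONS = {"cn", "us"}


    def check_init_args(provider_uri, region="cn"):
        """Hypothetical pre-flight check before qlib.init; returns the expanded data path."""
        path = os.path.expanduser(provider_uri)  # resolve "~/.qlib/..." to an absolute path
        if not os.path.isdir(path):
            raise FileNotFoundError(f"data directory not found: {path} (run scripts/get_data.py first)")
        if region not in SUPPORTED_REGIONS:
            raise ValueError(f"unsupported region {region!r}; expected one of {sorted(SUPPORTED_REGIONS)}")
        return path


    # Usage, assuming the data was downloaded to the default target_dir:
    # data_dir = check_init_args("~/.qlib/qlib_data/cn_data", region="cn")
    # qlib.init(provider_uri=data_dir, region=REG_CN)

Failing early with a clear message is easier to debug than an error raised later during the first data request.
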
Parameters
-------------------

In addition to ``provider_uri`` and ``region``, ``qlib.init`` has other parameters. The parameters of ``qlib.init`` are described below:

- ``provider_uri``
    Type: str. The local directory where the data downloaded by ``get_data.py`` is stored.

- ``region``
    Type: str, optional parameter (default: ``qlib.config.REG_CN``).
    Currently, ``qlib.config.REG_US`` ('us') and ``qlib.config.REG_CN`` ('cn') are supported. Different values of ``region`` result in different stock market modes.

    - ``qlib.config.REG_US``: US stock market.
    - ``qlib.config.REG_CN``: China stock market.

- ``redis_host``
    Type: str, optional parameter (default: "127.0.0.1"). Host of ``redis``.
    The lock and cache mechanisms rely on redis.

- ``redis_port``
    Type: int, optional parameter (default: 6379). Port of ``redis``.

.. note::

    The value of ``region`` should match the data stored in ``provider_uri``. Currently, ``scripts/get_data.py`` only provides China stock market data. Users who want to use US stock market data should prepare their own US-stock data in ``provider_uri`` and switch to US-stock mode.

.. note::

    If the connection to redis at ``redis_host`` and ``redis_port`` fails, the cache will not be used! Please refer to `Cache <../advanced/cache.rst>`_.

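Because the cache is simply not used when redis cannot be reached, a quick TCP check of ``redis_host`` and ``redis_port`` can tell in advance whether caching will be available. The sketch below only tests whether a TCP connection can be opened; it does not speak the redis protocol or authenticate, and the function name is hypothetical.

.. code-block:: python

    import socket


    def redis_reachable(host="127.0.0.1", port=6379, timeout=1.0):
        """Return True if a TCP connection to host:port opens within `timeout` seconds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:  # connection refused, host unreachable, or timed out
            return False


    if not redis_reachable():
        print("redis is not reachable; qlib will run without the cache")
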
docs/start/installation.rst
.. _installation:

====================
Installation
====================

.. currentmodule:: qlib


How to Install ``Qlib``
=======================

``Qlib`` supports only Python 3, up to Python 3.8.

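One way to check the running interpreter against the version range stated above is to compare ``sys.version_info``. The bounds below simply restate that range (Python 3, up to 3.8) and are an assumption that may not hold for later ``Qlib`` releases; the helper name is hypothetical.

.. code-block:: python

    import sys

    # Supported range as stated in this document (may change in later releases).
    MIN_VERSION = (3, 0)
    MAX_VERSION = (3, 8)


    def python_supported(version_info=sys.version_info):
        """True if (major, minor) lies within [MIN_VERSION, MAX_VERSION]."""
        major_minor = tuple(version_info[:2])
        return MIN_VERSION <= major_minor <= MAX_VERSION


    print("Python", sys.version.split()[0], "supported:", python_supported())
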
Please follow the steps below to install ``Qlib``:

- Change to the ``Qlib`` directory, which contains the file ``setup.py``.
- Then, execute the following commands:

  .. code-block:: bash

      $ pip install numpy
      $ pip install --upgrade cython
      $ python setup.py install

.. note::

    It's recommended to use Anaconda/Miniconda to set up the environment.
    ``Qlib`` needs the ``lightgbm`` and ``tensorflow`` packages; use pip to install them.

.. note::

    Do not import ``qlib`` from within the repository folder that contains the ``qlib`` source directory; otherwise, errors may occur.

Use the following code to confirm that the installation succeeded:

.. code-block:: python

    >>> import qlib
    >>> qlib.__version__
    <LATEST VERSION>

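The installed version can also be read from the package metadata without importing ``qlib``, which sidesteps the repository-folder problem mentioned in the note above. The sketch assumes the distribution is named ``pyqlib``; that name is an assumption, so substitute the name that ``pip list`` reports if it differs.

.. code-block:: python

    from importlib import metadata


    def installed_version(dist_name="pyqlib"):
        """Return the installed version of `dist_name`, or None if it is not installed.

        The default distribution name "pyqlib" is an assumption.
        """
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None


    print(installed_version())
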
docs/start/integration.rst
=========================================
Custom Model Integration
=========================================

Introduction
===================

``Qlib`` provides ``LightGBM`` and ``DNN`` models as the baselines of the ``Interday Model``. In addition to these default models, users can integrate their own custom models into ``Qlib``.

Users can integrate their own custom models according to the following steps:

- Define a custom model class, which should be a subclass of `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_.
- Write a configuration file that describes the path and parameters of the custom model.
- Test the custom model.

Custom Model Class
===========================

Custom models need to inherit `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_ and override its methods.

- Override the ``__init__`` method

    - ``Qlib`` passes the initialization parameters to the ``__init__`` method.
    - The parameters must be consistent with the hyperparameters in the configuration file.
    - Code Example: in the following example, the hyperparameter field (``args``) of the configuration file should contain parameters such as ``loss: mse``.

    .. code-block:: Python

        from sklearn.metrics import mean_squared_error, roc_auc_score

        def __init__(self, loss='mse', **kwargs):
            if loss not in {'mse', 'binary'}:
                raise NotImplementedError
            # Scoring function used by `score`, chosen according to the loss type
            self._scorer = mean_squared_error if loss == 'mse' else roc_auc_score
            # Create the parameter dict before updating it
            self._params = {}
            self._params.update(objective=loss, **kwargs)
            self._model = None

- Override the ``fit`` method

    - ``Qlib`` calls the ``fit`` method to train the model.
    - The parameters must include at least the training features ``x_train``, training labels ``y_train``, validation features ``x_valid`` and validation labels ``y_valid``.
    - The parameters may include optional parameters with default values, such as the training weights ``w_train``, validation weights ``w_valid`` and ``num_boost_round = 1000``.
    - Code Example: in the following example, ``num_boost_round = 1000`` is an optional parameter.

    .. code-block:: Python

        import numpy as np
        import pandas as pd
        import lightgbm as lgb

        def fit(self, x_train: pd.DataFrame, y_train: pd.DataFrame, x_valid: pd.DataFrame, y_valid: pd.DataFrame,
                w_train: pd.DataFrame = None, w_valid: pd.DataFrame = None, num_boost_round=1000, **kwargs):

            # LightGBM needs a 1D array as its label
            if y_train.values.ndim == 2 and y_train.values.shape[1] == 1:
                y_train_1d, y_valid_1d = np.squeeze(y_train.values), np.squeeze(y_valid.values)
            else:
                raise ValueError("LightGBM doesn't support multi-label training")

            w_train_weight = None if w_train is None else w_train.values
            w_valid_weight = None if w_valid is None else w_valid.values

            dtrain = lgb.Dataset(x_train.values, label=y_train_1d, weight=w_train_weight)
            dvalid = lgb.Dataset(x_valid.values, label=y_valid_1d, weight=w_valid_weight)
            self._model = lgb.train(
                self._params,
                dtrain,
                num_boost_round=num_boost_round,
                valid_sets=[dtrain, dvalid],
                valid_names=['train', 'valid'],
                **kwargs
            )

- Override the ``predict`` method

    - The parameters include the test features.
    - Return the prediction score.
    - Please refer to `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_ for the parameter types of the ``predict`` method.
    - Code Example: in the following example, the fitted model (``self._model``) predicts the label (the prediction score) of the test data ``x_test`` and returns it.

    .. code-block:: Python

        import numpy as np
        import pandas as pd

        def predict(self, x_test: pd.DataFrame, **kwargs) -> np.ndarray:
            if self._model is None:
                raise ValueError('model is not fitted yet!')
            return self._model.predict(x_test.values)

- Override the ``score`` method

    - The parameters include the test features and test labels.
    - Return the evaluation score of the model. It's recommended to use the loss between the labels and the prediction scores.
    - Code Example: in the following example, the weighted loss is calculated from the test data ``x_test``, the test labels ``y_test`` and the weights ``w_test``.

    .. code-block:: Python

        import pandas as pd
        from qlib.utils import drop_nan_by_y_index

        def score(self, x_test: pd.DataFrame, y_test: pd.DataFrame, w_test: pd.DataFrame = None) -> float:
            # Remove rows from x, y and w that contain NaN in any column of y_test
            x_test, y_test, w_test = drop_nan_by_y_index(x_test, y_test, w_test)
            preds = self.predict(x_test)
            w_test_weight = None if w_test is None else w_test.values
            # self._scorer was chosen in __init__ according to the loss type
            return self._scorer(y_test.values, preds, sample_weight=w_test_weight)

- Override the ``save`` method and the ``load`` method

    - The ``save`` method takes a ``filename`` that represents an absolute path; the model should be saved to that path.
    - The ``load`` method takes a ``buffer`` read from the ``filename`` passed to the ``save`` method; the model should be loaded from that ``buffer``.
    - Code Example:

    .. code-block:: Python

        import lightgbm as lgb

        def save(self, filename):
            if self._model is None:
                raise ValueError('model is not fitted yet!')
            self._model.save_model(filename)

        def load(self, buffer):
            # `buffer` holds the bytes of the file written by `save`
            self._model = lgb.Booster(model_str=buffer.decode('utf-8'))

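To see the whole contract in one place, here is a deliberately tiny, dependency-free model with the same method names: ``fit`` learns the (weighted) mean label, ``predict`` returns it for every row, ``score`` computes a weighted mean squared error after dropping rows whose label is NaN, and ``save``/``load`` round-trip the state through a file and a bytes buffer. It is a hypothetical illustration: it uses plain lists instead of ``pd.DataFrame`` and does not subclass ``qlib.contrib.model.base.Model`` so that it runs without ``Qlib``; a real integration would subclass that base class.

.. code-block:: python

    import json
    import math


    class MeanModel:
        """Hypothetical toy model mirroring the fit/predict/score/save/load interface."""

        def __init__(self, loss="mse", **kwargs):
            if loss != "mse":
                raise NotImplementedError("this sketch only supports 'mse'")
            self._params = dict(objective=loss, **kwargs)
            self._model = None  # the learned mean; None until fit() is called

        def fit(self, x_train, y_train, x_valid, y_valid, w_train=None, w_valid=None, **kwargs):
            # Features are ignored: the "model" is just the weighted mean training label.
            w = w_train if w_train is not None else [1.0] * len(y_train)
            self._model = sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)

        def predict(self, x_test, **kwargs):
            if self._model is None:
                raise ValueError("model is not fitted yet!")
            return [self._model] * len(x_test)

        def score(self, x_test, y_test, w_test=None):
            w_test = w_test if w_test is not None else [1.0] * len(y_test)
            # Drop rows whose label is NaN, in the spirit of drop_nan_by_y_index.
            rows = [(x, y, w) for x, y, w in zip(x_test, y_test, w_test) if not math.isnan(y)]
            preds = self.predict([x for x, _, _ in rows])
            # Weighted MSE: sum(w * (y - p)^2) / sum(w)
            num = sum(w * (y - p) ** 2 for (_, y, w), p in zip(rows, preds))
            return num / sum(w for _, _, w in rows)

        def save(self, filename):
            if self._model is None:
                raise ValueError("model is not fitted yet!")
            with open(filename, "w", encoding="utf-8") as f:
                json.dump({"mean": self._model}, f)

        def load(self, buffer):
            # `buffer` holds the bytes read from the file written by save().
            self._model = json.loads(buffer.decode("utf-8"))["mean"]


    model = MeanModel()
    model.fit([[0], [0], [0]], [1.0, 2.0, 3.0], [[0]], [2.0])
    print(model.predict([[0], [0]]))  # [2.0, 2.0]
    print(model.score([[0], [0], [0]], [1.0, float("nan"), 4.0]))  # 2.5

Keeping the state behind ``self._model`` and raising until ``fit`` has run follows the same pattern as the LightGBM example above.
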
Configuration File
=======================

The configuration file is described in detail in the `estimator <../advanced/estimator.html#Example>`_ document. To integrate the custom model into ``Qlib``, modify the ``model`` field in the configuration file.

- Example: the following example describes the ``model`` field of the configuration file for the custom LightGBM model mentioned above, where ``module_path`` is the module path, ``class`` is the class name, and ``args`` holds the hyperparameters passed to the ``__init__`` method. All parameters in ``args`` except ``loss: mse`` are passed to ``self._params`` through ``**kwargs`` in ``__init__`` (``loss`` is captured by its own parameter and stored as ``objective``).

    .. code-block:: YAML

        model:
            class: LGBModel
            module_path: qlib.contrib.model.gbdt
            args:
                loss: mse
                colsample_bytree: 0.8879
                learning_rate: 0.0421
                subsample: 0.8789
                lambda_l1: 205.6999
                lambda_l2: 580.9768
                max_depth: 8
                num_leaves: 210
                num_threads: 20

Users can find the configuration files of the baseline ``Model`` in ``qlib/examples/estimator/estimator_config.yaml`` and ``qlib/examples/estimator/estimator_config_dnn.yaml``.

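The ``class``/``module_path``/``args`` triple is enough to locate and construct a model. The sketch below shows one plausible way a loader could turn such a field into an instance with ``importlib``. The function name ``build_from_config`` is hypothetical and this is not ``Qlib``'s actual loader; it is demonstrated on a standard-library class so that it runs without ``Qlib`` installed.

.. code-block:: python

    import importlib


    def build_from_config(config):
        """Instantiate config["class"] from config["module_path"] with config["args"] as kwargs."""
        module = importlib.import_module(config["module_path"])
        cls = getattr(module, config["class"])
        return cls(**config.get("args", {}))


    # Stand-in config using a stdlib class; the ``model`` field above has the same shape.
    demo = {"class": "Fraction", "module_path": "fractions", "args": {"numerator": 3, "denominator": 6}}
    print(build_from_config(demo))  # 1/2

Passing ``args`` as keyword arguments is why the keys in the YAML must match the parameter names accepted by ``__init__`` (or be absorbed by ``**kwargs``).
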
Model Testing
=====================

Assuming that the configuration file is ``examples/estimator/estimator_config.yaml``, users can run the following commands to test the custom model:

.. code-block:: bash

    cd examples  # Avoid running the program in the directory that contains qlib
    estimator -c estimator/estimator_config.yaml

.. note:: ``estimator`` is a built-in command of ``Qlib``.

``Model`` can also be tested as a standalone module. An example is given in ``examples.estimator.train_backtest_analyze.ipynb``.

Reference
=====================

To learn more about ``Model``, please refer to `Interday Model: Model Training & Prediction <../advanced/model.rst>`_ and `Model API <../reference/api.html#module-qlib.contrib.model.base>`_.