mirror of https://github.com/microsoft/qlib.git synced 2026-07-06 04:20:57 +08:00

split code into core and contrib for data&model

Young
2020-10-12 02:12:27 +00:00
parent 77e2f25f7b
commit d4091a8711
20 changed files with 346 additions and 104 deletions

View File

@@ -195,8 +195,8 @@ Your PR of new Quant models is highly welcomed.
# Quant Dataset Zoo
Dataset plays a very important role in Quant. Here is a list of the datasets built on `Qlib`.
- [Alpha360](./qlib/contrib/estimator/handler.py)
- [Alpha158](./qlib/contrib/estimator/handler.py)
- [Alpha360](./qlib/contrib/data/handler.py)
- [Alpha158](./qlib/contrib/data/handler.py)
[Here](https://qlib.readthedocs.io/en/latest/advanced/alpha.html) is a tutorial to build dataset with `Qlib`.
Your PR to build new Quant dataset is highly welcomed.

View File

@@ -49,7 +49,7 @@ Users can use ``Data Handler`` to build formulaic alphas `MACD` in qlib:
.. code-block:: python
>> from qlib.contrib.estimator.handler import QLibDataHandler
>> from qlib.data.dataset.handler import QLibDataHandler
>> MACD_EXP = '(EMA($close, 12) - EMA($close, 26))/$close - EMA((EMA($close, 12) - EMA($close, 26))/$close, 9)/$close'
>> fields = [MACD_EXP] # MACD
>> names = ['MACD']
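The `MACD_EXP` expression above can be approximated outside of Qlib with plain pandas. This is only a sketch: it assumes that `EMA(x, n)` behaves like `Series.ewm(span=n, adjust=False).mean()`, which may not match Qlib's operator exactly, and the helper names `ema` and `macd_like` are hypothetical.

```python
import pandas as pd


def ema(series: pd.Series, span: int) -> pd.Series:
    """Exponential moving average (assumed stand-in for Qlib's EMA operator)."""
    return series.ewm(span=span, adjust=False).mean()


def macd_like(close: pd.Series) -> pd.Series:
    """Mirror the structure of MACD_EXP: DIF/close - EMA(DIF/close, 9)/close."""
    dif = (ema(close, 12) - ema(close, 26)) / close
    dea = ema(dif, 9) / close
    return dif - dea


# Toy closing prices, purely illustrative.
close = pd.Series([10.0, 10.2, 10.1, 10.4, 10.3, 10.6], name="close")
macd = macd_like(close)
```

The result has one value per input row, so it can be compared against the `MACD` column produced by the data handler on the same instrument.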

View File

@@ -156,12 +156,12 @@ Data Handler
Users can use ``Data Handler`` in an automatic workflow by ``Estimator``, refer to `Estimator: Workflow Management <estimator.html>`_ for more details.
Also, ``Data Handler`` can be used as an independent module, by which users can easily preprocess data(standardization, remove NaN, etc.) and build datasets. It is a subclass of ``qlib.contrib.estimator.handler.BaseDataHandler``, which provides some interfaces as follows.
Also, ``Data Handler`` can be used as an independent module, by which users can easily preprocess data(standardization, remove NaN, etc.) and build datasets. It is a subclass of ``qlib.data.dataset.handler.BaseDataHandler``, which provides some interfaces as follows.
Base Class & Interface
----------------------
Qlib provides a base class `qlib.contrib.estimator.BaseDataHandler <../reference/api.html#qlib.contrib.estimator.handler.BaseDataHandler>`_, which provides the following interfaces:
Qlib provides a base class `qlib.data.dataset.BaseDataHandler <../reference/api.html#qlib.data.dataset.handler.BaseDataHandler>`_, which provides the following interfaces:
- `setup_feature`
Implement the interface to load the data features.
@@ -182,7 +182,7 @@ Qlib also provides two functions to help users init the data handler, users can
Users can init the raw df, feature names, and label names of data handler in this function.
If the index of feature df and label df are not the same, users need to override this method to merge them (e.g. inner, left, right merge).
If users want to load features and labels by config, users can inherit ``qlib.contrib.estimator.handler.ConfigDataHandler``, ``Qlib`` also provides some preprocess method in this subclass.
If users want to load features and labels by config, users can inherit ``qlib.data.dataset.handler.ConfigDataHandler``, ``Qlib`` also provides some preprocess method in this subclass.
If users want to use qlib data, `QLibDataHandler` is recommended. Users can inherit their custom class from `QLibDataHandler`, which is also a subclass of `ConfigDataHandler`.
@@ -214,7 +214,7 @@ Qlib provides implemented data handler `Alpha158`. The following example shows h
.. code-block:: Python
from qlib.contrib.estimator.handler import Alpha158
from qlib.contrib.data.handler import Alpha158
from qlib.contrib.model.gbdt import LGBModel
DATA_HANDLER_CONFIG = {
@@ -251,7 +251,7 @@ Also, the above example has been given in ``examples.estimator.train_backtest_an
API
---------
To know more about ``Data Handler``, please refer to `Data Handler API <../reference/api.html#module-qlib.contrib.estimator.handler>`_.
To know more about ``Data Handler``, please refer to `Data Handler API <../reference/api.html#module-qlib.data.dataset.handler>`_.
Cache
==========

View File

@@ -266,7 +266,7 @@ Users can use a specified model by configuration with hyper-parameters.
Custom Models
~~~~~~~~~~~~~~~~~
Qlib supports custom models, but it must be a subclass of the `qlib.contrib.model.Model`, the config for a custom model may be as following.
Qlib supports custom models, which must be subclasses of `qlib.model.Model`. The config for a custom model may look as follows.
.. code-block:: YAML
@@ -284,7 +284,7 @@ To know more about ``Interday Model``, please refer to `Interday Model: Training
Data Section
-----------------
``Data Handler`` can be used to load raw data, prepare features and label columns, preprocess data (standardization, remove NaN, etc.), split training, validation, and test sets. It is a subclass of `qlib.contrib.estimator.handler.BaseDataHandler`.
``Data Handler`` can be used to load raw data, prepare features and label columns, preprocess data (standardization, remove NaN, etc.), split training, validation, and test sets. It is a subclass of `qlib.data.dataset.handler.BaseDataHandler`.
Users can use the specified data handler by config as follows.
@@ -315,10 +315,10 @@ Users can use the specified data handler by config as follows.
fend_time: 2018-12-11
- `class`
Data handler class, str type, which should be a subclass of `qlib.contrib.estimator.handler.BaseDataHandler`, and implements 5 important interfaces for loading features, loading raw data, preprocessing raw data, slicing train, validation, and test data. The default value is `ALPHA360`. If users want to write a data handler to retrieve the data in ``Qlib``, `QlibDataHandler` is suggested.
Data handler class, str type. It should be a subclass of `qlib.data.dataset.handler.BaseDataHandler` and implement the 5 important interfaces for loading features, loading raw data, preprocessing raw data, and slicing the train, validation, and test data. The default value is `ALPHA360`. If users want to write a data handler to retrieve the data in ``Qlib``, `QLibDataHandler` is suggested.
- `module_path`
The module path, str type, absolute url is also supported, indicates the path of the `class` implementation of the data processor class. The default value is `qlib.contrib.estimator.handler`.
The module path, str type; an absolute url is also supported. It indicates the path of the module that implements the data handler `class`. The default value is `qlib.contrib.data.handler`, the module that now contains `ALPHA360`.
- `args`
Parameters used for ``Data Handler`` initialization.
@@ -376,7 +376,7 @@ Qlib support custom data handler, but it must be a subclass of the ``qlib.contri
The class `SomeDataHandler` should be in the module `custom_data_handler`, and ``Qlib`` could parse the `module_path` to load the class.
If users want to load features and labels by config, they can inherit ``qlib.contrib.estimator.handler.ConfigDataHandler``, ``Qlib`` also has provided some preprocess methods in this subclass.
If users want to load features and labels by config, they can inherit ``qlib.data.dataset.handler.ConfigDataHandler``, ``Qlib`` also has provided some preprocess methods in this subclass.
If users want to use qlib data, `QLibDataHandler` is recommended, from which users can inherit the custom class. `QLibDataHandler` is also a subclass of `ConfigDataHandler`.
To know more about ``Data Handler``, please refer to `Data Framework&Usage <data.html>`_.

View File

@@ -13,7 +13,7 @@ Because the components in ``Qlib`` are designed in a loosely-coupled way, ``Inte
Base Class & Interface
======================
``Qlib`` provides a base class `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_ from which all models should inherit.
``Qlib`` provides a base class `qlib.model.base.Model <../reference/api.html#module-qlib.model.base>`_ from which all models should inherit.
The base class provides the following interfaces:
@@ -110,7 +110,7 @@ The base class provides the following interfaces:
The format of `w_test` is same as `w_train` in `fit` method.
- Return: float type, evaluation score
For other interfaces such as `save`, `load`, `finetune`, please refer to `Model API <../reference/api.html#module-qlib.contrib.model.base>`_.
For other interfaces such as `save`, `load`, `finetune`, please refer to `Model API <../reference/api.html#module-qlib.model.base>`_.
Example
==================
@@ -121,7 +121,7 @@ Example
- Run the following code to get the `prediction score` `pred_score`
.. code-block:: Python
from qlib.contrib.estimator.handler import Alpha158
from qlib.contrib.data.handler import Alpha158
from qlib.contrib.model.gbdt import LGBModel
DATA_HANDLER_CONFIG = {
@@ -175,4 +175,4 @@ Qlib supports custom models. If users are interested in customizing their own mo
API
===================
Please refer to `Model API <../reference/api.html#module-qlib.contrib.model.base>`_.
Please refer to `Model API <../reference/api.html#module-qlib.model.base>`_.

View File

@@ -63,12 +63,12 @@ Contrib
Data Handler
---------------
.. automodule:: qlib.contrib.estimator.handler
.. automodule:: qlib.data.dataset.handler
:members:
Model
--------------------
.. automodule:: qlib.contrib.model.base
.. automodule:: qlib.model.base
:members:
Strategy

View File

@@ -9,13 +9,13 @@ Introduction
Users can integrate their own custom models according to the following steps.
- Define a custom model class, which should be a subclass of the `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_.
- Define a custom model class, which should be a subclass of the `qlib.model.base.Model <../reference/api.html#module-qlib.model.base>`_.
- Write a configuration file that describes the path and parameters of the custom model.
- Test the custom model.
Custom Model Class
===========================
The Custom models need to inherit `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_ and override the methods in it.
The Custom models need to inherit `qlib.model.base.Model <../reference/api.html#module-qlib.model.base>`_ and override the methods in it.
- Override the `__init__` method
- ``Qlib`` passes the initialized parameters to the \_\_init\_\_ method.
@@ -63,7 +63,7 @@ The Custom models need to inherit `qlib.contrib.model.base.Model <../reference/a
- Override the `predict` method
- The parameters include the test features.
- Return the `prediction score`.
- Please refer to `Model API <../reference/api.html#module-qlib.contrib.model.base>`_ for the parameter types of the fit method.
- Please refer to `Model API <../reference/api.html#module-qlib.model.base>`_ for the parameter types of the fit method.
- Code Example: In the following example, users need to use dnn to predict the label(such as `preds`) of test data `x_test` and return it.
.. code-block:: Python
@@ -143,4 +143,4 @@ Also, ``Model`` can also be tested as a single module. An example has been given
Reference
=====================
To know more about ``Interday Model``, please refer to `Interday Model: Model Training & Prediction <../component/model.html>`_ and `Model API <../reference/api.html#module-qlib.contrib.model.base>`_.
To know more about ``Interday Model``, please refer to `Interday Model: Model Training & Prediction <../component/model.html>`_ and `Model API <../reference/api.html#module-qlib.model.base>`_.
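The steps described in this document (subclass the base model, then override `__init__`, `fit`, and `predict`) can be sketched without Qlib installed. Everything below is an assumption for illustration: `BaseModel` is a hypothetical stand-in for `qlib.model.base.Model`, and the method signatures are simplified rather than taken from the real API.

```python
import numpy as np


class BaseModel:
    """Hypothetical stand-in for the base model class; not Qlib's actual API."""

    def fit(self, x_train, y_train, x_valid, y_valid, **kwargs):
        raise NotImplementedError

    def predict(self, x_test, **kwargs):
        raise NotImplementedError


class MeanModel(BaseModel):
    """Toy custom model that always predicts the mean training label."""

    def __init__(self, shrinkage=1.0):
        # Hyper-parameters would normally come from the `args` section of the config.
        self.shrinkage = shrinkage
        self._mean = None

    def fit(self, x_train, y_train, x_valid, y_valid, **kwargs):
        self._mean = float(np.mean(y_train)) * self.shrinkage
        return self

    def predict(self, x_test, **kwargs):
        if self._mean is None:
            raise ValueError("The model must be fitted before calling predict.")
        # One prediction score per test row.
        return np.full(len(x_test), self._mean)


x = np.zeros((4, 2))
y = np.array([0.1, 0.2, 0.3, 0.4])
model = MeanModel().fit(x, y, x, y)
preds = model.predict(np.zeros((3, 2)))
```

A real custom model would inherit from the Qlib base class instead and follow the parameter types documented in the Model API.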

View File

@@ -5,7 +5,7 @@ experiment:
model:
class: LGBModel
module_path: qlib.contrib.model.gbdt
module_path: qlib.gbdt.model.gbdt
args:
loss: mse
colsample_bytree: 0.8879

View File

@@ -4,7 +4,7 @@ experiment:
mode: train
model:
module_path: qlib.contrib.model.pytorch_nn
module_path: qlib.model.pytorch_nn
class: DNNModelPytorch
args:
loss: mse

View File

@@ -8,7 +8,7 @@ import qlib
import pandas as pd
from qlib.config import REG_CN
from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.estimator.handler import Alpha158
from qlib.contrib.data.handler import Alpha158
from qlib.contrib.strategy.strategy import TopkDropoutStrategy
from qlib.contrib.evaluate import (
backtest as normal_backtest,

View File

@@ -0,0 +1,63 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

from ...data.dataset.handler import ConfigQLibDataHandler
from ...log import TimeInspector


class ALPHA360(ConfigQLibDataHandler):
    config_template = {
        "price": {"windows": range(60)},
        "volume": {"windows": range(60)},
    }


class QLibDataHandlerV1(ConfigQLibDataHandler):
    config_template = {
        "kbar": {},
        "price": {
            "windows": [0],
            "feature": ["OPEN", "HIGH", "LOW", "VWAP"],
        },
        "rolling": {},
    }

    def __init__(self, start_date, end_date, processors=None, **kwargs):
        if processors is None:
            processors = ["PanelProcessor"]  # V1 default processor
        super().__init__(start_date, end_date, processors, **kwargs)

    def setup_label(self):
        """
        load the labels df
        :return: df_labels
        """
        TimeInspector.set_time_mark()

        df_labels = super().setup_label()

        ## calculate new labels
        df_labels["LABEL1"] = df_labels["LABEL0"].groupby(level="datetime").apply(lambda x: (x - x.mean()) / x.std())

        df_labels = df_labels.drop(["LABEL0"], axis=1)

        TimeInspector.log_cost_time("Finished loading labels.")

        return df_labels


class Alpha158(QLibDataHandlerV1):
    config_template = {
        "kbar": {},
        "price": {
            "windows": [0],
            "feature": ["OPEN", "HIGH", "LOW", "CLOSE"],
        },
        "rolling": {},
    }

    def _init_kwargs(self, **kwargs):
        kwargs["labels"] = ["Ref($close, -2)/Ref($close, -1) - 1"]
        super(Alpha158, self)._init_kwargs(**kwargs)
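The label transformation in `QLibDataHandlerV1.setup_label` is a cross-sectional z-score: within each `datetime`, the labels are centered and scaled by that day's standard deviation. A minimal sketch on toy data follows. The instruments and values are made up, and `groupby(...).transform` is used here instead of `apply` as an assumption to keep the result index-aligned across pandas versions.

```python
import numpy as np
import pandas as pd

# Toy panel indexed by (datetime, instrument); values are illustrative only.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-02", "2020-01-03"]), ["A", "B", "C"]],
    names=["datetime", "instrument"],
)
labels = pd.DataFrame({"LABEL0": [0.01, 0.02, 0.03, -0.02, 0.00, 0.02]}, index=index)


def cross_sectional_zscore(series: pd.Series) -> pd.Series:
    """Standardize each value against the other instruments on the same date."""
    grouped = series.groupby(level="datetime")
    return (series - grouped.transform("mean")) / grouped.transform("std")


labels["LABEL1"] = cross_sectional_zscore(labels["LABEL0"])
labels = labels.drop(["LABEL0"], axis=1)
```

After the transformation every date has mean 0 and sample standard deviation 1, so the label expresses how an instrument ranks against the others on the same day rather than its raw return.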

View File

@@ -103,7 +103,7 @@ class DataConfig(object):
:param config: The config dict for data
:param CONFIG_MANAGER: The estimator config manager
"""
self.handler_module_path = config.get("module_path", "qlib.contrib.estimator.handler")
self.handler_module_path = config.get("module_path", "qlib.contrib.data.handler")
self.handler_class = config.get("class", "ALPHA360")
self.handler_parameters = config.get("args", dict())
self.handler_filter = config.get("filter", dict())
@@ -118,7 +118,7 @@ class ModelConfig(object):
:param CONFIG_MANAGER: The estimator config manager
"""
self.model_class = config.get("class", "Model")
self.model_module_path = config.get("module_path", "qlib.contrib.model")
self.model_module_path = config.get("module_path", "qlib.model")
self.save_dir = os.path.join(CONFIG_MANAGER.ex_config.tmp_run_dir, "model")
self.save_path = config.get("save_path", os.path.join(self.save_dir, "model.bin"))
self.parameters = config.get("args", dict())
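The `module_path` and `class` entries read above are typically turned into a Python class by importing the module and looking up the attribute. The sketch below shows that pattern with the standard library only. `load_class` is a hypothetical helper, not the function the estimator actually uses, and it does not cover the case of a `module_path` given as a file path or url.

```python
import importlib


def load_class(config: dict, default_module: str, default_class: str):
    """Resolve a class from a config dict, falling back to the given defaults."""
    module_path = config.get("module_path", default_module)
    class_name = config.get("class", default_class)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Standard-library modules are used so the example runs without Qlib installed.
explicit_cls = load_class(
    {"module_path": "collections", "class": "OrderedDict"}, "collections", "Counter"
)
default_cls = load_class({}, "collections", "Counter")
```

With this pattern, a default `module_path` that does not contain the default `class` fails at the `getattr` call, which is why the two defaults have to be changed together when code moves between packages.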

View File

@@ -9,7 +9,7 @@ import numpy as np
import lightgbm as lgb
from sklearn.metrics import roc_auc_score, mean_squared_error
from .base import Model
from ...model.base import Model
from ...utils import drop_nan_by_y_index
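The change from `.base` to `...model.base` follows from how relative imports are resolved against the importing module's package. The sketch below checks the resolution with `importlib.util.resolve_name`. The package names `qlib.contrib.model` and `qlib.contrib.data` are assumptions inferred from the absolute imports elsewhere in this commit (`qlib.contrib.model.gbdt`, `qlib.contrib.data.handler`).

```python
from importlib.util import resolve_name

# Three leading dots: one for the current package, plus two levels up.
resolved_model = resolve_name("...model.base", "qlib.contrib.model")
resolved_handler = resolve_name("...data.dataset.handler", "qlib.contrib.data")

print(resolved_model)    # qlib.model.base
print(resolved_handler)  # qlib.data.dataset.handler
```

Both results land in the core packages, which matches the intent of the commit: contrib modules import their base classes from core rather than from sibling contrib modules.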

View File

@@ -17,7 +17,7 @@ import torch
import torch.nn as nn
import torch.optim as optim
from .base import Model
from ...model.base import Model
class DNNModelPytorch(Model):

View File

@@ -513,73 +513,3 @@ class ConfigQLibDataHandler(QLibDataHandler):
        if "labels" not in kwargs:
            kwargs["labels"] = ["Ref($vwap, -2)/Ref($vwap, -1) - 1"]
        super()._init_kwargs(**kwargs)


class ALPHA360(ConfigQLibDataHandler):
    config_template = {
        "price": {"windows": range(60)},
        "volume": {"windows": range(60)},
    }


class QLibDataHandlerV1(ConfigQLibDataHandler):
    config_template = {
        "kbar": {},
        "price": {
            "windows": [0],
            "feature": ["OPEN", "HIGH", "LOW", "VWAP"],
        },
        "rolling": {},
    }

    def __init__(self, start_date, end_date, processors=None, **kwargs):
        if processors is None:
            processors = ["PanelProcessor"]  # V1 default processor
        super().__init__(start_date, end_date, processors, **kwargs)

    def setup_label(self):
        """
        load the labels df
        :return: df_labels
        """
        TimeInspector.set_time_mark()

        df_labels = super().setup_label()

        ## calculate new labels
        df_labels["LABEL1"] = df_labels["LABEL0"].groupby(level="datetime").apply(lambda x: (x - x.mean()) / x.std())

        df_labels = df_labels.drop(["LABEL0"], axis=1)

        TimeInspector.log_cost_time("Finished loading labels.")

        return df_labels


class Alpha158(QLibDataHandlerV1):
    config_template = {
        "kbar": {},
        "price": {
            "windows": [0],
            "feature": ["OPEN", "HIGH", "LOW", "CLOSE"],
        },
        "rolling": {},
    }

    def _init_kwargs(self, **kwargs):
        kwargs["labels"] = ["Ref($close, -2)/Ref($close, -1) - 1"]
        super(Alpha158, self)._init_kwargs(**kwargs)


# if __name__ == '__main__':
# import qlib
#
# qlib.init()
#
# handler = ALPHA80('2010-01-01', '2018-12-31')
# data = handler.get_split_data(
# pd.Timestamp('2010-01-01'), pd.Timestamp('2014-01-01'),
# pd.Timestamp('2015-01-01'), pd.Timestamp('2016-01-01'),
# pd.Timestamp('2017-01-01'), pd.Timestamp('2018-01-01'))
# print(data[0])
# data[0].to_pickle('alpha80.pkl')

View File

@@ -0,0 +1,249 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

import abc

import numpy as np
import pandas as pd

from ...log import TimeInspector

EPS = 1e-12


class Processor(abc.ABC):
    def __init__(self, feature_names, label_names, **kwargs):
        self.feature_names = feature_names
        self.label_names = label_names

    @abc.abstractmethod
    def __call__(self, df_train, df_valid, df_test):
        pass


class PanelProcessor(Processor):
    """Panel Preprocessor"""

    STD_NORM = "Std"
    MINMAX_NORM = "MinMax"

    def __init__(self, feature_names, label_names, **kwargs):
        super().__init__(feature_names, label_names)
        # Options.
        self.dropna_label = kwargs.get("dropna_label", True)
        self.dropna_feature = kwargs.get("dropna_feature", False)
        self.normalize_method = kwargs.get("normalize_method", None)
        self.replace_inf = kwargs.get("replace_inf_feature", False)

    def __call__(self, df_train, df_valid, df_test):
        """
        Preprocess the data.
        :param df_train: the training dataframe.
        :param df_valid: the validation dataframe.
        :param df_test: the test dataframe.
        :return: the processed (df_train, df_valid, df_test).
        """
        # Drop rows with null labels.
        if self.dropna_label:
            df_train, df_valid, df_test = self._process_drop_null_label(df_train, df_valid, df_test)

        # Drop rows with null features if needed.
        if self.dropna_feature:
            df_train, df_valid, df_test = self._process_drop_null_feature(df_train, df_valid, df_test)

        # Replace 'inf' with the mean of the corresponding dimension.
        if self.replace_inf:
            df_train, df_valid, df_test = self._process_replace_inf_feature(df_train, df_valid, df_test)

        # Normalize the data with the given method.
        if self.normalize_method is not None:
            df_train, df_valid, df_test = self._process_normalize_feature(df_train, df_valid, df_test)

        return df_train, df_valid, df_test

    def _process_drop_null_label(self, df_train, df_valid, df_test):
        """
        Drop null labels.
        """
        TimeInspector.set_time_mark()
        df_train = df_train.dropna(subset=self.label_names)
        df_valid = df_valid.dropna(subset=self.label_names)
        # The test labels are unknown; they cannot be seen during preprocessing, so df_test is left untouched.
        TimeInspector.log_cost_time("Finished dropping null labels.")
        return df_train, df_valid, df_test

    def _process_drop_null_feature(self, df_train, df_valid, df_test):
        """
        Drop data which contain null features if needed.
        """
        # TODO - `Pandas.dropna` is a low performance method.
        TimeInspector.set_time_mark()
        df_train = df_train.dropna(subset=self.feature_names)
        df_valid = df_valid.dropna(subset=self.feature_names)
        df_test = df_test.dropna(subset=self.feature_names)
        TimeInspector.log_cost_time("Finished dropping nan.")
        return df_train, df_valid, df_test

    def _process_replace_inf_feature(self, df_train, df_valid, df_test):
        """
        Replace the 'inf' in each feature with the mean of this dimension.
        """
        TimeInspector.set_time_mark()

        def replace_inf(data):
            def process_inf(df):
                # The mean is computed per datetime group over the finite values only.
                for col in df.columns:
                    df[col] = df[col].replace([np.inf, -np.inf], df[col][~np.isinf(df[col])].mean())
                return df

            data = data.groupby("datetime").apply(process_inf)
            data.sort_index(inplace=True)
            return data

        df_train = replace_inf(df_train)
        df_valid = replace_inf(df_valid)
        df_test = replace_inf(df_test)
        TimeInspector.log_cost_time("Finished replacing inf.")
        return df_train, df_valid, df_test

    def _process_normalize_feature(self, df_train, df_valid, df_test):
        """
        Normalize data if needed. Two methods are provided: min-max normalization and standard normalization.
        The statistics are always computed on the training set.
        """
        TimeInspector.set_time_mark()

        if self.normalize_method == self.MINMAX_NORM:
            min_train = np.nanmin(df_train[self.feature_names].values, axis=0)
            max_train = np.nanmax(df_train[self.feature_names].values, axis=0)
            ignore = min_train == max_train

            def normalize(x, min_train=min_train, max_train=max_train, ignore=ignore):
                if (~ignore).all():
                    return (x - min_train) / (max_train - min_train)
                # Scale column by column, skipping constant columns (max == min).
                for i in range(ignore.size):
                    if not ignore[i]:
                        x[:, i] = (x[:, i] - min_train[i]) / (max_train[i] - min_train[i])
                return x

        elif self.normalize_method == self.STD_NORM:
            mean_train = np.nanmean(df_train[self.feature_names].values, axis=0)
            std_train = np.nanstd(df_train[self.feature_names].values, axis=0)
            ignore = std_train == 0

            def normalize(x, mean_train=mean_train, std_train=std_train, ignore=ignore):
                if (~ignore).all():
                    return (x - mean_train) / std_train
                # Standardize column by column, skipping zero-variance columns.
                for i in range(ignore.size):
                    if not ignore[i]:
                        x[:, i] = (x[:, i] - mean_train[i]) / std_train[i]
                return x

        else:
            raise ValueError("Normalize method {} is not allowed".format(self.normalize_method))

        df_train.loc(axis=1)[self.feature_names] = normalize(df_train[self.feature_names].values)
        df_valid.loc(axis=1)[self.feature_names] = normalize(df_valid[self.feature_names].values)
        df_test.loc(axis=1)[self.feature_names] = normalize(df_test[self.feature_names].values)

        TimeInspector.log_cost_time("Finished normalizing data.")

        return df_train, df_valid, df_test


class ConfigSectionProcessor(Processor):
    def __init__(self, feature_names, label_names, **kwargs):
        super().__init__(feature_names, label_names)
        # Options
        self.fillna_feature = kwargs.get("fillna_feature", True)
        self.fillna_label = kwargs.get("fillna_label", True)
        self.clip_feature_outlier = kwargs.get("clip_feature_outlier", False)
        self.shrink_feature_outlier = kwargs.get("shrink_feature_outlier", True)
        self.clip_label_outlier = kwargs.get("clip_label_outlier", False)

    def __call__(self, *args):
        return [self._transform(x) for x in args]

    def _transform(self, df):
        def _label_norm(x):
            x = x - x.mean()  # copy
            x /= x.std()
            if self.clip_label_outlier:
                x.clip(-3, 3, inplace=True)
            if self.fillna_label:
                x.fillna(0, inplace=True)
            return x

        def _feature_norm(x):
            x = x - x.median()  # copy
            x /= x.abs().median() * 1.4826
            if self.clip_feature_outlier:
                x.clip(-3, 3, inplace=True)
            if self.shrink_feature_outlier:
                x.where(x <= 3, 3 + (x - 3).div(x.max() - 3) * 0.5, inplace=True)
                x.where(x >= -3, -3 - (x + 3).div(x.min() + 3) * 0.5, inplace=True)
            if self.fillna_feature:
                x.fillna(0, inplace=True)
            return x

        TimeInspector.set_time_mark()

        # Copy
        df_new = df.copy()

        # Label
        cols = df.columns[df.columns.str.contains("^LABEL")]
        df_new[cols] = df[cols].groupby(level="datetime").apply(_label_norm)

        # Features
        cols = df.columns[df.columns.str.contains("^KLEN|^KLOW|^KUP")]
        df_new[cols] = df[cols].apply(lambda x: x ** 0.25).groupby(level="datetime").apply(_feature_norm)

        cols = df.columns[df.columns.str.contains("^KLOW2|^KUP2")]
        df_new[cols] = df[cols].apply(lambda x: x ** 0.5).groupby(level="datetime").apply(_feature_norm)

        _cols = [
            "KMID",
            "KSFT",
            "OPEN",
            "HIGH",
            "LOW",
            "CLOSE",
            "VWAP",
            "ROC",
            "MA",
            "BETA",
            "RESI",
            "QTLU",
            "QTLD",
            "RSV",
            "SUMP",
            "SUMN",
            "SUMD",
            "VSUMP",
            "VSUMN",
            "VSUMD",
        ]
        pat = "|".join(["^" + x for x in _cols])
        cols = df.columns[df.columns.str.contains(pat) & (~df.columns.isin(["HIGH0", "LOW0"]))]
        df_new[cols] = df[cols].groupby(level="datetime").apply(_feature_norm)

        cols = df.columns[df.columns.str.contains("^STD|^VOLUME|^VMA|^VSTD")]
        df_new[cols] = df[cols].apply(np.log).groupby(level="datetime").apply(_feature_norm)

        cols = df.columns[df.columns.str.contains("^RSQR")]
        df_new[cols] = df[cols].fillna(0).groupby(level="datetime").apply(_feature_norm)

        cols = df.columns[df.columns.str.contains("^MAX|^HIGH0")]
        df_new[cols] = df[cols].apply(lambda x: (x - 1) ** 0.5).groupby(level="datetime").apply(_feature_norm)

        cols = df.columns[df.columns.str.contains("^MIN|^LOW0")]
        df_new[cols] = df[cols].apply(lambda x: (1 - x) ** 0.5).groupby(level="datetime").apply(_feature_norm)

        cols = df.columns[df.columns.str.contains("^CORR|^CORD")]
        df_new[cols] = df[cols].apply(np.exp).groupby(level="datetime").apply(_feature_norm)

        cols = df.columns[df.columns.str.contains("^WVMA")]
        df_new[cols] = df[cols].apply(np.log1p).groupby(level="datetime").apply(_feature_norm)

        TimeInspector.log_cost_time("Finished preprocessing data.")

        return df_new
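The `_feature_norm` helper above is a robust z-score: values are centered on the median and scaled by the median absolute deviation times 1.4826, which makes the scale comparable to a standard deviation for normally distributed data. The sketch below isolates that idea on a single series. It is a simplified illustration, not the processor itself: it uses the hard-clipping variant (as with `clip_feature_outlier`) rather than the shrinking variant, and the function name `robust_zscore` is hypothetical.

```python
import numpy as np
import pandas as pd


def robust_zscore(x: pd.Series, clip: float = 3.0) -> pd.Series:
    """Center on the median, scale by 1.4826 * MAD, clip outliers, fill NaN with 0."""
    centered = x - x.median()
    mad = centered.abs().median()  # median absolute deviation
    scaled = centered / (mad * 1.4826)
    return scaled.clip(-clip, clip).fillna(0.0)


# One extreme outlier and one missing value, purely illustrative.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0, np.nan])
z = robust_zscore(values)
```

Because the median and MAD barely react to the value 100.0, the remaining points keep a sensible scale and only the outlier is clipped to 3. A mean and standard deviation based z-score on the same data would compress the four ordinary points toward each other instead.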

View File

@@ -2,12 +2,12 @@
# Licensed under the MIT License.
import logging
import logging.handlers
import os
import re
import logging
from time import time
import logging.handlers
from logging import config as logging_config
from time import time
from .config import C

View File

@@ -13,7 +13,7 @@ import qlib
from qlib.config import REG_CN
from qlib.utils import drop_nan_by_y_index
from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.estimator.handler import Alpha158
from qlib.contrib.data.handler import Alpha158
from qlib.contrib.strategy.strategy import TopkDropoutStrategy
from qlib.contrib.evaluate import (
backtest as normal_backtest,