Split code into core and contrib for data and model

2026-07-06 04:20:57 +08:00 · 2020-10-12 02:12:27 +00:00
parent 77e2f25f7b
commit d4091a8711
20 changed files with 346 additions and 104 deletions
--- a/README.md
+++ b/README.md
@@ -195,8 +195,8 @@ Your PR of new Quant models is highly welcomed.

 # Quant Dataset Zoo
 Dataset plays a very important role in Quant. Here is a list of the datasets built on `Qlib`.
-- [Alpha360](./qlib/contrib/estimator/handler.py)
-- [Alpha158](./qlib/contrib/estimator/handler.py)
+- [Alpha360](./qlib/contrib/data/handler.py)
+- [Alpha158](./qlib/contrib/data/handler.py)

 [Here](https://qlib.readthedocs.io/en/latest/advanced/alpha.html) is a tutorial to build dataset with `Qlib`.
 Your PR to build new Quant dataset is highly welcomed.
--- a/docs/advanced/alpha.rst
+++ b/docs/advanced/alpha.rst
@@ -49,7 +49,7 @@ Users can use ``Data Handler`` to build formulaic alphas `MACD` in qlib:

 .. code-block:: python

-    >> from qlib.contrib.estimator.handler import QLibDataHandler
+    >> from qlib.data.dataset.handler import QLibDataHandler
    >> MACD_EXP = '(EMA($close, 12) - EMA($close, 26))/$close - EMA((EMA($close, 12) - EMA($close, 26))/$close, 9)/$close'
    >> fields = [MACD_EXP] # MACD
    >> names = ['MACD']
--- a/docs/component/data.rst
+++ b/docs/component/data.rst
@@ -156,12 +156,12 @@ Data Handler

 Users can use ``Data Handler`` in an automatic workflow by ``Estimator``, refer to `Estimator: Workflow Management <estimator.html>`_ for more details. 

-Also, ``Data Handler`` can be used as an independent module, by which users can easily preprocess data(standardization, remove NaN, etc.) and build datasets. It is a subclass of ``qlib.contrib.estimator.handler.BaseDataHandler``, which provides some interfaces as follows.
+Also, ``Data Handler`` can be used as an independent module, with which users can easily preprocess data (standardization, removing NaN, etc.) and build datasets. It is a subclass of ``qlib.data.dataset.handler.BaseDataHandler``, which provides the interfaces described below.

 Base Class & Interface
 ----------------------

-Qlib provides a base class `qlib.contrib.estimator.BaseDataHandler <../reference/api.html#qlib.contrib.estimator.handler.BaseDataHandler>`_, which provides the following interfaces:
+Qlib provides a base class `qlib.data.dataset.BaseDataHandler <../reference/api.html#qlib.data.dataset.handler.BaseDataHandler>`_, which provides the following interfaces:

 - `setup_feature`    
    Implement the interface to load the data features.
@@ -182,7 +182,7 @@ Qlib also provides two functions to help users init the data handler, users can
    Users can init the raw df, feature names, and label names of data handler in this function. 
    If the index of feature df and label df are not the same, users need to override this method to merge them (e.g. inner, left, right merge).

-If users want to load features and labels by config, users can inherit ``qlib.contrib.estimator.handler.ConfigDataHandler``, ``Qlib`` also provides some preprocess method in this subclass.
+If users want to load features and labels by config, they can inherit from ``qlib.data.dataset.handler.ConfigDataHandler``; ``Qlib`` also provides some preprocessing methods in this subclass.
 If users want to use qlib data, `QLibDataHandler` is recommended. Users can inherit their custom class from `QLibDataHandler`, which is also a subclass of `ConfigDataHandler`.


@@ -214,7 +214,7 @@ Qlib provides implemented data handler `Alpha158`. The following example shows h

 .. code-block:: Python

-    from qlib.contrib.estimator.handler import Alpha158
+    from qlib.contrib.data.handler import Alpha158
    from qlib.contrib.model.gbdt import LGBModel

    DATA_HANDLER_CONFIG = {
@@ -251,7 +251,7 @@ Also, the above example has been given in ``examples.estimator.train_backtest_an
 API
 ---------

-To know more about ``Data Handler``, please refer to `Data Handler API <../reference/api.html#module-qlib.contrib.estimator.handler>`_.
+To know more about ``Data Handler``, please refer to `Data Handler API <../reference/api.html#module-qlib.data.dataset.handler>`_.

 Cache
 ==========
--- a/docs/component/estimator.rst
+++ b/docs/component/estimator.rst
@@ -266,7 +266,7 @@ Users can use a specified model by configuration with hyper-parameters.
 Custom Models
 ~~~~~~~~~~~~~~~~~

-Qlib supports custom models, but it must be a subclass of the `qlib.contrib.model.Model`, the config for a custom model may be as following.
+Qlib supports custom models, but each one must be a subclass of `qlib.model.Model`. The config for a custom model may look as follows.

 .. code-block:: YAML

@@ -284,7 +284,7 @@ To know more about ``Interday Model``, please refer to `Interday Model: Training
 Data Section
 -----------------

-``Data Handler`` can be used to load raw data, prepare features and label columns, preprocess data (standardization, remove NaN, etc.), split training, validation, and test sets. It is a subclass of `qlib.contrib.estimator.handler.BaseDataHandler`.
+``Data Handler`` can be used to load raw data, prepare features and label columns, preprocess data (standardization, remove NaN, etc.), split training, validation, and test sets. It is a subclass of `qlib.data.dataset.handler.BaseDataHandler`.

 Users can use the specified data handler by config as follows.

@@ -315,10 +315,10 @@ Users can use the specified data handler by config as follows.
                  fend_time: 2018-12-11

 - `class`    
-    Data handler class, str type, which should be a subclass of `qlib.contrib.estimator.handler.BaseDataHandler`, and implements 5 important interfaces for loading features, loading raw data, preprocessing raw data, slicing train, validation, and test data. The default value is `ALPHA360`. If users want to write a data handler to retrieve the data in ``Qlib``, `QlibDataHandler` is suggested.
+    Data handler class, str type, which should be a subclass of `qlib.data.dataset.handler.BaseDataHandler` and implement its 5 important interfaces for loading features, loading raw data, preprocessing raw data, and slicing train, validation, and test data. The default value is `ALPHA360`. If users want to write a data handler to retrieve the data in ``Qlib``, `QLibDataHandler` is suggested.

 - `module_path`    
-   The module path, str type, absolute url is also supported, indicates the path of the `class` implementation of the data processor class. The default value is `qlib.contrib.estimator.handler`.
+   The module path (str type; an absolute url is also supported) of the module that implements the data handler `class`. The default value is `qlib.contrib.data.handler`.

 - `args`
    Parameters used for ``Data Handler`` initialization.
@@ -376,7 +376,7 @@ Qlib support custom data handler, but it must be a subclass of the ``qlib.contri

 The class `SomeDataHandler` should be in the module `custom_data_handler`, and ``Qlib`` could parse the `module_path` to load the class.

-If users want to load features and labels by config, they can inherit ``qlib.contrib.estimator.handler.ConfigDataHandler``, ``Qlib`` also has provided some preprocess methods in this subclass.
+If users want to load features and labels by config, they can inherit from ``qlib.data.dataset.handler.ConfigDataHandler``; ``Qlib`` also provides some preprocessing methods in this subclass.
 If users want to use qlib data, `QLibDataHandler` is recommended, from which users can inherit the custom class. `QLibDataHandler` is also a subclass of `ConfigDataHandler`.

 To know more about ``Data Handler``, please refer to `Data Framework&Usage <data.html>`_.
--- a/docs/component/model.rst
+++ b/docs/component/model.rst
@@ -13,7 +13,7 @@ Because the components in ``Qlib`` are designed in a loosely-coupled way, ``Inte
 Base Class & Interface
 ======================

-``Qlib`` provides a base class `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_ from which all models should inherit.
+``Qlib`` provides a base class `qlib.model.base.Model <../reference/api.html#module-qlib.model.base>`_ from which all models should inherit.

 The base class provides the following interfaces:

@@ -110,7 +110,7 @@ The base class provides the following interfaces:
            The format of `w_test` is same as `w_train` in `fit` method.
    - Return: float type, evaluation score

-For other interfaces such as `save`, `load`, `finetune`, please refer to `Model API <../reference/api.html#module-qlib.contrib.model.base>`_.
+For other interfaces such as `save`, `load`, `finetune`, please refer to `Model API <../reference/api.html#module-qlib.model.base>`_.

 Example
 ==================
@@ -121,7 +121,7 @@ Example
 - Run the following code to get the `prediction score` `pred_score`
    .. code-block:: Python

-        from qlib.contrib.estimator.handler import Alpha158
+        from qlib.contrib.data.handler import Alpha158
        from qlib.contrib.model.gbdt import LGBModel

        DATA_HANDLER_CONFIG = {
@@ -175,4 +175,4 @@ Qlib supports custom models. If users are interested in customizing their own mo

 API
 ===================
-Please refer to `Model API <../reference/api.html#module-qlib.contrib.model.base>`_.
+Please refer to `Model API <../reference/api.html#module-qlib.model.base>`_.
--- a/docs/reference/api.rst
+++ b/docs/reference/api.rst
@@ -63,12 +63,12 @@ Contrib

 Data Handler
 ---------------
-.. automodule:: qlib.contrib.estimator.handler
+.. automodule:: qlib.data.dataset.handler
    :members:

 Model
 --------------------
-.. automodule:: qlib.contrib.model.base
+.. automodule:: qlib.model.base
    :members:

 Strategy
--- a/docs/start/integration.rst
+++ b/docs/start/integration.rst
@@ -9,13 +9,13 @@ Introduction

 Users can integrate their own custom models according to the following steps.

-- Define a custom model class, which should be a subclass of the `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_.
+- Define a custom model class, which should be a subclass of the `qlib.model.base.Model <../reference/api.html#module-qlib.model.base>`_.
 - Write a configuration file that describes the path and parameters of the custom model.
 - Test the custom model.

 Custom Model Class
 ===========================
-The Custom models need to inherit `qlib.contrib.model.base.Model <../reference/api.html#module-qlib.contrib.model.base>`_ and override the methods in it.
+Custom models need to inherit from `qlib.model.base.Model <../reference/api.html#module-qlib.model.base>`_ and override its methods.

 - Override the `__init__` method
    - ``Qlib`` passes the initialized parameters to the \_\_init\_\_ method.
@@ -63,7 +63,7 @@ The Custom models need to inherit `qlib.contrib.model.base.Model <../reference/a
 - Override the `predict` method
    - The parameters include the test features.
    - Return the `prediction score`.
-    - Please refer to `Model API <../reference/api.html#module-qlib.contrib.model.base>`_ for the parameter types of the fit method.
+    - Please refer to `Model API <../reference/api.html#module-qlib.model.base>`_ for the parameter types of the predict method.
    - Code Example: In the following example, users need to use dnn to predict the label(such as `preds`) of test data `x_test` and return it.
    .. code-block:: Python

@@ -143,4 +143,4 @@ Also, ``Model`` can also be tested as a single module. An example has been given
 Reference
 =====================

-To know more about ``Interday Model``, please refer to `Interday Model: Model Training & Prediction <../component/model.html>`_ and `Model API <../reference/api.html#module-qlib.contrib.model.base>`_.
+To know more about ``Interday Model``, please refer to `Interday Model: Model Training & Prediction <../component/model.html>`_ and `Model API <../reference/api.html#module-qlib.model.base>`_.
--- a/examples/estimator/estimator_config.yaml
+++ b/examples/estimator/estimator_config.yaml
@@ -5,7 +5,7 @@ experiment:

 model:
  class: LGBModel
-  module_path: qlib.contrib.model.gbdt
+  module_path: qlib.contrib.model.gbdt
  args:
    loss: mse
    colsample_bytree: 0.8879
--- a/examples/estimator/estimator_config_dnn.yaml
+++ b/examples/estimator/estimator_config_dnn.yaml
@@ -4,7 +4,7 @@ experiment:
  mode: train

 model:
-    module_path: qlib.contrib.model.pytorch_nn
+    module_path: qlib.contrib.model.pytorch_nn
    class: DNNModelPytorch
    args:
        loss: mse
--- a/examples/train_and_backtest.py
+++ b/examples/train_and_backtest.py
@@ -8,7 +8,7 @@ import qlib
 import pandas as pd
 from qlib.config import REG_CN
 from qlib.contrib.model.gbdt import LGBModel
-from qlib.contrib.estimator.handler import Alpha158
+from qlib.contrib.data.handler import Alpha158
 from qlib.contrib.strategy.strategy import TopkDropoutStrategy
 from qlib.contrib.evaluate import (
    backtest as normal_backtest,
--- a/qlib/contrib/data/handler.py
+++ b/qlib/contrib/data/handler.py
@@ -0,0 +1,63 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+from ...data.dataset.handler import ConfigQLibDataHandler
+from ...log import TimeInspector
+
+
+class ALPHA360(ConfigQLibDataHandler):
+    config_template = {
+        "price": {"windows": range(60)},
+        "volume": {"windows": range(60)},
+    }
+
+
+class QLibDataHandlerV1(ConfigQLibDataHandler):
+    config_template = {
+        "kbar": {},
+        "price": {
+            "windows": [0],
+            "feature": ["OPEN", "HIGH", "LOW", "VWAP"],
+        },
+        "rolling": {},
+    }
+
+    def __init__(self, start_date, end_date, processors=None, **kwargs):
+        if processors is None:
+            processors = ["PanelProcessor"]  # V1 default processor
+        super().__init__(start_date, end_date, processors, **kwargs)
+
+    def setup_label(self):
+        """
+        load the labels df
+        :return:  df_labels
+        """
+        TimeInspector.set_time_mark()
+
+        df_labels = super().setup_label()
+
+        ## calculate new labels
+        df_labels["LABEL1"] = df_labels["LABEL0"].groupby(level="datetime").apply(lambda x: (x - x.mean()) / x.std())
+
+        df_labels = df_labels.drop(["LABEL0"], axis=1)
+
+        TimeInspector.log_cost_time("Finished loading labels.")
+
+        return df_labels
+
+
+class Alpha158(QLibDataHandlerV1):
+    config_template = {
+        "kbar": {},
+        "price": {
+            "windows": [0],
+            "feature": ["OPEN", "HIGH", "LOW", "CLOSE"],
+        },
+        "rolling": {},
+    }
+
+    def _init_kwargs(self, **kwargs):
+        kwargs["labels"] = ["Ref($close, -2)/Ref($close, -1) - 1"]
+        super(Alpha158, self)._init_kwargs(**kwargs)
+
+
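Review note on `QLibDataHandlerV1.setup_label` above: the new `LABEL1` column is a per-date (cross-sectional) z-score of `LABEL0`. A minimal standalone sketch of that transformation follows, assuming an `(instrument, datetime)` MultiIndex. The instrument codes and values are made up for illustration, and `cross_sectional_zscore` is a hypothetical helper, not part of Qlib. It uses `transform` rather than `apply` so the result keeps the input index.

```python
import numpy as np
import pandas as pd


def cross_sectional_zscore(labels: pd.Series) -> pd.Series:
    """Standardize labels within each datetime group (sample std, ddof=1)."""
    grouped = labels.groupby(level="datetime")
    return (labels - grouped.transform("mean")) / grouped.transform("std")


# Hypothetical toy data: three instruments observed on two dates.
index = pd.MultiIndex.from_product(
    [["SH600000", "SH600001", "SH600002"], pd.to_datetime(["2020-01-02", "2020-01-03"])],
    names=["instrument", "datetime"],
)
label0 = pd.Series([0.01, 0.10, 0.02, 0.20, 0.03, 0.30], index=index, name="LABEL0")
label1 = cross_sectional_zscore(label0).rename("LABEL1")
```

After this step every date has zero mean and unit sample standard deviation, regardless of how volatile the raw returns were on that date.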
--- a/qlib/contrib/estimator/config.py
+++ b/qlib/contrib/estimator/config.py
@@ -103,7 +103,7 @@ class DataConfig(object):
        :param config:         The config dict for data
        :param CONFIG_MANAGER: The estimator config manager
        """
-        self.handler_module_path = config.get("module_path", "qlib.contrib.estimator.handler")
+        self.handler_module_path = config.get("module_path", "qlib.contrib.data.handler")
        self.handler_class = config.get("class", "ALPHA360")
        self.handler_parameters = config.get("args", dict())
        self.handler_filter = config.get("filter", dict())
@@ -118,7 +118,7 @@ class ModelConfig(object):
        :param CONFIG_MANAGER: The estimator config manager
        """
        self.model_class = config.get("class", "Model")
-        self.model_module_path = config.get("module_path", "qlib.contrib.model")
+        self.model_module_path = config.get("module_path", "qlib.model")
        self.save_dir = os.path.join(CONFIG_MANAGER.ex_config.tmp_run_dir, "model")
        self.save_path = config.get("save_path", os.path.join(self.save_dir, "model.bin"))
        self.parameters = config.get("args", dict())
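Review note on the `module_path` / `class` defaults above: these two strings are only useful if something resolves them to a class at run time. The diff does not show how the estimator does this, so the following is a sketch under the assumption that it is a plain `importlib` lookup. `load_class` and `build_from_config` are hypothetical names, and the standard-library `collections` module stands in for a Qlib module so the example runs without Qlib installed.

```python
import importlib


def load_class(module_path: str, class_name: str):
    """Import `module_path` and return the attribute named `class_name`."""
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def build_from_config(config: dict, default_module: str, default_class: str):
    """Instantiate the class described by a config dict, falling back to defaults."""
    module_path = config.get("module_path", default_module)
    class_name = config.get("class", default_class)
    cls = load_class(module_path, class_name)
    return cls(**config.get("args", {}))


# Explicit config: both keys are given.
counter = build_from_config(
    {"module_path": "collections", "class": "Counter", "args": {"a": 2}},
    default_module="collections",
    default_class="OrderedDict",
)

# Empty config: the defaults decide which class is built.
fallback = build_from_config({}, default_module="collections", default_class="OrderedDict")
```

With a lookup of this kind, a default `module_path` must point at the module that actually defines the default `class`, which is why the two defaults have to move together when code is relocated.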
--- a/qlib/contrib/model/__init__.py
+++ b/qlib/contrib/model/__init__.py
--- a/qlib/contrib/model/gbdt.py
+++ b/qlib/contrib/model/gbdt.py
@@ -9,7 +9,7 @@ import numpy as np
 import lightgbm as lgb
 from sklearn.metrics import roc_auc_score, mean_squared_error

-from .base import Model
+from ...model.base import Model
 from ...utils import drop_nan_by_y_index


--- a/qlib/contrib/model/pytorch_nn.py
+++ b/qlib/contrib/model/pytorch_nn.py
@@ -17,7 +17,7 @@ import torch
 import torch.nn as nn
 import torch.optim as optim

-from .base import Model
+from ...model.base import Model


 class DNNModelPytorch(Model):
--- a/qlib/data/dataset/__init__.py
+++ b/qlib/data/dataset/__init__.py
--- a/qlib/data/dataset/handler.py
+++ b/qlib/data/dataset/handler.py
@@ -513,73 +513,3 @@ class ConfigQLibDataHandler(QLibDataHandler):
        if "labels" not in kwargs:
            kwargs["labels"] = ["Ref($vwap, -2)/Ref($vwap, -1) - 1"]
        super()._init_kwargs(**kwargs)
-
-
-class ALPHA360(ConfigQLibDataHandler):
-    config_template = {
-        "price": {"windows": range(60)},
-        "volume": {"windows": range(60)},
-    }
-
-
-class QLibDataHandlerV1(ConfigQLibDataHandler):
-    config_template = {
-        "kbar": {},
-        "price": {
-            "windows": [0],
-            "feature": ["OPEN", "HIGH", "LOW", "VWAP"],
-        },
-        "rolling": {},
-    }
-
-    def __init__(self, start_date, end_date, processors=None, **kwargs):
-        if processors is None:
-            processors = ["PanelProcessor"]  # V1 default processor
-        super().__init__(start_date, end_date, processors, **kwargs)
-
-    def setup_label(self):
-        """
-        load the labels df
-        :return:  df_labels
-        """
-        TimeInspector.set_time_mark()
-
-        df_labels = super().setup_label()
-
-        ## calculate new labels
-        df_labels["LABEL1"] = df_labels["LABEL0"].groupby(level="datetime").apply(lambda x: (x - x.mean()) / x.std())
-
-        df_labels = df_labels.drop(["LABEL0"], axis=1)
-
-        TimeInspector.log_cost_time("Finished loading labels.")
-
-        return df_labels
-
-
-class Alpha158(QLibDataHandlerV1):
-    config_template = {
-        "kbar": {},
-        "price": {
-            "windows": [0],
-            "feature": ["OPEN", "HIGH", "LOW", "CLOSE"],
-        },
-        "rolling": {},
-    }
-
-    def _init_kwargs(self, **kwargs):
-        kwargs["labels"] = ["Ref($close, -2)/Ref($close, -1) - 1"]
-        super(Alpha158, self)._init_kwargs(**kwargs)
-
-
-# if __name__ == '__main__':
-#     import qlib
-#
-#     qlib.init()
-#
-#     handler = ALPHA80('2010-01-01', '2018-12-31')
-#     data = handler.get_split_data(
-#         pd.Timestamp('2010-01-01'), pd.Timestamp('2014-01-01'),
-#         pd.Timestamp('2015-01-01'), pd.Timestamp('2016-01-01'),
-#         pd.Timestamp('2017-01-01'), pd.Timestamp('2018-01-01'))
-#     print(data[0])
-#     data[0].to_pickle('alpha80.pkl')
--- a/qlib/data/dataset/processor.py
+++ b/qlib/data/dataset/processor.py
@@ -0,0 +1,249 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+import abc
+import numpy as np
+import pandas as pd
+
+from ...log import TimeInspector
+
+EPS = 1e-12
+
+
+class Processor(abc.ABC):
+    def __init__(self, feature_names, label_names, **kwargs):
+        self.feature_names = feature_names
+        self.label_names = label_names
+
+    @abc.abstractmethod
+    def __call__(self, df_train, df_valid, df_test):
+        pass
+
+
+class PanelProcessor(Processor):
+    """Panel Preprocessor"""
+
+    STD_NORM = "Std"
+    MINMAX_NORM = "MinMax"
+
+    def __init__(self, feature_names, label_names, **kwargs):
+        super().__init__(feature_names, label_names)
+        # Options.
+        self.dropna_label = kwargs.get("dropna_label", True)
+        self.dropna_feature = kwargs.get("dropna_feature", False)
+        self.normalize_method = kwargs.get("normalize_method", None)
+        self.replace_inf = kwargs.get("replace_inf_feature", False)
+
+    def __call__(self, df_train, df_valid, df_test):
+        """
+        Preprocess the data
+        :param df_train, df_valid, df_test:  the train, validation, and test dataframes to preprocess.
+        """
+        # Drop null labels.
+        if self.dropna_label:
+            df_train, df_valid, df_test = self._process_drop_null_label(df_train, df_valid, df_test)
+
+        # Drop rows with null features if needed.
+        if self.dropna_feature:
+            df_train, df_valid, df_test = self._process_drop_null_feature(df_train, df_valid, df_test)
+
+        # Replace 'inf' with the mean of the corresponding dimension.
+        if self.replace_inf:
+            df_train, df_valid, df_test = self._process_replace_inf_feature(df_train, df_valid, df_test)
+
+        # normalize data in given method.
+        if self.normalize_method is not None:
+            df_train, df_valid, df_test = self._process_normalize_feature(df_train, df_valid, df_test)
+
+        return df_train, df_valid, df_test
+
+    def _process_drop_null_label(self, df_train, df_valid, df_test):
+        """
+        Drop null labels.
+        """
+        TimeInspector.set_time_mark()
+        df_train = df_train.dropna(subset=self.label_names)
+        df_valid = df_valid.dropna(subset=self.label_names)
+        # The test data's labels are unknown; they must not be seen when preprocessing.
+        TimeInspector.log_cost_time("Finished dropping null labels.")
+
+        return df_train, df_valid, df_test
+
+    def _process_drop_null_feature(self, df_train, df_valid, df_test):
+        """
+        Drop data which contain null features if needed.
+        """
+        # TODO - `Pandas.dropna` is a low performance method.
+        TimeInspector.set_time_mark()
+        df_train = df_train.dropna(subset=self.feature_names)
+        df_valid = df_valid.dropna(subset=self.feature_names)
+        df_test = df_test.dropna(subset=self.feature_names)
+        TimeInspector.log_cost_time("Finished dropping nan.")
+
+        return df_train, df_valid, df_test
+
+    def _process_replace_inf_feature(self, df_train, df_valid, df_test):
+        """
+        replace the 'inf' in feature with the mean of this dimension.
+        """
+        TimeInspector.set_time_mark()
+
+        def replace_inf(data):
+            def process_inf(df):
+                for col in df.columns:
+                    df[col] = df[col].replace([np.inf, -np.inf], df[col][~np.isinf(df[col])].mean())
+                return df
+
+            data = data.groupby("datetime").apply(process_inf)
+            data.sort_index(inplace=True)
+            return data
+
+        df_train = replace_inf(df_train)
+        df_valid = replace_inf(df_valid)
+        df_test = replace_inf(df_test)
+        TimeInspector.log_cost_time("Finished replace inf.")
+
+        return df_train, df_valid, df_test
+
+    def _process_normalize_feature(self, df_train, df_valid, df_test):
+        """
+        Normalize data if needed. Two methods are provided: min-max normalization and standard normalization.
+        """
+        TimeInspector.set_time_mark()
+
+        if self.normalize_method == self.MINMAX_NORM:
+            min_train = np.nanmin(df_train[self.feature_names].values, axis=0)
+            max_train = np.nanmax(df_train[self.feature_names].values, axis=0)
+            ignore = min_train == max_train
+
+            def normalize(x, min_train=min_train, max_train=max_train, ignore=ignore):
+                if (~ignore).all():
+                    return (x - min_train) / (max_train - min_train)
+                for i in range(ignore.size):
+                    if not ignore[i]:
+                        x[:, i] = (x[:, i] - min_train[i]) / (max_train[i] - min_train[i])
+                return x
+
+        elif self.normalize_method == self.STD_NORM:
+            mean_train = np.nanmean(df_train[self.feature_names].values, axis=0)
+            std_train = np.nanstd(df_train[self.feature_names].values, axis=0)
+            ignore = std_train == 0
+
+            def normalize(x, mean_train=mean_train, std_train=std_train, ignore=ignore):
+                if (~ignore).all():
+                    return (x - mean_train) / std_train
+                for i in range(ignore.size):
+                    if not ignore[i]:
+                        x[:, i] = (x[:, i] - mean_train[i]) / std_train[i]
+                return x
+
+        else:
+            raise ValueError("Normalize method {} is not allowed".format(self.normalize_method))
+
+        df_train.loc(axis=1)[self.feature_names] = normalize(df_train[self.feature_names].values)
+        df_valid.loc(axis=1)[self.feature_names] = normalize(df_valid[self.feature_names].values)
+        df_test.loc(axis=1)[self.feature_names] = normalize(df_test[self.feature_names].values)
+
+        TimeInspector.log_cost_time("Finished normalizing data.")
+
+        return df_train, df_valid, df_test
+
+
+class ConfigSectionProcessor(Processor):
+    def __init__(self, feature_names, label_names, **kwargs):
+        super().__init__(feature_names, label_names)
+        # Options
+        self.fillna_feature = kwargs.get("fillna_feature", True)
+        self.fillna_label = kwargs.get("fillna_label", True)
+        self.clip_feature_outlier = kwargs.get("clip_feature_outlier", False)
+        self.shrink_feature_outlier = kwargs.get("shrink_feature_outlier", True)
+        self.clip_label_outlier = kwargs.get("clip_label_outlier", False)
+
+    def __call__(self, *args):
+        return [self._transform(x) for x in args]
+
+    def _transform(self, df):
+        def _label_norm(x):
+            x = x - x.mean()  # copy
+            x /= x.std()
+            if self.clip_label_outlier:
+                x.clip(-3, 3, inplace=True)
+            if self.fillna_label:
+                x.fillna(0, inplace=True)
+            return x
+
+        def _feature_norm(x):
+            x = x - x.median()  # copy
+            x /= x.abs().median() * 1.4826
+            if self.clip_feature_outlier:
+                x.clip(-3, 3, inplace=True)
+            if self.shrink_feature_outlier:
+                x.where(x <= 3, 3 + (x - 3).div(x.max() - 3) * 0.5, inplace=True)
+                x.where(x >= -3, -3 - (x + 3).div(x.min() + 3) * 0.5, inplace=True)
+            if self.fillna_feature:
+                x.fillna(0, inplace=True)
+            return x
+
+        TimeInspector.set_time_mark()
+
+        # Copy
+        df_new = df.copy()
+
+        # Label
+        cols = df.columns[df.columns.str.contains("^LABEL")]
+        df_new[cols] = df[cols].groupby(level="datetime").apply(_label_norm)
+
+        # Features
+        cols = df.columns[df.columns.str.contains("^KLEN|^KLOW|^KUP")]
+        df_new[cols] = df[cols].apply(lambda x: x ** 0.25).groupby(level="datetime").apply(_feature_norm)
+
+        cols = df.columns[df.columns.str.contains("^KLOW2|^KUP2")]
+        df_new[cols] = df[cols].apply(lambda x: x ** 0.5).groupby(level="datetime").apply(_feature_norm)
+
+        _cols = [
+            "KMID",
+            "KSFT",
+            "OPEN",
+            "HIGH",
+            "LOW",
+            "CLOSE",
+            "VWAP",
+            "ROC",
+            "MA",
+            "BETA",
+            "RESI",
+            "QTLU",
+            "QTLD",
+            "RSV",
+            "SUMP",
+            "SUMN",
+            "SUMD",
+            "VSUMP",
+            "VSUMN",
+            "VSUMD",
+        ]
+        pat = "|".join(["^" + x for x in _cols])
+        cols = df.columns[df.columns.str.contains(pat) & (~df.columns.isin(["HIGH0", "LOW0"]))]
+        df_new[cols] = df[cols].groupby(level="datetime").apply(_feature_norm)
+
+        cols = df.columns[df.columns.str.contains("^STD|^VOLUME|^VMA|^VSTD")]
+        df_new[cols] = df[cols].apply(np.log).groupby(level="datetime").apply(_feature_norm)
+
+        cols = df.columns[df.columns.str.contains("^RSQR")]
+        df_new[cols] = df[cols].fillna(0).groupby(level="datetime").apply(_feature_norm)
+
+        cols = df.columns[df.columns.str.contains("^MAX|^HIGH0")]
+        df_new[cols] = df[cols].apply(lambda x: (x - 1) ** 0.5).groupby(level="datetime").apply(_feature_norm)
+
+        cols = df.columns[df.columns.str.contains("^MIN|^LOW0")]
+        df_new[cols] = df[cols].apply(lambda x: (1 - x) ** 0.5).groupby(level="datetime").apply(_feature_norm)
+
+        cols = df.columns[df.columns.str.contains("^CORR|^CORD")]
+        df_new[cols] = df[cols].apply(np.exp).groupby(level="datetime").apply(_feature_norm)
+
+        cols = df.columns[df.columns.str.contains("^WVMA")]
+        df_new[cols] = df[cols].apply(np.log1p).groupby(level="datetime").apply(_feature_norm)
+
+        TimeInspector.log_cost_time("Finished preprocessing data.")
+
+        return df_new
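Review note on the two processors above: `PanelProcessor` fits its statistics on the training split only and reuses them for validation and test, while `ConfigSectionProcessor._feature_norm` centers by the median and scales by 1.4826 times the median absolute deviation. The following is a rough NumPy-only sketch of both ideas. `fit_std_normalizer` and `robust_zscore` are hypothetical helper names, the arrays are toy data, and the sketch leaves out the per-date grouping and the outlier clipping and shrinking that the real code applies.

```python
import numpy as np


def fit_std_normalizer(train: np.ndarray):
    """Return a function that standardizes columns using training statistics only."""
    mean = np.nanmean(train, axis=0)
    std = np.nanstd(train, axis=0)
    constant = std == 0  # zero-variance columns are passed through unchanged

    def normalize(x: np.ndarray) -> np.ndarray:
        out = np.array(x, dtype=float)  # copy so the caller's array is not mutated
        out[:, ~constant] = (out[:, ~constant] - mean[~constant]) / std[~constant]
        return out

    return normalize


def robust_zscore(x: np.ndarray) -> np.ndarray:
    """Center by the median and scale by 1.4826 * MAD."""
    centered = x - np.nanmedian(x)
    mad = np.nanmedian(np.abs(centered))
    return centered / (mad * 1.4826)


train = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])  # second column is constant
test = np.array([[4.0, 7.0]])

normalize = fit_std_normalizer(train)
train_norm = normalize(train)
test_norm = normalize(test)  # uses the training mean and std, not its own

z = robust_zscore(np.array([1.0, 2.0, 3.0, 4.0, 100.0]))
```

The factor 1.4826 makes the median absolute deviation a consistent estimate of the standard deviation for normally distributed data, so the robust score is on roughly the same scale as an ordinary z-score while being far less sensitive to an outlier such as the 100.0 above.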
--- a/qlib/log.py
+++ b/qlib/log.py
@@ -2,12 +2,12 @@
 # Licensed under the MIT License.


+import logging
+import logging.handlers
 import os
 import re
-import logging
-from time import time
-import logging.handlers
 from logging import config as logging_config
+from time import time

 from .config import C

--- a/tests/test_all_pipeline.py
+++ b/tests/test_all_pipeline.py
@@ -13,7 +13,7 @@ import qlib
 from qlib.config import REG_CN
 from qlib.utils import drop_nan_by_y_index
 from qlib.contrib.model.gbdt import LGBModel
-from qlib.contrib.estimator.handler import Alpha158
+from qlib.contrib.data.handler import Alpha158
 from qlib.contrib.strategy.strategy import TopkDropoutStrategy
 from qlib.contrib.evaluate import (
    backtest as normal_backtest,