merge all commit

Optimize prompt for entire learn loop (#1589)
* Adjust prompt and fix cases
* adjust summarizeTask & learn prompts;
* fix typos & drop duplicate task method;
* adjust learn prompts;
2026-06-29 00:51:19 +08:00 · 2023-07-13 16:29:44 +08:00 · 2023-07-11 18:13:52 +08:00 · 2023-07-06 11:39:36 +08:00 · 2023-07-04 20:28:08 +08:00 · 2023-07-04 16:51:51 +08:00
79 changed files with 4214 additions and 340 deletions
--- a/.github/workflows/test_qlib_from_pip.yml
+++ b/.github/workflows/test_qlib_from_pip.yml
@@ -13,7 +13,7 @@ jobs:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
-        os: [windows-latest, ubuntu-18.04, ubuntu-20.04, macos-11, macos-latest]
+        os: [windows-latest, ubuntu-20.04, ubuntu-22.04, macos-11, macos-latest]
        # not supporting 3.6 due to annotations is not supported https://stackoverflow.com/a/52890129
        python-version: [3.7, 3.8]

--- a/.github/workflows/test_qlib_from_source.yml
+++ b/.github/workflows/test_qlib_from_source.yml
@@ -14,7 +14,7 @@ jobs:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
-        os: [windows-latest, ubuntu-18.04, ubuntu-20.04, macos-11, macos-latest]
+        os: [windows-latest, ubuntu-20.04, ubuntu-22.04, macos-11, macos-latest]
        # not supporting 3.6 due to annotations is not supported https://stackoverflow.com/a/52890129
        python-version: [3.7, 3.8]

@@ -28,8 +28,10 @@ jobs:
        python-version: ${{ matrix.python-version }}

    - name: Update pip to the latest version
+      # pip released version 23.1 on Apr. 15, 2023 and CI failed to run; please refer to #1495 for detailed logs.
+      # The pip version has been temporarily fixed to 23.0.1
      run: |
-        python -m pip install --upgrade pip
+        python -m pip install pip==23.0.1

    - name: Installing pytorch for macos
      if: ${{ matrix.os == 'macos-11' || matrix.os == 'macos-latest' }}
@@ -37,15 +39,13 @@ jobs:
        python -m pip install torch torchvision torchaudio

    - name: Installing pytorch for ubuntu
-      if: ${{ matrix.os == 'ubuntu-18.04' || matrix.os == 'ubuntu-20.04' }}
+      if: ${{ matrix.os == 'ubuntu-20.04' || matrix.os == 'ubuntu-22.04' }}
      run: |
-        python -m pip install --upgrade pip
        python -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu

    - name: Installing pytorch for windows
      if: ${{ matrix.os == 'windows-latest' }}
      run: |
-        python -m pip install --upgrade pip
        python -m pip install torch torchvision torchaudio

    - name: Set up Python tools
--- a/.github/workflows/test_qlib_from_source_slow.yml
+++ b/.github/workflows/test_qlib_from_source_slow.yml
@@ -14,7 +14,7 @@ jobs:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
-        os: [windows-latest, ubuntu-18.04, ubuntu-20.04, macos-11, macos-latest]
+        os: [windows-latest, ubuntu-20.04, ubuntu-22.04, macos-11, macos-latest]
        # not supporting 3.6 due to annotations is not supported https://stackoverflow.com/a/52890129
        python-version: [3.7, 3.8]

@@ -28,9 +28,10 @@ jobs:
        python-version: ${{ matrix.python-version }}

    - name: Set up Python tools
+      # pip released version 23.1 on Apr. 15, 2023 and CI failed to run; please refer to #1495 for detailed logs.
+      # The pip version has been temporarily fixed to 23.0.1
      run: |
-        python -m pip install --upgrade pip
-        # python -m pip is necessary to upgrade pip.
+        python -m pip install pip==23.0.1
        pip install --upgrade cython numpy
        pip install -e .[dev]

--- a/.gitignore
+++ b/.gitignore
@@ -22,6 +22,7 @@ dist/
 qlib/VERSION.txt
 qlib/data/_libs/expanding.cpp
 qlib/data/_libs/rolling.cpp
+qlib/finco/prompt_cache.json
 examples/estimator/estimator_example/
 examples/rl/data/
 examples/rl/checkpoints/
--- a/README.md
+++ b/README.md
@@ -42,13 +42,11 @@ Features released before 2021 are not listed here.
  <img src="http://fintech.msra.cn/images_v070/logo/1.png" />
 </p>

+Qlib is an open-source, AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning.

-Qlib is an AI-oriented quantitative investment platform, which aims to realize the potential, empower the research, and create the value of AI technologies in quantitative investment.
+An increasing number of SOTA Quant research works/papers in diverse paradigms are being released in Qlib to collaboratively solve key challenges in quantitative investment. For example, 1) using supervised learning to mine the market's complex non-linear patterns from rich and heterogeneous financial data, 2) modeling the dynamic nature of the financial market using adaptive concept drift technology, and 3) using reinforcement learning to model continuous investment decisions and assist investors in optimizing their trading strategies.

 It contains the full ML pipeline of data processing, model training, back-testing; and covers the entire chain of quantitative investment: alpha seeking, risk modeling, portfolio optimization, and order execution. 
-
-With Qlib, users can easily try ideas to create better Quant investment strategies.
-
 For more details, please refer to our paper ["Qlib: An AI-oriented Quantitative Investment Platform"](https://arxiv.org/abs/2009.11189).


--- a/examples/benchmarks/MLP/workflow_config_mlp_Alpha158.yaml
+++ b/examples/benchmarks/MLP/workflow_config_mlp_Alpha158.yaml
@@ -64,8 +64,6 @@ task:
        kwargs:
            loss: mse
            lr: 0.002
-            lr_decay: 0.96
-            lr_decay_steps: 100
            optimizer: adam
            max_steps: 8000
            batch_size: 8192
--- a/examples/benchmarks/MLP/workflow_config_mlp_Alpha158_csi500.yaml
+++ b/examples/benchmarks/MLP/workflow_config_mlp_Alpha158_csi500.yaml
@@ -64,8 +64,6 @@ task:
        kwargs:
            loss: mse
            lr: 0.002
-            lr_decay: 0.96
-            lr_decay_steps: 100
            optimizer: adam
            max_steps: 8000
            batch_size: 8192
--- a/examples/benchmarks/MLP/workflow_config_mlp_Alpha360.yaml
+++ b/examples/benchmarks/MLP/workflow_config_mlp_Alpha360.yaml
@@ -52,8 +52,6 @@ task:
        kwargs:
            loss: mse
            lr: 0.002
-            lr_decay: 0.96
-            lr_decay_steps: 100
            optimizer: adam
            max_steps: 8000
            batch_size: 4096
--- a/examples/benchmarks/MLP/workflow_config_mlp_Alpha360_csi500.yaml
+++ b/examples/benchmarks/MLP/workflow_config_mlp_Alpha360_csi500.yaml
@@ -52,8 +52,6 @@ task:
        kwargs:
            loss: mse
            lr: 0.002
-            lr_decay: 0.96
-            lr_decay_steps: 100
            optimizer: adam
            max_steps: 8000
            batch_size: 4096
--- a/examples/benchmarks_dynamic/DDG-DA/vis_data.py
+++ b/examples/benchmarks_dynamic/DDG-DA/vis_data.py
@@ -0,0 +1,107 @@
+import pickle
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+sns.set(color_codes=True)
+plt.rcParams["font.sans-serif"] = "SimHei"
+plt.rcParams["axes.unicode_minus"] = False
+from tqdm.auto import tqdm
+
+# tqdm.pandas()  # for progress_apply
+# %matplotlib inline
+# %load_ext autoreload
+
+
+# # Meta Input
+
+# +
+with open("./internal_data_s20.pkl", "rb") as f:
+    data = pickle.load(f)
+
+data.data_ic_df.columns.names = ["start_date", "end_date"]
+
+data_sim = data.data_ic_df.droplevel(axis=1, level="end_date")
+
+data_sim.index.name = "test datetime"
+# -
+
+plt.figure(figsize=(40, 20))
+sns.heatmap(data_sim)
+
+plt.figure(figsize=(40, 20))
+sns.heatmap(data_sim.rolling(20).mean())
+
+# # Meta Model
+
+from qlib import auto_init
+
+auto_init()
+from qlib.workflow import R
+
+exp = R.get_exp(experiment_name="DDG-DA")
+meta_rec = exp.list_recorders(rtype="list", max_results=1)[0]
+meta_m = meta_rec.load_object("model")
+
+pd.DataFrame(meta_m.tn.twm.linear.weight.detach().numpy()).T[0].plot()
+
+pd.DataFrame(meta_m.tn.twm.linear.weight.detach().numpy()).T[0].rolling(5).mean().plot()
+
+# # Meta Output
+
+# +
+with open("./tasks_s20.pkl", "rb") as f:
+    tasks = pickle.load(f)
+
+task_df = {}
+for t in tasks:
+    test_seg = t["dataset"]["kwargs"]["segments"]["test"]
+    if None not in test_seg:
+        # The last rolling is skipped.
+        task_df[test_seg] = t["reweighter"].time_weight
+task_df = pd.concat(task_df)
+
+task_df.index.names = ["OS_start", "OS_end", "IS_start", "IS_end"]
+task_df = task_df.droplevel(["OS_end", "IS_end"])
+task_df = task_df.unstack("OS_start")
+# -
+
+plt.figure(figsize=(40, 20))
+sns.heatmap(task_df.T)
+
+plt.figure(figsize=(40, 20))
+sns.heatmap(task_df.rolling(10).mean().T)
+
+# # Sub Models
+#
+# NOTE:
+# - this section assumes that the model is a linear model!
+# - other models do not support this analysis
+
+exp = R.get_exp(experiment_name="rolling_ds")
+
+
+def show_linear_weight(exp):
+    coef_df = {}
+    for r in exp.list_recorders("list"):
+        t = r.load_object("task")
+        if None in t["dataset"]["kwargs"]["segments"]["test"]:
+            continue
+        m = r.load_object("params.pkl")
+        coef_df[t["dataset"]["kwargs"]["segments"]["test"]] = pd.Series(m.coef_)
+
+    coef_df = pd.concat(coef_df)
+
+    coef_df.index.names = ["test_start", "test_end", "coef_idx"]
+
+    coef_df = coef_df.droplevel("test_end").unstack("coef_idx").T
+
+    plt.figure(figsize=(40, 20))
+    sns.heatmap(coef_df)
+    plt.show()
+
+
+show_linear_weight(R.get_exp(experiment_name="rolling_ds"))
+
+show_linear_weight(R.get_exp(experiment_name="rolling_models"))
--- a/examples/benchmarks_dynamic/DDG-DA/workflow.py
+++ b/examples/benchmarks_dynamic/DDG-DA/workflow.py
@@ -10,8 +10,10 @@ import pandas as pd
 import fire
 import sys
 import pickle
+from typing import Optional
 from qlib import auto_init
 from qlib.model.trainer import TrainerR
+from qlib.typehint import Literal
 from qlib.utils import init_instance_by_config
 from qlib.workflow import R
 from qlib.tests.data import GetData
@@ -30,7 +32,33 @@ class DDGDA:
    - `rm -r mlruns`
    """

-    def __init__(self, sim_task_model="linear", forecast_model="linear"):
+    def __init__(
+        self,
+        sim_task_model: Literal["linear", "gbdt"] = "linear",
+        forecast_model: Literal["linear", "gbdt"] = "linear",
+        h_path: Optional[str] = None,
+        test_end: Optional[str] = None,
+        train_start: Optional[str] = None,
+        meta_1st_train_end: Optional[str] = None,
+        task_ext_conf: Optional[dict] = None,
+        alpha: float = 0.0,
+        proxy_hd: str = "handler_proxy.pkl",
+    ):
+        """
+
+        Parameters
+        ----------
+
+        train_start: Optional[str]
+            the start datetime for data. It is used as the training start time (for both tasks & meta learning)
+        test_end: Optional[str]
+            the end datetime for data. It is used in test end time
+        meta_1st_train_end: Optional[str]
+            the datetime of training end of the first meta_task
+        alpha: float
+            The L2 regularization strength for ridge.
+            The `alpha` is only passed to MetaModelDS (it is currently not passed to sim_task_model).
+        """
        self.step = 20
        # NOTE:
        # the horizon must match the meaning in the base task template
@@ -38,10 +66,19 @@ class DDGDA:
        self.meta_exp_name = "DDG-DA"
        self.sim_task_model = sim_task_model  # The model to capture the distribution of data.
        self.forecast_model = forecast_model  # downstream forecasting models' type
+        self.rb_kwargs = {
+            "h_path": h_path,
+            "test_end": test_end,
+            "train_start": train_start,
+            "task_ext_conf": task_ext_conf,
+        }
+        self.alpha = alpha
+        self.meta_1st_train_end = meta_1st_train_end
+        self.proxy_hd = proxy_hd

    def get_feature_importance(self):
        # this must be lightGBM, because it needs to get the feature importance
-        rb = RollingBenchmark(model_type="gbdt")
+        rb = RollingBenchmark(model_type="gbdt", **self.rb_kwargs)
        task = rb.basic_task()

        with R.start(experiment_name="feature_importance"):
@@ -69,7 +106,7 @@ class DDGDA:
        fi = self.get_feature_importance()
        col_selected = fi.nlargest(topk)

-        rb = RollingBenchmark(model_type=self.sim_task_model)
+        rb = RollingBenchmark(model_type=self.sim_task_model, **self.rb_kwargs)
        task = rb.basic_task()
        dataset = init_instance_by_config(task["dataset"])
        prep_ds = dataset.prepare(slice(None), col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
@@ -96,7 +133,7 @@ class DDGDA:
                "kwargs": {"config": DIRNAME / "fea_label_df.pkl"},
            }
        )
-        handler.to_pickle(DIRNAME / "handler_proxy.pkl", dump_all=True)
+        handler.to_pickle(DIRNAME / self.proxy_hd, dump_all=True)

    @property
    def _internal_data_path(self):
@@ -108,7 +145,7 @@ class DDGDA:
        This function will dump the input data for meta model
        """
        # According to the experiments, the choice of the model type is very important for achieving good results
-        rb = RollingBenchmark(model_type=self.sim_task_model)
+        rb = RollingBenchmark(model_type=self.sim_task_model, **self.rb_kwargs)
        sim_task = rb.basic_task()

        if self.sim_task_model == "gbdt":
@@ -122,24 +159,27 @@ class DDGDA:
        with self._internal_data_path.open("wb") as f:
            pickle.dump(internal_data, f)

-    def train_meta_model(self):
+    def train_meta_model(self, fill_method="max"):
        """
        training a meta model based on a simplified linear proxy model;
        """

        # 1) leverage the simplified proxy forecasting model to train meta model.
        # - Only the dataset part is important, in current version of meta model will integrate the
-        rb = RollingBenchmark(model_type=self.sim_task_model)
+        rb = RollingBenchmark(model_type=self.sim_task_model, **self.rb_kwargs)
        sim_task = rb.basic_task()
+        train_start = self.rb_kwargs.get("train_start") or "2008-01-01"
+        train_end = "2010-12-31" if self.meta_1st_train_end is None else self.meta_1st_train_end
+        test_start = (pd.Timestamp(train_end) + pd.Timedelta(days=1)).strftime("%Y-%m-%d")
        proxy_forecast_model_task = {
            # "model": "qlib.contrib.model.linear.LinearModel",
            "dataset": {
                "class": "qlib.data.dataset.DatasetH",
                "kwargs": {
-                    "handler": f"file://{(DIRNAME / 'handler_proxy.pkl').absolute()}",
+                    "handler": f"file://{(DIRNAME / self.proxy_hd).absolute()}",
                    "segments": {
-                        "train": ("2008-01-01", "2010-12-31"),
-                        "test": ("2011-01-01", sim_task["dataset"]["kwargs"]["segments"]["test"][1]),
+                        "train": (train_start, train_end),
+                        "test": (test_start, sim_task["dataset"]["kwargs"]["segments"]["test"][1]),
                    },
                },
            },
@@ -156,7 +196,7 @@ class DDGDA:
            segments=0.62,  # keep test period consistent with the dataset yaml
            trunc_days=1 + self.horizon,
            hist_step_n=30,
-            fill_method="max",
+            fill_method=fill_method,
            rolling_ext_days=0,
        )
        # NOTE:
@@ -165,12 +205,15 @@ class DDGDA:
        # So the misalignment will not affect the effectiveness of the method.
        with self._internal_data_path.open("rb") as f:
            internal_data = pickle.load(f)
+
        md = MetaDatasetDS(exp_name=internal_data, **kwargs)

        # 3) train and logging meta model
        with R.start(experiment_name=self.meta_exp_name):
            R.log_params(**kwargs)
-            mm = MetaModelDS(step=self.step, hist_step_n=kwargs["hist_step_n"], lr=0.001, max_epoch=100, seed=43)
+            mm = MetaModelDS(
+                step=self.step, hist_step_n=kwargs["hist_step_n"], lr=0.001, max_epoch=100, seed=43, alpha=self.alpha
+            )
            mm.fit(md)
            R.save_objects(model=mm)

@@ -203,7 +246,7 @@ class DDGDA:
        hist_step_n = int(param["hist_step_n"])
        fill_method = param.get("fill_method", "max")

-        rb = RollingBenchmark(model_type=self.forecast_model)
+        rb = RollingBenchmark(model_type=self.forecast_model, **self.rb_kwargs)
        task_l = rb.create_rolling_tasks()

        # 2.2) create meta dataset for final dataset
@@ -233,13 +276,13 @@ class DDGDA:
        """
        with self._task_path.open("rb") as f:
            tasks = pickle.load(f)
-        rb = RollingBenchmark(rolling_exp="rolling_ds", model_type=self.forecast_model)
+        rb = RollingBenchmark(rolling_exp="rolling_ds", model_type=self.forecast_model, **self.rb_kwargs)
        rb.train_rolling_tasks(tasks)
        rb.ens_rolling()
        rb.update_rolling_rec()

    def run_all(self):
-        # 1) file: handler_proxy.pkl
+        # 1) file: handler_proxy.pkl (self.proxy_hd)
        self.dump_data_for_proxy_model()
        # 2)
        # file: internal_data_s20.pkl
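
The segment boundaries built in `train_meta_model` can be sketched in isolation. This is a minimal illustration, not the workflow's code: `derive_segments` is a hypothetical helper, the standard library `datetime` module stands in for `pd.Timestamp` and `pd.Timedelta`, and the default dates mirror the literals in the hunk above. Note that `dict.get(key, default)` only returns the default when the key is absent, so a key stored with the value `None` needs an explicit check, as done here.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

Segment = Tuple[str, str]


def derive_segments(
    train_start: Optional[str],
    meta_1st_train_end: Optional[str],
    test_end: str,
) -> Tuple[Segment, Segment]:
    """Hypothetical helper: build (train, test) segments for the proxy task."""
    # Fall back to the literals used in the diff when nothing is supplied.
    start = "2008-01-01" if train_start is None else train_start
    train_end = "2010-12-31" if meta_1st_train_end is None else meta_1st_train_end
    # The test segment starts on the day after training ends.
    test_start = (date.fromisoformat(train_end) + timedelta(days=1)).isoformat()
    return (start, train_end), (test_start, test_end)


if __name__ == "__main__":
    print(derive_segments(None, None, "2020-08-01"))
    print(derive_segments("2012-01-01", "2014-12-31", "2020-08-01"))
```

Deriving the test start from the train end keeps the two segments adjacent and non-overlapping whichever `meta_1st_train_end` is chosen.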
--- a/examples/benchmarks_dynamic/baseline/rolling_benchmark.py
+++ b/examples/benchmarks_dynamic/baseline/rolling_benchmark.py
@@ -1,13 +1,17 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT License.
+from typing import Optional
 from qlib.model.ens.ensemble import RollingEnsemble
 from qlib.utils import init_instance_by_config
 import fire
 import yaml
+import pandas as pd
 from qlib import auto_init
 from pathlib import Path
 from tqdm.auto import tqdm
 from qlib.model.trainer import TrainerR
+from qlib.log import get_module_logger
+from qlib.utils.data import update_config
 from qlib.workflow import R
 from qlib.tests.data import GetData

@@ -25,11 +29,40 @@ class RollingBenchmark:

    """

-    def __init__(self, rolling_exp="rolling_models", model_type="linear") -> None:
+    def __init__(
+        self,
+        rolling_exp: str = "rolling_models",
+        model_type: str = "linear",
+        h_path: Optional[str] = None,
+        train_start: Optional[str] = None,
+        test_end: Optional[str] = None,
+        task_ext_conf: Optional[dict] = None,
+    ) -> None:
+        """
+        Parameters
+        ----------
+        rolling_exp : str
+            The name for the experiments for rolling
+        model_type : str
+            The model to be boosted.
+        h_path : Optional[str]
+            the dumped data handler;
+        test_end : Optional[str]
+            the test end for the data. It is typically used together with the handler
+        train_start : Optional[str]
+            the train start for the data.  It is typically used together with the handler.
+        task_ext_conf : Optional[dict]
+            some option to update the
+        """
        self.step = 20
        self.horizon = 20
        self.rolling_exp = rolling_exp
        self.model_type = model_type
+        self.h_path = h_path
+        self.train_start = train_start
+        self.test_end = test_end
+        self.logger = get_module_logger("RollingBenchmark")
+        self.task_ext_conf = task_ext_conf

    def basic_task(self):
        """For fast training rolling"""
@@ -42,6 +75,10 @@ class RollingBenchmark:
            h_path = DIRNAME / "linear_alpha158_handler_horizon{}.pkl".format(self.horizon)
        else:
            raise AssertionError("Model type is not supported!")
+
+        if self.h_path is not None:
+            h_path = Path(self.h_path)
+
        with conf_path.open("r") as f:
            conf = yaml.safe_load(f)

@@ -52,6 +89,9 @@ class RollingBenchmark:

        task = conf["task"]

+        if self.task_ext_conf is not None:
+            task = update_config(task, self.task_ext_conf)
+
        if not h_path.exists():
            h_conf = task["dataset"]["kwargs"]["handler"]
            h = init_instance_by_config(h_conf)
@@ -59,6 +99,15 @@ class RollingBenchmark:

        task["dataset"]["kwargs"]["handler"] = f"file://{h_path}"
        task["record"] = ["qlib.workflow.record_temp.SignalRecord"]
+
+        if self.train_start is not None:
+            seg = task["dataset"]["kwargs"]["segments"]["train"]
+            task["dataset"]["kwargs"]["segments"]["train"] = pd.Timestamp(self.train_start), seg[1]
+
+        if self.test_end is not None:
+            seg = task["dataset"]["kwargs"]["segments"]["test"]
+            task["dataset"]["kwargs"]["segments"]["test"] = seg[0], pd.Timestamp(self.test_end)
+        self.logger.info(task)
        return task

    def create_rolling_tasks(self):
@@ -93,7 +142,7 @@ class RollingBenchmark:
        """
        Evaluate the combined rolling results
        """
-        for rid, rec in R.list_recorders(experiment_name=self.COMB_EXP).items():
+        for _, rec in R.list_recorders(experiment_name=self.COMB_EXP).items():
            for rt_cls in SigAnaRecord, PortAnaRecord:
                rt = rt_cls(recorder=rec, skip_existing=True)
                rt.generate()
--- a/examples/rl_order_execution/README.md
+++ b/examples/rl_order_execution/README.md
@@ -14,9 +14,10 @@ python -m qlib.run.get_data qlib_data qlib_data --target_dir ./data/bin --region

 To run codes in this example, we need data in pickle format. To achieve this, run following commands (might need a few minutes to finish):

+[//]: # (TODO: Instead of dumping dataframe with different format &#40;like `_gen_dataset` and `_gen_day_dataset` in `qlib/contrib/data/highfreq_provider.py`&#41;, we encourage to implement different subclass of `Dataset` and `DataHandler`. This will keep the workflow cleaner and interfaces more consistent, and move all the complexity to the subclass.)
+
 ```
 python scripts/gen_pickle_data.py -c scripts/pickle_data_config.yml
-python scripts/collect_pickle_dataframe.py
 python scripts/gen_training_orders.py
 python scripts/merge_orders.py
 ```
@@ -27,8 +28,7 @@ When finished, the structure under `data/` should be:
 data
 ├── bin
 ├── orders
-├── pickle
-└── pickle_dataframe
+└── pickle
 ```

 ## Training
--- a/examples/rl_order_execution/exp_configs/backtest_opds.yml
+++ b/examples/rl_order_execution/exp_configs/backtest_opds.yml
@@ -1,17 +1,9 @@
 order_file: ./data/orders/test_orders.pkl
 start_time: "9:30"
 end_time: "14:54"
+data_granularity: "5min"
 qlib:
  provider_uri_5min: ./data/bin/
-  feature_root_dir: ./data/pickle/
-  feature_columns_today: [
-    "$open", "$high", "$low", "$close", "$vwap", "$bid", "$ask", "$volume",
-    "$bidV", "$bidV1", "$bidV3", "$bidV5", "$askV", "$askV1", "$askV3", "$askV5"
-  ]
-  feature_columns_yesterday: [
-    "$open_1", "$high_1", "$low_1", "$close_1", "$vwap_1", "$bid_1", "$ask_1", "$volume_1",
-    "$bidV_1", "$bidV1_1", "$bidV3_1", "$bidV5_1", "$askV_1", "$askV1_1", "$askV3_1", "$askV5_1"
-  ]
 exchange:
  limit_threshold: null
  deal_price: ["$close", "$close"]
@@ -45,10 +37,12 @@ strategies:
          data_ticks: 48
          max_step: 8
          processed_data_provider:
-            class: PickleProcessedDataProvider
+            class: HandlerProcessedDataProvider
            kwargs:
-              data_dir: ./data/pickle_dataframe/feature
-            module_path: qlib.rl.data.pickle_styled
+              data_dir: ./data/pickle/
+              feature_columns_today: ["$high", "$low", "$open", "$close", "$volume"]
+              feature_columns_yesterday: ["$high_1", "$low_1", "$open_1", "$close_1", "$volume_1"]
+            module_path: qlib.rl.data.native
        module_path: qlib.rl.order_execution.interpreter
    module_path: qlib.rl.order_execution.strategy
  30min:
--- a/examples/rl_order_execution/exp_configs/backtest_ppo.yml
+++ b/examples/rl_order_execution/exp_configs/backtest_ppo.yml
@@ -1,17 +1,9 @@
 order_file: ./data/orders/test_orders.pkl
 start_time: "9:30"
 end_time: "14:54"
+data_granularity: "5min"
 qlib:
  provider_uri_5min: ./data/bin/
-  feature_root_dir: ./data/pickle/
-  feature_columns_today: [
-    "$open", "$high", "$low", "$close", "$vwap", "$bid", "$ask", "$volume",
-    "$bidV", "$bidV1", "$bidV3", "$bidV5", "$askV", "$askV1", "$askV3", "$askV5"
-  ]
-  feature_columns_yesterday: [
-    "$open_1", "$high_1", "$low_1", "$close_1", "$vwap_1", "$bid_1", "$ask_1", "$volume_1",
-    "$bidV_1", "$bidV1_1", "$bidV3_1", "$bidV5_1", "$askV_1", "$askV1_1", "$askV3_1", "$askV5_1"
-  ]
 exchange:
  limit_threshold: null
  deal_price: ["$close", "$close"]
@@ -45,10 +37,12 @@ strategies:
          data_ticks: 48
          max_step: 8
          processed_data_provider:
-            class: PickleProcessedDataProvider
+            class: HandlerProcessedDataProvider
            kwargs:
-              data_dir: ./data/pickle_dataframe/feature
-            module_path: qlib.rl.data.pickle_styled
+              data_dir: ./data/pickle/
+              feature_columns_today: ["$high", "$low", "$open", "$close", "$volume"]
+              feature_columns_yesterday: ["$high_1", "$low_1", "$open_1", "$close_1", "$volume_1"]
+            module_path: qlib.rl.data.native
        module_path: qlib.rl.order_execution.interpreter
    module_path: qlib.rl.order_execution.strategy
  30min:
--- a/examples/rl_order_execution/exp_configs/backtest_twap.yml
+++ b/examples/rl_order_execution/exp_configs/backtest_twap.yml
@@ -1,17 +1,9 @@
 order_file: ./data/orders/test_orders.pkl
 start_time: "9:30"
 end_time: "14:54"
+data_granularity: "5min"
 qlib:
  provider_uri_5min: ./data/bin/
-  feature_root_dir: ./data/pickle/
-  feature_columns_today: [
-    "$open", "$high", "$low", "$close", "$vwap", "$bid", "$ask", "$volume",
-    "$bidV", "$bidV1", "$bidV3", "$bidV5", "$askV", "$askV1", "$askV3", "$askV5"
-  ]
-  feature_columns_yesterday: [
-    "$open_1", "$high_1", "$low_1", "$close_1", "$vwap_1", "$bid_1", "$ask_1", "$volume_1",
-    "$bidV_1", "$bidV1_1", "$bidV3_1", "$bidV5_1", "$askV_1", "$askV1_1", "$askV3_1", "$askV5_1"
-  ]
 exchange:
  limit_threshold: null
  deal_price: ["$close", "$close"]
--- a/examples/rl_order_execution/exp_configs/train_opds.yml
+++ b/examples/rl_order_execution/exp_configs/train_opds.yml
@@ -3,8 +3,8 @@ simulator:
  time_per_step: 30
  vol_limit: null
 env:
-  concurrency: 48
-  parallel_mode: shmem
+  concurrency: 32
+  parallel_mode: dummy
 action_interpreter:
  class: CategoricalActionInterpreter
  kwargs:
@@ -18,10 +18,13 @@ state_interpreter:
    data_ticks: 48  # 48 = 240 min / 5 min
    max_step: 8
    processed_data_provider:
-      class: PickleProcessedDataProvider
-      module_path: qlib.rl.data.pickle_styled
+      class: HandlerProcessedDataProvider
      kwargs:
-        data_dir: ./data/pickle_dataframe/feature
+        data_dir: ./data/pickle/
+        feature_columns_today: ["$high", "$low", "$open", "$close", "$volume"]
+        feature_columns_yesterday: ["$high_1", "$low_1", "$open_1", "$close_1", "$volume_1"]
+        backtest: false
+      module_path: qlib.rl.data.native
  module_path: qlib.rl.order_execution.interpreter
 reward:
  class: PAPenaltyReward
@@ -32,7 +35,9 @@ reward:
 data:
  source:
    order_dir: ./data/orders
-    data_dir: ./data/pickle_dataframe/backtest
+    feature_root_dir: ./data/pickle/
+    feature_columns_today: ["$close0", "$volume0"]
+    feature_columns_yesterday: []
    total_time: 240
    default_start_time_index: 0
    default_end_time_index: 235
--- a/examples/rl_order_execution/exp_configs/train_ppo.yml
+++ b/examples/rl_order_execution/exp_configs/train_ppo.yml
@@ -3,8 +3,8 @@ simulator:
  time_per_step: 30
  vol_limit: null
 env:
-  concurrency: 48
-  parallel_mode: shmem
+  concurrency: 32
+  parallel_mode: dummy
 action_interpreter:
  class: CategoricalActionInterpreter
  kwargs:
@@ -18,10 +18,13 @@ state_interpreter:
    data_ticks: 48  # 48 = 240 min / 5 min
    max_step: 8
    processed_data_provider:
-      class: PickleProcessedDataProvider
-      module_path: qlib.rl.data.pickle_styled
+      class: HandlerProcessedDataProvider
      kwargs:
-        data_dir: ./data/pickle_dataframe/feature
+        data_dir: ./data/pickle/
+        feature_columns_today: ["$high", "$low", "$open", "$close", "$volume"]
+        feature_columns_yesterday: ["$high_1", "$low_1", "$open_1", "$close_1", "$volume_1"]
+        backtest: false
+      module_path: qlib.rl.data.native
  module_path: qlib.rl.order_execution.interpreter
 reward:
  class: PPOReward
@@ -33,7 +36,9 @@ reward:
 data:
  source:
    order_dir: ./data/orders
-    data_dir: ./data/pickle_dataframe/backtest
+    feature_root_dir: ./data/pickle/
+    feature_columns_today: ["$close0", "$volume0"]
+    feature_columns_yesterday: []
    total_time: 240
    default_start_time_index: 0
    default_end_time_index: 235
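
The `class` / `module_path` / `kwargs` triplets in these configs each describe an object to construct. Below is a minimal sketch of how such a triplet could be resolved, assuming a plain `importlib` lookup. `build_from_config` is a hypothetical helper and not Qlib's own loader, whose behavior may differ. The example resolves a standard library class so that it runs without Qlib installed.

```python
import importlib
from typing import Any, Dict


def build_from_config(config: Dict[str, Any]) -> Any:
    """Hypothetical resolver: import `module_path`, look up `class`, call it with `kwargs`."""
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    # A missing `kwargs` entry is treated as "no arguments".
    return cls(**config.get("kwargs", {}))


if __name__ == "__main__":
    cfg = {
        "class": "Fraction",
        "module_path": "fractions",
        "kwargs": {"numerator": 3, "denominator": 4},
    }
    print(build_from_config(cfg))
```

Under this reading, switching the provider class in a config only requires that the new `module_path` exposes the named class and that the `kwargs` keys match its constructor.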
--- a/examples/rl_order_execution/scripts/collect_pickle_dataframe.py
+++ b/examples/rl_order_execution/scripts/collect_pickle_dataframe.py
@@ -1,26 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT License.
-
-import os
-import pickle
-import pandas as pd
-from joblib import Parallel, delayed
-
-os.makedirs(os.path.join("data", "pickle_dataframe"), exist_ok=True)
-
-
-def _collect(df: pd.DataFrame, instrument: str, tag: str) -> None:
-    cur = df[df["instrument"] == instrument].sort_values(by=["datetime"])
-    cur = cur.set_index(["instrument", "datetime", "date"])
-    pickle.dump(cur, open(os.path.join("data", "pickle_dataframe", tag, f"{instrument}.pkl"), "wb"))
-
-
-for tag in ("backtest", "feature"):
-    df = pickle.load(open(os.path.join("data", "pickle", f"{tag}.pkl"), "rb"))
-    df = pd.concat(list(df.values())).reset_index()
-    df["date"] = df["datetime"].dt.date.astype("datetime64")
-    instruments = sorted(set(df["instrument"]))
-
-    os.makedirs(os.path.join("data", "pickle_dataframe", tag), exist_ok=True)
-
-    Parallel(n_jobs=-1, verbose=10)(delayed(_collect)(df, instrument, tag) for instrument in instruments)
--- a/examples/rl_order_execution/scripts/gen_training_orders.py
+++ b/examples/rl_order_execution/scripts/gen_training_orders.py
@@ -4,17 +4,22 @@
 import os
 import numpy as np
 import pandas as pd
-from tqdm import tqdm
+
 from pathlib import Path

-DATA_PATH = Path(os.path.join("data", "pickle_dataframe", "backtest"))
+DATA_PATH = Path(os.path.join("data", "pickle", "backtest"))
 OUTPUT_PATH = Path(os.path.join("data", "orders"))


-def generate_order(stock: str, start_idx: int, end_idx: int) -> None:
-    df = pd.read_pickle(DATA_PATH / f"{stock}.pkl")
+def generate_order(stock: str, start_idx: int, end_idx: int) -> bool:
+    dataset = pd.read_pickle(DATA_PATH / f"{stock}.pkl")
+    df = dataset.handler.fetch(level=None).reset_index()
+    if len(df) == 0 or df.isnull().values.any() or min(df["$volume0"]) < 1e-5:
+        return False
+
+    df["date"] = df["datetime"].dt.date.astype("datetime64")
+    df = df.set_index(["instrument", "datetime", "date"])
    df = df.groupby("date").take(range(start_idx, end_idx)).droplevel(level=0)
-    div = df["$volume0"].rolling((end_idx - start_idx) * 60).mean().shift(1).groupby(level="date").transform("first")

    order_all = pd.DataFrame(df.groupby(level=(2, 0)).mean().dropna())
    order_all["amount"] = np.random.lognormal(-3.28, 1.14) * order_all["$volume0"]
@@ -32,11 +37,17 @@ def generate_order(stock: str, start_idx: int, end_idx: int) -> None:
        os.makedirs(path, exist_ok=True)
        if len(order) > 0:
            order.to_pickle(path / f"{stock}.pkl.target")
+    return True


 np.random.seed(1234)
 file_list = sorted(os.listdir(DATA_PATH))
 stocks = [f.replace(".pkl", "") for f in file_list]
-stocks = sorted(np.random.choice(stocks, size=100, replace=False))
-for stock in tqdm(stocks):
-    generate_order(stock, 0, 240 // 5 - 1)
+np.random.shuffle(stocks)
+
+cnt = 0
+for stock in stocks:
+    if generate_order(stock, 0, 240 // 5 - 1):
+        cnt += 1
+        if cnt == 100:
+            break
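The new selection loop stops once a fixed number of stocks have produced valid orders. A minimal sketch of that pattern in isolation is shown here; `take_first_successes` and the even/odd validity rule are hypothetical stand-ins for `generate_order` and its data checks, not code from this commit.

```python
import random
from typing import Callable, List


def take_first_successes(
    candidates: List[str], try_generate: Callable[[str], bool], limit: int
) -> List[str]:
    """Return the first `limit` candidates for which `try_generate` returns True."""
    accepted: List[str] = []
    for name in candidates:
        if len(accepted) >= limit:
            break
        if try_generate(name):
            accepted.append(name)
    return accepted


rng = random.Random(1234)
stocks = [f"S{i:03d}" for i in range(10)]
rng.shuffle(stocks)

# Hypothetical validity rule: pretend odd-numbered stocks have unusable data.
picked = take_first_successes(stocks, lambda s: int(s[1:]) % 2 == 0, limit=3)
print(picked)
```

Compared with sampling 100 names up front, this approach still yields the requested count when some candidates are rejected, provided enough valid candidates exist.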
--- a/qlib/config.py
+++ b/qlib/config.py
@@ -147,6 +147,7 @@ _default_config = {
    "redis_host": "127.0.0.1",
    "redis_port": 6379,
    "redis_task_db": 1,
+    "redis_password": None,
    # This value can be reset via qlib.init
    "logging_level": logging.INFO,
    # Global configuration of qlib log
--- a/qlib/contrib/analyzer.py
+++ b/qlib/contrib/analyzer.py
@@ -0,0 +1,111 @@
+import logging
+import matplotlib.pyplot as plt
+from pathlib import Path
+import numpy as np
+
+from ..log import get_module_logger
+from ..contrib.eva.alpha import calc_ic, calc_long_short_return, calc_long_short_prec
+
+logger = get_module_logger("analysis", logging.INFO)
+
+
+class AnalyzerTemp:
+    def __init__(self, recorder, output_dir=None, **kwargs):
+        self.recorder = recorder
+        self.output_dir = Path(output_dir) if output_dir else Path("./")
+
+    def load(self, name: str):
+        """
+        It behaves the same as self.recorder.load_object.
+        But it is an easier interface because users don't have to care about `get_path` and `artifact_path`
+
+        Parameters
+        ----------
+        name : str
+            the name of the file to be loaded.
+
+        Return
+        ------
+        The stored records.
+        """
+        return self.recorder.load_object(name)
+
+    def analyse(self, **kwargs):
+        """
+        Analyse data index, distribution, etc.
+
+        Parameters
+        ----------
+
+
+        Return
+        ------
+        The handled data.
+        """
+        raise NotImplementedError("Please implement the `analyse` method.")
+
+
+class HFAnalyzer(AnalyzerTemp):
+    """
+    This is the Signal Analysis class that generates the analysis results such as IC and IR.
+
+    default output image filename is "HFAnalyzerTable.jpeg"
+    """
+
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+
+    def analyse(self):
+        pred = self.load("pred.pkl")
+        label = self.load("label.pkl")
+
+        long_pre, short_pre = calc_long_short_prec(pred.iloc[:, 0], label.iloc[:, 0], is_alpha=True)
+        ic, ric = calc_ic(pred.iloc[:, 0], label.iloc[:, 0])
+        metrics = {
+            "IC": ic.mean(),
+            "ICIR": ic.mean() / ic.std(),
+            "Rank IC": ric.mean(),
+            "Rank ICIR": ric.mean() / ric.std(),
+            "Long precision": long_pre.mean(),
+            "Short precision": short_pre.mean(),
+        }
+
+        long_short_r, long_avg_r = calc_long_short_return(pred.iloc[:, 0], label.iloc[:, 0])
+        metrics.update(
+            {
+                "Long-Short Average Return": long_short_r.mean(),
+                "Long-Short Average Sharpe": long_short_r.mean() / long_short_r.std(),
+            }
+        )
+
+        table = [[k, v] for (k, v) in metrics.items()]
+        plt.table(cellText=table, loc="center")
+        plt.axis("off")
+        plt.savefig(self.output_dir.joinpath("HFAnalyzerTable.jpeg"))
+        plt.clf()
+
+        plt.scatter(np.arange(0, len(pred)), pred.iloc[:, 0])
+        plt.scatter(np.arange(0, len(label)), label.iloc[:, 0])
+        plt.title("HFAnalyzer")
+        plt.savefig(self.output_dir.joinpath("HFAnalyzer.jpeg"))
+        return "HFAnalyzer.jpeg"
+
+
+class SignalAnalyzer(AnalyzerTemp):
+    """
+    This is the Signal Analysis class that generates the analysis results such as IC and IR.
+
+    default output image filename is "signalAnalysis.jpeg"
+    """
+
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+
+    def analyse(self, dataset=None, **kwargs):
+        label = self.load("label.pkl")
+
+        plt.hist(label)
+        plt.title("SignalAnalyzer")
+        plt.savefig(self.output_dir.joinpath("signalAnalysis.jpeg"))
+
+        return "signalAnalysis.jpeg"
--- a/qlib/contrib/data/handler.py
+++ b/qlib/contrib/data/handler.py
@@ -1,6 +1,8 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT License.

+from typing import Optional
+from qlib.utils.data import update_config
 from ...data.dataset.handler import DataHandlerLP
 from ...data.dataset.processor import Processor
 from ...utils import get_callable_kwargs
@@ -57,12 +59,13 @@ class Alpha360(DataHandlerLP):
        fit_end_time=None,
        filter_pipe=None,
        inst_processors=None,
+        data_loader: Optional[dict] = None,
        **kwargs
    ):
        infer_processors = check_transform_proc(infer_processors, fit_start_time, fit_end_time)
        learn_processors = check_transform_proc(learn_processors, fit_start_time, fit_end_time)

-        data_loader = {
+        _data_loader = {
            "class": "QlibDataLoader",
            "kwargs": {
                "config": {
@@ -74,12 +77,14 @@ class Alpha360(DataHandlerLP):
                "inst_processors": inst_processors,
            },
        }
+        if data_loader is not None:
+            update_config(_data_loader, data_loader)

        super().__init__(
            instruments=instruments,
            start_time=start_time,
            end_time=end_time,
-            data_loader=data_loader,
+            data_loader=_data_loader,
            learn_processors=learn_processors,
            infer_processors=infer_processors,
            **kwargs
@@ -153,12 +158,13 @@ class Alpha158(DataHandlerLP):
        process_type=DataHandlerLP.PTYPE_A,
        filter_pipe=None,
        inst_processors=None,
+        data_loader: Optional[dict] = None,
        **kwargs
    ):
        infer_processors = check_transform_proc(infer_processors, fit_start_time, fit_end_time)
        learn_processors = check_transform_proc(learn_processors, fit_start_time, fit_end_time)

-        data_loader = {
+        _data_loader = {
            "class": "QlibDataLoader",
            "kwargs": {
                "config": {
@@ -170,11 +176,13 @@ class Alpha158(DataHandlerLP):
                "inst_processors": inst_processors,
            },
        }
+        if data_loader is not None:
+            update_config(_data_loader, data_loader)
        super().__init__(
            instruments=instruments,
            start_time=start_time,
            end_time=end_time,
-            data_loader=data_loader,
+            data_loader=_data_loader,
            infer_processors=infer_processors,
            learn_processors=learn_processors,
            process_type=process_type,
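The handlers now build a default `_data_loader` dict and merge a user-supplied `data_loader` dict into it. A minimal sketch of such a recursive merge follows. `deep_update` is a hypothetical stand-in written for illustration; the real `update_config` in `qlib.utils.data` may behave differently, and the key values shown are illustrative only.

```python
from copy import deepcopy


def deep_update(base: dict, ext: dict) -> dict:
    """Recursively merge `ext` into `base` in place and return `base`."""
    for key, value in ext.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            # Both sides are dicts: merge nested keys instead of replacing the whole dict.
            deep_update(base[key], value)
        else:
            base[key] = value
    return base


default_loader = {
    "class": "QlibDataLoader",
    "kwargs": {"freq": "day", "inst_processors": None},  # illustrative defaults
}
override = {"kwargs": {"freq": "1min"}}  # illustrative override

merged = deep_update(deepcopy(default_loader), override)
print(merged)
```

Only the overridden leaf changes; sibling keys such as `class` and `inst_processors` keep their defaults.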
--- a/qlib/contrib/meta/data_selection/dataset.py
+++ b/qlib/contrib/meta/data_selection/dataset.py
@@ -55,8 +55,10 @@ class InternalData:
        # The handler is initialized for only once.
        if not trainer.has_worker():
            self.dh = init_task_handler(perf_task_tpl)
+            self.dh.config(dump_all=False)  # in some cases, the data handler is saved to disk with `dump_all=True`
        else:
            self.dh = init_instance_by_config(perf_task_tpl["dataset"]["kwargs"]["handler"])
+        assert self.dh.dump_all is False  # otherwise, it will save all the detailed data

        seg = perf_task_tpl["dataset"]["kwargs"]["segments"]

@@ -77,7 +79,7 @@ class InternalData:
            get_module_logger("Internal Data").info("the data has been initialized")
        else:
            # train new models
-            assert 0 == len(recorders), "An empty experiment is required for setup `InternalData``"
+            assert 0 == len(recorders), "An empty experiment is required for setup `InternalData`"
            trainer.train(gen_task)

        # 2) extract the similarity matrix
@@ -119,6 +121,7 @@ class MetaTaskDS(MetaTask):

    def __init__(self, task: dict, meta_info: pd.DataFrame, mode: str = MetaTask.PROC_MODE_FULL, fill_method="max"):
        """
+
        The description of the processed data

            time_perf: A array with shape  <hist_step_n * step, data pieces>  ->  data piece performance
@@ -132,6 +135,10 @@ class MetaTaskDS(MetaTask):
                   [0., 0., 0., ..., 0., 0., 1.],
                   [0., 0., 0., ..., 0., 0., 1.]])

+        Parameters
+        ----------
+        meta_info: pd.DataFrame
+            please refer to the docs of _prepare_meta_ipt for detailed explanation.
        """
        super().__init__(task, meta_info)
        self.fill_method = fill_method
@@ -180,12 +187,41 @@ class MetaTaskDS(MetaTask):
        self.processed_meta_input = data_to_tensor(self.processed_meta_input)

    def _get_processed_meta_info(self):
-        meta_info_norm = self.meta_info.sub(self.meta_info.mean(axis=1), axis=0)  # .fillna(0.)
-        if self.fill_method == "max":
-            meta_info_norm = meta_info_norm.T.fillna(
-                meta_info_norm.max(axis=1)
-            ).T  # fill it with row max to align with previous implementation
+        meta_info_norm = self.meta_info.sub(self.meta_info.mean(axis=1), axis=0)
+        if self.fill_method.startswith("max"):
+            suffix = self.fill_method[len("max") :]
+            if suffix == "seg":
+                fill_value = {}
+                for col in meta_info_norm.columns:
+                    fill_value[col] = meta_info_norm.loc[meta_info_norm[col].isna(), :].dropna(axis=1).mean().max()
+                fill_value = pd.Series(fill_value).sort_index()
+                # The NaN values are filled segment-wise. Below is an example of fill_value
+                # 2009-01-05  2009-02-06    0.145809
+                # 2009-02-09  2009-03-06    0.148005
+                # 2009-03-09  2009-04-03    0.090385
+                # 2009-04-07  2009-05-05    0.114318
+                # 2009-05-06  2009-06-04    0.119328
+                # ...
+                meta_info_norm = meta_info_norm.fillna(fill_value)
+            else:
+                if len(suffix) > 0:
+                    get_module_logger("MetaTaskDS").warning(
+                        f"fill_method={self.fill_method}; the suffix after 'max' can't be correctly parsed. Please check your parameters."
+                    )
+                fill_value = meta_info_norm.max(axis=1)
+                # fill it with row max to align with previous implementation
+                # This will magnify the data similarity when data is in daily freq
+
+                # the fill value corresponds to data like this
+                # It gets a performance value for each day.
+                # The performance values are taken from other models on this day
+                # 2009-01-16    0.276320
+                # 2009-01-19    0.280603
+                #                 ...
+                # 2011-06-27    0.203773
+                meta_info_norm = meta_info_norm.T.fillna(fill_value).T
        elif self.fill_method == "zero":
+            # It will fillna(0.0) at the end.
            pass
        else:
            raise NotImplementedError(f"This type of input is not supported")
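The `fill_method` value is parsed as a `max` prefix followed by an optional suffix such as `seg`. A minimal sketch of that parsing is shown here, with a hypothetical helper `parse_fill_method` that is not part of this commit. Note that `str.lstrip` treats its argument as a set of characters rather than a prefix, so slicing is the safer way to drop a fixed prefix.

```python
def parse_fill_method(fill_method: str) -> str:
    """Return the suffix after a leading "max", or raise if the prefix is absent."""
    prefix = "max"
    if not fill_method.startswith(prefix):
        raise ValueError(f"unsupported fill_method: {fill_method!r}")
    return fill_method[len(prefix):]


print(parse_fill_method("maxseg"))      # seg
print(repr(parse_fill_method("max")))   # ''

# str.lstrip removes any leading characters found in the given set,
# so it can eat part of a suffix that starts with "m", "a" or "x".
print("maxavg".lstrip("max"))           # vg
print(parse_fill_method("maxavg"))      # avg
```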
@@ -286,7 +322,33 @@ class MetaDatasetDS(MetaTaskDataset):
                logger.warning(f"ValueError: {e}")
        assert len(self.meta_task_l) > 0, "No meta tasks found. Please check the data and setting"

-    def _prepare_meta_ipt(self, task):
+    def _prepare_meta_ipt(self, task) -> pd.DataFrame:
+        """
+        Please refer to `self.internal_data.setup` for detailed information about `self.internal_data.data_ic_df`
+
+        Indices with format below can be successfully sliced by  `ic_df.loc[:end, pd.IndexSlice[:, :end]]`
+
+               2021-06-21 2021-06-04 .. 2021-03-22 2021-03-08
+               2021-07-02 2021-06-18 .. 2021-04-02 None
+
+        Returns
+        -------
+            a pd.DataFrame with content similar to the example below.
+            - each column corresponds to a trained model named by the training data range
+            - each row corresponds to a day of data tested by the models of the columns
+            - The cells in rows that overlap with the data used by the column's model are masked
+
+
+                       2009-01-05 2009-02-09 ... 2011-04-27 2011-05-26
+                       2009-02-06 2009-03-06 ... 2011-05-25 2011-06-23
+            datetime                         ...
+            2009-01-13        NaN   0.310639 ...  -0.169057   0.137792
+            2009-01-14        NaN   0.261086 ...  -0.143567   0.082581
+            ...               ...        ... ...        ...        ...
+            2011-06-30  -0.054907  -0.020219 ...  -0.023226        NaN
+            2011-07-01  -0.075762  -0.026626 ...  -0.003167        NaN
+
+        """
        ic_df = self.internal_data.data_ic_df

        segs = task["dataset"]["kwargs"]["segments"]
@@ -294,15 +356,19 @@ class MetaDatasetDS(MetaTaskDataset):
        ic_df_avail = ic_df.loc[:end, pd.IndexSlice[:, :end]]

        # meta data set focus on the **information** instead of preprocess
-        # 1) filter the future info
-        def mask_future(s):
-            """mask future information"""
-            # from qlib.utils import get_date_by_shift
+        # 1) filter the overlap info
+        def mask_overlap(s):
+            """
+            mask overlap information
+            Data within `self.trunc_days` after the segment end contains future info and is also treated as overlap info.
+
+            Approximately the diagonal plus the horizon length of data is masked.
+            """
            start, end = s.name
            end = get_date_by_shift(trading_date=end, shift=self.trunc_days - 1, future=True)
            return s.mask((s.index >= start) & (s.index <= end))

-        ic_df_avail = ic_df_avail.apply(mask_future)  # apply to each col
+        ic_df_avail = ic_df_avail.apply(mask_overlap)  # apply to each col

        # 2) filter the info with too long periods
        total_len = self.step * self.hist_step_n
--- a/qlib/contrib/meta/data_selection/model.py
+++ b/qlib/contrib/meta/data_selection/model.py
@@ -52,6 +52,7 @@ class MetaModelDS(MetaTaskModel):
        lr=0.0001,
        max_epoch=100,
        seed=43,
+        alpha=0.0,
    ):
        self.step = step
        self.hist_step_n = hist_step_n
@@ -61,6 +62,7 @@ class MetaModelDS(MetaTaskModel):
        self.lr = lr
        self.max_epoch = max_epoch
        self.fitted = False
+        self.alpha = alpha
        torch.manual_seed(seed)

    def run_epoch(self, phase, task_list, epoch, opt, loss_l, ignore_weight=False):
@@ -144,7 +146,11 @@ class MetaModelDS(MetaTaskModel):
            )  # debug: record when the test phase starts

        self.tn = PredNet(
-            step=self.step, hist_step_n=self.hist_step_n, clip_weight=self.clip_weight, clip_method=self.clip_method
+            step=self.step,
+            hist_step_n=self.hist_step_n,
+            clip_weight=self.clip_weight,
+            clip_method=self.clip_method,
+            alpha=self.alpha,
        )

        opt = optim.Adam(self.tn.parameters(), lr=self.lr)
--- a/qlib/contrib/meta/data_selection/net.py
+++ b/qlib/contrib/meta/data_selection/net.py
@@ -41,11 +41,18 @@ class TimeWeightMeta(SingleMetaBase):


 class PredNet(nn.Module):
-    def __init__(self, step, hist_step_n, clip_weight=None, clip_method="tanh"):
+    def __init__(self, step, hist_step_n, clip_weight=None, clip_method="tanh", alpha: float = 0.0):
+        """
+        Parameters
+        ----------
+        alpha : float
+            the L2 regularization strength for the sub model (useful when aligning the meta model with a linear submodel)
+        """
        super().__init__()
        self.step = step
        self.twm = TimeWeightMeta(hist_step_n=hist_step_n, clip_weight=clip_weight, clip_method=clip_method)
        self.init_paramters(hist_step_n)
+        self.alpha = alpha

    def get_sample_weights(self, X, time_perf, time_belong, ignore_weight=False):
        weights = torch.from_numpy(np.ones(X.shape[0])).float().to(X.device)
@@ -59,7 +66,7 @@ class PredNet(nn.Module):
        """Please refer to the docs of MetaTaskDS for the description of the variables"""
        weights = self.get_sample_weights(X, time_perf, time_belong, ignore_weight=ignore_weight)
        X_w = X.T * weights.view(1, -1)
-        theta = torch.inverse(X_w @ X) @ X_w @ y
+        theta = torch.inverse(X_w @ X + self.alpha * torch.eye(X_w.shape[0])) @ X_w @ y
        return X_test @ theta, weights

    def init_paramters(self, hist_step_n):
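The `alpha` term turns the weighted least-squares solve into a ridge solve, theta = (X^T W X + alpha I)^-1 X^T W y. A minimal pure-Python sketch under that reading follows; `solve` and `weighted_ridge` are illustrative helpers, not the torch code from this commit. It shows that collinear features make the unregularized system singular, while a small `alpha` makes it solvable.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def weighted_ridge(X: Matrix, y: List[float], w: List[float], alpha: float) -> List[float]:
    """Return theta = (X^T W X + alpha * I)^-1 X^T W y with W = diag(w)."""
    d = len(X[0])
    gram = [
        [sum(wi * row[j] * row[k] for wi, row in zip(w, X)) for k in range(d)]
        for j in range(d)
    ]
    for j in range(d):
        gram[j][j] += alpha  # the regularization term added on the diagonal
    rhs = [sum(wi * row[j] * yi for wi, row, yi in zip(w, X, y)) for j in range(d)]
    return solve(gram, rhs)


# Two perfectly collinear features: the unregularized Gram matrix is singular.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y = [1.0, 2.0, 3.0]
w = [1.0, 1.0, 1.0]

theta = weighted_ridge(X, y, w, alpha=0.1)
print(theta)
```

With `alpha=0.0` the same call raises `ValueError`, which mirrors why an unregularized matrix inverse can fail on degenerate inputs.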
--- a/qlib/contrib/meta/data_selection/utils.py
+++ b/qlib/contrib/meta/data_selection/utils.py
@@ -5,6 +5,9 @@ import numpy as np
 import torch
 from torch import nn

+from qlib.constant import EPS
+from qlib.log import get_module_logger
+

 class ICLoss(nn.Module):
    def forward(self, pred, y, idx, skip_size=50):
@@ -24,6 +27,7 @@ class ICLoss(nn.Module):
                diff_point.append(i)
            prev = date
        diff_point.append(None)
+        # The length of diff_point is one larger than the number of date groups

        ic_all = 0.0
        skip_n = 0
@@ -34,13 +38,23 @@ class ICLoss(nn.Module):
                skip_n += 1
                continue
            y_focus = y[start_i:end_i]
+            if pred_focus.std() < EPS or y_focus.std() < EPS:
+                # These cases often happen at the end of test data.
+                # Usually caused by fillna(0.)
+                skip_n += 1
+                continue
+
            ic_day = torch.dot(
                (pred_focus - pred_focus.mean()) / np.sqrt(pred_focus.shape[0]) / pred_focus.std(),
                (y_focus - y_focus.mean()) / np.sqrt(y_focus.shape[0]) / y_focus.std(),
            )
            ic_all += ic_day
        if len(diff_point) - 1 - skip_n <= 0:
-            raise ValueError("No enough data for calculating iC")
+            raise ValueError("Not enough data for calculating IC")
+        if skip_n > 0:
+            get_module_logger("ICLoss").info(
+                f"{skip_n} days are skipped due to zero std or too few valid samples."
+            )
        ic_mean = ic_all / (len(diff_point) - 1 - skip_n)
        return -ic_mean  # ic loss
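The loss averages a per-day correlation between predictions and labels, skipping days with too few samples or a near-zero standard deviation. A minimal pure-Python sketch of that idea follows. It uses the standard Pearson correlation, a hypothetical `mean_daily_ic` helper, a small `skip_size`, and an assumed `EPS` tolerance, so its numbers need not match the torch implementation exactly.

```python
import math
from typing import Dict, List, Sequence, Tuple

EPS = 1e-12  # assumed tolerance; the diff imports EPS from qlib.constant


def _std(values: Sequence[float]) -> float:
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (_std(a) * _std(b))


def mean_daily_ic(
    pred: Sequence[float], y: Sequence[float], dates: Sequence[str], skip_size: int = 2
) -> Tuple[float, int]:
    """Return (mean per-day IC, number of skipped days)."""
    groups: Dict[str, Tuple[List[float], List[float]]] = {}
    for p, t, d in zip(pred, y, dates):
        groups.setdefault(d, ([], []))
        groups[d][0].append(p)
        groups[d][1].append(t)

    ics: List[float] = []
    skipped = 0
    for p_day, y_day in groups.values():
        # Skip days with too few samples or with constant predictions/labels.
        if len(p_day) < skip_size or _std(p_day) < EPS or _std(y_day) < EPS:
            skipped += 1
            continue
        ics.append(_pearson(p_day, y_day))
    if not ics:
        raise ValueError("Not enough data for calculating IC")
    return sum(ics) / len(ics), skipped


pred = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 5.0]
y = [2.0, 4.0, 6.0, 0.0, 0.0, 0.0, 3.0, 2.0, 1.0, 1.0]
dates = ["d1"] * 3 + ["d2"] * 3 + ["d3"] * 3 + ["d4"]

ic, skipped = mean_daily_ic(pred, y, dates)
print(ic, skipped)
```

Here `d2` is skipped because its labels are constant and `d4` because it has a single sample, so only `d1` and `d3` contribute to the mean.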

--- a/qlib/contrib/model/linear.py
+++ b/qlib/contrib/model/linear.py
@@ -4,6 +4,7 @@
 import numpy as np
 import pandas as pd
 from typing import Text, Union
+from qlib.log import get_module_logger
 from qlib.data.dataset.weight import Reweighter
 from scipy.optimize import nnls
 from sklearn.linear_model import LinearRegression, Ridge, Lasso
@@ -29,7 +30,7 @@ class LinearModel(Model):
    RIDGE = "ridge"
    LASSO = "lasso"

-    def __init__(self, estimator="ols", alpha=0.0, fit_intercept=False):
+    def __init__(self, estimator="ols", alpha=0.0, fit_intercept=False, include_valid: bool = False):
        """
        Parameters
        ----------
@@ -39,6 +40,9 @@ class LinearModel(Model):
            l1 or l2 regularization parameter
        fit_intercept : bool
            whether fit intercept
+        include_valid: bool
+            Should the validation data be included for training?
+            The validation data should be included
        """
        assert estimator in [self.OLS, self.NNLS, self.RIDGE, self.LASSO], f"unsupported estimator `{estimator}`"
        self.estimator = estimator
@@ -49,9 +53,16 @@ class LinearModel(Model):
        self.fit_intercept = fit_intercept

        self.coef_ = None
+        self.include_valid = include_valid

    def fit(self, dataset: DatasetH, reweighter: Reweighter = None):
        df_train = dataset.prepare("train", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
+        if self.include_valid:
+            try:
+                df_valid = dataset.prepare("valid", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
+                df_train = pd.concat([df_train, df_valid])
+            except KeyError:
+                get_module_logger("LinearModel").info("include_valid=True, but valid does not exist")
        if df_train.empty:
            raise ValueError("Empty data from dataset, please check your dataset config.")
        if reweighter is not None:
--- a/qlib/contrib/model/pytorch_nn.py
+++ b/qlib/contrib/model/pytorch_nn.py
@@ -47,10 +47,6 @@ class DNNModelPytorch(Model):
        layer sizes
    lr : float
        learning rate
-    lr_decay : float
-        learning rate decay
-    lr_decay_steps : int
-        learning rate decay steps
    optimizer : str
        optimizer name
    GPU : int
@@ -64,8 +60,6 @@ class DNNModelPytorch(Model):
        batch_size=2000,
        early_stop_rounds=50,
        eval_steps=20,
-        lr_decay=0.96,
-        lr_decay_steps=100,
        optimizer="gd",
        loss="mse",
        GPU=0,
@@ -93,8 +87,6 @@ class DNNModelPytorch(Model):
        self.batch_size = batch_size
        self.early_stop_rounds = early_stop_rounds
        self.eval_steps = eval_steps
-        self.lr_decay = lr_decay
-        self.lr_decay_steps = lr_decay_steps
        self.optimizer = optimizer.lower()
        self.loss_type = loss
        if isinstance(GPU, str):
@@ -116,8 +108,6 @@ class DNNModelPytorch(Model):
            f"\nbatch_size : {batch_size}"
            f"\nearly_stop_rounds : {early_stop_rounds}"
            f"\neval_steps : {eval_steps}"
-            f"\nlr_decay : {lr_decay}"
-            f"\nlr_decay_steps : {lr_decay_steps}"
            f"\noptimizer : {optimizer}"
            f"\nloss_type : {loss}"
            f"\nseed : {seed}"
--- a/qlib/contrib/strategy/rule_strategy.py
+++ b/qlib/contrib/strategy/rule_strategy.py
@@ -635,7 +635,7 @@ class FileOrderStrategy(BaseStrategy):
            self.order_df = file
        else:
            with get_io_object(file) as f:
-                self.order_df = pd.read_csv(f, dtype={"datetime": np.str})
+                self.order_df = pd.read_csv(f, dtype={"datetime": str})

        self.order_df["datetime"] = self.order_df["datetime"].apply(pd.Timestamp)
        self.order_df = self.order_df.set_index(["datetime", "instrument"])
--- a/qlib/data/data.py
+++ b/qlib/data/data.py
@@ -783,7 +783,7 @@ class LocalPITProvider(PITProvider):
        index_path = C.dpm.get_data_uri() / "financial" / instrument.lower() / f"{field}.index"
        data_path = C.dpm.get_data_uri() / "financial" / instrument.lower() / f"{field}.data"
        if not (index_path.exists() and data_path.exists()):
-            raise FileNotFoundError("No file is found. Raise exception and  ")
+            raise FileNotFoundError("No file is found.")
        # NOTE: The most significant performance loss is here.
        # Does the acceleration that makes the program complicated really matters?
        # - It makes parameters of the interface complicate
@@ -797,14 +797,14 @@ class LocalPITProvider(PITProvider):
        cur_time_int = int(cur_time.year) * 10000 + int(cur_time.month) * 100 + int(cur_time.day)
        loc = np.searchsorted(data["date"], cur_time_int, side="right")
        if loc <= 0:
-            return pd.Series()
+            return pd.Series(dtype=C.pit_record_type["value"])
        last_period = data["period"][:loc].max()  # return the latest quarter
        first_period = data["period"][:loc].min()
        period_list = get_period_list(first_period, last_period, quarterly)
        if period is not None:
            # NOTE: `period` has higher priority than `start_index` & `end_index`
            if period not in period_list:
-                return pd.Series()
+                return pd.Series(dtype=C.pit_record_type["value"])
            else:
                period_list = [period]
        else:
@@ -868,7 +868,7 @@ class LocalExpressionProvider(ExpressionProvider):
        # Ensure that each column type is consistent
        # FIXME:
        # 1) The stock data is currently float. If there is other types of data, this part needs to be re-implemented.
-        # 2) The the precision should be configurable
+        # 2) The precision should be configurable
        try:
            series = series.astype(np.float32)
        except ValueError:
--- a/qlib/data/dataset/init.py
+++ b/qlib/data/dataset/init.py
@@ -417,7 +417,7 @@ class TSDataSampler:
            # NOTE: bool(np.nan) is True !!!!!!!!
            # make sure reindex comes first. Otherwise extra NaN may appear.
            flt_data = flt_data.swaplevel()
-            flt_data = flt_data.reindex(self.data_index).fillna(False).astype(np.bool)
+            flt_data = flt_data.reindex(self.data_index).fillna(False).astype(bool)
            self.flt_data = flt_data.values
            self.idx_map = self.flt_idx_map(self.flt_data, self.idx_map)
            self.data_index = self.data_index[np.where(self.flt_data)[0]]
--- a/qlib/data/dataset/handler.py
+++ b/qlib/data/dataset/handler.py
@@ -720,3 +720,26 @@ class DataHandlerLP(DataHandler):
        ]:
            setattr(new_hd, key, getattr(handler, key, None))
        return new_hd
+
+    @classmethod
+    def from_df(cls, df: pd.DataFrame) -> "DataHandlerLP":
+        """
+        Motivation:
+        - When users want to get a quick data handler.
+
+        The created data handler will have only one shared DataFrame without processors.
+        After creating the handler, users may often want to dump the handler for reuse.
+        Here is a typical use case:
+
+        .. code-block:: python
+
+            from qlib.data.dataset import DataHandlerLP
+            dh = DataHandlerLP.from_df(df)
+            dh.to_pickle(fname, dump_all=True)
+
+        TODO:
+        - The StaticDataLoader is quite slow. It doesn't have to copy the data again...
+
+        """
+        loader = data_loader_module.StaticDataLoader(df)
+        return cls(data_loader=loader)
--- a/qlib/data/dataset/utils.py
+++ b/qlib/data/dataset/utils.py
@@ -2,9 +2,8 @@
 # Licensed under the MIT License.
 from __future__ import annotations
 import pandas as pd
-from typing import Union, List
+from typing import Union, List, TYPE_CHECKING
 from qlib.utils import init_instance_by_config
-from typing import TYPE_CHECKING

 if TYPE_CHECKING:
    from qlib.data.dataset import DataHandler
@@ -121,7 +120,7 @@ def convert_index_format(df: Union[pd.DataFrame, pd.Series], level: str = "datet
    return df


-def init_task_handler(task: dict) -> Union[DataHandler, None]:
+def init_task_handler(task: dict) -> DataHandler:
    """
    initialize the handler part of the task **inplace**

@@ -142,5 +141,6 @@ def init_task_handler(task: dict) -> Union[DataHandler, None]:
    if h_conf is not None:
        handler = init_instance_by_config(h_conf, accept_types=DataHandler)
        task["dataset"]["kwargs"]["handler"] = handler
-
        return handler
+    else:
+        raise ValueError("The task does not contain a handler part.")
--- a/qlib/finco/.env.example
+++ b/qlib/finco/.env.example
@@ -0,0 +1,18 @@
+
+OPENAI_API_KEY=your_api_key
+
+# USE_AZURE=True
+# AZURE_API_BASE=your_api_base
+# AZURE_API_VERSION=your_api_version
+
+# using gpt-4 allows more tokens but takes longer to respond
+# MODEL=gpt-4
+# MAX_TOKENS=1600
+# MAX_RETRY=1000
+
+
+MAX_TOKENS=1600
+MAX_RETRY=120
+
+CONTINOUS_MODE=True
+DEBUG_MODE=True
--- a/qlib/finco/README.md
+++ b/qlib/finco/README.md
@@ -0,0 +1,22 @@
+# This is an experimental branch of "`FI`nancial `CO`pilot of `Qlib`"
+
+## Installation
+
+- To run this module, first install Qlib by following the instructions in [install-from-source](/README.md#install-from-source), or run:
+
+```sh
+python -m pip install git+https://github.com/microsoft/qlib.git@finco
+```
+
+- Then install the other dependencies of finco:
+```sh
+python -m pip install pydantic openai python-dotenv
+```
+
+## Quick run
+
+To run this module, you can start the workflow easily with one command:
+
+```sh
+cd qlib/finco; python cli.py "your prompt"
+```
--- a/qlib/finco/init.py
+++ b/qlib/finco/init.py
@@ -0,0 +1,13 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+from pathlib import Path
+
+DIRNAME = Path(__file__).absolute().resolve().parent
+
+
+def get_finco_path() -> Path:
+    """
+    Return the path of the finco package directory.
+    The install location is not known in advance, so it is derived from this module's __file__.
+    """
+    return DIRNAME
--- a/qlib/finco/cli.py
+++ b/qlib/finco/cli.py
@@ -0,0 +1,15 @@
+import fire
+from qlib.finco.workflow import WorkflowManager
+from dotenv import load_dotenv
+from qlib import auto_init
+
+
+def main(prompt=None):
+    load_dotenv(verbose=True, override=True)
+    wm = WorkflowManager()
+    wm.run(prompt)
+
+
+if __name__ == "__main__":
+    auto_init()
+    fire.Fire(main)
--- a/qlib/finco/cli_learn.py
+++ b/qlib/finco/cli_learn.py
@@ -0,0 +1,15 @@
+import fire
+from qlib.finco.workflow import LearnManager
+from dotenv import load_dotenv
+from qlib import auto_init
+
+
+def main(prompt=None):
+    load_dotenv(verbose=True, override=True)
+    lm = LearnManager()
+    lm.run(prompt)
+
+
+if __name__ == "__main__":
+    auto_init()
+    fire.Fire(main)
--- a/qlib/finco/conf.py
+++ b/qlib/finco/conf.py
@@ -0,0 +1,32 @@
+# TODO: use pydantic for other modules in Qlib
+from pydantic import BaseSettings
+from qlib.finco.utils import SingletonBaseClass
+
+import os
+
+
+class Config(SingletonBaseClass):
+    """
+    This config is for fast demo purposes.
+    Please use BaseSettings instead in the future.
+    """
+
+    def __init__(self):
+        self.use_azure = os.getenv("USE_AZURE") == "True"
+        self.temperature = 0.5 if os.getenv("TEMPERATURE") is None else float(os.getenv("TEMPERATURE"))
+        self.max_tokens = 800 if os.getenv("MAX_TOKENS") is None else int(os.getenv("MAX_TOKENS"))
+
+        self.openai_api_key = os.getenv("OPENAI_API_KEY")
+        self.use_azure = os.getenv("USE_AZURE") == "True"
+        self.azure_api_base = os.getenv("AZURE_API_BASE")
+        self.azure_api_version = os.getenv("AZURE_API_VERSION")
+        self.model = os.getenv("MODEL") or ("gpt-35-turbo" if self.use_azure else "gpt-3.5-turbo")
+
+        self.max_retry = int(os.getenv("MAX_RETRY")) if os.getenv("MAX_RETRY") is not None else None
+
+        self.continuous_mode = (
+            os.getenv("CONTINOUS_MODE") == "True" if os.getenv("CONTINOUS_MODE") is not None else False
+        )
+        self.debug_mode = os.getenv("DEBUG_MODE") == "True" if os.getenv("DEBUG_MODE") is not None else False
+        self.workspace = os.getenv("WORKSPACE") if os.getenv("WORKSPACE") is not None else "./finco_workspace"
+        self.max_past_message_include = int(os.getenv("MAX_PAST_MESSAGE_INCLUDE") or 6) // 2 * 2
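The `Config` class repeats the same pattern for each setting: read an environment variable, fall back to a default when it is unset, and convert the type. A minimal sketch of that pattern as small helpers follows. `env_bool` and `env_int` are hypothetical names introduced for illustration and are not part of this commit.

```python
import os
from typing import Optional


def env_bool(name: str, default: bool = False) -> bool:
    """Return True only when the variable is exactly "True"; use `default` if unset."""
    value = os.getenv(name)
    return default if value is None else value == "True"


def env_int(name: str, default: Optional[int] = None) -> Optional[int]:
    """Return the variable parsed as int, or `default` if it is unset."""
    value = os.getenv(name)
    return default if value is None else int(value)


# Simulate a loaded .env file for the demonstration.
os.environ["MAX_TOKENS"] = "1600"
os.environ["DEBUG_MODE"] = "True"
os.environ.pop("MAX_RETRY", None)

print(env_int("MAX_TOKENS", 800), env_bool("DEBUG_MODE"), env_int("MAX_RETRY"))
```

As in the class above, any value other than the literal string `True` (for example `true` or `1`) is treated as false.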
--- a/qlib/finco/knowledge.py
+++ b/qlib/finco/knowledge.py
@@ -0,0 +1,156 @@
+from pathlib import Path
+from jinja2 import Template
+from typing import List
+
+from qlib.workflow import R
+from qlib.finco.log import FinCoLog
+from qlib.finco.llm import APIBackend
+
+
+class Knowledge:
+    """
+    Used to handle knowledge in FinCo, such as experiments and outside domain information
+    """
+
+    def __init__(self):
+        self.logger = FinCoLog()
+
+    def load(self, **kwargs):
+        """
+        Load knowledge into memory
+
+        Parameters
+        ----------
+
+        Return
+        ------
+        """
+        raise NotImplementedError(f"Please implement the `load` method.")
+
+    def brief(self, **kwargs):
+        """
+        Return a brief summary of knowledge
+
+        Parameters
+        ----------
+
+        Return
+        ------
+        """
+        raise NotImplementedError("Please implement the `brief` method.")
+
+
+class KnowledgeExperiment(Knowledge):
+    """
+    Handle knowledge from experiments
+    """
+
+    def __init__(self, exp_name, rec_id=None):
+        super().__init__()
+        self.exp_name = exp_name
+        self.exp = None
+        self.recs = []
+
+        self.load(exp_name=exp_name, rec_id=rec_id)
+
+    def load(self, exp_name, rec_id=None):
+        recs = []
+        self.exp = R.get_exp(experiment_name=exp_name)
+        for r in self.exp.list_recorders(rtype=self.exp.RT_L):
+            if rec_id is not None and r.id != rec_id:
+                continue
+            recs.append(r)
+        self.recs.extend(recs)
+
+    def brief(self):
+        docs = []
+        for recorder in self.recs:
+            docs.append({"exp_name": self.exp.name, "record_info": recorder.info,
+                         "config": recorder.load_object("config"),
+                         "context_summary": recorder.load_object("context_summary")})
+
+        return docs
+
+
+class Topic:
+
+    def __init__(self, name: str, describe: Template):
+        self.name = name
+        self.describe = describe
+        self.docs = []
+        self.knowledge = None
+        self.logger = FinCoLog()
+
+    def summarize(self, docs: list):
+        self.logger.info(f"Summarize topic: \nname: {self.name}\ndescribe: {self.describe.module}")
+        prompt_workflow_selection = self.describe.render(docs=docs)
+        response = APIBackend().build_messages_and_create_chat_completion(
+            user_prompt=prompt_workflow_selection
+        )
+
+        self.knowledge = response
+        self.docs = docs
+
+
+class KnowledgeBase:
+    """
+    Load knowledge, and offer brief summaries of it along with common handling interfaces
+    """
+
+    def __init__(self, init_path=None, topics: List[Topic] = None):
+        self.logger = FinCoLog()
+        init_path = Path(init_path) if init_path else Path.cwd()
+
+        if not init_path.exists():
+            self.logger.warning(f"{init_path} does not exist, creating an empty directory.")
+            init_path.mkdir(parents=True, exist_ok=True)
+
+        self.knowledge = self.load(path=init_path)
+
+        # todo: replace list with persistent storage strategy such as ES/pinecone to enable
+        # literal search/semantic search
+        self.docs = self.brief(knowledge=self.knowledge)
+
+        self.topics = topics if topics else []
+
+    def load(self, path) -> List:
+        if isinstance(path, str):
+            path = Path(path)
+
+        knowledge = []
+        path = path if path.name == "mlruns" else path.joinpath("mlruns")
+        R.set_uri(path.as_uri())
+        for exp_name in R.list_experiments():
+            knowledge.append(KnowledgeExperiment(exp_name=exp_name))
+
+        self.logger.plain_info(f"Load knowledge from: {path} finished.")
+        return knowledge
+
+    def update(self, path):
+        # note: only update new knowledge in future
+        knowledge = self.load(path)
+        self.knowledge = knowledge
+        self.docs = self.brief(self.knowledge)
+        self.logger.plain_info(f"Update knowledge finished.")
+
+    def brief(self, knowledge: List[Knowledge]) -> List:
+        docs = []
+        for k in knowledge:
+            docs.extend(k.brief())
+
+        self.logger.plain_info(f"Generate brief knowledge summary finished.")
+        return docs
+
+    def query(self, content: str = None):
+        # todo: query by DSL
+        return self.docs
+
+    def query_topics(self):
+        knowledge_of_topics = []
+        for topic in self.topics:
+            knowledge_of_topics.append({topic.name: topic.knowledge})
+        return knowledge_of_topics
+
+    def summarize_by_topic(self):
+        for topic in self.topics:
+            topic.summarize(self.docs)
--- a/qlib/finco/llm.py
+++ b/qlib/finco/llm.py
@@ -0,0 +1,111 @@
+import os
+import time
+import openai
+import json
+from typing import Optional
+from qlib.finco.conf import Config
+from qlib.finco.utils import SingletonBaseClass
+from qlib.finco.log import FinCoLog
+
+
+class APIBackend(SingletonBaseClass):
+    def __init__(self):
+        self.cfg = Config()
+        openai.api_key = self.cfg.openai_api_key
+        if self.cfg.use_azure:
+            openai.api_type = "azure"
+            openai.api_base = self.cfg.azure_api_base
+            openai.api_version = self.cfg.azure_api_version
+        self.use_azure = self.cfg.use_azure
+
+        self.debug_mode = False
+        if self.cfg.debug_mode:
+            self.debug_mode = True
+            cwd = os.getcwd()
+            self.cache_file_location = os.path.join(cwd, "prompt_cache.json")
+            self.cache = (
+                json.load(open(self.cache_file_location, "r")) if os.path.exists(self.cache_file_location) else {}
+            )
+
+    def build_messages_and_create_chat_completion(self, user_prompt, system_prompt=None, former_messages=[], **kwargs):
+        """build the messages to avoid implementing several redundant lines of code"""
+        cfg = Config()
+        # TODO: system prompt should always be provided. In development stage we can use default value
+        if system_prompt is None:
+            try:
+                system_prompt = cfg.system_prompt
+            except AttributeError:
+                FinCoLog().warning("system_prompt is not set, using default value.")
+                system_prompt = "You are an AI assistant who helps to answer user's questions about finance."
+        messages = [
+            {
+                "role": "system",
+                "content": system_prompt,
+            }
+        ]
+        messages.extend(former_messages[-1*cfg.max_past_message_include:])
+        messages.append(
+            {
+                "role": "user",
+                "content": user_prompt,
+            }
+        )
+        fcl = FinCoLog()
+        response = self.try_create_chat_completion(messages=messages, **kwargs)
+        fcl.log_message(messages)
+        fcl.log_response(response)
+        return response
+
+    def try_create_chat_completion(self, max_retry=10, **kwargs):
+        max_retry = self.cfg.max_retry if self.cfg.max_retry is not None else max_retry
+        for i in range(max_retry):
+            try:
+                response = self.create_chat_completion(**kwargs)
+                return response
+            except (openai.error.RateLimitError, openai.error.Timeout, openai.error.APIError) as e:
+                print(e)
+                print(f"Retrying {i+1}th time...")
+                time.sleep(1)
+                continue
+            except openai.InvalidRequestError as e:
+                print("Invalid request, will try to reduce the messages length and retry...")
+                if len(kwargs["messages"]) > 2:
+                    kwargs["messages"] = kwargs["messages"][:1] + kwargs["messages"][3:]
+                    continue
+                raise e
+        raise Exception(f"Failed to create chat completion after {max_retry} retries.")
+
+    def create_chat_completion(
+        self,
+        messages,
+        model=None,
+        temperature: Optional[float] = None,
+        max_tokens: Optional[int] = None,
+    ) -> str:
+
+        if self.debug_mode:
+            key = json.dumps(messages)
+            if key in self.cache:
+                return self.cache[key]
+
+        if temperature is None:
+            temperature = self.cfg.temperature
+        if max_tokens is None:
+            max_tokens = self.cfg.max_tokens
+
+        if self.cfg.use_azure:
+            response = openai.ChatCompletion.create(
+                engine=self.cfg.model,
+                messages=messages,
+                max_tokens=self.cfg.max_tokens,
+            )
+        else:
+            response = openai.ChatCompletion.create(
+                model=self.cfg.model,
+                messages=messages,
+            )
+        resp = response.choices[0].message["content"]
+        if self.debug_mode:
+            self.cache[key] = resp
+            json.dump(self.cache, open(self.cache_file_location, "w"))
+        return resp
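Review note on the retry path in `try_create_chat_completion`: when a request is rejected as invalid, the intent appears to be to keep the system message and drop the two oldest history messages before retrying. A minimal sketch of that trimming on plain lists, assuming the first message is the system prompt and the last one is the current user prompt (`drop_oldest_pair` is a hypothetical name, not part of this PR):

```python
from typing import Dict, List

Message = Dict[str, str]


def drop_oldest_pair(messages: List[Message]) -> List[Message]:
    """Keep the system message and drop the two oldest messages after it."""
    # Slicing with [:1] yields a list, so it can be concatenated with the tail.
    return messages[:1] + messages[3:]


history = [
    {"role": "system", "content": "sys"},
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
]
trimmed = drop_oldest_pair(history)
print([m["content"] for m in trimmed])
```

With only three messages the same slice would also remove the current user prompt, so a caller may want to require at least four messages before trimming.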
--- a/qlib/finco/log.py
+++ b/qlib/finco/log.py
@@ -0,0 +1,131 @@
+"""
+This module is based on Qlib's logger module and provides some interactive functions.
+"""
+import logging
+
+from typing import Dict, List
+from qlib.finco.utils import SingletonBaseClass
+from contextlib import contextmanager
+
+
+class LogColors:
+    """
+    ANSI color codes for use in console output.
+    """
+    RED = "\033[91m"
+    GREEN = "\033[92m"
+    YELLOW = "\033[93m"
+    BLUE = "\033[94m"
+    MAGENTA = "\033[95m"
+    CYAN = "\033[96m"
+    WHITE = "\033[97m"
+    GRAY = "\033[90m"
+    BLACK = "\033[30m"
+
+    BOLD = "\033[1m"
+    ITALIC = "\033[3m"
+
+    END = "\033[0m"
+
+    @classmethod
+    def get_all_colors(cls):
+        names = dir(cls)
+        names = [name for name in names if not name.startswith("__") and not callable(getattr(cls, name))]
+        var_values = [getattr(cls, name) for name in names]
+        return var_values
+
+    def render(self, text: str, color: str = "", style: str = ""):
+        """
+        Render text with the given color and style. Passing text that is already rendered is not recommended.
+        """
+        # This method is called too frequently, which is not good.
+        colors = self.get_all_colors()
+        # Perhaps color and font should be distinguished here.
+        if color:
+            assert color in colors, f"color should be in: {colors} but now is: {color}"
+        if style:
+            assert style in colors, f"style should be in: {colors} but now is: {style}"
+
+        text = f"{color}{text}{self.END}"
+        text = f"{style}{text}{self.END}"
+
+        return text
+
+
+@contextmanager
+def formatting_log(logger, title="Info"):
+    """
+    A context manager that prints lines before and after a function.
+    """
+    length = {"Start": 120, "Task": 120, "Info": 60, "Interact": 60, "End": 120}.get(title, 60)
+    color, bold = (LogColors.YELLOW, LogColors.BOLD) \
+        if title in ["Start", "Task", "Info", "Interact", "End"] else (LogColors.CYAN, "")
+    logger.info("")
+    logger.info(f"{color}{bold}{'-'} {title} {'-' * (length - len(title))}{LogColors.END}")
+    yield
+    logger.info("")
+
+
+class FinCoLog(SingletonBaseClass):
+    # TODO:
+    # - config to file logger and save it into workspace
+    def __init__(self) -> None:
+        self.logger = logging.Logger("interactive")
+        # TODO:  merge these with Qlib's default logger.
+        #  We can do the same thing by changing the default log dict of Qlib.
+        #  Reference: https://github.com/microsoft/qlib/blob/main/qlib/config.py#L155
+
+        handler = logging.StreamHandler()
+        handler.setFormatter(logging.Formatter("%(message)s"))
+        self.logger.addHandler(handler)
+        self.logger.setLevel(logging.INFO)
+
+    def log_message(self, messages: List[Dict[str, str]]):
+        """
+        messages is a list of dicts like this: [
+            {
+                "role": "system",
+                "content": system_prompt,
+            },
+            {
+                "role": "user",
+                "content": user_prompt,
+            },
+        ]
+        """
+        with formatting_log(self.logger, "GPT Messages"):
+            for m in messages:
+                self.logger.info(
+                    f"{LogColors.MAGENTA}{LogColors.BOLD}Role:{LogColors.END} "
+                    f"{LogColors.CYAN}{m['role']}{LogColors.END}\n"
+                    + f"{LogColors.MAGENTA}{LogColors.BOLD}Content:{LogColors.END} "
+                      f"{LogColors.CYAN}{m['content']}{LogColors.END}\n")
+
+    def log_response(self, response: str):
+        with formatting_log(self.logger, "GPT Response"):
+            self.logger.info(
+                f"{LogColors.CYAN}{response}{LogColors.END}\n")
+
+    # TODO:
+    # It looks weird if we only have logger
+    def info(self, *args, plain=False, title="Info"):
+        if plain:
+            return self.plain_info(*args)
+        with formatting_log(self.logger, title):
+            for arg in args:
+                self.logger.info(f"{LogColors.WHITE}{arg}{LogColors.END}")
+
+    def plain_info(self, *args):
+        for arg in args:
+            self.logger.info(
+                f"{LogColors.YELLOW}{LogColors.BOLD}Info:{LogColors.END}{LogColors.WHITE}{arg}{LogColors.END}")
+
+    def warning(self, *args):
+        for arg in args:
+            self.logger.warning(
+                f"{LogColors.BLUE}{LogColors.BOLD}Warning:{LogColors.END}{arg}")
+
+    def error(self, *args):
+        for arg in args:
+            self.logger.error(
+                f"{LogColors.RED}{LogColors.BOLD}Error:{LogColors.END}{arg}")
--- a/qlib/finco/prompt_template.py
+++ b/qlib/finco/prompt_template.py
@@ -0,0 +1,32 @@
+from typing import Union
+from pathlib import Path
+from jinja2 import Template
+import yaml
+
+from qlib.finco.utils import SingletonBaseClass
+from qlib.finco import get_finco_path
+
+
+class PromptTemplate(SingletonBaseClass):
+    def __init__(self) -> None:
+        super().__init__()
+        _template = yaml.load(open(Path.joinpath(get_finco_path(), "prompt_template.yaml"), "r"),
+                              Loader=yaml.FullLoader)
+        for k, v in _template.items():
+            if k == "mods":
+                continue
+            self.__setattr__(k, Template(v))
+
+    def get(self, key: str):
+        return self.__dict__.get(key, Template(""))
+
+    def update(self, key: str, value):
+        self.__setattr__(key, value)
+
+    def save(self, file_path: Union[str, Path]):
+        if isinstance(file_path, str):
+            file_path = Path(file_path)
+        file_path.parent.mkdir(parents=True, exist_ok=True)
+
+        with open(file_path, 'w') as f:
+            yaml.dump(self.__dict__, f)
--- a/qlib/finco/prompt_template.yaml
+++ b/qlib/finco/prompt_template.yaml
--- a/qlib/finco/task.py
+++ b/qlib/finco/task.py
--- a/qlib/finco/tpl/README.md
+++ b/qlib/finco/tpl/README.md
@@ -0,0 +1,12 @@
+This is a set of templates that should be copied for a new project.
+
+Here are the explanations for the templates folder.
+
+| folder | explanations                                                     |
+|--------|------------------------------------------------------------------|
+| sl     | Default configuration for supervised learning                    |
+| sl-cfg | Like sl, but the dataset is highly configurable                  |
+
+
+# TODO
+- [ ] [Copier](https://copier.readthedocs.io/en/stable/#quick-start) may be useful if the generation process becomes complicated
--- a/qlib/finco/tpl/init.py
+++ b/qlib/finco/tpl/init.py
@@ -0,0 +1,13 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+from pathlib import Path
+
+DIRNAME = Path(__file__).absolute().resolve().parent
+
+
+def get_tpl_path() -> Path:
+    """
+    Return the template path.
+    The install location of this folder is not known in advance, so `__file__` of this module is used to locate it.
+    """
+    return DIRNAME
--- a/qlib/finco/tpl/sl-cfg/workflow_config.yaml
+++ b/qlib/finco/tpl/sl-cfg/workflow_config.yaml
--- a/qlib/finco/tpl/sl/workflow_config.yaml
+++ b/qlib/finco/tpl/sl/workflow_config.yaml
@@ -0,0 +1,73 @@
+qlib_init:
+    provider_uri: "~/.qlib/qlib_data/cn_data"
+    region: cn
+experiment_name: finCo
+market: &market csi300
+benchmark: &benchmark SH000300
+data_handler_config: &data_handler_config
+    start_time: 2008-01-01
+    end_time: 2020-08-01
+    fit_start_time: 2008-01-01
+    fit_end_time: 2014-12-31
+    instruments: *market
+port_analysis_config: &port_analysis_config
+    strategy:
+        class: TopkDropoutStrategy
+        module_path: qlib.contrib.strategy
+        kwargs:
+            model: <MODEL>
+            dataset: <DATASET>
+            topk: 50
+            n_drop: 5
+    backtest:
+        start_time: 2017-01-01
+        end_time: 2020-08-01
+        account: 100000000
+        benchmark: *benchmark
+        exchange_kwargs:
+            limit_threshold: 0.095
+            deal_price: close
+            open_cost: 0.0005
+            close_cost: 0.0015
+            min_cost: 5
+task:
+    model:
+        class: LGBModel
+        module_path: qlib.contrib.model.gbdt
+        kwargs:
+            loss: mse
+            colsample_bytree: 0.8879
+            learning_rate: 0.2
+            subsample: 0.8789
+            lambda_l1: 205.6999
+            lambda_l2: 580.9768
+            max_depth: 8
+            num_leaves: 210
+            num_threads: 20
+    dataset:
+        class: DatasetH
+        module_path: qlib.data.dataset
+        kwargs:
+            handler:
+                class: Alpha158
+                module_path: qlib.contrib.data.handler
+                kwargs: *data_handler_config
+            segments:
+                train: [2008-01-01, 2014-12-31]
+                valid: [2015-01-01, 2016-12-31]
+                test: [2017-01-01, 2020-08-01]
+    record: 
+        - class: SignalRecord
+          module_path: qlib.workflow.record_temp
+          kwargs: 
+            model: <MODEL>
+            dataset: <DATASET>
+        - class: SigAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs: 
+            ana_long_short: False
+            ann_scaler: 252
+        - class: PortAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs: 
+            config: *port_analysis_config
--- a/qlib/finco/utils.py
+++ b/qlib/finco/utils.py
@@ -0,0 +1,38 @@
+import json
+
+from fuzzywuzzy import fuzz
+
+
+class SingletonMeta(type):
+    _instance = None
+
+    def __call__(cls, *args, **kwargs):
+        if cls._instance is None:
+            cls._instance = super(SingletonMeta, cls).__call__(*args, **kwargs)
+        return cls._instance
+
+
+class SingletonBaseClass(metaclass=SingletonMeta):
+    """
+    Because we try to support defining Singleton with `class A(SingletonBaseClass)` instead of `A(metaclass=SingletonMeta)`
+    This class becomes necessary
+
+    """
+    # TODO: Move this class to Qlib's general utils.
+
+
+def parse_json(response):
+    try:
+        return json.loads(response)
+    except json.decoder.JSONDecodeError:
+        pass
+
+    raise Exception(f"Failed to parse response: {response}, please report it or help us to fix it.")
+
+
+def similarity(text1, text2):
+    text1 = text1 if isinstance(text1, str) else ""
+    text2 = text2 if isinstance(text2, str) else ""
+
+    # Maybe we can use other similarity algorithms such as TF-IDF
+    return fuzz.ratio(text1, text2)
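Review note on `SingletonMeta`: the instance is stored in a single `_instance` attribute, so a class that inherits from another singleton class would find its parent's instance through normal attribute lookup once the parent has been instantiated. One possible variant, shown only as a sketch and not what this PR implements, keeps one instance per class in a dictionary (`PerClassSingletonMeta`, `Base` and `Child` are hypothetical names):

```python
class PerClassSingletonMeta(type):
    """Hypothetical variant: one instance per concrete class."""

    # Shared registry that maps each class to its single instance.
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Base(metaclass=PerClassSingletonMeta):
    pass


class Child(Base):
    pass


base_a, base_b, child = Base(), Base(), Child()
print(base_a is base_b, type(child).__name__)
```

In the current diff every singleton derives directly from `SingletonBaseClass`, so the difference only matters if singletons are subclassed later.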
--- a/qlib/finco/workflow.py
+++ b/qlib/finco/workflow.py
@@ -0,0 +1,223 @@
+import sys
+import copy
+import shutil
+from pathlib import Path
+from typing import List
+
+from qlib.finco.task import HighLevelPlanTask, SummarizeTask, TrainTask
+from qlib.finco.prompt_template import PromptTemplate, Template
+from qlib.finco.log import FinCoLog, LogColors
+from qlib.finco.utils import similarity
+from qlib.finco.llm import APIBackend
+from qlib.finco.conf import Config
+from qlib.finco.knowledge import KnowledgeBase, Topic
+
+
+class WorkflowContextManager:
+    """Context Manager stores the context of the workflow"""
+
+    """All context entries are key-value pairs that store the input, output and status of the whole workflow"""
+
+    def __init__(self) -> None:
+        self.context = {}
+        self.logger = FinCoLog()
+
+    def set_context(self, key, value):
+        if key in self.context:
+            self.logger.warning("The key already exists in the context, the value will be overwritten")
+        self.context[key] = value
+
+    def get_context(self, key):
+        # NOTE: if the key doesn't exist, return None. In the future, we may raise an error to detect abnormal behavior
+        if key not in self.context:
+            self.logger.warning("The key doesn't exist in the context")
+            return None
+        return self.context[key]
+
+    def update_context(self, key, new_value):
+        # NOTE: if the key doesn't exist, only a warning is logged and the key is added. In the future, we may raise an error to detect abnormal behavior
+        if key not in self.context:
+            self.logger.warning("The key doesn't exist in the context")
+        self.context.update({key: new_value})
+
+    def get_all_context(self):
+        """return a deep copy of the context"""
+        """TODO: do we need to return a deep copy?"""
+        return copy.deepcopy(self.context)
+
+    def retrieve(self, query: str) -> dict:
+        if query in self.context.keys():
+            return {query: self.context.get(query)}
+
+        # Note: retrieving information from context by string similarity may be abandoned in the future
+        scores = {}
+        for k, v in self.context.items():
+            scores.update({k: max(similarity(query, k), similarity(query, v))})
+        max_score_key = max(scores, key=scores.get)
+        return {max_score_key: self.context.get(max_score_key)}
+
+    def clear(self, reserve: list = None):
+        if reserve is None:
+            reserve = []
+
+        _context = {k: self.get_context(k) for k in reserve}
+        self.context = _context
+
+
+class WorkflowManager:
+    """This manages the whole task automation workflow including tasks and actions"""
+
+    def __init__(self, workspace=None) -> None:
+        self.logger = FinCoLog()
+
+        if workspace is None:
+            self._workspace = Path.cwd() / "finco_workspace"
+        else:
+            self._workspace = Path(workspace)
+        self.conf = Config()
+        self._confirm_and_rm()
+
+        self.prompt_template = PromptTemplate()
+        self.context = WorkflowContextManager()
+        self.context.set_context("workspace", self._workspace)
+        self.default_user_prompt = "Please help me build a low turnover strategy that focus more on longterm return in China A csi300. Please help to use lightgbm model."
+
+    def _confirm_and_rm(self):
+        # if workspace exists, please confirm and remove it. Otherwise exit.
+        if self._workspace.exists() and not self.conf.continuous_mode:
+            self.logger.info(title="Interact")
+            flag = input(
+                LogColors().render(
+                    f"Will be deleted: \n\t{self._workspace}\n"
+                    f"If you do not need to delete {self._workspace},"
+                    f" please change the workspace dir or rename existing files\n"
+                    f"Are you sure you want to delete, yes(Y/y), no (N/n):",
+                    color=LogColors.WHITE)
+            )
+            if str(flag) not in ["Y", "y"]:
+                sys.exit()
+            else:
+                # remove self._workspace
+                shutil.rmtree(self._workspace)
+        elif self._workspace.exists() and self.conf.continuous_mode:
+            shutil.rmtree(self._workspace)
+
+    def set_context(self, key, value):
+        """Directly call the set_context method of the context manager"""
+        self.context.set_context(key, value)
+
+    def get_context(self) -> WorkflowContextManager:
+        return self.context
+
+    def run(self, prompt: str) -> Path:
+        """
+        The workflow manager is supposed to generate a codebase based on the prompt
+
+        Parameters
+        ----------
+        prompt: str
+            the prompt user gives
+
+        Returns
+        -------
+        Path
+            The workflow manager is expected to produce output that includes a codebase containing generated code, results, and reports in a designated location.
+            The path is returned
+
+            The output path should follow a specific format:
+            - TODO: design
+              There is a summarized report where user can start from.
+        """
+
+        # NOTE: The following items are not designed to make the workflow very flexible.
+        # - The generated tasks can't be changed after getting new information from the execution results.
+        #   - But it is required in some cases: if we want to build an external dataset, it may have to plan like AutoGPT...
+
+        # NOTE: default user prompt might be changed in the future and exposed to the user
+        if prompt is None:
+            self.set_context("user_prompt", self.default_user_prompt)
+        else:
+            self.set_context("user_prompt", prompt)
+        self.logger.info(f"user_prompt: {self.get_context().get_context('user_prompt')}", title="Start")
+
+        # NOTE: list may not be enough for general task list
+        task_list = [HighLevelPlanTask(), SummarizeTask()]
+        task_finished = []
+        while len(task_list):
+            task_list_info = [str(task) for task in task_list]
+
+            # the task list is not long, so sorting it is not a big problem
+            # TODO: sort the task list based on the priority of the task
+            # task_list = sorted(task_list, key=lambda x: x.task_type)
+            t = task_list.pop(0)
+            self.logger.info(f"Task finished: {[str(task) for task in task_finished]}",
+                             f"Task in queue: {task_list_info}",
+                             f"Executing task: {str(t)}",
+                             title="Task")
+
+            t.assign_context_manager(self.context)
+            res = t.execute()
+            t.summarize()
+            task_finished.append(t)
+            self.context.set_context("task_finished", task_finished)
+            self.logger.plain_info(f"{str(t)} finished.\n\n\n")
+
+            task_list = res + task_list
+
+        return self._workspace
+
+
+class LearnManager:
+    __DEFAULT_TOPICS = ["IC", "MaxDropDown"]
+
+    def __init__(self):
+        self.epoch = 0
+        self.wm = WorkflowManager()
+
+        topics = [Topic(name=topic, describe=self.wm.prompt_template.get(f"Topic_{topic}")) for topic in
+                  self.__DEFAULT_TOPICS]
+        self.knowledge_base = KnowledgeBase(init_path=Path.cwd().joinpath('knowledge'), topics=topics)
+
+    def run(self, prompt):
+        # todo: add early stop condition
+        for i in range(10):
+            self.wm.run(prompt)
+            self.knowledge_base.update(self.wm._workspace)
+            self.knowledge_base.summarize_by_topic()
+            self.learn()
+            self.epoch += 1
+
+    def learn(self):
+        workspace = self.wm.context.get_context("workspace")
+
+        def _drop_duplicate_task(_task: List):
+            unique_task = {}
+            for obj in _task:
+                task_name = obj.__class__.__name__
+                if task_name not in unique_task:
+                    unique_task[task_name] = obj
+            return list(unique_task.values())
+
+        # one task may be run several times in the workflow
+        task_finished = _drop_duplicate_task(self.wm.context.get_context("task_finished"))
+
+        user_prompt = self.wm.context.get_context("user_prompt")
+        summary = self.wm.context.get_context("summary")
+
+        for task in task_finished:
+            prompt_workflow_selection = self.wm.prompt_template.get(f"{self.__class__.__name__}_user").render(
+                summary=summary, brief=self.knowledge_base.query_topics(),
+                task_finished=[str(t) for t in task_finished],
+                task=task.__class__.__name__, system=task.system.render(), user_prompt=user_prompt
+            )
+
+            response = APIBackend().build_messages_and_create_chat_completion(
+                user_prompt=prompt_workflow_selection,
+                system_prompt=self.wm.prompt_template.get(f"{self.__class__.__name__}_system").render()
+            )
+
+            # todo: response assertion
+            task.prompt_template.update(key=f"{task.__class__.__name__}_system", value=Template(response))
+
+        self.wm.prompt_template.save(Path.joinpath(workspace, f"prompts/checkpoint_{self.epoch}.yml"))
+        self.wm.context.clear(reserve=["workspace"])
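Review note on the scheduling loop in `WorkflowManager.run`: each executed task may return follow-up tasks, and `task_list = res + task_list` places them ahead of the tasks already waiting, so the plan is expanded depth-first. A minimal sketch with plain strings standing in for task objects (`FOLLOW_UPS` and `run_queue` are hypothetical stand-ins, and the task names that do not appear in the diff are illustrative only):

```python
from typing import Dict, List

# Hypothetical follow-up table: which tasks each task spawns when executed.
FOLLOW_UPS: Dict[str, List[str]] = {
    "HighLevelPlanTask": ["TrainTask", "AnalysisTask"],
    "TrainTask": ["ConfigTask"],
}


def run_queue(initial: List[str]) -> List[str]:
    """Pop the first task, run it, and put its follow-ups at the front of the queue."""
    queue = list(initial)
    finished = []
    while queue:
        task = queue.pop(0)
        finished.append(task)
        # New tasks go before the remaining ones, as in `res + task_list`.
        queue = FOLLOW_UPS.get(task, []) + queue
    return finished


order = run_queue(["HighLevelPlanTask", "SummarizeTask"])
print(order)
```

Because follow-ups are always prepended, `SummarizeTask` in the initial list runs only after everything spawned by the planning task has finished.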
--- a/qlib/rl/contrib/backtest.py
+++ b/qlib/rl/contrib/backtest.py
@@ -30,12 +30,13 @@ def _get_multi_level_executor_config(
    strategy_config: dict,
    cash_limit: float | None = None,
    generate_report: bool = False,
+    data_granularity: str = "1min",
 ) -> dict:
    executor_config = {
        "class": "SimulatorExecutor",
        "module_path": "qlib.backtest.executor",
        "kwargs": {
-            "time_per_step": "5min",  # FIXME: move this into config
+            "time_per_step": data_granularity,
            "verbose": False,
            "trade_type": SimulatorExecutor.TT_PARAL if cash_limit is not None else SimulatorExecutor.TT_SERIAL,
            "generate_report": generate_report,
@@ -154,12 +155,7 @@ def single_with_simulator(
    -------
        If generate_report is True, return execution records and the generated report. Otherwise, return only records.
    """
-    if split == "stock":
-        stock_id = orders.iloc[0].instrument
-        init_qlib(backtest_config["qlib"], part=stock_id)
-    else:
-        day = orders.iloc[0].datetime
-        init_qlib(backtest_config["qlib"], part=day)
+    init_qlib(backtest_config["qlib"])

    stocks = orders.instrument.unique().tolist()

@@ -181,13 +177,14 @@ def single_with_simulator(
            strategy_config=backtest_config["strategies"],
            cash_limit=cash_limit,
            generate_report=generate_report,
+            data_granularity=backtest_config["data_granularity"],
        )

        exchange_config = copy.deepcopy(backtest_config["exchange"])
        exchange_config.update(
            {
                "codes": stocks,
-                "freq": "5min",  # FIXME: move this into config
+                "freq": backtest_config["data_granularity"],
            }
        )

@@ -202,7 +199,7 @@ def single_with_simulator(
        reports.append(simulator.report_dict)
        decisions += simulator.decisions

-    indicator_1day_objs = [report["indicator"]["1day"][1] for report in reports]
+    indicator_1day_objs = [report["indicator_dict"]["1day"][1] for report in reports]
    indicator_info = {k: v for obj in indicator_1day_objs for k, v in obj.order_indicator_his.items()}
    records = _convert_indicator_to_dataframe(indicator_info)
    assert records is None or not np.isnan(records["ffr"]).any()
@@ -253,12 +250,7 @@ def single_with_collect_data_loop(
        If generate_report is True, return execution records and the generated report. Otherwise, return only records.
    """

-    if split == "stock":
-        stock_id = orders.iloc[0].instrument
-        init_qlib(backtest_config["qlib"], part=stock_id)
-    else:
-        day = orders.iloc[0].datetime
-        init_qlib(backtest_config["qlib"], part=day)
+    init_qlib(backtest_config["qlib"])

    trade_start_time = orders["datetime"].min()
    trade_end_time = orders["datetime"].max()
@@ -280,13 +272,14 @@ def single_with_collect_data_loop(
        strategy_config=backtest_config["strategies"],
        cash_limit=cash_limit,
        generate_report=generate_report,
+        data_granularity=backtest_config["data_granularity"],
    )

    exchange_config = copy.deepcopy(backtest_config["exchange"])
    exchange_config.update(
        {
            "codes": stocks,
-            "freq": "5min",  # FIXME: move this into config
+            "freq": backtest_config["data_granularity"],
        }
    )

--- a/qlib/rl/contrib/naive_config_parser.py
+++ b/qlib/rl/contrib/naive_config_parser.py
@@ -100,6 +100,7 @@ def get_backtest_config_fromfile(path: str) -> dict:
        "multiplier": 1.0,
        "output_dir": "outputs_backtest/",
        "generate_report": False,
+        "data_granularity": "1min",
    }
    backtest_config = merge_a_into_b(a=backtest_config, b=backtest_config_default)

--- a/qlib/rl/contrib/train_onpolicy.py
+++ b/qlib/rl/contrib/train_onpolicy.py
@@ -1,21 +1,23 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT License.
+from __future__ import annotations
+
 import argparse
 import os
 import random
+import sys
 import warnings
 from pathlib import Path
 from typing import cast, List, Optional

 import numpy as np
 import pandas as pd
-import qlib
 import torch
 import yaml
 from qlib.backtest import Order
 from qlib.backtest.decision import OrderDir
 from qlib.constant import ONE_MIN
-from qlib.rl.data.pickle_styled import load_simple_intraday_backtest_data
+from qlib.rl.data.native import load_handler_intraday_processed_data
 from qlib.rl.interpreter import ActionInterpreter, StateInterpreter
 from qlib.rl.order_execution import SingleAssetOrderExecutionSimple
 from qlib.rl.reward import Reward
@@ -49,19 +51,17 @@ def _read_orders(order_dir: Path) -> pd.DataFrame:
 class LazyLoadDataset(Dataset):
    def __init__(
        self,
+        data_dir: str,
        order_file_path: Path,
-        data_dir: Path,
        default_start_time_index: int,
        default_end_time_index: int,
    ) -> None:
        self._default_start_time_index = default_start_time_index
        self._default_end_time_index = default_end_time_index

-        self._order_file_path = order_file_path
        self._order_df = _read_orders(order_file_path).reset_index()
-
-        self._data_dir = data_dir
        self._ticks_index: Optional[pd.DatetimeIndex] = None
+        self._data_dir = Path(data_dir)

    def __len__(self) -> int:
        return len(self._order_df)
@@ -74,12 +74,17 @@ class LazyLoadDataset(Dataset):
            # TODO: We only load ticks index once based on the assumption that ticks index of different dates
            # TODO: in one experiment are all the same. If that assumption is not hold, we need to load ticks index
            # TODO: of all dates.
-            backtest_data = load_simple_intraday_backtest_data(
+
+            data = load_handler_intraday_processed_data(
                data_dir=self._data_dir,
                stock_id=row["instrument"],
                date=date,
+                feature_columns_today=[],
+                feature_columns_yesterday=[],
+                backtest=True,
+                index_only=True,
            )
-            self._ticks_index = [t - date for t in backtest_data.get_time_index()]
+            self._ticks_index = [t - date for t in data.today.index]

        order = Order(
            stock_id=row["instrument"],
@@ -104,8 +109,6 @@ def train_and_test(
    run_training: bool,
    run_backtest: bool,
 ) -> None:
-    qlib.init()
-
    order_root_path = Path(data_config["source"]["order_dir"])

    data_granularity = simulator_config.get("data_granularity", 1)
@@ -113,10 +116,11 @@ def train_and_test(
    def _simulator_factory_simple(order: Order) -> SingleAssetOrderExecutionSimple:
        return SingleAssetOrderExecutionSimple(
            order=order,
-            data_dir=Path(data_config["source"]["data_dir"]),
-            ticks_per_step=simulator_config["time_per_step"],
+            data_dir=data_config["source"]["feature_root_dir"],
+            feature_columns_today=data_config["source"]["feature_columns_today"],
+            feature_columns_yesterday=data_config["source"]["feature_columns_yesterday"],
            data_granularity=data_granularity,
-            deal_price_type=data_config["source"].get("deal_price_column", "close"),
+            ticks_per_step=simulator_config["time_per_step"],
            vol_threshold=simulator_config["vol_limit"],
        )

@@ -126,8 +130,8 @@ def train_and_test(
    if run_training:
        train_dataset, valid_dataset = [
            LazyLoadDataset(
+                data_dir=data_config["source"]["feature_root_dir"],
                order_file_path=order_root_path / tag,
-                data_dir=Path(data_config["source"]["data_dir"]),
                default_start_time_index=data_config["source"]["default_start_time_index"] // data_granularity,
                default_end_time_index=data_config["source"]["default_end_time_index"] // data_granularity,
            )
@@ -178,8 +182,8 @@ def train_and_test(

    if run_backtest:
        test_dataset = LazyLoadDataset(
+            data_dir=data_config["source"]["feature_root_dir"],
            order_file_path=order_root_path / "test",
-            data_dir=Path(data_config["source"]["data_dir"]),
            default_start_time_index=data_config["source"]["default_start_time_index"] // data_granularity,
            default_end_time_index=data_config["source"]["default_end_time_index"] // data_granularity,
        )
@@ -205,6 +209,9 @@ def main(config: dict, run_training: bool, run_backtest: bool) -> None:
    if "seed" in config["runtime"]:
        seed_everything(config["runtime"]["seed"])

+    for extra_module_path in config["env"].get("extra_module_paths", []):
+        sys.path.append(extra_module_path)
+
    state_interpreter: StateInterpreter = init_instance_by_config(config["state_interpreter"])
    action_interpreter: ActionInterpreter = init_instance_by_config(config["action_interpreter"])
    reward: Reward = init_instance_by_config(config["reward"])
--- a/qlib/rl/data/integration.py
+++ b/qlib/rl/data/integration.py
@@ -8,48 +8,14 @@ TODO: The implementation here is kind of adhoc. It is better to design a more un

 from __future__ import annotations

-import pickle
 from pathlib import Path
-from typing import List

-import cachetools
-import numpy as np
-import pandas as pd
 import qlib
 from qlib.constant import REG_CN
 from qlib.contrib.ops.high_freq import BFillNan, Cut, Date, DayCumsum, DayLast, FFillNan, IsInf, IsNull, Select
-from qlib.data.dataset import DatasetH
-
-dataset = None


-class DataWrapper:
-    def __init__(
-        self,
-        feature_dataset: DatasetH,
-        backtest_dataset: DatasetH,
-        columns_today: List[str],
-        columns_yesterday: List[str],
-        _internal: bool = False,
-    ):
-        assert _internal, "Init function of data wrapper is for internal use only."
-
-        self.feature_dataset = feature_dataset
-        self.backtest_dataset = backtest_dataset
-        self.columns_today = columns_today
-        self.columns_yesterday = columns_yesterday
-
-    @cachetools.cached(  # type: ignore
-        cache=cachetools.LRUCache(100),
-        key=lambda _, stock_id, date, backtest: (stock_id, date.replace(hour=0, minute=0, second=0), backtest),
-    )
-    def get(self, stock_id: str, date: pd.Timestamp, backtest: bool = False) -> pd.DataFrame:
-        start_time, end_time = date.replace(hour=0, minute=0, second=0), date.replace(hour=23, minute=59, second=59)
-        dataset = self.backtest_dataset if backtest else self.feature_dataset
-        return dataset.handler.fetch(pd.IndexSlice[stock_id, start_time:end_time], level=None)
-
-
-def init_qlib(qlib_config: dict, part: str | None = None) -> None:
+def init_qlib(qlib_config: dict) -> None:
    """Initialize necessary resource to launch the workflow, including data direction, feature columns, etc..

    Parameters
@@ -72,12 +38,8 @@ def init_qlib(qlib_config: dict, part: str | None = None) -> None:
                    "$bidV_1", "$bidV1_1", "$bidV3_1", "$bidV5_1", "$askV_1", "$askV1_1", "$askV3_1", "$askV5_1",
                ],
            }
-    part
-        Identifying which part (stock / date) to load.
    """

-    global dataset  # pylint: disable=W0603
-
    def _convert_to_path(path: str | Path) -> Path:
        return path if isinstance(path, Path) else Path(path)

@@ -118,47 +80,3 @@ def init_qlib(qlib_config: dict, part: str | None = None) -> None:
        redis_port=-1,
        clear_mem_cache=False,  # init_qlib will be called for multiple times. Keep the cache for improving performance
    )
-
-    if part == "skip":
-        return
-
-    # this won't work if it's put outside in case of multiprocessing
-    from qlib.data import D  # noqa pylint: disable=C0415,W0611
-
-    if part is None:
-        feature_path = Path(qlib_config["feature_root_dir"]) / "feature.pkl"
-        backtest_path = Path(qlib_config["feature_root_dir"]) / "backtest.pkl"
-    else:
-        feature_path = Path(qlib_config["feature_root_dir"]) / "feature" / (part + ".pkl")
-        backtest_path = Path(qlib_config["feature_root_dir"]) / "backtest" / (part + ".pkl")
-
-    with feature_path.open("rb") as f:
-        feature_dataset = pickle.load(f)
-    with backtest_path.open("rb") as f:
-        backtest_dataset = pickle.load(f)
-
-    dataset = DataWrapper(
-        feature_dataset,
-        backtest_dataset,
-        qlib_config["feature_columns_today"],
-        qlib_config["feature_columns_yesterday"],
-        _internal=True,
-    )
-
-
-def fetch_features(stock_id: str, date: pd.Timestamp, yesterday: bool = False, backtest: bool = False) -> pd.DataFrame:
-    assert dataset is not None, "You must call init_qlib() before doing this."
-
-    if backtest:
-        fields = ["$close", "$volume"]
-    else:
-        fields = dataset.columns_yesterday if yesterday else dataset.columns_today
-
-    data = dataset.get(stock_id, date, backtest)
-    if data is None or len(data) == 0:
-        # create a fake index, but RL doesn't care about index
-        data = pd.DataFrame(0.0, index=np.arange(240), columns=fields, dtype=np.float32)  # FIXME: hardcode here
-    else:
-        data = data.rename(columns={c: c.rstrip("0") for c in data.columns})
-        data = data[fields]
-    return data
--- a/qlib/rl/data/native.py
+++ b/qlib/rl/data/native.py
@@ -2,17 +2,29 @@
 # Licensed under the MIT License.
 from __future__ import annotations

-from typing import cast
+from pathlib import Path
+from typing import cast, List

 import cachetools
 import pandas as pd
+import pickle
+import os

 from qlib.backtest import Exchange, Order
 from qlib.backtest.decision import TradeRange, TradeRangeByTime
-from qlib.rl.order_execution.utils import get_ticks_slice
-
+from qlib.constant import EPS_T
 from .base import BaseIntradayBacktestData, BaseIntradayProcessedData, ProcessedDataProvider
-from .integration import fetch_features
+
+
+def get_ticks_slice(
+    ticks_index: pd.DatetimeIndex,
+    start: pd.Timestamp,
+    end: pd.Timestamp,
+    include_end: bool = False,
+) -> pd.DatetimeIndex:
+    if not include_end:
+        end = end - EPS_T
+    return ticks_index[ticks_index.slice_indexer(start, end)]


 class IntradayBacktestData(BaseIntradayBacktestData):
@@ -71,6 +83,31 @@ class IntradayBacktestData(BaseIntradayBacktestData):
        return pd.DatetimeIndex([e[1] for e in list(self._exchange.quote_df.index)])


+class DataframeIntradayBacktestData(BaseIntradayBacktestData):
+    """Backtest data from dataframe"""
+
+    def __init__(self, df: pd.DataFrame, price_column: str = "$close0", volume_column: str = "$volume0") -> None:
+        self.df = df
+        self.price_column = price_column
+        self.volume_column = volume_column
+
+    def __repr__(self) -> str:
+        with pd.option_context("memory_usage", False, "display.max_info_columns", 1, "display.large_repr", "info"):
+            return f"{self.__class__.__name__}({self.df})"
+
+    def __len__(self) -> int:
+        return len(self.df)
+
+    def get_deal_price(self) -> pd.Series:
+        return self.df[self.price_column]
+
+    def get_volume(self) -> pd.Series:
+        return self.df[self.volume_column]
+
+    def get_time_index(self) -> pd.DatetimeIndex:
+        return cast(pd.DatetimeIndex, self.df.index)
+
+
@cachetools.cached(  # type: ignore
    cache=cachetools.LRUCache(100),
    key=lambda order, _, __: order.key_by_day,
@@ -103,13 +140,18 @@ def load_backtest_data(
    return backtest_data


-class NTIntradayProcessedData(BaseIntradayProcessedData):
-    """Subclass of IntradayProcessedData. Used to handle NT style data."""
+class HandlerIntradayProcessedData(BaseIntradayProcessedData):
+    """Subclass of IntradayProcessedData. Used to handle handler (bin format) style data."""

    def __init__(
        self,
+        data_dir: Path,
        stock_id: str,
        date: pd.Timestamp,
+        feature_columns_today: List[str],
+        feature_columns_yesterday: List[str],
+        backtest: bool = False,
+        index_only: bool = False,
    ) -> None:
        def _drop_stock_id(df: pd.DataFrame) -> pd.DataFrame:
            df = df.reset_index()
@@ -117,8 +159,18 @@ class NTIntradayProcessedData(BaseIntradayProcessedData):
                df = df.drop(columns=["instrument"])
            return df.set_index(["datetime"])

-        self.today = _drop_stock_id(fetch_features(stock_id, date))
-        self.yesterday = _drop_stock_id(fetch_features(stock_id, date, yesterday=True))
+        path = os.path.join(data_dir, "backtest" if backtest else "feature", f"{stock_id}.pkl")
+        start_time, end_time = date.replace(hour=0, minute=0, second=0), date.replace(hour=23, minute=59, second=59)
+        with open(path, "rb") as fstream:
+            dataset = pickle.load(fstream)
+        data = dataset.handler.fetch(pd.IndexSlice[stock_id, start_time:end_time], level=None)
+
+        if index_only:
+            self.today = _drop_stock_id(data[[]])
+            self.yesterday = _drop_stock_id(data[[]])
+        else:
+            self.today = _drop_stock_id(data[feature_columns_today])
+            self.yesterday = _drop_stock_id(data[feature_columns_yesterday])

    def __repr__(self) -> str:
        with pd.option_context("memory_usage", False, "display.max_info_columns", 1, "display.large_repr", "info"):
@@ -127,12 +179,42 @@ class NTIntradayProcessedData(BaseIntradayProcessedData):

@cachetools.cached(  # type: ignore
    cache=cachetools.LRUCache(100),  # 100 * 50K = 5MB
+    key=lambda data_dir, stock_id, date, feature_columns_today, feature_columns_yesterday, backtest, index_only: (
+        stock_id,
+        date,
+        backtest,
+        index_only,
+    ),
 )
-def load_nt_intraday_processed_data(stock_id: str, date: pd.Timestamp) -> NTIntradayProcessedData:
-    return NTIntradayProcessedData(stock_id, date)
+def load_handler_intraday_processed_data(
+    data_dir: Path,
+    stock_id: str,
+    date: pd.Timestamp,
+    feature_columns_today: List[str],
+    feature_columns_yesterday: List[str],
+    backtest: bool = False,
+    index_only: bool = False,
+) -> HandlerIntradayProcessedData:
+    return HandlerIntradayProcessedData(
+        data_dir, stock_id, date, feature_columns_today, feature_columns_yesterday, backtest, index_only
+    )


-class NTProcessedDataProvider(ProcessedDataProvider):
+class HandlerProcessedDataProvider(ProcessedDataProvider):
+    def __init__(
+        self,
+        data_dir: str,
+        feature_columns_today: List[str],
+        feature_columns_yesterday: List[str],
+        backtest: bool = False,
+    ) -> None:
+        super().__init__()
+
+        self.data_dir = Path(data_dir)
+        self.feature_columns_today = feature_columns_today
+        self.feature_columns_yesterday = feature_columns_yesterday
+        self.backtest = backtest
+
    def get_data(
        self,
        stock_id: str,
@@ -140,4 +222,12 @@ class NTProcessedDataProvider(ProcessedDataProvider):
        feature_dim: int,
        time_index: pd.Index,
    ) -> BaseIntradayProcessedData:
-        return load_nt_intraday_processed_data(stock_id, date)
+        return load_handler_intraday_processed_data(
+            self.data_dir,
+            stock_id,
+            date,
+            self.feature_columns_today,
+            self.feature_columns_yesterday,
+            backtest=self.backtest,
+            index_only=False,
+        )
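Review note: the `key=` lambda on `load_handler_intraday_processed_data` builds the cache key from `(stock_id, date, backtest, index_only)` only, so `data_dir` and the two feature-column lists do not affect cache hits. The sketch below illustrates that behaviour with a small hand-written keyed cache. `cached_by` and `load` are hypothetical stand-ins written for this note (they are not cachetools or qlib APIs), and the loader simply echoes its columns.

```python
from typing import Any, Callable, Dict, List, Tuple


def cached_by(key: Callable[..., Tuple]) -> Callable:
    """Tiny stand-in for a keyed cache decorator (illustrative only)."""

    def decorator(func: Callable) -> Callable:
        store: Dict[Tuple, Any] = {}

        def wrapper(*args: Any, **kwargs: Any) -> Any:
            k = key(*args, **kwargs)  # the key function sees the same arguments as func
            if k not in store:
                store[k] = func(*args, **kwargs)
            return store[k]

        return wrapper

    return decorator


# The key mirrors the shape used in the diff: columns and data_dir are ignored.
@cached_by(key=lambda data_dir, stock_id, date, columns, backtest, index_only: (stock_id, date, backtest, index_only))
def load(data_dir: str, stock_id: str, date: str, columns: List[str], backtest: bool, index_only: bool) -> List[str]:
    # Hypothetical loader: just returns the requested columns.
    return list(columns)


first = load("dir_a", "SH600000", "2020-01-02", ["$close"], False, False)
second = load("dir_a", "SH600000", "2020-01-02", ["$volume"], False, False)
print(first, second)  # the second call is served from the cache
```

If callers always pass the same columns and directory within one process, this is harmless; otherwise the key would presumably need to include them (for example as tuples).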
--- a/qlib/rl/data/pickle_styled.py
+++ b/qlib/rl/data/pickle_styled.py
@@ -158,8 +158,8 @@ class SimpleIntradayBacktestData(BaseIntradayBacktestData):
        return cast(pd.DatetimeIndex, self.data.index)


-class IntradayProcessedData(BaseIntradayProcessedData):
-    """Subclass of IntradayProcessedData. Used to handle Dataset Handler style data."""
+class PickleIntradayProcessedData(BaseIntradayProcessedData):
+    """Subclass of IntradayProcessedData. Used to handle pickle-styled data."""

    def __init__(
        self,
@@ -217,14 +217,14 @@ def load_simple_intraday_backtest_data(
    cache=cachetools.LRUCache(100),  # 100 * 50K = 5MB
    key=lambda data_dir, stock_id, date, feature_dim, time_index: hashkey(data_dir, stock_id, date),
 )
-def load_pickled_intraday_processed_data(
+def load_pickle_intraday_processed_data(
    data_dir: Path,
    stock_id: str,
    date: pd.Timestamp,
    feature_dim: int,
    time_index: pd.Index,
 ) -> BaseIntradayProcessedData:
-    return IntradayProcessedData(data_dir, stock_id, date, feature_dim, time_index)
+    return PickleIntradayProcessedData(data_dir, stock_id, date, feature_dim, time_index)


 class PickleProcessedDataProvider(ProcessedDataProvider):
@@ -240,7 +240,7 @@ class PickleProcessedDataProvider(ProcessedDataProvider):
        feature_dim: int,
        time_index: pd.Index,
    ) -> BaseIntradayProcessedData:
-        return load_pickled_intraday_processed_data(
+        return load_pickle_intraday_processed_data(
            data_dir=self._data_dir,
            stock_id=stock_id,
            date=date,
--- a/qlib/rl/order_execution/simulator_qlib.py
+++ b/qlib/rl/order_execution/simulator_qlib.py
@@ -67,7 +67,7 @@ class SingleAssetOrderExecution(Simulator[Order, SAOEState, float]):
        cash_limit: Optional[float] = None,
    ) -> None:
        if qlib_config is not None:
-            init_qlib(qlib_config, part="skip")
+            init_qlib(qlib_config)

        strategy, self._executor = get_strategy_executor(
            start_time=order.date,
--- a/qlib/rl/order_execution/simulator_simple.py
+++ b/qlib/rl/order_execution/simulator_simple.py
@@ -3,17 +3,19 @@

 from __future__ import annotations

-from pathlib import Path
-from typing import Any, cast, Optional
+from typing import Any, cast, List, Optional

 import numpy as np
 import pandas as pd
+
+from pathlib import Path
 from qlib.backtest.decision import Order, OrderDir
 from qlib.constant import EPS, EPS_T, float_or_ndarray
-from qlib.rl.data.pickle_styled import DealPriceType, load_simple_intraday_backtest_data
+from qlib.rl.data.base import BaseIntradayBacktestData
+from qlib.rl.data.native import DataframeIntradayBacktestData, load_handler_intraday_processed_data
+from qlib.rl.data.pickle_styled import load_simple_intraday_backtest_data
 from qlib.rl.simulator import Simulator
 from qlib.rl.utils import LogLevel
-
 from .state import SAOEMetrics, SAOEState

 __all__ = ["SingleAssetOrderExecutionSimple"]
@@ -36,12 +38,16 @@ class SingleAssetOrderExecutionSimple(Simulator[Order, SAOEState, float]):
    ----------
    order
        The seed to start an SAOE simulator is an order.
+    data_dir
+        Path to load backtest data.
+    feature_columns_today
+        Columns of today's feature.
+    feature_columns_yesterday
+        Columns of yesterday's feature.
    data_granularity
        Number of ticks between consecutive data entries.
    ticks_per_step
        How many ticks per step.
-    data_dir
-        Path to load backtest data
    vol_threshold
        Maximum execution volume (divided by market execution volume).
    """
@@ -73,9 +79,10 @@ class SingleAssetOrderExecutionSimple(Simulator[Order, SAOEState, float]):
        self,
        order: Order,
        data_dir: Path,
+        feature_columns_today: List[str] = [],
+        feature_columns_yesterday: List[str] = [],
        data_granularity: int = 1,
        ticks_per_step: int = 30,
-        deal_price_type: DealPriceType = "close",
        vol_threshold: Optional[float] = None,
    ) -> None:
        super().__init__(initial=order)
@@ -83,18 +90,13 @@ class SingleAssetOrderExecutionSimple(Simulator[Order, SAOEState, float]):
        assert ticks_per_step % data_granularity == 0

        self.order = order
-        self.ticks_per_step: int = ticks_per_step // data_granularity
-        self.deal_price_type = deal_price_type
-        self.vol_threshold = vol_threshold
        self.data_dir = data_dir
-        self.backtest_data = load_simple_intraday_backtest_data(
-            self.data_dir,
-            order.stock_id,
-            pd.Timestamp(order.start_time.date()),
-            self.deal_price_type,
-            order.direction,
-        )
+        self.feature_columns_today = feature_columns_today
+        self.feature_columns_yesterday = feature_columns_yesterday
+        self.ticks_per_step: int = ticks_per_step // data_granularity
+        self.vol_threshold = vol_threshold

+        self.backtest_data = self.get_backtest_data()
        self.ticks_index = self.backtest_data.get_time_index()

        # Get time index available for trading
@@ -118,6 +120,30 @@ class SingleAssetOrderExecutionSimple(Simulator[Order, SAOEState, float]):
        self.market_vol: Optional[np.ndarray] = None
        self.market_vol_limit: Optional[np.ndarray] = None

+    def get_backtest_data(self) -> BaseIntradayBacktestData:
+        try:
+            data = load_handler_intraday_processed_data(
+                data_dir=self.data_dir,
+                stock_id=self.order.stock_id,
+                date=pd.Timestamp(self.order.start_time.date()),
+                feature_columns_today=self.feature_columns_today,
+                feature_columns_yesterday=self.feature_columns_yesterday,
+                backtest=True,
+                index_only=False,
+            )
+            return DataframeIntradayBacktestData(data.today)
+        except (AttributeError, FileNotFoundError):
+            # TODO: For compatibility with older versions of test scripts (tests/rl/test_saoe_simple.py)
+            # TODO: In the future, we should modify the data format used by the test script,
+            # TODO: and then delete this branch.
+            return load_simple_intraday_backtest_data(
+                self.data_dir / "backtest",
+                self.order.stock_id,
+                pd.Timestamp(self.order.start_time.date()),
+                "close",
+                self.order.direction,
+            )
+
    def step(self, amount: float) -> None:
        """Execute one step or SAOE.

--- a/qlib/rl/order_execution/utils.py
+++ b/qlib/rl/order_execution/utils.py
@@ -10,18 +10,7 @@ import pandas as pd

 from qlib.backtest.decision import OrderDir
 from qlib.backtest.executor import BaseExecutor, NestedExecutor, SimulatorExecutor
-from qlib.constant import EPS_T, float_or_ndarray
-
-
-def get_ticks_slice(
-    ticks_index: pd.DatetimeIndex,
-    start: pd.Timestamp,
-    end: pd.Timestamp,
-    include_end: bool = False,
-) -> pd.DatetimeIndex:
-    if not include_end:
-        end = end - EPS_T
-    return ticks_index[ticks_index.slice_indexer(start, end)]
+from qlib.constant import float_or_ndarray


 def dataframe_append(df: pd.DataFrame, other: Any) -> pd.DataFrame:
--- a/qlib/utils/init.py
+++ b/qlib/utils/init.py
@@ -1,6 +1,7 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT License.

+# TODO: this utils module covers too many utilities; please separate it into submodules

 from __future__ import division
 from __future__ import print_function
@@ -43,7 +44,7 @@ is_deprecated_lexsorted_pandas = version.parse(pd.__version__) > version.parse("
 #################### Server ####################
 def get_redis_connection():
    """get redis connection instance."""
-    return redis.StrictRedis(host=C.redis_host, port=C.redis_port, db=C.redis_task_db)
+    return redis.StrictRedis(host=C.redis_host, port=C.redis_port, db=C.redis_task_db, password=C.redis_password)


 #################### Data ####################
@@ -427,7 +428,7 @@ def init_instance_by_config(
            pr = urlparse(config)
            if pr.scheme == "file":
                pr_path = os.path.join(pr.netloc, pr.path) if bool(pr.path) else pr.netloc
-                with open(pr_path, "rb") as f:
+                with open(os.path.normpath(pr_path), "rb") as f:
                    return pickle.load(f)
        else:
            with config.open("rb") as f:
--- a/qlib/utils/data.py
+++ b/qlib/utils/data.py
@@ -1,6 +1,10 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT License.
-from typing import Union
+"""
+This module covers utility functions that operate on data or basic objects.
+"""
+from copy import deepcopy
+from typing import List, Union
 import pandas as pd
 import numpy as np

@@ -54,3 +58,48 @@ def deepcopy_basic_type(obj: object) -> object:
        return {k: deepcopy_basic_type(v) for k, v in obj.items()}
    else:
        return obj
+
+
+S_DROP = "__DROP__"  # this is a symbol which indicates drop the value
+
+
+def update_config(base_config: dict, ext_config: Union[dict, List[dict]]):
+    """
+    Return a copy of base_config updated with ext_config (a dict or a list of dicts applied in order).
+
+    >>> bc = {"a": "xixi"}
+    >>> ec = {"b": "haha"}
+    >>> new_bc = update_config(bc, ec)
+    >>> print(new_bc)
+    {'a': 'xixi', 'b': 'haha'}
+    >>> print(bc)  # base config should not be changed
+    {'a': 'xixi'}
+    >>> print(update_config(bc, {"b": S_DROP}))
+    {'a': 'xixi'}
+    >>> print(update_config(new_bc, {"b": S_DROP}))
+    {'a': 'xixi'}
+    """
+
+    base_config = deepcopy(base_config)  # in case of modifying base config
+
+    for ec in ext_config if isinstance(ext_config, (list, tuple)) else [ext_config]:
+        for key in ec:
+            if key not in base_config:
+                # The key is not in the base config yet.
+                # ADD it unless it is marked with S_DROP
+                if ec[key] != S_DROP:
+                    base_config[key] = ec[key]
+
+            else:
+                if isinstance(base_config[key], dict) and isinstance(ec[key], dict):
+                    # Recursive
+                    # Both of them are dict, then update it nested
+                    base_config[key] = update_config(base_config[key], ec[key])
+                elif ec[key] == S_DROP:
+                    # DROP
+                    del base_config[key]
+                else:
+                    # REPLACE
+                    # At least one of them is not a dict, so replace the value
+                    base_config[key] = ec[key]
+    return base_config
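Review note: the four merge rules in `update_config` (add, recurse into nested dicts, drop via `S_DROP`, replace) can be exercised with a nested config. The sketch below restates the function from this hunk so that it runs without qlib installed; the sample configs (`base`, `ext`) are made up for illustration and are not taken from any shipped workflow file.

```python
from copy import deepcopy
from typing import List, Union

S_DROP = "__DROP__"  # sentinel: remove this key from the base config


def update_config(base_config: dict, ext_config: Union[dict, List[dict]]) -> dict:
    """Restatement of the function added in this diff (same merge rules)."""
    base_config = deepcopy(base_config)  # never mutate the caller's dict
    for ec in ext_config if isinstance(ext_config, (list, tuple)) else [ext_config]:
        for key in ec:
            if key not in base_config:
                if ec[key] != S_DROP:  # ADD unless marked as dropped
                    base_config[key] = ec[key]
            elif isinstance(base_config[key], dict) and isinstance(ec[key], dict):
                base_config[key] = update_config(base_config[key], ec[key])  # RECURSE
            elif ec[key] == S_DROP:
                del base_config[key]  # DROP
            else:
                base_config[key] = ec[key]  # REPLACE
    return base_config


# Hypothetical configs, for illustration only.
base = {"market": "csi500", "task": {"model": {"class": "LGBModel", "lr": 0.2}, "record": ["sig"]}}
ext = [
    {"market": "csi300"},  # replace a scalar
    {"task": {"model": {"lr": S_DROP}}},  # drop a nested key
    {"task": {"record": ["sig", "port"]}},  # lists are replaced, not merged
]
merged = update_config(base, ext)
print(merged)
```

Note that only dicts are merged recursively; any other value type, including lists, is replaced wholesale.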
--- a/qlib/workflow/cli.py
+++ b/qlib/workflow/cli.py
@@ -1,6 +1,6 @@
 #  Copyright (c) Microsoft Corporation.
 #  Licensed under the MIT License.
-
+import logging
 import sys
 import os
 from pathlib import Path
@@ -10,6 +10,12 @@ import fire
 import ruamel.yaml as yaml
 from qlib.config import C
 from qlib.model.trainer import task_train
+from qlib.utils.data import update_config
+from qlib.log import get_module_logger
+from qlib.utils import set_log_with_config
+
+set_log_with_config(C.logging_config)
+logger = get_module_logger("qrun", logging.INFO)


 def get_path_list(path):
@@ -47,10 +53,47 @@ def workflow(config_path, experiment_name="workflow", uri_folder="mlruns"):
    This is a Qlib CLI entrance.
    User can run the whole Quant research workflow defined by a configure file
    - the code is located here ``qlib/workflow/cli.py`
+
+    Users can specify a base config file in their workflow.yml file by adding "BASE_CONFIG_PATH".
+    Qlib will load the configuration in BASE_CONFIG_PATH first, so users only need to set the custom fields
+    in their own workflow.yml file.
+
+    For example:
+
+        qlib_init:
+            provider_uri: "~/.qlib/qlib_data/cn_data"
+            region: cn
+        BASE_CONFIG_PATH: "workflow_config_lightgbm_Alpha158_csi500.yaml"
+        market: csi300
+
    """
    with open(config_path) as fp:
        config = yaml.safe_load(fp)

+    base_config_path = config.get("BASE_CONFIG_PATH", None)
+    if base_config_path:
+        logger.info(f"Use BASE_CONFIG_PATH: {base_config_path}")
+        base_config_path = Path(base_config_path)
+
+        # Look for the base config as given (absolute or relative to the CWD), then relative to the config file
+        if base_config_path.exists():
+            path = base_config_path
+        else:
+            logger.info(
+                f"Can't find BASE_CONFIG_PATH based on: {Path.cwd()}, "
+                f"try using relative path to config path: {Path(config_path).absolute()}"
+            )
+            relative_path = Path(config_path).absolute().parent.joinpath(base_config_path)
+            if relative_path.exists():
+                path = relative_path
+            else:
+                raise FileNotFoundError(f"Can't find the BASE_CONFIG file: {base_config_path}")
+
+        with open(path) as fp:
+            base_config = yaml.safe_load(fp)
+        logger.info(f"Loaded BASE_CONFIG_PATH successfully: {path.resolve()}")
+        config = update_config(base_config, config)
+
    # config the `sys` section
    sys_config(config, config_path)

--- a/qlib/workflow/record_temp.py
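Review note: the lookup order for `BASE_CONFIG_PATH` in the hunk above (first the path as given, then the same path relative to the directory of the workflow config) can be sketched in isolation. `resolve_base_config` is a hypothetical helper written for this note, not a function in qlib, and the file names are invented.

```python
import tempfile
from pathlib import Path


def resolve_base_config(base_config_path: str, config_path: str) -> Path:
    """Hypothetical helper mirroring the lookup order used in `workflow`."""
    candidate = Path(base_config_path)
    if candidate.exists():  # absolute path, or relative to the current working directory
        return candidate
    # Fall back to a path relative to the directory that holds the workflow config.
    relative = Path(config_path).absolute().parent.joinpath(candidate)
    if relative.exists():
        return relative
    raise FileNotFoundError(f"Can't find the BASE_CONFIG file: {base_config_path}")


with tempfile.TemporaryDirectory() as tmp:
    cfg_dir = Path(tmp) / "configs"
    cfg_dir.mkdir()
    (cfg_dir / "base_demo_for_review.yaml").write_text("market: csi500\n")
    (cfg_dir / "workflow.yaml").write_text("BASE_CONFIG_PATH: base_demo_for_review.yaml\n")
    # Assumption: no file with this name exists in the CWD, so the fallback branch is taken.
    found = resolve_base_config("base_demo_for_review.yaml", str(cfg_dir / "workflow.yaml"))
    found_name = found.name
    found_parent_name = found.parent.name
    print(found)
```

Because the working directory is checked first, a file with the same name in the CWD would shadow the one next to the workflow config.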
+++ b/qlib/workflow/record_temp.py
@@ -18,7 +18,7 @@ from ..utils import fill_placeholder, flatten_dict, class_casting, get_date_by_s
 from ..utils.time import Freq
 from ..utils.data import deepcopy_basic_type
 from ..contrib.eva.alpha import calc_ic, calc_long_short_return, calc_long_short_prec
-
+from qlib.contrib.analyzer import HFAnalyzer, SignalAnalyzer

 logger = get_module_logger("workflow", logging.INFO)

@@ -156,6 +156,9 @@ class RecordTemp:
                with class_casting(self, self.depend_cls):
                    self.check(include_self=True)

+    def analyse(self):
+        raise NotImplementedError("Please implement the `analyse` method.")
+

 class SignalRecord(RecordTemp):
    """
--- a/scripts/finco/README.md
+++ b/scripts/finco/README.md
@@ -0,0 +1,15 @@
+
+
+# Requirements
+
+
+Use the following command to install the project with the `finco` extras.
+```
+pip install -e '.[finco]'
+```
+
+
+# TODOs
+
+- [ ] Select the appropriate LLM API
+  - Which API is more suitable for meeting our requirements - the original API or an alternative like LangChain?
--- a/scripts/finco/cmd.sh
+++ b/scripts/finco/cmd.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+set -x # show command
+set -e # exit on error
+
+DIR="$(
+	cd "$(dirname "$(readlink -f "$0")")" || exit
+	pwd -P
+)"
+# -- load the credentials
+if [ -e "$DIR/cridential.sh" ]; then
+	source "$DIR/cridential.sh"
+fi
+
+# run the command
+python -m qlib.finco.cli "please help me build a low turnover strategy that focus more on longterm return"
--- a/scripts/finco/cridential.sh.example
+++ b/scripts/finco/cridential.sh.example
@@ -0,0 +1,3 @@
+export OPENAI_API_TYPE=azure  # This is only necessary for Azure OpenAI
+export OPENAI_API_KEY=
+export OPENAI_API_BASE=
--- a/setup.py
+++ b/setup.py
@@ -170,9 +170,17 @@ setup(
            "gym>=0.24",  # If you do not put gym at the end, gym will degrade causing pytest results to fail.
        ],
        "rl": [
-            "tianshou",
+            "tianshou<=0.4.10",
            "torch",
        ],
+        "finco": [
+            # finco is not necessary for all Qlib users, so a separate extras section is used for it.
+            "openapi",
+            "pydantic",  # Please add it to basic requirements after the design of pydantic is stable.
+            "python-dotenv",  # I don't think this is necessary if we use pydantic.
+            "fuzzywuzzy",
+            "python-Levenshtein",  # not necessary, but accelerates fuzzywuzzy calculation
+        ],
    },
    include_package_data=True,
    classifiers=[
--- a/tests/data_mid_layer_tests/README.md
+++ b/tests/data_mid_layer_tests/README.md
@@ -0,0 +1,5 @@
+# Introduction
+These tests cover the middle layers of data, which mainly include:
+- Handler
+    - processors
+- Datasets
--- a/tests/data_mid_layer_tests/test_dataset.py
+++ b/tests/data_mid_layer_tests/test_dataset.py
--- a/tests/data_mid_layer_tests/test_handler.py
+++ b/tests/data_mid_layer_tests/test_handler.py
@@ -0,0 +1,37 @@
+import os
+import pickle
+import shutil
+import unittest
+from qlib.tests import TestAutoData
+from qlib.data import D
+from qlib.data.dataset.handler import DataHandlerLP
+
+
+class HandlerTests(TestAutoData):
+    def to_str(self, obj):
+        return "".join(str(obj).split())
+
+    def test_handler_df(self):
+        df = D.features(["sh600519"], start_time="20190101", end_time="20190201", fields=["$close"])
+        dh = DataHandlerLP.from_df(df)
+        print(dh.fetch())
+        self.assertTrue(dh._data.equals(df))
+        self.assertTrue(dh._infer is dh._data)
+        self.assertTrue(dh._learn is dh._data)
+        self.assertTrue(dh.data_loader._data is dh._data)
+        fname = "_handler_test.pkl"
+        dh.to_pickle(fname, dump_all=True)
+
+        with open(fname, "rb") as f:
+            dh_d = pickle.load(f)
+
+        self.assertTrue(dh_d._data.equals(df))
+        self.assertTrue(dh_d._infer is dh_d._data)
+        self.assertTrue(dh_d._learn is dh_d._data)
+        # Data loader will no longer be useful
+        self.assertTrue("_data" not in dh_d.data_loader.__dict__.keys())
+        os.remove(fname)
+
+
+if __name__ == "__main__":
+    unittest.main()
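The last assertion in `test_handler_df` expects the unpickled data loader to have no `_data` attribute. One way such behaviour can arise is sketched below with the standard library only. `CachingLoader` is a hypothetical class made up for illustration; it is not Qlib's data loader, and Qlib's actual serialization logic may differ.

```python
import pickle


class CachingLoader:
    """Hypothetical loader that holds a cache which should not be pickled."""

    def __init__(self, data):
        self._data = data  # in-memory cache, assumed to be cheap to rebuild
        self.name = "static"

    def __getstate__(self):
        # Copy the instance dict and drop the cache before serialization.
        state = self.__dict__.copy()
        state.pop("_data", None)
        return state


loader = CachingLoader([1, 2, 3])
restored = pickle.loads(pickle.dumps(loader))
```

The original object keeps its cache; only the pickled copy loses it, which is the property the test checks on `dh_d.data_loader`.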
--- a/tests/data_mid_layer_tests/test_handler_storage.py
+++ b/tests/data_mid_layer_tests/test_handler_storage.py
--- a/tests/data_mid_layer_tests/test_processor.py
+++ b/tests/data_mid_layer_tests/test_processor.py
--- a/tests/finco/test_cfg.py
+++ b/tests/finco/test_cfg.py
@@ -0,0 +1,70 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+import unittest
+import shutil
+import difflib
+import ruamel.yaml as yaml
+
+from qlib.data.dataset.handler import DataHandlerLP
+from qlib.utils import init_instance_by_config
+from qlib.tests import TestAutoData
+
+from pathlib import Path
+from qlib.finco.tpl import get_tpl_path
+from qlib.finco.task import YamlEditTask
+
+DIRNAME = Path(__file__).absolute().resolve().parent
+
+
+class FincoTpl(TestAutoData):
+    def test_tpl_consistence(self):
+        """Motivation: make sure the configurable template is consistent with the default config"""
+        tpl_p = get_tpl_path()
+        with (tpl_p / "sl" / "workflow_config.yaml").open("rb") as fp:
+            config = yaml.safe_load(fp)
+        # init_data_handler
+        hd: DataHandlerLP = init_instance_by_config(config["task"]["dataset"]["kwargs"]["handler"])
+        # NOTE: The config in workflow_config.yaml is generated by the following code:
+        # dump in yaml format to file without auto linebreak
+        # print(yaml.dump(hd.data_loader.fields, width=10000, stream=open("_tmp", "w")))
+
+        with (tpl_p / "sl-cfg" / "workflow_config.yaml").open("rb") as fp:
+            config = yaml.safe_load(fp)
+        hd_ds: DataHandlerLP = init_instance_by_config(config["task"]["dataset"]["kwargs"]["handler"])
+        self.assertEqual(hd_ds.data_loader.fields, hd.data_loader.fields)
+
+        check = hd_ds.fetch().fillna(0.0) == hd.fetch().fillna(0.0)
+        self.assertTrue(check.all().all())
+
+    def test_update_yaml(self):
+        p = get_tpl_path() / "sl" / "workflow_config.yaml"
+        p_new = DIRNAME / "_test_config.yaml"
+        shutil.copy(p, p_new)
+        updated_content = """
+class: LGBModelTest
+module_path: qlib.contrib.model.gbdt
+kwargs:
+    loss: mse
+    colsample_bytree: 1.8879
+    learning_rate: 0.3
+    subsample: 0.8790
+    lambda_l1: 205.7000
+    lambda_l2: 580.9769
+    max_depth: 9
+    num_leaves: 211
+    num_threads: 21
+"""
+        t = YamlEditTask(p_new, "task.model", updated_content)
+        t.execute()
+        # NOTE: ruamel.yaml changes the format, so the files can't be compared as text directly.
+        # Print the diff between p and p_new with difflib:
+        # with p.open("r") as fp:
+        #     content = fp.read().splitlines()
+        # with p_new.open("r") as fp:
+        #     content_new = fp.read().splitlines()
+        # for line in difflib.unified_diff(content, content_new, fromfile="original", tofile="new", lineterm=""):
+        #     print(line)
+
+
+if __name__ == "__main__":
+    unittest.main()
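`difflib.unified_diff` expects sequences of lines, so text read with `read()` has to be split before it is compared; otherwise the diff is computed character by character. A minimal sketch of a line-based comparison follows. The two YAML snippets are made-up stand-ins for the original and edited config files.

```python
import difflib

# Made-up contents standing in for the original and the edited YAML file.
original = "a: 1\nb: 2\n"
edited = "a: 1\nb: 3\n"

diff_lines = list(
    difflib.unified_diff(
        original.splitlines(),  # compare line by line, not character by character
        edited.splitlines(),
        fromfile="original",
        tofile="new",
        lineterm="",  # the lines carry no trailing newline after splitlines()
    )
)
for line in diff_lines:
    print(line)
```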
--- a/tests/finco/test_sumarize.py
+++ b/tests/finco/test_sumarize.py
@@ -0,0 +1,66 @@
+import unittest
+import os
+import shutil
+
+from dotenv import load_dotenv
+# pydantic supports loading .env files, so load_dotenv will be dropped in the future.
+
+from qlib.finco.task import SummarizeTask
+from qlib.finco.workflow import WorkflowContextManager
+from qlib.finco.llm import APIBackend
+from qlib.finco.workflow import WorkflowManager
+
+load_dotenv(verbose=True, override=True)
+
+
+class TestSummarize(unittest.TestCase):
+
+    def test_chat(self):
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a professional financial assistant.",
+            },
+            {
+                "role": "user",
+                "content": "How to write a perfect quant strategy.",
+            },
+        ]
+        response = APIBackend().try_create_chat_completion(messages=messages)
+        print(response)
+
+    def test_execution(self):
+        task = SummarizeTask()
+        context = WorkflowContextManager()
+        context.set_context("workspace", "../../examples/benchmarks/Linear")
+        context.set_context("user_prompt", "My main focus is on the performance of the strategy's return. "
+                                           "Please summarize the information and give me some advice.")
+        task.assign_context_manager(context)
+        resp = task.execute()
+        print(resp)
+
+    def test_generate_batch_result(self):
+        wm = WorkflowManager()
+
+        prompt = wm.default_user_prompt
+        # prompt = ""
+
+        workdir = os.path.dirname(wm.get_context().get_context("workspace"))
+        summaries_path = os.path.join(workdir, "summaries")
+
+        if not os.path.exists(summaries_path):
+            os.makedirs(summaries_path)
+
+        for i in range(10):
+            wm.run(prompt)
+            if os.path.exists(f"{workdir}/finCoReport.md"):
+                shutil.move(f"{workdir}/finCoReport.md", f"{workdir}/summaries/finCoReport{i}.md")
+
+    def test_parse2txt(self):
+        task = SummarizeTask()
+        resp = task.get_info_from_file("")
+        print(resp)
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/finco/test_utils.py
+++ b/tests/finco/test_utils.py
@@ -0,0 +1,23 @@
+import unittest
+from qlib.finco.utils import SingletonBaseClass
+
+
+class TimeUtils(unittest.TestCase):
+
+    def test_singleton(self):
+        # __init__ should run only once, no matter how many times A() is called
+        closure_checker = []
+
+        class A(SingletonBaseClass):
+
+            def __init__(self) -> None:
+                closure_checker.append(0)
+
+        A()
+        self.assertEqual(len(closure_checker), 1)
+        A()
+        self.assertEqual(len(closure_checker), 1)
+
+
+if __name__ == "__main__":
+    unittest.main()
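`test_singleton` checks that `__init__` runs exactly once even when the class is called twice. A metaclass-based singleton is one common way to get that behaviour, sketched below. `SingletonMeta` and `SingletonBase` are hypothetical names for illustration, and this is not necessarily how `qlib.finco.utils.SingletonBaseClass` is implemented.

```python
class SingletonMeta(type):
    """Hypothetical metaclass: build each class's instance at most once."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            # type.__call__ runs __new__ and __init__; this happens once per class.
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class SingletonBase(metaclass=SingletonMeta):
    pass


init_calls = []


class A(SingletonBase):
    def __init__(self):
        init_calls.append(0)  # record every time __init__ actually runs


first = A()
second = A()
```

Intercepting `__call__` on the metaclass, rather than overriding `__new__` on the class, is what stops `__init__` from being re-run on the second call.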
--- a/tests/misc/test_index_data.py
+++ b/tests/misc/test_index_data.py
@@ -76,7 +76,7 @@ class IndexDataTest(unittest.TestCase):
        self.assertTrue(np.isnan(sd.loc["bar", "g"]))

        # support slicing
-        print(sd.loc[~sd.loc[:, "g"].isna().data.astype(np.bool)])
+        print(sd.loc[~sd.loc[:, "g"].isna().data.astype(bool)])

        print(self.assertTrue(idd.SingleData().index == idd.SingleData().index))

--- a/tests/rl/test_saoe_simple.py
+++ b/tests/rl/test_saoe_simple.py
@@ -31,7 +31,6 @@ FEATURE_DATA_DIR = DATA_DIR / "processed"
 ORDER_DIR = DATA_DIR / "order" / "valid_bidir"

 CN_DATA_DIR = DATA_ROOT_DIR / "cn"
-CN_BACKTEST_DATA_DIR = CN_DATA_DIR / "backtest"
 CN_FEATURE_DATA_DIR = CN_DATA_DIR / "processed"
 CN_ORDER_DIR = CN_DATA_DIR / "order" / "test"
 CN_POLICY_WEIGHTS_DIR = CN_DATA_DIR / "weights"
@@ -49,7 +48,7 @@ def test_pickle_data_inspect():
 def test_simulator_first_step():
    order = Order("AAL", 30.0, 0, pd.Timestamp("2013-12-11 00:00:00"), pd.Timestamp("2013-12-11 23:59:59"))

-    simulator = SingleAssetOrderExecutionSimple(order, BACKTEST_DATA_DIR)
+    simulator = SingleAssetOrderExecutionSimple(order, DATA_DIR)
    state = simulator.get_state()
    assert state.cur_time == pd.Timestamp("2013-12-11 09:30:00")
    assert state.position == 30.0
@@ -83,7 +82,7 @@ def test_simulator_first_step():
 def test_simulator_stop_twap():
    order = Order("AAL", 13.0, 0, pd.Timestamp("2013-12-11 00:00:00"), pd.Timestamp("2013-12-11 23:59:59"))

-    simulator = SingleAssetOrderExecutionSimple(order, BACKTEST_DATA_DIR)
+    simulator = SingleAssetOrderExecutionSimple(order, DATA_DIR)
    for _ in range(13):
        simulator.step(1.0)

@@ -106,10 +105,10 @@ def test_simulator_stop_early():
    order = Order("AAL", 1.0, 1, pd.Timestamp("2013-12-11 00:00:00"), pd.Timestamp("2013-12-11 23:59:59"))

    with pytest.raises(ValueError):
-        simulator = SingleAssetOrderExecutionSimple(order, BACKTEST_DATA_DIR)
+        simulator = SingleAssetOrderExecutionSimple(order, DATA_DIR)
        simulator.step(2.0)

-    simulator = SingleAssetOrderExecutionSimple(order, BACKTEST_DATA_DIR)
+    simulator = SingleAssetOrderExecutionSimple(order, DATA_DIR)
    simulator.step(1.0)

    with pytest.raises(AssertionError):
@@ -119,7 +118,7 @@ def test_simulator_stop_early():
 def test_simulator_start_middle():
    order = Order("AAL", 15.0, 1, pd.Timestamp("2013-12-11 10:15:00"), pd.Timestamp("2013-12-11 15:44:59"))

-    simulator = SingleAssetOrderExecutionSimple(order, BACKTEST_DATA_DIR)
+    simulator = SingleAssetOrderExecutionSimple(order, DATA_DIR)
    assert len(simulator.ticks_for_order) == 330
    assert simulator.cur_time == pd.Timestamp("2013-12-11 10:15:00")
    simulator.step(2.0)
@@ -138,7 +137,7 @@ def test_simulator_start_middle():
 def test_interpreter():
    order = Order("AAL", 15.0, 1, pd.Timestamp("2013-12-11 10:15:00"), pd.Timestamp("2013-12-11 15:44:59"))

-    simulator = SingleAssetOrderExecutionSimple(order, BACKTEST_DATA_DIR)
+    simulator = SingleAssetOrderExecutionSimple(order, DATA_DIR)
    assert len(simulator.ticks_for_order) == 330
    assert simulator.cur_time == pd.Timestamp("2013-12-11 10:15:00")

@@ -219,7 +218,7 @@ def test_network_sanity():
    # we won't check the correctness of networks here
    order = Order("AAL", 15.0, 1, pd.Timestamp("2013-12-11 9:30:00"), pd.Timestamp("2013-12-11 15:59:59"))

-    simulator = SingleAssetOrderExecutionSimple(order, BACKTEST_DATA_DIR)
+    simulator = SingleAssetOrderExecutionSimple(order, DATA_DIR)
    assert len(simulator.ticks_for_order) == 390

    class EmulateEnvWrapper(NamedTuple):
@@ -259,7 +258,7 @@ def test_twap_strategy(finite_env_type):
    csv_writer = CsvWriter(Path(__file__).parent / ".output")

    backtest(
-        partial(SingleAssetOrderExecutionSimple, data_dir=BACKTEST_DATA_DIR, ticks_per_step=30),
+        partial(SingleAssetOrderExecutionSimple, data_dir=DATA_DIR, ticks_per_step=30),
        state_interp,
        action_interp,
        orders,
@@ -290,7 +289,7 @@ def test_cn_ppo_strategy():
    csv_writer = CsvWriter(Path(__file__).parent / ".output")

    backtest(
-        partial(SingleAssetOrderExecutionSimple, data_dir=CN_BACKTEST_DATA_DIR, ticks_per_step=30),
+        partial(SingleAssetOrderExecutionSimple, data_dir=CN_DATA_DIR, ticks_per_step=30),
        state_interp,
        action_interp,
        orders,
@@ -319,7 +318,7 @@ def test_ppo_train():
    policy = PPO(network, state_interp.observation_space, action_interp.action_space, 1e-4)

    train(
-        partial(SingleAssetOrderExecutionSimple, data_dir=CN_BACKTEST_DATA_DIR, ticks_per_step=30),
+        partial(SingleAssetOrderExecutionSimple, data_dir=CN_DATA_DIR, ticks_per_step=30),
        state_interp,
        action_interp,
        orders,
Author	SHA1	Message	Date
Xu Yang	2df211c320	merge all commit	2023-07-13 16:29:44 +08:00
Fivele-Li	effed382e9	Optimize prompt for entire learn loop (#1589 ) * Adjust prompt and fix cases * adjust summarizeTask & learn prompts; * fix typos & drop duplicate task method; * adjust learn prompts;	2023-07-11 18:13:52 +08:00
Fivele-Li	86ffd1799d	Add knowledge module and tune summarizeTask (#1582 ) * Add knowledge module * add KnowledgeExperiment add KnowledgeBase; * add knowledge associate prompts to template; * Add Topic class * add Topic to summarize knowledge; * add recorder's metric to summarizeTask; --------- Co-authored-by: Cadenza-Li <362237642@qq.com>	2023-07-06 11:39:36 +08:00
Young	aef11536e3	rename & test	2023-07-04 20:28:08 +08:00
Xu Yang	8b0fdf1623	Merge pull request #1581 from microsoft/xuyang1/fix_singleton_bug fix singleton bug	2023-07-04 16:51:51 +08:00
Xu Yang	9a36f8da20	fix singleton bug	2023-07-04 16:20:02 +08:00
Xu Yang	b7757d5008	Merge pull request #1580 from microsoft/xuyang1/refine_workflow_to_increase_success_rate refine workflow to increase success rate	2023-07-03 17:59:54 +08:00
Xu Yang	ee5e5cfdd8	remove useless code	2023-07-03 17:57:13 +08:00
Xu Yang	6cb87ecfd1	refine code to use qrun	2023-07-03 17:56:22 +08:00
Xu Yang	9119bcdd3c	Merge pull request #1576 from microsoft/xuyang1/add_config_and_code_dump_task refine workflow and prompts	2023-06-30 14:43:49 +08:00
Xu Yang	4fccf8112d	fix one workflow	2023-06-30 14:33:41 +08:00
Xu Yang	73bd79ca1a	merge into one commit	2023-06-30 14:23:40 +08:00
Fivele-Li	7e84f3aae2	Add backtest and backforward task (#1568 ) * * add TrainTask & BacktestTask; * add BackForwardTask; * adjust prompt_template.yaml which default config failed to backtest; * run workflow in loop * add update method to prompt_template.py * remove debug code * Adjust Learn Process * add LearnManager class & use LearnManager to update system prompt; * use qrun to replace recorder for training and backtesting; * Adjust analyser * analyser independent of recorder; * rename analyser's workspace attribution; * analyser load variable by recorder. --------- Co-authored-by: Cadenza-Li <362237642@qq.com>	2023-06-30 10:04:43 +08:00
Fivele-Li	1326ac614d	Add docs to context and retrieve (#1566 ) * add analyser docstring to context; * add retrieve method to context manager; * add notes to retrieve	2023-06-24 21:47:27 +08:00
Fivele-Li	f12184cc0f	Add analyser task and optimize interact (#1552 ) * * optimize interact * add AnalyserTask * optimize logger format and add render feature * format optimize	2023-06-16 11:42:45 +08:00
Xu Yang	a70386ad52	Merge pull request #1550 from microsoft/xuyang1/refine_task_prompts add datahandler and design action task according to component	2023-06-14 14:52:42 +08:00
Xu Yang	74619ed8d8	fix using defaut in record strategy and backtest	2023-06-14 14:52:16 +08:00
Fivele-Li	1a523df007	Optimize log and interact of FinCo (#1549 ) * use FinCoLog for a better interact experience * addition file changes * optimize format * optimize format	2023-06-14 14:48:17 +08:00
Xu Yang	f9cc8a5aaa	remove useless prompt	2023-06-14 10:46:38 +08:00
Xu Yang	7762c5a1fd	add datahandler and design action task according to component	2023-06-13 23:28:27 +08:00
Xu Yang	fa7ef29281	Merge pull request #1548 from microsoft/xuyang1/add_dump_to_file_task add simple readme & move prompt templates to outer yaml file to make the code clean	2023-06-13 15:29:13 +08:00
Xu Yang	429c9a7c66	format	2023-06-13 15:27:59 +08:00
Xu Yang	80fbc00792	move prompt templates to yaml file to make code clean	2023-06-13 15:21:19 +08:00
Xu Yang	01accec24c	update code	2023-06-12 16:25:16 +08:00
Fivele-Li	1d88830b0d	Add recorder task and visualize (#1542 ) * add recorder task * add batch generate summarize report unittest. * * add recorder to RecorderTask; * add matplot figure to analyzer.py * add image to markdown; * Add some log * update figure path. --------- Co-authored-by: Young <afe.young@gmail.com> Co-authored-by: Cadenza-Li <362237642@qq.com>	2023-06-12 15:48:00 +08:00
you-n-g	ad7498e287	Edit yaml task (#1538 ) * Edit yaml task * update comments	2023-06-02 00:44:41 +08:00
you-n-g	73d51f05b4	Init workspace and CMDTask (#1537 ) * Update setup.py and config * WIP * init_workspace and CMDTask * Delete test_sumarize.py	2023-06-01 23:32:35 +08:00
Fivele-Li	3b56b8e6c0	Optimize summarize task prompt and others (#1533 ) * 1.update prompt; 2.update fetch information method. * 1.update prompt; 2.save result to markdown; * 1.get context info from context_manager; 2.run the entire process successfully.	2023-06-01 21:22:24 +08:00
you-n-g	40e0c329ba	Add configurable dataset (#1535 )	2023-06-01 20:05:02 +08:00
Xu Yang	e376648860	Merge pull request #1536 from microsoft/xuyang1/add_debug_mode_to_save_cache add a debug mode to speed up debug process	2023-06-01 19:44:17 +08:00
Xu Yang	5f37f32184	update code	2023-06-01 19:38:26 +08:00
Xu Yang	d46b4c1ebf	Merge pull request #1534 from microsoft/xuyang1/add_code_implementation_task add code implementation task	2023-06-01 18:13:05 +08:00
Xu Yang	0515524b51	add code implementation code	2023-06-01 18:04:31 +08:00
Xu Yang	cda32d5703	Merge pull request #1532 from microsoft/xuyang1/add-plan-and-config-task-implementation add the initial version of plan and config task implementation	2023-06-01 11:20:04 +08:00
Xu Yang	e2332a004b	imporove some words in prompt	2023-06-01 01:09:14 +08:00
Xu Yang	08d9dbccc9	update v1 code containing SLplan and config action	2023-06-01 00:36:04 +08:00
Fivele-Li	e7cd93a36d	add base method for summarization; (#1530 )	2023-05-31 15:50:34 +08:00
Xu Yang	3919678028	split task into workflow and task to make the strcture more clear	2023-05-31 11:45:25 +08:00
Xu Yang	421b1403b2	Merge pull request #1528 from microsoft/xuyang1/refine_task_and_implement_workflow_task_as_example Xuyang1/refine task and implement workflow task as example	2023-05-31 11:36:36 +08:00
Xu Yang	94102fb742	remove tasktype variable	2023-05-31 11:35:54 +08:00
Cadenza-Li	74a5d7c8af	add parse method for summarization;	2023-05-31 00:08:21 +08:00
Xu Yang	ce39b4b6f8	add qlib auto init so logger can display info	2023-05-30 21:52:35 +08:00
Xu Yang	2af35d9c89	second commit	2023-05-30 20:20:16 +08:00
Xu Yang	f37643550b	first round	2023-05-30 20:19:58 +08:00
Xu Yang	55611aa43e	Merge pull request #1527 from microsoft/xuyang1/add_openai_api_support add openai interface support	2023-05-30 13:44:10 +08:00
Xu Yang	f24253efd2	add openai interface support	2023-05-30 13:42:01 +08:00
Young	7c4f3b8a7d	Initial interface for discussion	2023-05-24 12:18:31 +08:00
you-n-g	94268619c4	Update README.md	2023-05-23 09:50:00 +08:00
Huoran Li	8d60a6a02b	Resolve RL FIXMES (#1503 ) * Solve several small FIXMEs left in RL * Add TODO in example * Minor bugfix * black	2023-05-17 16:57:08 +08:00
Fivele-Li	7234308651	Add base config in yml (#1500 ) * path on Windows contains double '/' which may cause open file failed. * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * add baseConfig in yml,user can add new keys or update/drop keys in baseConfig; * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * pip release version 23.1 on Apr.15 2023, CI failed to run, Please refer to #1495 ofr detailed logs. The pip version has been temporarily fixed to 23.0.1. * 1.Search for baseConfig in multiple directories; 2.Add user instructions in qrun; * fix format with black * 1.modify baseConfig key to BASE_CONFIG_PATH; 2.only find config file in absolute path and relative path; * load BASE_CONFIG_PATH on absolute path & relative path; * fix Lint with black --------- Co-authored-by: lijinhui <362237642@qq.com>	2023-05-12 17:35:37 +08:00
Chaoying	acf5df27ce	Add support for redis password (#1508 )	2023-05-08 16:17:15 +08:00
Chaoying	37a59f28d3	Fix deprecated syntax in numpy (#1507 ) * Fix deprecated syntax in numpy * Replace np.bool with bool	2023-05-08 16:17:02 +08:00
YQ Tsui	b084c352f5	provide dtype to empty series to surpress warning; fix type (#1449 )	2023-05-05 17:47:44 +08:00
Maksim Zayakin	9e22e5168b	Remove unused `DNNModelPytorch` params (#1470 ) * Remove lr_decay and lr_decay_steps params More flexible way to pass a scheduler (via callable function) is already supported * remove lr_decay and lr_decay_steps from mlp workflow configs	2023-04-28 17:48:40 +08:00
Fivele-Li	dceff7b471	Specify the tianshou version to match the dev environment to avoid the error in issue #1477 . (#1502 )	2023-04-28 13:50:25 +08:00
Huoran Li	7f1e8c5206	Refine Qlib RL data format (#1480 ) * wip * wip * wip * Fix naming errors * Backtest test passed * Why training stuck? * Minor * Refine train configs * Use dummy in training * Remove pickle_dataframe * CI * CI * Add more strict condition to filter orders * Pass test * Add TODO in example --------- Co-authored-by: Young <afe.young@gmail.com>	2023-04-26 21:14:30 +08:00
Fivele-Li	46264dfec9	normpath for Windows (#1495 ) * path on Windows contains double '/' which may cause open file failed. * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * locate import numpy error * pip release version 23.1 on Apr.15 2023, CI failed to run, Please refer to #1495 ofr detailed logs. The pip version has been temporarily fixed to 23.0.1. --------- Co-authored-by: lijinhui <362237642@qq.com>	2023-04-26 16:26:12 +08:00
Fivele-Li	754799ab05	update ubuntu CI version; (#1488 ) * update ubuntu CI version; (End of standard support for 18.04 LTS - 31 May 2023) * update ubuntu CI version; --------- Co-authored-by: lijinhui <362237642@qq.com>	2023-04-10 17:06:48 +08:00
you-n-g	32c3070b73	Refine DDG-DA (#1472 ) * Run ddg-da successfully * Support include valid; More parameters * Support L2 reg & visualization * Blackformat * Enable fill_method * Support specify handler & optim dataset * Fix Pylint	2023-04-07 15:00:21 +08:00
you-n-g	40de67265a	Update Docs about some concepts in DataHandler (#1485 )	2023-04-07 10:02:16 +08:00