mirror of
https://github.com/microsoft/qlib.git
synced 2026-06-06 05:51:17 +08:00
136 lines
6.8 KiB
Python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

import pandas as pd

from qlib.data.dataset.loader import QlibDataLoader
from qlib.contrib.data.handler import DataHandlerLP, _DEFAULT_LEARN_PROCESSORS, check_transform_proc


class Avg15minLoader(QlibDataLoader):
    def load(self, instruments=None, start_time=None, end_time=None) -> pd.DataFrame:
        df = super(Avg15minLoader, self).load(instruments, start_time, end_time)
        if self.is_group:
            # Rename the `feature_day` (day freq) and `feature_15min` (1min freq, averaged every 15 minutes)
            # column groups to a single `feature` group
            df.columns = df.columns.map(lambda x: ("feature", x[1]) if x[0].startswith("feature") else x)
        return df


class Avg15minHandler(DataHandlerLP):
    def __init__(
        self,
        instruments="csi500",
        start_time=None,
        end_time=None,
        freq="day",
        infer_processors=[],
        learn_processors=_DEFAULT_LEARN_PROCESSORS,
        fit_start_time=None,
        fit_end_time=None,
        process_type=DataHandlerLP.PTYPE_A,
        filter_pipe=None,
        inst_processor=None,
        **kwargs,
    ):
        infer_processors = check_transform_proc(infer_processors, fit_start_time, fit_end_time)
        learn_processors = check_transform_proc(learn_processors, fit_start_time, fit_end_time)
        data_loader = Avg15minLoader(
            config=self.loader_config(), filter_pipe=filter_pipe, freq=freq, inst_processor=inst_processor
        )
        super().__init__(
            instruments=instruments,
            start_time=start_time,
            end_time=end_time,
            data_loader=data_loader,
            infer_processors=infer_processors,
            learn_processors=learn_processors,
            process_type=process_type,
        )

    def loader_config(self):

        # Results for dataset: df: pd.DataFrame
        #   len(df["feature"].columns) == 6 + 6 * 16, len(df.index.get_level_values(level="datetime").unique()) == T
        #   df.columns: close0, close1, ..., close16, open0, ..., open16, ..., vwap16
        #       freq == day:
        #           close0, open0, low0, high0, volume0, vwap0
        #       freq == 1min:
        #           close1, ..., close16, ..., vwap1, ..., vwap16
        #   df.index.names == ["datetime", "instrument"]: pd.MultiIndex
        # Example:
        #                          feature                        ...                 label
        #                           close0      open0       low0  ... vwap1 vwap16    LABEL0
        # datetime   instrument                                   ...
        # 2020-10-09 SH600000    11.794546  11.819587  11.769505  ...   NaN    NaN -0.005214
        # 2020-10-15 SH600000    12.044961  11.944795  11.932274  ...   NaN    NaN -0.007202
        # ...                          ...        ...        ...  ...   ...    ...       ...
        # 2021-05-28 SZ300676     6.369684   6.495406   6.306568  ...   NaN    NaN -0.001321
        # 2021-05-31 SZ300676     6.601626   6.465643   6.465130  ...   NaN    NaN -0.023428

        # features day: len(columns) == 6, freq = day
        # $close is the closing price of the current trading day:
        #   to get the `close` of the T-th most recent trading day (counting the current day as the first),
        #   use Ref($close, T-1), for example:
        #                            $close  Ref($close, 1)  Ref($close, 2)  Ref($close, 3)  Ref($close, 4)
        # instrument datetime
        # SH600519   2021-06-01  244.271530
        #            2021-06-02  242.205917      244.271530
        #            2021-06-03  242.229889      242.205917      244.271530
        #            2021-06-04  245.421524      242.229889      242.205917      244.271530
        #            2021-06-07  247.547089      245.421524      242.229889      242.205917      244.271530

        # WARNING: Ref($close, N), if N == 0, Ref($close, N) ==> $close

        fields = ["$close", "$open", "$low", "$high", "$volume", "$vwap"]
        # names: close0, open0, ..., vwap0
        names = list(map(lambda x: x.strip("$") + "0", fields))

        config = {"feature_day": (fields, names)}

        # features 15min: len(columns) == 6 * 16, freq = 1min
        # $close is the closing price of the current 1min bar:
        #   to get the average `close` of the i-th 15 minutes of the T-th most recent trading day
        #   (counting the current day as the first), use `Ref(Mean($close, 15), (T-1) * 240 + 240 - i * 15)`
        #   at the last bar of the day, for example (i == 1, T == 1, 2, 3):
        #                        Ref(Mean($close, 15), 225)  Ref(Mean($close, 15), 465)  Ref(Mean($close, 15), 705)
        # instrument datetime
        # SH600519   2021-05-31                  241.769897                  243.077942                  244.712997
        #            2021-06-01                  244.271530                  241.769897                  243.077942
        #            2021-06-02                  242.205917                  244.271530                  241.769897

        # WARNING: Ref(Mean($close, 15), N), if N == 0, Ref(Mean($close, 15), N) ==> Mean($close, 15)

        # Results of the current script:
        #   time:   09:00 --> 09:14, ..., 14:45 --> 14:59
        #   fields: Ref(Mean($close, 15), 225), ..., Mean($close, 15)
        #   name:   close1, ..., close16
        #

        # Expression description: take close as an example
        #   Mean($close, 15) ==> df["$close"].rolling(15, min_periods=1).mean()
        #   Ref(Mean($close, 15), 15) ==> df["$close"].rolling(15, min_periods=1).mean().shift(15)

        # NOTE: each value is read at the last bar of each trading day and is the average of the i-th 15 minutes

        # Average:
        #   Average of the i-th 15-minute period of each trading day: 1 <= i <= 240 // 15
        #       Avg(15minutes): Ref(Mean($close, 15), 240 - i * 15)
        #
        #   Average of the first 15 minutes of each trading day; i = 1
        #       Avg(09:00 --> 09:14), i.e. Mean($close, 15) at the 09:14 bar:
        #           Ref(Mean($close, 15), 240 - 1 * 15) ==> Ref(Mean($close, 15), 225)
        #   Average of the last 15 minutes of each trading day; i = 16
        #       Avg(14:45 --> 14:59), i.e. Mean($close, 15) at the 14:59 bar:
        #           Ref(Mean($close, 15), 240 - 16 * 15) ==> Ref(Mean($close, 15), 0) ==> Mean($close, 15)

        # 15min resample to day
        #   df.resample("1d").last()
        tmp_fields = []
        tmp_names = []
        for i, _f in enumerate(fields):
            # shifts 15, 30, ..., 225 are named close15, close14, ..., close1 (likewise for the other fields)
            _fields = [f"Ref(Mean({_f}, 15), {j * 15})" for j in range(1, 240 // 15)]
            _names = [f"{names[i][:-1]}{int(names[i][-1])+j}" for j in range(240 // 15 - 1, 0, -1)]
            # the unshifted mean is the last 15-minute window of the day: close16
            _fields.append(f"Mean({_f}, 15)")
            _names.append(f"{names[i][:-1]}{int(names[i][-1])+240 // 15}")
            tmp_fields += _fields
            tmp_names += _names
        config["feature_15min"] = (tmp_fields, tmp_names)
        # label
        config["label"] = (["Ref($close, -2)/Ref($close, -1) - 1"], ["LABEL0"])
        return config
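The expression comments in `loader_config` map `Mean` and `Ref` onto pandas `rolling(...).mean()` and `shift(...)`, then keep the last bar of each day. The sketch below reproduces that mapping in plain pandas so the `240 - i * 15` shift can be checked by hand. It is not Qlib code: it assumes 240 contiguous one-minute bars per trading day, uses synthetic data in which bar k of the series has close k, and the helper names `synthetic_minute_close` and `avg_15min_daily` are hypothetical.

```python
import numpy as np
import pandas as pd

BARS_PER_DAY = 240  # assumption: 240 one-minute bars per trading day
WINDOW = 15         # width of each averaging window, in bars


def synthetic_minute_close(days):
    """Hypothetical 1min close series: bar k of the whole series has value k."""
    parts = [pd.date_range(f"{d} 09:30", periods=BARS_PER_DAY, freq="min") for d in days]
    index = parts[0].append(parts[1:])
    return pd.Series(np.arange(len(index), dtype=float), index=index, name="$close")


def avg_15min_daily(close):
    """Daily frame with columns close1..close16 (i-th 15-minute average of each day)."""
    # Pandas counterpart of Mean($close, 15), as described in the comments above
    rolling_mean = close.rolling(WINDOW, min_periods=1).mean()
    columns = {}
    for i in range(1, BARS_PER_DAY // WINDOW + 1):
        # Pandas counterpart of Ref(Mean($close, 15), 240 - i * 15)
        columns[f"close{i}"] = rolling_mean.shift(BARS_PER_DAY - i * WINDOW)
    # Keep the value at the last bar of each day (the "15min resample to day" step)
    return pd.DataFrame(columns).resample("1D").last()


close = synthetic_minute_close(["2021-06-01", "2021-06-02"])
daily = avg_15min_daily(close)
print(daily[["close1", "close16"]])
```

Because the synthetic closes are consecutive integers, each column can be compared against a direct block average of the 15 bars it is supposed to cover, which is a quick way to confirm that `close1` is the first window of the day and `close16` the last.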