mirror of https://github.com/microsoft/qlib.git synced 2026-06-06 05:51:17 +08:00

Data Collector

Introduction

Scripts for data collection

Custom Data Collection

For a reference implementation, see https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo

  1. Create a dataset code directory in the current directory
  2. Add collector.py
    • add collector class:
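      import sys
      from pathlib import Path
      # Add the directory two levels up to sys.path so data_collector.base can be imported
      CUR_DIR = Path(__file__).resolve().parent
      sys.path.append(str(CUR_DIR.parent.parent))
      from data_collector.base import BaseCollector, BaseNormalize, BaseRun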
      class UserCollector(BaseCollector):
          ...
      
    • add normalize class:
      class UserNormalize(BaseNormalize):
          ...
      
    • add CLI class:
      class Run(BaseRun):
          ...
      
  3. Add README.md
  4. Add requirements.txt
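
The three classes in step 2 can be sketched as follows. This is only an illustration of how a collector, a normalizer, and a CLI entry point fit together. The base classes here are simplified, hypothetical stand-ins so that the sketch runs on its own; the real `BaseCollector`, `BaseNormalize`, and `BaseRun` in `data_collector/base.py` define their own abstract methods and constructor arguments, which should be checked before subclassing. The symbols and prices are made up.

```python
from abc import ABC, abstractmethod


# Hypothetical stand-ins for data_collector.base (NOT the real classes).
class BaseCollector(ABC):
    @abstractmethod
    def get_instrument_list(self): ...

    @abstractmethod
    def get_data(self, symbol): ...


class BaseNormalize(ABC):
    @abstractmethod
    def normalize(self, rows): ...


class BaseRun:
    collector_class = None
    normalize_class = None

    def run(self):
        # Collect raw rows for every symbol, then normalize them.
        collector = self.collector_class()
        normalizer = self.normalize_class()
        return {
            symbol: normalizer.normalize(collector.get_data(symbol))
            for symbol in collector.get_instrument_list()
        }


class UserCollector(BaseCollector):
    def get_instrument_list(self):
        return ["AAA", "BBB"]  # hypothetical symbols

    def get_data(self, symbol):
        # A real collector would download rows here; these values are made up.
        return [
            {"date": "2020-01-03", "close": "10.5"},
            {"date": "2020-01-02", "close": "10.0"},
        ]


class UserNormalize(BaseNormalize):
    def normalize(self, rows):
        # Sort by date and convert prices from text to float.
        ordered = sorted(rows, key=lambda r: r["date"])
        return [{"date": r["date"], "close": float(r["close"])} for r in ordered]


class Run(BaseRun):
    collector_class = UserCollector
    normalize_class = UserNormalize


result = Run().run()
```

Keeping collection and normalization in separate classes means the raw download can be stored once and normalized again later without fetching it a second time.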

Description of dataset

Basic data

Features      Price/Volume:
                 - $close/$open/$low/$high/$volume/$change/$factor
Calendar      <freq>.txt:
                 - day.txt
                 - 1min.txt
Instruments   <market>.txt:
                 - required: all.txt
                 - csi300.txt/csi500.txt/sp500.txt

  • Features: values must be numeric
    • if the data is not adjusted, factor=1
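
The two rules on features (numeric values, and factor=1 for unadjusted data) can be checked before the data is stored. The sketch below is a minimal illustration and not part of the data collector scripts: `prepare_row` and `FEATURE_FIELDS` are hypothetical names, and it assumes each raw row is a dict keyed by the feature names without the `$` prefix.

```python
# Feature names from the table above, without the "$" prefix (an assumption).
FEATURE_FIELDS = ("open", "close", "high", "low", "volume", "change", "factor")


def prepare_row(raw):
    """Return a row with every feature field converted to float.

    A missing factor is treated as unadjusted data and set to 1.0.
    """
    row = dict(raw)
    row.setdefault("factor", 1.0)
    missing = [f for f in FEATURE_FIELDS if f not in row]
    if missing:
        raise ValueError(f"missing feature fields: {missing}")
    # float() raises ValueError for non-numeric text such as "n/a".
    return {f: float(row[f]) for f in FEATURE_FIELDS}


# Made-up sample row with no factor column, i.e. unadjusted data.
raw = {"open": "10.0", "close": "10.5", "high": "10.8", "low": "9.9",
       "volume": "12000", "change": "0.05"}
row = prepare_row(raw)
```

Failing early on non-numeric values keeps bad rows out of the stored features, where they would be harder to trace.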

Data-dependent component

For each component to run correctly, the data it depends on must be available:

Component        Required data
Data retrieval   Features, Calendar, Instruments
Backtest         Features[Price/Volume], Calendar, Instruments
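
The dependency table can also be written as a lookup, which makes it easy to report what is still missing before a component is run. This is only a sketch: `REQUIRED_DATA` and `missing_data` are hypothetical names, and how the available data is detected (for example, by inspecting a data directory) is left to the caller.

```python
# Component -> required data, copied from the table above.
REQUIRED_DATA = {
    "Data retrieval": {"Features", "Calendar", "Instruments"},
    "Backtest": {"Features[Price/Volume]", "Calendar", "Instruments"},
}


def missing_data(component, available):
    """Return the required data that is not in `available`, sorted by name."""
    return sorted(REQUIRED_DATA[component] - set(available))


# Example: only the calendar has been prepared so far.
todo = missing_data("Backtest", ["Calendar"])
```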