Data Collector

Introduction

Scripts for collecting data and normalizing it for use with qlib.

Custom Data Collection

For a complete implementation to use as a reference, see the Yahoo collector: https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo

  1. Create a code directory for the new dataset inside this directory (scripts/data_collector)
  2. Add collector.py
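    • Add the collector class:
      import sys
      from pathlib import Path

      # Put scripts/ on sys.path so that data_collector.base can be imported
      CUR_DIR = Path(__file__).resolve().parent
      sys.path.append(str(CUR_DIR.parent.parent))
      from data_collector.base import BaseCollector, BaseNormalize, BaseRun

      # Collects the raw data
      class UserCollector(BaseCollector):
          ...

    • Add the normalize class:
      # Normalizes the collected data
      class UserNormalize(BaseNormalize):
          ...

    • Add the CLI class:
      # Command-line entry point
      class Run(BaseRun):
          ...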
      
  3. Add README.md (documentation for the dataset)
  4. Add requirements.txt (the collector's Python dependencies)
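The four steps above can be scripted. The sketch below uses a hypothetical helper, `scaffold_dataset`, which is not part of qlib: it creates the dataset directory and writes `collector.py` (filled with the class skeleton from step 2), a stub `README.md` and an empty `requirements.txt`. The dataset name and target directory are placeholders.

```python
from pathlib import Path

# Skeleton of collector.py, taken from the template in step 2.
COLLECTOR_TEMPLATE = '''\
import sys
from pathlib import Path

CUR_DIR = Path(__file__).resolve().parent
sys.path.append(str(CUR_DIR.parent.parent))
from data_collector.base import BaseCollector, BaseNormalize, BaseRun


class UserCollector(BaseCollector):
    ...


class UserNormalize(BaseNormalize):
    ...


class Run(BaseRun):
    ...
'''


def scaffold_dataset(data_collector_dir: Path, name: str) -> Path:
    """Hypothetical helper: create <data_collector_dir>/<name>/ with the
    files from steps 2-4. Existing files are left untouched."""
    dataset_dir = Path(data_collector_dir) / name
    dataset_dir.mkdir(parents=True, exist_ok=True)  # step 1
    files = {
        "collector.py": COLLECTOR_TEMPLATE,          # step 2
        "README.md": f"# {name} data collector\n",   # step 3
        "requirements.txt": "",                      # step 4
    }
    for filename, content in files.items():
        path = dataset_dir / filename
        if not path.exists():
            path.write_text(content, encoding="utf-8")
    return dataset_dir
```

Because existing files are never overwritten, the helper can be re-run safely after you start filling in the classes.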

Description of the dataset

Basic data
  Features      Price/Volume: $close, $open, $low, $high, $volume, $change, $factor
  Calendar      <freq>.txt, e.g. day.txt, 1min.txt
  Instruments   <market>.txt: all.txt is required; market files such as csi300.txt, csi500.txt, sp500.txt
  • Features: all data must be numeric
    • if prices are not adjusted, set factor=1
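The basic-data rules above can be expressed as a small check. This is a minimal sketch, not qlib's own validation: `normalize_feature_row` and `check_instrument_files` are hypothetical names, a row is assumed to be a plain dict keyed by field name without the `$` prefix, and only the rules listed above are enforced (all Price/Volume fields present and numeric, factor=1 for unadjusted prices, all.txt present among the instrument files).

```python
from numbers import Real
from typing import Dict, Iterable

# Field names from the "Price/Volume" row, without the "$" prefix (assumption).
PRICE_VOLUME_FIELDS = ("close", "open", "low", "high", "volume", "change", "factor")


def normalize_feature_row(row: Dict[str, object], adjusted: bool) -> Dict[str, object]:
    """Hypothetical helper: return a copy of `row` in which every
    Price/Volume field is present and a float, with factor forced to 1
    when prices are not adjusted."""
    out = dict(row)
    if not adjusted:
        # Unadjusted prices: the adjustment factor is fixed at 1.
        out["factor"] = 1.0
    missing = [f for f in PRICE_VOLUME_FIELDS if f not in out]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for field in PRICE_VOLUME_FIELDS:
        value = out[field]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"{field} must be numeric, got {type(value).__name__}")
        out[field] = float(value)
    return out


def check_instrument_files(filenames: Iterable[str]) -> None:
    """Hypothetical helper: raise if the required all.txt is missing."""
    if "all.txt" not in set(filenames):
        raise FileNotFoundError("instruments: all.txt is required")
```

With `adjusted=True` the row must already carry its own factor; otherwise a `ValueError` is raised.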

Data-dependent components

Each component needs the following data to run correctly:

Component        Required data
Data retrieval   Features, Calendar, Instruments
Backtest         Features[Price/Volume], Calendar, Instruments
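The component table can be encoded as data so that a setup script reports what is still missing before a component runs. The sketch below is hypothetical (`REQUIRED_DATA`, `available_data` and `missing_for` are illustrative names, not qlib APIs) and assumes that any feature fields satisfy "Features", while "Features[Price/Volume]" needs all seven Price/Volume fields.

```python
from typing import Dict, List, Set

# Requirements copied from the table; "Features[Price/Volume]" means the
# Price/Volume fields specifically, not just any feature (assumption).
REQUIRED_DATA: Dict[str, Set[str]] = {
    "Data retrieval": {"Features", "Calendar", "Instruments"},
    "Backtest": {"Features[Price/Volume]", "Calendar", "Instruments"},
}

PRICE_VOLUME_FIELDS = {"close", "open", "low", "high", "volume", "change", "factor"}


def available_data(feature_fields: Set[str], has_calendar: bool, has_instruments: bool) -> Set[str]:
    """Translate what is on disk into the data names used in the table."""
    data: Set[str] = set()
    if feature_fields:
        data.add("Features")
    if PRICE_VOLUME_FIELDS <= feature_fields:
        data.add("Features[Price/Volume]")
    if has_calendar:
        data.add("Calendar")
    if has_instruments:
        data.add("Instruments")
    return data


def missing_for(component: str, available: Set[str]) -> List[str]:
    """Return the required data the component still lacks, sorted."""
    return sorted(REQUIRED_DATA[component] - available)
```

How the feature field names are discovered on disk is left out; the sketch only checks them against the table.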