Collector Data
Get Qlib data (bin file)
- get data:
  python scripts/get_data.py qlib_data
- parameters:
  - target_dir: save dir, by default ~/.qlib/qlib_data/cn_data_5min
  - version: dataset version, value from [v2], by default v2
    - v2 end date is 2022-12
  - interval: 5min
  - region: hs300
  - delete_old: delete existing data from target_dir (features, calendars, instruments, dataset_cache, features_cache), value from [True, False], by default True
  - exists_skip: if target_dir data already exists, skip get_data, value from [True, False], by default False
- examples:
  # hs300 5min
  python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/hs300_data_5min --region hs300 --interval 5min
Collect Baostock high-frequency data to qlib
Collect Baostock high-frequency data and dump it into qlib format. If the ready-made data above can't meet users' requirements, users can follow this section to crawl the latest data and convert it to qlib-data.
- download data to csv:
  python scripts/data_collector/baostock_5min/collector.py download_data
  This will download the raw data such as date, symbol, open, high, low, close, volume, amount, adjustflag from baostock to a local directory. One file per symbol.
- parameters:
  - source_dir: directory where the raw data is saved
  - interval: 5min
  - region: HS300
  - start: start datetime, by default None
  - end: end datetime, by default None
- examples:
  # cn 5min data
  python collector.py download_data --source_dir ~/.qlib/stock_data/source/hs300_5min_original --start 2022-01-01 --end 2022-01-30 --interval 5min --region HS300
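The "one file per symbol" layout described above can be illustrated with a minimal sketch. It is not the collector's own code: the helper name `write_one_file_per_symbol` and the `<symbol>.csv` file naming are assumptions made for illustration, and the column list simply mirrors the fields named in the download description.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Column order taken from the field list in the download description.
FIELDS = ["date", "symbol", "open", "high", "low", "close", "volume", "amount", "adjustflag"]


def write_one_file_per_symbol(rows, source_dir):
    """Group raw rows by symbol and write each group to <source_dir>/<symbol>.csv.

    The <symbol>.csv naming is an assumption made for this sketch.
    """
    source_dir = Path(source_dir).expanduser()
    source_dir.mkdir(parents=True, exist_ok=True)

    # Collect the rows that belong to each symbol.
    by_symbol = defaultdict(list)
    for row in rows:
        by_symbol[row["symbol"]].append(row)

    written = []
    for symbol, symbol_rows in sorted(by_symbol.items()):
        path = source_dir / f"{symbol}.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            # Keep each file in chronological order.
            writer.writerows(sorted(symbol_rows, key=lambda r: r["date"]))
        written.append(path)
    return written
```

Keeping one csv per symbol lets the later normalize and dump steps process symbols independently.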
- normalize data:
  python scripts/data_collector/baostock_5min/collector.py normalize_data
  This will:
- Normalize high, low, close, open price using adjclose.
- Normalize the high, low, close, open price so that the first valid trading date's close price is 1.
- parameters:
  - source_dir: csv directory
  - normalize_dir: result directory
  - interval: 5min
    if interval == 5min, qlib_data_1d_dir cannot be None
  - region: HS300
  - date_field_name: column name identifying time in csv files, by default date
  - symbol_field_name: column name identifying symbol in csv files, by default symbol
  - end_date: if not None, normalize the last date saved (including end_date); if None, it will ignore this parameter; by default None
  - qlib_data_1d_dir: qlib directory (1d data)
    if interval == 5min, qlib_data_1d_dir cannot be None, because normalizing 5min data needs the 1d data;
    # qlib_data_1d can be obtained like this:
    python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --interval 1d --region cn --version v3
- examples:
  # normalize 5min cn
  python collector.py normalize_data --qlib_data_1d_dir ~/.qlib/qlib_data/cn_data --source_dir ~/.qlib/stock_data/source/hs300_5min_original --normalize_dir ~/.qlib/stock_data/source/hs300_5min_nor --region HS300 --interval 5min
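The second normalization step (scaling prices so that the first valid close is 1) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `scale_to_first_close` is hypothetical, rows are plain dicts already sorted by time, and the adjustment step that relies on the 1d data is not covered here.

```python
import math

PRICE_FIELDS = ("open", "high", "low", "close")


def _is_valid(value):
    """A price is treated as valid when it is a real, non-NaN number."""
    return value is not None and not math.isnan(value)


def scale_to_first_close(rows):
    """Divide open/high/low/close by the first valid close so that close starts at 1.

    `rows` is a list of dicts sorted by time. Rows are copied, not mutated.
    """
    first_close = next((r["close"] for r in rows if _is_valid(r["close"])), None)
    if first_close is None or first_close == 0:
        return [dict(r) for r in rows]  # nothing to scale against

    scaled = []
    for row in rows:
        new_row = dict(row)
        for field in PRICE_FIELDS:
            # Missing prices are left untouched.
            if _is_valid(row[field]):
                new_row[field] = row[field] / first_close
        scaled.append(new_row)
    return scaled
```

Dividing every price field by the same constant keeps the ratios between open, high, low and close unchanged, so relative returns are preserved.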
- dump data:
  python scripts/dump_bin.py dump_all
  This will convert the normalized csv in the feature directory to numpy arrays and store the normalized data one file per column and one symbol per directory.
- parameters:
  - data_path: stock data path or directory, normalize result (normalize_dir)
  - qlib_dir: qlib (dump) data directory
  - freq: transaction frequency, by default day
    freq_map = {1d: day, 5min: 5min}
  - max_workers: number of threads, by default 16
  - include_fields: dump fields, by default ""
  - exclude_fields: fields not dumped, by default ""
    dump_fields = include_fields if include_fields else set(symbol_df.columns) - set(exclude_fields) if exclude_fields else symbol_df.columns
  - symbol_field_name: column name identifying symbol in csv files, by default symbol
  - date_field_name: column name identifying time in csv files, by default date
  - file_suffix: stock data file format, by default ".csv"
- examples:
  # dump 5min cn
  python dump_bin.py dump_all --data_path ~/.qlib/stock_data/source/hs300_5min_nor --qlib_dir ~/.qlib/qlib_data/hs300_5min_bin --freq 5min --exclude_fields date,symbol
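The dump_fields rule from the parameter list (include_fields wins, otherwise exclude_fields is subtracted, otherwise every column is dumped) can be sketched as a small function. The name `select_dump_fields` is hypothetical and this is not the code in dump_bin.py. It assumes fields arrive as comma-separated strings, as in the --exclude_fields example above.

```python
def select_dump_fields(columns, include_fields="", exclude_fields=""):
    """Return the columns to dump, following the dump_fields rule.

    Fields are given as comma-separated strings, e.g. "date,symbol".
    """
    include = [f.strip() for f in include_fields.split(",") if f.strip()]
    exclude = {f.strip() for f in exclude_fields.split(",") if f.strip()}

    if include:
        # include_fields takes priority over exclude_fields.
        return include
    if exclude:
        # Preserve the csv column order instead of relying on set ordering.
        return [c for c in columns if c not in exclude]
    return list(columns)
```

With the columns date, symbol, open, close, volume and --exclude_fields date,symbol, this sketch keeps open, close and volume, which matches the intent of the example command.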