Collect Point-in-Time Data

Please note that the data is collected from baostock and may not be perfect. We recommend that users prepare their own data if they have a high-quality dataset. For more information, refer to the related document.

Requirements

pip install -r requirements.txt

Collect Data

Download Quarterly CN Data

cd qlib/scripts/data_collector/pit/
# download from baostock.com
python collector.py download_data --source_dir ~/.qlib/stock_data/source/pit --start 2000-01-01 --end 2020-01-01 --interval quarterly

Downloading data for all stocks is very time-consuming. To run a quick test on a few stocks, restrict the download with --symbol_regex as in the command below:

python collector.py download_data --source_dir ~/.qlib/stock_data/source/pit --start 2000-01-01 --end 2020-01-01 --interval quarterly --symbol_regex "^(600519|000725).*"
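
To check which symbols a --symbol_regex pattern would keep before starting a long download, the pattern can be tried on a few sample strings first. The sketch below is only an illustration: the symbol strings are hypothetical, and it assumes the collector matches the pattern from the start of each symbol (as Python's re.match does), which this README does not state.

```python
import re

# Hypothetical symbol strings, for illustration only. The exact symbol
# format that the collector matches against is an assumption.
candidates = ["600519.SS", "000725.SZ", "000001.SZ", "600000.SS"]

# Same pattern as in the quick-test command above.
pattern = re.compile(r"^(600519|000725).*")

# re.match anchors at the start of the string, so only symbols that
# begin with one of the two codes are kept.
selected = [symbol for symbol in candidates if pattern.match(symbol)]
print(selected)  # ['600519.SS', '000725.SZ']
```

If the symbols carried an exchange prefix in front of the code, this anchored pattern would select nothing, so it is worth confirming the format on a small run first.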

Normalize Data

python collector.py normalize_data --interval quarterly --source_dir ~/.qlib/stock_data/source/pit --normalize_dir ~/.qlib/stock_data/source/pit_normalized

Dump Data into PIT Format

cd qlib/scripts
python dump_pit.py dump --data_path ~/.qlib/stock_data/source/pit_normalized --qlib_dir ~/.qlib/qlib_data/cn_data --interval quarterly
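
The three steps above (download, normalize, dump) can also be chained from a single script. The following is a minimal sketch, not part of the repository: build_pit_steps and run_pit_steps are hypothetical helper names, and repo_root is assumed to be the path of the qlib checkout used in the cd commands above. Only the subcommands and flags already shown in this README are used.

```python
import os
import subprocess
import sys
from typing import List, Optional, Tuple

# One pipeline step: (working directory, argument list).
Step = Tuple[str, List[str]]


def build_pit_steps(
    repo_root: str,
    start: str,
    end: str,
    symbol_regex: Optional[str] = None,
    interval: str = "quarterly",
) -> List[Step]:
    """Return the download, normalize and dump commands in execution order."""
    # subprocess does not expand "~" when no shell is involved, so expand it here.
    base = os.path.expanduser("~/.qlib")
    source_dir = os.path.join(base, "stock_data", "source", "pit")
    normalize_dir = os.path.join(base, "stock_data", "source", "pit_normalized")
    qlib_dir = os.path.join(base, "qlib_data", "cn_data")

    scripts_dir = os.path.join(repo_root, "scripts")
    pit_dir = os.path.join(scripts_dir, "data_collector", "pit")

    download = [
        sys.executable, "collector.py", "download_data",
        "--source_dir", source_dir,
        "--start", start,
        "--end", end,
        "--interval", interval,
    ]
    if symbol_regex is not None:
        # Optional filter for a quick test on a few stocks.
        download += ["--symbol_regex", symbol_regex]

    normalize = [
        sys.executable, "collector.py", "normalize_data",
        "--interval", interval,
        "--source_dir", source_dir,
        "--normalize_dir", normalize_dir,
    ]
    dump = [
        sys.executable, "dump_pit.py", "dump",
        "--data_path", normalize_dir,
        "--qlib_dir", qlib_dir,
        "--interval", interval,
    ]
    return [(pit_dir, download), (pit_dir, normalize), (scripts_dir, dump)]


def run_pit_steps(steps: List[Step]) -> None:
    """Run each step in its working directory and stop at the first failure."""
    for cwd, cmd in steps:
        subprocess.run(cmd, cwd=cwd, check=True)


steps = build_pit_steps(
    "qlib", "2000-01-01", "2020-01-01", symbol_regex="^(600519|000725).*"
)
```

Calling run_pit_steps(steps) would then execute the three commands in order. Passing the arguments as a list avoids shell quoting problems with the regex, and check=True stops the pipeline if a step exits with an error.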