Data Collector
Introduction
Scripts for data collection
- yahoo: get US/CN stock data from Yahoo Finance
- fund: get fund data from http://fund.eastmoney.com
- cn_index: get CN index data (CSI300/CSI100) from http://www.csindex.com.cn
- us_index: get US index data (SP500/NASDAQ100/DJIA/SP400) from https://en.wikipedia.org/wiki
- contrib: scripts for some auxiliary functions
Custom Data Collection
Specific implementation reference: https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo
- Create a dataset code directory in the current directory
- Add collector.py
  - add collector class:

        import sys
        from pathlib import Path

        # Put the directory two levels up on sys.path so that
        # the data_collector package can be imported.
        CUR_DIR = Path(__file__).resolve().parent
        sys.path.append(str(CUR_DIR.parent.parent))
        from data_collector.base import BaseCollector, BaseNormalize, BaseRun

        class UserCollector(BaseCollector):
            ...

  - add normalize class:

        class UserNormalize(BaseNormalize):
            ...

  - add CLI class:

        class Run(BaseRun):
            ...

- Add README.md
- Add requirements.txt
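The steps above can be scripted. The following is a minimal sketch, not part of the Qlib scripts: the helper name `scaffold_dataset`, the dataset name passed to it, and the placeholder file contents are all hypothetical. It only creates the directory and the three files named in the steps, and leaves the class bodies for you to fill in.

```python
from pathlib import Path

# Hypothetical starting point for collector.py; the class bodies are
# intentionally left as "..." to be implemented per dataset.
COLLECTOR_TEMPLATE = '''\
import sys
from pathlib import Path

CUR_DIR = Path(__file__).resolve().parent
sys.path.append(str(CUR_DIR.parent.parent))

from data_collector.base import BaseCollector, BaseNormalize, BaseRun


class UserCollector(BaseCollector):
    ...


class UserNormalize(BaseNormalize):
    ...


class Run(BaseRun):
    ...
'''


def scaffold_dataset(parent_dir, name):
    """Create <parent_dir>/<name>/ containing the three files listed above."""
    dataset_dir = Path(parent_dir) / name
    dataset_dir.mkdir(parents=True, exist_ok=True)
    files = {
        "collector.py": COLLECTOR_TEMPLATE,
        "README.md": f"# {name}\n",
        "requirements.txt": "",
    }
    for filename, content in files.items():
        path = dataset_dir / filename
        if not path.exists():  # never overwrite existing work
            path.write_text(content, encoding="utf-8")
    return dataset_dir
```

Calling `scaffold_dataset(".", "my_dataset")` from the data collector directory would produce `my_dataset/collector.py`, `my_dataset/README.md`, and `my_dataset/requirements.txt`; existing files are left untouched.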
Description of dataset
| Basic data | Contents |
|---|---|
| Features | Price/Volume: $close/$open/$low/$high/$volume/$change/$factor |
| Calendar | \<freq>.txt: day.txt, 1min.txt |
| Instruments | \<market>.txt: all.txt (required); csi300.txt/csi500.txt/sp500.txt |
- Features: numeric data
  - if the prices are not adjusted, factor=1
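The note on `factor` can be made concrete with a small sketch. It assumes the common convention that the factor is the adjusted price divided by the raw price; this README does not state that formula, so treat it as an assumption. The function names `compute_factor` and `feature_row` are hypothetical and are not part of the collector scripts.

```python
def compute_factor(raw_close, adjusted_close=None):
    """Return the adjustment factor for one bar.

    Assumption: factor = adjusted price / raw price. When no adjusted
    price is supplied, the bar is treated as unadjusted and factor is 1.
    """
    if adjusted_close is None:
        return 1.0
    if raw_close == 0:
        raise ValueError("raw_close must be non-zero")
    return adjusted_close / raw_close


def feature_row(bar, adjusted_close=None):
    """Copy the price/volume fields of `bar` and attach a factor field."""
    fields = ("open", "high", "low", "close", "volume")
    row = {name: bar[name] for name in fields}
    row["factor"] = compute_factor(bar["close"], adjusted_close)
    return row
```

For an unadjusted bar, `feature_row(bar)` returns the original fields plus `factor` equal to `1.0`, which matches the rule stated above.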
Data-dependent component
For each component to run correctly, the data it depends on must be available:
| Component | Required data |
|---|---|
| Data retrieval | Features, Calendar, Instruments |
| Backtest | Features[Price/Volume], Calendar, Instruments |
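The dependency table can be turned into a quick pre-flight check. This is a minimal sketch under one stated assumption: that each kind of data lives in its own sub-directory of the dataset root, named `features`, `calendars`, and `instruments`. Those directory names and the helper `missing_data` are illustrative choices, not something this README specifies.

```python
from pathlib import Path

# Required data per component, transcribed from the table above.
REQUIRED_DATA = {
    "Data retrieval": ("Features", "Calendar", "Instruments"),
    "Backtest": ("Features", "Calendar", "Instruments"),
}

# Assumption: one sub-directory per kind of data under the dataset root.
DATA_SUBDIRS = {
    "Features": "features",
    "Calendar": "calendars",
    "Instruments": "instruments",
}


def missing_data(component, data_dir):
    """Return the required data kinds whose directory is absent or empty."""
    root = Path(data_dir)
    missing = []
    for kind in REQUIRED_DATA[component]:
        subdir = root / DATA_SUBDIRS[kind]
        if not subdir.is_dir() or not any(subdir.iterdir()):
            missing.append(kind)
    return missing
```

An empty list from `missing_data("Backtest", data_dir)` means every directory the component depends on exists and is non-empty; it does not validate the contents of the files.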