mirror of
https://github.com/microsoft/qlib.git
synced 2026-06-06 05:51:17 +08:00
add description of dataset document (#742)
@@ -160,7 +160,7 @@ Load and prepare data by running the following code:

This dataset is created by public data collected by [crawler scripts](scripts/data_collector/), which have been released in
the same repository.
-Users could create the same dataset with it.
+Users could create the same dataset with it. [Description of dataset](https://github.com/microsoft/qlib/tree/main/scripts/data_collector#description-of-dataset)

*Please pay **ATTENTION** that the data is collected from [Yahoo Finance](https://finance.yahoo.com/lookup), and the data might not be perfect.
We recommend users to prepare their own data if they have a high-quality dataset. For more information, users can refer to the [related document](https://qlib.readthedocs.io/en/latest/component/data.html#converting-csv-format-into-qlib-format)*.

60
scripts/data_collector/README.md
Normal file
@@ -0,0 +1,60 @@
# Data Collector

## Introduction

Scripts for data collection

- yahoo: get *US/CN* stock data from *Yahoo Finance*
- fund: get fund data from *http://fund.eastmoney.com*
- cn_index: get *CN index* from *http://www.csindex.com.cn*, *CSI300*/*CSI100*
- us_index: get *US index* from *https://en.wikipedia.org/wiki*, *SP500*/*NASDAQ100*/*DJIA*/*SP400*
- contrib: scripts for some auxiliary functions

## Custom Data Collection

> Specific implementation reference: https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo

1. Create a dataset code directory in the current directory
2. Add `collector.py`
   - add collector class:
     ```python
     import sys
     from pathlib import Path

     # Make the directory two levels up importable so that `data_collector.base` resolves.
     CUR_DIR = Path(__file__).resolve().parent
     sys.path.append(str(CUR_DIR.parent.parent))
     from data_collector.base import BaseCollector, BaseNormalize, BaseRun

     class UserCollector(BaseCollector):
         ...
     ```
   - add normalize class:
     ```python
     class UserNormalize(BaseNormalize):
         ...
     ```
   - add `CLI` class:
     ```python
     class Run(BaseRun):
         ...
     ```
3. Add `README.md`
4. Add `requirements.txt`

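The four steps above can be sketched as a small scaffolding script. This is only an illustration: `scaffold_collector`, `COLLECTOR_STUB`, and the `my_dataset` directory name are hypothetical and not part of the repository. The generated classes are left empty, exactly as in the steps, so they still need to be implemented by following the Yahoo reference.

```python
import tempfile
from pathlib import Path

# Hypothetical stub written into collector.py; the class names follow the steps above.
COLLECTOR_STUB = '''\
import sys
from pathlib import Path

CUR_DIR = Path(__file__).resolve().parent
sys.path.append(str(CUR_DIR.parent.parent))

from data_collector.base import BaseCollector, BaseNormalize, BaseRun


class UserCollector(BaseCollector):
    ...


class UserNormalize(BaseNormalize):
    ...


class Run(BaseRun):
    ...
'''


def scaffold_collector(base_dir: Path, name: str) -> Path:
    """Create <base_dir>/<name>/ containing the files listed in steps 2 to 4."""
    target = base_dir / name
    target.mkdir(parents=True, exist_ok=True)  # step 1: dataset code directory
    (target / "collector.py").write_text(COLLECTOR_STUB, encoding="utf-8")  # step 2
    (target / "README.md").write_text(f"# {name} data collector\n", encoding="utf-8")  # step 3
    (target / "requirements.txt").write_text("", encoding="utf-8")  # step 4
    return target


if __name__ == "__main__":
    # Demonstrate in a throwaway directory instead of the real scripts folder.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_collector(Path(tmp), "my_dataset")
        print(sorted(p.name for p in created.iterdir()))
```

The script only writes files and never imports `data_collector.base`, so it can be tried outside the repository.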
## Description of dataset

|             | Basic data                                                                               |
|-------------|------------------------------------------------------------------------------------------|
| Features    | **Price/Volume**: <br> - `$close`/`$open`/`$low`/`$high`/`$volume`/`$change`/`$factor`   |
| Calendar    | **\<freq>.txt**: <br> - day.txt<br> - 1min.txt                                           |
| Instruments | **\<market>.txt**: <br> - required: **all.txt**; <br> - csi300.txt/csi500.txt/sp500.txt  |

- `Features`: data, **numeric**
  - if the prices are not **adjusted**, **factor=1**

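As an illustration of the `factor=1` rule, the sketch below fills in the factor for unadjusted rows and reports which Price/Volume fields are still missing. It is a minimal sketch, not the collectors' actual code: `fill_factor`, `missing_fields`, and the plain-dict row layout (field names without the `$` prefix) are assumptions made for this example.

```python
from typing import Dict, List

# Field names taken from the Features row of the table (without the "$" prefix).
PRICE_VOLUME_FIELDS = ("close", "open", "low", "high", "volume", "change", "factor")


def fill_factor(rows: List[Dict[str, float]], adjusted: bool) -> List[Dict[str, float]]:
    """Return copies of `rows`; when prices are unadjusted, set factor to 1.0."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are left untouched
        if not adjusted:
            new_row["factor"] = 1.0
        filled.append(new_row)
    return filled


def missing_fields(row: Dict[str, float]) -> List[str]:
    """List the Price/Volume fields that `row` does not provide."""
    return [name for name in PRICE_VOLUME_FIELDS if name not in row]


if __name__ == "__main__":
    raw = [{"close": 10.0, "open": 9.8, "low": 9.7, "high": 10.2,
            "volume": 1000.0, "change": 0.02}]
    print(missing_fields(raw[0]))
    print(missing_fields(fill_factor(raw, adjusted=False)[0]))
```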
### Data-dependent component

> For a component to run correctly, the data it depends on must be available.

| Component      | Required data                                     |
|----------------|---------------------------------------------------|
| Data retrieval | Features, Calendar, Instruments                   |
| Backtest       | **Features[Price/Volume]**, Calendar, Instruments |
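
The table above can be read as a simple lookup. The following sketch, in which `REQUIRED_DATA` and `missing_data` are hypothetical names and not part of Qlib, reports which data a component still lacks:

```python
from typing import Dict, FrozenSet, Iterable, List

# Requirements copied from the table above. For Backtest, the Features
# entry specifically means the Price/Volume fields.
REQUIRED_DATA: Dict[str, FrozenSet[str]] = {
    "Data retrieval": frozenset({"Features", "Calendar", "Instruments"}),
    "Backtest": frozenset({"Features", "Calendar", "Instruments"}),
}


def missing_data(component: str, available: Iterable[str]) -> List[str]:
    """Return, sorted, the data kinds `component` needs that are not in `available`."""
    return sorted(REQUIRED_DATA[component] - set(available))


if __name__ == "__main__":
    # Example: a dataset that has features and a calendar but no instruments file.
    print(missing_data("Backtest", ["Features", "Calendar"]))
```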