# Data Collector
## Introduction
Scripts for data collection
- yahoo: get *US/CN* stock data from *Yahoo Finance*
- fund: get fund data from *http://fund.eastmoney.com*
- cn_index: get *CN index* from *http://www.csindex.com.cn*, *CSI300*/*CSI100*
- us_index: get *US index* from *https://en.wikipedia.org/wiki*, *SP500*/*NASDAQ100*/*DJIA*/*SP400*
- contrib: scripts for some auxiliary functions
## Custom Data Collection
> Specific implementation reference: https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo
1. Create a dataset code directory in the current directory
2. Add `collector.py`
- add collector class:
```python
CUR_DIR = Path(__file__).resolve().parent
sys.path.append(str(CUR_DIR.parent.parent))
from data_collector.base import BaseCollector, BaseNormalize, BaseRun
class UserCollector(BaseCollector):
...
```
- add normalize class:
```python
class UserNormalzie(BaseNormalize):
...
```
- add `CLI` class:
```python
class Run(BaseRun):
...
```
3. add `README.md`
4. add `requirements.txt`
## Description of dataset
| | Basic data |
|------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| Features | **Price/Volume**:
- $close/$open/$low/$high/$volume/$change/$factor |
| Calendar | **\.txt**:
- day.txt
- 1min.txt |
| Instruments | **\.txt**:
- required: **all.txt**;
- csi300.txt/csi500.txt/sp500.txt |
- `Features`: data, **digital**
- if not **adjusted**, **factor=1**
### Data-dependent component
> To make the component running correctly, the dependent data are required
| Component | required data |
|---------------------------------------------------|--------------------------------|
| Data retrieval | Features, Calendar, Instrument |
| Backtest | **Features[Price/Volume]**, Calendar, Instruments |