.. _data:

==================================
Data Layer: Data Framework & Usage
==================================

Introduction
============================

``Data Layer`` provides user-friendly APIs to manage and retrieve data, backed by a high-performance data infrastructure.

It is designed for quantitative investment. For example, users can build formulaic alphas with ``Data Layer`` easily. Please refer to `Building Formulaic Alphas <../advanced/alpha.html>`_ for more details.

The introduction of ``Data Layer`` includes the following parts.

- Data Preparation
- Data API
- Data Loader
- Data Handler
- Dataset
- Cache
- Data and Cache File Structure

Here is a typical example of the Qlib data workflow:

- Users download data and convert it into Qlib format (with filename suffix `.bin`). In this step, typically only some basic data (such as OHLCV) are stored on disk.
- Basic features are created with Qlib's expression engine (e.g. "Ref($close, 60) / $close", the return of the last 60 trading days). Supported operators in the expression engine can be found `here `_. This step is typically implemented in Qlib's `Data Loader `_, which is a component of `Data Handler `_.
- If users require more complicated data processing (e.g. data normalization), `Data Handler `_ supports user-customized processors to process data (some predefined processors can be found `here `_). Processors are different from operators in the expression engine: they are designed for complicated data processing methods that are hard to support with operators in the expression engine.
- Finally, `Dataset `_ is responsible for preparing model-specific datasets from the data processed by Data Handler.

Data Preparation
============================

Qlib Format Data
------------------

We've specially designed a data structure to manage financial data. Please refer to the `File storage design section in Qlib paper `_ for detailed information.
Such data will be stored with filename suffix `.bin` (we'll call them `.bin` files, `.bin` format, or qlib format). The `.bin` file is designed for scientific computing on finance data.

``Qlib`` provides two different off-the-shelf datasets, which can be accessed through this `link `_:

========================  =================  ================
Dataset                   US Market          China Market
========================  =================  ================
Alpha360                  √                  √
Alpha158                  √                  √
========================  =================  ================

Also, ``Qlib`` provides a high-frequency dataset. Users can run a high-frequency dataset example through this `link `_.

Qlib Format Dataset
--------------------

``Qlib`` provides an off-the-shelf dataset in `.bin` format. Users can use the script ``scripts/get_data.py`` to download the China-Stock dataset as follows.

.. code-block:: bash

    # download 1d
    python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

    # download 1min
    python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/qlib_cn_1min --region cn --interval 1min

In addition to China-Stock data, ``Qlib`` also includes a US-Stock dataset, which can be downloaded with the following command:

.. code-block:: bash

    python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/us_data --region us

After running the above commands, users can find China-Stock and US-Stock data in ``Qlib`` format in the ``~/.qlib/qlib_data/cn_data`` and ``~/.qlib/qlib_data/us_data`` directories, respectively.

``Qlib`` also provides the scripts in ``scripts/data_collector`` to help users crawl the latest data on the Internet and convert it to qlib format.

When ``Qlib`` is initialized with this dataset, users can build and evaluate their own models with it. Please refer to `Initialization <../start/initialization.html>`_ for more details.
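The initialization step mentioned above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes the datasets were downloaded to the default directories used in the commands above and that the ``pyqlib`` package is installed. ``provider_uri`` and ``load_close_features`` are hypothetical helper names introduced here, and the date range and the ``"all"`` instrument pool are illustrative choices that may need adjusting to the data actually on disk.

```python
from pathlib import Path

# Default target directories used by the download commands above.
DATA_DIRS = {
    "cn": "~/.qlib/qlib_data/cn_data",
    "us": "~/.qlib/qlib_data/us_data",
}


def provider_uri(region: str) -> str:
    """Return the expanded data directory for a region ("cn" or "us").

    Hypothetical helper: it only expands "~" to the user's home directory.
    """
    return str(Path(DATA_DIRS[region]).expanduser())


def load_close_features(region: str = "cn"):
    """Initialize Qlib on the downloaded data and query two fields.

    Requires the pyqlib package and the dataset downloaded beforehand.
    The date range below is an illustrative assumption.
    """
    # Imported lazily so that provider_uri() works without Qlib installed.
    import qlib
    from qlib.data import D

    qlib.init(provider_uri=provider_uri(region), region=region)
    instruments = D.instruments(market="all")
    return D.features(
        instruments,
        ["$close", "Ref($close, 60) / $close"],
        start_time="2020-01-01",
        end_time="2020-12-31",
        freq="day",
    )
```

Calling ``load_close_features("cn")`` is expected to return a table of the raw ``$close`` field next to the expression-engine feature from the workflow above; keeping the import inside the function means the path helper can be used on machines where Qlib is not installed.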
Automatic update of daily frequency data
----------------------------------------

  **It is recommended that users update the data manually once (\-\-trading_date 2021-05-25) and then set it to update automatically.**

  For more information refer to: `yahoo collector `_

- Automatic update of data to the "qlib" directory each trading day (Linux)

  - use *crontab*: `crontab -e`
  - set up timed tasks:

    .. code-block:: bash

        * * * * 1-5 python