mirror of https://github.com/microsoft/qlib.git synced 2026-06-06 05:51:17 +08:00
igor17400 56cfa480dc Ibovespa index support (#990)
* feat: download ibovespa index historic composition

ibovespa(ibov) is the largest index in Brazil's stocks exchange.
The br_index folder has support for downloading new companies for the current index composition.
And has support, as well, for downloading companies from historic composition of ibov index.

Partially resolves issue #956

* fix: typo error instead of end_date, it was written end_ate

* feat: adds support for downloading stocks historic prices from Brazil's stocks exchange (B3)

Together with commit c2f933 it resolves issue #956

* fix: code formatted with black.

* wip: Creating code logic for brazils stock market data normalization

* docs: brazils stock market data normalization code documentation

* fix: code formatted the with black

* docs: fixed typo

* docs: more info about python version used to generate requirements.txt file

* docs: added BeautifulSoup requirements

* feat: removed debug prints

* feat: added ibov_index_composition variable as a class attribute of IBOVIndex

* feat: added increment to generate the four month period used by the ibov index

* refactor: Added get_instruments() method inside utils.py for better code usability.

Message in the PR request to understand the context of the change

In the course of reviewing this PR we found two issues.

    1. there are multiple places where the get_instruments() method is used,
	and we feel that scripts.index.py is the best place for the
	get_instruments() method to go.
    2. data_collector.utils has some very generic stuff put inside it.

* refactor: improve brazils stocks download speed

The reason to use retry=2 is due to the fact that
Yahoo Finance unfortunately does not keep track of the majority
of Brazilian stocks.

Therefore, the decorator deco_retry with retry argument
set to 5 will keep trying to get the stock data 5 times,
which makes the code to download Brazilians stocks very slow.

In future, this may change, but for now
I suggest to leave retry argument to 1 or 2 in
order to improve download speed.

In order to achieve this code logic an argument called retry_config
was added into YahooCollectorBR1d and YahooCollectorBR1min

* fix: added __main__ at the bottom of the script

* refactor: changed interface inside each index

Using partial as `fire.Fire(partial(get_instruments, market_index="br_index" ))`
will make the interface easier for the user to execute the script.
Then all the collector.py CLI in each folder can remove a redundant arguments.

* refactor: implemented  class interface retry into YahooCollectorBR

* docs: added BR as a possible region into the documentation

* refactor: make retry attribute part of the interface

This way we don't have to use hasattr to access the retry attribute as previously done
2022-04-06 09:01:29 +08:00

iBOVESPA History Companies Collection

Requirements

  • Install the libs from the file requirements.txt

    pip install -r requirements.txt
    
  • The requirements.txt file was generated using Python 3.8

For the ibovespa (IBOV) index, we have:


Method get_new_companies

Index start date

  • The ibovespa index started on 2 January 1968 (wiki). To use this start date in our bench_start_date(self) method, two conditions must be satisfied:

    1. The APIs used to download Brazilian stock (B3) historical prices must track that data back to 2 January 1968.

    2. Some website or API must provide the historical index composition from that date onward, that is, the companies used to build the index.

    Since these conditions are not met, the method bench_start_date(self) inside collector.py was implemented using pd.Timestamp("2003-01-03"), for two reasons:

    1. The earliest ibov composition that has been found is from the first quarter of 2003. More information about this composition can be found in the sections below.

    2. Yahoo Finance, one of the sources used to download historical symbol prices, keeps track of prices from this date forward.

  • Within the get_new_companies method, logic was implemented to get, for each ibovespa component stock, the start date from which Yahoo Finance tracks it.
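The per-stock start date lookup described above can be sketched as follows. This is a minimal illustration, not the code in collector.py: the function name first_tracked_dates, the input layout (a dict of symbol to list of dates), and the sample data are all hypothetical, and it uses only the standard library in place of a real price download.

```python
from datetime import date

# Assumption: mirrors the pd.Timestamp("2003-01-03") benchmark start described above.
BENCH_START_DATE = date(2003, 1, 3)


def first_tracked_dates(price_dates_by_symbol):
    """Return the earliest usable date with price data for each symbol.

    Dates before the benchmark start are ignored, and symbols with no
    usable data are left out of the result.
    """
    start_dates = {}
    for symbol, dates in price_dates_by_symbol.items():
        usable = [d for d in dates if d >= BENCH_START_DATE]
        if usable:
            start_dates[symbol] = min(usable)
    return start_dates


# Hypothetical sample standing in for downloaded price histories.
sample = {
    "PETR4": [date(2002, 12, 30), date(2003, 1, 3), date(2003, 1, 6)],
    "MGLU3": [date(2011, 5, 2), date(2011, 5, 3)],
    "XXXX3": [],  # a symbol the data source does not track
}
start_dates = first_tracked_dates(sample)
```

Leaving untracked symbols out of the result, rather than assigning them a default date, is a design choice made for this sketch only.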

Code Logic

The code scrapes B3's website, which keeps track of the ibovespa stock composition for the current day.

Other approaches, such as requests and Beautiful Soup, could have been used. However, the website displays the table of stocks with some delay, because it uses a script to load the composition. Selenium was used instead to download the composition and overcome this problem.

Furthermore, the data downloaded by the Selenium script is preprocessed so that it can be saved in the csv format established by scripts/data_collector/index.py.


Method get_changes

No suitable data source that keeps track of the ibovespa's historical stock composition has been found, except for this repository, which is the one used here. However, it only provides data from the 1st quarter of 2003 to the 3rd quarter of 2021.

With that reference, the index's composition can be compared quarter by quarter and year by year to generate a file that records which stocks were removed and which were added in each quarter and year.
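The period-by-period comparison can be illustrated with plain set differences. This is a sketch under stated assumptions rather than the implementation of get_changes: the function name composition_changes, the period labels, and the sample compositions are hypothetical, and the (period, type, symbol) row layout is assumed for illustration.

```python
def composition_changes(compositions):
    """Compare consecutive index compositions.

    compositions: list of (period_label, set_of_symbols), ordered in time.
    Returns a list of (period_label, "add" | "remove", symbol) rows, where
    the label is the period in which the change took effect.
    """
    changes = []
    for (_, previous), (period, current) in zip(compositions, compositions[1:]):
        for symbol in sorted(current - previous):  # newly included stocks
            changes.append((period, "add", symbol))
        for symbol in sorted(previous - current):  # stocks dropped from the index
            changes.append((period, "remove", symbol))
    return changes


# Hypothetical compositions; real ones would come from the reference repository.
history = [
    ("2003Q1", {"PETR4", "VALE5", "TNLP4"}),
    ("2003Q2", {"PETR4", "VALE5", "EMBR3"}),
    ("2003Q3", {"PETR4", "EMBR3"}),
]
changes = composition_changes(history)
```

The first period produces no rows because there is no earlier composition to compare it against.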


Collector Data

# parse instruments, used in qlib/instruments.
python collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method parse_instruments

# parse new companies
python collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method save_new_companies