* fix: replace deprecated pandas fillna(method=) with ffill()/bfill()
Replace deprecated fillna(method="ffill"/"bfill") calls with modern
pandas ffill() and bfill() methods to fix FutureWarnings in pandas 2.x.
Also includes black formatting fixes for compliance.
This addresses the pandas deprecation warnings portion of issue #1981.
Other issues (date parsing, type conversion, timezone handling) will be
addressed in separate commits.
Fixes:
- Yahoo collector: 2 instances in calc_change() and adjusted_price()
- BaoStock collector: 1 instance in calc_change()
- Core utils: resam.py fillna operations
- Backtest: profit_attribution.py stock data processing
- High-freq ops: FFillNan and BFillNan operators
- Position analysis: parse_position.py weight processing
Partially addresses GitHub issue #1981
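The replacement pattern can be sketched as follows. The sample series is hypothetical and not taken from the collectors; only `fillna(method=...)`, `ffill()` and `bfill()` come from the commit description.

```python
import pandas as pd

# Hypothetical sample data; not taken from the Qlib collectors.
prices = pd.Series([1.0, None, None, 4.0, None])

# Deprecated form that triggers the FutureWarning: prices.fillna(method="ffill")
forward_filled = prices.ffill()

# Deprecated form that triggers the FutureWarning: prices.fillna(method="bfill")
backward_filled = prices.bfill()
```

The trailing missing value stays missing after `bfill()` because there is no later observation to copy from.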
* lint with black
* limit minimum version of pandas
---------
Co-authored-by: Linlang <Lv.Linlang@hotmail.com>
* fixed a problem with multi index caused by the default value of groupkey
* modify group_key default value
* limit pandas version
* format with black
* fix docs error
* fixed bugs caused by pandas upgrade
* remove needless code
* reformat with black
* limit version & add docs
* fix: resolve #1892 by retrieving the data page by page
* reformat with black
---------
Co-authored-by: shengyuhong <shengyuhong@bytedance.com>
Co-authored-by: fibers <yu8582@126.com>
* #854 implement first data health checker draft
* #854 added support for qlib's data format, implemented factor check, reformatted summary
* adapt to the current dataset
* format with black
* add data health check to docs
* fix sphinx error
* fix pylint error
* update code
* format with black
* format with pylint
---------
Co-authored-by: Linlang <Lv.Linlang@hotmail.com>
* Fix FutureWarning: Passing unit-less datetime64 dtype to .astype is deprecated and will raise in a future version. Pass 'datetime64[ns]' instead
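A minimal sketch of the change the warning asks for, assuming a datetime column like the ones the collectors handle. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample values, not taken from the collector code.
dates = pd.Series(pd.to_datetime(["2020-01-02", "2020-01-03"]))

# Deprecated form per the warning: dates.astype("datetime64")
# Passing an explicit unit avoids the FutureWarning.
converted = dates.astype("datetime64[ns]")
```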
* align index format while end date contains current day data
* fix black
* optimize code
* fix ci error
* check ci error
* fix ci error
* check ci error
* fix ci error
---------
Co-authored-by: Cadenza-Li <362237642@qq.com>
Co-authored-by: Linlang <Lv.Linlang@hotmail.com>
* fix the bug that the HS_SYMBOLS_URL is 404
* fix bug
* format with black
* fix pylint error
* change error code
* fix ci error
* optimize code
* add comments
---------
Co-authored-by: Linlang <Lv.Linlang@hotmail.com>
* download orderbook data
* fix CI error
* test fix CI error
* optimize get_data code
* optimize README
---------
Co-authored-by: Linlang <v-linlanglv@microsoft.com>
* add_baostock_collector
* modify_comments
* fix_pylint_error
* solve_duplication_methods
* modified the logic of update_data_to_bin
* optimize code
* optimize pylint issue
* fix pylint error
* changes suggested by the review
* fix CI failure
* fix issue 1121
* format with black
* optimize code logic
* fix error code
* drop warning during code runs
* optimize code
* format with black
* fix bug
* format with black
* optimize code
* add comments
* Intermediate version
* Fix yaml template & Successfully run rolling
* Be compatible with benchmark
* Get same results with previous linear model
* Black formatting
* Update black
* Update the placeholder mechanism
* Update CI
* Upgrade Black
* Fix CI and simplify code
* Fix CI
* Move the data processing caching mechanism into utils.
* Adjusting DDG-DA
* Organize import
* Update YahooNormalizeUS1dExtend(#1196)
* Prevent pandas read_csv errors while running update_data_to_bin for US region
* Fix parse_index error while running update_data_to_bin for US region
* prevent pandas.read_csv error on specific symbol names
* Reordering parameters for better rendering
* removes prefix during feature_dir existence checking
* add explanation comments
* fix grammar error in docstrings
* fix typos in exchange.py
* fix typos and grammar errors
* fix typo and rename function param to avoid shadowing a Python keyword
* remove redundant parentheses; pass kwargs to parent class
* fix pyblack
* further correction
* assign -> be assigned to
* Explain data crawler structure
* Add documentation for data and feature
* Update scripts/data_collector/yahoo/README.md
Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>
* Remove some confusing wording
* Add third party data source
* Fix command typo
* Update commands
Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>
* Fixed pandas FutureWarning
`FutureWarning: Passing a set as an indexer is deprecated and will raise in a future version. Use a list instead.`
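A sketch of the kind of change this warning calls for, assuming the set was used to select columns. The frame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame and column names for illustration only.
df = pd.DataFrame({"open": [1.0], "close": [2.0], "volume": [3.0]})
wanted = {"open", "close"}

# Deprecated form per the warning: df[wanted]
# Converting the set to a list removes the warning; sorting makes the
# column order deterministic, since sets have no defined order.
subset = df[sorted(wanted)]
```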
* fixed another pandas FutureWarning
```
scripts/data_collector/index.py:228: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
new_df = new_df.append(_tmp_df, sort=False)
```
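The warning above suggests `pandas.concat` as the replacement. A minimal sketch, reusing the variable names from the warning with hypothetical contents:

```python
import pandas as pd

# Variable names follow the warning above; the contents are hypothetical.
new_df = pd.DataFrame({"symbol": ["A"]})
_tmp_df = pd.DataFrame({"symbol": ["B"]})

# Deprecated form: new_df = new_df.append(_tmp_df, sort=False)
new_df = pd.concat([new_df, _tmp_df], sort=False)
```

Like `append`, `concat` keeps the original index labels unless `ignore_index=True` is passed.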
* fixed more pandas FutureWarnings
* feat: download ibovespa index historic composition
Ibovespa (IBOV) is the largest index on Brazil's stock exchange.
The br_index folder supports downloading the companies in the current index composition,
as well as the companies from the historical composition of the IBOV index.
Partially resolves issue #956
* fix: typo where end_ate was written instead of end_date
* feat: adds support for downloading stocks historic prices from Brazil's stocks exchange (B3)
Together with commit c2f933 it resolves issue #956
* fix: code formatted with black.
* wip: creating code logic for Brazil's stock market data normalization
* docs: Brazil's stock market data normalization code documentation
* fix: code formatted with black
* docs: fixed typo
* docs: more info about python version used to generate requirements.txt file
* docs: added BeautifulSoup requirements
* feat: removed debug prints
* feat: added ibov_index_composition variable as a class attribute of IBOVIndex
* feat: added increment to generate the four month period used by the ibov index
* refactor: added get_instruments() method inside utils.py for better code usability.
Message from the PR review, quoted for context:
In the course of reviewing this PR we found two issues.
1. get_instruments() is used in multiple places, and we feel that
scripts.index.py is the best place for it to go.
2. data_collector.utils has some very generic code put inside it.
* refactor: improve Brazilian stocks download speed
retry=2 is used because Yahoo Finance unfortunately does not
keep track of the majority of Brazilian stocks.
With the deco_retry decorator's retry argument set to 5, the code
tries 5 times to get the data of each stock that Yahoo does not track,
which makes downloading Brazilian stocks very slow.
This may change in the future, but for now
I suggest keeping the retry argument at 1 or 2
to improve download speed.
To make this possible, an argument called retry_config
was added to YahooCollectorBR1d and YahooCollectorBR1min.
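The cost of a high retry count can be illustrated with a simplified stand-in decorator. `retry_calls` and `fetch_missing_symbol` are hypothetical names for this sketch; this is not Qlib's deco_retry, and it assumes retry is at least 1.

```python
import functools


def retry_calls(retry=5):
    """Hypothetical stand-in for a retry decorator; assumes retry >= 1."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(retry):
                try:
                    return func(*args, **kwargs)
                except Exception as error:  # keep the error and try again
                    last_error = error
            raise last_error

        return wrapper

    return decorator


attempts = []


@retry_calls(retry=2)
def fetch_missing_symbol():
    # Simulates a stock that the data source does not track.
    attempts.append(1)
    raise ValueError("symbol not available")
```

Every symbol that is never going to succeed costs `retry` failed requests, so lowering the count from 5 to 2 cuts the wasted requests for such symbols by more than half.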
* fix: added __main__ at the bottom of the script
* refactor: changed interface inside each index
Using partial, as in `fire.Fire(partial(get_instruments, market_index="br_index"))`,
makes the script easier for the user to execute.
The collector.py CLI in each folder can then drop a redundant argument.
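A sketch of how `functools.partial` pre-binds the argument. The `get_instruments` below is a simplified, hypothetical stand-in with an assumed signature, not the real function, and the call to `fire.Fire` is only described in a comment so that the example runs without the fire package.

```python
from functools import partial


# Hypothetical stand-in with an assumed signature, for illustration only.
def get_instruments(index_name, market_index="cn_index"):
    return f"{market_index}:{index_name}"


# Pre-bind market_index so the CLI user no longer has to pass it.
br_get_instruments = partial(get_instruments, market_index="br_index")

# In the collector script this callable would be handed to fire.Fire(...).
result = br_get_instruments(index_name="IBOV")
```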
* refactor: implemented class interface retry into YahooCollectorBR
* docs: added BR as a possible region into the documentation
* refactor: make retry attribute part of the interface
This way we don't have to use hasattr to access the retry attribute as previously done