mirror of https://github.com/microsoft/qlib.git synced 2026-06-06 05:51:17 +08:00
Signed-off-by: unknown <lv.linlang@qq.com>
SunsetWolf
2021-12-31 22:14:47 +08:00
committed by GitHub
parent f59cfe51e0
commit dfc0ed3c01
56 changed files with 92 additions and 92 deletions

View File

@@ -30,7 +30,7 @@ Version 0.2.1
--------------------
- Support registering user-defined ``Provider``.
- Support use operators in string format, e.g. ``['Ref($close, 1)']`` is valid field format.
- Support dynamic fields in ``$some_field`` format. And exising fields like ``Close()`` may be deprecated in the future.
- Support dynamic fields in ``$some_field`` format. And existing fields like ``Close()`` may be deprecated in the future.
Version 0.2.2
--------------------
@@ -78,7 +78,7 @@ Version 0.3.5
- Support multi-label training, you can provide multiple label in ``handler``. (But LightGBM doesn't support due to the algorithm itself)
- Refactor ``handler`` code, dataset.py is no longer used, and you can deploy your own labels and features in ``feature_label_config``
- Handler only offer DataFrame. Also, ``trainer`` and model.py only receive DataFrame
- Change ``split_rolling_data``, we roll the data on market calender now, not on normal date
- Change ``split_rolling_data``, we roll the data on market calendar now, not on normal date
- Move some date config from ``handler`` to ``trainer``
Version 0.4.0
@@ -167,11 +167,11 @@ Version 0.8.0
- There are lots of changes for daily trading, it is hard to list all of them. But a few important changes could be noticed
- The trading limitation is more accurate;
- In `previous version <https://github.com/microsoft/qlib/blob/v0.7.2/qlib/contrib/backtest/exchange.py#L160>`_, longing and shorting actions share the same action.
- In `current verison <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/backtest/exchange.py#L304>`_, the trading limitation is different between loging and shorting action.
- In `current version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/backtest/exchange.py#L304>`_, the trading limitation is different between logging and shorting action.
- The constant is different when calculating annualized metrics.
- `Current version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/contrib/evaluate.py#L42>`_ uses more accurate constant than `previous version <https://github.com/microsoft/qlib/blob/v0.7.2/qlib/contrib/evaluate.py#L22>`_
- `A new version <https://github.com/microsoft/qlib/blob/7c31012b507a3823117bddcc693fc64899460b2a/qlib/tests/data.py#L17>`_ of data is released. Due to the unstability of Yahoo data source, the data may be different after downloading data again.
- Users could chec kout the backtesting results between `Current version <https://github.com/microsoft/qlib/tree/7c31012b507a3823117bddcc693fc64899460b2a/examples/benchmarks>`_ and `previous version <https://github.com/microsoft/qlib/tree/v0.7.2/examples/benchmarks>`_
- Users could check out the backtesting results between `Current version <https://github.com/microsoft/qlib/tree/7c31012b507a3823117bddcc693fc64899460b2a/examples/benchmarks>`_ and `previous version <https://github.com/microsoft/qlib/tree/v0.7.2/examples/benchmarks>`_
Other Versions

View File

@@ -14,7 +14,7 @@ To get the join trading performance of daily and intraday trading, they must int
In order to support the joint backtest strategies in multiple levels, a corresponding framework is required. None of the publicly available high-frequency trading frameworks considers multi-level joint trading, which make the backtesting aforementioned inaccurate.
Besides backtesting, the optimization of strategies from different levels is not standalone and can be affected by each other.
For example, the best portfolio management strategy may change with the performance of order executions(e.g. a portfolio with higher turnover may becomes a better choice when we imporve the order execution strategies).
For example, the best portfolio management strategy may change with the performance of order executions(e.g. a portfolio with higher turnover may becomes a better choice when we improve the order execution strategies).
To achieve the overall good performance , it is necessary to consider the interaction of strategies in different level.
Therefore, building a new framework for trading in multiple levels becomes necessary to solve the various problems mentioned above, for which we designed a nested decision execution framework that consider the interaction of strategies.

View File

@@ -37,7 +37,7 @@ Here is a general view of the structure of the system:
This experiment management system defines a set of interface and provided a concrete implementation ``MLflowExpManager``, which is based on the machine learning platform: ``MLFlow`` (`link <https://mlflow.org/>`_).
If users set the implementation of ``ExpManager`` to be ``MLflowExpManager``, they can use the command `mlflow ui` to visualize and check the experiment results. For more information, pleaes refer to the related documents `here <https://www.mlflow.org/docs/latest/cli.html#mlflow-ui>`_.
If users set the implementation of ``ExpManager`` to be ``MLflowExpManager``, they can use the command `mlflow ui` to visualize and check the experiment results. For more information, please refer to the related documents `here <https://www.mlflow.org/docs/latest/cli.html#mlflow-ui>`_.
Qlib Recorder
===================

View File

@@ -31,7 +31,7 @@ Let's see an example,
First make sure you have the latest version of `qlib` installed.
Then, you need to privide a configuration to setup the experiment.
Then, you need to provide a configuration to setup the experiment.
We write a simple configuration example as following,
.. code-block:: YAML
@@ -217,13 +217,13 @@ The tuner pipeline contains different tuners, and the `tuner` program will proce
Each part represents a tuner, and its modules which are to be tuned. Space in each part is the hyper-parameters' space of a certain module, you need to create your searching space and modify it in `/qlib/contrib/tuner/space.py`. We use `hyperopt` package to help us to construct the space, you can see the detail of how to use it in https://github.com/hyperopt/hyperopt/wiki/FMin .
- model
You need to provide the `class` and the `space` of the model. If the model is user's own implementation, you need to privide the `module_path`.
You need to provide the `class` and the `space` of the model. If the model is user's own implementation, you need to provide the `module_path`.
- trainer
You need to proveide the `class` of the trainer. If the trainer is user's own implementation, you need to privide the `module_path`.
You need to provide the `class` of the trainer. If the trainer is user's own implementation, you need to provide the `module_path`.
- strategy
You need to provide the `class` and the `space` of the strategy. If the strategy is user's own implementation, you need to privide the `module_path`.
You need to provide the `class` and the `space` of the strategy. If the strategy is user's own implementation, you need to provide the `module_path`.
- data_label
The label of the data, you can search which kinds of labels will lead to a better result. This part is optional, and you only need to provide `space`.
@@ -273,7 +273,7 @@ You need to use the same dataset to evaluate your different `estimator` experime
About the data and backtest
~~~~~~~~~~~~~~~~~~~~~~~~~~~
`data` and `backtest` are all same in the whole `tuner` experiment. Different `estimator` experiments must use the same data and backtest method. So, these two parts of config are same with that in `estimator` configuration. You can see the precise defination of these parts in `estimator` introduction. We only provide an example here.
`data` and `backtest` are all same in the whole `tuner` experiment. Different `estimator` experiments must use the same data and backtest method. So, these two parts of config are same with that in `estimator` configuration. You can see the precise definition of these parts in `estimator` introduction. We only provide an example here.
.. code-block:: YAML

View File

@@ -31,7 +31,7 @@ Users can easily intsall ``Qlib`` according to the following steps:
git clone https://github.com/microsoft/qlib.git && cd qlib
python setup.py install
To kown more about `installation`, please refer to `Qlib Installation <../start/installation.html>`_.
To known more about `installation`, please refer to `Qlib Installation <../start/installation.html>`_.
Prepare Data
==============
@@ -44,7 +44,7 @@ Load and prepare data by running the following code:
This dataset is created by public data collected by crawler scripts in ``scripts/data_collector/``, which have been released in the same repository. Users could create the same dataset with it.
To kown more about `prepare data`, please refer to `Data Preparation <../component/data.html#data-preparation>`_.
To known more about `prepare data`, please refer to `Data Preparation <../component/data.html#data-preparation>`_.
Auto Quant Research Workflow
====================================

View File

@@ -32,7 +32,7 @@ import abc
import enum
# Type defintions
# Type definitions
class DataTypes(enum.IntEnum):
"""Defines numerical types of each column."""

View File

@@ -254,9 +254,9 @@ class DistributedHyperparamOptManager(HyperparamOptManager):
param_ranges: Discrete hyperparameter range for random search.
fixed_params: Fixed model parameters per experiment.
root_model_folder: Folder to store optimisation artifacts.
worker_number: Worker index definining which set of hyperparameters to
worker_number: Worker index defining which set of hyperparameters to
test.
search_iterations: Maximum numer of random search iterations.
search_iterations: Maximum number of random search iterations.
num_iterations_per_worker: How many iterations are handled per worker.
clear_serialised_params: Whether to regenerate hyperparameter
combinations.
@@ -330,7 +330,7 @@ class DistributedHyperparamOptManager(HyperparamOptManager):
if os.path.exists(self.serialised_ranges_folder):
df = pd.read_csv(self.serialised_ranges_path, index_col=0)
else:
print("Unable to load - regenerating serach ranges instead")
print("Unable to load - regenerating search ranges instead")
df = self.update_serialised_hyperparam_df()
return df

View File

@@ -342,7 +342,7 @@ class TFTDataCache:
@classmethod
def contains(cls, key):
"""Retuns boolean indicating whether key is present in cache."""
"""Returns boolean indicating whether key is present in cache."""
return key in cls._data_cache
@@ -1120,10 +1120,10 @@ class TemporalFusionTransformer:
Args:
df: Input dataframe
return_targets: Whether to also return outputs aligned with predictions to
faciliate evaluation
facilitate evaluation
Returns:
Input dataframe or tuple of (input dataframe, algined output dataframe).
Input dataframe or tuple of (input dataframe, aligned output dataframe).
"""
data = self._batch_data(df)

View File

@@ -295,7 +295,7 @@ class TFTModel(ModelFT):
def to_pickle(self, path: Union[Path, str]):
"""
Tensorflow model can't be dumped directly.
So the data should be save seperatedly
So the data should be save separately
**TODO**: Please implement the function to load the files

View File

@@ -57,7 +57,7 @@ And here are two ways to run the model:
python example.py --config_file configs/config_alstm.yaml
```
Here we trained TRA on a pretrained backbone model. Therefore we run `*_init.yaml` before TRA's scipts.
Here we trained TRA on a pretrained backbone model. Therefore we run `*_init.yaml` before TRA's scripts.
### Results

View File

@@ -124,7 +124,7 @@ class TRAModel(Model):
loss = (pred - label).pow(2).mean()
L = (all_preds.detach() - label[:, None]).pow(2)
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure postive input
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure positive input
data_set.assign_data(index, L) # save loss to memory
@@ -165,7 +165,7 @@ class TRAModel(Model):
L = (all_preds - label[:, None]).pow(2)
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure postive input
L -= L.min(dim=-1, keepdim=True).values # normalize & ensure positive input
data_set.assign_data(index, L) # save loss to memory
@@ -484,7 +484,7 @@ class TRA(nn.Module):
"""Temporal Routing Adaptor (TRA)
TRA takes historical prediction erros & latent representation as inputs,
TRA takes historical prediction errors & latent representation as inputs,
then routes the input sample to a specific predictor for training & inference.
Args:

View File

@@ -150,7 +150,7 @@ class Cut(ElemOperator):
self.l = l
self.r = r
if (self.l is not None and self.l <= 0) or (self.r is not None and self.r >= 0):
raise ValueError("Cut operator l shoud > 0 and r should < 0")
raise ValueError("Cut operator l should > 0 and r should < 0")
super(Cut, self).__init__(feature)
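
Editorial sketch, not part of the commit: the check in this hunk rejects a non-positive `l` or a non-negative `r`, and skips whichever bound is `None`. The standalone function below reproduces only that condition; `check_cut_bounds` is a hypothetical name chosen for illustration, and in the source the check sits inside `Cut.__init__`.

```python
from typing import Optional


def check_cut_bounds(l: Optional[int], r: Optional[int]) -> None:
    """Hypothetical helper mirroring the validation shown in the hunk above."""
    # l, when given, must be strictly positive; r, when given, strictly negative.
    if (l is not None and l <= 0) or (r is not None and r >= 0):
        raise ValueError("Cut operator l should > 0 and r should < 0")


check_cut_bounds(2, -2)     # accepted: both bounds have the required sign
check_cut_bounds(None, -1)  # accepted: a missing bound is not checked

try:
    check_cut_bounds(0, -1)  # rejected: l must be > 0
except ValueError as err:
    rejected_message = str(err)
```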

View File

@@ -298,7 +298,7 @@ class NestedDecisionExecutionWorkflow:
# - Aligning the profit calculation between multiple levels and single levels.
# 2) comparing different backtest
# - Basic test idea:
# - the daily backtest will be similar as multi-level(the data quality makes this gap samller)
# - the daily backtest will be similar as multi-level(the data quality makes this gap smaller)
def check_diff_freq(self):
self._init_qlib()

View File

@@ -241,7 +241,7 @@ def auto_init(**kwargs):
default_exp_name: "Experiment"
Example 2)
If you wan to create simple a stand alone config, you can use following config(a.k.a `conf_type: origin`)
If you want to create simple a stand alone config, you can use following config(a.k.a `conf_type: origin`)
.. code-block:: python

View File

@@ -31,7 +31,7 @@ rtn & earning in the Account
class AccumulatedInfo:
"""
accumulated trading info, including accumulated return/cost/turnover
AccumulatedInfo should be shared accross different levels
AccumulatedInfo should be shared across different levels
"""
def __init__(self):
@@ -199,7 +199,7 @@ class Account:
# if stock is sold out, no stock price information in Position, then we should update account first, then update current position
# if stock is bought, there is no stock in current position, update current, then update account
# The cost will be substracted from the cash at last. So the trading logic can ignore the cost calculation
# The cost will be subtracted from the cash at last. So the trading logic can ignore the cost calculation
if order.direction == Order.SELL:
# sell stock
self._update_state_from_order(order, trade_val, cost, trade_price)
@@ -378,7 +378,7 @@ class Account:
)
def get_portfolio_metrics(self):
"""get the history portfolio_metrics and postions instance"""
"""get the history portfolio_metrics and positions instance"""
if self.is_port_metr_enabled():
_portfolio_metrics = self.portfolio_metrics.generate_portfolio_metrics_dataframe()
_positions = self.get_hist_positions()

View File

@@ -13,7 +13,7 @@ from tqdm.auto import tqdm
def backtest_loop(start_time, end_time, trade_strategy: BaseStrategy, trade_executor: BaseExecutor):
"""backtest funciton for the interaction of the outermost strategy and executor in the nested decision execution
"""backtest function for the interaction of the outermost strategy and executor in the nested decision execution
please refer to the docs of `collect_data_loop`

View File

@@ -505,8 +505,8 @@ class BaseTradeDecision:
`inner_trade_decision` will be changed **inplaced**.
Motivation of the `mod_inner_decision`
- Leave a hook for outer decision to affact the decision generated by the inner strategy
- e.g. the outmost strategy generate a time range for trading. But the upper layer can only affact the
- Leave a hook for outer decision to affect the decision generated by the inner strategy
- e.g. the outmost strategy generate a time range for trading. But the upper layer can only affect the
nearest layer in the original design. With `mod_inner_decision`, the decision can passed through multiple
layers

View File

@@ -103,7 +103,7 @@ class Exchange:
Necessary fields:
$close is for calculating the total value at end of each day.
Optional fields:
$volume is only necessary when we limit the trade amount or caculate PA(vwap) indicator
$volume is only necessary when we limit the trade amount or calculate PA(vwap) indicator
$vwap is only necessary when we use the $vwap price as the deal price
$factor is for rounding to the trading unit
limit_sell will be set to False by default(False indicates we can sell this
@@ -505,7 +505,7 @@ class Exchange:
Note: some future information is used in this function
Parameter:
target_position : dict { stock_id : amount }
current_postion : dict { stock_id : amount}
current_position : dict { stock_id : amount}
trade_unit : trade_unit
down sample : for amount 321 and trade_unit 100, deal_amount is 300
deal order on trade_date

View File

@@ -41,7 +41,7 @@ class BaseExecutor:
Parameters
----------
time_per_step : str
trade time per trading step, used for genreate the trade calendar
trade time per trading step, used for generate the trade calendar
show_indicator: bool, optional
whether to show indicators, :
- 'pa', the price advantage
@@ -369,12 +369,12 @@ class NestedExecutor(BaseExecutor):
self.inner_strategy.reset(level_infra=sub_level_infra, outer_trade_decision=trade_decision)
def _update_trade_decision(self, trade_decision: BaseTradeDecision) -> BaseTradeDecision:
# outter strategy have chance to update decision each iterator
# outer strategy have chance to update decision each iterator
updated_trade_decision = trade_decision.update(self.inner_executor.trade_calendar)
if updated_trade_decision is not None:
trade_decision = updated_trade_decision
# NEW UPDATE
# create a hook for inner strategy to update outter decision
# create a hook for inner strategy to update outer decision
self.inner_strategy.alter_outer_trade_decision(trade_decision)
return trade_decision

View File

@@ -400,7 +400,7 @@ class BaseOrderIndicator:
indicators : List[BaseOrderIndicator]
the list of all inner indicators.
metrics : Union[str, List[str]]
all metrics needs ot be sumed.
all metrics needs to be sumed.
fill_value : float, optional
fill np.NaN with value. By default None.
"""

View File

@@ -152,7 +152,7 @@ class BasePosition:
"""
generate stock weight dict {stock_id : value weight of stock in the position}
it is meaningful in the beginning or the end of each trade step
- During execution of each trading step, the weight may be not consistant with the portfolio value
- During execution of each trading step, the weight may be not consistent with the portfolio value
Parameters
----------

View File

@@ -39,7 +39,7 @@ def get_benchmark_weight(
if not path:
path = Path(C.dpm.get_data_uri(freq)).expanduser() / "raw" / "AIndexMembers" / "weights.csv"
# TODO: the storage of weights should be implemented in a more elegent way
# TODO: The benchmark is not consistant with the filename in instruments.
# TODO: The benchmark is not consistent with the filename in instruments.
bench_weight_df = pd.read_csv(path, usecols=["code", "date", "index", "weight"])
bench_weight_df = bench_weight_df[bench_weight_df["index"] == bench]
bench_weight_df["date"] = pd.to_datetime(bench_weight_df["date"])

View File

@@ -73,7 +73,7 @@ class PortfolioMetrics:
self.init_bench(freq=freq, benchmark_config=benchmark_config)
def init_vars(self):
self.accounts = OrderedDict() # account postion value for each trade time
self.accounts = OrderedDict() # account position value for each trade time
self.returns = OrderedDict() # daily return rate for each trade time
self.total_turnovers = OrderedDict() # total turnover for each trade time
self.turnovers = OrderedDict() # turnover for each trade time
@@ -236,7 +236,7 @@ class Indicator:
"""
`Indicator` is implemented in a aggregate way.
All the metrics are calculated aggregately.
All the metrics are calculated for a seperated stock and in a specific step on a specific level.
All the metrics are calculated for a separated stock and in a specific step on a specific level.
| indicator | desc. |
|--------------+--------------------------------------------------------------|

View File

@@ -93,7 +93,7 @@ class TradeCalendarManager:
About the endpoints:
- Qlib uses the closed interval in time-series data selection, which has the same performance as pandas.Series.loc
# - The returned right endpoints should minus 1 seconds becasue of the closed interval representation in Qlib.
# - The returned right endpoints should minus 1 seconds because of the closed interval representation in Qlib.
# Note: Qlib supports up to minutely decision execution, so 1 seconds is less than any trading time interval.
Parameters
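
Editorial sketch, not part of the commit: the docstring above says Qlib selects time-series data on a closed interval, like `pandas.Series.loc`, so the right endpoint is moved back by one second. The example below shows that effect with plain pandas; the timestamps and the 30-minute spacing are made-up illustration values, not Qlib calendar data.

```python
import pandas as pd

# Four illustrative bar start times: 09:30, 10:00, 10:30, 11:00.
bar_starts = pd.date_range("2021-01-04 09:30", periods=4, freq="30min")
series = pd.Series(range(4), index=bar_starts)

left = pd.Timestamp("2021-01-04 09:30")
right = pd.Timestamp("2021-01-04 10:30")

# Label slicing with .loc is closed on both ends, so the 10:30 bar is included.
closed = series.loc[left:right]

# Subtracting one second from the right endpoint excludes the 10:30 bar.
trimmed = series.loc[left : right - pd.Timedelta(seconds=1)]
```

Because the smallest step mentioned in the docstring is one minute, shifting by one second cannot skip a bar that belongs to the interval.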

View File

@@ -18,8 +18,8 @@ class SepDataFrame:
"""
(Sep)erate DataFrame
We usually concat multiple dataframe to be processed together(Such as feature, label, weight, filter).
However, they are usally be used seperately at last.
This will result in extra cost for concating and spliting data(reshaping and copying data in the memory is very expensive)
However, they are usually be used separately at last.
This will result in extra cost for concatenating and splitting data(reshaping and copying data in the memory is very expensive)
SepDataFrame tries to act like a DataFrame whose column with multiindex
"""

View File

@@ -38,11 +38,11 @@ def _get_position_value_from_df(evaluate_date, position, close_data_df):
def get_position_value(evaluate_date, position):
"""sum of close*amount
get value of postion
get value of position
use close price
postions:
positions:
{
Timestamp('2016-01-05 00:00:00'):
{

View File

@@ -56,7 +56,7 @@ class HFLGBModel(ModelFT, LightGBMFInt):
def hf_signal_test(self, dataset: DatasetH, threhold=0.2):
"""
Test the sigal in high frequency test set
Test the signal in high frequency test set
"""
if self.model == None:
raise ValueError("Model hasn't been trained yet")

View File

@@ -446,7 +446,7 @@ class TabNet(nn.Module):
Args:
n_d: dimension of the features used to calculate the final results
n_a: dimension of the features input to the attention transformer of the next step
n_shared: numbr of shared steps in feature transfomer(optional)
n_shared: numbr of shared steps in feature transformer(optional)
n_ind: number of independent steps in feature transformer
n_steps: number of steps of pass through tabbet
relax coefficient:
@@ -479,7 +479,7 @@ class TabNet(nn.Module):
out = torch.zeros(x.size(0), self.n_d).to(x.device)
for step in self.steps:
x_te, l = step(x, x_a, priors)
out += F.relu(x_te[:, : self.n_d]) # split the feautre from feat_transformer
out += F.relu(x_te[:, : self.n_d]) # split the feature from feat_transformer
x_a = x_te[:, self.n_d :]
sparse_loss.append(l)
return self.fc(out), sum(sparse_loss)

View File

@@ -232,7 +232,7 @@ class TRAModel(Model):
choice_all.append(pd.DataFrame(choice.detach().cpu().numpy(), index=index))
decay = self.rho ** (self.global_step // 100) # decay every 100 steps
lamb = 0 if is_pretrain else self.lamb * decay
reg = prob.log().mul(P).sum(dim=1).mean() # train router to predict OT assignment
reg = prob.log().mul(P).sum(dim=1).mean() # train router to predict TO assignment
if self._writer is not None and not is_pretrain:
self._writer.add_scalar("training/router_loss", -reg.item(), self.global_step)
self._writer.add_scalar("training/reg_loss", loss.item(), self.global_step)
@@ -663,7 +663,7 @@ class TRA(nn.Module):
"""Temporal Routing Adaptor (TRA)
TRA takes historical prediction erros & latent representation as inputs,
TRA takes historical prediction errors & latent representation as inputs,
then routes the input sample to a specific predictor for training & inference.
Args:

View File

@@ -33,5 +33,5 @@ def count_parameters(models_or_parameters, unit="m"):
elif unit == "gb" or unit == "g":
counts /= 2 ** 30
elif unit is not None:
raise ValueError("Unknow unit: {:}".format(unit))
raise ValueError("Unknown unit: {:}".format(unit))
return counts

View File

@@ -36,7 +36,7 @@ def save_instance(instance, file_path):
save(dump) an instance to a pickle file
Parameter
instance :
data to te dumped
data to be dumped
file_path : string / pathlib.Path()
path of file to be dumped
"""

View File

@@ -47,7 +47,7 @@ class SoftTopkStrategy(WeightStrategyBase):
Return the proportion of your total value you will used in investment.
Dynamically risk_degree will result in Market timing
"""
# It will use 95% amoutn of your total value by default
# It will use 95% amount of your total value by default
return self.risk_degree
def generate_target_weight_position(self, score, current, trade_start_time, trade_end_time):

View File

@@ -24,7 +24,7 @@ class TWAPStrategy(BaseStrategy):
NOTE:
- This TWAP strategy will celling round when trading. This will make the TWAP trading strategy produce the order
ealier when the total trade unit of amount is less than the trading step
earlier when the total trade unit of amount is less than the trading step
"""
def reset(self, outer_trade_decision: BaseTradeDecision = None, **kwargs):
@@ -43,8 +43,8 @@ class TWAPStrategy(BaseStrategy):
def generate_trade_decision(self, execute_result=None):
# NOTE: corner cases!!!
# - If using upperbound round, please don't sell the amount which should in next step
# - the coordinate of the amount between steps is hard to be dealed between steps in the same level. It
# is easier to be dealed in upper steps
# - the coordinate of the amount between steps is hard to be dealt between steps in the same level. It
# is easier to be dealt in upper steps
# strategy is not available. Give an empty decision
if len(self.outer_trade_decision.get_decision()) == 0:

View File

@@ -69,7 +69,7 @@ class BaseSignalStrategy(BaseStrategy):
Return the proportion of your total value you will used in investment.
Dynamically risk_degree will result in Market timing.
"""
# It will use 95% amoutn of your total value by default
# It will use 95% amount of your total value by default
return self.risk_degree

View File

@@ -90,7 +90,7 @@ class QLibTuner(Tuner):
def objective(self, params):
# 1. Setup an config for a spcific estimator process
# 1. Setup an config for a specific estimator process
estimator_path = self.setup_estimator_config(params)
self.logger.info("Searching params: {} ".format(params))

View File

@@ -359,7 +359,7 @@ class ExpressionCache(BaseProviderCache):
def update(self, cache_uri: Union[str, Path], freq: str = "day"):
"""Update expression cache to latest calendar.
Overide this method to define how to update expression cache corresponding to users' own cache mechanism.
Override this method to define how to update expression cache corresponding to users' own cache mechanism.
Parameters
----------
@@ -445,7 +445,7 @@ class DatasetCache(BaseProviderCache):
def update(self, cache_uri: Union[str, Path], freq: str = "day"):
"""Update dataset cache to latest calendar.
Overide this method to define how to update dataset cache corresponding to users' own cache mechanism.
Override this method to define how to update dataset cache corresponding to users' own cache mechanism.
Parameters
----------
@@ -543,7 +543,7 @@ class DiskExpressionCache(ExpressionCache):
# instance
series = self.provider.expression(instrument, field, _calendar[0], _calendar[-1], freq)
if not series.empty:
# This expresion is empty, we don't generate any cache for it.
# This expression is empty, we don't generate any cache for it.
with CacheUtils.writer_lock(self.r, f"{str(C.dpm.get_data_uri(freq))}:expression-{_cache_uri}"):
self.gen_expression_cache(
expression_data=series,
@@ -858,7 +858,7 @@ class DiskDatasetCache(DatasetCache):
"""gen_dataset_cache
.. note:: This function does not consider the cache read write lock. Please
Aquire the lock outside this function
Acquire the lock outside this function
The format the cache contains 3 parts(followed by typical filename).
@@ -1035,7 +1035,7 @@ class DiskDatasetCache(DatasetCache):
# FIXME:
# Because the feature cache are stored as .bin file.
# So the series read from features are all float32.
# However, the first dataset cache is calulated based on the
# However, the first dataset cache is calculated based on the
# raw data. So the data type may be float64.
# Different data type will result in failure of appending data
if "/{}".format(DatasetCache.HDF_KEY) in store.keys():

View File

@@ -58,7 +58,7 @@ class Client:
msg_proc_func : func
the function to process the message when receiving response, should have arg `*args`.
msg_queue: Queue
The queue to pass the messsage after callback.
The queue to pass the message after callback.
"""
head_info = {"version": qlib.__version__}

View File

@@ -16,7 +16,7 @@ from multiprocessing import Pool
from typing import Iterable, Union
from typing import List, Union
# For supporting multiprocessing in outter code, joblib is used
# For supporting multiprocessing in outer code, joblib is used
from joblib import delayed
from .cache import H

View File

@@ -392,7 +392,7 @@ class TSDataSampler:
2021-01-14 12441 12442 12443 12444 12445 12446 ...
2) the second element: {<original index>: <row, col>}
"""
# object incase of pandas converting int to flaot
# object incase of pandas converting int to float
idx_df = pd.Series(range(data.shape[0]), index=data.index, dtype=object)
idx_df = lazy_sort_index(idx_df.unstack())
# NOTE: the correctness of `__getitem__` depends on columns sorted here
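
Editorial sketch, not part of the commit: the comment in this hunk explains that `dtype=object` keeps pandas from converting integer positions to floats when `unstack()` has to fill missing cells. The toy index below is an assumption made for illustration; it leaves one (date, instrument) pair missing so that the difference shows up.

```python
import pandas as pd

# Three rows; the pair ("2021-01-15", "B") is deliberately missing.
index = pd.MultiIndex.from_tuples(
    [("2021-01-14", "A"), ("2021-01-14", "B"), ("2021-01-15", "A")],
    names=["datetime", "instrument"],
)

# Integer dtype: the missing cell becomes NaN, which forces column B to float64.
as_int = pd.Series(range(3), index=index).unstack()

# Object dtype: the missing cell is still NaN, but column B is not cast to float.
as_obj = pd.Series(range(3), index=index, dtype=object).unstack()
```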

View File

@@ -70,7 +70,7 @@ class DataHandler(Serializable):
Parameters
----------
instruments :
The stock list to retrive.
The stock list to retrieve.
start_time :
start_time of the original data.
end_time :

View File

@@ -75,7 +75,7 @@ class Processor(Serializable):
def readonly(self) -> bool:
"""
Does the processor treat the input data readonly (i.e. does not write the input data) when processsing
Does the processor treat the input data readonly (i.e. does not write the input data) when processing
Knowning the readonly information is helpful to the Handler to avoid uncessary copy
"""

View File

@@ -63,7 +63,7 @@ class HasingStockStorage(BaseHandlerStorage):
"""Hasing data storage for datahanlder
- The default data storage pandas.DataFrame is too slow when randomly accessing one stock's data
- HasingStockStorage hashes the multiple stocks' data(pandas.DataFrame) by the key `stock_id`.
- HasingStockStorage hases the pandas.DataFrame into a dict, whose key is the stock_id(str) and value this stock data(panda.DataFrame), it has the following format:
- HasingStockStorage hashes the pandas.DataFrame into a dict, whose key is the stock_id(str) and value this stock data(panda.DataFrame), it has the following format:
{
stock1_id: stock1_data,
stock2_id: stock2_data,

View File

@@ -64,10 +64,10 @@ class QlibIntRLEnv(QlibRLEnv):
Parameters
----------
state_interpreter : Union[dict, StateInterpreter]
interpretor that interprets the qlib execute result into rl env state.
interpreter that interprets the qlib execute result into rl env state.
action_interpreter : Union[dict, ActionInterpreter]
interpretor that interprets the rl agent action into qlib order list
interpreter that interprets the rl agent action into qlib order list
"""
super(QlibIntRLEnv, self).__init__(executor=executor)
self.state_interpreter = init_instance_by_config(state_interpreter, accept_types=StateInterpreter)

View File

@@ -34,7 +34,7 @@ class BaseStrategy:
Parameters
----------
outer_trade_decision : BaseTradeDecision, optional
the trade decision of outer strategy which this startegy relies, and it will be traded in [start_time, end_time], by default None
the trade decision of outer strategy which this strategy relies, and it will be traded in [start_time, end_time], by default None
- If the strategy is used to split trade decision, it will be used
- If the strategy is used for portfolio management, it can be ignored
level_infra : LevelInfrastructure, optional
@@ -232,9 +232,9 @@ class RLIntStrategy(RLStrategy):
Parameters
----------
state_interpreter : Union[dict, StateInterpreter]
interpretor that interprets the qlib execute result into rl env state
interpreter that interprets the qlib execute result into rl env state
action_interpreter : Union[dict, ActionInterpreter]
interpretor that interprets the rl agent action into qlib order list
interpreter that interprets the rl agent action into qlib order list
start_time : Union[str, pd.Timestamp], optional
start time of trading, by default None
end_time : Union[str, pd.Timestamp], optional

@@ -579,7 +579,7 @@ def get_date_range(trading_date, left_shift=0, right_shift=0, future=False):
def get_date_by_shift(trading_date, shift, future=False, clip_shift=True, freq="day", align: Optional[str] = None):
"""get trading date with shift bias wil cur_date
"""get trading date with shift bias will cur_date
e.g. : shift == 1, return next trading date
shift == -1, return previous trading date
----------
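
The shift semantics in the docstring above (shift == 1 gives the next trading date, shift == -1 the previous one) can be illustrated on a plain sorted calendar. This is a sketch under simplifying assumptions, not the function's real logic: the helper `shift_trading_date` and the toy calendar are hypothetical, and the `future`, `clip_shift`, `freq` and `align` parameters of the real signature are deliberately left out.

```python
from bisect import bisect_left


def shift_trading_date(calendar, trading_date, shift):
    """Return the calendar entry `shift` positions away from `trading_date`.

    `calendar` must be a sorted list of trading dates (ISO strings here).
    """
    pos = bisect_left(calendar, trading_date)
    if pos == len(calendar) or calendar[pos] != trading_date:
        raise ValueError(f"{trading_date} is not a trading date")
    target = pos + shift
    if not 0 <= target < len(calendar):
        raise IndexError("shift moves outside the calendar")
    return calendar[target]


# Hypothetical calendar: a holiday gap sits between 2021-12-31 and 2022-01-04.
calendar = ["2021-12-29", "2021-12-30", "2021-12-31", "2022-01-04"]

next_day = shift_trading_date(calendar, "2021-12-31", 1)
prev_day = shift_trading_date(calendar, "2021-12-31", -1)
```

Because the shift is applied to positions in the trading calendar rather than to calendar days, the next trading date after 2021-12-31 skips the holiday gap.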

@@ -6,7 +6,7 @@ Motivation of index_data
Some users just want a simple numpy dataframe with indices and don't want such a complicated tools.
Such users are the target of `index_data`
`index_data` try to behave like pandas (some API will be different because we try to be simpler and more intuitive) but don't compromize the performance. It provides the basic numpy data and simple indexing feature. If users call APIs which may compromize the performance, index_data will raise Errors.
`index_data` try to behave like pandas (some API will be different because we try to be simpler and more intuitive) but don't compromise the performance. It provides the basic numpy data and simple indexing feature. If users call APIs which may compromise the performance, index_data will raise Errors.
"""
from typing import Dict, Tuple, Union, Callable, List

@@ -203,10 +203,10 @@ def get_valid_value(series, last=True):
"""get the first/last not nan value of pd.Series with single level index
Parameters
----------
series : pd.Seires
series : pd.Series
series should not be empty
last : bool, optional
wether to get the last valid value, by default True
whether to get the last valid value, by default True
- if last is True, get the last valid value
- else, get the first valid value
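
One way to get the behaviour this docstring describes is with pandas' `first_valid_index` and `last_valid_index`. The sketch below is an illustration only and may differ from the body of `get_valid_value`; the helper name `first_or_last_valid` is hypothetical, and returning NaN for an all-NaN series is an assumption.

```python
import numpy as np
import pandas as pd


def first_or_last_valid(series: pd.Series, last: bool = True):
    """Return the last (or first) non-NaN value of a single-level-index Series."""
    idx = series.last_valid_index() if last else series.first_valid_index()
    # Both index helpers return None when every value is NaN.
    return np.nan if idx is None else series.loc[idx]


s = pd.Series([np.nan, 1.0, 2.0, np.nan], index=list("abcd"))

last_value = first_or_last_valid(s, last=True)
first_value = first_or_last_valid(s, last=False)
```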

@@ -88,7 +88,7 @@ class Experiment:
def search_records(self, **kwargs):
"""
Get a pandas DataFrame of records that fit the search criteria of the experiment.
Inputs are the search critera user want to apply.
Inputs are the search criteria user want to apply.
Returns
-------

@@ -105,7 +105,7 @@ class ExpManager:
def search_records(self, experiment_ids=None, **kwargs):
"""
Get a pandas DataFrame of records that fit the search criteria of the experiment.
Inputs are the search critera user want to apply.
Inputs are the search criteria user want to apply.
Returns
-------

@@ -75,7 +75,7 @@ class RecordUpdater(metaclass=ABCMeta):
class DSBasedUpdater(RecordUpdater, metaclass=ABCMeta):
"""
Dataset-Based Updater
- Provding updating feature for Updating data based on Qlib Dataset
- Providing updating feature for Updating data based on Qlib Dataset
Assumption
- Based on Qlib dataset

@@ -116,7 +116,7 @@ class RecordTemp:
"""
Check if the records is properly generated and saved.
It is useful in following examples
- checking if the depended files complete before genrating new things.
- checking if the depended files complete before generating new things.
- checking if the final files is completed
Parameters

@@ -9,5 +9,5 @@ A typical task workflow
|-----------------------+------------------------------------------------|
| TaskGen | Generating tasks. |
| TaskManager(optional) | Manage generated tasks |
| run task | retrive tasks from TaskManager and run tasks. |
| run task | retrieve tasks from TaskManager and run tasks. |
"""

@@ -272,7 +272,7 @@ class RollingGen(TaskGen):
class MultiHorizonGenBase(TaskGen):
def __init__(self, horizon: List[int] = [5], label_leak_n=2):
"""
This task generator tries to genrate tasks for different horizons based on an existing task
This task generator tries to generate tasks for different horizons based on an existing task
Parameters
----------

@@ -48,7 +48,7 @@ class TaskManager:
The tasks manager assumes that you will only update the tasks you fetched.
The mongo fetch one and update will make it date updating secure.
This class can be used as a tool from commandline. Here are serveral examples.
This class can be used as a tool from commandline. Here are several examples.
You can view the help of manage module with the following commands:
python -m qlib.workflow.task.manage -h # show manual of manage module CLI
python -m qlib.workflow.task.manage wait -h # show manual of the wait command of manage
@@ -368,7 +368,7 @@ class TaskManager:
def return_task(self, task, status=STATUS_WAITING):
"""
Return a task to status. Alway using in error handling.
Return a task to status. Always using in error handling.
Args:
task ([type]): [description]

@@ -103,7 +103,7 @@ class FundCollector(BaseCollector):
error_msg = f"{symbol}-{interval}-{start}-{end}"
try:
# TODO: numberOfHistoricalDaysToCrawl should be bigger enouhg
# TODO: numberOfHistoricalDaysToCrawl should be bigger enough
url = INDEX_BENCH_URL.format(
index_code=symbol, numberOfHistoricalDaysToCrawl=10000, startDate=start, endDate=end
)

@@ -360,7 +360,7 @@ def get_en_fund_symbols(qlib_data_path: [str, Path] = None) -> list:
_symbols = []
for sub_data in re.findall(r"[\[](.*?)[\]]", resp.content.decode().split("= [")[-1].replace("];", "")):
data = sub_data.replace('"', "").replace("'", "")
# TODO: do we need other informations, like fund_name from ['000001', 'HXCZHH', '华夏成长混合', '混合型', 'HUAXIACHENGZHANGHUNHE']
# TODO: do we need other information, like fund_name from ['000001', 'HXCZHH', '华夏成长混合', '混合型', 'HUAXIACHENGZHANGHUNHE']
_symbols.append(data.split(",")[0])
except Exception as e:
logger.warning(f"request error: {e}")
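
To see what the parsing loop in this hunk extracts, the same string operations can be run on a small sample payload. The payload shape below (a JavaScript-style `var r = [[...],[...]];` assignment) is an assumption inferred from the `split("= [")` and `replace("];", "")` calls; the second record is made up, and `parse_fund_codes` is a hypothetical wrapper rather than a function in the repository.

```python
import re


def parse_fund_codes(payload: str) -> list:
    """Extract the first field (the fund code) from each bracketed record."""
    # Keep the text after the assignment and drop the trailing "];".
    body = payload.split("= [")[-1].replace("];", "")
    codes = []
    # Each non-greedy match is the content of one [...] record.
    for sub_data in re.findall(r"[\[](.*?)[\]]", body):
        data = sub_data.replace('"', "").replace("'", "")
        codes.append(data.split(",")[0])
    return codes


# Assumed payload shape; the first record is the one quoted in the TODO comment.
sample = (
    'var r = [["000001","HXCZHH","华夏成长混合","混合型","HUAXIACHENGZHANGHUNHE"],'
    '["000002","HXCZ","华夏成长","混合型","HUAXIACHENGZHANG"]];'
)

codes = parse_fund_codes(sample)
```

Only the fund code survives; the other fields the TODO mentions, such as the fund name, are discarded by `split(",")[0]`.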