mirror of https://github.com/microsoft/qlib.git synced 2026-07-02 02:21:18 +08:00

move prompt templates to yaml file to make code clean

This commit is contained in:
Xu Yang
2023-06-13 15:21:19 +08:00
parent 01accec24c
commit 80fbc00792
3 changed files with 303 additions and 295 deletions


@@ -0,0 +1,10 @@
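from pathlib import Path
from jinja2 import Template
import yaml
from qlib.finco.utils import Singleton
class PormptTemplate(Singleton):
    def __init__(self) -> None:
        super().__init__()
        # Resolve the YAML file next to this module rather than the current working
        # directory (assumes prompt_template.yaml lives in the same package folder).
        template_path = Path(__file__).with_name("prompt_template.yaml")
        with open(template_path, "r", encoding="utf-8") as f:
            _template = yaml.safe_load(f)
        # Expose each top-level key (e.g. "WorkflowTask_system") as a Jinja2 Template attribute.
        for k, v in _template.items():
            setattr(self, k, Template(v))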
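Each YAML key becomes a `jinja2.Template` attribute. With its default settings, Jinja2 renders an undefined variable as an empty string. A misspelled placeholder (say `{{user_promt}}`) would therefore silently produce an incomplete prompt. Jinja2's `StrictUndefined` option raises an error instead.

The sketch below illustrates that stricter behaviour with a small standard-library stand-in. `StrictTemplate` and `build_templates` are hypothetical names. The stand-in understands only plain `{{ name }}` placeholders, not Jinja2 filters or control blocks. The input dict mimics what loading the YAML file returns.

```python
import re
from types import SimpleNamespace

# Matches simple Jinja2-style placeholders such as {{user_prompt}} or {{ user_os }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


class StrictTemplate:
    """Hypothetical stand-in for jinja2.Template: supports only {{ name }} substitution
    and raises on a missing variable instead of rendering it as an empty string."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.variables = set(_PLACEHOLDER.findall(source))

    def render(self, **kwargs) -> str:
        missing = self.variables - set(kwargs)
        if missing:
            raise KeyError(f"missing template variables: {sorted(missing)}")
        return _PLACEHOLDER.sub(lambda m: str(kwargs[m.group(1)]), self.source)


def build_templates(raw: dict) -> SimpleNamespace:
    """Expose {key: template_text} (the shape yaml.safe_load returns here) as attributes."""
    return SimpleNamespace(**{key: StrictTemplate(text) for key, text in raw.items()})


templates = build_templates(
    {"CMDTask_user": '- User intention: "{{cmd_intention}}"\n- User OS: "{{user_os}}"'}
)
print(templates.CMDTask_user.render(cmd_intention="list files", user_os="Linux"))
```

Failing loudly at render time moves a silent prompt defect into an immediate error, which is usually easier to debug in an LLM pipeline.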


@@ -0,0 +1,263 @@
WorkflowTask_system: |-
  Your goal is to determine the appropriate workflow (supervised learning or reinforcement learning) for a given user requirement in Qlib. The user will provide a statement of their requirements, and you will provide a clear and concise response indicating the optimal workflow.
  Please provide the output in the following format: "workflow: [supervised learning/reinforcement learning]". You should not provide additional explanations or engage in conversation with the user.
  Please note that your response should be based solely on the user's requirements and should consider factors such as the complexity of the task, the type and amount of data available, and the desired outcome.
  Example input 1:
  Help me build a low-turnover quant investment strategy that focuses more on long-term return in the China A-share stock market.
  Example output 1:
  workflow: supervised learning
  Example input 2:
  Help me build a pipeline to determine the best selling point of a stock within a day or half a day in the US stock market.
  Example output 2:
  workflow: reinforcement learning
WorkflowTask_user: |-
  User input: '{{user_prompt}}'
  Please provide the workflow in Qlib (supervised learning or reinforcement learning), ensuring the workflow can meet the user's requirements.
  Respond only with the output in the exact format specified in the system prompt, with no explanation or conversation.
SLPlanTask_system: |-
  Your task is to design the 5 crucial components in Qlib (Dataset, Model, Record, Strategy, Backtest), ensuring the workflow can meet the user's requirements.
  For each component, first state whether to use a default module in Qlib or implement a new module (Default or Personized). A default module means the class has already been implemented by Qlib and can be found in the documentation and source code. A default class can be called directly from the config file without additional implementation. A personized module means a new Python class is implemented and called from the config file. You should always provide the reason for your choice.
  The user will provide the requirements, and you will output only your choices in the exact format specified below, with no explanation or conversation. Respond with only the 5 components, in the order dataset, model, record, strategy, backtest, with nothing else added.
  Example input:
  Help me build a low-turnover quant investment strategy that focuses more on long-term return in the China A-share stock market. I have some data in CSV format and I want to merge it with the data in Qlib.
  Example output:
  components:
  - Dataset: (Personized) I will implement a CustomDataset inherited from qlib.data.dataset and expose an API to load the user's CSV file. I will check the format of the user's data and align it with Qlib data, because it is a suitable dataset for getting a long-term return in the China A-share stock market.
  - Model: (Default) I will use LGBModel in qlib.contrib.model.gbdt and choose more robust hyperparameters to focus on long-term return, because tree models are more stable than NN models and less likely to overfit.
  - Record: (Default) I will use SignalRecord and SigAnaRecord in qlib.workflow.record_temp to save all the signals and the analysis results, because the user needs to check the metrics to determine whether the system meets the requirements.
  - Strategy: (Default) I will use TopkDropoutStrategy in qlib.contrib.strategy, because it is a more robust strategy that saves turnover fees.
  - Backtest: (Default) I will use the default backtest module in Qlib, because it can give the user a more realistic performance result for the model we build.
SLPlanTask_user: |-
  User input: '{{user_prompt}}'
  Please provide the 5 crucial components in Qlib (dataset, model, record, strategy, backtest), ensuring the workflow can meet the user's requirements.
  Respond only with the output in the exact format specified in the system prompt, with no explanation or conversation.
RecorderTask_system: |-
  You are an expert system administrator.
  Your task is to select the best analysis class based on the user's intent from this list:
  {{ANALYZERS_list}}
  Their descriptions are:
  {{ANALYZERS_DOCS}}
  Respond only with the analyzer names provided above, with no explanation or conversation. If there is more than one analyzer, separate them with ",".
RecorderTask_user: |-
  {{user_prompt}},
  The analyzers you select should be separated by ",", such as: "HFAnalyzer", "SignalAnalyzer"
CMDTask_system: |-
  You are an expert system administrator.
  Your task is to convert the user's intention into a specific runnable command for a particular system.
  Example input:
  - User intention: Copy the folder from a/b/c to d/e/f
  - User OS: Linux
  Example output:
  cp -r a/b/c d/e/f
CMDTask_user: |-
  Example input:
  - User intention: "{{cmd_intention}}"
  - User OS: "{{user_os}}"
  Example output:
ConfigActionTask_system: |-
  Your task is to write the config of the target component in Qlib (Dataset, Model, Record, Strategy, Backtest).
  Config means the YAML file in Qlib. You can find the default configs in qlib/contrib/config_template, and you can also find configs in the Qlib documentation. You should provide the content in exact YAML format with nothing else added.
  The user has provided the requirements and made a plan, with reasons, for each component. You should strictly follow the user's plan. After the config, you should give the reasons for your hyperparameter choices, if any, and some suggestions in case the user wants to fine-tune the hyperparameters. Default means you should only use classes in Qlib without any new code, while Personized has no such restriction. A class in Qlib means Qlib has implemented the class and you can find it in the Qlib documentation or source code.
  "Config", "Reason" and "Improve suggestion" should always be provided, spelled exactly like this.
  You only need to write the config of the target component in the exact format specified below, with no explanation or conversation.
  Example input:
  user requirement: Help me build a low-turnover quant investment strategy that focuses more on long-term return in the China A-share stock market. I have some data in CSV format and I want to merge it with the data in Qlib.
  user plan:
  - Dataset: (Personized) I will implement a CustomDataset inherited from qlib.data.dataset and expose an API to load the user's CSV file. I will check the format of the user's data and align it with Qlib data, because it is a suitable dataset for getting a long-term return in the China A-share stock market.
  - Model: (Default) I will use LGBModel in qlib.contrib.model.gbdt and choose more robust hyperparameters to focus on long-term return, because tree models are more stable than NN models and less likely to overfit.
  - Record: (Default) I will use SignalRecord and SigAnaRecord in qlib.workflow.record_temp to save all the signals and the analysis results, because the user needs to check the metrics to determine whether the system meets the requirements.
  - Strategy: (Default) I will use TopkDropoutStrategy in qlib.contrib.strategy, because it is a more robust strategy that saves turnover fees.
  - Backtest: (Default) I will use the default backtest module in Qlib, because it can give the user a more realistic performance result for the model we build.
  target component: Model
  Example output:
  Config:
  ```yaml
  model:
    class: LGBModel
    module_path: qlib.contrib.model.gbdt
    kwargs:
      loss: mse
      colsample_bytree: 0.8879
      learning_rate: 0.2
      subsample: 0.8789
      lambda_l1: 205.6999
      lambda_l2: 580.9768
      max_depth: 8
      num_leaves: 210
      num_threads: 20
  ```
  Reason: I chose the hyperparameters above because they are the default hyperparameters in Qlib and they are more robust than other hyperparameters.
  Improve suggestion: You can try to tune num_leaves in the range [100, 300], max_depth in [5, 10], learning_rate in [0.01, 1] and other hyperparameters in the config. Since you're trying to get a long-term return, if you have enough computational resources, you can try a larger num_leaves and max_depth and a smaller learning_rate.
ConfigActionTask_user: |-
  user requirement: {{user_requirement}}
  user plan:
  - Dataset: ({{dataset_decision}}) {{dataset_plan}}
  - Model: ({{model_decision}}) {{model_plan}}
  - Record: ({{record_decision}}) {{record_plan}}
  - Strategy: ({{strategy_decision}}) {{strategy_plan}}
  - Backtest: ({{backtest_decision}}) {{backtest_plan}}
  target component: {{target_component}}
ImplementActionTask_system: |-
  Your task is to write Python code and give a reasonable explanation. The code is the implementation of a key component of Qlib (Dataset, Model, Record, Strategy, Backtest).
  The user has provided the requirements and made a plan, with reasons, for each component. You should strictly follow the user's plan. The user also provides the config, which includes the class name and module name; your class name should be the same as in the user's config.
  It's strongly recommended that you implement a class that inherits from a class in Qlib and only override some of its methods to meet the user's requirements. After the code, you should write an explanation of your code that contains its core idea. Finally, you should provide an updated version of the user's config to match your implementation. The modification mainly focuses on the kwargs of the newly implemented classes. You can output the same config as the user's input if nothing needs to change. You should provide the content in exact YAML format with nothing else added.
  Your response should always contain "Code", "Explanation" and "Modified config", spelled exactly like this.
  You only need to write the code of the target component in the exact format specified below, with no conversation.
  Example input:
  user requirement: Help me build a low-turnover quant investment strategy that focuses more on long-term return in the China A-share stock market. I have some data in CSV format and I want to merge it with the data in Qlib.
  user plan:
  - Dataset: (Personized) I will implement a CustomDataset inherited from qlib.data.dataset and expose an API to load the user's CSV file. I will check the format of the user's data and align it with Qlib data, because it is a suitable dataset for getting a long-term return in the China A-share stock market.
  - Model: (Default) I will use LGBModel in qlib.contrib.model.gbdt and choose more robust hyperparameters to focus on long-term return, because tree models are more stable than NN models and less likely to overfit.
  - Record: (Default) I will use SignalRecord and SigAnaRecord in qlib.workflow.record_temp to save all the signals and the analysis results, because the user needs to check the metrics to determine whether the system meets the requirements.
  - Strategy: (Default) I will use TopkDropoutStrategy in qlib.contrib.strategy, because it is a more robust strategy that saves turnover fees.
  - Backtest: (Default) I will use the default backtest module in Qlib, because it can give the user a more realistic performance result for the model we build.
  User config:
  ```yaml
  dataset:
    class: CustomDataset
    module_path: path.to.your.custom_dataset_module
    kwargs:
      handler:
        class: CSVMergeHandler
        module_path: path.to.your.csv_merge_handler_module
        kwargs:
          csv_path: path/to/your/csv/data
  ```
  target component: Dataset
  Example output:
  Code:
  ```python
  import pandas as pd
  from qlib.data.dataset import DatasetH
  from qlib.data.dataset.handler import DataHandlerLP
  class CSVMergeHandler(DataHandlerLP):
      def __init__(self, csv_path, **kwargs):
          super().__init__(**kwargs)
          self.csv_data = pd.read_csv(csv_path)
      def load_all(self):
          qlib_data = super().load_all()
          merged_data = qlib_data.merge(self.csv_data, on=["date", "instrument"], how="left")
          return merged_data
  class CustomDataset(DatasetH):
      def __init__(self, handler):
          super().__init__(handler)
  ```
  Explanation:
  In this implementation, the CSVMergeHandler class inherits from DataHandlerLP and overrides the load_all method to merge the CSV data with Qlib data. The CustomDataset class inherits from DatasetH and takes the handler as an argument.
  Modified config:
  ```yaml
  dataset:
    class: CustomDataset
    module_path: custom_dataset
    kwargs:
      handler:
        class: CSVMergeHandler
        module_path: custom_dataset
        kwargs:
          csv_path: path/to/your/csv/data
  ```
ImplementActionTask_user: |-
  user requirement: {{user_requirement}}
  user plan:
  - Dataset: ({{dataset_decision}}) {{dataset_plan}}
  - Model: ({{model_decision}}) {{model_plan}}
  - Record: ({{record_decision}}) {{record_plan}}
  - Strategy: ({{strategy_decision}}) {{strategy_plan}}
  - Backtest: ({{backtest_decision}}) {{backtest_plan}}
  User config:
  ```yaml
  {{user_config}}
  ```
  target component: {{target_component}}
SummarizeTask_system: |-
  You are an expert in the quant domain.
  Your task is to help the user analyze the output of Qlib; your main focus is on the backtesting metrics of the user's strategies. Warnings reported during runtime can be ignored if deemed appropriate.
  The information you receive includes the strategy's backtest log and runtime log.
  You may also receive some scripts of the code, which you can use to analyze the output.
  At the same time, you can also use your knowledge of the Microsoft/Qlib project and finance to complete your tasks.
  If there are any abnormal areas in the log or scripts, please also point them out.
  Example output 1:
  The metrics in the log show that your strategy's max drawdown is a bit large and, given your annualized return, your strategy has a relatively low Sharpe ratio. Here are a few suggestions:
  You can try diversifying your positions across different assets.
  Images:
  ![HFAnalyzer](file:///D:/Codes/NLP/qlib/finco/finco_workspace/HFAnalyzer.jpeg)
  Example output 2:
  The output log shows the result of running `qlib` with a `LinearModel` on the Chinese stock market (CSI 300) from 2008-01-01 to 2020-08-01, based on the Alpha158 data handler from 2015-01-01. The strategy holds the top 50 instruments with the highest signal scores and randomly drops some of them (5 by default) to enhance robustness. The backtesting result is shown in the table below:
  | Metrics | Value |
  | ------- | ----- |
  | IC | 0.040 |
  | ICIR | 0.312 |
  | Long-Avg Ann Return | 0.093 |
  | Long-Avg Ann Sharpe | 0.462 |
  | Long-Short Ann Return | 0.245 |
  | Long-Short Ann Sharpe | 4.098 |
  | Rank IC | 0.048 |
  | Rank ICIR | 0.370 |
  It should be emphasized that:
  You should output a report in Markdown format.
  Please list as much data as possible in the report, and present the data in Markdown tables wherever possible.
  The numbers in the report do not need many significant figures.
  You can add subheadings and paragraphs in Markdown for readability.
  You can use bold or other formatting options to highlight keywords in the main text.
  You should display the images I provide in Markdown using the appropriate image syntax.
SummarizeTask_user: |-
  Here is my information: '{{information}}'
  My intention is: {{user_prompt}}. Please provide me with a summary and recommendations based on my intention and the information I have provided. There are some figures whose absolute paths are: {{figure_path}}. You must display these images in Markdown using the appropriate image syntax.
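The ImplementActionTask prompt asks for a reply with "Code", "Explanation" and "Modified config" sections. The code that parses that reply is not part of this diff, so the following is only a hypothetical helper (`parse_implementation` is not a Qlib function). It assumes each section holds at most one fenced block. It splits the reply on the three headings and checks that the generated code at least parses as Python.

```python
import ast
import re

# Non-greedy section split, keyed on the headings the ImplementActionTask prompt requires.
_SECTIONS = re.compile(r"Code:(.*?)Explanation:(.*?)Modified config:(.*)", re.S)
# First fenced block, with or without a language tag such as ```python or ```yaml.
_FENCE = re.compile(r"```[\w-]*[ \t]*\n(.*?)```", re.S)


def _fenced_body(text: str) -> str:
    """Return the inside of the first fenced block, or the text itself if unfenced."""
    match = _FENCE.search(text)
    return (match.group(1) if match else text).strip("\n")


def parse_implementation(response: str) -> dict:
    res = _SECTIONS.search(response)
    if res is None:
        raise ValueError("reply lacks Code/Explanation/Modified config sections")
    code, explanation, config = res.groups()
    code = _fenced_body(code)
    ast.parse(code)  # raises SyntaxError early if the generated Python is malformed
    return {"code": code, "explanation": explanation.strip(), "config": _fenced_body(config)}
```

`strip("\n")` is used instead of `strip()` so the indentation of the first code or YAML line survives.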


@@ -14,6 +14,7 @@ import platform
 from qlib.log import get_module_logger
 from qlib.finco.llm import APIBackend
 from qlib.finco.tpl import get_tpl_path
+from qlib.finco.prompt_template import PormptTemplate
 from qlib.workflow.record_temp import HFSignalRecord, SignalRecord
 from qlib.contrib.analyzer import HFAnalyzer, SignalAnalyzer
 from qlib.utils import init_instance_by_config
@@ -39,6 +40,7 @@ class Task:
     def __init__(self) -> None:
         self._context_manager = None
+        self.prompt_template = PormptTemplate()
         self.executed = False
         self.logger: logging.Logger = get_module_logger(f"finco.{self.__class__.__name__}")
@@ -74,6 +76,14 @@ class Task:
         """In continous mode, this method will not be called and the next task will be determined by the execution method only"""
         raise NotImplementedError("The interact method is not implemented, but workflow not in continous mode")
 
+    @property
+    def system(self):
+        return self.prompt_template.__getattribute__(self.__class__.__name__ + "_system")
+
+    @property
+    def user(self):
+        return self.prompt_template.__getattribute__(self.__class__.__name__ + "_user")
+
 
 class WorkflowTask(Task):
     """This task is supposed to be the first task of the workflow"""
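The two new properties resolve prompts by naming convention. A task class named `X` reads the `X_system` and `X_user` keys from prompt_template.yaml.

The sketch below reproduces that lookup with a plain-Python stand-in for `PormptTemplate`. `PromptStore` and `CustomWorkflowTask` are hypothetical names used only for illustration. The sketch also shows a consequence of the convention: a subclass looks up its own class name, not its parent's.

```python
class PromptStore:
    """Hypothetical stand-in for PormptTemplate: one attribute per template key."""

    def __init__(self, templates: dict) -> None:
        for key, text in templates.items():
            setattr(self, key, text)


class Task:
    def __init__(self, store: PromptStore) -> None:
        self.prompt_template = store

    @property
    def system(self):
        # Same convention as the diff: look up "<ClassName>_system".
        return getattr(self.prompt_template, type(self).__name__ + "_system")

    @property
    def user(self):
        return getattr(self.prompt_template, type(self).__name__ + "_user")


class WorkflowTask(Task):
    pass


class CustomWorkflowTask(WorkflowTask):
    """Looks up CustomWorkflowTask_system / _user, not WorkflowTask_*."""


store = PromptStore({"WorkflowTask_system": "system text", "WorkflowTask_user": "user text"})
print(WorkflowTask(store).user)  # user text
```

Accessing `CustomWorkflowTask(store).system` here raises `AttributeError`. Such a subclass would need its own keys in the YAML file or an overridden property.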
@@ -82,43 +92,18 @@ class WorkflowTask(Task):
         self,
     ) -> None:
         super().__init__()
-        self.__DEFAULT_WORKFLOW_SYSTEM_PROMPT = """
-Your goal is to determine the appropriate workflow (supervised learning or reinforcement learning) for a given user requirement in Qlib. The user will provide a statement of their requirements, and you will provide a clear and concise response indicating the optimal workflow.
-Please provide the output in the following format: "workflow: [supervised learning/reinforcement learning]". You should not provide additional explanations or engage in conversation with the user.
-Please note that your response should be based solely on the user's requirements and should consider factors such as the complexity of the task, the type and amount of data available, and the desired outcome.
-Example input 1:
-Help me build a low turnover quant investment strategy that focus more on long turn return in China a stock market.
-Example output 1:
-workflow: supervised learning
-Example input 2:
-Help me build a pipeline to determine the best selling point of a stock in a day or half a day in USA stock market.
-Example output 2:
-workflow: reinforcement learning
-"""
-        self.__DEFAULT_WORKFLOW_USER_PROMPT = (
-            "User input: '{{user_prompt}}'\n"
-            "Please provide the workflow in Qlib (supervised learning or reinforcement learning) ensureing the workflow can meet the user's requirements.\n"
-            "Response only with the output in the exact format specified in the system prompt, with no explanation or conversation.\n"
-        )
     def execute(
         self,
     ) -> List[Task]:
         """make the choice which main workflow (RL, SL) will be used"""
         user_prompt = self._context_manager.get_context("user_prompt")
-        prompt_workflow_selection = Template(self.__DEFAULT_WORKFLOW_USER_PROMPT).render(user_prompt=user_prompt)
+        prompt_workflow_selection = self.user.render(user_prompt=user_prompt)
         response = APIBackend().build_messages_and_create_chat_completion(
-            prompt_workflow_selection, self.__DEFAULT_WORKFLOW_SYSTEM_PROMPT
+            prompt_workflow_selection, self.system.render()
         )
         self.save_chat_history_to_context_manager(
-            prompt_workflow_selection, response, self.__DEFAULT_WORKFLOW_SYSTEM_PROMPT
+            prompt_workflow_selection, response, self.system.render()
         )
         workflow = response.split(":")[1].strip().lower()
         self.executed = True
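The line `response.split(":")[1]` has two weaknesses. It raises `IndexError` when the model's reply contains no colon, and it accepts any text after the first colon as the workflow.

A stricter parse, sketched below as a hypothetical `parse_workflow` helper (not part of Qlib), accepts only the two values the WorkflowTask prompt allows and ignores case and trailing punctuation.

```python
import re

# Accept only the two workflows the WorkflowTask prompt allows, case-insensitively.
_WORKFLOW = re.compile(r"workflow\s*:\s*(supervised learning|reinforcement learning)", re.IGNORECASE)


def parse_workflow(response: str) -> str:
    match = _WORKFLOW.search(response)
    if match is None:
        raise ValueError(f"unexpected workflow reply: {response!r}")
    return match.group(1).lower()


print(parse_workflow("Workflow: Reinforcement Learning."))  # reinforcement learning
```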
@@ -161,33 +146,6 @@ class SLPlanTask(PlanTask):
         self,
     ) -> None:
         super().__init__()
-        self.__DEFAULT_WORKFLOW_SYSTEM_PROMPT = """
-Your task is to design the 5 crucial components in Qlib (Dataset, Model, Record, Strategy, Backtest) ensuring the workflow can meet the user's requirements.
-For each component, you first point out whether to use default module in Qlib or implement the new module (Default or Personized). Default module means the class has already be implemented by Qlib which can be found in document and source code. Default class can be directed called from config file without additional implementation. Personized module means new python class is implemented and called from config file. You should always provide the reason of your choice.
-The user will provide the requirements, you will provide only the output the choice in exact format specified below with no explanation or conversation. You only response 5 components in the order of dataset, model, record, strategy, backtest with no other addition.
-Example input:
-Help me build a low turnover quant investment strategy that focus more on long turn return in China a stock market. I have some data in csv format and I want to merge them with the data in Qlib.
-Example output:
-components:
-- Dataset: (Personized) I will implement a CustomDataset inherited from qlib.data.dataset and exposed a api to load user's csv file. I will check the format of user's data and align them with Qlib data. Because it is a suitable dataset to get a long turn return in China A stock market.
-- Model: (Default) I will use LGBModel in qlib.contrib.model.gbdt and choose more robust hyperparameters to focus on long-term return. Because tree model is more stable than NN models and is more unlikely to be over converged.
-- Record: (Default) I will use SignalRecord in qlib.workflow.record_temp and SigAnaRecord in qlib.workflow.record_temp to save all the signals and the analysis results. Because user needs to check the metrics to determine whether the system meets the requirements.
-- Strategy: (Default) I will use TopkDropoutStrategy in qlib.contrib.strategy. Because it is a more robust strategy which saves turnover fee.
-- Backtest: (Default) I will use the default backtest module in Qlib. Because it can tell the user a more real performance result of the model we build.
-"""
-        self.__DEFAULT_WORKFLOW_USER_PROMPT = (
-            "User input: '{{user_prompt}}'\n"
-            "Please provide the 5 crucial components in Qlib (dataset, model, record, strategy, backtest) ensureing the workflow can meet the user's requirements.\n"
-            "Response only with the output in the exact format specified in the system prompt, with no explanation or conversation.\n"
-        )
     def execute(self):
         workflow = self._context_manager.get_context("workflow")
@@ -195,11 +153,11 @@ components:
         user_prompt = self._context_manager.get_context("user_prompt")
         assert user_prompt is not None, "The user prompt is not provided"
-        prompt_plan_all = Template(self.__DEFAULT_WORKFLOW_USER_PROMPT).render(user_prompt=user_prompt)
+        prompt_plan_all = self.user.render(user_prompt=user_prompt)
         response = APIBackend().build_messages_and_create_chat_completion(
-            prompt_plan_all, self.__DEFAULT_WORKFLOW_SYSTEM_PROMPT
+            prompt_plan_all, self.system.render()
         )
-        self.save_chat_history_to_context_manager(prompt_plan_all, response, self.__DEFAULT_WORKFLOW_SYSTEM_PROMPT)
+        self.save_chat_history_to_context_manager(prompt_plan_all, response, self.system.render())
         if "components" not in response:
             self.logger.warning(
                 "The response is not in the correct format, which probably means the answer is not correct"
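This hunk only shows the check that the reply contains "components". Assume the reply follows the line format that SLPlanTask_system requests, `- <Component>: (<Default|Personized>) <plan>`. Under that assumption, a hypothetical `parse_plan` helper could extract each decision and plan and fail fast when a component is missing. This is a sketch, not the parsing code of the actual file.

```python
import re

COMPONENTS = ("Dataset", "Model", "Record", "Strategy", "Backtest")
# One plan line, e.g. "- Model: (Default) I will use LGBModel ...".
_PLAN_LINE = re.compile(r"^- *(\w+): *\((Default|Personized)\) *(.+)$", re.MULTILINE)


def parse_plan(response: str) -> dict:
    """Map each component to a (decision, plan) pair; fail if any is missing."""
    found = {name: (decision, plan.strip()) for name, decision, plan in _PLAN_LINE.findall(response)}
    missing = [name for name in COMPONENTS if name not in found]
    if missing:
        raise ValueError(f"plan lacks components: {missing}")
    # Return the components in the canonical order the prompt asks for.
    return {name: found[name] for name in COMPONENTS}
```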
@@ -263,33 +221,17 @@ class RecorderTask(Task):
     # __ANALYZERS_PROJECT = {SignalAnalyzer.__name__: SignalRecord}
     # __ANALYZERS_DOCS = {SignalAnalyzer.__name__: SignalAnalyzer.__doc__}
-    __DEFAULT_WORKFLOW_SYSTEM_PROMPT = f"""
-You are an expert system administrator.
-Your task is to select the best analysis class based on user intent from this list:
-{list(__ANALYZERS_DOCS.keys())}
-Their description are:
-{__ANALYZERS_DOCS}
-Response only with the Analyser name provided above with no explanation or conversation. if there are more than
-one analyser, separate them by ","
-"""
-    __DEFAULT_WORKFLOW_USER_PROMPT = """{{user_prompt}},
-The analyzers you select should separate by ",", such as: "HFAnalyzer", "SignalAnalyzer"
-"""
     def __init__(self):
         super().__init__()
         self._output = None
     def execute(self):
-        prompt = Template(self.__DEFAULT_WORKFLOW_USER_PROMPT).render(
+        prompt = self.user.render(
             user_prompt=self._context_manager.get_context("user_prompt")
         )
         be = APIBackend()
         be.debug_mode = False
-        response = be.build_messages_and_create_chat_completion(prompt, self.__DEFAULT_WORKFLOW_SYSTEM_PROMPT)
+        response = be.build_messages_and_create_chat_completion(prompt, self.system.render(ANALYZERS_list=list(self.__ANALYZERS_DOCS.keys()), ANALYZERS_DOCS=self.__ANALYZERS_DOCS))
         # it's better to move to another Task
         workflow_config = (
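RecorderTask asks the model for analyzer names separated by ",", for example `"HFAnalyzer", "SignalAnalyzer"`. How the reply is parsed lies outside this hunk.

A hypothetical helper (`parse_analyzers`, not a Qlib function) might look like the sketch below. It tolerates the quotes and rejects any name outside the offered list, which the task builds as `list(self.__ANALYZERS_DOCS.keys())`.

```python
def parse_analyzers(response: str, known: list) -> list:
    """Split a reply like '"HFAnalyzer", "SignalAnalyzer"' into known analyzer names."""
    names = [part.strip().strip("\"'") for part in response.split(",")]
    names = [name for name in names if name]
    unknown = [name for name in names if name not in known]
    if unknown:
        raise ValueError(f"unknown analyzers: {unknown}")
    return list(dict.fromkeys(names))  # keep order, drop duplicates
```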
@@ -340,33 +282,17 @@ class CMDTask(ActionTask):
     """
     This CMD task is responsible for ensuring compatibility across different operating systems.
     """
-    __DEFAULT_WORKFLOW_SYSTEM_PROMPT = """
-You are an expert system administrator.
-Your task is to convert the user's intention into a specific runnable command for a particular system.
-Example input:
-- User intention: Copy the folder from a/b/c to d/e/f
-- User OS: Linux
-Example output:
-cp -r a/b/c d/e/f
-"""
-    __DEFAULT_WORKFLOW_USER_PROMPT = """
-Example input:
-- User intention: "{{cmd_intention}}"
-- User OS: "{{user_os}}"
-Example output:
-"""
     def __init__(self, cmd_intention: str, cwd=None):
         self.cwd = cwd
         self.cmd_intention = cmd_intention
         self._output = None
         super().__init__()
     def execute(self):
-        prompt = Template(self.__DEFAULT_WORKFLOW_USER_PROMPT).render(
+        prompt = self.user.render(
             cmd_intention=self.cmd_intention, user_os=platform.system()
         )
-        response = APIBackend().build_messages_and_create_chat_completion(prompt, self.__DEFAULT_WORKFLOW_SYSTEM_PROMPT)
+        response = APIBackend().build_messages_and_create_chat_completion(prompt, self.system.render())
         self._output = subprocess.check_output(response, shell=True, cwd=self.cwd)
         return []
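`CMDTask.execute` hands the model's reply straight to `subprocess.check_output(..., shell=True)`. A reply wrapped in a Markdown fence, or spread over several lines, is therefore executed as-is.

The defensive variant below uses hypothetical helper names. It unwraps an optional fence, insists on a single command line, and adds a timeout. It still uses `shell=True`, so the command should be reviewed before it runs.

```python
import re
import subprocess


def extract_command(response: str) -> str:
    """Pull a single command out of an LLM reply, dropping an optional ``` fence."""
    fenced = re.search(r"```[\w-]*[ \t]*\n(.*?)```", response, re.S)
    command = (fenced.group(1) if fenced else response).strip()
    if not command or "\n" in command:
        raise ValueError(f"expected exactly one command line, got: {response!r}")
    return command


def run_command(command: str, cwd=None, timeout: float = 60.0) -> bytes:
    # Still shell=True, like the task above, so only run commands you have reviewed.
    return subprocess.run(
        command, shell=True, cwd=cwd, check=True, capture_output=True, timeout=timeout
    ).stdout
```

`subprocess.run` with `check=True` raises `CalledProcessError` on a non-zero exit code, matching `check_output`. The timeout additionally stops a hung command.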
@@ -381,59 +307,6 @@ class ConfigActionTask(ActionTask):
     def __init__(self, component) -> None:
         super().__init__()
         self.target_componet = component
-        # TODO check if it's necessary that we input the plan of all components
-        self.__DEFAULT_CONFIG_ACTION_SYSTEM_PROMPT = """
-Your task is to write the config of target component in Qlib(Dataset, Model, Record, Strategy, Backtest).
-Config means the yaml file in Qlib. You can find the default config in qlib/contrib/config_template. You can also find the config in Qlib document. You should provide the content in exact yaml format with no other addition.
-The user has provided the requirements and made plan and reason to each component. You should strictly follow user's plan and you should provide the reason of your hyperparameter choices if exist and some suggestion if user wants to finetune the hyperparameters after the config. Default means you should only use classes in Qlib without any other new code while Personized has no such restriction. class in Qlib means Qlib has implemented the class and you can find it in Qlib document or source code.
-"Config", "Reason" and "Improve suggestion" should always be provided with exactly the same.
-You only need to write the config of the target component in the exact format specified below with no explanation or conversation.
-Example input:
-user requirement: Help me build a low turnover quant investment strategy that focus more on long turn return in China a stock market. I have some data in csv format and I want to merge them with the data in Qlib.
-user plan:
-- Dataset: (Personized) I will implement a CustomDataset imherited from qlib.data.dataset and exposed a api to load user's csv file. I will check the format of user's data and align them with Qlib data. Because it is a suitable dataset to get a long turn return in China A stock market.
-- Model: (Default) I will use LGBModel in qlib.contrib.model.gbdt and choose more robust hyperparameters to focus on long-term return. Because tree model is more stable than NN models and is more unlikely to be over converged.
-- Record: (Default) I will use SignalRecord in qlib.workflow.record_temp and SigAnaRecord in qlib.workflow.record_temp to save all the signals and the analysis results. Because user needs to check the metrics to determine whether the system meets the requirements.
-- Strategy: (Default) I will use TopkDropoutStrategy in qlib.contrib.strategy. Because it is a more robust strategy which saves turnover fee.
-- Backtest: (Default) I will use the default backtest module in Qlib. Because it can tell the user a more real performance result of the model we build.
-target component: Model
-Example output:
-Config:
-```yaml
-model:
-  class: LGBModel
-  module_path: qlib.contrib.model.gbdt
-  kwargs:
-    loss: mse
-    colsample_bytree: 0.8879
-    learning_rate: 0.2
-    subsample: 0.8789
-    lambda_l1: 205.6999
-    lambda_l2: 580.9768
-    max_depth: 8
-    num_leaves: 210
-    num_threads: 20
-```
-Reason: I choose the hyperparameters above because they are the default hyperparameters in Qlib and they are more robust than other hyperparameters.
-Improve suggestion: You can try to tune the num_leaves in range [100, 300], max_depth in [5, 10], learning_rate in [0.01, 1] and other hyperparameters in the config. Since you're trying to get a long tern return, if you have enough computation resource, you can try to use a larger num_leaves and max_depth and a smaller learning_rate.
-"""
-        self.__CONFIG_ACTION_SYSTEM_PROMPT_TEMPLATE = """
-user requirement: {{user_requirement}}
-user plan:
-- Dataset: ({{dataset_decision}}) {{dataset_plan}}
-- Model: ({{model_decision}}) {{model_plan}}
-- Record: ({{record_decision}}) {{record_plan}}
-- Strategy: ({{strategy_decision}}) {{strategy_plan}}
-- Backtest: ({{backtest_decision}}) {{backtest_plan}}
-target component: {{target_component}}
-"""
     def execute(self):
         user_prompt = self._context_manager.get_context("user_prompt")
@@ -445,7 +318,7 @@ target component: {{target_component}}
         assert None not in prompt_element_dict.values(), "Some decision or plan is not set by plan maker"
-        config_prompt = Template(self.__CONFIG_ACTION_SYSTEM_PROMPT_TEMPLATE).render(
+        config_prompt = self.user.render(
             user_requirement=user_prompt,
             dataset_decision=prompt_element_dict["Dataset_decision"],
             dataset_plan=prompt_element_dict["Dataset_plan"],
@@ -460,9 +333,9 @@ target component: {{target_component}}
             target_component=self.target_componet,
         )
         response = APIBackend().build_messages_and_create_chat_completion(
-            config_prompt, self.__DEFAULT_CONFIG_ACTION_SYSTEM_PROMPT
+            config_prompt, self.system.render()
         )
-        self.save_chat_history_to_context_manager(config_prompt, response, self.__DEFAULT_CONFIG_ACTION_SYSTEM_PROMPT)
+        self.save_chat_history_to_context_manager(config_prompt, response, self.system.render())
         res = re.search(r"Config:(.*)Reason:(.*)Improve suggestion:(.*)", response, re.S)
         assert (
             res is not None and len(res.groups()) == 3
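The regex at the end of this hunk splits the reply on its three headings. The hypothetical `parse_config_response` helper below keeps that idea with two changes. First, it uses non-greedy groups, so the split happens at the first "Reason:" and "Improve suggestion:" headings rather than the last ones. Second, it unwraps the yaml fence around the config. It assumes the model follows the example output format in ConfigActionTask_system.

```python
import re

# Non-greedy first groups split at the first "Reason:" / "Improve suggestion:" headings;
# the greedy pattern in the diff splits at the last ones instead.
_SECTIONS = re.compile(r"Config:(.*?)Reason:(.*?)Improve suggestion:(.*)", re.S)
_YAML_FENCE = re.compile(r"```(?:yaml)?[ \t]*\n(.*?)```", re.S)


def parse_config_response(response: str) -> dict:
    res = _SECTIONS.search(response)
    if res is None:
        raise ValueError("reply lacks Config/Reason/Improve suggestion sections")
    config, reason, suggestion = (part.strip() for part in res.groups())
    fenced = _YAML_FENCE.search(config)
    return {
        "config": fenced.group(1).strip("\n") if fenced else config,
        "reason": reason,
        "improve_suggestion": suggestion,
    }
```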
@@ -484,90 +357,6 @@ class ImplementActionTask(ActionTask):
def __init__(self, target_component) -> None:
super().__init__()
self.target_component = target_component
self.__DEFAULT_IMPLEMENT_ACTION_SYSTEM_PROMPT = """
Your task is to write python code and give some reasonable explanation. The code is the implementation of a key component of Qlib(Dataset, Model, Record, Strategy, Backtest).
The user has provided the requirements and made plan and reason to each component. You should strictly follow user's plan. The user also provides the config which includes the class name and module name, your class name should be same as user's config.
Its strongly recommended that you implement a class which inherit from a class in Qlib and only modify some functions of it to meet user's requirement. After the code, you should write the explanation of your code. It contains the core idea of your code. Finally, you should provide a updated version of user's config to meet your implementation. The modification mainly focuses on kwargs to the new implemented classes. You can output same config as user input is nothing needs to change. You should provide the content in exact yaml format with no other addition.
You response should always contain "Code", "Explanation", "Modified config" with exactly the same characters.
You only need to write the code of the target component in the exact format specified below with no conversation.
Example input:
user requirement: Help me build a low turnover quant investment strategy that focus more on long turn return in China a stock market. I have some data in csv format and I want to merge them with the data in Qlib.
user plan:
- Dataset: (Personized) I will implement a CustomDataset inherited from qlib.data.dataset and expose an API to load the user's csv file. I will check the format of the user's data and align it with Qlib data, because it is a suitable dataset to get a long-term return in the China A stock market.
- Model: (Default) I will use LGBModel in qlib.contrib.model.gbdt and choose more robust hyperparameters to focus on long-term return, because tree models are more stable than NN models and are less likely to overfit.
- Record: (Default) I will use SignalRecord in qlib.workflow.record_temp and SigAnaRecord in qlib.workflow.record_temp to save all the signals and the analysis results, because the user needs to check the metrics to determine whether the system meets the requirements.
- Strategy: (Default) I will use TopkDropoutStrategy in qlib.contrib.strategy, because it is a more robust strategy that saves turnover fees.
- Backtest: (Default) I will use the default backtest module in Qlib, because it gives the user a more realistic performance result of the model we build.
User config:
```yaml
dataset:
class: CustomDataset
module_path: path.to.your.custom_dataset_module
kwargs:
handler:
class: CSVMergeHandler
module_path: path.to.your.csv_merge_handler_module
kwargs:
csv_path: path/to/your/csv/data
```
target component: Dataset
Example output:
Code:
```python
import pandas as pd
from qlib.data.dataset import DatasetH
from qlib.data.dataset.handler import DataHandlerLP
class CSVMergeHandler(DataHandlerLP):
def __init__(self, csv_path, **kwargs):
super().__init__(**kwargs)
self.csv_data = pd.read_csv(csv_path)
def load_all(self):
qlib_data = super().load_all()
merged_data = qlib_data.merge(self.csv_data, on=["date", "instrument"], how="left")
return merged_data
class CustomDataset(DatasetH):
def __init__(self, handler):
super().__init__(handler)
```
Explanation:
In this implementation, the CSVMergeHandler class inherits from DataHandlerLP and overrides the load_all method to merge the csv data with Qlib data. The CustomDataset class inherits from DatasetH and takes the handler as an argument.
Modified config:
```yaml
dataset:
class: CustomDataset
module_path: custom_dataset
kwargs:
handler:
class: CSVMergeHandler
module_path: custom_dataset
kwargs:
csv_path: path/to/your/csv/data
```
"""
self.__DEFAULT_IMPLEMENT_ACTION_USER_PROMPT = """
user requirement: {{user_requirement}}
user plan:
- Dataset: ({{dataset_decision}}) {{dataset_plan}}
- Model: ({{model_decision}}) {{model_plan}}
- Record: ({{record_decision}}) {{record_plan}}
- Strategy: ({{strategy_decision}}) {{strategy_plan}}
- Backtest: ({{backtest_decision}}) {{backtest_plan}}
User config:
```yaml
{{user_config}}
```
target component: {{target_component}}
"""
def execute(self):
"""
@@ -585,7 +374,7 @@ target component: {{target_component}}
assert None not in prompt_element_dict.values(), "Some decision or plan is not set by plan maker"
config = self._context_manager.get_context(f"{self.target_component}_config")
implement_prompt = Template(self.__DEFAULT_IMPLEMENT_ACTION_USER_PROMPT).render(
implement_prompt = self.user.render(
user_requirement=user_prompt,
dataset_decision=prompt_element_dict["Dataset_decision"],
dataset_plan=prompt_element_dict["Dataset_plan"],
@@ -601,10 +390,10 @@ target component: {{target_component}}
user_config=config,
)
response = APIBackend().build_messages_and_create_chat_completion(
implement_prompt, self.__DEFAULT_IMPLEMENT_ACTION_SYSTEM_PROMPT
implement_prompt, self.system.render()
)
self.save_chat_history_to_context_manager(
implement_prompt, response, self.__DEFAULT_IMPLEMENT_ACTION_SYSTEM_PROMPT
implement_prompt, response, self.system.render()
)
res = re.search(r"Code:(.*)Explanation:(.*)Modified config:(.*)", response, re.S)
@@ -669,59 +458,6 @@ class YamlEditTask(ActionTask):
class SummarizeTask(Task):
__DEFAULT_WORKSPACE = "./"
__DEFAULT_WORKFLOW_SYSTEM_PROMPT = """
You are an expert in the quant domain.
Your task is to help the user analyze the output of Qlib; your main focus is on the backtesting metrics of
the user's strategies. Warnings reported during runtime can be ignored if deemed appropriate.
Your information includes the strategy's backtest log and runtime log.
You may also receive some code scripts; you can use them to analyze the output.
At the same time, you can also use your knowledge of the Microsoft/Qlib project and finance to complete your tasks.
If there are any abnormal areas in the log or scripts, please also point them out.
Example output 1:
The metrics in the log show that your strategy's max drawdown is a bit large; given your annualized return,
your strategy has a relatively low Sharpe ratio. Here are a few suggestions:
You can try diversifying your positions across different assets.
Images:
![HFAnalyzer](file:///D:/Codes/NLP/qlib/finco/finco_workspace/HFAnalyzer.jpeg)
Example output 2:
The output log shows the result of running `qlib` with the `LinearModel` model on the CSI 300 universe of the Chinese stock market
from 2008-01-01 to 2020-08-01, based on the Alpha158 data handler from 2015-01-01. The strategy involves holding the
top 50 instruments with the highest signal scores and dropping some of them (5 by default) at each step to enhance
robustness. The backtesting result is shown in the table below:
| Metrics | Value |
| ------- | ----- |
| IC | 0.040 |
| ICIR | 0.312 |
| Long-Avg Ann Return | 0.093 |
| Long-Avg Ann Sharpe | 0.462 |
| Long-Short Ann Return | 0.245 |
| Long-Short Ann Sharpe | 4.098 |
| Rank IC | 0.048 |
| Rank ICIR | 0.370 |
It should be emphasized that:
You should output a report in Markdown format.
Please list as much data as possible in the report,
and present as much of it as possible in Markdown tables.
The numbers in the report do not need to have too many significant figures.
You can add subheadings and paragraphs in Markdown for readability.
You can bold or use other formatting options to highlight keywords in the main text.
You should display the images I provide in Markdown using the appropriate image syntax.
"""
__DEFAULT_WORKFLOW_USER_PROMPT = (
"Here is my information: '{{information}}'\n"
"My intention is: {{user_prompt}}. Please provide me with a summary and "
"recommendation based on my intention and the information I have provided. "
"There are some figures whose absolute paths are: {{figure_path}}. "
"You must display these images in markdown using the appropriate image format."
)
__DEFAULT_USER_PROMPT = "Summarize the information I offered and give me some advice."
# TODO: 2048 is close to exceeding the GPT token limit
@@ -739,21 +475,20 @@ class SummarizeTask(Task):
user_prompt = self._context_manager.get_context("user_prompt")
user_prompt = user_prompt if user_prompt is not None else self.__DEFAULT_USER_PROMPT
system_prompt = self.__DEFAULT_WORKFLOW_SYSTEM_PROMPT
file_info = self.get_info_from_file(workspace)
context_info = []  # too much context makes the response unstable.
figure_path = self.get_figure_path()
information = context_info + file_info
prompt_workflow_selection = Template(self.__DEFAULT_WORKFLOW_USER_PROMPT).render(
prompt_workflow_selection = self.user.render(
information=information, figure_path=figure_path, user_prompt=user_prompt
)
be = APIBackend()
be.debug_mode = False
response = be.build_messages_and_create_chat_completion(
user_prompt=prompt_workflow_selection, system_prompt=system_prompt
user_prompt=prompt_workflow_selection, system_prompt=self.system.render()
)
self.save_markdown(content=response)
return []
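After this commit, `SummarizeTask` renders `self.system` and `self.user` instead of class-level prompt strings. The diff does not show how those attributes are wired to the YAML-backed templates, so the following is only a sketch under that assumption: the class names `PromptTemplates` and `SummarizeTaskSketch` and the YAML entries are hypothetical, and the templates are read from an inline string rather than by opening `./prompt_template.yaml`, so the example does not depend on the working directory.

```python
import yaml
from jinja2 import Template

# Hypothetical YAML content. The key names follow the "<TaskName>_system" /
# "<TaskName>_user" convention of prompt_template.yaml, but these exact
# entries are illustrative.
PROMPT_YAML = """
SummarizeTask_system: |-
  You are an expert in the quant domain.
SummarizeTask_user: |-
  Here is my information: '{{information}}'
  My intention is: {{user_prompt}}.
"""


class PromptTemplates:
    """Parse the YAML once and expose each entry as a compiled Jinja2 Template."""

    def __init__(self, source: str) -> None:
        for key, text in yaml.safe_load(source).items():
            setattr(self, key, Template(text))


class SummarizeTaskSketch:
    """Hypothetical task showing one way `self.system` / `self.user` could be wired."""

    def __init__(self, templates: PromptTemplates) -> None:
        prefix = "SummarizeTask"  # assumed: entries are looked up by task-name prefix
        self.system = getattr(templates, f"{prefix}_system")
        self.user = getattr(templates, f"{prefix}_user")

    def build_prompts(self, information: str, user_prompt: str):
        system_prompt = self.system.render()  # the system text has no placeholders
        user = self.user.render(information=information, user_prompt=user_prompt)
        return system_prompt, user


templates = PromptTemplates(PROMPT_YAML)
system_prompt, user_prompt = SummarizeTaskSketch(templates).build_prompts(
    "backtest log", "summarize my backtest"
)
```

`yaml.safe_load` is enough here because every entry is a plain string; the `|-` block style keeps line breaks inside a prompt and strips the trailing newline.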
@@ -813,7 +548,7 @@ class SummarizeTask(Task):
for filename in files:
postfix = filename.split(".")[-1]
if postfix in ["jpeg"]:
file_path = os.path.join("./", filename)
file_path = os.path.join(root, filename)
file_list.append(str(Path(file_path).relative_to(self.workspace)))
return file_list
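The last hunk changes `os.path.join("./", filename)` to `os.path.join(root, filename)`. Inside an `os.walk` loop, `root` is the directory currently being visited, so joining with `"./"` loses the workspace and any subdirectory, and `Path(...).relative_to(workspace)` then raises `ValueError`. A minimal standalone sketch of the corrected pattern follows; `find_images` is a hypothetical name, not a function from this commit.

```python
import os
from pathlib import Path


def find_images(workspace: str, suffixes=("jpeg",)) -> list:
    """Collect image files under `workspace`, as paths relative to it (hypothetical helper)."""
    found = []
    for root, _dirs, files in os.walk(workspace):
        for filename in files:
            if filename.split(".")[-1] in suffixes:
                # Join with `root`, the directory os.walk is currently visiting;
                # joining with "./" would drop the workspace and subdirectories.
                file_path = os.path.join(root, filename)
                found.append(str(Path(file_path).relative_to(workspace)))
    return sorted(found)
```

Sorting the result is a design choice for deterministic output; `os.walk` itself makes no ordering guarantee.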