Promptmeteo Usage - Save and Load Model#
LLMs are truly revolutionizing the world, enabling humans to do things we couldn’t do before, or making them so much easier and faster to do.
Promptmeteo leverages the power of LLMs to democratize the data science process. This means you can easily train a model and make predictions with it in just a few simple steps.
In this tutorial, you are going to create a model to perform document classification (a classic NLP task), train it with a manageable amount of data, save it, load it and use it to classify new documents.
1. Create a model#
1.1 Instance a DocumentClassifier#
Luckily, Promptmeteo has a specific task for document classification. You only have to instantiate the class DocumentClassifier, and it's very easy to do so. You have to set:

- language: the language in which the task is going to be performed (currently, only English and Spanish are supported)
- model_name and model_provider_name: the model and provider you want to use (only certain model and provider combinations are currently supported)
- prompt_labels: the categories into which you need to classify the documents (free choice)

Let's use google/flan-t5-small, which is free, in English, and go for the classic sentiment classification, that is, using the positive and negative labels to classify opinions.
[21]:
import sys; sys.path.append('..')

from promptmeteo import DocumentClassifier

clf = DocumentClassifier(
    language            = 'en',
    model_provider_name = 'hf_pipeline',
    model_name          = 'google/flan-t5-small',
    prompt_labels       = ['positive', 'negative']
)
1.2 Prompts#
The first thing you need to know about Promptmeteo is that we think prompts should be treated with the same respect as code. Therefore, they shouldn't be hard-coded or taken lightly; instead, they should be carefully designed and properly versioned. In the end, they are the way we communicate with LLMs, which means that a mistake in a prompt can lead to seriously unwanted outputs, while a really good prompt can unleash the full power of the LLM.
That said, prompt templates in Promptmeteo are written in YAML and saved in files. So first, we need a helper function to print YAML prompts in a readable way.
[22]:
import yaml

def prompt_print(prompt: str):
    """Prints YAML prompts in a nice way."""
    yaml_prompt = yaml.safe_load(prompt)
    for key, value in yaml_prompt.items():
        print(key, "\n", value)
1.2.1 Prompt templates#
Promptmeteo has predefined prompt templates for each of the available languages, models and tasks. This spares the user from having to craft the perfect prompt from scratch: the template only needs to be parametrized with the details of the use case, without neglecting prompt quality.
Let's have a look at the prompt template for the document classification task in English with Flan-T5:
[23]:
prompt_print(clf.task.prompt.PROMPT_EXAMPLE)
TEMPLATE
I need you to help me with a text classification task. {__PROMPT_DOMAIN__} {__PROMPT_LABELS__}
{__CHAIN_THOUGHT__} {__ANSWER_FORMAT__} {__SHOT_EXAMPLES__} {__PROMPT_SAMPLE__}
PROMPT_DOMAIN
The texts you will be processing are from the {__DOMAIN__} domain.
PROMPT_LABELS
I want you to classify the texts into one of the following categories: {__LABELS__}.
PROMPT_DETAIL
SHOT_EXAMPLES
Examples:
{__EXAMPLES__}
PROMPT_SAMPLE
{__SAMPLE__}
CHAIN_THOUGHT
ANSWER_FORMAT
In your response, include only the name of the class predicted.
1.2.2 Prompt texts#
The prompt template is used to build a prompt text, which is the final text to pass to the LLM. We haven’t provided any example yet, only the labels, so it looks like this:
[50]:
print(clf.task.prompt.template)
I need you to help me with a text classification task. The texts you will be processing are from the domain. I want you to classify the texts into one of the following categories: positive, negative.
In your response, include only the name of the class predicted. Examples:
{__EXAMPLES__}
{__SAMPLE__}
We could use the above text as the prompt as is; this is a valid approach called zero-shot prompting. But usually we will need to show the LLM some examples of what we want it to do, and that means we need to train it.
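As a quick illustration of what zero-shot prompting means here, the following minimal sketch fills in the prompt text without any examples. It assumes that clf.task.prompt.template is a plain Python string whose only remaining placeholders are __EXAMPLES__ and __SAMPLE__, which is what the printed output above suggests:

# Illustrative sketch only: zero-shot prompting simply means leaving the
# examples slot empty and filling in the sample we want classified.
zero_shot_prompt = clf.task.prompt.template.format(
    __EXAMPLES__ = '',            # no examples: zero-shot
    __SAMPLE__   = 'so cool!!',   # the text to classify
)
print(zero_shot_prompt)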
2. Train a model#
2.1 train() function#
As in any data science pipeline, you can train models in Promptmeteo. You simply have to give examples to it using the train() function, providing the texts (examples) and their expected classification (annotations):
[25]:
clf = clf.train(
    examples    = ['i am happy', 'I like it', 'I hate it'],
    annotations = ['positive', 'positive', 'negative'],
)
2.2 Examples injection#
Each example is added to the prompt to help the model improve its answers. When the number of examples is low, this technique is called few-shot prompting.
These examples should be chosen from those most closely related to the new sample passed at inference time. We can see that the prompt with the examples now looks like this:
[52]:
example_for_inference = 'I love it'
print(clf.task._get_prompt(example_for_inference))
I need you to help me with a text classification task. The texts you will be processing are from the domain. I want you to classify the texts into one of the following categories: positive, negative.
In your response, include only the name of the class predicted. Examples:
I like it
positive
I hate it
negative
i am happy
positive
I love it
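To make the idea of example selection more concrete, here is a small, self-contained sketch of similarity-based selection. It is purely illustrative and is not how Promptmeteo implements its selector internally (Promptmeteo stores the examples in a FAISS vectorstore, as described in the next section); the bag-of-words "embedding" below is a hypothetical stand-in for a real text embedding:

import numpy as np

# Toy embedding: word counts over the vocabulary of the training examples.
# Purely illustrative; a real selector would use proper text embeddings.
examples    = ['i am happy', 'I like it', 'I hate it']
annotations = ['positive', 'positive', 'negative']
new_sample  = 'I love it'

vocab = sorted({w for text in examples + [new_sample] for w in text.lower().split()})

def embed(text):
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank the training examples by similarity to the new sample; the closest
# ones are the best candidates to include in the prompt.
scores = [cosine(embed(e), embed(new_sample)) for e in examples]
for score, example, label in sorted(zip(scores, examples, annotations), reverse=True):
    print(f"{score:.2f}  {example!r} -> {label}")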
2.3 Save model#
The examples passed to the train() function are saved in a local vectorstore (built with FAISS), and this vectorstore can be serialized to disk. Saving these examples to disk allows us to easily reuse them for new use cases, without having to retrieve the original data again.
[27]:
clf.save_model('my_classifier.meteo')
[27]:
<promptmeteo.document_classifier.DocumentClassifier at 0x7ff4221a8ee0>
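Since the saved model is a regular artifact on disk, it can be versioned, copied or shipped to another environment before being loaded again. As a quick, purely illustrative check, using the same path we chose above:

import os

# Sanity check: the serialized model now exists at the chosen path.
print(os.path.exists('my_classifier.meteo'))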
3. Load a model#
3.1 Model creation#
Now that we have saved a model, we can load it back. To load a model, we have to instantiate a DocumentClassifier as we did before and use the function load_model():
[28]:
from promptmeteo import DocumentClassifier

new_clf = DocumentClassifier(
    language            = 'en',
    model_provider_name = 'hf_pipeline',
    model_name          = 'google/flan-t5-small',
    prompt_labels       = ['positive', 'negative']
).load_model('my_classifier.meteo')
3.2 Predict new data#
And now we are ready to predict labels for new data! By calling the function predict(), we use the prompt built with the stored examples to classify new texts:
[53]:
new_clf.predict(['so cool!!'])
[53]:
[['positive']]
4. Conclusions#
In this example we have shown how Promptmeteo can be used like a machine learning framework such as Scikit-Learn or PyTorch. It has a similar interface, which allows saving the results of training in a binary file and reusing them later. This eases the integration of the LLM solution into ML pipeline tools such as SageMaker or Vertex AI.
Promptmeteo does not only include code to simplify the integration of LLM models and services. It also includes predefined prompt engineering logic for different models and tasks. This lets you focus on developing a solution rather than writing prompts, and ensures that the prompts have been tested by Promptmeteo, which makes this kind of solution less error-prone.
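As a closing illustration of the pipeline integration mentioned above, here is a minimal sketch of how the two halves of this tutorial could be split into a training step and an inference step. The artifact path and the way the steps are separated into functions are illustrative assumptions, not a prescribed layout; all calls are the same ones used earlier in this tutorial:

from promptmeteo import DocumentClassifier

MODEL_PATH = 'my_classifier.meteo'   # illustrative artifact path

def training_step():
    """Train on annotated examples and persist the model artifact."""
    clf = DocumentClassifier(
        language            = 'en',
        model_provider_name = 'hf_pipeline',
        model_name          = 'google/flan-t5-small',
        prompt_labels       = ['positive', 'negative'],
    )
    clf = clf.train(
        examples    = ['i am happy', 'I like it', 'I hate it'],
        annotations = ['positive', 'positive', 'negative'],
    )
    clf.save_model(MODEL_PATH)

def inference_step(texts):
    """Load the persisted artifact and classify new texts."""
    clf = DocumentClassifier(
        language            = 'en',
        model_provider_name = 'hf_pipeline',
        model_name          = 'google/flan-t5-small',
        prompt_labels       = ['positive', 'negative'],
    ).load_model(MODEL_PATH)
    return clf.predict(texts)

training_step()
print(inference_step(['so cool!!']))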