Task 1b: Perform Text Generation using a prompt that includes Context
In this notebook, you will learn how to generate an email response to a customer who was not happy with the quality of customer service they received from the customer support engineer. You will provide additional context to the model by including the contents of the actual email received from the unhappy customer.
You will add more complexity by using PromptTemplates from the LangChain framework for a similar use case. PromptTemplates let you create generic shells that can be populated with information later, so you can obtain model outputs for different scenarios.
LangChain is a framework for developing applications powered by language models. It lets you augment large language models by chaining together various components to build advanced use cases.
Because of the additional context in the prompt, the content produced in this notebook is of much better quality and relevance than the content produced earlier through zero-shot prompts. This notebook creates a custom LangChain prompt template for adding context to the text generation request.
Scenario
You are Bob, a Customer Service Manager at AnyCompany. Some of your customers are unhappy with the customer service and are providing negative feedback on the service delivered by customer support engineers. You would like to respond to those customers humbly, apologize for the poor service, and regain their trust. You need the help of a Large Language Model (LLM) to generate these emails in bulk, keeping them human-friendly and personalized to each customer's sentiment from previous email correspondence.
In this scenario, you can leverage the power of LangChain's PromptTemplates to create a generic shell for generating personalized email responses based on the customer's previous email. The PromptTemplate will incorporate the customer's original email content, allowing the LLM to understand the context and sentiment, and then generate a relevant and customized response.
Task 1b.1: Environment setup
In this task, you set up your environment.
#Create a service client by name using the default session.
import json
import os
import sys
import warnings
import boto3
warnings.filterwarnings('ignore')
module_path = ".."
sys.path.append(os.path.abspath(module_path))
bedrock_client = boto3.client('bedrock-runtime',region_name=os.environ.get("AWS_DEFAULT_REGION", None))
Task 1b.2: Invoke the Bedrock LLM Model
In this task, you create an instance of the ChatBedrock class from the langchain_aws package. This expects a model_id, which is the identifier of the model available in Amazon Bedrock.
Optionally, you can pass a previously created boto3 client as well as some model_kwargs which can hold parameters such as temperature, top_p, max_token_count, or stop_sequences (more information on parameters can be explored in the Amazon Bedrock console).
Refer to the documentation for available text generation model IDs in Amazon Bedrock.
Note: The different models support different model_kwargs.
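For example, the Meta Llama 3 models accept parameters named max_gen_len, temperature, and top_p, while the Amazon Titan Text models use maxTokenCount, topP, and stopSequences (the same names that appear in the textGenerationConfig used later in Task 2a). The snippet below is illustrative only and the values are placeholders; verify the exact parameter names for your model in the Amazon Bedrock console.
# Illustrative only: inference parameter names differ by model provider (values are placeholders)
llama_kwargs = {"max_gen_len": 512, "temperature": 0, "top_p": 1}
titan_kwargs = {"maxTokenCount": 512, "temperature": 0, "topP": 0.9, "stopSequences": []}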
# Model configuration
from langchain_aws import ChatBedrock
from langchain_core.output_parsers import StrOutputParser

model_id = "meta.llama3-8b-instruct-v1:0"
model_kwargs = {
    "max_gen_len": 512,
    "temperature": 0,
    "top_p": 1,
}

# LangChain class for chat
chat_model = ChatBedrock(
    client=bedrock_client,
    model_id=model_id,
    model_kwargs=model_kwargs,
)
Task 1b.3: Create a LangChain custom prompt template
In this task, you create a prompt template that you can populate with different input variables on every run. This is useful when you have to generate content with different input variables that you may be fetching from a database.
In the previous task, the prompt was hardcoded. You might have multiple customers sending similar negative feedback, and you now want to use each of those customers' emails to respond with an apology while keeping each response somewhat personalized. In the following cell, you explore how to create a PromptTemplate to achieve this pattern.
# Create a prompt template that has multiple input variables
from langchain.prompts import PromptTemplate

multi_var_prompt = PromptTemplate(
    input_variables=["customerServiceManager", "customerName", "feedbackFromCustomer"],
    template="""
Human: Create an apology email from the Service Manager {customerServiceManager} at AnyCompany to {customerName} in response to the following feedback that was received from the customer:
<customer_feedback>
{feedbackFromCustomer}
</customer_feedback>
Assistant:"""
)
# Pass in values to the input variables
prompt = multi_var_prompt.format(
    customerServiceManager="Bob Smith",
    customerName="John Doe",
    feedbackFromCustomer="""Hello Bob,
I am very disappointed with the recent experience I had when I called your customer support.
I was expecting an immediate call back but it took three days for us to get a call back.
The first suggestion to fix the problem was incorrect. Ultimately the problem was fixed after three days.
We are very unhappy with the response provided and may consider taking our business elsewhere.
"""
)
# get number of tokens
num_tokens = chat_model.get_num_tokens(prompt)
print(f"Our prompt has {num_tokens} tokens")
Note: You can safely ignore the warnings and proceed to the next cell.
# Invoke the model with the formatted prompt
response = chat_model.invoke(prompt)

# Configure an output parser to extract the response text as a string
chain = StrOutputParser()
formatted_response = chain.invoke(response)
print(formatted_response)
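As an optional variation on the cell above, the prompt template, chat model, and output parser can also be composed into a single chain, which illustrates the component-chaining idea described at the start of this notebook. This is a minimal sketch that reuses the multi_var_prompt and chat_model defined earlier; the invocation is commented out and the feedback text is a placeholder.
# Optional sketch: compose the prompt template, model, and parser into one LangChain chain
composed_chain = multi_var_prompt | chat_model | StrOutputParser()

# Example invocation (placeholder feedback text):
# print(composed_chain.invoke({
#     "customerServiceManager": "Bob Smith",
#     "customerName": "John Doe",
#     "feedbackFromCustomer": "The support call took three days to be returned.",
# }))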
You have seen that invoking the LLM without any context might not yield the desired results. By adding context and using a prompt template to constrain the output from the LLM, you were able to obtain the desired output.
Task 2a: Text summarization with small files with Titan Text Premier
In this notebook, you ingest a small string of text directly into the Amazon Bedrock API (using the Titan Text model) and instruct it to summarize the input text. You can apply this approach to summarize call transcripts, meeting transcripts, books, articles, blog posts, and other relevant content when the input text length is within the context size limits of the model.
Task 2a.1: Environment setup
In this task, you set up your environment.
#Create a service client by name using the default session.
import json
import os
import sys
import warnings
import boto3
import botocore
warnings.filterwarnings('ignore')
module_path = ".."
sys.path.append(os.path.abspath(module_path))
bedrock_client = boto3.client('bedrock-runtime',region_name=os.environ.get("AWS_DEFAULT_REGION", None))
Task 2a.2: Writing prompt with text to be summarized
In this task, you use a short passage of text with fewer tokens than the maximum length supported by the foundation model. As a sample input text for this lab, you use a paragraph from an AWS blog post announcing Amazon Bedrock.
The prompt starts with the instruction "Please provide a summary of the following text:".
prompt_data = """
Please provide a summary of the following text:
AWS took all of that feedback from customers, and today we are excited to announce Amazon Bedrock, \
a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. \
Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs, \
democratizing access for all builders. Bedrock will offer the ability to access a range of powerful FMs \
for text and images—including Amazon’s Titan FMs, which consist of two new LLMs we’re also announcing \
today—through a scalable, reliable, and secure AWS managed service. With Bedrock’s serverless experience, \
customers can easily find the right model for what they’re trying to get done, get started quickly, privately \
customize FMs with their own data, and easily integrate and deploy them into their applications using the AWS \
tools and capabilities they are familiar with, without having to manage any infrastructure (including integrations \
with Amazon SageMaker ML features like Experiments to test different models and Pipelines to manage their FMs at scale).
"""
Task 2a.3: Creating request body with prompt and inference parameters
In this task, you create the request body with the above prompt and inference parameters.
# request body
body = json.dumps({
    "inputText": prompt_data,
    "textGenerationConfig": {
        "maxTokenCount": 2048,
        "stopSequences": [],
        "temperature": 0,
        "topP": 0.9
    }
})
Task 2a.4: Invoke foundation model via Boto3
In this task, you send an API request to Amazon Bedrock specifying the request parameters: modelId, accept, and contentType. Following the provided prompt, the foundation model in Amazon Bedrock then summarizes the input text.
Complete Output Generation
By default, the Amazon Bedrock service generates the entire summary for a given prompt in a single output. This can be slow if the model output contains many tokens.
# Model configuration and model invocation
modelId = 'amazon.titan-text-premier-v1:0'  # change this to use a different version from the model provider
accept = 'application/json'
contentType = 'application/json'
outputText = "\n"

try:
    response = bedrock_client.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())
    outputText = response_body.get('results')[0].get('outputText')
except botocore.exceptions.ClientError as error:
    if error.response['Error']['Code'] == 'AccessDeniedException':
        print(f"\x1b[41m{error.response['Error']['Message']}\
        \nTo troubleshoot this issue please refer to the following resources.\
        \nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
        \nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")
    else:
        raise error

print(outputText)
Streaming Output Generation
Next, you explore how to use Amazon Bedrock's invoke_model_with_response_stream API to stream model outputs so users can consume outputs as they are generated. Rather than generating the full output at once, this API returns a ResponseStream that sends smaller output chunks from the model as they are produced. You can display these streaming outputs in a continuous, consumable view.
#invoke model with response stream
modelId = 'amazon.titan-text-premier-v1:0'
response = bedrock_client.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)
stream = response.get('body')
output = list(stream)
output
from IPython.display import display_markdown, Markdown, clear_output

modelId = 'amazon.titan-text-premier-v1:0'
response = bedrock_client.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)
stream = response.get('body')
output = []
i = 1
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            chunk_obj = json.loads(chunk.get('bytes').decode())
            text = chunk_obj['outputText']
            clear_output(wait=True)
            output.append(text)
            display_markdown(Markdown(''.join(output)))
            i += 1
You have now experimented with using the boto3 SDK to access the Amazon Bedrock API. This SDK provides basic programmatic access to Bedrock capabilities. By leveraging this API, you were able to implement two use cases: 1) Generating an entire text summary of AWS news content at once, and 2) Streaming the summary output in chunks for incremental processing.
Task 2b: Abstractive Text Summarization
In this notebook, you address the challenges that arise in large document summarization: input text can exceed the model's context length, lead to hallucinated outputs, or trigger out-of-memory errors.
To mitigate these issues, this notebook demonstrates an architecture that uses prompt chunking and chaining with the LangChain framework, a toolkit for building applications powered by language models.
You explore an approach for handling documents that exceed the model's token limit. Chunking splits a document into segments that fit within the context length threshold before feeding them to the model sequentially; prompts are chained across chunks so that prior context is retained. You can apply this approach to summarize call transcripts, meeting transcripts, books, articles, blog posts, and other relevant content.
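As a rough illustration of the chunk-and-chain idea only, the sketch below splits a document into fixed-size pieces and carries a running summary from one call to the next. The summarize_with_model helper is hypothetical, standing in for a real Bedrock invocation; the rest of this notebook implements the pattern properly with LangChain's RecursiveCharacterTextSplitter and load_summarize_chain.
# Conceptual sketch of chunk-and-chain summarization (illustrative only)
def summarize_with_model(prompt_text):
    # Hypothetical stand-in: in practice this would call a Bedrock model and return its summary
    return prompt_text[:200]

def chunked_summary(document, chunk_size=4000):
    # Split the document into character chunks that fit within the model's context window
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    running_summary = ""
    for chunk in chunks:
        # Chain prompts: each call sees the summary so far plus the next chunk
        running_summary = summarize_with_model(
            f"Summary so far:\n{running_summary}\n\nSummarize this together with the new text:\n{chunk}"
        )
    return running_summary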
Task 2b.1: Environment setup
In this task, you set up your environment.
#Create a service client by name using the default session.
import json
import os
import sys
import boto3
module_path = ".."
sys.path.append(os.path.abspath(module_path))
bedrock_client = boto3.client('bedrock-runtime',region_name=os.environ.get("AWS_DEFAULT_REGION", None))
Task 2b.2: Summarize long text
Configuring LangChain with Boto3
In this task, you specify the LLM for the LangChain BedrockLLM class and pass arguments for inference.
# model configuration
from langchain_aws import BedrockLLM

modelId = "meta.llama3-8b-instruct-v1:0"
llm = BedrockLLM(
    model_id=modelId,
    model_kwargs={
        "max_gen_len": 2048,
        "temperature": 0,
        "top_p": 1
    },
    client=bedrock_client
)
Task 2b.3: Loading a text file with many tokens
In this task, you use a text file containing the Amazon CEO's 2022 letter to shareholders, found in the letters directory. The following cell loads the text file and counts the number of tokens. You will see a warning indicating that the number of tokens in the text file exceeds the maximum number of tokens for this model.
#get tokens
shareholder_letter = "../letters/2022-letter.txt"

with open(shareholder_letter, "r") as file:
    letter = file.read()

llm.get_num_tokens(letter)
Note: You can safely ignore the warnings and proceed to the next cell.
Task 2b.4: Splitting the long text into chunks
In this task, you split the text into smaller chunks because it is too long to fit in the prompt. RecursiveCharacterTextSplitter in LangChain supports splitting long text into chunks recursively until the size of each chunk becomes smaller than chunk_size. The text is separated into chunks using separators=["\n\n", "\n"], which avoids splitting individual paragraphs across multiple chunks.
Using 4,000 characters per chunk (the chunk_size set in the following cell), you can get summaries for each portion separately. The number of tokens, or word pieces, in a chunk depends on the text.
#chunking
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n"], chunk_size=4000, chunk_overlap=100
)

docs = text_splitter.create_documents([letter])

num_docs = len(docs)
num_tokens_first_doc = llm.get_num_tokens(docs[0].page_content)

print(
    f"Now we have {num_docs} documents and the first one has {num_tokens_first_doc} tokens"
)
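Note that chunk_size above is measured in characters, not tokens. If you prefer to bound chunks by token count instead, RecursiveCharacterTextSplitter also accepts a length_function; the following is a minimal sketch that reuses llm.get_num_tokens from earlier, with arbitrary placeholder values for chunk_size and chunk_overlap.
# Optional variant: measure chunk size in tokens instead of characters
token_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n"],
    chunk_size=1000,   # interpreted as a token count because of length_function
    chunk_overlap=50,
    length_function=llm.get_num_tokens,
)
# token_docs = token_splitter.create_documents([letter])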
Task 2b.5: Summarizing chunks and combining them
In this task, assuming that the number of tokens is consistent in the other documents, you should be good to go. You can use LangChain's load_summarize_chain to summarize the text. load_summarize_chain provides three ways of summarization: stuff, map_reduce, and refine.
stuff: puts all the chunks into one prompt, so it can quickly hit the maximum token limit.
map_reduce: summarizes each chunk, combines the summaries, and then summarizes the combined summaries. If the combined summaries are too large, it raises an error.
refine: summarizes the first chunk, and then summarizes the second chunk with the first summary. The same process repeats until all chunks are summarized.
Both map_reduce and refine invoke the LLM multiple times, so obtaining the final summary takes longer. You use map_reduce here.
# Set verbose=True if you want to see the prompts being used
from langchain.chains.summarize import load_summarize_chain
summary_chain = load_summarize_chain(llm=llm, chain_type="map_reduce", verbose=False)
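If you later want to compare strategies, switching to refine only requires changing chain_type. The following is a minimal sketch and is not run as part of this lab.
# Optional: the refine strategy folds each new chunk into a running summary
refine_chain = load_summarize_chain(llm=llm, chain_type="refine", verbose=False)
# refine_output = refine_chain.invoke(docs)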
Note: Depending on the number of documents, your Bedrock request rate quota, and the configured retry settings, the chain below may take some time to run.
# Invoke the summarization chain
output = ""
try:
    output = summary_chain.invoke(docs)
except ValueError as error:
    if "AccessDeniedException" in str(error):
        print(f"\x1b[41m{error}\
        \nTo troubleshoot this issue please refer to the following resources.\
        \nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
        \nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")
        class StopExecution(ValueError):
            def _render_traceback_(self):
                pass
        raise StopExecution
    else:
        raise error
# print output
print(output['output_text'])
You have now experimented with using prompt chunking and chaining with the LangChain framework to summarize large documents while mitigating issues arising from long input text.