Perform Text Generation

In this task, you run two notebook files: Task1a.ipynb, which invokes an Amazon Bedrock model for text generation using a zero-shot prompt, and Task1b.ipynb, which uses the LangChain framework to communicate with the Amazon Bedrock API and creates a custom LangChain prompt template to add context to the text generation request.

In this notebook, you learn how to use a Large Language Model (LLM) to generate an email response to a customer who provided negative feedback on the quality of customer service they received from the support engineer. You generate an email response based on the customer's previous email. You use the Amazon Titan model through the Amazon Bedrock API with the Boto3 client.

The prompt used in this task is called a zero-shot prompt. In a zero-shot prompt, you describe the task or desired output to the language model in plain language. The model then uses its pre-trained knowledge and capabilities to generate a response or complete the task based solely on the provided prompt.

Scenario

You are Bob, a Customer Service Manager at AnyCompany. Some of your customers are unhappy with the customer service and are providing negative feedback on the service delivered by the customer support engineers. You would like to respond to those customers, apologizing for the poor service and regaining their trust. You need the help of an LLM to generate a bulk of emails that are human-friendly and personalized to each customer's sentiment from previous email correspondence.

Task 1a.1: Environment setup

In this task, you set up your environment.

#Create a service client by name using the default session.
import json
import os
import sys

import boto3
import botocore

module_path = ".."
sys.path.append(os.path.abspath(module_path))

bedrock_client = boto3.client('bedrock-runtime', region_name=os.environ.get("AWS_DEFAULT_REGION", None))
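If you want to verify which AWS Region the client will call before invoking a model, you can optionally print the Region configured on the client, as in this quick check:

# Optional check: print the Region the Bedrock runtime client is configured for
print(f"Bedrock runtime client Region: {bedrock_client.meta.region_name}")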

Task 1a.2: Generate text

In this task, you prepare an input for the Amazon Bedrock service to generate an email.

# create the prompt
prompt_data = """
Command: Write an email from Bob, Customer Service Manager, AnyCompany to the customer "John Doe"
who provided negative feedback on the service provided by our customer support
engineer"""

body = json.dumps({
    "inputText": prompt_data,
    "textGenerationConfig": {
        "maxTokenCount": 8192,
        "stopSequences": [],
        "temperature": 0,
        "topP": 0.9
    }
})

Next, you use the Amazon Titan model.

Note: Amazon Titan Text Express supports a context window of up to 8K tokens and accepts the following inference parameters in textGenerationConfig: maxTokenCount, stopSequences, temperature, and topP.

The Amazon Bedrock API provides an invoke_model operation, which accepts the following request fields: modelId, contentType, accept, and body.

Refer to the documentation for the available text generation model IDs.

Task 1a.3: Invoke the Amazon Titan large language model

In this task, you explore how the model generates an output based on the prompt created earlier.

Complete Output Generation

This email is generated by the Amazon Titan model based on its interpretation of the input request and its pretrained knowledge. The request to the API is synchronous and waits for the entire output to be generated by the model.

#invoke model
modelId = 'amazon.titan-text-express-v1' # change this to use a different version from the model provider
accept = 'application/json'
contentType = 'application/json'
outputText = "\n"

try:
    response = bedrock_client.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())

    outputText = response_body.get('results')[0].get('outputText')

except botocore.exceptions.ClientError as error:
    if error.response['Error']['Code'] == 'AccessDeniedException':
        print(f"\x1b[41m{error.response['Error']['Message']}\
\nTo troubleshoot this issue please refer to the following resources.\
\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")
    else:
        raise error

# The relevant portion of the response begins after the first newline character.
# Below we print the response beginning after the first occurrence of '\n'.
email = outputText[outputText.index('\n')+1:]
print(email)


Streaming Output Generation

Amazon Bedrock also supports streaming the output in chunks as it is generated by the model. This email is generated by invoking the model with the streaming option. invoke_model_with_response_stream returns a ResponseStream that you can read from.

# invoke model with response stream
output = []
try:
    response = bedrock_client.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)
    stream = response.get('body')

    i = 1
    if stream:
        for event in stream:
            chunk = event.get('chunk')
            if chunk:
                chunk_obj = json.loads(chunk.get('bytes').decode())
                text = chunk_obj['outputText']
                output.append(text)
                print(f'\t\t\x1b[31m**Chunk {i}**\x1b[0m\n{text}\n')
                i += 1

except botocore.exceptions.ClientError as error:
    if error.response['Error']['Code'] == 'AccessDeniedException':
        print(f"\x1b[41m{error.response['Error']['Message']}\
\nTo troubleshoot this issue please refer to the following resources.\
\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")
    else:
        raise error

The response streaming approach lets you begin consuming the model output as soon as the first chunks arrive, while the service continues generating the rest. This helps in use cases where you ask the model to generate longer pieces of text. You can later combine all the generated chunks to form the complete output and use it for your use case.

#combine output chunks
print('\t\t\x1b[31m**COMPLETE OUTPUT**\x1b[0m\n')
complete_output = ''.join(output)
print(complete_output)

You have now experimented with using the boto3 SDK, which provides basic exposure to the Amazon Bedrock API. Using this API, you have seen the use case of generating an email to respond to a customer's negative feedback.


Task 1b: Perform Text Generation using a prompt that includes Context

In this notebook, you will learn how to generate an email response to a customer who was not happy with the quality of customer service they received from the customer support engineer. You will provide additional context to the model by including the contents of the actual email received from the unhappy customer.

You will add more complexity by using PromptTemplates from the LangChain framework for a similar use case. PromptTemplates allow you to create generic shells that can be populated with information later, so you can obtain model outputs for different scenarios.

LangChain is a framework for developing applications powered by language models. It lets you augment Large Language Models by chaining together various components to create advanced use cases.

Due to the additional context in the prompt, the content produced in this notebook is of much better quality and relevance than the content produced earlier through zero-shot prompts. The prompt used in this notebook creates a custom LangChain prompt template for adding context to the text generation request.

Scenario

You are Bob, a Customer Service Manager at AnyCompany, and some of your customers are not happy with the customer service and are providing negative feedback on the service provided by customer support engineers. Now, you would like to respond to those customers humbly, apologizing for the poor service and regain their trust. You need the help of a Large Language Model (LLM) to generate a bulk of emails for you, which are human-friendly and personalized to the customer's sentiment from previous email correspondence.

In this scenario, you can leverage the power of LangChain's PromptTemplates to create a generic shell for generating personalized email responses based on the customer's previous email. The PromptTemplate will incorporate the customer's original email content, allowing the LLM to understand the context and sentiment, and then generate a relevant and customized response.

Task 1b.1: Environment setup

In this task, you set up your environment.

#Create a service client by name using the default session.
import json
import os
import sys
import warnings

import boto3

warnings.filterwarnings('ignore')
module_path = ".."
sys.path.append(os.path.abspath(module_path))

bedrock_client = boto3.client('bedrock-runtime', region_name=os.environ.get("AWS_DEFAULT_REGION", None))

Task 1b.2: Invoke the Bedrock LLM Model

In this task, you create an instance of the ChatBedrock class from the langchain_aws package. This expects a model_id, which identifies the model available in Amazon Bedrock (a model Amazon Resource Name (ARN) can also be used).

Optionally, you can pass a previously created boto3 client as well as some model_kwargs which can hold parameters such as temperature, top_p, max_token_count, or stop_sequences (more information on parameters can be explored in the Amazon Bedrock console).

Refer to documentation for Available text generation model Ids under Amazon Bedrock.

Note: The different models support different model_kwargs.
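For example, the Meta Llama 3 model used below expects max_gen_len, temperature, and top_p, while the Amazon Titan text models used in Task 1a expect maxTokenCount, stopSequences, temperature, and topP. The comparison below is only an illustrative sketch with example values, not configuration used later in this notebook.

# Illustrative sketch only: different model providers expect different model_kwargs.
# These dictionaries are example values, not settings used elsewhere in this notebook.
llama_model_kwargs = {"max_gen_len": 512, "temperature": 0, "top_p": 1}                          # Meta Llama 3
titan_model_kwargs = {"maxTokenCount": 512, "stopSequences": [], "temperature": 0, "topP": 0.9}  # Amazon Titan Text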

# Model configuration
from langchain_aws import ChatBedrock
from langchain_core.output_parsers import StrOutputParser

model_id = "meta.llama3-8b-instruct-v1:0"
model_kwargs = {
    "max_gen_len": 512,
    "temperature": 0,
    "top_p": 1,
}

# LangChain class for chat
chat_model = ChatBedrock(
    client=bedrock_client,
    model_id=model_id,
    model_kwargs=model_kwargs,
)

Task 1b.3: Create a LangChain custom prompt template

In this task, you create a prompt template to which you can pass different input variables on every run. This is useful when you have to generate content with different input variables that you may be fetching from a database.

In the previous task, you hardcoded the prompt. You might have multiple customers sending similar negative feedback, and you now want to use each of those customers' emails to respond with an apology while keeping the response personalized. In the following cell, you explore how to create a PromptTemplate to achieve this pattern.

# Create a prompt template that has multiple input variables
from langchain.prompts import PromptTemplate

multi_var_prompt = PromptTemplate(
    input_variables=["customerServiceManager", "customerName", "feedbackFromCustomer"],
    template="""

Human: Create an apology email from the Service Manager {customerServiceManager} at AnyCompany to {customerName} in response to the following feedback that was received from the customer:
<customer_feedback>
{feedbackFromCustomer}
</customer_feedback>

Assistant:"""
)

# Pass in values to the input variables
prompt = multi_var_prompt.format(
    customerServiceManager="Bob Smith",
    customerName="John Doe",
    feedbackFromCustomer="""Hello Bob,
I am very disappointed with the recent experience I had when I called your customer support.
I was expecting an immediate call back but it took three days for us to get a call back.
The first suggestion to fix the problem was incorrect. Ultimately the problem was fixed after three days.
We are very unhappy with the response provided and may consider taking our business elsewhere.
"""
)

# get number of tokens
num_tokens = chat_model.get_num_tokens(prompt)
print(f"Our prompt has {num_tokens} tokens")

Note: You can safely ignore the warnings and proceed to the next cell.

#invoke
response = chat_model.invoke(prompt)

# Configure a Chain to parse output
chain = StrOutputParser()
formatted_response = chain.invoke(response)
print(formatted_response)
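The scenario calls for generating responses in bulk. A minimal sketch of that pattern is shown below: it reuses multi_var_prompt, chat_model, and StrOutputParser from the cells above, chains them together, and loops over a hypothetical list of customer records. The names and feedback strings here are made-up examples; in practice they might come from a database.

# Sketch (not part of the lab): chain the prompt template, chat model, and output parser,
# then generate one personalized apology email per customer record.
bulk_chain = multi_var_prompt | chat_model | StrOutputParser()

# Hypothetical customer records; in a real workflow these could be fetched from a database.
customers = [
    {"customerName": "John Doe", "feedbackFromCustomer": "It took three days to get a call back."},
    {"customerName": "Jane Roe", "feedbackFromCustomer": "The first fix suggested did not work."},
]

for customer in customers:
    email_text = bulk_chain.invoke({
        "customerServiceManager": "Bob Smith",
        "customerName": customer["customerName"],
        "feedbackFromCustomer": customer["feedbackFromCustomer"],
    })
    print(f"--- Email for {customer['customerName']} ---\n{email_text}\n")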

You have successfully learned that invoking the LLM without any context might not yield the desired results. By adding context and further using the prompt template to constrain the output from the LLM, you were able to successfully obtain your desired output.


Create Text Summarization

In this task, you run two notebook files: Task2a.ipynb, which summarizes a short text passage using Amazon Titan Text Premier, and Task2b.ipynb, which uses chunking and chaining with the LangChain framework to summarize long texts.


Task 2a: Text summarization of small files with Titan Text Premier

In this notebook, you ingest a small string of text directly into the Amazon Bedrock API (using the Amazon Titan Text Premier model) and instruct it to summarize the input text. You can apply this approach to summarize call transcripts, meeting transcripts, books, articles, blog posts, and other relevant content when the input text length is within the context size limits of the model.

Task 2a.1: Environment setup

In this task, you set up your environment.

#Create a service client by name using the default session.
import json
import os
import sys
import warnings

import boto3
import botocore

warnings.filterwarnings('ignore')
module_path = ".."
sys.path.append(os.path.abspath(module_path))

bedrock_client = boto3.client('bedrock-runtime', region_name=os.environ.get("AWS_DEFAULT_REGION", None))


Task 2a.2: Writing prompt with text to be summarized

In this task, you use a short passage of text with fewer tokens than the maximum length supported by the foundation model. As a sample input text for this lab, you use a paragraph from an AWS blog post announcing Amazon Bedrock.

The prompt starts with the instruction: Please provide a summary of the following text.

prompt_data = """


Please provide a summary of the following text:


AWS took all of that feedback from customers, and today we are excited to announce Amazon Bedrock, \

a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. \

Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs, \

democratizing access for all builders. Bedrock will offer the ability to access a range of powerful FMs \

for text and images—including Amazons Titan FMs, which consist of two new LLMs we’re also announcing \

today—through a scalable, reliable, and secure AWS managed service. With Bedrock’s serverless experience, \

customers can easily find the right model for what they’re trying to get done, get started quickly, privately \

customize FMs with their own data, and easily integrate and deploy them into their applications using the AWS \

tools and capabilities they are familiar with, without having to manage any infrastructure (including integrations \

with Amazon SageMaker ML features like Experiments to test different models and Pipelines to manage their FMs at scale).

"""

Task 2a.3: Creating request body with prompt and inference parameters

In this task, you create the request body with the above prompt and inference parameters.

# request body
body = json.dumps({
    "inputText": prompt_data,
    "textGenerationConfig": {
        "maxTokenCount": 2048,
        "stopSequences": [],
        "temperature": 0,
        "topP": 0.9
    }
})

Task 2a.4: Invoke foundation model via Boto3

In this task, you send an API request to Amazon Bedrock specifying the request parameters: modelId, accept, and contentType. Following the provided prompt, the foundation model in Amazon Bedrock then summarizes the input text.

Complete Output Generation

By default, the Amazon Bedrock service generates the entire summary for a given prompt in a single output. This can be slow if the model output contains many tokens.

#model configuration and invoke the model
modelId = 'amazon.titan-text-premier-v1:0' # change this to use a different version from the model provider
accept = 'application/json'
contentType = 'application/json'
outputText = "\n"

try:
    response = bedrock_client.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())

    outputText = response_body.get('results')[0].get('outputText')

except botocore.exceptions.ClientError as error:
    if error.response['Error']['Code'] == 'AccessDeniedException':
        print(f"\x1b[41m{error.response['Error']['Message']}\
\nTo troubleshoot this issue please refer to the following resources.\
\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")
    else:
        raise error

print(outputText)

Streaming Output Generation

Next, you explore how to use Amazon Bedrock's invoke_model_with_response_stream API to stream model outputs so users can consume outputs as they are generated. Rather than generating the full output at once, this API returns a ResponseStream that sends smaller output chunks from the model as they are produced. You can display these streaming outputs in a continuous, consumable view.

#invoke model with response stream
modelId = 'amazon.titan-text-premier-v1:0'
response = bedrock_client.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)
stream = response.get('body')
# Reading the stream into a list shows the raw event payloads returned by the API
output = list(stream)
output

from IPython.display import display_markdown, Markdown, clear_output

modelId = 'amazon.titan-text-premier-v1:0'
response = bedrock_client.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)
stream = response.get('body')
output = []
i = 1
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            chunk_obj = json.loads(chunk.get('bytes').decode())
            text = chunk_obj['outputText']
            clear_output(wait=True)
            output.append(text)
            display_markdown(Markdown(''.join(output)))
            i += 1

You have now experimented with using the boto3 SDK to access the Amazon Bedrock API. This SDK provides basic programmatic access to Bedrock capabilities. By leveraging this API, you were able to implement two use cases: 1) Generating an entire text summary of AWS news content at once, and 2) Streaming the summary output in chunks for incremental processing.


Task 2b: Abstractive Text Summarization

In this notebook, you address challenges that arise in large-document summarization: input text can exceed the model's context length, lead to hallucinated outputs, or trigger out-of-memory errors.

To mitigate these issues, this notebook demonstrates an architecture that uses prompt chunking and chaining with the LangChain framework, a framework for developing applications powered by language models.

You explore an approach for scenarios in which user documents surpass token limits. Chunking splits documents into segments below the context length threshold before feeding them sequentially to the model. Prompts are then chained across chunks, retaining prior context. You can apply this approach to summarize call transcripts, meeting transcripts, books, articles, blog posts, and other relevant content.

Task 2b.1: Environment setup

In this task, you set up your environment.

#Create a service client by name using the default session.
import json
import os
import sys

import boto3

module_path = ".."
sys.path.append(os.path.abspath(module_path))

bedrock_client = boto3.client('bedrock-runtime', region_name=os.environ.get("AWS_DEFAULT_REGION", None))

Task 2b.2: Summarize long text

Configuring LangChain with Boto3

In this task, you specify the LLM for the LangChain BedrockLLM class and pass arguments for inference.

# model configuration
from langchain_aws import BedrockLLM

modelId = "meta.llama3-8b-instruct-v1:0"
llm = BedrockLLM(
    model_id=modelId,
    model_kwargs={
        "max_gen_len": 2048,
        "temperature": 0,
        "top_p": 1
    },
    client=bedrock_client
)

Task 2b.3: Loading a text file with many tokens

In this task, you use a text file containing the Amazon CEO's 2022 letter to shareholders, located in the letters directory. The following cell loads the text file and counts the number of tokens. You see a warning indicating that the number of tokens in the text file exceeds the maximum number of tokens for this model.

#get tokens
shareholder_letter = "../letters/2022-letter.txt"

with open(shareholder_letter, "r") as file:
    letter = file.read()

llm.get_num_tokens(letter)

Note: You can safely ignore the warnings and proceed to the next cell.

Task 2b.4: Splitting the long text into chunks

In this task, you split the text into smaller chunks because it is too long to fit in the prompt. RecursiveCharacterTextSplitter in LangChain supports splitting long text into chunks recursively until the size of each chunk becomes smaller than chunk_size. The text is split on the separators ["\n\n", "\n"], which avoids splitting paragraphs across chunks where possible.

Using 4,000 characters per chunk, you can get summaries for each portion separately. The number of tokens, or word pieces, in a chunk depends on the text.

#chunking
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n"], chunk_size=4000, chunk_overlap=100
)

docs = text_splitter.create_documents([letter])
num_docs = len(docs)

num_tokens_first_doc = llm.get_num_tokens(docs[0].page_content)

print(
    f"Now we have {num_docs} documents and the first one has {num_tokens_first_doc} tokens"
)

Task 2b.5: Summarizing chunks and combining them

In this task, assuming that the number of tokens is consistent in the other documents, you should be good to go. You can use LangChain's load_summarize_chain to summarize the text. load_summarize_chain provides three ways of summarization: stuff, map_reduce, and refine.

Both map_reduce and refine invoke the LLM multiple times and take longer to obtain the final summary. You use map_reduce here, as sketched conceptually below.
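To make the map_reduce pattern concrete, the following sketch shows roughly what the chain does, reusing the llm and docs objects defined above: each chunk is summarized on its own (the map step), and the per-chunk summaries are then combined and summarized once more (the reduce step). This is only an illustration of the idea; the actual prompts used by load_summarize_chain differ.

# Rough illustration of map_reduce (not the exact prompts LangChain uses)
# Map step: summarize each chunk independently
chunk_summaries = [
    llm.invoke(f"Summarize the following text:\n\n{doc.page_content}")
    for doc in docs
]

# Reduce step: combine the per-chunk summaries and summarize them once more
combined = "\n".join(chunk_summaries)
final_summary = llm.invoke(f"Combine the following summaries into a single concise summary:\n\n{combined}")
print(final_summary)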

# Set verbose=True if you want to see the prompts being used
from langchain.chains.summarize import load_summarize_chain

summary_chain = load_summarize_chain(llm=llm, chain_type="map_reduce", verbose=False)

Note: Depending on your number of documents, your Bedrock request rate quota, and the configured retry settings, the chain below may take some time to run.

#invoke chain
output = ""
try:
    output = summary_chain.invoke(docs)

except ValueError as error:
    if "AccessDeniedException" in str(error):
        print(f"\x1b[41m{error}\
\nTo troubleshoot this issue please refer to the following resources.\
\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")
        class StopExecution(ValueError):
            def _render_traceback_(self):
                pass
        raise StopExecution
    else:
        raise error

# print output
print(output['output_text'])

You have now experimented with using prompt chunking and chaining with the LangChain framework to summarize large documents while mitigating issues arising from long input text.


Use Amazon Bedrock for Question Answering

In this task, you use the Amazon Bedrock Titan model to provide factual responses to queries by sending requests that include context and receiving relevant responses.
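As a rough sketch of what a context-included request can look like, the snippet below reuses the bedrock_client and the Titan request format from the earlier tasks; the context and question strings are made-up placeholders.

# Sketch only: question answering with context included in the prompt.
# The context and question below are illustrative placeholders.
context = "AnyCompany's support desk is open Monday through Friday, 9am to 5pm."
question = "When can a customer reach the support desk?"

qa_body = json.dumps({
    "inputText": f"Answer the question using only the context provided.\n\nContext: {context}\n\nQuestion: {question}",
    "textGenerationConfig": {"maxTokenCount": 512, "stopSequences": [], "temperature": 0, "topP": 0.9}
})

qa_response = bedrock_client.invoke_model(body=qa_body, modelId='amazon.titan-text-express-v1',
                                          accept='application/json', contentType='application/json')
print(json.loads(qa_response.get('body').read())['results'][0]['outputText'])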

Build a Chatbot

In this task, you build a chatbot using Foundation Models (FMs) in Amazon Bedrock, with llama3-8b-instruct and titan-text-premier as the FMs for building the chatbot.

Use Amazon Bedrock Models for Code Generation

In this task, you use a large language model (LLM) to generate code based on a text prompt. 

Integrate Amazon Bedrock Model with LangChain Agents

In this task, you learn how to use the flexible Converse API to integrate external capabilities into conversational applications.
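As a preview, the Converse API offers a uniform, conversation-oriented request shape across models. A minimal sketch of a single-turn call, reusing bedrock_client and the Llama 3 model ID from the earlier tasks, might look like the following; the prompt text is an illustrative placeholder.

# Sketch only: a single-turn request through the Converse API.
response = bedrock_client.converse(
    modelId="meta.llama3-8b-instruct-v1:0",
    messages=[{"role": "user", "content": [{"text": "Suggest a subject line for an apology email."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0},
)
print(response["output"]["message"]["content"][0]["text"])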