Serverless picoLLM: LLMs Running in AWS Lambda!

Code for the Serverless LLM article on picovoice.ai, which you can find here: picoLLM on Lambda.

The Demo in Action

Disclaimer

THIS DEMO EXCEEDS AWS FREE TIER USAGE. YOU WILL BE CHARGED BY AWS IF YOU DEPLOY THIS DEMO.

Prerequisites

You will need the following in order to deploy and run this demo:

  1. A Picovoice Console account with a valid AccessKey.

  2. An AWS account.

  3. AWS SAM CLI installed and set up. Follow the official guide completely.

  4. A valid Docker installation.

Setup

  1. Clone the serverless-picollm repo:
git clone https://github.com/Picovoice/serverless-picollm.git
  2. Download a Phi-2-based .pllm model from the picoLLM section of the Picovoice Console.

Tip

Other models will work as long as they are chat-enabled and fit within the AWS Lambda code size and memory limits. You will also need to update the Dialog object in client.py to the appropriate class.

For example, if using Llama3 with the llama-3-8b-instruct-326 model, the line in client.py should be updated to:

dialog = picollm.Llama3ChatDialog(history=3)
  3. Place the downloaded .pllm model in the models/ directory.

  4. Replace "${YOUR_ACCESS_KEY_HERE}" inside the src/app.py file with your AccessKey obtained from Picovoice Console.
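
Before deploying, it can be worth sanity-checking your AccessKey and model locally. The snippet below is only a rough sketch using the picoLLM Python SDK (pip install picollm), not code from this repo; the model filename is a placeholder, and the Dialog class should match the model you downloaded, as described in the Tip above.

import picollm

# Illustrative local check -- not part of the repo.
pllm = picollm.create(
    access_key="${YOUR_ACCESS_KEY_HERE}",   # the same AccessKey you put in src/app.py
    model_path="models/your-model.pllm")    # placeholder; point at the .pllm you downloaded

# Use the Dialog class that matches your model (e.g. Phi2ChatDialog for Phi-2).
dialog = picollm.Phi2ChatDialog(history=3)
dialog.add_human_request("What is the capital of France?")

res = pllm.generate(prompt=dialog.prompt())
dialog.add_llm_response(res.completion)
print(res.completion)

If this prints a completion, the same AccessKey and model should work once baked into the Lambda container.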

Deploy

  1. Use AWS SAM CLI to build the app:
sam build
  2. Use AWS SAM CLI to deploy the app, following the guided prompts:
sam deploy --guided
  3. At the end of the deployment, AWS SAM CLI will print an Outputs section. Make note of the WebSocketURI; it should look something like this:
CloudFormation outputs from deployed stack
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Outputs
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Key                 HandlerFunctionFunctionArn
Description         HandlerFunction function ARN
Value               arn:aws:lambda:us-west-2:000000000000:function:picollm-lambda-HandlerFunction-ABC123DEF098

Key                 WebSocketURI
Description         The WSS Protocol URI to connect to
Value               wss://ABC123DEF098.execute-api.us-west-2.amazonaws.com/Prod
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
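
If you need the WebSocketURI again later, the stack outputs can be read back from CloudFormation. Below is a minimal sketch using boto3; the stack name picollm-lambda is a placeholder, use whatever name you chose during sam deploy --guided.

import boto3

# Look up the stack created by `sam deploy --guided`.
cloudformation = boto3.client("cloudformation")
stack = cloudformation.describe_stacks(StackName="picollm-lambda")["Stacks"][0]

# Print every output, including WebSocketURI.
for output in stack["Outputs"]:
    print(f'{output["OutputKey"]}: {output["OutputValue"]}')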

Note

If you make any changes to the model, the Dockerfile, or app.py, you will need to repeat these deployment steps.

Chat!

  1. Run client.py, passing in the WebSocketURI copied from the deployment step:
python client.py -u <WebSocket URL>
  2. Once connected, the client will give you a prompt. Type in your chat message and picoLLM will stream back a response from the Lambda!
> What is the capital of France?
< The capital of France is Paris.

< [Completion finished @ `6.35` tps]
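
Under the hood, client.py essentially just relays text over the WebSocket. The sketch below is illustrative only and assumes the websockets package; the real client, including the exact message format, lives in client.py.

import asyncio
import sys

import websockets  # pip install websockets


async def chat(uri: str) -> None:
    # Connect to the WSS endpoint printed by `sam deploy` (WebSocketURI).
    # The real client.py takes the URL via -u; a positional arg is used here for brevity.
    async with websockets.connect(uri) as ws:
        while True:
            message = input("> ")
            await ws.send(message)              # the real client may wrap this in JSON
            while True:
                reply = await ws.recv()          # tokens / status lines streamed back
                print(reply, end="", flush=True)
                if "Completion finished" in reply:   # illustrative end-of-turn check
                    print()
                    break


if __name__ == "__main__":
    asyncio.run(chat(sys.argv[1]))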

Important

When you first send a message, you may get the following response: < [Lambda is loading & caching picoLLM. Please wait...]. This means picoLLM is loading the model as Lambda streams it from the Elastic Container Registry. Because of the nature and limitations of AWS Lambda, this process may take upwards of a few minutes. Subsequent messages and connections will not take as long to load, as Lambda will cache the layers.