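Hello, this is Hirono.
I had a large batch of PowerPoint decks (PPTX files) to convert to PDF, so I built an AWS Lambda function to do it.
The write-up is split into two articles: environment setup (this article) and the Lambda function itself.
What I wanted to do
I have a lot of PowerPoint decks (PPTX) that I want to feed into RAG (Amazon Bedrock Knowledge Bases). However, the RAG did not support PPTX as a source data file, so the decks had to be converted to PDF first. To make that easy, I built a process that converts a file automatically once it is placed in an Amazon S3 bucket. LibreOffice handles the PPTX-to-PDF conversion.
- Placing a file in the Amazon S3 bucket emits an event notification. Files go into the input folder.
- An Amazon EventBridge rule invokes the AWS Lambda function if the object is a .pptx file in the input folder.
- The Lambda function receives the PPTX file's metadata from EventBridge and uses it to fetch the file.
- Inside the Lambda function, LibreOffice runs headless (no GUI) and converts the PPTX to PDF.
- The resulting PDF is saved to the output folder of the Amazon S3 bucket. Its name is the original file name with the extension changed to .pdf.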
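The filtering step in the flow above (only .pptx objects under the input folder invoke the function) can be sketched as a CloudFormation fragment. This is a minimal sketch under assumptions, not the rule the author actually deploys: the logical IDs `EventBridgeRuleInvokeLambda`, `LambdaFunction`, and `LambdaPermissionEventBridge` are hypothetical, `S3BucketDocs` is assumed to be a String parameter holding the bucket name, and the `wildcard` operator is just one way to combine the folder and extension conditions on the object key.

```yaml
# Fragment of a hypothetical Lambda-side template (assumption: LambdaFunction
# is defined elsewhere in the same template, S3BucketDocs is a String parameter).
Resources:
  EventBridgeRuleInvokeLambda:
    Type: AWS::Events::Rule
    Properties:
      State: ENABLED
      EventPattern:
        source:
          - aws.s3
        detail-type:
          - Object Created
        detail:
          bucket:
            name:
              - !Ref S3BucketDocs
          object:
            key:
              # Matches keys such as input/slides.pptx (matching is case-sensitive).
              - wildcard: "input/*.pptx"
      Targets:
        - Arn: !GetAtt LambdaFunction.Arn
          Id: pptx-pdf-conv-lambda
  # Lets EventBridge invoke the function, limited to this one rule.
  LambdaPermissionEventBridge:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref LambdaFunction
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt EventBridgeRuleInvokeLambda.Arn
```

Because the PDF is written to the output folder rather than the input folder, the pattern does not match the function's own output, so the conversion does not retrigger itself.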
About LibreOffice
LibreOffice is an open-source office suite. It is compatible with Microsoft products such as Word, Excel, and PowerPoint, so it can handle PowerPoint files.
LibreOffice can run headless, that is, it can be driven from the command line, and I use its ability to convert PowerPoint files to PDF.
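About the environment
The environment breaks down into three main parts:
- Amazon S3 as the interface: you store PPTX files in a bucket and pick up the PDFs from it
- The AWS Lambda function that converts PPTX to PDF (covered in detail in the Lambda function article)
- The CI/CD environment that builds and deploys the AWS Lambda function
The overall picture is shown in the diagram below.
Toward the upper right of the diagram is the Amazon S3 bucket that serves as the interface.
Toward the lower right is the AWS Lambda function that converts PPTX to PDF. It is deployed from the CI/CD pipeline together with the Amazon EventBridge rule that invokes it.
So why build a CI/CD pipeline at all?
Running LibreOffice is heavy work for a regular Lambda function, so I chose a container-based Lambda function. That means a Docker container image has to be built, so I set up Amazon ECR as the image store and a CI/CD pipeline that builds the image.
The source code falls into two broad groups:
- Files for building the container image, used in the build phase
- The AWS CloudFormation template used in the deploy phase
The developer manages these in AWS CodeCommit, and the CI/CD pipeline starts whenever the source code is updated.
The container image built in the build phase (the actual body of the Lambda function) is stored in Amazon ECR. An image on its own does not work as a Lambda function, so the deploy phase uses AWS CloudFormation to deploy it as one.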
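To make the build phase concrete, here is a rough sketch of what a `buildspec.yml` in the CodeCommit repository might look like. It is an assumption, not the author's actual file: it relies on the environment variables that the CloudFormation template in this article passes to CodeBuild (`AWS_ACCOUNT_ID`, `AWS_DEFAULT_REGION`, `IMAGE_REPO_NAME`, `IMAGE_TAG`), on a `Dockerfile` at the repository root, and on `ECR_REGISTRY` as a helper variable introduced here for illustration.

```yaml
version: 0.2
phases:
  pre_build:
    commands:
      # ECR_REGISTRY is a helper variable for this sketch; in buildspec 0.2 it
      # stays available to the later commands and phases.
      - ECR_REGISTRY=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $ECR_REGISTRY
  build:
    commands:
      # Assumes the Dockerfile sits at the repository root.
      - docker build -t $ECR_REGISTRY/$IMAGE_REPO_NAME:$IMAGE_TAG .
  post_build:
    commands:
      - docker push $ECR_REGISTRY/$IMAGE_REPO_NAME:$IMAGE_TAG
artifacts:
  files:
    # Handed to the deploy phase as the build output artifact.
    - cfn_container_lambda.yml
```

The `artifacts` section matters because the deploy phase reads its CloudFormation template from the build output, so the template file has to be passed through even though the build does not modify it.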
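Building the environment (AWS CloudFormation)
I deploy the environment described above with AWS CloudFormation. The template covers these two parts:
- Amazon S3 as the interface: you store PPTX files in a bucket and pick up the PDFs from it
- The CI/CD environment that builds and deploys the AWS Lambda function
Once this is in place, you can freely develop and deploy the Lambda function code, container image included, from AWS CodeCommit.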
AWSTemplateFormatVersion: 2010-09-09
Description: The CloudFormation template that creates a CI/CD environment for a container Lambda function. It provides converting pptx to PDF.
# ------------------------------------------------------------#
# Input Parameters
# ------------------------------------------------------------#
Parameters:
  SystemName:
    Type: String
    Description: System name. use lower case only. (e.g. example)
    Default: example
    MaxLength: 10
    MinLength: 1
  SubName:
    Type: String
    Description: System sub name. use lower case only. (e.g. prod or dev)
    Default: dev
    MaxLength: 10
    MinLength: 1
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: "General Configuration"
        Parameters:
          - SystemName
          - SubName
Resources:
# ------------------------------------------------------------#
# S3
# ------------------------------------------------------------#
  S3BucketDocs:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub ${SystemName}-${SubName}-pptx-pdf-conv-docs
      LifecycleConfiguration:
        Rules:
          - Id: AutoDelete
            Status: Enabled
            ExpirationInDays: 14
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      NotificationConfiguration:
        EventBridgeConfiguration:
          EventBridgeEnabled: true
      Tags:
        - Key: Cost
          Value: !Sub ${SystemName}-${SubName}
  S3BucketArtifact:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub ${SystemName}-${SubName}-pptx-pdf-conv-artifact
      LifecycleConfiguration:
        Rules:
          - Id: AutoDelete
            Status: Enabled
            ExpirationInDays: 14
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      Tags:
        - Key: Cost
          Value: !Sub ${SystemName}-${SubName}
  S3BucketLogs:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub ${SystemName}-${SubName}-pptx-pdf-conv-logs
      LifecycleConfiguration:
        Rules:
          - Id: AutoDelete
            Status: Enabled
            ExpirationInDays: 365
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      Tags:
        - Key: Cost
          Value: !Sub ${SystemName}-${SubName}
# ------------------------------------------------------------#
# ECR
# ------------------------------------------------------------#
  EcrRepositoryContainerLambda:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: !Sub ${SystemName}-${SubName}-pptx-pdf-conv
      EncryptionConfiguration:
        EncryptionType: AES256
      ImageScanningConfiguration:
        ScanOnPush: true
      ImageTagMutability: IMMUTABLE
      LifecyclePolicy:
        LifecyclePolicyText: |
          {
            "rules": [
              {
                "rulePriority": 1,
                "description": "Keep only 5 images, expire all others",
                "selection": {
                  "tagStatus": "any",
                  "countType": "imageCountMoreThan",
                  "countNumber": 5
                },
                "action": {
                  "type": "expire"
                }
              }
            ]
          }
      EmptyOnDelete: true
      Tags:
        - Key: Cost
          Value: !Sub ${SystemName}-${SubName}
# ------------------------------------------------------------#
# CodeCommit Repository
# ------------------------------------------------------------#
  CodeCommitRepoContainerLambda:
    Type: AWS::CodeCommit::Repository
    Properties:
      RepositoryName: !Sub ${SystemName}-${SubName}-pptx-pdf-conv
      RepositoryDescription: !Sub pptx pdf converter for ${SystemName}-${SubName}
      Tags:
        - Key: Cost
          Value: !Sub ${SystemName}-${SubName}
# ------------------------------------------------------------#
# CodePipeline
# ------------------------------------------------------------#
  CodePipelineContainerLambda:
    Type: AWS::CodePipeline::Pipeline
    Properties:
      Name: !Sub ${SystemName}-${SubName}-pptx-pdf-conv
      PipelineType: V2
      ArtifactStore:
        Location: !Ref S3BucketArtifact
        Type: S3
      RestartExecutionOnUpdate: false
      RoleArn: !GetAtt CodePipelineServiceRoleContainerLambda.Arn
      Stages:
        - Name: Source
          Actions:
            - Name: Source
              RunOrder: 1
              ActionTypeId:
                Category: Source
                Owner: AWS
                Version: 1
                Provider: CodeCommit
              Configuration:
                RepositoryName: !GetAtt CodeCommitRepoContainerLambda.Name
                BranchName: main
                PollForSourceChanges: false
                OutputArtifactFormat: CODEBUILD_CLONE_REF
              Namespace: SourceVariables
              OutputArtifacts:
                - Name: Source
        - Name: Build
          Actions:
            - Name: Build
              RunOrder: 1
              Region: !Sub ${AWS::Region}
              ActionTypeId:
                Category: Build
                Owner: AWS
                Version: 1
                Provider: CodeBuild
              Configuration:
                ProjectName: !Ref CodeBuildProjectContainerLambda
                BatchEnabled: false
                EnvironmentVariables: |
                  [
                    {
                      "name": "IMAGE_TAG",
                      "type": "PLAINTEXT",
                      "value": "#{codepipeline.PipelineExecutionId}"
                    }
                  ]
              Namespace: BuildVariables
              InputArtifacts:
                - Name: Source
              OutputArtifacts:
                - Name: Build
        - Name: Deploy
          Actions:
            - ActionTypeId:
                Category: Deploy
                Owner: AWS
                Provider: CloudFormation
                Version: 1
              Configuration:
                StackName: !Sub ${SystemName}-${SubName}-pptx-pdf-conv-lambda
                Capabilities: CAPABILITY_NAMED_IAM
                RoleArn: !GetAtt CodePipelineDeployCreateUpdateRoleContainerLambda.Arn
                ActionMode: CREATE_UPDATE
                TemplatePath: Build::cfn_container_lambda.yml
                ParameterOverrides: !Sub '{"SystemName":"${SystemName}","SubName":"${SubName}","ImageTag":"#{codepipeline.PipelineExecutionId}","S3BucketDocs":"${S3BucketDocs}","ImgRepoName":"${EcrRepositoryContainerLambda}"}'
              InputArtifacts:
                - Name: Build
              Name: CreateOrUpdate
              RoleArn: !GetAtt CodePipelineDeployCreateUpdateActionRoleContainerLambda.Arn
              RunOrder: 1
      Tags:
        - Key: Cost
          Value: !Sub ${SystemName}-${SubName}
    DependsOn:
      - CodePipelineServiceRoleContainerLambda
      - CodeBuildProjectContainerLambda
      - CodePipelineDeployCreateUpdateActionRoleContainerLambda
      - EcrRepositoryContainerLambda
# ------------------------------------------------------------#
# CodePipeline Service Role (IAM)
# ------------------------------------------------------------#
  CodePipelineServiceRoleContainerLambda:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub CpServiceRoleContainerLambda-${SystemName}-${SubName}
      Description: This role allows CodePipeline to call each stages.
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - codepipeline.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: !Sub CpServicePolicyContainerLambda-${SystemName}-${SubName}
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - "codecommit:CancelUploadArchive"
                  - "codecommit:GetBranch"
                  - "codecommit:GetCommit"
                  - "codecommit:GetRepository"
                  - "codecommit:GetUploadArchiveStatus"
                  - "codecommit:UploadArchive"
                Resource: !GetAtt CodeCommitRepoContainerLambda.Arn
              - Effect: Allow
                Action:
                  - "codebuild:BatchGetBuilds"
                  - "codebuild:StartBuild"
                  - "codebuild:BatchGetBuildBatches"
                  - "codebuild:StartBuildBatch"
                Resource: "*"
              - Effect: Allow
                Action:
                  - "cloudwatch:*"
                  - "s3:*"
                Resource: "*"
              - Effect: Allow
                Action:
                  - "lambda:InvokeFunction"
                  - "lambda:ListFunctions"
                Resource: "*"
              - Effect: Allow
                Action: "sts:AssumeRole"
                Resource:
                  - !GetAtt CodePipelineDeployCreateUpdateActionRoleContainerLambda.Arn
    DependsOn:
      - CodeCommitRepoContainerLambda
      - CodePipelineDeployCreateUpdateActionRoleContainerLambda
# ------------------------------------------------------------#
# CodePipeline Deploy Create Update Role (IAM)
# ------------------------------------------------------------#
  CodePipelineDeployCreateUpdateRoleContainerLambda:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub CpCrUpdRoleContainerLambda-${SystemName}-${SubName}
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: cloudformation.amazonaws.com
        Version: "2012-10-17"
      Path: /
      Policies:
        - PolicyName: !Sub CpCrUpdPolicyContainerLambda-${SystemName}-${SubName}
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Action: "*"
                Effect: Allow
                Resource: "*"
# ------------------------------------------------------------#
# CodePipeline Deploy Create Update Action Role (IAM)
# ------------------------------------------------------------#
  CodePipelineDeployCreateUpdateActionRoleContainerLambda:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub CpCrUpdActionRoleContainerLambda-${SystemName}-${SubName}
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              AWS:
                Fn::Join:
                  - ""
                  - - "arn:"
                    - Ref: AWS::Partition
                    - ":iam::"
                    - Ref: AWS::AccountId
                    - :root
        Version: "2012-10-17"
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AWSCloudFormationFullAccess
      Policies:
        - PolicyName: !Sub CpCrUpdPolicyContainerLambda-${SystemName}-${SubName}
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Action: iam:PassRole
                Effect: Allow
                Resource: !GetAtt CodePipelineDeployCreateUpdateRoleContainerLambda.Arn
              - Action:
                  - s3:GetBucket*
                  - s3:GetObject*
                  - s3:List*
                Effect: Allow
                Resource:
                  - !Sub arn:aws:s3:::${S3BucketArtifact}
                  - !Sub arn:aws:s3:::${S3BucketArtifact}/*
    DependsOn:
      - CodePipelineDeployCreateUpdateRoleContainerLambda
      - S3BucketArtifact
# ------------------------------------------------------------#
# EventBridge Rule for Starting CodePipeline
# ------------------------------------------------------------#
  EventBridgeRuleStartCodePipelineContainerLambda:
    Type: AWS::Events::Rule
    Properties:
      Name: !Sub ${SystemName}-${SubName}-pptx-pdf-conv-start-codepipeline
      Description: !Sub This rule starts pptx pdf converter CodePipeline for ${SystemName}-${SubName}. The trigger is the source code change in CodeCommit.
      EventBusName: !Sub "arn:aws:events:${AWS::Region}:${AWS::AccountId}:event-bus/default"
      EventPattern:
        source:
          - "aws.codecommit"
        detail-type:
          - "CodeCommit Repository State Change"
        resources:
          - !GetAtt CodeCommitRepoContainerLambda.Arn
        detail:
          event:
            - referenceCreated
            - referenceUpdated
          referenceType:
            - branch
          referenceName:
            - main
      RoleArn: !GetAtt EventBridgeRuleStartCpRoleContainerLambda.Arn
      State: ENABLED
      Targets:
        - Arn: !Sub "arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:${CodePipelineContainerLambda}"
          Id: !Sub ${SystemName}-${SubName}-pptx-pdf-conv-start-codepipeline
          RoleArn: !GetAtt EventBridgeRuleStartCpRoleContainerLambda.Arn
    DependsOn:
      - EventBridgeRuleStartCpRoleContainerLambda
# ------------------------------------------------------------#
# EventBridge Rule Start CodePipeline Role (IAM)
# ------------------------------------------------------------#
  EventBridgeRuleStartCpRoleContainerLambda:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub EventBridgeStartCpRoleContainerLambda-${SystemName}-${SubName}
      Description: !Sub This role allows EventBridge to start pptx pdf converter CodePipeline for ${SystemName}-${SubName}.
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - events.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: !Sub EventBridgeStartCpPolicyContainerLambda-${SystemName}-${SubName}
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - "codepipeline:StartPipelineExecution"
                Resource:
                  - !Sub "arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:${CodePipelineContainerLambda}"
    DependsOn:
      - CodePipelineContainerLambda
# ------------------------------------------------------------#
# CodeBuild Project
# ------------------------------------------------------------#
  CodeBuildProjectContainerLambda:
    Type: AWS::CodeBuild::Project
    Properties:
      Name: !Sub ${SystemName}-${SubName}-pptx-pdf-conv
      Description: !Sub The build project for ${SystemName}-${SubName}-pptx-pdf-conv
      ResourceAccessRole: !GetAtt CodeBuildResourceAccessRoleContainerLambda.Arn
      ServiceRole: !GetAtt CodeBuildServiceRoleContainerLambda.Arn
      ConcurrentBuildLimit: 1
      Visibility: PRIVATE
      Source:
        Type: CODEPIPELINE
      SourceVersion: refs/heads/main
      Environment:
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_SMALL
        Image: "aws/codebuild/amazonlinux-x86_64-standard:5.0"
        ImagePullCredentialsType: CODEBUILD
        PrivilegedMode: true
        EnvironmentVariables:
          - Name: AWS_DEFAULT_REGION
            Type: PLAINTEXT
            Value: !Sub ${AWS::Region}
          - Name: AWS_ACCOUNT_ID
            Type: PLAINTEXT
            Value: !Sub ${AWS::AccountId}
          - Name: IMAGE_REPO_NAME
            Type: PLAINTEXT
            Value: !Ref EcrRepositoryContainerLambda
      TimeoutInMinutes: 30
      QueuedTimeoutInMinutes: 60
      Artifacts:
        Type: CODEPIPELINE
      Cache:
        Type: NO_CACHE
      LogsConfig:
        CloudWatchLogs:
          GroupName: !Sub /aws/codebuild/${SystemName}-${SubName}-pptx-pdf-conv
          Status: ENABLED
        S3Logs:
          EncryptionDisabled: true
          Location: !Sub arn:aws:s3:::${S3BucketLogs}/codebuildBuildlog
          Status: ENABLED
      Tags:
        - Key: Cost
          Value: !Sub ${SystemName}-${SubName}
    DependsOn:
      - EcrRepositoryContainerLambda
      - CodeBuildResourceAccessRoleContainerLambda
      - CodeBuildServiceRoleContainerLambda
# ------------------------------------------------------------#
# CodeBuild Resource Access Role (IAM)
# ------------------------------------------------------------#
  CodeBuildResourceAccessRoleContainerLambda:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub CbResourceAccessRoleContainerLambda-${SystemName}-${SubName}
      Description: This role allows CodeBuild to access CloudWatch Logs and Amazon S3 artifacts for the project's builds.
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - codebuild.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: !Sub CbResourceAccessPolicyContainerLambda-${SystemName}-${SubName}
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - "logs:CreateLogGroup"
                  - "logs:CreateLogStream"
                  - "logs:PutLogEvents"
                Resource:
                  - !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/codebuild/${SystemName}-${SubName}-pptx-pdf-conv"
                  - !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/codebuild/${SystemName}-${SubName}-pptx-pdf-conv:*"
              - Effect: Allow
                Action:
                  - "s3:PutObject"
                  - "s3:GetObject"
                  - "s3:GetObjectVersion"
                  - "s3:GetBucketAcl"
                  - "s3:GetBucketLocation"
                Resource:
                  - !Sub arn:aws:s3:::${S3BucketLogs}
                  - !Sub arn:aws:s3:::${S3BucketLogs}/*
# ------------------------------------------------------------#
# CodeBuild Service Role (IAM)
# ------------------------------------------------------------#
  CodeBuildServiceRoleContainerLambda:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub CbServiceRoleContainerLambda-${SystemName}-${SubName}
      Description: This role allows CodeBuild to interact with dependant AWS services.
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - codebuild.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser
      Policies:
        - PolicyName: !Sub CbServicePolicyContainerLambda-${SystemName}-${SubName}
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - "codecommit:GitPull"
                Resource: !GetAtt CodeCommitRepoContainerLambda.Arn
              - Effect: Allow
                Action:
                  - "ssm:GetParameters"
                Resource:
                  - !Sub "arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/${SystemName}_${SubName}_*"
              - Effect: Allow
                Action:
                  - "s3:*"
                Resource:
                  - !Sub arn:aws:s3:::${S3BucketArtifact}
                  - !Sub arn:aws:s3:::${S3BucketArtifact}/*
                  - !Sub arn:aws:s3:::${S3BucketLogs}
                  - !Sub arn:aws:s3:::${S3BucketLogs}/*
              - Effect: Allow
                Action:
                  - "logs:CreateLogGroup"
                  - "logs:CreateLogStream"
                  - "logs:PutLogEvents"
                Resource:
                  - !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/codebuild/${SystemName}-${SubName}-pptx-pdf-conv"
                  - !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/codebuild/${SystemName}-${SubName}-pptx-pdf-conv:*"
              - Effect: Allow
                Action:
                  - "codebuild:CreateReportGroup"
                  - "codebuild:CreateReport"
                  - "codebuild:UpdateReport"
                  - "codebuild:BatchPutTestCases"
                  - "codebuild:BatchPutCodeCoverages"
                Resource:
                  - !Sub "arn:aws:codebuild:${AWS::Region}:${AWS::AccountId}:report-group/${SystemName}-${SubName}-pptx-pdf-conv*"
    DependsOn:
      - CodeCommitRepoContainerLambda
      - S3BucketArtifact
      - S3BucketLogs
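The deploy stage above passes five parameters (`SystemName`, `SubName`, `ImageTag`, `S3BucketDocs`, `ImgRepoName`) to `cfn_container_lambda.yml`, so the receiving template has to declare them. The following is only a skeleton of how that template might start, not the author's actual file: the logical IDs `LambdaFunction` and `LambdaRole`, the policy name, the function name, and the `MemorySize` and `Timeout` values are all assumptions chosen for illustration.

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: Hypothetical skeleton of cfn_container_lambda.yml (illustration only).
Parameters:
  # These five names must match the keys in the pipeline's ParameterOverrides.
  SystemName:
    Type: String
  SubName:
    Type: String
  ImageTag:
    Type: String
  S3BucketDocs:
    Type: String
  ImgRepoName:
    Type: String
Resources:
  LambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Sub ${SystemName}-${SubName}-pptx-pdf-conv
      PackageType: Image
      Code:
        # Points at the image that the build phase pushed for this execution.
        ImageUri: !Sub ${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${ImgRepoName}:${ImageTag}
      Role: !GetAtt LambdaRole.Arn
      MemorySize: 2048   # assumed value; tune for your decks
      Timeout: 300       # assumed value, in seconds
  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: docs-bucket-access
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              # Read PPTX files from input/, write PDFs to output/.
              - Effect: Allow
                Action: s3:GetObject
                Resource: !Sub arn:aws:s3:::${S3BucketDocs}/input/*
              - Effect: Allow
                Action: s3:PutObject
                Resource: !Sub arn:aws:s3:::${S3BucketDocs}/output/*
```

Because `ImageTag` changes on every pipeline execution, the `ImageUri` differs each time and CloudFormation updates the function to the newly pushed image. This also fits the repository's `IMMUTABLE` tag setting, since no tag is ever overwritten.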
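Related articles
The internals of the Lambda function are covered in the following article.
Summary
How did you find it?
This article focused on building a CI/CD environment for a container-based Lambda function, so I think the same setup can be reused for other purposes as well.
I hope you find this article useful.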



