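Building URL Monitoring Fast with Amazon CloudWatch Synthetics

Hi there. When it comes to service monitoring, URL monitoring is the staple.
Amazon CloudWatch Synthetics is feature-rich, which also means it takes a fair amount of work to set up. Often all you want is to poke a URL and see whether it answers.
With an AWS CloudFormation template you can stand the whole thing up in a short time whenever you need it, so I am sharing mine here.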

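Overview

CloudWatch Synthetics creates a Lambda function that drives a headless browser, so it can perform web operations made up of multiple HTTP requests. It publishes the results of those scripted request patterns as CloudWatch metrics, which you can then monitor with CloudWatch Alarms.
If the monitored web service is open to the public, you do not need to think about where the monitoring traffic comes from. If access is limited to certain networks by IP address, however, the question becomes how to configure the security group so that Synthetics can get through.

This time, with the architecture below, the request-issuing Lambda function that Synthetics creates is confined to a VPC, and its source IP address is fixed with an Elastic IP (EIP) on a NAT Gateway.
This makes it possible to allow the monitoring source IP address (the Synthetics Lambda function) in the security group on the monitored side.
If monitoring finds that the URL has stopped responding, Systems Manager restarts the EC2 instance.
A single CloudFormation template creates this whole setup in one go.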
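
To make the fixed-source-IP idea concrete, here is a minimal Python sketch of the ingress rule the monitored side might add once the stack exists. It assumes the NAT Gateway's Elastic IP has been read from the stack's NATGatewayEIP output. The function name, the default port, and the sample address 203.0.113.10 (a documentation-range address) are illustrative and not part of the template. The returned dictionary follows the IpPermissions shape used by the EC2 AuthorizeSecurityGroupIngress API.

```python
import ipaddress


def ingress_rule_for_canary(nat_eip: str, port: int = 80) -> dict:
    """Build one IpPermissions entry allowing TCP from a single IPv4 address."""
    # Validate the address; raises ValueError for anything that is not IPv4.
    address = ipaddress.IPv4Address(nat_eip)
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [
            {
                # /32 means exactly one host: the NAT Gateway's Elastic IP.
                "CidrIp": f"{address}/32",
                "Description": "CloudWatch Synthetics canary (NAT Gateway EIP)",
            }
        ],
    }


if __name__ == "__main__":
    # 203.0.113.10 is a placeholder, not a real Elastic IP.
    print(ingress_rule_for_canary("203.0.113.10"))
```

Restricting the rule to a /32 keeps the monitored service closed to everything except the canary's egress address.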

Architecture diagram

 

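How to use the CFn template

Resources to create in advance

The CloudFormation template creates the resources inside the light-blue dotted line. Create the following resources in advance and pass the required information as parameters (or edit the Default values in the template).

  • An SNS topic to notify when monitoring fails
  • An S3 bucket where Synthetics stores the monitoring results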
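
These prerequisites map onto template parameters. The following is a minimal sketch, assuming the stack is created through an API that takes the usual ParameterKey/ParameterValue list (for example boto3's `create_stack`). The helper name and the sample values are hypothetical; the parameter keys and the S3 URI pattern are copied from the template's Parameters section.

```python
import re

# Same pattern as the AllowedPattern of ArtifactS3Location in the template.
S3_URI_PATTERN = re.compile(r"^s3://[a-z0-9][a-z0-9-]*[a-z0-9]/.*$")


def build_stack_parameters(topic_arn: str, artifact_s3_uri: str,
                           instance_id: str, url: str) -> list:
    """Return the parameter list for values that must be known beforehand."""
    # Fail early, before CloudFormation rejects the value.
    if not S3_URI_PATTERN.match(artifact_s3_uri):
        raise ValueError(f"not a valid artifact location: {artifact_s3_uri}")
    values = {
        "NotificationTopicArn": topic_arn,
        "ArtifactS3Location": artifact_s3_uri,
        "TargetEC2InstanceId": instance_id,
        "MonitoringUrl": url,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    for item in build_stack_parameters(
        "arn:aws:sns:ap-northeast-1:123456789012:example-topic",
        "s3://example-bucket/synthetics/",
        "i-1234567890abcdef0",
        "http://www.example.com/index.html",
    ):
        print(item)
```

With boto3, a list like this would be passed as `Parameters=` to `create_stack`, together with `Capabilities=["CAPABILITY_NAMED_IAM"]`, because the template creates named IAM roles.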

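Monitoring conditions

The template currently sets the following monitoring conditions. Change them to suit your requirements.

  • Monitored URL: http://www.example.com/index.html
  • Monitoring interval: every 5 minutes
  • Monitoring request timeout: 60 seconds
  • Failure count before the monitored target is restarted: 3
  • Retention period for monitoring results: 90 days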
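
As a quick check on how these conditions combine: the alarm in the template evaluates 5-minute periods (Period: 300) and requires 3 evaluation periods, so a restart is triggered roughly 15 minutes after failures begin. The sketch below only reproduces that arithmetic. The function names are illustrative, and the parser handles just the `rate(...)` expressions listed in the template's AllowedValues.

```python
import re

_UNIT_SECONDS = {"minute": 60, "hour": 3600}


def rate_to_seconds(expression: str) -> int:
    """Convert a schedule such as 'rate(5 minutes)' to seconds."""
    match = re.fullmatch(r"rate\((\d+) (minute|hour)s?\)", expression)
    if match is None:
        raise ValueError(f"unsupported schedule expression: {expression}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def seconds_until_alarm(period_seconds: int, evaluation_periods: int) -> int:
    """Time spanned by the consecutive periods the alarm must see breaching."""
    return period_seconds * evaluation_periods


print(rate_to_seconds("rate(5 minutes)"))  # 300
print(seconds_until_alarm(300, 3))         # 900 seconds = 15 minutes
```

Note that the alarm's Period is fixed at 300 in the template while the monitoring interval is a parameter, so if you change the interval it is probably worth revisiting the alarm settings as well.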

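Notes

  • Creating the CFn stack creates IAM roles, so add the --capabilities CAPABILITY_NAMED_IAM option.
  • When you delete the CFn stack, keep in mind that the canary runs as a VPC Lambda function, so the ENIs used by Lambda remain in the VPC. Because of how Lambda works, they linger for several minutes even after the canary is deleted. If you delete the stack straight away, deletion of the security group and subnets fails, and the stack deletion itself fails. Delete the stack with the following steps.
    1. Disable (stop) the CloudWatch Synthetics canary.
    2. Wait about 10 minutes.
    3. Delete the stack.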
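
The three deletion steps can be sketched in Python as follows. This is a rough outline, not a hardened tool: it assumes the two client arguments are boto3 clients (`boto3.client("synthetics")` and `boto3.client("cloudformation")`), which are passed in so the function itself has no dependencies. The function name and the stack name are hypothetical. The wait is a fixed delay and does not check whether the ENIs have actually been released.

```python
import time


def delete_stack_safely(synthetics, cloudformation, canary_name, stack_name,
                        wait_seconds=600, sleep=time.sleep):
    """Stop the canary, give Lambda time to release its ENIs, delete the stack."""
    # Step 1: stop the canary so no new runs start inside the VPC.
    synthetics.stop_canary(Name=canary_name)
    # Step 2: wait about 10 minutes (a fixed delay, not a check of ENI state).
    sleep(wait_seconds)
    # Step 3: request deletion of the stack.
    cloudformation.delete_stack(StackName=stack_name)
```

A call might look like `delete_stack_safely(syn, cfn, "synurl-canary01", "my-synurl-stack")`, where `synurl-canary01` is the canary name from the template and the stack name is whatever you chose at creation time.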

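Applying this to closed networks

Here the monitoring runs from the global network side, but as long as there is IP reachability between VPCs, this CloudFormation template can also be used entirely inside a private corporate network. Add a VPC endpoint for Synthetics to the monitoring-side VPC.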
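
For reference, VPC endpoint service names follow the `com.amazonaws.<region>.<service>` pattern. The sketch below lists candidates for a VPC with no internet route. Beyond the Synthetics endpoint mentioned above, it assumes endpoints for CloudWatch and S3 are also wanted, since the canary's execution role in the template publishes metrics and writes artifacts to S3. Which endpoints you actually need depends on your network, so treat the list as a starting point; the function name is illustrative.

```python
def endpoint_service_names(region: str) -> dict:
    """Return candidate VPC endpoint service names for a canary in a closed VPC."""
    return {
        # CloudWatch Synthetics API
        "synthetics": f"com.amazonaws.{region}.synthetics",
        # CloudWatch metrics (assumption: needed for PutMetricData)
        "monitoring": f"com.amazonaws.{region}.monitoring",
        # Amazon S3 (assumption: needed for artifact uploads)
        "s3": f"com.amazonaws.{region}.s3",
    }


print(endpoint_service_names("ap-northeast-1")["synthetics"])
# com.amazonaws.ap-northeast-1.synthetics
```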

CFn template

AWSTemplateFormatVersion: '2010-09-09'
Description: 'CloudWatch Synthetics Canary for URL monitoring with VPC configuration'

Parameters:
  MonitoringUrl:
    Type: String
    Default: 'http://www.example.com/index.html'
    Description: 'URL to monitor'
  
  ArtifactS3Location:
    Type: String
    Default: 's3://synurl-work/synthetics/'
    Description: 'S3 bucket URI for storing artifacts (format: s3://bucket-name/path/)'
    AllowedPattern: '^s3://[a-z0-9][a-z0-9-]*[a-z0-9]/.*$'
    ConstraintDescription: 'Must be a valid S3 URI format (s3://bucket-name/path/) and bucket name cannot contain periods'
  
  MonitoringFrequency:
    Type: String
    Default: 'rate(5 minutes)'
    Description: 'Monitoring frequency'
    AllowedValues:
      - 'rate(1 minute)'
      - 'rate(5 minutes)'
      - 'rate(10 minutes)'
      - 'rate(15 minutes)'
      - 'rate(30 minutes)'
      - 'rate(1 hour)'
  
  TimeoutSeconds:
    Type: Number
    Default: 60
    Description: 'Timeout in seconds'
    MinValue: 3
    MaxValue: 840
  
  DataRetentionDays:
    Type: Number
    Default: 90
    Description: 'Data retention period in days'
    MinValue: 1
    MaxValue: 455
  
  TargetEC2InstanceId:
    Type: String
    Default: 'i-04d493bd1eb75dd95'
    Description: 'EC2 Instance ID to restart on monitoring failure'
    AllowedPattern: '^i-[a-z0-9]{8,17}$'
    ConstraintDescription: 'Must be a valid EC2 instance ID (e.g., i-1234567890abcdef0)'
  
  NotificationTopicArn:
    Type: String
    Default: 'arn:aws:sns:ap-northeast-1:173173380307:synurl-TPC'
    Description: 'SNS Topic ARN for SMS notifications'
    AllowedPattern: '^arn:aws:sns:[a-z0-9-]+:[0-9]{12}:.+$'
    ConstraintDescription: 'Must be a valid SNS Topic ARN'

Resources:
  # VPC
  SYNURLVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: '10.0.0.0/16'
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: 'synurl-vpc'
        - Key: Cost
          Value: 'synurl'

  # Internet Gateway
  SYNURLIGW:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: 'synurl-igw'
        - Key: Cost
          Value: 'synurl'

  # Attach Internet Gateway to VPC
  AttachGateway:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref SYNURLVPC
      InternetGatewayId: !Ref SYNURLIGW

  # Public Subnet (for NAT Gateway)
  SYNURLPublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref SYNURLVPC
      CidrBlock: '10.0.1.0/24'
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: 'synurl-pub-subnet'
        - Key: Cost
          Value: 'synurl'

  # Private Subnet (for Lambda)
  SYNURLSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref SYNURLVPC
      CidrBlock: '10.0.3.0/24'
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: 'synurl-pri-subnet'
        - Key: Cost
          Value: 'synurl'

  # Elastic IP for NAT Gateway
  YteraNATGatewayEIP:
    Type: AWS::EC2::EIP
    DependsOn: AttachGateway
    Properties:
      Domain: vpc
      Tags:
        - Key: Name
          Value: 'synurl-synthetics-natgw-eip'
        - Key: Cost
          Value: 'synurl'

  # NAT Gateway
  YteraNATGateway:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt YteraNATGatewayEIP.AllocationId
      SubnetId: !Ref SYNURLPublicSubnet
      Tags:
        - Key: Name
          Value: 'synurl-synthetics-natgw'
        - Key: Cost
          Value: 'synurl'

  # Public Route Table
  SYNURLPublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref SYNURLVPC
      Tags:
        - Key: Name
          Value: 'synurl-public-route-table'
        - Key: Cost
          Value: 'synurl'

  # Private Route Table
  SYNURLPrivateRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref SYNURLVPC
      Tags:
        - Key: Name
          Value: 'synurl-private-route-table'
        - Key: Cost
          Value: 'synurl'

  # Route to Internet Gateway (Public)
  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: AttachGateway
    Properties:
      RouteTableId: !Ref SYNURLPublicRouteTable
      DestinationCidrBlock: '0.0.0.0/0'
      GatewayId: !Ref SYNURLIGW

  # Route to NAT Gateway (Private)
  PrivateRoute:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref SYNURLPrivateRouteTable
      DestinationCidrBlock: '0.0.0.0/0'
      NatGatewayId: !Ref YteraNATGateway

  # Associate Public Route Table with Public Subnet
  PublicSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref SYNURLPublicSubnet
      RouteTableId: !Ref SYNURLPublicRouteTable

  # Associate Private Route Table with Private Subnet
  PrivateSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref SYNURLSubnet
      RouteTableId: !Ref SYNURLPrivateRouteTable

  # Security Group for Lambda
  SYNURLLambdaSG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: 'synurl-lambda-none-sg'
      GroupDescription: 'Security group for Synthetics Lambda - no inbound, all outbound'
      VpcId: !Ref SYNURLVPC
      SecurityGroupEgress:
        - IpProtocol: -1
          CidrIp: '0.0.0.0/0'
          Description: 'Allow all outbound traffic'
      Tags:
        - Key: Name
          Value: 'synurl-lambda-none-sg'
        - Key: Cost
          Value: 'synurl'

  # IAM Role for CloudWatch Synthetics
  YteraCloudWatchSyntheticsRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: 'synurl-CloudWatchSyntheticsRole'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: SyntheticsCanaryExecutionPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              # CloudWatch Synthetics basic permissions
              - Effect: Allow
                Action:
                  - synthetics:*
                Resource: '*'
              # CloudWatch Logs permissions
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: 
                  - !Sub 'arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/cwsyn-*'
              # CloudWatch metrics permissions
              - Effect: Allow
                Action:
                  - cloudwatch:PutMetricData
                Resource: '*'
                Condition:
                  StringEquals:
                    'cloudwatch:namespace': 'CloudWatchSynthetics'
              # S3 permissions (for storing artifacts)
              - Effect: Allow
                Action:
                  - s3:PutObject
                  - s3:GetObject
                  - s3:GetObjectVersion
                  - s3:PutObjectAcl
                  - s3:GetBucketLocation
                  - s3:ListBucket
                Resource: 
                  - !Sub
                    - 'arn:aws:s3:::${BucketName}/*'
                    - BucketName: !Select [2, !Split ['/', !Ref ArtifactS3Location]]
                  - !Sub
                    - 'arn:aws:s3:::${BucketName}'
                    - BucketName: !Select [2, !Split ['/', !Ref ArtifactS3Location]]
              # VPC permissions (for running inside the VPC)
              - Effect: Allow
                Action:
                  - ec2:CreateNetworkInterface
                  - ec2:DescribeNetworkInterfaces
                  - ec2:DeleteNetworkInterface
                  - ec2:AttachNetworkInterface
                  - ec2:DetachNetworkInterface
                Resource: '*'
              # X-Ray permissions (for tracing)
              - Effect: Allow
                Action:
                  - xray:PutTraceSegments
                Resource: '*'
              # Lambda invoke permission
              - Effect: Allow
                Action:
                  - lambda:InvokeFunction
                Resource: '*'
              # Tag management permissions for the Lambda function and layers
              - Effect: Allow
                Action:
                  - lambda:ListTags
                  - lambda:TagResource
                  - lambda:UntagResource
                Resource: 
                  - !Sub 'arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:cwsyn-synurl-canary01-*'
                  - !Sub 'arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:layer:cwsyn-synurl-canary01-*'
      Tags:
        - Key: Name
          Value: 'synurl-CloudWatchSyntheticsRole'
        - Key: Cost
          Value: 'synurl'

  # CloudWatch Synthetics Canary
  YteraCanary:
    Type: AWS::Synthetics::Canary
    DeletionPolicy: Delete
    Properties:
      Name: 'synurl-canary01'
      ExecutionRoleArn: !GetAtt YteraCloudWatchSyntheticsRole.Arn
      Code:
        Handler: 'heartbeat.handler'
        Script: !Sub |
          from aws_synthetics.selenium import synthetics_webdriver as webdriver
          from aws_synthetics.common import synthetics_logger as logger
          from selenium.webdriver.support.ui import WebDriverWait
          from selenium.webdriver.support import expected_conditions as EC
          from selenium.webdriver.common.by import By
          import time

          def heartbeat_monitoring():
              # Create the browser instance
              browser = webdriver.Chrome()

              try:
                  # Open the monitored URL
                  logger.info(f'Navigating to ${MonitoringUrl}')
                  browser.get('${MonitoringUrl}')

                  # Wait until the page has loaded
                  WebDriverWait(browser, 10).until(
                      EC.presence_of_element_located((By.TAG_NAME, "body"))
                  )

                  # Save a screenshot
                  browser.save_screenshot('heartbeat_screenshot.png')

                  # Log the page title
                  page_title = browser.title
                  logger.info(f'Page title: {page_title}')

                  # Check the HTTP status code (via JavaScript);
                  # falls back to 200 when the browser does not report responseStatus
                  status_code = browser.execute_script(
                      "return window.performance.getEntriesByType('navigation')[0].responseStatus || 200"
                  )

                  if status_code >= 400:
                      raise Exception(f'HTTP error: {status_code}')

                  logger.info(f'Successfully accessed ${MonitoringUrl} with status: {status_code}')

              except Exception as e:
                  logger.error(f'Heartbeat monitoring failed: {str(e)}')
                  # Re-raise so the canary run is recorded as failed
                  raise

              finally:
                  # Close the browser (it is closed automatically, but be explicit)
                  browser.quit()

          # Canary entry point
          def handler(event, context):
              return heartbeat_monitoring()
      ArtifactS3Location: !Ref ArtifactS3Location
      RuntimeVersion: 'syn-python-selenium-9.0'
      Schedule:
        Expression: !Ref MonitoringFrequency
        DurationInSeconds: 0
      RunConfig:
        TimeoutInSeconds: !Ref TimeoutSeconds
        MemoryInMB: 960
        ActiveTracing: false
      FailureRetentionPeriod: !Ref DataRetentionDays
      SuccessRetentionPeriod: !Ref DataRetentionDays
      StartCanaryAfterCreation: true
      VpcConfig:
        VpcId: !Ref SYNURLVPC
        SubnetIds:
          - !Ref SYNURLSubnet
        SecurityGroupIds:
          - !Ref SYNURLLambdaSG
      # Tags on the canary itself
      Tags:
        - Key: Name
          Value: 'synurl-canary01'
        - Key: Cost
          Value: 'synurl'
      # Replicate the tags to the Lambda function that the canary creates
      ResourcesToReplicateTags:
        - lambda-function

  # CloudWatch Alarm for Canary failure detection
  YteraCanaryFailureAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: 'synurl-canary01-failure-alarm'
      AlarmDescription: 'Trigger EC2 restart and SNS notification when Canary fails for 15 minutes'
      MetricName: SuccessPercent
      Namespace: CloudWatchSynthetics
      Statistic: Minimum
      Period: 300
      EvaluationPeriods: 3
      Threshold: 100
      ComparisonOperator: LessThanThreshold
      Dimensions:
        - Name: CanaryName
          Value: !Ref YteraCanary
      TreatMissingData: breaching
      ActionsEnabled: true
      AlarmActions:
        - !Ref NotificationTopicArn
      OKActions:
        - !Ref NotificationTopicArn

  # EventBridge Rule to trigger SSM Automation on Alarm
  YteraAlarmToSSMRule:
    Type: AWS::Events::Rule
    Properties:
      Name: 'synurl-canary-alarm-to-ssm'
      Description: 'Trigger SSM Automation to restart EC2 when Canary alarm fires'
      State: ENABLED
      EventPattern:
        source:
          - aws.cloudwatch
        detail-type:
          - CloudWatch Alarm State Change
        detail:
          alarmName:
            - !Ref YteraCanaryFailureAlarm
          state:
            value:
              - ALARM
      Targets:
        - Arn: !Sub 'arn:aws:ssm:${AWS::Region}::automation-definition/AWS-RestartEC2Instance:$DEFAULT'
          RoleArn: !GetAtt YteraEventBridgeRole.Arn
          Id: 'RestartEC2Target'
          Input: !Sub |
            {
              "InstanceId": ["${TargetEC2InstanceId}"],
              "AutomationAssumeRole": ["${YteraSSMAutomationRole.Arn}"]
            }

  # IAM Role for EventBridge to invoke SSM Automation
  YteraEventBridgeRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: 'synurl-EventBridgeSSMAutomationRole'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: StartSSMAutomationPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - ssm:StartAutomationExecution
                Resource:
                  - !Sub 'arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:automation-definition/*'
                  - !Sub 'arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:automation-execution/*'
                  - !Sub 'arn:aws:ssm:${AWS::Region}::document/AWS-*'
              - Effect: Allow
                Action:
                  - iam:PassRole
                Resource: !GetAtt YteraSSMAutomationRole.Arn
      Tags:
        - Key: Name
          Value: 'synurl-EventBridgeSSMAutomationRole'
        - Key: Cost
          Value: 'synurl'

  # IAM Role for SSM Automation to restart EC2
  YteraSSMAutomationRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: 'synurl-SSMAutomationExecutionRole'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ssm.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonSSMAutomationRole
      Policies:
        - PolicyName: EC2RestartPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - ec2:RebootInstances
                  - ec2:DescribeInstances
                  - ec2:DescribeInstanceStatus
                Resource: '*'
      Tags:
        - Key: Name
          Value: 'synurl-SSMAutomationExecutionRole'
        - Key: Cost
          Value: 'synurl'

Outputs:
  CanaryName:
    Description: 'Name of the created Canary'
    Value: !Ref YteraCanary
    Export:
      Name: !Sub '${AWS::StackName}-CanaryName'
  
  VPCId:
    Description: 'VPC ID'
    Value: !Ref SYNURLVPC
    Export:
      Name: !Sub '${AWS::StackName}-VPCId'
  
  SubnetId:
    Description: 'Subnet ID'
    Value: !Ref SYNURLSubnet
    Export:
      Name: !Sub '${AWS::StackName}-SubnetId'
  
  SecurityGroupId:
    Description: 'Security Group ID'
    Value: !Ref SYNURLLambdaSG
    Export:
      Name: !Sub '${AWS::StackName}-SecurityGroupId'
  
  IAMRoleArn:
    Description: 'IAM Role ARN'
    Value: !GetAtt YteraCloudWatchSyntheticsRole.Arn
    Export:
      Name: !Sub '${AWS::StackName}-IAMRoleArn'
  
  CanaryId:
    Description: 'Canary ID'
    Value: !GetAtt YteraCanary.Id
    Export:
      Name: !Sub '${AWS::StackName}-CanaryId'
  
  MonitoringUrl:
    Description: 'Monitoring target URL'
    Value: !Ref MonitoringUrl
    Export:
      Name: !Sub '${AWS::StackName}-MonitoringUrl'
  
  NATGatewayEIP:
    Description: 'NAT Gateway Elastic IP'
    Value: !Ref YteraNATGatewayEIP
    Export:
      Name: !Sub '${AWS::StackName}-NATGatewayEIP'
  
  AlarmName:
    Description: 'CloudWatch Alarm Name'
    Value: !Ref YteraCanaryFailureAlarm
    Export:
      Name: !Sub '${AWS::StackName}-AlarmName'
  
  EventBridgeRuleName:
    Description: 'EventBridge Rule Name'
    Value: !Ref YteraAlarmToSSMRule
    Export:
      Name: !Sub '${AWS::StackName}-EventBridgeRuleName'
  
  EventBridgeRoleArn:
    Description: 'EventBridge IAM Role ARN'
    Value: !GetAtt YteraEventBridgeRole.Arn
    Export:
      Name: !Sub '${AWS::StackName}-EventBridgeRoleArn'
  
  SSMAutomationRoleArn:
    Description: 'SSM Automation IAM Role ARN'
    Value: !GetAtt YteraSSMAutomationRole.Arn
    Export:
      Name: !Sub '${AWS::StackName}-SSMAutomationRoleArn'
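
One detail of the template that is easy to miss is how the S3 permissions are scoped: the bucket name is taken from ArtifactS3Location with `!Select [2, !Split ['/', ...]]`. The short Python sketch below mirrors that expression to show which element is picked. The function names are illustrative; the sample URI is the template's default value.

```python
def bucket_name_from_uri(s3_uri: str) -> str:
    """Mirror !Select [2, !Split ['/', uri]]: the third '/'-separated piece."""
    # 's3://bucket/path/' splits into ['s3:', '', 'bucket', 'path', ''].
    return s3_uri.split("/")[2]


def bucket_arns(s3_uri: str) -> list:
    """Return the object-level and bucket-level ARNs the policy refers to."""
    bucket = bucket_name_from_uri(s3_uri)
    return [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"]


print(bucket_name_from_uri("s3://synurl-work/synthetics/"))  # synurl-work
print(bucket_arns("s3://synurl-work/synthetics/"))
```

Because index 2 is only the bucket name when the value starts with `s3://`, the AllowedPattern on the parameter is what keeps this extraction safe.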
