Merge pull request awslabs#149 from jmp-aws/master
MWAA(Managed workflows for Apache Airflow) verify-env
joshua-at-aws authored Mar 15, 2021
2 parents 4fb4ef7 + 66f1e42 commit 93480b0
Showing 7 changed files with 1,199 additions and 0 deletions.
10 changes: 10 additions & 0 deletions MWAA/.gitignore
@@ -0,0 +1,10 @@
.vscode
__pycache__
*.swp
.DS_Store
launch.json
.idea
.classpath
.project
.settings
.pyc
14 changes: 14 additions & 0 deletions MWAA/LICENSE
@@ -0,0 +1,14 @@
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
175 changes: 175 additions & 0 deletions MWAA/readme.md
@@ -0,0 +1,175 @@
# MWAA (Amazon Managed Workflows for Apache Airflow)

## verify environment
An environment can fail to create for the reasons [documented here](https://docs.aws.amazon.com/mwaa/latest/userguide/troubleshooting.html#t-create-environ-failed).

The `verify_env.py` script prints the information AWS Support needs to debug these issues. It also checks for the documented failure causes, on a best-effort basis, to help identify the problem. If you encounter the error

```
The scheduler does not appear to be running. Last heartbeat was received 1 month ago.
The DAGs list may not update, and new tasks will not be scheduled.
```

this script may identify the cause.

Specifically it will:

- confirm that the Amazon VPC network includes 2 private subnets that can access the Internet (if the environment is public) for creating containers. If it's a private environment, it verifies the number of VPC endpoints for MWAA
- confirm the security groups have at least 1 rule associated with them
- confirm the security groups allow ingress to itself or all traffic
- confirm the security groups allow egress to 5432 and 443 to all traffic
- confirm that the log groups were created for the environment
  - if not, check CloudTrail for the failing CreateLogGroup API call
- confirm that the KMS key has a resource policy allowing Airflow environments to use it
- confirm the route tables have a route to a NAT gateway if the environment is public
- confirm that the VPC endpoints were created if the environment is private
- confirm that the role's policies are valid using [IAM policy simulation](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_testing-policies.html)
- confirm the S3 bucket is blocking public access
- call SSM with the document [AWSSupport-ConnectivityTroubleshooter](https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-awssupport-connectivitytroubleshooter.html) to confirm connectivity between MWAA and different services
- search logs for any errors and print those to standard output
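
To make one of these checks concrete, here is a minimal sketch of the security group ingress test ("allow ingress to itself or all traffic"). It is not the code in `verify_env.py`; the function name `allows_self_or_all_ingress` is hypothetical, and the input is assumed to be a single entry shaped like the `SecurityGroups` items returned by EC2 `DescribeSecurityGroups`.

```python
def allows_self_or_all_ingress(security_group):
    """Return True if any ingress rule admits the group itself or all IPv4 traffic."""
    group_id = security_group["GroupId"]
    for rule in security_group.get("IpPermissions", []):
        # A self-referencing rule lists the group's own ID as a traffic source.
        for pair in rule.get("UserIdGroupPairs", []):
            if pair.get("GroupId") == group_id:
                return True
        # IpProtocol "-1" means all protocols; 0.0.0.0/0 means any IPv4 source.
        if rule.get("IpProtocol") == "-1":
            for ip_range in rule.get("IpRanges", []):
                if ip_range.get("CidrIp") == "0.0.0.0/0":
                    return True
    return False


# Sample inputs (illustrative data only, reusing the group ID from the example output).
self_referencing = {
    "GroupId": "sg-00f282e3f1cb821f3",
    "IpPermissions": [
        {
            "IpProtocol": "-1",
            "UserIdGroupPairs": [{"GroupId": "sg-00f282e3f1cb821f3"}],
            "IpRanges": [],
        }
    ],
}
no_rules = {"GroupId": "sg-00f282e3f1cb821f3", "IpPermissions": []}

print(allows_self_or_all_ingress(self_referencing))
print(allows_self_or_all_ingress(no_rules))
```

Keeping the check a pure function over the API response makes it easy to unit test without AWS credentials.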

**Note: SSM automation is charged to the AWS account. For more information [please follow this link](https://aws.amazon.com/systems-manager/pricing/#Automation)**.

This script requires permission to the following API calls:
- [ec2:DescribeNetworkAcls](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeNetworkAcls.html)
- [ec2:DescribeNetworkInterfaces](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeNetworkInterfaces.html)
- [ec2:DescribeRouteTables](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeRouteTables.html)
- [ec2:DescribeSecurityGroups](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeSecurityGroups.html)
- [ec2:DescribeSubnets](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeSubnets.html)
- [ec2:DescribeVpcEndpoints](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeVpcEndpoints.html)
- [airflow:GetEnvironment](https://docs.aws.amazon.com/mwaa/latest/userguide/mwaa-actions-resources.html)
- [s3:GetBucketPublicAccessBlock](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetPublicAccessBlock.html)
- [logs:DescribeLogGroups](https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_DescribeLogGroups.html)
- [logs:FilterLogEvents](https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_FilterLogEvents.html)
- [cloudtrail:LookupEvents](https://docs.aws.amazon.com/awscloudtrail/latest/APIReference/API_LookupEvents.html)
- [ssm:StartAutomationExecution](https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_StartAutomationExecution.html)
- [kms:GetKeyPolicy](https://docs.aws.amazon.com/kms/latest/APIReference/API_GetKeyPolicy.html)
- [iam:ListAttachedRolePolicies](https://docs.aws.amazon.com/IAM/latest/APIReference/API_ListAttachedRolePolicies.html)
- [iam:GetPolicy](https://docs.aws.amazon.com/IAM/latest/APIReference/API_GetPolicy.html)
- [iam:GetPolicyVersion](https://docs.aws.amazon.com/IAM/latest/APIReference/API_GetPolicyVersion.html)
- [iam:SimulateCustomPolicy](https://docs.aws.amazon.com/IAM/latest/APIReference/API_SimulateCustomPolicy.html)
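
As a convenience, the actions above can be collected into an IAM policy document. The sketch below is an illustration only: the helper name `build_policy` is hypothetical, and `"Resource": "*"` is an assumption made for brevity, so you may want to scope resources more tightly for your account.

```python
import json

# The API actions listed above, as IAM action names.
REQUIRED_ACTIONS = [
    "ec2:DescribeNetworkAcls",
    "ec2:DescribeNetworkInterfaces",
    "ec2:DescribeRouteTables",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeSubnets",
    "ec2:DescribeVpcEndpoints",
    "airflow:GetEnvironment",
    "s3:GetBucketPublicAccessBlock",
    "logs:DescribeLogGroups",
    "logs:FilterLogEvents",
    "cloudtrail:LookupEvents",
    "ssm:StartAutomationExecution",
    "kms:GetKeyPolicy",
    "iam:ListAttachedRolePolicies",
    "iam:GetPolicy",
    "iam:GetPolicyVersion",
    "iam:SimulateCustomPolicy",
]


def build_policy(actions):
    """Build a single-statement IAM policy document allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: all resources; narrow this where possible.
                "Resource": "*",
            }
        ],
    }


policy = build_policy(REQUIRED_ACTIONS)
print(json.dumps(policy, indent=2))
```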

### example usage:

`python3 verify_env.py -h`
```
usage: verify_env.py [-h] --envname ENVNAME [--region REGION]

optional arguments:
  -h, --help         show this help message and exit
  --envname ENVNAME  name of the MWAA environment
  --region REGION    region, Ex: us-east-1
```
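
A command line like this can be declared with `argparse`. The following is a sketch under stated assumptions, not the script's actual parser: `build_parser` is a hypothetical helper, and the name rule in `validate_envname` (a letter followed by letters, digits, hyphens, or underscores) is assumed. Only the error message format is taken from the test suite in this commit.

```python
import argparse
import re


def validate_envname(name):
    """Return the name if it looks like a valid environment name, else raise."""
    # Assumed rule: starts with a letter, then letters, digits, '-' or '_'.
    if re.fullmatch(r"[a-zA-Z][0-9a-zA-Z_-]*", name):
        return name
    raise argparse.ArgumentTypeError("%s is an invalid environment name value" % name)


def build_parser():
    """Declare the --envname (required) and --region (optional) arguments."""
    parser = argparse.ArgumentParser(prog="verify_env.py")
    parser.add_argument("--envname", type=validate_envname, required=True,
                        help="name of the MWAA environment")
    parser.add_argument("--region", default=None,
                        help="region, Ex: us-east-1")
    return parser


args = build_parser().parse_args(["--envname", "test", "--region", "us-east-1"])
print(args.envname, args.region)
```

Passing the validator as `type=` lets `argparse` turn an `ArgumentTypeError` into a normal usage error, which keeps validation out of the main program flow.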

### example output:

`python3 verify_env.py --envname test --region us-east-1`
```
please send support the following information
If a case is not opened you may open one here https://console.aws.amazon.com/support/home#/case/create
Please make sure to NOT include any personally identifiable information in the case
AirflowConfigurationOptions : {}
AirflowVersion : 1.10.12
Arn : arn:aws:airflow:us-east-1:111122223333:environment/test
CreatedAt : 2021-01-01 16:47:56-05:00
DagS3Path : dags
EnvironmentClass : mw1.small
ExecutionRoleArn : arn:aws:iam::111122223333:role/service-role/AmazonMWAA-test-O2gIU8
LastUpdate : {'CreatedAt': datetime.datetime(2021, 1, 21, 10, 11, 4, tzinfo=tzlocal()), 'Status': 'SUCCESS'}
LoggingConfiguration : {'DagProcessingLogs': {'CloudWatchLogGroupArn': 'arn:aws:logs::111122223333:log-group:airflow-test-DAGProcessing', 'Enabled': True, 'LogLevel': 'WARNING'}, 'SchedulerLogs': {'CloudWatchLogGroupArn': 'arn:aws:logs::111122223333:log-group:airflow-test-Scheduler', 'Enabled': True, 'LogLevel': 'WARNING'}, 'TaskLogs': {'CloudWatchLogGroupArn': 'arn:aws:logs::111122223333:log-group:airflow-test-Task', 'Enabled': True, 'LogLevel': 'INFO'}, 'WebserverLogs': {'CloudWatchLogGroupArn': 'arn:aws:logs::111122223333:log-group:airflow-test-WebServer', 'Enabled': True, 'LogLevel': 'WARNING'}, 'WorkerLogs': {'CloudWatchLogGroupArn': 'arn:aws:logs::111122223333:log-group:airflow-test-Worker', 'Enabled': True, 'LogLevel': 'WARNING'}}
MaxWorkers : 10
Name : test
NetworkConfiguration : {'SecurityGroupIds': ['sg-00f282e3f1cb821f3'], 'SubnetIds': ['subnet-0c32d5b057c851f2e', 'subnet-02752c9df247ffa0d']}
ServiceRoleArn : arn:aws:iam::111122223333:role/aws-service-role/airflow.amazonaws.com/AWSServiceRoleForAmazonMWAA
SourceBucketArn : arn:aws:s3:::airflow-your-bucket-mwaa
Status : AVAILABLE
Tags : {}
WebserverAccessMode : PUBLIC_ONLY
WebserverUrl : 11112222-5e9d-4203-b247-c078ed1b60cf.c4.us-east-1.airflow.amazonaws.com
WeeklyMaintenanceWindowStart : THU:15:00
VPC: vpc-09b69221ce542334c
### Checking the IAM role arn:aws:iam::111122223333:role/service-role/AmazonMWAA-test-123455 using iam policy simulation
Using AWS CMK
Action: airflow:PublishMetrics is allowed on resource arn:aws:airflow:us-east-1:111122223333:environment/test ✅
Action: s3:ListAllMyBuckets is blocked successfully on resource arn:aws:s3:::airflow-your-bucket-mwaa ✅
Action: s3:ListAllMyBuckets is blocked successfully on resource arn:aws:s3:::airflow-your-bucket-mwaa/ ✅
Action: s3:GetObject* is allowed on resource arn:aws:s3:::airflow-your-bucket-mwaa ✅
Action: s3:GetObject* is allowed on resource arn:aws:s3:::airflow-your-bucket-mwaa/ ✅
Action: s3:GetBucket* is allowed on resource arn:aws:s3:::airflow-your-bucket-mwaa ✅
Action: s3:GetBucket* is allowed on resource arn:aws:s3:::airflow-your-bucket-mwaa/ ✅
Action: s3:List* is allowed on resource arn:aws:s3:::airflow-your-bucket-mwaa ✅
Action: s3:List* is allowed on resource arn:aws:s3:::airflow-your-bucket-mwaa/ ✅
Action: logs:CreateLogStream is allowed on resource arn:aws:logs:us-east-1:111122223333:log-group:airflow-test-* ✅
Action: logs:CreateLogGroup is allowed on resource arn:aws:logs:us-east-1:111122223333:log-group:airflow-test-* ✅
Action: logs:PutLogEvents is allowed on resource arn:aws:logs:us-east-1:111122223333:log-group:airflow-test-* ✅
Action: logs:GetLogEvents is allowed on resource arn:aws:logs:us-east-1:111122223333:log-group:airflow-test-* ✅
Action: logs:GetLogGroupFields is allowed on resource arn:aws:logs:us-east-1:111122223333:log-group:airflow-test-* ✅
Action: logs:DescribeLogGroups is not allowed on resource *
failed with implicitDeny 🚫
Action: cloudwatch:PutMetricData is allowed on resource * ✅
Action: sqs:ChangeMessageVisibility is allowed on resource arn:aws:sqs:us-east-1:*:airflow-celery-* ✅
Action: sqs:DeleteMessage is allowed on resource arn:aws:sqs:us-east-1:*:airflow-celery-* ✅
Action: sqs:GetQueueAttributes is allowed on resource arn:aws:sqs:us-east-1:*:airflow-celery-* ✅
Action: sqs:GetQueueUrl is allowed on resource arn:aws:sqs:us-east-1:*:airflow-celery-* ✅
Action: sqs:ReceiveMessage is allowed on resource arn:aws:sqs:us-east-1:*:airflow-celery-* ✅
Action: sqs:SendMessage is allowed on resource arn:aws:sqs:us-east-1:*:airflow-celery-* ✅
Action: kms:Decrypt is allowed on resource arn:aws:kms:*:111122223333:key/* ✅
Action: kms:DescribeKey is allowed on resource arn:aws:kms:*:111122223333:key/* ✅
Action: kms:Encrypt is allowed on resource arn:aws:kms:*:111122223333:key/* ✅
Action: kms:GenerateDataKey* is allowed on resource arn:aws:kms:*:111122223333:key/* ✅
If the policy is denied you can investigate more at
https://policysim.aws.amazon.com/home/index.jsp?#roles/AmazonMWAA-test-111123
These simulations are based off of the sample policies here
https://docs.aws.amazon.com/mwaa/latest/userguide/mwaa-create-role.html#mwaa-create-role-json
### Checking if log groups were created successfully...
The number of log groups is less than the number of enabled suggesting an error creating 🚫
checking cloudtrail for CreateLogGroup/DeleteLogGroup requests...
if events are failing, try creating the log groups manually
### Trying to verify nACLs on subnets...
nacl: acl-1111111111111111 allows port 5432 on egress ✅
nacl: acl-1111111111111112 denied port 5432 on ingress 🚫
missing VPC endpoints, only found 🚫
### Trying to verify if route tables are valid...
Route Table: rtb-11111111111111111 does not have a route to a NAT Gateway 🚫
Route Table: rtb-11111111111111112 does have a route to a NAT Gateway ✅
### Verifying 'block public access' is enabled on the s3 bucket...
s3 bucket arn:aws:s3:::airflow-your-bucket-mwaa blocks public access: BlockPublicAcls ✅
s3 bucket arn:aws:s3:::airflow-your-bucket-mwaa blocks public access: IgnorePublicAcls ✅
s3 bucket arn:aws:s3:::airflow-your-bucket-mwaa blocks public access: BlockPublicPolicy ✅
s3 bucket arn:aws:s3:::airflow-your-bucket-mwaa blocks public access: RestrictPublicBuckets ✅
### Trying to verifying ingress on security groups...
ingress for security group: sg-00f282e3f1cb821f3 does allow itself ✅
### Testing connectivity to the following service endpoints from MWAA enis...
['sqs.us-east-1.amazonaws.com', 'ecr.us-east-1.amazonaws.com', 'monitoring.us-east-1.amazonaws.com', 'kms.us-east-1.amazonaws.com', 's3.us-east-1.amazonaws.com', 'env.airflow.us-east-1.amazonaws.com']
Testing connectivity between eni eni-0edefdfd24bded4de with private ip of 10.192.21.51 and sqs.us-east-1.amazonaws.com
https://console.aws.amazon.com/systems-manager/automation/execution/a9ff7cf6-49c2-477c-88ba-2627f450d471?REGION=us-east-1
Testing connectivity between eni eni-0edefdfd24bded4de with private ip of 10.192.21.51 and ecr.us-east-1.amazonaws.com
https://console.aws.amazon.com/systems-manager/automation/execution/7e5e8197-afa9-4fc0-a9cd-6dda07692334?REGION=us-east-1
no enis found for MWAA, exiting test for monitoring.us-east-1.amazonaws.com
no enis found for MWAA, exiting test for kms.us-east-1.amazonaws.com
no enis found for MWAA, exiting test for s3.us-east-1.amazonaws.com
no enis found for MWAA, exiting test for env.airflow.us-east-1.amazonaws.com
### Checking CloudWatch logs for any errors less than 1 day old
Found the following failing logs in cloudwatch:
```

### Development

#### Unit tests
Run the unit tests with the command [`python3 -m pytest`](https://docs.pytest.org/en/stable/usage.html#calling-pytest-through-python-m-pytest).
Empty file added MWAA/tests/__init__.py
Empty file.
146 changes: 146 additions & 0 deletions MWAA/tests/test_verify_env.py
@@ -0,0 +1,146 @@
import argparse
import pytest
from verify_env import verify_env


def test_validation_region():
    '''
    test various inputs for regions and all valid MWAA regions
    https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/
    '''
    regions = [
        'us-east-2',
        'us-east-1',
        'us-west-2',
        'ap-southeast-1',
        'ap-southeast-2',
        'ap-northeast-1',
        'eu-central-1',
        'eu-west-1',
        'eu-north-1'
    ]
    for region in regions:
        assert verify_env.validation_region(region) == region
    unsupport_regions = [
        'us-west-1',
        'af-south-1',
        'ap-east-1',
        'ap-south-1',
        'ap-northeast-3',
        'ap-northeast-2',
        'ca-central-1',
        'eu-west-2',
        'eu-south-1',
        'eu-west-3',
        'me-south-1',
        'sa-east-1'
    ]
    for unsupport_region in unsupport_regions:
        with pytest.raises(argparse.ArgumentTypeError) as excinfo:
            verify_env.validation_region(unsupport_region)
        assert ("%s is an invalid REGION value" % unsupport_region) in str(excinfo.value)
    bad_regions = [
        'us-east-11',
        'us-west-3',
        'eu-wheat-3'
    ]
    for region in bad_regions:
        with pytest.raises(argparse.ArgumentTypeError) as excinfo:
            verify_env.validation_region(region)
        assert ("%s is an invalid REGION value" % region) in str(excinfo.value)


def test_validate_envname():
    '''
    test invalid and valid names for MWAA environment
    '''
    with pytest.raises(argparse.ArgumentTypeError) as excinfo:
        env_name = '42'
        verify_env.validate_envname(env_name)
    assert ("%s is an invalid environment name value" % env_name) in str(excinfo.value)
    env_name = 'test'
    result = verify_env.validate_envname(env_name)
    assert result == env_name


def test_check_ingress_acls():
    ''' goes through the following scenarios
    * if no acls are passed
    * if there is an allow
    * if there is a deny but no allow
    '''
    acls = []
    src_port_from = 5432
    src_port_to = 5432
    result = verify_env.check_ingress_acls(acls, src_port_from, src_port_to)
    assert result == ''
    acls = [
        {
            'CidrBlock': '0.0.0.0/0',
            'Egress': False,
            'Protocol': '-1',
            'RuleAction': 'allow',
            'RuleNumber': 1
        },
        {
            'CidrBlock': '0.0.0.0/0',
            'Egress': False,
            'Protocol': '-1',
            'RuleAction': 'deny',
            'RuleNumber': 32767
        }
    ]
    result = verify_env.check_ingress_acls(acls, src_port_from, src_port_to)
    assert result
    acls = [
        {
            'CidrBlock': '0.0.0.0/0',
            'Egress': False,
            'Protocol': '-1',
            'RuleAction': 'deny',
            'RuleNumber': 32767
        }
    ]
    result = verify_env.check_ingress_acls(acls, src_port_from, src_port_to)
    assert not result


def test_check_egress_acls():
    ''' goes through the following scenarios
    * if no acls are passed
    * if there is an allow
    * if there is a deny but no allow
    '''
    acls = []
    dest_port = 5432
    result = verify_env.check_egress_acls(acls, dest_port)
    assert result == ''
    acls = [
        {
            'CidrBlock': '0.0.0.0/0',
            'Egress': False,
            'Protocol': '-1',
            'RuleAction': 'allow',
            'RuleNumber': 1
        },
        {
            'CidrBlock': '0.0.0.0/0',
            'Egress': False,
            'Protocol': '-1',
            'RuleAction': 'deny',
            'RuleNumber': 32767
        }
    ]
    result = verify_env.check_egress_acls(acls, dest_port)
    assert result
    acls = [
        {
            'CidrBlock': '0.0.0.0/0',
            'Egress': False,
            'Protocol': '-1',
            'RuleAction': 'deny',
            'RuleNumber': 32767
        }
    ]
    result = verify_env.check_egress_acls(acls, dest_port)
    assert not result
Empty file added MWAA/verify_env/__init__.py
Empty file.