Writing AWS Lambdas

One of the things I like about AWS Lambda is that lambdas come in pretty handy as “crontab” jobs: a good use case is when I have to schedule the execution of some code, either on a recurring basis or triggered by a specific event. This can be easily achieved using CloudWatch Events, but there are a few things we should take into account.

A few weeks ago, I ran into a good example. In versions prior to Elasticsearch 5.0, you could set a TTL for a given index, meaning the data would expire after that time. We don’t run any ES on those versions, so I came up with the idea of building a lambda that cleans up indices on a given Elasticsearch cluster once they are older than a given age.

For almost all the lambdas I have built lately, I tend to follow the same pattern. First, a reproducible build: modifying, packaging and testing the lambda needs to happen quickly.

There is a large number of supported runtimes, such as Go, Python, Node.js or Ruby. In my experience, going with an interpreted language is very convenient, especially if you work on a bigger team, as it’s easier for everyone to pick up. I choose Python as my main option.

Lambda structure

Having a consistent structure helps establish a workflow. In my case, I like to have a Makefile that automates packaging the lambda and installing all its dependencies.

.
├── LambdaPackagedName.zip
├── Makefile
├── README.md
├── lambda
│   ├── lambda.py
│   └── main.py
├── lambda.tf
└── variables.tf

The Makefile assists with:

  • installing Python dependencies
  • zipping the lambda
  • cleaning up

SHELL := /bin/bash

.PHONY: all zip clean

all: zip clean

zip:
        cd lambda && \
        pip install elasticsearch -t . && \
        pip install elasticsearch-curator -t . && \
        pip install requests -t . && \
        pip install requests-aws4auth -t . && \
        zip -r9 ../ElasticsearchIndexLambda.zip *

clean:
        cd lambda && \
        find . -maxdepth 1 -not -name 'lambda.py' -not -name 'main.py' -not -name '.' -not -name '..' -exec rm -v -rf {} \;

It’s important to mention that if you don’t use any third-party libraries, you don’t actually need to generate the ZIP file.
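
With the ZIP in place, iterating stays quick. Assuming the function has already been created (the function name below is just an example), a new build can be pushed straight from the shell:

make
aws lambda update-function-code \
    --function-name stg-elasticsearch-indices-cleaner \
    --zip-file fileb://ElasticsearchIndexLambda.zip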

Test locally

Essentially, I just copy the file lambda.py to main.py, adapt the function to run locally, and check whether it behaves as expected.

import logging

import boto3
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()

# retrive_endpoints() and delete_indices_by_date() are copied over
# from lambda.py


def main():
    event = {}
    event['region'] = 'us-east-1'
    event['cluster'] = '^stg-.*'
    time_unit = event.get('time_unit', 'months')
    time_count = event.get('time_count', 1)
    service = 'es'
    region = event['region']
    pattern = event['cluster']
    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                       region, service, session_token=credentials.token)

    endpoints = retrive_endpoints(region, awsauth, pattern)

    for esname in endpoints:
        logger.info(f'Processing ES cluster {esname}')
        es = Elasticsearch(
            hosts=[{'host': esname, 'port': 443}],
            http_auth=awsauth,
            use_ssl=True,
            connection_class=RequestsHttpConnection,
            timeout=120
        )
        delete_indices_by_date(es, time_unit, time_count)

    return {"ok": True}


if __name__ == '__main__':
    main()
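
main() relies on two helpers that live in lambda.py and aren’t shown above. A minimal sketch of how they could look, assuming boto3’s es client for endpoint discovery and elasticsearch-curator for the deletions (the bodies here are illustrative, not the exact originals):

import re

import boto3
import curator


def retrive_endpoints(region, awsauth, pattern):
    # awsauth is unused here: boto3 signs the es API calls on its own
    client = boto3.client('es', region_name=region)
    endpoints = []
    for domain in client.list_domain_names()['DomainNames']:
        name = domain['DomainName']
        if re.match(pattern, name):
            status = client.describe_elasticsearch_domain(DomainName=name)['DomainStatus']
            if 'Endpoint' in status:  # VPC-only domains expose 'Endpoints' instead
                endpoints.append(status['Endpoint'])
    return endpoints


def delete_indices_by_date(es, time_unit, time_count):
    # Keep only the indices whose creation date is older than
    # time_count time_units, then delete them
    ilo = curator.IndexList(es)
    ilo.filter_by_age(source='creation_date', direction='older',
                      unit=time_unit, unit_count=time_count)
    if ilo.indices:
        curator.DeleteIndices(ilo).do_action()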

Configure a test event

There are some constraints on the naming you have to use. For instance, if you have a file named lambda.py and a function named

def lambda_handler(event, context):

your handler must be set to lambda.lambda_handler. At this point, I just define a test event to check that the Lambda performs the same way.

{
  "region": "eu-central-1",
  "cluster": "^stg_infra-alpha.*",
  "time_unit": "month",
  "time_count": 1
}
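
On the lambda side, lambda.py’s handler is presumably just main() with the hardcoded event replaced by the one CloudWatch delivers; a sketch, assuming the same imports and helpers as main.py above:

def lambda_handler(event, context):
    # The test event above maps directly onto these lookups
    service = 'es'
    region = event['region']
    pattern = event['cluster']
    time_unit = event.get('time_unit', 'months')
    time_count = event.get('time_count', 1)

    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                       region, service, session_token=credentials.token)

    for esname in retrive_endpoints(region, awsauth, pattern):
        es = Elasticsearch(
            hosts=[{'host': esname, 'port': 443}],
            http_auth=awsauth,
            use_ssl=True,
            connection_class=RequestsHttpConnection,
            timeout=120
        )
        delete_indices_by_date(es, time_unit, time_count)

    return {"ok": True}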

If it works, we are good to go, but some issues may still crop up. You may find that your main.py works fine locally, yet the lambda fails once uploaded. Review that the right role, with the relevant policies attached, is assigned to your lambda.
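
For this particular lambda, that means the role should be able to list and describe the ES domains, issue HTTP calls against their endpoints, and write its own logs. A minimal policy sketch (tighten the resources to your own ARNs):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "es:ListDomainNames",
        "es:DescribeElasticsearchDomain",
        "es:ESHttpGet",
        "es:ESHttpDelete"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}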

Terraform

When I’m asked to deploy the same lambda to different regions and environments, I don’t want to do it manually. I use Terraform precisely for this, but it would work just as well with the AWS SDK or CloudFormation.

resource "aws_lambda_function" "lambda" {
  function_name    = "${var.env_name}-elasticsearch-indices-cleaner"
  role             = "${var.role_arn}"
  handler          = "lambda.lambda_handler"
  runtime          = "python3.7"
  timeout          = "900"
  filename         = "${path.module}/ElasticsearchIndexLambda.zip"
  source_code_hash = "${base64sha256(file("${path.module}/ElasticsearchIndexLambda.zip"))}"
  memory_size      = "512"

  tags {
    "Name"        = "${var.env_name}-elasticsearch-indices-cleaner"
    "team"        = "ops"
    "environment" = "${var.env_name}"
  }
}

resource "aws_cloudwatch_event_rule" "lambda" {
  name                = "${var.env_name}-elasticsearch-indices-cleaner"
  schedule_expression = "${var.schedule}"
}

resource "aws_lambda_permission" "lambda" {
  statement_id  = "${var.env_name}-elasticsearch-indices-cleaner-invoker"
  action        = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.lambda.function_name}"
  principal     = "events.amazonaws.com"
  source_arn    = "${aws_cloudwatch_event_rule.lambda.arn}"
}

resource "aws_cloudwatch_event_target" "lambda" {
  rule = "${aws_cloudwatch_event_rule.lambda.name}"
  arn  = "${aws_lambda_function.lambda.arn}"

  # event payload delivered to the lambda on each invocation
  input = "${jsonencode(map(
      "cluster", var.cluster,
      "region", var.region,
      "time_unit", var.time_unit,
      "time_count", var.time_count
  ))}"
}

output "lambda_arn" {
  value = "${aws_lambda_function.lambda.arn}"
}
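
variables.tf then just declares the knobs used above; a minimal sketch, with a couple of illustrative defaults:

variable "env_name" {}

variable "role_arn" {
  description = "ARN of the IAM role the lambda runs as"
}

variable "schedule" {
  description = "CloudWatch schedule expression, e.g. rate(1 day)"
}

variable "cluster" {
  description = "Regex matching the ES domain names to clean up"
}

variable "region" {}

variable "time_unit" {
  default = "months"
}

variable "time_count" {
  default = "1"
}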