ElasticSearch on AWS EC2 using Terraform

ElasticSearch on AWS EC2 using Terraform

In this, we’ll learn to set up an ElasticSearch Stack on AWS EC2. Elastic Stack consists of ElasticSearch, Filebeat, LogStash, and Kibana(ELK stack) which brings all the logs and traces into a single place. This is one of the most popular tools for storing and viewing logs.

In Elastic Stack—

  • ElasticSearch is used to store data.

  • Filebeat transfers the logs into ElasticSearch through LogStash

  • LogStash filters the logs

  • Kibana helps in visualizing the data and navigating the logs.

In this article, we’re going to deploy ElasticStack on AWS EC2 using Terraform.

Why run your own Elastic Stack on AWS EC2 instead of hosted services

We can create ElasticSearch in AWS either by using Elastic Cloud or by using AWS ElasticSearch Service(OpenSearch). But running our own ElasticSearch on AWS EC2 instead of hosted services has the following advantages:

  • Cheaper

  • Full control over configuration, accessibility, and visibility.

  • Easy plugins installation

  • Access logs

  • Perform any configuration changes

  • No boundary in choosing any instance type

PREREQUISITES

  • AWS and Terraform Knowledge

  • AWS Credentials

Creating Elastic Stack on AWS EC2

Here we’ll create an elastic stack in a VPC, and set up Filebeat on EC2 which helps to view logs, and then, LogStash to apply some filters to the data/logs. All the logs are created and stored inside /var/log. Filebeat takes all these logs and sends them to LogStash. LogStash then applies filters to send them to ElasticSearch.

Finally, Kibana will be configured to display logs from ElasticSearch. Which can be accessed from the Kibana dashboard.

Configuration Set-Up(Important)

To set up the configuration for each component, first, we need to install components on our EC2 server. For the sake of simplicity, we’ll be using Terraform data_template method to replace the default file. And it must be done before starting the component.

  1. VPC, subnets set-up

#basic setup
resource "aws_vpc" "elastic_stack_vpc"{
  cidr_block = cidrsubnet("172.20.0.0/16",0,0)
  tags={
    Name="example-elasticsearch_vpc"
  }
}
resource "aws_internet_gateway" "elastic_stack_ig" {
  vpc_id = aws_vpc.elastic_vpc.id
  tags = {
    Name = "example_elasticsearch_igw"
  }
}
resource "aws_route_table" "elastic_stack_rt" {
  vpc_id = aws_vpc.elastic_vpc.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.elastic_internet_gateway.id
  }
  tags = {
    Name = "example_elasticsearch_rt"
  }
}
resource "aws_main_route_table_association" "elastic_stack_rt_main" {
  vpc_id         = aws_vpc.elastic_vpc.id
  route_table_id = aws_route_table.elastic_rt.id
}
resource "aws_subnet" "elastic_stack_subnet"{
  for_each = {ap-south-1a=cidrsubnet("172.20.0.0/16",8,10),ap-south-1b=cidrsubnet("172.20.0.0/16",8,20)}
  vpc_id = aws_vpc.elastic_vpc.id
  availability_zone = each.key
  cidr_block = each.value
  tags={
    Name="elasticsearch_subnet_${each.key}"
  }
}

Now, Setup ElasticSearach Cluster:

  1. ElasticSearch cluster set-up

In this, we’ll set up a two-masternode and one-datanode ElasticSearch cluster in different AZs. In the security group for ElasticSearch, add the inbound access rule to port 9200. This is required so that Kibana can access it.

# elasticsearch security group
resource "aws_security_group" "elasticsearch_sg" {
  vpc_id = var.vpc_id
  description = "ElasticSearch Security Group"
  ingress {
    description = "ingress rules"
    cidr_blocks = [ "0.0.0.0/0" ]
    from_port = 22
    protocol = "tcp"
    to_port = 22
  }
  ingress {
    description = "ingress rules"
    from_port = 9200
    protocol = "tcp"
    to_port = 9300
    security_groups = [aws_security_group.kibana_sg.id] # Kibana security group to access ElasticSearch
  }

  ingress {
    description = "ingress rules"
    from_port = 9200
    protocol = "tcp"
    to_port = 9300
    security_groups = [var.lambda_sg] # If you're using lambda to access ES.
  }

  egress {
    description = "egress rules"
    from_port   = 0
    protocol    = "-1"
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"]
  }
  tags={
    Name="elasticsearch_sg"
  }
}

ElasticSearch master nodes:

# Elastic-Search master nodes
resource "aws_key_pair" "elastic_ssh_key" {
  key_name="elasticsearch_ssh"
  public_key= file("~/.ssh/elasticsearch_keypair.pub")
}
resource "aws_instance" "elastic_nodes" {
  count                  = 2
  ami                    = var.elastic_aws_ami
  instance_type          = var.elastic_aws_instance_type
  # subnet_id              = aws_subnet.elastic_subnet[var.azs[count.index]].id
  subnet_id              = var.public_subnet_ids[count.index]
  vpc_security_group_ids = [aws_security_group.elasticsearch_sg.id]
  key_name               = aws_key_pair.elastic_ssh_key.key_name
  iam_instance_profile   = "${aws_iam_instance_profile.elastic_ec2_instance_profile.name}"
  associate_public_ip_address = true
  tags = {
    Name = "elasticsearch dev node-${count.index}"
  }
}
data "template_file" "init_elasticsearch" {
  depends_on = [ 
    aws_instance.elastic_nodes
  ]
  count          = 2
  template = file("./elasticsearch/configs/elasticsearch_config.tpl")
  vars = {
    cluster_name = "elasticsearch_cluster"
    node_name    = "node_${count.index}"
    node         = aws_instance.elastic_nodes[count.index].private_ip
    node1        = aws_instance.elastic_nodes[0].private_ip
    node2        = aws_instance.elastic_nodes[1].private_ip
    node3        = aws_instance.elastic_datanodes[0].private_ip
  }
}

data "template_file" "init_backupscript" {
  depends_on = [ 
    aws_instance.elastic_nodes
  ]
  count          = 2
  template = file("./elasticsearch/configs/s3_backup_script.tpl")
  vars = {
    cluster_name = "elasticsearch_cluster"
    node         = aws_instance.elastic_nodes[count.index].private_ip
    node1        = aws_instance.elastic_nodes[0].private_ip
    node2        = aws_instance.elastic_nodes[1].private_ip
    node3        = aws_instance.elastic_datanodes[0].private_ip
  }
}

resource "aws_eip" "elasticsearch_eip"{
    count     = 2
    instance  = element(aws_instance.elastic_nodes.*.id, count.index)
    vpc       = true

    tags = {
    Name = "elasticsearch-eip-${terraform.workspace}-${count.index + 1}"
  }
}

resource "null_resource" "move_es_file" {
  count          = 2
  connection {
     type        = "ssh"
     user        = "ec2-user"
     private_key = file("~/.ssh/elasticsearch_keypair.pem")
     host        = aws_instance.elastic_nodes[count.index].public_ip
  } 
  provisioner "file" {
    content      = data.template_file.init_elasticsearch[count.index].rendered
    destination  = "elasticsearch.yml"
  }

  provisioner "file" {
    content      = data.template_file.init_backupscript[count.index].rendered
    destination  = "s3_backup_script.sh"

  }

}
resource "null_resource" "start_es" {
  depends_on     = [ 
    null_resource.move_es_file
  ]
  count          = 2
  connection {
     type        = "ssh"
     user        = "ec2-user"
     private_key = file("~/.ssh/elasticsearch_keypair.pem")
     host        = aws_instance.elastic_nodes[count.index].public_ip
  }
  provisioner "remote-exec" {
    inline = [
      "#!/bin/bash",
      "sudo yum update -y",
      "sudo yum install java-1.8.0 -y",
      "sudo rpm -i <https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.16.1-x86_64.rpm>",
      "sudo systemctl daemon-reload",
      "sudo systemctl enable elasticsearch.service",
      "sudo chmod -R 777 /etc/elasticsearch",
      "sudo sed -i 's@-Xms1g@-Xms${aws_instance.elastic_nodes[count.index].root_block_device[0].volume_size/2}g@g' /etc/elasticsearch/jvm.options",
      "sudo sed -i 's@-Xmx1g@-Xmx${aws_instance.elastic_nodes[count.index].root_block_device[0].volume_size/2}g@g' /etc/elasticsearch/jvm.options",
      # "sudo sed -i 's/#network.host: 192.168.0.1/network.host: 0.0.0.0/g' /etc/elasticsearch/elasticsearch.yml",
      "sudo rm /etc/elasticsearch/elasticsearch.yml",
      "sudo cp elasticsearch.yml /etc/elasticsearch/",
      "sudo systemctl start elasticsearch.service",
    ]
  }
}
# Elastic-Search data nodes setup
resource "aws_key_pair" "elastic_datanode_ssh_key" {
  key_name="elasticsearch_datanode_ssh"
  public_key= file("~/.ssh/elasticsearch_keypair.pub")
}
resource "aws_instance" "elastic_datanodes" {
  count                  = 1
  ami                    = var.elastic_aws_ami
  instance_type          = var.elastic_aws_instance_type
  # subnet_id              = aws_subnet.elastic_subnet[var.azs[count.index]].id
  subnet_id              = var.public_subnet_ids[count.index]
  vpc_security_group_ids = [aws_security_group.elasticsearch_sg.id]
  key_name               = aws_key_pair.elastic_ssh_key.key_name
  iam_instance_profile   = "${aws_iam_instance_profile.elastic_ec2_instance_profile.name}"
  associate_public_ip_address = true
  tags = {
    Name = "elasticsearch dev node-${count.index + 2}"
  }
}
data "template_file" "init_es_datanode" {
  depends_on = [ 
    aws_instance.elastic_datanodes
  ]
  count          = 1
  template = file("./elasticsearch/configs/elasticsearch_datanode_config.tpl")
  vars = {
    cluster_name = "elasticsearch_cluster"
    node_name    = "datanode_${count.index}"
    node         = aws_instance.elastic_datanodes[count.index].private_ip
    node1        = aws_instance.elastic_nodes[0].private_ip
    node2        = aws_instance.elastic_nodes[1].private_ip
    node3        = aws_instance.elastic_datanodes[0].private_ip
  }
}

data "template_file" "init_backupscript_datanode" {
  depends_on = [ 
    aws_instance.elastic_datanodes
  ]
  count          = 1
  template = file("./elasticsearch/configs/s3_backup_script.tpl")
  vars = {
    cluster_name = "elasticsearch_cluster"
    node         = aws_instance.elastic_datanodes[count.index].private_ip
    node1        = aws_instance.elastic_nodes[0].private_ip
    node2        = aws_instance.elastic_nodes[1].private_ip
    node3        = aws_instance.elastic_datanodes[0].private_ip
  }
}

# Uncomment following if you want to attach elastic IP to your data nodes.
# resource "aws_eip" "elasticsearch_datanode_eip"{
#     count     = 1
#     instance  = element(aws_instance.elastic_datanodes.*.id, count.index)
#     vpc       = true

#     tags = {
#     Name = "elasticsearch-eip-datanode-${count.index + 1}"
#   }
# }

resource "null_resource" "move_es_datanode_file" {
  count          = 1
  connection {
     type        = "ssh"
     user        = "ec2-user"
     private_key = file("~/.ssh/elasticsearch_keypair.pem")
     host        = aws_instance.elastic_datanodes[count.index].public_ip
  } 
  provisioner "file" {
    content      = data.template_file.init_es_datanode[count.index].rendered
    destination  = "elasticsearch_datanode.yml"
  }

  provisioner "file" {
    content      = data.template_file.init_backupscript_datanode[count.index].rendered
    destination  = "s3_backup_script.sh"

  }

}
resource "null_resource" "start_es_datanodes" {
  depends_on     = [ 
    null_resource.move_es_datanode_file
  ]
  count          = 1
  connection {
     type        = "ssh"
     user        = "ec2-user"
     private_key = file("~/.ssh/elasticsearch_keypair.pem")
     host        = aws_instance.elastic_datanodes[count.index].public_ip
  }
  provisioner "remote-exec" {
    inline = [
      "#!/bin/bash",
      "sudo yum update -y",
      "sudo yum install java-1.8.0 -y",
      "sudo rpm -i <https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.16.1-x86_64.rpm>",
      "sudo systemctl daemon-reload",
      "sudo systemctl enable elasticsearch.service",
      "sudo chmod -R 777 /etc/elasticsearch",
      "sudo sed -i 's@-Xms1g@-Xms${aws_instance.elastic_datanodes[count.index].root_block_device[0].volume_size/2}g@g' /etc/elasticsearch/jvm.options",
      "sudo sed -i 's@-Xmx1g@-Xmx${aws_instance.elastic_datanodes[count.index].root_block_device[0].volume_size/2}g@g' /etc/elasticsearch/jvm.options",
      # "sudo sed -i 's/#network.host: 192.168.0.1/network.host: 0.0.0.0/g' /etc/elasticsearch/elasticsearch.yml",
      "sudo rm /etc/elasticsearch/elasticsearch.yml",
      "sudo cp elasticsearch_datanode.yml /etc/elasticsearch/elasticsearch.yml",
      "sudo systemctl start elasticsearch.service"
    ]
  }
}
  1. Set-up Kibana

resource "aws_security_group" "kibana_sg" {
  vpc_id = aws_vpc.elastic_stack_vpc.id
  ingress {
    description = "ingress rules"
    cidr_blocks = [ "0.0.0.0/0" ]
    from_port = 22
    protocol = "tcp"
    to_port = 22
  }
  ingress {
    description = "ingress rules"
    cidr_blocks = [ "0.0.0.0/0" ]
    from_port = 5601
    protocol = "tcp"
    to_port = 5601
  }
  egress {
    description = "egress rules"
    cidr_blocks = [ "0.0.0.0/0" ]
    from_port = 0
    protocol = "-1"
    to_port = 0
  }
  tags={
    Name="kibana_security_group"
  }
}

Kibana:

# Kibana setup
resource "aws_instance" "kibana" {
  depends_on = [ 
    null_resource.start_es
   ]
  ami                    = "ami-0ed9277fb7eb570c9"
  instance_type          = "t2.small"
  subnet_id              = aws_subnet.elastic_subnet[var.az_name[0]].id
  vpc_security_group_ids = [aws_security_group.kibana_sg.id]
  key_name               = aws_key_pair.elastic_ssh_key.key_name
  associate_public_ip_address = true
  tags = {
    Name = "kibana"
  }
}
data "template_file" "init_kibana" {
  depends_on = [ 
    aws_instance.kibana
  ]
  template = file("./configs/kibana_config.tpl")
  vars = {
    elasticsearch = aws_instance.elastic_nodes[0].public_ip
  }
}
resource "null_resource" "move_kibana_file" {
  depends_on = [ 
    aws_instance.kibana
   ]
  connection {
     type        = "ssh"
     user        = "ec2-user"
     private_key = file("elasticsearch_keypair.pem")
     host        = aws_instance.kibana.public_ip
  } 
  provisioner "file" {
    content     = data.template_file.init_kibana.rendered
    destination = "kibana.yml"
  }
}

resource "null_resource" "install_kibana" {
  depends_on = [ 
      aws_instance.kibana
   ]
  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("elasticsearch_keypair.pem")
    host        = aws_instance.kibana.public_ip
  } 
  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo rpm -i <https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.16.1-x86_64.rpm>",
      "sudo rm /etc/kibana/kibana.yml",
      "sudo cp kibana.yml /etc/kibana/",
      "sudo systemctl start kibana"
    ]
  }
}
  1. Set-up LogStash

This takes up logs sent by Filebeats and applies filters to them before sending them to ElasticSearch. For this, add the inbound rule to port 5044.

resource "aws_security_group" "logstash_sg" {
  vpc_id = aws_vpc.elastic_stack_vpc.id
  ingress {
    description = "ingress rules"
    cidr_blocks = [ "0.0.0.0/0" ]
    from_port = 22
    protocol = "tcp"
    to_port = 22
  }
  ingress {
    description = "ingress rules"
    cidr_blocks = [ "0.0.0.0/0" ]
    from_port = 5044
    protocol = "tcp"
    to_port = 5044
  }
  egress {
    description = "egress rules"
    cidr_blocks = [ "0.0.0.0/0" ]
    from_port = 0
    protocol = "-1"
    to_port = 0
  }
  tags={
    Name="logstash_sg"
  }
}

LogStash:

resource "aws_instance" "logstash" {
  depends_on = [ 
    null_resource.install_kibana
   ]
  ami                    = "ami-04d29b6f966df1537"
  instance_type          = "t2.large"
  subnet_id              = aws_subnet.elastic_subnet[var.az_name[0]].id
  vpc_security_group_ids = [aws_security_group.logstash_sg.id]
  key_name               = aws_key_pair.elastic_ssh_key.key_name
  associate_public_ip_address = true
  tags = {
    Name = "logstash"
  }
}

data "template_file" "init_logstash" {
  depends_on = [ 
    aws_instance.logstash
  ]
  template = file("./logstash_config.tpl")
  vars = {
    elasticsearch = aws_instance.elastic_nodes[0].public_ip
  }
}

resource "null_resource" "move_logstash_file" {
  depends_on = [ 
    aws_instance.logstash
   ]
  connection {
     type        = "ssh"
     user        = "ec2-user"
     private_key = file("tf-kp.pem")
     host        = aws_instance.logstash.public_ip
  } 
  provisioner "file" {
    content     = data.template_file.init_logstash.rendered
    destination = "logstash.conf"
  }
}

resource "null_resource" "install_logstash" {
  depends_on = [ 
      aws_instance.logstash
   ]
  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("tf-kp.pem")
    host        = aws_instance.logstash.public_ip
  } 
  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y && sudo yum install java-1.8.0-openjdk -y",
      "sudo rpm -i <https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.16.1-x86_64.rpm>",
      "sudo cp logstash.conf /etc/logstash/conf.d/logstash.conf",
      "sudo systemctl start logstash.service"
    ]
  }
}
  1. Filebeat Setup

It takes logs from /var/logs/ and sends them to LogStash on port 5044.

resource "aws_security_group" "filebeat_sg" {
  vpc_id = aws_vpc.elastic_stack_vpc.id
  ingress {
    description = "ingress rules"
    cidr_blocks = [ "0.0.0.0/0" ]
    from_port = 22
    protocol = "tcp"
    to_port = 22
  }
  egress {
    description = "egress rules"
    cidr_blocks = [ "0.0.0.0/0" ]
    from_port = 0
    protocol = "-1"
    to_port = 0
  }
  tags={
    Name="filebeat_sg"
  }
}

Filebeat:

resource "aws_instance" "filebeat" {
  depends_on = [ 
    null_resource.install_logstash
   ]
  ami                    = "ami-04d29b6f966df1537"
  instance_type          = "t2.large"
  subnet_id = aws_subnet.elastic_subnet[var.az_name[0]].id
  vpc_security_group_ids = [aws_security_group.filebeat_sg.id]
  key_name               = aws_key_pair.elastic_ssh_key.key_name
  associate_public_ip_address = true
  tags = {
    Name = "filebeat"
  }
}

resource "null_resource" "move_filebeat_file" {
  depends_on = [ 
    aws_instance.filebeat
   ]
  connection {
     type        = "ssh"
     user        = "ec2-user"
     private_key = file("tf-kp.pem")
     host        = aws_instance.filebeat.public_ip
  } 
  provisioner "file" {
    source      = "filebeat.yml"
    destination = "filebeat.yml"
  }
}

resource "null_resource" "install_filebeat" {
  depends_on = [ 
    null_resource.move_filebeat_file
   ]
  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("tf-kp.pem")
    host        = aws_instance.filebeat.public_ip
  } 
  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo rpm -i <https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.16.1-x86_64.rpm>",
      "sudo sed -i 's@kibana_ip@${aws_instance.kibana.public_ip}@g' filebeat.yml",
      "sudo sed -i 's@logstash_ip@${aws_instance.logstash.public_ip}@g' filebeat.yml",
      "sudo rm /etc/filebeat/filebeat.yml",
      "sudo cp filebeat.yml /etc/filebeat/",
      "sudo systemctl start filebeat.service"
    ]
  }
}

Now, you have successfully set up Elastic Stack on AWS EC2 using Terraform.

Visit, <public_ip_of_any_es_node>:9200/_cluster/health to see ElasticSearch Status

Visit, <public_ip_of_any_es_node>:9200/_cat/nodes?v to see ElasticSearch nodes

Visit, <public_ip_of_kibana_instance>:5601/_cluster/health to see Kibana

After accessing Kibana, go to settings > Index Patterns > Add the logstash index

To check logs, SSH into each component and run the command:

$ sudo systemctl status <component-name> -l

Then SSH int Filebeat EC2 instance and add sample .log file inside /var/log/ . And you can search for the logs inside the Kibana dashboard.

For a sample log run the following and see a record on Kibana:

echo "echo 'This is a sample log for test' >> /var/log/test-log.log" | sudo bash

That’s It.

Conclusion

Congratulations, you have successfully learned about how to set up Elastic Stack on AWS EC2 using Terraform.

Thank You!