Splunk Lantern

Health of AWS elastic load balancers

AWS elastic load balancers (ELBs) often play an integral role in distributing traffic to the appropriate back-end applications. If a load balancer has no healthy instances to serve traffic requests, there is likely a problem to investigate. You want to start monitoring for this condition.

Data required 

AWS description data

Procedure
  1. Configure the Splunk Add-on for Amazon Web Services.
  2. Ensure that your deployment is ingesting AWS data through one of the following methods:
    • Pulling the data into Splunk through the AWS APIs. This method works fine at small scale.
    • Pushing the data from AWS into Splunk through Lambda or Firehose to the Splunk HTTP Event Collector. As the number of AWS accounts or the volume of data to be collected grows, pushing data from AWS into Splunk is the easier and more scalable method.
  3. Run the following search. You can optimize it by specifying an index and adjusting the time range.
sourcetype="aws:description" region="*" source="*_load_balancers" 
|eval name=if(isnull(name),LoadBalancerName,name), vpc_id=if(isnull(vpc_id),VpcId,vpc_id), dns_name=if(isnull(dns_name),DNSName,dns_name) 
|eval uniq_id=((((name . "#") . account_id) . "#") . region) 
|dedup uniq_id sortby -_time 
|eval availability_zones=if(isnotnull('availability_zones{}'),mvjoin('availability_zones{}',","),mvjoin('AvailabilityZones{}.ZoneName',",")), instances=if(isnotnull('instances{}.state'),mvzip('instances{}.instance_id','instances{}.state'),mvzip('TargetGroups{}.TargetHealthDescriptions{}.Target.Id','TargetGroups{}.TargetHealthDescriptions{}.TargetHealth.State')), healthy_instance_state=mvfilter((match(instances,"\\w+,InService$") OR match(instances,"\\w+,healthy$"))), healthy_instance_count=if(isnull(healthy_instance_state),0,mvcount(healthy_instance_state)), total_instance_count=if(isnull(instances),0,mvcount(instances)) 
|fields account_id, region, name, instances, availability_zones, healthy_instance_count, total_instance_count, Type
|where ((total_instance_count >= 0) AND (healthy_instance_count == 0))
|eval insight="No healthy instances. (".total_instance_count." unhealthy instances)"
|table account_id region name availability_zones insight
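As a sketch of the optimization mentioned in step 3, the first line of the search can be scoped to an index and a time range. The index name your_aws_index is a placeholder, not a name defined in this article, and the 24-hour window is an arbitrary choice. The window probably needs to be at least as long as the collection interval of your description input; otherwise some load balancers might have no event in range and drop out of the results.

```spl
index=your_aws_index sourcetype="aws:description" region="*" source="*_load_balancers" earliest=-24h
```

The remaining lines of the search follow unchanged.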

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search Explanation

sourcetype="aws:description"
region="*"
source="*_load_balancers" 

Search only AWS description data that comes from load balancer sources, across all regions.

|eval name=if(isnull(name),LoadBalancerName,name), vpc_id=if(isnull(vpc_id),VpcId,vpc_id), dns_name=if(isnull(dns_name),DNSName,dns_name)

If name, vpc_id, or dns_name is null, fall back to LoadBalancerName, VpcId, or DNSName, respectively.

|eval uniq_id=((((name . "#") . account_id) . "#") . region) 

|dedup uniq_id sortby -_time 

Use eval to set uniq_id to the concatenation of name, account_id, and region, separated by #. Then remove duplicate events for each uniq_id and sort the results in descending order of event time.

|eval availability_zones=if(isnotnull('availability_zones{}'), mvjoin('availability_zones{}',","), mvjoin('AvailabilityZones{}.ZoneName',",")), instances=if(isnotnull('instances{}.state'), mvzip('instances{}.instance_id','instances{}.state'), mvzip('TargetGroups{}.TargetHealthDescriptions{}.Target.Id', 'TargetGroups{}.TargetHealthDescriptions{}.TargetHealth.State')), healthy_instance_state=mvfilter((match(instances,"\\w+,InService$") OR match(instances,"\\w+,healthy$"))), healthy_instance_count=if(isnull(healthy_instance_state), 0, mvcount(healthy_instance_state)), total_instance_count=if(isnull(instances), 0, mvcount(instances)) 

Build availability_zones by joining the values of availability_zones{} with commas, or the values of AvailabilityZones{}.ZoneName if availability_zones{} is null. Build instances by using mvzip to pair each instance ID with its state, or each target ID in TargetGroups with its health state if instances{}.state is null. Use mvfilter to keep only the pairs whose state is InService or healthy. Lastly, count the healthy pairs and the total number of instances, using 0 when the corresponding field is null.

|fields account_id, region, name, instances, availability_zones, healthy_instance_count, total_instance_count, Type

|where ((total_instance_count >= 0) AND (healthy_instance_count == 0))

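Keep only the fields shown, then keep only load balancers that have zero healthy instances. Because total_instance_count is never negative, the first condition in the where clause is always true, so load balancers with no registered instances are also returned.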

|eval insight="No healthy instances. (".total_instance_count." unhealthy instances)"

Create the insight string by concatenating the text shown with total_instance_count.

|table account_id region name availability_zones insight

Display the results in a table with columns in the order shown.
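To see how the mvzip, mvfilter, and mvcount steps behave without any AWS data, the following minimal sketch builds one synthetic event with makeresults. The instance IDs and states are invented for illustration, and the field names instance_id and state stand in for the 'instances{}.instance_id' and 'instances{}.state' fields used in the real search.

```spl
| makeresults
| eval instance_id=split("i-0aaa,i-0bbb,i-0ccc", ","), state=split("InService,OutOfService,InService", ",")
| eval instances=mvzip(instance_id, state)
| eval healthy_instance_state=mvfilter(match(instances,"\\w+,InService$") OR match(instances,"\\w+,healthy$"))
| eval healthy_instance_count=if(isnull(healthy_instance_state),0,mvcount(healthy_instance_state)), total_instance_count=if(isnull(instances),0,mvcount(instances))
| table instances healthy_instance_state healthy_instance_count total_instance_count
```

With this input, two of the three zipped pairs end in InService, so the healthy count should be 2 and the total 3. Changing every state to OutOfService leaves healthy_instance_state null, which is the case the if(isnull(...),0,...) guard turns into a count of 0.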

Next steps

Sample results for this search are shown in the table below. The insight field is the key indicator for a decision or action. For example, TAtestelb3 has three instances, all of which are out of service. For each ELB in the table, determine whether the ELB should be removed, whether instances should be associated with it, or whether its unhealthy instances should be fixed.

account_id   region          name            availability_zones               insight
63605715280  ap-southeast-1  TAtestelb3      ap-southeast-1a,ap-southeast-1b  No healthy instances. (3 unhealthy instances)
63605715280  ap-southeast-1  SaaSQATestELB2  ap-southeast-1a                  No healthy instances. (3 unhealthy instances)
63605715280  ap-southeast-1  SaaSQATestELB   ap-southeast-1a                  No healthy instances. (3 unhealthy instances)
63605715280  ap-southeast-1  TATestELB2      ap-southeast-1a                  No healthy instances. (3 unhealthy instances)
63605715280  ap-southeast-1  saastestelb2    ap-southeast-1a,ap-southeast-1b  No healthy instances. (3 unhealthy instances)
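The search also returns load balancers that have no registered instances at all, and reports them as "(0 unhealthy instances)". If you want the insight text to separate that case from load balancers whose instances are all unhealthy, one possible variation, which is not part of the original search, is to replace the eval that builds insight with a conditional:

```spl
|eval insight=if(total_instance_count == 0, "No instances registered.", "No healthy instances. (".total_instance_count." unhealthy instances)")
```

The wording "No instances registered." is only a suggestion. The distinction maps onto the choices above: an ELB with no instances is a candidate for removal or for having instances associated with it, while an ELB with only unhealthy instances points to instances that need fixing.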

The recommended AWS app has a dashboard with ELB insights that include this example search and others that flag issues such as no autoscaling, not enough requests, insecure listener protocols, and healthy instances that are not cross-zone.

Finally, you might be interested in other processes associated with the Managing an Amazon Web Services environment use case.