Athena Setup


AWS Athena is a serverless query service for analyzing large volumes of data stored in S3.

Data in Athena is searchable via ANSI SQL, and queries are executed by Presto.

StreamAlert uses AWS Athena for historical searching of:

  • Generated alerts from StreamAlert (currently supported)
  • All incoming data sent to StreamAlert (coming soon)

This works by:

  • Creating a streamalert Athena database
  • Creating Athena tables to read S3 data
  • Using a Lambda function to periodically refresh Athena so that newly written data becomes searchable
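
As a rough illustration of the refresh step, the sketch below builds the kind of Hive DDL statement that registers a new S3 prefix as a table partition. The function name, the partition key dt, the hourly S3 layout, and the bucket name are all assumptions made for this example; they are not taken from StreamAlert's actual implementation.

```python
from datetime import datetime


def add_partition_statement(table, bucket, when):
    """Build an ALTER TABLE statement that registers one hourly partition.

    Hypothetical layout: objects live under
    s3://<bucket>/<table>/YYYY/MM/DD/HH/ and the table is partitioned
    by a single 'dt' key. Both are assumptions for illustration only.
    """
    partition = when.strftime('%Y-%m-%d-%H')
    location = 's3://{}/{}/{}/'.format(
        bucket, table, when.strftime('%Y/%m/%d/%H'))
    return (
        "ALTER TABLE {} ADD IF NOT EXISTS "
        "PARTITION (dt = '{}') LOCATION '{}'"
    ).format(table, partition, location)


# 'example-bucket' is a placeholder, not a real StreamAlert bucket name.
statement = add_partition_statement(
    'alerts', 'example-bucket', datetime(2017, 8, 1, 13))
print(statement)
```

A periodic job could submit a statement like this to Athena for each new prefix, so that queries see data written since the last refresh.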


Getting Started

To get started with Athena, run the following commands:

$ python athena init
$ python athena enable

These commands initialize and enable the Athena configuration for StreamAlert.

Next, create the streamalert database:

$ python athena create-db

Create the alerts table for searching alerts generated by StreamAlert:

$ python athena create-table --type alerts --bucket <>

Create tables for data sent to StreamAlert:

$ python athena create-table \
  --type data \
  --bucket <prefix> \
  --refresh_type add_hive_partition \
  --table_name <log_name>

Note: The log name above must correspond to a log source that is enabled in your StreamAlert deployment.

For example, if ‘cloudwatch’ is among your sources, create a table for each of its subtypes, including cloudwatch_events and cloudwatch_flow_logs. Note that : in the log name is replaced with _ in the table name because of Hive restrictions on table names.
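
The name substitution described above can be sketched in a few lines of Python. The helper name athena_table_name is hypothetical, and the log names cloudwatch:events and cloudwatch:flow_logs are inferred from the table names given in the text.

```python
def athena_table_name(log_name):
    """Convert a log name into a Hive-compatible table name.

    Hypothetical helper: replaces ':' with '_', the substitution
    described in the text.
    """
    return log_name.replace(':', '_')


# Log names inferred from the example table names in the text.
table_names = [
    athena_table_name(name)
    for name in ('cloudwatch:events', 'cloudwatch:flow_logs')
]
print(table_names)
```

Each resulting name is what would be passed as --table_name when creating the corresponding data table.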

Repeat this process for all relevant data tables in your deployment.