Athena Setup
Overview
AWS Athena is a serverless query service used to analyze large volumes of data stored in Amazon S3.
Data in Athena is queried with ANSI SQL, and queries are executed by Presto.
StreamAlert uses AWS Athena for historical searching of:
- Generated alerts from StreamAlert (currently supported)
- All incoming data sent to StreamAlert (coming soon)
This works by:
- Creating a streamalert Athena database
- Creating Athena tables to read S3 data
- Using a Lambda function to periodically refresh Athena to make the data searchable
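As a rough illustration of what "refreshing" a table means, the sketch below builds the two standard Athena statements for loading Hive partitions: repairing a whole table, or adding a single partition. It is not StreamAlert's Lambda code. The partition key `dt`, the example bucket, and the S3 path layout are assumptions made only for this example.

```python
def repair_table_statement(table):
    """Ask Athena to rediscover all Hive-style partitions under the table's location."""
    return "MSCK REPAIR TABLE {}".format(table)


def add_partition_statement(table, partition_value, location, partition_key="dt"):
    """Register one new partition.

    partition_key and the layout of `location` are assumptions for this sketch,
    not values taken from StreamAlert.
    """
    return (
        "ALTER TABLE {table} ADD IF NOT EXISTS "
        "PARTITION ({key} = '{value}') LOCATION '{location}'"
    ).format(table=table, key=partition_key, value=partition_value, location=location)


if __name__ == "__main__":
    print(repair_table_statement("alerts"))
    print(add_partition_statement(
        "cloudwatch_events",
        "2017-01-01-00",
        # Hypothetical bucket and prefix, shown only to illustrate the shape.
        "s3://example-prefix.streamalert.data/cloudwatch_events/2017/01/01/00/",
    ))
```

Adding a single partition is typically cheaper than repairing the whole table, which is one plausible reason to choose a refresh type such as add_hive_partition for high-volume data tables.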
Getting Started
To get started with Athena, run the following commands:
$ python manage.py athena init
$ python manage.py athena enable
These commands initialize and enable StreamAlert’s Athena configuration.
Next, create the streamalert database:
$ python manage.py athena create-db
Create the alerts table for searching alerts generated by StreamAlert:
$ python manage.py athena create-table --type alerts --bucket <s3.bucket.id.goes.here>
Create tables for data sent to StreamAlert:
$ python manage.py athena create-table \
--type data \
--bucket <prefix>.streamalert.data \
--refresh_type add_hive_partition \
--table_name <log_name>
Note: The log name above must correspond to a log source enabled in your StreamAlert deployment.
For example, if you have ‘cloudwatch’ in your sources, create a table for every possible subtype. This includes cloudwatch_events and cloudwatch_flow_logs. Also note that ‘:’ in a log name is replaced with ‘_’ in the table name; this is due to Hive limitations on table names.
Repeat this process for all relevant data tables in your deployment.
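When there are many log types, the commands can be generated rather than typed by hand. The following is a minimal sketch that only builds the command strings shown above and does not run them. The prefix `acme` and the log names in `LOG_NAMES` are placeholders; substitute the values from your own deployment.

```python
import shlex

# Placeholder log names; replace with the sources enabled in your deployment.
LOG_NAMES = ["cloudwatch:events", "cloudwatch:flow_logs"]


def table_name_for(log_name):
    """Replace ':' with '_' because Hive does not allow ':' in table names."""
    return log_name.replace(":", "_")


def create_table_command(prefix, log_name):
    """Build the create-table command for one data table as a shell-safe string."""
    args = [
        "python", "manage.py", "athena", "create-table",
        "--type", "data",
        "--bucket", "{}.streamalert.data".format(prefix),
        "--refresh_type", "add_hive_partition",
        "--table_name", table_name_for(log_name),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    for name in LOG_NAMES:
        print(create_table_command("acme", name))
```

Review the printed commands before running them, since each one creates a table in the streamalert database.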