Athena
Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats.
Athena
provides a simplified, flexible way to analyze petabytes of data where it lives. Analyze data or build applications from an Amazon Simple Storage Service (S3) data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python.Athena
is built on open-sourceTrino
andPresto
engines andApache Spark
frameworks, with no provisioning or configuration effort required.
This notebook goes over how to load documents from AWS Athena
.
Setting up
Follow instructions to set up an AWS accoung.
Install a python library:
! pip install boto3
Example
from langchain_community.document_loaders.athena import AthenaLoader
API Reference:
database_name = "my_database"
s3_output_path = "s3://my_bucket/query_results/"
query = "SELECT * FROM my_table"
profile_name = "my_profile"
loader = AthenaLoader(
query=query,
database=database_name,
s3_output_uri=s3_output_path,
profile_name=profile_name,
)
documents = loader.load()
print(documents)
Example with metadata columns
database_name = "my_database"
s3_output_path = "s3://my_bucket/query_results/"
query = "SELECT * FROM my_table"
profile_name = "my_profile"
metadata_columns = ["_row", "_created_at"]
loader = AthenaLoader(
query=query,
database=database_name,
s3_output_uri=s3_output_path,
profile_name=profile_name,
metadata_columns=metadata_columns,
)
documents = loader.load()
print(documents)