S3
Ingest data from S3 buckets
You can create a connector to an AWS S3 bucket that will automatically update on a schedule.
Let’s go through a simple example of how to do this in your workspace.
Create Source
Navigate to Sources and click on +.
Click on Amazon S3.
Here you'll find the external-id generated by Airfold and several editable fields.
Set up permissions
To grant Airfold permissions to download data from S3, follow these steps:
- Open the AWS Console in a browser and navigate to IAM.
- Select Policies on the left and click Create policy.
- Choose the JSON option and create the following policy, replacing <my bucket name> with your actual bucket name.
- You can add multiple buckets to the Resource property if you want Airfold to access them as well.
- Complete the policy creation and give it a descriptive name, such as AirfoldS3ReadOnly.
- Navigate to Roles and click Create role.
- Select Custom Trust Policy and use the following JSON, replacing <external id> with the external-id provided by Airfold.
- In the next step, attach the policy you created (AirfoldS3ReadOnly) to the role.
- Complete the role creation by providing a descriptive name like AirfoldS3Access.
- After creating the role, copy the role ARN.
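A minimal sketch of such a read-only policy, assuming Airfold only needs to list the bucket and download objects (`<my bucket name>` is a placeholder for your bucket; the exact action list Airfold requires may differ):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<my bucket name>",
        "arn:aws:s3:::<my bucket name>/*"
      ]
    }
  ]
}
```

Note that S3 needs two Resource entries per bucket: the bucket ARN itself (for ListBucket) and the `/*` form (for GetObject on the objects inside it). To grant access to additional buckets, add another pair of entries for each.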
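The custom trust policy follows the standard cross-account AssumeRole pattern with an external ID (AWS's "confused deputy" safeguard). A sketch, where the principal is Airfold's AWS account, shown here as a hypothetical placeholder since the actual value is supplied during setup, and `<external id>` is the external-id from the connector screen:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<airfold aws account arn>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<external id>"
        }
      }
    }
  ]
}
```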
Complete the configuration
Now you can continue with the S3 connector configuration.
Enter the role ARN, the bucket name, and the path to the file in the bucket.
In the path field, you can use wildcards like * or ? to include multiple files.
The S3 connector will scan the bucket during each run and fetch all objects that match the wildcards.
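For example, assuming standard glob semantics and a hypothetical bucket layout where daily exports live under a common prefix, a path pattern like:

```
exports/2024/*.csv
```

would match exports/2024/jan.csv and exports/2024/feb.csv, while a ? pattern such as exports/2024/day-?.csv would match day-1.csv (one character per ?) but not day-12.csv.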
Your configuration should look like this:
Set the schedule
Next, set up the update schedule.
Enter a cron expression and select the appropriate time zone.
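Assuming the schedule field accepts standard five-field cron syntax (minute, hour, day of month, month, day of week), a hypothetical example:

```
0 6 * * *
```

runs the update daily at 06:00 in the selected time zone; */30 * * * * would run it every 30 minutes.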
Verify the schema
The final step will display the YAML that will be deployed.
Take a moment to verify that all data types were detected correctly and make any necessary adjustments.
That’s it! Your new table will now update from S3 according to the schedule you set.