Large File Service: Securely Uploading Large Files to S3
Author: Andrew Pach
FINRA routinely requests information from member brokerage firms. Firms, for instance, upload individual attachments using a browser-based application. As firms' data increases over time, so does the size of their attachments.
However, this comes with challenges such as browser restrictions, machine speed, network speed, and server storage capacity. Previously, uploads were streamed from the user's machine to a FINRA server and ultimately stored on premises. By moving to a cloud-centric technical architecture, we've been able to provide a secure approach to uploading large files.
Note: The configuration and coding examples are advisory only and are not meant to be complete. Consider your own requirements when designing an architectural solution.
LFS Architecture Components
In 2014, FINRA began using AWS instead of on-premises resources. As part of this shift, we built a new and more robust Large File Service (LFS) that leverages cloud resources. AWS technologies used include Elastic Compute Cloud (EC2), Simple Storage Service (S3), Identity and Access Management (IAM), Security Token Service (STS), Key Management Service (KMS), and Simple Queue Service (SQS).
LFS is part of a Java web application running on an EC2 instance and exposes REST APIs to clients such as the Data Intake application. It uses STS to create temporary IAM credentials for writing files to an S3 loading dock bucket. It also uses SQS to receive S3 notifications and to send messages back to client applications.
Below is a diagram showing the high-level components of LFS.
LFS Sequence Flow
User Uploads File to S3 Loading Dock Bucket
- A user requests to upload a file via the Data Intake browser-based application.
- The browser makes a REST call to its application server requesting temporary credentials for the S3 loading dock bucket.
- The Data Intake application makes a REST call to the LFS application requesting temporary credentials.
- The LFS application makes an STS "assume role" request to obtain the temporary credentials.
- Credentials are returned to the Data Intake application server.
- Credentials are returned to the browser.
- The browser application uploads the file to the S3 loading dock bucket using the temporary credentials.
File Moved from Loading Dock Bucket to Permanent Bucket
- Once the file is uploaded, S3 publishes a notification to an SQS queue.
- LFS receives the notification that the upload is complete from the queue.
- LFS copies the file to a permanent bucket. While being copied, the file is re-encrypted with a different KMS key.
- The original loading dock file is deleted.
- LFS publishes a "file available" notification message to the SQS queue.
- The Data Intake application reads the message and marks the file as available.
The following diagram shows the sequence flow when a user uploads a file using LFS.
LFS Technical Details
Two-Bucket Security Approach
LFS uses a two-bucket architecture for security. The first bucket is a temporary loading dock that is externally exposed, allowing firms to write files. Security measures include:
- Users are given temporary credentials to write to specific locations.
- Read operations are not permitted.
- HTTPS connectivity is required to write files.
- Server Side Encryption (SSE) is required.
- A specific KMS id is used to encrypt files.
- The Cross-Origin Resource Sharing (CORS) configuration allows applications from finra.org to write to the bucket, since the bucket is hosted under the amazonaws.com domain.
- Files are automatically deleted via a lifecycle policy after a specified time limit to remove any remaining in-transit files.
The second bucket is more secure and becomes the files' final destination. External users don't have access to it. A separate internal KMS key id is used to encrypt files. It also requires HTTPS connectivity and server-side encryption.
While a two-bucket solution strengthens security, it complicates the architecture. A less secure alternative would use a single bucket but require carefully configured policies, granting access to users and applications only as needed.
Creating an S3 Bucket
To create an S3 bucket, follow these directions using these values:
Bucket Name: Enter the name for the bucket (e.g. "lfs-loading-dock" or "lfs-permanent").
Region: Select one of the regions (e.g. US Standard).
Once the bucket is created, you can create the security policy in three steps.
- Click on the "Properties" button after selecting the bucket.
- Nether "Permissions" section, click "Edit bucket policy".
- Enter the security policy in JSON format as well every bit SSL and SSE enforcement.
Find out more on saucepan policies here.
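As a rough illustration, the sketch below applies a policy of that kind using the AWS SDK for Java: one statement denies any non-HTTPS access and another denies uploads that do not request SSE-KMS. The bucket name, statement ids, and the exact conditions are assumptions for illustration, not FINRA's actual policy; the same JSON could equally be pasted into the console's "Edit bucket policy" dialog.

```java
// Hypothetical sketch: enforcing SSL and SSE-KMS on the loading dock bucket.
// Bucket name and statement ids are illustrative placeholders.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class LoadingDockPolicy {
    public static void main(String[] args) {
        String policy =
            "{\n" +
            "  \"Version\": \"2012-10-17\",\n" +
            "  \"Statement\": [\n" +
            "    {\n" +
            "      \"Sid\": \"DenyNonHttpsAccess\",\n" +
            "      \"Effect\": \"Deny\",\n" +
            "      \"Principal\": \"*\",\n" +
            "      \"Action\": \"s3:*\",\n" +
            "      \"Resource\": [\"arn:aws:s3:::lfs-loading-dock\", \"arn:aws:s3:::lfs-loading-dock/*\"],\n" +
            "      \"Condition\": {\"Bool\": {\"aws:SecureTransport\": \"false\"}}\n" +
            "    },\n" +
            "    {\n" +
            "      \"Sid\": \"DenyUnencryptedUploads\",\n" +
            "      \"Effect\": \"Deny\",\n" +
            "      \"Principal\": \"*\",\n" +
            "      \"Action\": \"s3:PutObject\",\n" +
            "      \"Resource\": \"arn:aws:s3:::lfs-loading-dock/*\",\n" +
            "      \"Condition\": {\"StringNotEquals\": {\"s3:x-amz-server-side-encryption\": \"aws:kms\"}}\n" +
            "    }\n" +
            "  ]\n" +
            "}";

        // Apply the policy to the loading dock bucket.
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        s3.setBucketPolicy("lfs-loading-dock", policy);
    }
}
```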
Under the "Edit CORS Configuration" button you tin configure the CORS header information (i.e. which HTTP methods are allowed for specific domains). For security reasons, web browsers do non allow requests to be made to other domains. However, a server on a different domain tin can render "CORS" headers informing the browser that information technology can make sure HTTP calls to it. Acquire more virtually CORS hither.
In the "Lifecycle" section you can configure policy to automatically delete files subsequently a flow of fourth dimension from the loading dock. This way, the loading dock merely contains parts and files in-flight.
Configure this by:
- Clicking "Add rule" and follow the wizard steps.
- Nether "Whole Saucepan", check the "Permanently Delete" option to specify the number of days earlier the file will exist deleted.
So, the S3 service will automatically delete files later the specified time period has passed.
Note: Be sure this configuration is just applied on the loading dock bucket. You don't want files automatically deleted from the permanent bucket.
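For reference, here is a minimal sketch of setting such a lifecycle rule programmatically with the AWS SDK for Java. The seven-day retention period, the rule id, and the decision to also abort stale multipart uploads are illustrative assumptions.

```java
// Hypothetical sketch: expire loading dock objects and abort abandoned multipart uploads after 7 days.
// Apply this only to the loading dock bucket, never the permanent bucket.
import java.util.Arrays;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AbortIncompleteMultipartUpload;
import com.amazonaws.services.s3.model.BucketLifecycleConfiguration;
import com.amazonaws.services.s3.model.lifecycle.LifecycleFilter;
import com.amazonaws.services.s3.model.lifecycle.LifecyclePrefixPredicate;

public class LoadingDockLifecycle {
    public static void main(String[] args) {
        BucketLifecycleConfiguration.Rule rule = new BucketLifecycleConfiguration.Rule()
            .withId("expire-loading-dock-files")
            .withFilter(new LifecycleFilter(new LifecyclePrefixPredicate("")))   // whole bucket
            .withExpirationInDays(7)                                             // delete completed uploads
            .withAbortIncompleteMultipartUpload(
                new AbortIncompleteMultipartUpload().withDaysAfterInitiation(7)) // clean up in-flight parts
            .withStatus(BucketLifecycleConfiguration.ENABLED);

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        s3.setBucketLifecycleConfiguration("lfs-loading-dock",
            new BucketLifecycleConfiguration(Arrays.asList(rule)));
    }
}
```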
Obtaining Credentials
When a client application wants to upload a file, it invokes an LFS API, specifying information about the file and optional metadata. LFS determines a unique location in the loading dock bucket and uses IAM/STS to create temporary credentials for writing to that specific location, which it returns to the caller. This API may be called multiple times to renew credentials once they have expired.
KMS Key Creation
The KMS key can be created in the IAM portion of the console by selecting "Encryption Keys" from the left-hand menu. The "Create Key" button will initiate a wizard that takes you through the process. Make certain permissions are granted to the EC2 instance that creates the temporary credentials. You can find out more here.
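If you prefer to script this step, the following sketch creates a customer-managed key and an alias with the AWS SDK for Java. The description and alias name are made up for illustration, and the key policy or grants allowing the EC2 instance role to use the key still need to be configured as described above.

```java
// Hypothetical sketch: creating the loading dock KMS key and an alias for it.
import com.amazonaws.services.kms.AWSKMS;
import com.amazonaws.services.kms.AWSKMSClientBuilder;
import com.amazonaws.services.kms.model.CreateAliasRequest;
import com.amazonaws.services.kms.model.CreateKeyRequest;
import com.amazonaws.services.kms.model.CreateKeyResult;

public class LoadingDockKmsKey {
    public static void main(String[] args) {
        AWSKMS kms = AWSKMSClientBuilder.defaultClient();

        // Create the key; usage permissions for the LFS EC2 instance role must still be granted.
        CreateKeyResult key = kms.createKey(new CreateKeyRequest()
            .withDescription("Encryption key for the LFS loading dock bucket"));

        // Give the key a friendly alias (illustrative name).
        kms.createAlias(new CreateAliasRequest()
            .withAliasName("alias/lfs-loading-dock")
            .withTargetKeyId(key.getKeyMetadata().getKeyId()));

        System.out.println("Created key: " + key.getKeyMetadata().getArn());
    }
}
```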
Limited Access S3 Loading Dock Writing Role
LFS uses a special limited-access role that can only put objects (s3:PutObject) in the loading dock bucket as well as generate KMS data keys (kms:GenerateDataKey) and decrypt (kms:Decrypt) files using the KMS key. Create roles in the IAM portion of the console by selecting "Roles" from the left-hand menu. When creating the role, enter a policy that enforces these restrictions using the Amazon Resource Names (ARNs) for the S3 bucket and KMS key. More information on creating IAM roles can be found here.
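A minimal sketch of such a policy, attached as an inline role policy via the AWS SDK for Java, is shown below. The role name, policy name, bucket ARN, account id, and KMS key ARN are placeholder assumptions.

```java
// Hypothetical sketch: a write-only role policy for the loading dock bucket and its KMS key.
import com.amazonaws.services.identitymanagement.AmazonIdentityManagement;
import com.amazonaws.services.identitymanagement.AmazonIdentityManagementClientBuilder;
import com.amazonaws.services.identitymanagement.model.PutRolePolicyRequest;

public class LoadingDockWriterRole {
    public static void main(String[] args) {
        String policy =
            "{\n" +
            "  \"Version\": \"2012-10-17\",\n" +
            "  \"Statement\": [\n" +
            "    {\n" +
            "      \"Effect\": \"Allow\",\n" +
            "      \"Action\": [\"s3:PutObject\"],\n" +
            "      \"Resource\": \"arn:aws:s3:::lfs-loading-dock/*\"\n" +
            "    },\n" +
            "    {\n" +
            "      \"Effect\": \"Allow\",\n" +
            "      \"Action\": [\"kms:GenerateDataKey\", \"kms:Decrypt\"],\n" +
            "      \"Resource\": \"arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID\"\n" +
            "    }\n" +
            "  ]\n" +
            "}";

        AmazonIdentityManagement iam = AmazonIdentityManagementClientBuilder.defaultClient();
        iam.putRolePolicy(new PutRolePolicyRequest()
            .withRoleName("lfs-loading-dock-writer")        // placeholder role name
            .withPolicyName("lfs-loading-dock-write-only")  // placeholder policy name
            .withPolicyDocument(policy));
    }
}
```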
STS Role Restriction
When the LFS API is invoked to return temporary credentials, it uses the AWS STS AWSSecurityTokenServiceClient.assumeRole method to create restricted credentials. This blog post demonstrates how the assumeRole API can be used.
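The sketch below shows one way the assumeRole call might look, further restricting the session to a single upload prefix with an inline session policy. The role ARN, session policy, prefix handling, and one-hour duration are assumptions for illustration, not the actual LFS implementation.

```java
// Hypothetical sketch: minting short-lived credentials limited to one loading dock prefix.
import com.amazonaws.services.securitytoken.AWSSecurityTokenService;
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder;
import com.amazonaws.services.securitytoken.model.AssumeRoleRequest;
import com.amazonaws.services.securitytoken.model.AssumeRoleResult;
import com.amazonaws.services.securitytoken.model.Credentials;

public class TemporaryUploadCredentials {

    public static Credentials credentialsFor(String uploadPrefix) {
        // Session policy further narrows the role's permissions to a single key prefix.
        String sessionPolicy =
            "{\n" +
            "  \"Version\": \"2012-10-17\",\n" +
            "  \"Statement\": [{\n" +
            "    \"Effect\": \"Allow\",\n" +
            "    \"Action\": \"s3:PutObject\",\n" +
            "    \"Resource\": \"arn:aws:s3:::lfs-loading-dock/" + uploadPrefix + "/*\"\n" +
            "  }]\n" +
            "}";

        AWSSecurityTokenService sts = AWSSecurityTokenServiceClientBuilder.defaultClient();
        AssumeRoleResult result = sts.assumeRole(new AssumeRoleRequest()
            .withRoleArn("arn:aws:iam::111122223333:role/lfs-loading-dock-writer")  // placeholder ARN
            .withRoleSessionName("lfs-upload-session")
            .withPolicy(sessionPolicy)       // effective permissions = role policy ∩ session policy
            .withDurationSeconds(3600));     // credentials expire after one hour

        return result.getCredentials();      // access key id, secret key, session token, expiration
    }
}
```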
Upload Files to S3
Using the temporary credentials, the intake browser application uses AWS's JavaScript Software Development Kit (SDK) to perform a multipart upload to the S3 loading dock bucket over the secure HTTPS protocol. The browser breaks the source file into multiple chunks and uploads each chunk individually.
If the temporary credentials expire at any point during the upload, the LFS REST endpoint is invoked for new temporary credentials. This process can be repeated as long as necessary until all the parts have been uploaded. Then, the Data Intake application waits for an LFS notification that the file has been moved to the permanent bucket.
Amazon provides a basic JavaScript example of how a file can be uploaded to S3 using their SDK. The AWS SDK JavaScript documentation for the "S3" class provides more details on methods such as "createBucket" and "upload".
Moving Files to a Permanent Bucket
Once the application uploads files to the loading dock bucket, the LFS service receives S3 SQS notifications, one per file. To receive these notifications, an SQS queue needs to be created. AWS directions on creating an SQS queue can be found here. When creating the queue, select a meaningful name such as "loading-dock-queue".
To configure the notifications, follow these steps:
- Select the loading dock bucket from the S3 portion of the console.
- Select "Events".
- Click "Add Notification".
- Select "ObjectCreated (All)".
- For the Event, select "SQS Queue".
- For where to send the event, enter the previously created loading dock queue name for the SQS queue.
More details on creating a new event notification can be found within the "Enable Event Notifications" section of this page.
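To give a feel for the receiving side, here is a minimal sketch of long-polling that queue with the AWS SDK for Java. The queue name comes from the example above, while the message handling is simplified and illustrative; the real service parses the S3 event JSON to determine which file finished uploading.

```java
// Hypothetical sketch: polling the loading dock queue for S3 "object created" notifications.
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class LoadingDockQueueListener {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = sqs.getQueueUrl("loading-dock-queue").getQueueUrl();

        while (true) {
            ReceiveMessageRequest request = new ReceiveMessageRequest(queueUrl)
                .withWaitTimeSeconds(20)        // long polling to reduce empty receives
                .withMaxNumberOfMessages(10);

            for (Message message : sqs.receiveMessage(request).getMessages()) {
                // The body is the S3 event notification JSON (bucket name, object key, size, ...).
                System.out.println("Received S3 event: " + message.getBody());
                // ... copy the file to the permanent bucket here (see the next section) ...
                sqs.deleteMessage(queueUrl, message.getReceiptHandle());
            }
        }
    }
}
```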
Once this notification is received, LFS copies the file to the permanent bucket. During this copy, the file is re-encrypted with a different KMS key id. The copy occurs within the AWS infrastructure, so no data leaves the cloud. The copy can be done using the TransferManager class of the AWS S3 Java SDK. TransferManager improves performance because it copies files in multiple parts with multiple threads. This blog post demonstrates how a file can be copied using TransferManager.
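A minimal sketch of such a copy, assuming illustrative bucket names and a placeholder permanent-bucket KMS key ARN, might look like this:

```java
// Hypothetical sketch: copy from the loading dock to the permanent bucket, re-encrypting with
// the permanent bucket's KMS key. Bucket names and the key ARN are illustrative.
import com.amazonaws.services.s3.model.CopyObjectRequest;
import com.amazonaws.services.s3.model.SSEAwsKeyManagementParams;
import com.amazonaws.services.s3.transfer.Copy;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class PermanentCopy {

    public static void copyToPermanent(String objectKey) throws InterruptedException {
        TransferManager tm = TransferManagerBuilder.defaultTransferManager();

        CopyObjectRequest request = new CopyObjectRequest(
                "lfs-loading-dock", objectKey,   // source bucket and key
                "lfs-permanent", objectKey)      // destination bucket and key
            .withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(
                "arn:aws:kms:us-east-1:111122223333:key/PERMANENT-KEY-ID"));

        // TransferManager copies large objects in multiple parts with multiple threads;
        // the copy happens entirely inside AWS, so no data leaves the cloud.
        Copy copy = tm.copy(request);
        copy.waitForCompletion();
    }
}
```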
Once the copy completes, the original loading dock file is deleted using the AmazonS3Client.deleteObjects API. If the upload fails, leaving any loading dock in-flight parts, an LFS background task automatically deletes them. This is done using the listMultipartUploads and abortMultipartUpload methods of the SDK's AmazonS3Client class. Just in case the original file couldn't be deleted, the previously configured loading dock lifecycle policy also ensures any dangling files will be automatically deleted.
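Below is a rough sketch of that cleanup, assuming a single-object delete for simplicity (the article mentions the batch deleteObjects API) and a one-day age threshold for abandoned multipart uploads; the actual LFS background task and its thresholds are not described in the article.

```java
// Hypothetical sketch: delete the original loading dock object and abort stale multipart uploads.
import java.util.Date;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
import com.amazonaws.services.s3.model.ListMultipartUploadsRequest;
import com.amazonaws.services.s3.model.MultipartUpload;

public class LoadingDockCleanup {

    private static final String BUCKET = "lfs-loading-dock";
    private static final long MAX_AGE_MILLIS = 24L * 60 * 60 * 1000;   // one day, illustrative

    public static void cleanUp(String copiedObjectKey) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Remove the loading dock copy of the file that was just moved to the permanent bucket.
        s3.deleteObject(BUCKET, copiedObjectKey);

        // Abort multipart uploads that never completed and are older than the threshold.
        Date cutoff = new Date(System.currentTimeMillis() - MAX_AGE_MILLIS);
        for (MultipartUpload upload :
                s3.listMultipartUploads(new ListMultipartUploadsRequest(BUCKET)).getMultipartUploads()) {
            if (upload.getInitiated().before(cutoff)) {
                s3.abortMultipartUpload(new AbortMultipartUploadRequest(
                    BUCKET, upload.getKey(), upload.getUploadId()));
            }
        }
    }
}
```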
Other Topics to Consider
Although the LFS architecture is very robust, there are still other issues to consider:
- The Amazon JavaScript SDK running in-browser does a great job of uploading large files to S3. However, it doesn't support older browsers. Here are some of the browsers that are supported. If uploads are required for unsupported browsers, you'll need a custom-built fallback strategy. In the case of Data Intake, the browser streams files to the Data Intake application server, which proxies the uploads to S3. This flow is not as robust and is only recommended for files of 500MB or less.
- Some Data Intake users could have firewall rules that don't permit the browser to connect directly to the S3 bucket or that strip authentication header information from the request. A possible solution would be connecting the browser to a reverse proxy server (e.g. F5, Nginx, etc.) that is configured within the same domain as the application server.
Source: https://technology.finra.org/code/large-file-service.html