Schwarz Software, Perth, WA

Schwarz Software Blog

S3 Glacier --restore-request on Windows for a large number of files


Posted 1st August 2023



Bulk restore of Glacier files on Windows using AWS CLI

Restoring a large number of files out of Glacier on Windows is difficult. There are examples on Stack Overflow that work in Unix environments with awk, but that is no good on Windows, which does not have awk. The official AWS advice is to program your own bulk operation, and it gives no examples!

There is one suggestion from AWS: use the s3api list-objects call to list all Glacier objects and feed the result into a restore-object command. The problem with that is the underlying API returns a maximum of 1000 results at a time, so multiple calls need to be made to get a complete list. Note that if you are using several different Glacier storage classes you will need to run the command filtered for each class.
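For comparison, the listing-and-filtering route AWS suggests can be sketched in a few lines of Python. This is a minimal sketch, assuming boto3: the function name and the storage-class list are illustrative, and the paginator transparently walks the 1000-object pages for you, but you still have to filter by class yourself.

```python
# Sketch of the AWS-suggested route: list every object, then keep only
# those in a Glacier storage class. The page-filtering logic is a pure
# function, so boto3 is only needed at the very end.
def glacier_keys(pages, storage_classes=("GLACIER", "DEEP_ARCHIVE")):
    """Yield keys of objects whose StorageClass is a Glacier class."""
    for page in pages:
        for obj in page.get("Contents", []):
            if obj.get("StorageClass") in storage_classes:
                yield obj["Key"]

# Driven by boto3, this would look something like:
#   paginator = boto3.client("s3").get_paginator("list_objects_v2")
#   keys = glacier_keys(paginator.paginate(Bucket="bucketname", Prefix="path/"))
```

That is more moving parts than the approach below, which is the point of this post.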

I have an easier, faster solution!

For simple step-by-step instructions scroll down. For an explanation of each step and the reasons why read on.

The first step I took was to get a list of files that I could not download, using the AWS S3 sync command. The two reasons I use this command are that

  1. AWS S3 is a high-level, 'over the top' API and is able to return more than 1000 results, unlike the s3api list-objects command that AWS suggests.
  2. You get the full list of Glacier files in one command, without having to filter for multiple Glacier storage classes.

My command was simply:

aws s3 sync s3://bucketname/folderpath/ .

The output from this command will be a list of errors; every file that is stored in Glacier will give an error like this:

warning: Skipping file s3://bucket/path/filename.txt. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.

I then pasted the list of thousands of warnings into Notepad and used the find/replace command to convert it into a list of restore-object commands.
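Notepad works, but if the log is huge the same find/replace can be scripted. A minimal sketch in Python, assuming the warning format shown above; the bucket name, days and tier are placeholders matching the examples below:

```python
# Convert "aws s3 sync" GLACIER warnings into restore-object commands.
# Assumes the warning format shown above; bucket/days/tier are placeholders.
PREFIX = "warning: Skipping file s3://bucketname/"
SUFFIX = ". Object is of storage class"  # class name varies, so stop here

def to_restore_command(warning_line, bucket="bucketname", days=25, tier="Bulk"):
    """Extract the object key from one warning line and build the command."""
    key = warning_line.split(PREFIX, 1)[1].split(SUFFIX, 1)[0]
    return (f'aws s3api restore-object --bucket {bucket} '
            f'--restore-request Days={days},GlacierJobParameters={{"Tier"="{tier}"}} '
            f'--key {key} --output text')
```

Run the sync log through this line by line and redirect the output to a .bat file, and the Notepad step disappears.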

You should then have a long list of commands looking something like this:

aws s3api restore-object --bucket bucketname --restore-request Days=25,GlacierJobParameters={"Tier"="Bulk"} --key path/filename1.txt --output text

aws s3api restore-object --bucket bucketname --restore-request Days=25,GlacierJobParameters={"Tier"="Bulk"} --key path/filename2.txt --output text

aws s3api restore-object --bucket bucketname --restore-request Days=25,GlacierJobParameters={"Tier"="Bulk"} --key path/filename3.txt --output text

...

Where the --key parameter is the path and filename from the warning above.

Paste your result into a batch file, double-click it, and hey presto, you are done! Your files are now being restored. To check the status of a restoration you can call this command:

aws s3api head-object --bucket bucketname --key path/filename.txt --output text

The result will tell you whether the file is still being restored: ongoing-request="true" means the restore is still in progress; once the value is "false" the restore has completed.
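If you are polling many files, the check is easy to mechanise. A small helper, assuming the ongoing-request format shown above (a missing value means the object was never restored, or a previous restore has expired):

```python
def restore_complete(restore_header):
    """True once the Restore value reports ongoing-request="false";
    False while restoration is in progress or the value is absent."""
    return bool(restore_header) and 'ongoing-request="false"' in restore_header
```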

Once the restore is completed then run this command:

aws s3 sync s3://bucketname/path/ . --force-glacier-transfer

The final parameter, --force-glacier-transfer, is required for sync to download restored objects. Your job is then complete! No complex programming or third-party software required.

So in summary the steps are:

  1. Call the sync function and get a list of files that are in Glacier storage.
    • set AWS_ACCESS_KEY_ID=key
      set AWS_SECRET_ACCESS_KEY=secretkey
      set AWS_DEFAULT_REGION=region
      aws s3 sync s3://bucketname/path/ .
      pause
  2. Paste the results into a text editor and use the find/replace function to end up with a list of commands that look like this (use Tier=Standard for faster retrieval):
    • aws s3api restore-object --bucket bucketname --restore-request Days=25,GlacierJobParameters={"Tier"="Bulk"} --key path/filename.extension --output text
        Specifically these are the two replace calls I performed. Replace this:
          warning: Skipping file s3://bucketname/
        with this: 
          aws s3api restore-object --bucket bucketname --restore-request Days=25,GlacierJobParameters={"Tier"="Bulk"} --key 
        And replace this:
          . Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.
        with this (include a space at the front):
          --output text
  3. Paste the result of the find/replace into a batch file and run that
  4. Wait a few hours for Amazon to restore the files from Glacier! If you want to check on the status, call this on one of the files; the restore is complete once the output shows ongoing-request="false":
    • aws s3api head-object --bucket bucketname --key path/filename.extension --output text
  5. Call the sync function again adding --force-glacier-transfer to download the files like so:
    • set AWS_ACCESS_KEY_ID=key
      set AWS_SECRET_ACCESS_KEY=secretkey
      set AWS_DEFAULT_REGION=region
      aws s3 sync s3://bucketname/path/ . --force-glacier-transfer
      pause