Migrate existing object storage data to a Filespace


There are a couple of important factors to consider before beginning a migration of existing data from your cloud service provider into a LucidLink Filespace.

First, consider the cloud provider's egress policies, as many providers charge transfer or egress fees for moving data out of their platform. Most are quite happy to receive your data, often for free, but are far more reluctant to give it back.

Careful consideration must also be given to how you are going to move the data: whether certain data must be moved first, and over what timeframe the migration needs to take place.

If we are clever, there are ways to get around these additional fees and significantly improve transfer times by leveraging the provider's own services to perform the migration.

Amazon S3, Azure and Google Cloud are the providers we most commonly come across when our customers transition data from a native object store format into a Filespace.

These providers do charge egress, although generally not for data accessed within the same region by their own compute service offerings, so we can leverage their compute instances to perform the migration.

Compute instances also take advantage of the vendor's content delivery networks (CDN) for data access, offering the speeds needed to perform migrations efficiently.

In this knowledge base article we will take the example of an Amazon S3 bucket containing 2,700 Adobe PDF files (objects) of various sizes, totaling roughly 43 GiB.

aws s3 ls s3://sourcedataset --recursive --human-readable --summarize

...

Total Objects: 2700
Total Size: 43.2 GiB

We will use Rclone (https://rclone.org/overview/) to perform the transfer. Although you could use the AWS CLI, s4cmd, etc., we've found Rclone natively supports many cloud providers, making it an obvious choice for convenience.

To leverage the CDN and get around egress fees, in this KB we will create an AWS EC2 Ubuntu Server 18.04 Linux instance in the region of our source bucket.

A t2.medium instance should be sufficient, although larger instances, which come at a correspondingly higher price, often have the advantage of faster network connectivity.

It is important to ensure this EC2 instance is in the source bucket's region to avoid egress fees. The destination Filespace bucket or provider doesn't tend to matter much; as stated earlier, providers will often take your data at $0 cost.
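If you prefer to script the instance creation rather than use the AWS console, a rough sketch with the AWS CLI might look like the following. The AMI, key pair and security group values below are placeholders to substitute with your own; the important part is that --region matches your source bucket (eu-central-1 for our Frankfurt example).

# Launch a t2.medium in the same region as the source bucket (placeholder IDs)
aws ec2 run-instances \
    --region eu-central-1 \
    --image-id <your-ubuntu-ami-id> \
    --instance-type t2.medium \
    --key-name <your-keypair> \
    --security-group-ids <your-security-group> \
    --count 1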

Once you've created your EC2 instance, you'll download and install the LucidLink client. We'll presume you've already created a Filespace; if not, please follow our getting started guide first. The steps below begin from that point.

1. Download LucidLink client

wget https://www.lucidlink.com/download/latest/lin64/stable/ -O lucid.deb


2. Install LucidLink client

sudo apt update -y
sudo apt upgrade -y
sudo apt install ./lucid.deb -y

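To confirm the client package installed cleanly before moving on, a quick package query is enough (the grep is case-insensitive, as we're only assuming the package name contains "lucid"):

dpkg -l | grep -i lucid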

3. Download and install Rclone

sudo curl https://rclone.org/install.sh | sudo bash

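You can verify the Rclone install by checking the version it reports:

rclone version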

4. Next, configure Rclone to access your Amazon S3 data. For this step you will need your access key, secret key and source data region.

rclone config

a. Select N for New Remote and provide a profile name; "AWS" will do in our case.


b. Choose 4 for S3 compatible storage provider


c. Select 1 for provider Amazon S3


d. In this step we will provide our AWS credentials (access key and secret key) by selecting 1; alternatively, credentials could be retrieved from environment variables or your EC2 instance.


e. Choose your region; ours is Frankfurt, option 9.


f. We will leave our endpoint blank and set our location constraint to EU with option 9


g. The remaining options we will leave at their defaults by simply pressing Enter to accept the default "", then choose N to skip editing the advanced config.


h. And Y to save our profile.


i. Finally quit the config session with Q

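For reference, the interactive session above simply writes a remote definition to ~/.config/rclone/rclone.conf. With the choices made here (Frankfurt region, EU location constraint), the stored profile should look roughly like this, with your own keys in place of the placeholders:

[AWS]
type = s3
provider = AWS
access_key_id = <your-access-key>
secret_access_key = <your-secret-key>
region = eu-central-1
location_constraint = EU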

5. We can test our AWS profile with a simple "ls" to list files and directories: rclone ls <profile>:<bucket>

rclone ls AWS:sourcedataset

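To cross-check the object count and total size against the aws s3 ls summary from earlier, Rclone's size command is quicker than listing every object:

rclone size AWS:sourcedataset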

6. All things being equal, you'll see a list of files; you can cancel the listing at any time with CTRL+C. Now it's time to link to our Filespace and begin our migration.

lucid daemon --fs <filespace.domain> --user <fsuser> --password <fsuserpwd> --mount-point <mount> &


7. Check we can access our Filespace mount

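A quick sanity check from the shell, assuming the mount point passed to lucid daemon was ~/demo (the path used by the copy command in step 9):

ls -la ~/demo        # list the Filespace root
df -h ~/demo         # confirm the Filespace is mounted at this path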

8. We've created a MIGRATE folder to hold our Amazon S3 data (see the one-liner below). We can access this location from our Windows workstation - obviously it's empty right now.

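The folder can be created from any connected client; from the EC2 instance itself, using the same mount point as above, it's simply:

mkdir -p ~/demo/MIGRATE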

9. Let's begin our Rclone copy to our Filespace MIGRATE directory, with 30 file transfers running in parallel and progress displayed.

rclone copy AWS:sourcedataset ~/demo/MIGRATE --transfers 30 -P


Rclone has a sophisticated set of include and exclude rules (https://rclone.org/filtering/) to improve the efficiency of migrating data.
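For example, to copy only the PDF objects, or to preview what a filter would transfer without actually copying anything, standard Rclone flags such as --include, --exclude and --dry-run can be added to the same copy command:

# Copy only PDF objects
rclone copy AWS:sourcedataset ~/demo/MIGRATE --include "*.pdf" --transfers 30 -P

# Preview what would be transferred, without copying anything
rclone copy AWS:sourcedataset ~/demo/MIGRATE --exclude "*.tmp" --transfers 30 -P --dry-run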

10. You will note our transfer has begun populating our Filespace.


11. Once our transfer is complete, we can confirm our data behaves as expected, and safely terminate our EC2 instance. 
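One way to confirm everything arrived intact before terminating the instance is Rclone's built-in comparison, which verifies that source and destination files match by size and, where available, hash:

rclone check AWS:sourcedataset ~/demo/MIGRATE -P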

You can move data around your Filespace with a simple cut/paste rather than a copy, allowing you to further refine your data layout and organize your directory structure in preparation for presenting it to your users.

12. We recommend taking a snapshot, or implementing a snapshot schedule for your Filespace, to protect your data.

13. Our Filespace is now ready for user access; you can create your users and apply shares to control how your data is presented to them.

Should you have any queries or require assistance, please do not hesitate to reach out via a support ticket.

 
