Migration Strategies

Data must be migrated to your Filespace through the LucidLink client application. All data must pass through our client to reap the unique benefits of our data streaming. You have many options for performing the transfer; the goal is to select the most efficient method.

It is best practice to complete your migration and user acceptance testing before exposing the Filespace or its data to users. You can restrict access to your data with our user access controls. There are, of course, times when you are ingesting data into a live Filespace, so be mindful that data might be changing throughout the filesystem hierarchy.

Each operating system has appropriate tools built-in to perform efficient, resumable file copies. In this article, we will provide some example concepts for Windows, Linux and macOS operating environments. 

We will leverage Robocopy and Rsync in our examples and provide additional guidance for segmenting your data so that you can prioritize the transfer of your most important, active data. Explorer and Finder can be problematic, with poor copy speeds, unreliable transfer time estimates and timeouts.

Estimate your transfer time using a bandwidth calculator. You'll need a speed test to determine your upload speed in Mbps, and the size of your data.

Please note that data transfers may differ depending on data types. LucidLink compresses data for upload, so some data types, such as documents, compress reasonably well and consume less upload bandwidth. Transfer calculation estimates don't take into account the benefits of our compression.
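As a rough illustration of what a bandwidth calculator does, the sketch below estimates upload time from the data size and the measured upload speed. The function name estimate_upload_hours is hypothetical. It assumes decimal units (1 GB = 8000 megabits) and ignores compression and protocol overhead, so treat the result as an approximation only.

```shell
# Estimate the hours needed to upload a dataset.
# $1 = data size in GB (decimal, 10^9 bytes), $2 = measured upload speed in Mbps.
# Assumption: no compression benefit and no protocol overhead.
estimate_upload_hours() {
  awk -v gb="$1" -v mbps="$2" 'BEGIN {
    seconds = (gb * 8000) / mbps      # 1 GB = 8000 megabits
    printf "%.1f\n", seconds / 3600   # convert seconds to hours
  }'
}

estimate_upload_hours 1000 100   # 1 TB at 100 Mbps, prints 22.2
```

Data that compresses well should finish sooner than this estimate suggests.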

It is important to size your Filespace file index and cache (root-path) before beginning any migration. Ensure that your cache is ample to act as an appropriate buffer to complement your upload rate:

  1. The default cache is 25GiB and can be increased to 10TiB
  2. Relocate your file index (metadata) and cache to a suitable location

It is often suitable to leave the cache at the default 25GiB. Ideally the data in and data out should match: you should be able to evacuate your cache at close to the speed of your copy process, so queuing too much data in the cache is unnecessary. See Bandwidth below.

We refer to <source> logically as the data to be migrated, and <destination> the Filespace mount-point. Default LucidLink mount-points are determined by your operating system or Filespace settings. You can change your mount-point to a different location to simplify migration.

Windows:

Robocopy https://ss64.com/nt/robocopy.html is accessible through the Start Menu -> Command Prompt. The following command copies all folders, including empty ones (/e), in restartable mode (/z), with multiple threads (/mt, 8 by default), no retries (/r:0), no wait between retries (/w:0), and output to a log file (/log) with no percentage progress (/np):

robocopy <source> <destination> /e /z /mt /r:0 /w:0 /log:<file.log> /np

A very similar Robocopy command excludes files older than 90 days. If you have limited time and need to copy the last 3 months of active data, you can copy that data first:

robocopy <source> <destination> /e /z /mt /r:0 /w:0 /log:<file.log> /np /maxage:90

After the initial 90-day copy, transfer the remainder of the data. The /xo option skips source files that are older than the copy already at the destination:

robocopy <source> <destination> /e /z /mt /r:0 /w:0 /log:<file.log> /np /xo

macOS:

Rsync https://ss64.com/osx/rsync.html is available in Terminal (Finder -> Applications -> Utilities). The version bundled with macOS is often outdated, usually rsync version 2.6.9, protocol version 29.

The main difference is the extended attributes switch: -E in the bundled version versus -X in updated rsync versions.
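If you script your migration across machines with different rsync versions, the choice of switch can be made from the version number. The sketch below uses a hypothetical helper named xattr_flag. It assumes that 2.x builds (such as the bundled 2.6.9) take -E and that version 3 and later take -X, and that the first line of `rsync --version` has the form "rsync  version 2.6.9  protocol version 29". Verify both on your own system before relying on it.

```shell
# Pick the extended-attribute switch for a given rsync version string.
# Assumption: 2.x builds use -E, version 3 and later use -X.
xattr_flag() {
  version=${1#v}          # tolerate a leading "v", e.g. v3.2.3
  major=${version%%.*}    # keep only the major version number
  if [ "$major" -ge 3 ]; then
    printf '%s\n' "-X"
  else
    printf '%s\n' "-E"
  fi
}

# Hypothetical usage, reading the version from the first line of the output:
# version=$(rsync --version | awk 'NR==1 {print $3}')
# rsync -a "$(xattr_flag "$version")" -vP <source> <destination>

xattr_flag 2.6.9   # prints -E
xattr_flag 3.1.2   # prints -X
```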

A general Rsync of files between source and destination:

rsync -aEvP <source> <destination> --log-file=<file.log>

Rsync of files 90 days old or less. The file list is built relative to the source so that files within subdirectories are found:

rsync -aEvP --files-from=<(cd <source> && find . -type f -mtime -90) <source> <destination> --log-file=<file.log>
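Rsync resolves every entry in a --files-from list relative to the source directory, so the list needs to keep each file's subdirectory path rather than just its name. Below is a minimal sketch of building such a list; recent_files is a hypothetical helper name and the paths in the usage comment are examples only.

```shell
# List files modified within the last N days, as paths relative to the source.
# $1 = source directory, $2 = age limit in days.
recent_files() {
  # Run find from inside the source so the output reads ./subdir/file
  ( cd "$1" && find . -type f -mtime "-$2" )
}

# Hypothetical usage (bash or zsh, which support process substitution):
# rsync -aEvP --files-from=<(recent_files /Volumes/usb 90) /Volumes/usb/ /Volumes/filespace/
```

Reviewing the output of recent_files before starting the copy is a simple way to confirm what will be transferred.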

Update destination files:

rsync -aEvP --update <source> <destination> --log-file=<file.log>

Ignore existing files:

rsync -aEvP --ignore-existing <source> <destination> --log-file=<file.log>

Note: you may prefer to include the additional -h, --human-readable option to output numbers in a more human-readable format.

Linux:

Rsync https://ss64.com/bash/rsync_options.html in your Terminal provides similar functionality to macOS, with updated syntax for extended attributes (-X) and ACLs (-A) in rsync version 3.1.2, protocol version 31.

A general Rsync of files between source and destination: 

rsync -aAXvP <source> <destination> --log-file=<file.log>

Rsync of files 90 days old or less. The file list is built relative to the source so that files within subdirectories are found:

rsync -aAXvP --files-from=<(cd <source> && find . -type f -mtime -90) <source> <destination> --log-file=<file.log>

Update destination files:

rsync -aAXvP --update <source> <destination> --log-file=<file.log>

Ignore existing files:

rsync -aAXvP --ignore-existing <source> <destination> --log-file=<file.log>

If you are migrating over time while continuing to use the source, you might consider, with caution, --delete (Rsync) or /mir (Robocopy) to ensure your destination remains a true reflection of your source. Both options remove files from the destination that no longer exist in the source.

Source data:

How you transfer data to the destination depends on where your source data is located: on a local disk or on removable media (e.g. a USB disk).

If your source data is located on removable media, you'll need to identify where the media is mounted.

On macOS it is usually within /Volumes, on Linux within /media, and on Windows it typically shows up as a drive letter.

On Linux and macOS you can identify the media mount point with the mount command:

mount

On both macOS and Linux, the df command displays mount points along with their capacities in human-readable form, which also helps to identify removable media:

df -H

With Rsync, adding a trailing slash / at the end of the source changes the transfer behaviour to avoid creating an additional directory level at the destination.

A source with no trailing slash means "copy this directory and its contents, as a directory, into the destination path", whereas a trailing slash on source/ copies only the contents of the source into the destination directory.
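The trailing slash rule can be expressed as a small model that predicts where the contents of a source directory will land. This is only an illustration of the rule described above, not part of Rsync itself; rsync_landing_dir is a hypothetical helper name, and the model assumes the destination is not the filesystem root.

```shell
# Predict the directory that will hold the contents of an Rsync source directory.
# $1 = source path as typed (with or without a trailing slash), $2 = destination.
rsync_landing_dir() {
  src=$1
  dst=${2%/}   # a trailing slash on the destination makes no difference
  case "$src" in
    */) printf '%s\n' "$dst" ;;                          # contents go straight into destination
    *)  printf '%s/%s\n' "$dst" "$(basename "$src")" ;;  # an extra directory level is created
  esac
}

rsync_landing_dir /Volumes/usb  /Volumes/filespace   # prints /Volumes/filespace/usb
rsync_landing_dir /Volumes/usb/ /Volumes/filespace   # prints /Volumes/filespace
```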

Putting it all together:

If your source data on macOS is hosted on USB removable media at /Volumes/usb and your Filespace is mounted at /Volumes/filespace, the following commands copy the USB in its entirety into your Filespace as its own directory, either in the root folder or in a share.

We will generate a transfer log file within our user's home Desktop directory.

rsync -aEvP /Volumes/usb /Volumes/filespace --log-file=/Users/username/Desktop/transfer.log
rsync -aEvP /Volumes/usb /Volumes/filespace/share --log-file=/Users/username/Desktop/transfer.log

If you'd like the directory contents of your USB to begin in the root folder of your Filespace mount-point or in a share, add a trailing slash to the source:

rsync -aEvP /Volumes/usb/ /Volumes/filespace --log-file=/Users/username/Desktop/transfer.log
rsync -aEvP /Volumes/usb/ /Volumes/filespace/share --log-file=/Users/username/Desktop/transfer.log

On Windows, if you had USB media with a drive letter of U: and a Filespace mounted as L:, your Robocopy command would look like the following to transfer the data into a USB folder or share on your Filespace and log the data transfer in the user's profile Desktop directory:

robocopy U:\ L:\USB /e /z /mt /r:0 /w:0 /log:"%userprofile%\Desktop\transfer.log" /np

Linux looks very similar to macOS, using the updated rsync options and /media paths, and logging to the Desktop folder in our user's home directory:

rsync -aAXvP /media/usb /media/filespace --log-file=/home/username/Desktop/transfer.log
rsync -aAXvP /media/usb /media/filespace/share --log-file=/home/username/Desktop/transfer.log

Both Robocopy and Rsync, as outlined, will resume if interrupted. Simply restart the copy process and it will continue where it left off. If the source data has changed in the meantime, add the appropriate options (such as --update or /xo) to capture the changes.
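Because a restarted copy picks up where it left off, restarts can be automated for long unattended transfers. The sketch below is one possible approach, not a LucidLink feature: retry is a hypothetical helper that re-runs any command until it succeeds or an attempt limit is reached, and the paths in the usage comment are examples only.

```shell
# Re-run a command until it succeeds or the attempt limit is reached.
# $1 = maximum attempts, $2 = seconds to wait between attempts, rest = the command.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Hypothetical usage: up to 5 attempts, 60 seconds apart.
# retry 5 60 rsync -aAXvP /media/usb /media/filespace --log-file="$HOME/Desktop/transfer.log"
```

Keeping an attempt limit avoids looping forever on a failure that a restart cannot fix, such as a missing source path.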

Bandwidth:

It is important to ensure your copy process doesn't overrun your cache. Size your cache accordingly and, if necessary, relocate it to a suitable location. In some situations you may want to limit your upload settings in our Filespace Control Panel or throttle the copy process.

Should your cache fill, your copy process will slow to your upload bandwidth as calculated earlier in this article. The copy process may time out, depending on the method employed, which is why we recommend avoiding Explorer or Finder transfers.

Rsync `--bwlimit=KBPS` and Robocopy `/IPG:n` (an inter-packet gap of n milliseconds) can limit bandwidth utilization so that your data in is approximately proportionate to your data out, and so that you do not saturate your Internet connection and risk an unsatisfactory result.

This is particularly useful if you want your migration to transfer in the background while you use the system for other activities.
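To choose a value for `--bwlimit`, you can convert a share of your measured upload speed into kilobytes per second. The sketch below uses a hypothetical helper named bwlimit_kib and assumes the limit is read in units of 1024 bytes per second; check the rsync manual for your version, as the exact unit handling may differ.

```shell
# Convert a share of the measured upload speed into a value for rsync --bwlimit.
# $1 = upload speed in Mbps, $2 = percentage of that speed the copy may use.
# Assumption: --bwlimit is interpreted in units of 1024 bytes per second.
bwlimit_kib() {
  awk -v mbps="$1" -v pct="$2" 'BEGIN {
    bytes_per_sec = mbps * 1000000 / 8 * pct / 100
    printf "%d\n", bytes_per_sec / 1024   # truncate to a whole number
  }'
}

bwlimit_kib 100 80   # 80% of a 100 Mbps uplink, prints 9765

# Hypothetical usage:
# rsync -aAXvP --bwlimit="$(bwlimit_kib 100 80)" /media/usb /media/filespace
```

Leaving some headroom below the full upload speed keeps the connection usable for other activities during the migration.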

Cloud:

If you have data hosted within a cloud vendor that would benefit from the performance characteristics of a LucidLink Filespace you can easily migrate these via Rclone using a familiar Linux style command-line interface.

Online cloud storage vendors are limited in how they integrate into workflows and operating environments, whereas a Filespace presents universally across operating environments as a convenient extension of your operating system.

Our migrating cloud storage providers article provides guidance for data hosted within Google Drive, Dropbox, OneDrive etc., or in S3-compatible or blob object storage accounts.

Equally, data within a LucidLink mount-point can be accessed via S3 APIs by applications that require native List, Put, Get and Delete operations. We put together an article in conjunction with MinIO that covers this; MinIO can also be used to build your own object storage cluster as a backend for our Filespaces.

We assume in this article that you have the LucidLink client installed. If your operating environment prevents you from installing software and you can leverage the local area network, consider configuring an NFS export or SMB server share to facilitate your migration.

 
 
