Large environments use multiple clustered File Director appliances to scale out to service hundreds of thousands of end users. Smaller organisations may consider load balancing their File Director appliance estate to realise the following benefits:
- Removes the File Director appliance as a single point of failure
- Provides improved scalability and the ability to absorb load in peak periods of demand
This document assumes that the prerequisite appliance clustering configuration has been performed, and discusses the different approaches that can be taken when configuring the network load balancing (NLB) device.
01. Background: File Director clients communicate with the File Director appliance via a RESTful API. There is no persistent tunnel between the clients and the appliance; communications take the form of HTTP POST and GET requests to the appliance API web service. The appliance(s) issue a session token to each client for which they have facilitated a Windows logon, and this token is presented with each subsequent request from the client.
Prior to DataNow 4.1, in a clustered environment each appliance maintained its own in-memory list of session tokens.
In DataNow 4.1 and File Director, appliances distribute session tokens via clustering, so a logged-in client can present its session token to any appliance and it will be honoured, if valid.
The following flow diagram illustrates typical client-server communication with a client in a logged-in state:
The following diagram illustrates the communication between the File Director client and a File Director appliance where one of the following scenarios is true:
- The File Director User / Device combination has not logged in to File Director before
- The File Director User / Device combination has logged in, but over 24 hours ago
- The File Director User / Device has interactively logged out or rebooted (causing the local session token to be discarded)
- The File Director User / Device has presented its session token to a File Director appliance that did not issue it (pre-4.1)
- The File Director appliance platform has been rebooted or patched and cleared its cache of issued session tokens
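The pre-4.1 behaviour described above can be sketched in a few lines. This is a minimal illustration, not product code: the token format, class names, and 24-hour TTL constant are assumptions for the sketch, with only the 24-hour expiry taken from the scenarios above.

```python
import time

TOKEN_TTL = 24 * 60 * 60  # session tokens expire after 24 hours

class Appliance:
    """Minimal sketch of a pre-4.1 appliance's in-memory session token list."""
    def __init__(self, name):
        self.name = name
        self.tokens = {}  # token -> issue timestamp (lost on reboot/patch)

    def issue_token(self, user_device):
        # Hypothetical token format, for illustration only
        token = f"{self.name}:{user_device}:{int(time.time())}"
        self.tokens[token] = time.time()
        return token

    def validate(self, token, now=None):
        now = now if now is not None else time.time()
        issued = self.tokens.get(token)
        # An unknown token (issued by another appliance, or cleared by a
        # reboot/patch) or an expired token forces a reauthentication.
        return issued is not None and (now - issued) < TOKEN_TTL

node_a, node_b = Appliance("node-a"), Appliance("node-b")
token = node_a.issue_token("alice-laptop")
print(node_a.validate(token))  # True: the issuing appliance recognises it
print(node_b.validate(token))  # False: pre-4.1, tokens are not shared
```

From 4.1 onwards, clustering effectively replicates the `tokens` map across appliances, which is why any node can honour a valid token.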
It is normal and expected behaviour to experience client reauthentications from time to time. Examples include:
- After an endpoint reboot
- After the first communication with the appliance after a session token has expired (>24 hrs)
- After a manual log out / log in to File Director
- After a network load balancer re-distributes load and directs client traffic to a different appliance (pre 4.1)
02: Client/Server communications - understanding the load
Upon logging on to File Director, the full (Windows and Mac) clients begin a 'logon sync'. This involves a full reconciliation of the local File Director cache with the file server to ensure that all files are up to date, and to discover or download any new or changed eligible server-side content. Once this process is completed, the (idle) client enters a 'steady state'.
In this phase, a full (and online) client will communicate with the appliance approximately once every 30 seconds to perform a 'notification poll'. Any changes made on the file server by File Director clients are registered in the File Director appliance database (a shared database in a clustered environment) and are communicated to the client via this notification poll, so that the client can update its cache more quickly. Changes made outside of File Director will be picked up during one of the following scenarios:
- Logon sync
- Browse / refresh the directory from a client
- Perform an action on a 'sibling' item in the same folder
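The notification poll above can be sketched as a client pulling new entries from a shared change registry. This is a simplified model, assuming a sequence-numbered change log; the real client issues an HTTP request roughly every 30 seconds, and the data structures here are hypothetical.

```python
POLL_INTERVAL = 30  # approximate seconds between notification polls

class ChangeLog:
    """Sketch of the shared change registry (hypothetical structure)."""
    def __init__(self):
        self.entries = []  # (sequence, path) pairs

    def register(self, path):
        # A change made by any File Director client lands here
        self.entries.append((len(self.entries) + 1, path))

    def changes_since(self, last_seen):
        return [(seq, path) for seq, path in self.entries if seq > last_seen]

class Client:
    def __init__(self, log):
        self.log = log
        self.last_seen = 0
        self.cache = set()

    def notification_poll(self):
        # In the real client this is an HTTP request every ~30 s;
        # here we simply pull new entries from the shared log.
        for seq, path in self.log.changes_since(self.last_seen):
            self.cache.add(path)
            self.last_seen = seq

log = ChangeLog()
client = Client(log)
log.register("shared/report.docx")  # change made by another client
client.notification_poll()
print(sorted(client.cache))  # ['shared/report.docx']
```

Changes made outside of File Director never reach this registry, which is why they are only picked up by a logon sync, a browse/refresh, or an action on a sibling item.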
Because user interaction with local paths managed by File Director drives client-server communication, the actual load generated per user can vary considerably depending on how File Director is configured.
In this example, a user has a read-only software repository configured as a File Director folder. This is a static folder with content consumed on demand by the end user. Client communication with the server is low, as the content is only reconciled at logon, and whenever the user needs to interact with this folder via Windows Explorer.
In this example, a user has their Documents and Desktop folders redirected into File Director via in-location sync. In addition, they have a shared map point which they use to frequently read and write content shared with other users:
In this example, the user reconciles server-side changes whenever they interact with or traverse content in their Desktop or Documents folders. Updates to synced content in the 'shared' map point made by other users are synced down within 30 seconds.
03: Load Balancer configuration
The primary goals when load balancing File Director traffic should be:
- Distribute load equally between File Director appliance nodes
- Leverage SSL-Offload where possible
- Ensure session stickiness (persistency) is configured so that logged-on clients are not needlessly directed to an appliance that does not recognise their session token
- Ensure a health monitor is configured so that traffic can be failed over to other appliances in the event of a failure (or maintenance flag) on a particular appliance
Choosing an NLB Topology:
Whilst SSL-Offload is preferred for the best scalability and performance (the SSL session terminates between the File Director client and the NLB device, with HTTP between the NLB device and the appliance), it is also possible to use SSL-Passthrough (the SSL session is forwarded to the appliance without being decrypted) or SSL-Bridging (SSL is decrypted on the NLB device, then re-encrypted before being forwarded to the File Director appliance).
If an SSL-Offload configuration is chosen, the appliance needs to be enabled for HTTP mode, which is configured via the 'Advanced' section of the admin console:
NOTE: The admin console cannot be configured to use HTTP - it will always use https://<appliance_URL>:8443 regardless of the above check box
The choice of topology will depend largely on the network security requirements.
Choosing an NLB Method
The NLB method refers to how the NLB device establishes the most appropriate appliance to route traffic to (outside of any matching persistence policy). In most environments, this should be configured for 'Least Connection'. This means that when a new client accesses the File Director URL, it is routed to the appliance currently servicing the fewest connections. Other methods may be better suited to certain environments.
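'Least Connection' selection can be sketched as follows. This is an illustrative model of the algorithm, not any vendor's implementation; the node names and pool structure are made up for the example.

```python
def least_connection(nodes):
    """Pick the healthy appliance currently servicing the fewest connections."""
    healthy = [n for n in nodes if n["healthy"]]
    return min(healthy, key=lambda n: n["connections"])

pool = [
    {"name": "fd-node-1", "connections": 120, "healthy": True},
    {"name": "fd-node-2", "connections": 45,  "healthy": True},
    {"name": "fd-node-3", "connections": 10,  "healthy": False},  # flagged down
]
print(least_connection(pool)["name"])  # fd-node-2
```

Note that the unhealthy node is excluded even though it has the fewest connections, which is why the health monitor (section 03) matters as much as the method itself.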
Choosing an NLB Persistency and Persistency Timeout
It is very important that File Director clients can communicate with the File Director appliance that issued their session token for as long a period as possible, which requires some form of session persistency. If using SSL-Bridging or SSL-Offload, the most common choice is Cookie-Insert persistence: the NLB device provides the File Director client with a cookie after it logs on to the appliance, and the client presents this cookie with subsequent requests, instructing the NLB to route them to the same appliance.
CAUTION: There is a known issue affecting Microsoft Windows 7 that can cause NLB persistency cookies to be dropped if a file synced by File Director has a period in the filename. If using cookie-insert persistency and Windows 7 OS, ensure the following hotfix is present: https://support.microsoft.com/en-gb/kb/3051475
Another commonly used persistency method is Source-IP, in which the NLB device tracks the IP address of the File Director client and routes its traffic to the same appliance for the duration of the persistency timeout, after which the NLB method is re-evaluated.
This may cause uneven loading if multiple users are behind a NAT device which aggregates and presents the same client IP to the NLB.
Other persistency methods may suit particular environments / hardware.
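The two persistency methods above can be sketched side by side. This is a toy model of the routing logic only; the cookie name `FD_PERSIST`, the node names, and the hashing scheme are assumptions for illustration, not documented behaviour of any NLB product.

```python
import hashlib

NODES = ["fd-node-1", "fd-node-2", "fd-node-3"]

def route_cookie_insert(request, pick_node):
    """Cookie-Insert: honour an existing persistence cookie; otherwise
    pick a node and hand the client a cookie naming it."""
    node = request["cookies"].get("FD_PERSIST")
    if node not in NODES:
        node = pick_node()  # fall back to the NLB method (e.g. least connection)
        request["cookies"]["FD_PERSIST"] = node  # NLB sets this on the response
    return node

def route_source_ip(client_ip):
    """Source-IP: hash the client address onto a node. All clients behind
    one NAT address land on the same appliance, causing uneven loading."""
    digest = int(hashlib.sha256(client_ip.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

req = {"cookies": {}}
first = route_cookie_insert(req, lambda: NODES[1])
again = route_cookie_insert(req, lambda: NODES[0])
print(first == again)  # True: the cookie pins the client to one appliance
```

The Source-IP sketch also makes the NAT caveat concrete: every distinct user presenting the same aggregated IP hashes to the same appliance.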
The persistency timeout balances the need to redistribute load between appliances (in the event part of the platform is taken down) against minimising the load on the infrastructure caused by frequent logon events (pre-4.1). In most cases this should be set to 24 hours.
Configuring the Health Monitor
File Director includes a dedicated web service delivered on a separate port which facilitates comprehensive internal health checks, as well as an option exposed in the admin console to allow the administrator to temporarily remove the appliance from the NLB pool. This works by simulating a 'fail' condition, so a correctly configured monitor will re-route traffic serviced by that appliance to other nodes, based on the NLB method.
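The monitor's effect can be sketched as follows. The probe here is a stand-in for an HTTP check against the appliance's dedicated health-check port (the port and response format are not specified here; see the product documentation); a maintenance flag simply makes the probe report a failure.

```python
def rebalance(pool, probe):
    """Mark each appliance up/down from its health probe and return the
    nodes still eligible for traffic. 'probe' stands in for the HTTP
    check against the appliance's health-check web service."""
    for node in pool:
        node["healthy"] = probe(node["name"])
    return [n["name"] for n in pool if n["healthy"]]

pool = [{"name": "fd-node-1"}, {"name": "fd-node-2"}]
# fd-node-2 has been flagged for maintenance, so its probe reports a failure
probe = lambda name: name != "fd-node-2"
print(rebalance(pool, probe))  # ['fd-node-1']
```

Traffic previously serviced by the failed or flagged node is then redistributed across the remaining eligible nodes according to the configured NLB method.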
For full configuration details, please refer to the product documentation.