Apologies for the (very) delayed reply. We're starting to monitor these forums actively, and I thought I'd post a response to your question for the benefit of other users.
The amount of data collected by Insight for a given number of clients depends heavily on which data collection items have been enabled. Our current recommendation is to tune the data purge to each deployment: enable the required data collection items and allow some time for data to be gathered. That sizing information can then be used to estimate data growth for the full number of clients, and the purge period can be set accordingly.
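To make the arithmetic concrete, here is a rough sketch of that estimate in Python. It is not an Insight tool or API: the function name, its parameters, and the figures in the example are all made up for illustration, and it assumes data grows linearly with both client count and time, which you should check against your own pilot numbers.

```python
def estimate_purge_days(pilot_bytes, pilot_clients, pilot_days,
                        total_clients, storage_budget_bytes):
    """Project data growth from a pilot sample (hypothetical helper).

    Assumes growth is linear in the number of clients and in time.
    Returns (projected bytes per day, whole retention days within budget).
    """
    if pilot_clients <= 0 or pilot_days <= 0 or total_clients <= 0:
        raise ValueError("client counts and pilot duration must be positive")
    if pilot_bytes <= 0:
        raise ValueError("pilot must have collected some data")

    # Average data produced by one client in one day during the pilot.
    per_client_per_day = pilot_bytes / (pilot_clients * pilot_days)

    # Scale up to the complete deployment.
    daily_growth = per_client_per_day * total_clients

    # Longest purge period (in whole days) that stays within the budget.
    purge_days = int(storage_budget_bytes // daily_growth)
    return daily_growth, purge_days


# Illustrative figures only: 200 pilot clients produced 2.8 GB in 14 days;
# the full deployment is 5,000 clients with 500 GB set aside for data.
growth, days = estimate_purge_days(
    pilot_bytes=2_800_000_000,
    pilot_clients=200,
    pilot_days=14,
    total_clients=5_000,
    storage_budget_bytes=500_000_000_000,
)
print(f"Projected growth: {growth / 1e9:.1f} GB/day, purge after {days} days")
```

In practice you would probably leave some headroom below the computed value, and re-run the estimate whenever you enable or disable data collection items.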
As a side note, as part of this tuning you may want to look into ways to improve network performance for large client volumes, which lets you maximize the number of clients per Insight server. Information on this topic can be found in the following KB article: