Troubleshooting Replication

Version 8
    Purpose

              Provide basic troubleshooting steps for replication issues

     

    Overview

     

              Sanity Check: Is GSS Up?

    o    Browse to addresses:

    §  https://cdn.securegss.net (where cdn.securegss.net stands for the IP address of the pinged GSS host)

    ·         A certificate error is a good negative test

    ·         The certificate is published only when the Distribution Service is running and listening

    §  http://cache.lumension.com, http://cache.patchlinksecure.net

    ·         The repository page should be displayed when browsing to the root

    ·         A 404 may indicate a block or an outage

    ·         A bad file on cache.lumension.com may need to be purged (very rare; appears as a hash mismatch)

    o    Telnet to addresses

    §  Caveat: Not always accurate (firewall might respond instead of host)
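
The telnet check above can also be scripted. The following is a minimal sketch, assuming the hosts listen on the standard ports implied by their URL schemes (443 for HTTPS, 80 for HTTP); the helper name can_connect and the TARGETS list are illustrative. The same caveat applies: a successful connection only proves that something answered, which may be a firewall or proxy rather than the GSS host.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Hosts named in this article; the ports are assumed from the URL schemes.
TARGETS = [
    ("cdn.securegss.net", 443),
    ("cache.lumension.com", 80),
    ("cache.patchlinksecure.net", 80),
]
```

Calling can_connect(host, port) for each entry in TARGETS from the customer server gives a quick yes/no per host before moving on to the browser checks.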

     

              Reminder

    o    This is not a fatal error!

    §  ERROR A required file has errors or was not available. RequestFile.GetMetadata():

    §  ERROR Source file not found! C:\Storage\00000000-0000-0000-0000-000000000000\CustomerComponents\EE6AAB43-5203C2ED.xml

    o    These errors indicate that the customer is not licensed for non-Windows content

    §  SerialNumber.xml is created through an automated process when non-Windows subscriptions are created

    §  Can change or update frequently based on content releases

    §  Replicated to other GSS nodes

    o    Replication Service (RS)

    §  C:\Program Files (x86)\HEAT Software\EMSS\Replication Services\Logs

    o    Endpoint Distribution Service (EDS)

    §  C:\Program Files (x86)\HEAT Software\EMSS\Endpoint Distribution Service

    §  edsrolling_AVController.log
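
When reading these logs, it can help to separate the non-fatal errors quoted in the Reminder from everything else. The following is a rough sketch, assuming each log entry is a single line containing the literal text ERROR, as in the quoted messages; the real log layout is not described here, and the function name and marker list are hypothetical.

```python
from typing import Iterable, List, Tuple

# Message fragments quoted in the Reminder above. Treating every line that
# contains them as benign is an assumption of this sketch.
METADATA_MARKER = "A required file has errors or was not available"
MISSING_FILE_MARKER = "Source file not found!"
CUSTOMER_DIR = "\\CustomerComponents\\"


def is_benign(line: str) -> bool:
    """Return True for the known non-fatal licensing messages."""
    if METADATA_MARKER in line:
        return True
    return MISSING_FILE_MARKER in line and CUSTOMER_DIR in line


def triage_errors(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split ERROR lines into (benign, needs_review)."""
    benign: List[str] = []
    needs_review: List[str] = []
    for line in lines:
        if "ERROR" not in line:
            continue
        text = line.strip()
        (benign if is_benign(text) else needs_review).append(text)
    return benign, needs_review
```

An open log file can be passed directly, for example triage_errors(open(path, errors="replace")), where path points to a file in one of the log directories listed above.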

              Common Replication Issues

    o    Unable to replicate licenses

    §  Inability to reach cdn.securegss.net

    o    Unable to download modules, manifests, or updates

    §  Blocking cache.lumension.com, cache.patchlinksecure.net

    §  Firewall or virus filter blocks file

    §  Stale file cached in proxy

    o    PR: Content metadata replication fails

    §  Timeouts

    §  Peer connection terminated

    §  Delta window issues

    o    PR: Content binary replication fails

    §  Unable to reach content provider (Microsoft, Adobe, etc.)

    §  Firewall or virus filter blocks file

    o    AV: Virus definition replication fails

    §  Inability to reach cache.lumension.com

    §  Unzipping of definitions fails

              Unable to replicate licenses

    o    Rare; usually encountered in demo, new install, or POC situations

    o    Symptoms include:

    §  No history at all in Subscription Updates page

    §  AccountID is 00000000-0000-0000-0000-000000000000

    §  Errors in the PLFW.log

    §  Unable to reach cdn.securegss.net

    o    Common troubleshooting steps/solutions:

    §  Open firewall to cdn.securegss.net

    §  Check for proxies or filters (transparent or otherwise)

    §  Add proxy information

    §  Fix proxy information (bad credentials, etc.)

    §  Add web filter exclusions

    o    NOTE: License replication returning a FALSE state may indicate a GSS outage

              Unable to download modules, manifests, or updates

    o    Symptoms include:

    §  “System” job fails (FALSE state)

    §  Install Manager does not find new updates

    §  Agent manifests are missing from UI and/or from disk

    o    Common troubleshooting steps/solutions:

    §  Check the RS logs for specific file download failures

    §  Test-download the file yourself and then from the customer machine

    ·         Compare the SHA1 hash of the file with the expected value

    ·         A mismatch may indicate that the file was modified or that stale content is stuck in the customer proxy

     

    §  Flush stale content from the customer proxy using curl

    ·         curl.exe -H "Cache-Control: max-age=0, must-revalidate, proxy-revalidate" http://cache.patchlinksecure.net/InstallManager/InstallManager.xml > InstallManager.xml

     

    o    GSS:// files

    §  Downloaded by Replication Services directly from Distribution Services

    §  Cannot be cached (downloaded over SSL)

    §  Can be downloaded manually:

    o    Original file URI: GSS://lumension/GssComponents/protocols.xml

    o    Transforms into:

    §  HTTP://cache.lumension.com/00000000-0000-0000-0000-000000000000/GssComponents/protocols.xml

    §  HTTP://cache.patchlinksecure.net/00000000-0000-0000-0000-000000000000/GssComponents/protocols.xml

    ·         Most configuration files downloaded for non-Windows content are transferred this way
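
The transformation above can be expressed as a small helper for building manual download URLs. This is a sketch inferred from the single example shown, not a documented rule: it assumes the first segment after GSS:// (here, lumension) is replaced by the all-zero ID and the rest of the path is kept as-is. The function name gss_to_http and the parameter root_id are hypothetical.

```python
from typing import List

# Cache hosts and the all-zero ID are taken from the example above.
CACHE_HOSTS = ("cache.lumension.com", "cache.patchlinksecure.net")
DEFAULT_ROOT_ID = "00000000-0000-0000-0000-000000000000"


def gss_to_http(uri: str, root_id: str = DEFAULT_ROOT_ID) -> List[str]:
    """Map a GSS:// URI to candidate HTTP URLs on the cache hosts.

    Assumption: the first segment after GSS:// is replaced by root_id
    and the remaining path is kept unchanged.
    """
    prefix = "gss://"
    if not uri.lower().startswith(prefix):
        raise ValueError(f"not a GSS URI: {uri!r}")
    remainder = uri[len(prefix):]
    _, sep, path = remainder.partition("/")
    if not sep or not path:
        raise ValueError(f"GSS URI has no file path: {uri!r}")
    return [f"http://{host}/{root_id}/{path}" for host in CACHE_HOSTS]
```

For GSS://lumension/GssComponents/protocols.xml this yields the two cache URLs listed above, with a lowercase http:// scheme.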

     


              Content metadata replication fails

    o    Symptoms Include:

    §  “Vulnerability / Content” job fails (FALSE state)

    §  Job fails at the same percentage or length of time

    §  SQL errors may be present in Event Log or Replication Logs

    o    Common troubleshooting steps/solutions

    §  Investigate SQL errors and try to reproduce them on your own system

    ·         If the error reproduces, it may indicate a content issue

    ·         If it does not reproduce, reset the feed

     

    §  Delta window issues may occur if the server hops between different GSS hosts

    ·         SQL errors may be a symptom, as well as missing or incomplete content

    ·         Resetting feed usually solves the issue

     

              Content binary replication fails

    o    Symptoms Include:

    §  “Packages” job fails repeatedly

    §  Content isn’t being cached

    o    Common troubleshooting steps/solutions

    §  Check disk space availability

    §  Check the RS logs for download failures

    §  Attempt to download the file from the server

    §  Update firewall/web filtering exclusions for the host

    §  If non-Windows content, check CacheManager logs (depending on content type)
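
The disk space check in the steps above can be scripted. The following is a small sketch using only the Python standard library; the function names and any threshold are illustrative, and the path to check depends on where the server stores its content.

```python
import shutil


def free_gib(path: str) -> float:
    """Return the free space, in GiB, on the volume that holds path."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def has_room(path: str, required_gib: float) -> bool:
    """Return True if at least required_gib GiB are free at path."""
    return free_gib(path) >= required_gib
```

For example, has_room(storage_path, 10.0) returns False when less than 10 GiB is free on that volume; the 10 GiB figure is only a placeholder, not a product requirement.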

     

              Virus definition replication fails

    o    Symptoms Include:

    §  “Antivirus / Content” job fails (FALSE state)

    o    Common troubleshooting steps/solutions

    §  Check the edsrolling_AVController.log in the EDS directory for errors

    §  Check the avfilelist.xml manually

    ·         http://cache.lumension.com/antivirus/avfilelist.xml

    ·         http://cache.patchlinksecure.net/antivirus/avfilelist.xml

    §  Check for available disk space

    §  Out of Memory Exception

    ·         Fixed in 7.3 SP1 and higher (upgrade!)

    ·         Caused by failed extraction or hash validation of zipped file

    ·         Increase available RAM or restart SQL to free available memory

     


    Additional Information

    For log explanations and walkthroughs: Replication Log Files Explanation

    Common Replication Questions: Common Replication Questions

    Replication Stuck Downloading Updates: Replication stuck downloading updates

    Manually Reset Replication: How To: Manually Reset Replication on the server

    Replication patches fail to download: SQL: Replication patches failed download

     

    Affected Products

    EMSS