21 Replies Latest reply on Nov 9, 2016 8:41 AM by phoffmann

    Active Installs Is Low

    jpozucek Apprentice

      We just upgraded to LDMS2016 and I'm doing a large Policy Support Push with Accelerated Push checked to 5000+ devices.  It's a tiny batch file that just changes the local admin ID and password.  It's set to download from source.

       

      When I started it all 5000+ went active and after a few minutes they all went to Pending with a status of "policy has been made available"

       

      The problem is that only 6 to 10 are Active at a time.  Is there a way to bump this up?  I would think it should be processing hundreds at a time.

       

      Jim

        • 1. Re: Active Installs Is Low
          MarXtar ITSMMVPGroup

          If you turn off accelerated push do you see it proceed as it did previously? Rolling windows of X machines? Do you hit a similar hard stop at the same number of machines?

           

          We are seeing a similar situation at a customer of ours but not at any others so we are currently trying to find out if this is somehow environmental (e.g. network security software detecting accelerated push as some kind of security threat and limiting them somehow) or if this is definitely product related.

           

          In the meantime consider cranking up the frequency that the clients check for policies to 15 or 30 minutes since this seems to work fine and still results in a pretty good time to achieve full coverage. Not a solution but a potential workaround.

           

          Mark McGinn

          MarXtar Ltd/MarXtar Corporation

          http://landeskone.marxtar.co.uk

          LANDESK One Development Partner

           

          Try MarXtar State Management for LANDESK to Better Understand and Manage your Assets

          • 2. Re: Active Installs Is Low
            phoffmann SupportEmployee

            I saw something like this happening, and I've been able to trace it down to "Network" side stuff, but couldn't go beyond that (had to hand over to the relevant customer's network team - never heard from them).

             

            Here's what (simplified) happens normally:

            • You start the "accelerated push" task.
            • Core contacts (VERY quickly) all of the affected clients with a quick "go check your policies" message.
            • Clients go and check for / download new policies & work off the relevant soft dist job.

             

            Now - in the situation I fell afoul of, I could see that the Core was TRYING to discover the clients via CBA, but other than an initial few, would never get very far. CBA discovery is SUPER basic (just a UDP packet) - but given that we do so quite quickly, I'm still of the opinion that some sort of network defence mechanism at the customer was trying to protect against what it felt to be a UDP flood in the making (when it wasn't).

             

            Never heard back from the network people, but I've got the Wireshark traces to back up my claims.

             

            Here's what I used as a workaround (give it a try):

            1. Create a custom script.
            2. The script is only a single line - run PolicySync
            3. Schedule the custom script.

             

            ... this "worked" because CUSTJOB's "slow" approach (60 devices at a time max) didn't seem to trip off whatever was blocking us with accelerated push. If you see the same behaviour, you've definitely got something to go poke your network people over. Yes, this is slower, but at least it'll give you an idea whether you're facing the same problem or not.

             

            The following will also help:

            - How to troubleshoot Agent Discovery

             

            => This will explain to your network team how our CBA discovery works - and what they need to trace. In my case, I could Wireshark & Procmon it quite clearly that the Core sent (successfully) about 6-7 initial discoveries out, but no more ... something was blocking it from even going out the port (I suspected some sort of UDP-flood defence mechanism).
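To give a network team something reproducible to trace, a paced burst of small UDP datagrams can be generated and then compared against what Wireshark sees leaving the machine. The sketch below is NOT the CBA discovery protocol: the function name, the payload and the port are all placeholders chosen for illustration, and it should only be pointed at machines you administer.

```python
import socket
import time


def send_probes(targets, port, payload=b"probe", delay=0.0):
    """Send one small UDP datagram to each target IP.

    Returns the targets for which the OS accepted the send. That is only
    the "attempted" side of the story (what ProcMon would show); whether
    the packet really left the NIC has to be checked in a capture.
    The payload and port are placeholders, not LANDESK's real values.
    """
    accepted = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for ip in targets:
            try:
                sock.sendto(payload, (ip, port))
                accepted.append(ip)
            except OSError:
                # The OS refused the send (e.g. no route); skip this target.
                pass
            if delay:
                # Optional pacing, to compare a fast burst with a slow one.
                time.sleep(delay)
    return accepted
```

Running it once with `delay=0.0` and once with a noticeable delay while a capture is active should show whether only the fast burst gets cut off after the first handful of packets.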

             

            As per our logs, we tried the discovery against all the clients legitimately.

             

            Hope that helps.

            • 3. Re: Active Installs Is Low
              dgonzalez Apprentice

              Noticing the same issue on 2016 SU5; had to do small batches to push out. Are you still experiencing the issue, or were you able to find a solution?

              • 4. Re: Active Installs Is Low
                jpozucek Apprentice

                Still having the issue.  We're on SU1 and LANDESK support said SU5 will fix everything.  Not getting a good feeling it will.  Turning off "Accelerated Push" helps; I guess slow is better than failed.  The status messages are inconclusive now.  It used to show a status of "Off" if a device is off; now it just shows "Failed".  This seems to be for Distribution Package based pushes.  I created a Full Sync Inventory scan from a script and pushed that out.  It went fast, and devices that were off had a status of "Off".  Something changed in 2016 with the way devices are discovered and results returned, and it doesn't seem to be for the better.

                • 5. Re: Active Installs Is Low
                  phoffmann SupportEmployee

                  dgonzalez and jpozucek - could you run through the things that I did to troubleshoot that "network issue"?

                   

                  Here's what you should do:

                   

                  1 - Enable debug-logging on the Core. You'll mainly need this for verbose logging in the "PolicyTaskHandler"-log. The entries you're looking for are something like the following:

                  (...)

                  {DATE} {TIME} INFO 7328:1 RollingLog : [Task: Firefox 35 Installer, TaskID: 1, ProcID: 7328] : TargetMachineContainer.MachineTargetOS: Operating System is: [Microsoft Windows XP Professional] for machine: [AZMODAN]

                  {DATE} {TIME} INFO 7328:6 RollingLog : [Task: Firefox 35 Installer, TaskID: 1, ProcID: 7328] : Discover: Discovering machine: [AZMODAN] using it's known ip address [192.168.110.130]...

                  (...)

                  and (if discovery is successful):

                  {DATE} {TIME} INFO 7328:6 RollingLog : [Task: Firefox 35 Installer, TaskID: 1, ProcID: 7328] : TargetMachineContainer.MachineTargetOS: Operating System is: [Microsoft Windows XP Professional] for machine: [AZMODAN]

                  {DATE} {TIME} INFO 7328:6 RollingLog : [Task: Firefox 35 Installer, TaskID: 1, ProcID: 7328] : SyncPolicyTask: Synchronizing policy with the command: [C:\Program Files\LANDesk\LDClient\PolicySync.exe -taskid=1], to machine: [AZMODAN]
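As a rough aid for reading that log, the sketch below scans log lines for the two message shapes quoted above and lists machines that were discovered but never received a PolicySync command. The regular expressions are assumptions derived only from those sample lines, and the machine name HYPOTHETICAL-PC and its address are invented for the demo; adjust the patterns if the wording in your log differs.

```python
import re

# Patterns inferred from the sample log lines above (an assumption, not a spec).
DISCOVER_RE = re.compile(r"Discover: Discovering machine: \[([^\]]+)\]")
SYNC_RE = re.compile(
    r"SyncPolicyTask: Synchronizing policy with the command: \[.*?\], "
    r"to machine: \[([^\]]+)\]"
)


def summarize(lines):
    """Return (discovered, synced) sets of machine names seen in the lines."""
    discovered, synced = set(), set()
    for line in lines:
        match = DISCOVER_RE.search(line)
        if match:
            discovered.add(match.group(1))
            continue
        match = SYNC_RE.search(line)
        if match:
            synced.add(match.group(1))
    return discovered, synced


def never_synced(lines):
    """Machines with a discovery attempt but no policy sync command."""
    discovered, synced = summarize(lines)
    return sorted(discovered - synced)


# Demo data: the first two lines mirror the samples above, the third is made up.
SAMPLE = [
    r"INFO 7328:6 RollingLog : Discover: Discovering machine: [AZMODAN] "
    r"using it's known ip address [192.168.110.130]...",
    r"INFO 7328:6 RollingLog : SyncPolicyTask: Synchronizing policy with the "
    r"command: [C:\Program Files\LANDesk\LDClient\PolicySync.exe -taskid=1], "
    r"to machine: [AZMODAN]",
    r"INFO 7328:6 RollingLog : Discover: Discovering machine: [HYPOTHETICAL-PC] "
    r"using it's known ip address [192.0.2.10]...",
]

print(never_synced(SAMPLE))  # prints ['HYPOTHETICAL-PC']
```

For a real log, the same functions can be fed the lines of the file (for example via `open(path, errors="replace")`); a long `never_synced` list would be consistent with discovery stalling after the first few machines.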

                   

                   

                  2 / 3 - Run Wireshark and/or ProcMon. You want to ideally have both ...

                   

                  Wireshark will tell you about what ACTUALLY makes it over the wire.

                   

                  ProcMon will tell you what gets ATTEMPTED by the OS (note that I didn't see/catch any failures when I ran into that issue).

                   

                  4 - ... and evaluate what you get.

                   

                  The background discovery process is super basic, so not a lot can go wrong there from our end (either something does or doesn't respond). The log & the traces should tell you which is the case.

                   

                  Now - in my particular customer's case, the following caused some head-scratching:

                  - The PolicyTaskHandler-log stated that we (our process) were trying to contact the correct IPs. So far so good.

                  - Wireshark was NOT seeing those packets being sent over the network.

                  ... if memory serves (can't find my notes), I seem to recall that ProcMon showed that we were trying to send the UDP packets (from an OS level at any rate).
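The comparison described here (addresses the log says were tried versus addresses that actually show up in the capture) can be sketched as follows. It assumes the capture was first filtered down to the discovery traffic and exported from Wireshark as CSV with the default "Destination" column; the log pattern is inferred from the sample line earlier in this post, and every address other than 192.168.110.130 is invented for the demo.

```python
import csv
import io
import re

# Pattern inferred from the sample "Discover:" log line (an assumption).
LOG_IP_RE = re.compile(
    r"Discovering machine: \[[^\]]+\] using it's known ip address \[([0-9.]+)\]"
)


def ips_attempted(log_lines):
    """IP addresses the log claims a discovery was attempted against."""
    found = set()
    for line in log_lines:
        match = LOG_IP_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


def ips_on_wire(csv_file):
    """Destination addresses present in a Wireshark CSV export."""
    return {row["Destination"] for row in csv.DictReader(csv_file)}


def attempted_but_not_seen(log_lines, csv_file):
    """Addresses in the log with no matching packet in the capture."""
    return sorted(ips_attempted(log_lines) - ips_on_wire(csv_file))


# Demo data: the second log line and the Core address are hypothetical.
SAMPLE_LOG = [
    "Discover: Discovering machine: [AZMODAN] using it's known ip address "
    "[192.168.110.130]...",
    "Discover: Discovering machine: [HYPOTHETICAL-PC] using it's known ip "
    "address [192.0.2.10]...",
]
SAMPLE_CSV = (
    '"No.","Time","Source","Destination","Protocol","Length","Info"\n'
    '"1","0.000000","192.168.110.1","192.168.110.130","UDP","60","demo"\n'
)

print(attempted_but_not_seen(SAMPLE_LOG, io.StringIO(SAMPLE_CSV)))
# prints ['192.0.2.10']
```

Anything this reports was attempted according to the log but never appeared in the capture, which is the pattern described above.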

                   

                  ... so in that situation, "something" was intercepting those UDP discovery packets. Since they were being intercepted / blocked by something (likely as a false-positive possible network flood prevention type tool?), the clients in question wouldn't run policy sync ... and so the distribution slowed down.

                   

                  If my workaround (the 1-line CUSTOM SCRIPT calling PolicySync "manually") works, you've got the same sort of issue & need to talk to your network people. Likely something on their side is blocking our messages (probably out of a misplaced sense of "this is a DDOS about to happen" or so). From our side, it's "just" a UDP packet, and other than a wrong IP address, there's not much that can go wrong.

                   

                  On a network level though, there's plenty of things that can (/do?) intercept that sort of stuff. So you'll need to trace a bit on your side & see what's what.

                   

                  Does this somewhat expanded explanation help you be a bit more confident in doing those traces (assuming you know how to work with ProcMon / Wireshark)?

                  • 6. Re: Active Installs Is Low
                    jpozucek Apprentice

                    This is happening with regular pushes and not just policy supported pushes.  Also, the task processes and gets a number of successes, and then probably 90% of all devices pushed to fail at once with the "Cannot Find Agent" result.  Turning off "Accelerated Push" gets past this.  It's quite frustrating.  The same weekly reboot task that starts at 1am and used to hit 5000+ devices on 9.5 SP3 in about an hour now only hits about half as many by 8am.  Plus I don't know why it's not immediately failing "Off" devices.

                     

                    I'm going to SU5 tomorrow so we'll see if there is any effect.

                    • 7. Re: Active Installs Is Low
                      phoffmann SupportEmployee

                      Yeah - but 9.5 worked QUITE differently (and a LOT slower).

                       

                      9.6 saw a complete overhaul of the software dist tech side on the back-end, and when we say "accelerated push", we mean "very effing fast". One key component of that is touching all those clients to tell them "oi - go check policies - now".

                       

                      In the few instances we've seen so far of that stuff running into issues, it's ALWAYS ended up being network-config / security related. We give you the option to turn it off if needed ... if networking stuff blocks us, there's not much we can do about it, is there.

                       

                      I wouldn't expect SU5 to change the behaviour here, since I doubt it's "us" that's tripping over here. I don't expect anything to be broken. I'd just expect what we've seen a few times now - network environments being secured / tightened / controlled to a point where they (mis-?)interpret our UDP "flood" as a network attack or so.

                       

                      Anyway - you have the information you need here now, I hope.

                      • 8. Re: Active Installs Is Low
                        MarXtar ITSMMVPGroup

                        Paul, can this information be fed through to the frontline support guys please? They are simply throwing out 'it's your network' even to people that have zero special measures in place to block UDP floods.

                         

                        It's not an easy thing to fix, but the symptoms are very simple to recognise. Too much finger pointing to the network with not enough supporting information to help people resolve it. Your detailed response was great and helps set the scene well, but that quality of information is not making it to the customers and causes a great deal of frustration.

                         

                        From a customer perspective, Accelerated Push was touted as being fantastic, so fantastic in fact that it immediately became the default in the product. However, when it fails due to network conditions it appears to be the product at fault, and unless the information about network dependencies is communicated well at the very beginning, it doesn't look good.

                         

                        On a side note, I wonder how many potential customers might trial the system, be impacted by this issue and simply think the product isn't working? They probably wouldn't have the knowledge that there is even an alternative and are they really going to be motivated to try taking this through a support channel?

                        • 9. Re: Active Installs Is Low
                          jpozucek Apprentice

                          First I was told it's my version; now I'm told upgrading won't fix it, it's my network.  I agree with MarXtar: saying "it's your network" and washing your hands of it is not a solution.  When you implement new methods of distribution shouldn't you also be publishing new network requirements?  There was nothing in the prereqs that indicated there would be any additional network changes or configurations.  If this is such a common occurrence there should be a fix or at least docs in place for it.

                          • 10. Re: Active Installs Is Low
                            jpozucek Apprentice

                            Upgraded to SU5 and no change: still slow and still massive failures with "Accelerated Push" checked.  Forwarded the "UDP Flood" info to the Network team.  They do not believe there is anything in place and nothing is triggering any alarms, but they will look into it.  Opened a formal ticket with LANDESK; hopefully I will get some clue as to what to look for.

                            • 11. Re: Active Installs Is Low
                              MarXtar ITSMMVPGroup

                              Are you running on VMware? Is there any chance this might help?

                               

                              https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2039495

                               

                              I don't have access to a system with the issue.

                              • 12. Re: Active Installs Is Low
                                dgonzalez Apprentice

                                I will try this to see if it helps, thanks!

                                • 13. Re: Active Installs Is Low
                                  dgonzalez Apprentice

                                  I am not using the same adapter, but I took a look at the driver properties and increased the transmit buffers from 512 to 1024. It seems like more jobs are actually starting now.

                                   

                                  I tested it on a small batch of 212 workstations. Before the change, it placed 77 in pending; after the change, cancelling and retrying the job, only 58 went to pending. I went through the list and most of those in pending seem to be off at this time. I would have to do further testing at a later time to confirm.

                                  • 14. Re: Active Installs Is Low
                                    phoffmann SupportEmployee

                                    So - a couple of responses here. Been rather busy on the road & trying to catch up...

                                     

                                    ==========

                                    MarXtar - comments/questions:

                                     

                                    • Q: Can this be shared with support?
                                    • A: Yes - it already is / has been. It's both available among the "higher echelons" (the folks who worked with me to debug the original instance I ran into) and in our internal knowledge bases. However, that may not necessarily be findable among the similar symptoms (if you're looking for a "software distribution doesn't work" type situation, a "CBA discovery is getting blocked" type article may not immediately seem related, as an example). Issues like this tend to get escalated in support and will bump into the folks who've worked them in the past & (hopefully) 1+1 gets put together.

                                     

                                    So the information is shared (to some individuals proactively) and it exists in our own internal KB's / records.

                                     

                                    However, we can't flood everyone's e-mail with 1,000s of instances of "hey - ran into issue X that looked like so", as that'd defeat the whole point of searchable knowledge bases. All it'd result in would be clogged up e-mail inboxes that no one would read (I get enough e-mail per day as is - can happily do without 1,000 additional notifications per day).

                                     

                                    • Interesting find / share on that VMWare article. MOST interesting. Another possibility for this situation (which wouldn't rely on anti-DDoS watchdogs), in addition to some of the others we've stumbled across (notice how the following article -- How to troubleshoot Agent Discovery -- calls out specifically Symantec Endpoint Protection as a potential cause). But there's far more out there that's capable of causing problems that we won't be able to track down (as all we'd see from our end is "UDP packet doesn't make it to the client").

                                     

                                    Certainly interesting to see VMWare having issues with that sort of stuff. That one's a keeper for sure. Big thanks!

                                     

                                    ==========

                                    jpozucek  - I understand your frustration. Believe me I do (this comes from someone who spent the best part of 3 months trying to make sense of that particular issue).

                                     

                                    As far as network requirements go ... for this situation, it's as simple as "don't have stuff in place that'd block UDP traffic from the Core". It's not much of a requirement - but stuff that blocks it, still happens.

                                     

                                    For instance ... notice how the following article -- How to troubleshoot Agent Discovery -- specifically calls out Symantec Endpoint Protection as a known problem (that wasn't me who had that one / figured that one out). But for every occasional cause that we DO find out about, there's going to be dozens of other external factors that can cause problems.

                                     

                                    Here's a quick list of the sort of "WTF?" nonsense that doesn't make intuitive sense, that I've run into and that stuck in my memory (not related to your issue per se - just making a point and sharing some stories):

                                    • A particular customer's site was "self-sabotaging" the OSD / OS Provisioning process on that site due to their choice of VOIP phones. Turned out that this model / manufacturer loved to make them all be DHCP servers of their own for some weird reason - thus causing race conditions for legitimate PXE booting (because they responded to PXE DHCP requests as well ...).
                                    • Various endpoint security / AV products have been picking up network / DDoS protection elements over the years - seen that cause problems with stuff like Peer Download (well - the subnet broadcasts at any rate, which amounts to the same) and similar things.
                                    • Web-caching appliances interfering with / causing problems with anything from Provisioning, software distribution and pretty much "anything" related to network stuff (usually helped by putting in exclusions for the Core / Package servers specifically).

                                     

                                    There's a lot of stuff that can / "may" cause you trouble for the weirdest of reasons. It's not *ALL* web-caching appliances (to use the last example for instance) and not in *ALL* cases. A lot of it comes down to individual config / preference.

                                     

                                    So a "compatibility list" is really not a feasible item either, as a lot can be broken (or fixed) with a few configuration changes (once you know where it is).

                                     

                                    • If it'll help, I can try to see if I can find the ProcMon traces I did (by and large it boiled down to the easiest option being to filter down to UDP traffic if memory serves) a year or so ago / any notes I may have still got & share that trail so you can check if you see the same / can hand some useful information to your network folks (in addition to what I posted above).
                                    • ... long story short was that I could/did prove that "as far as the Core side tech was concerned" we were trying to send the UDP packets to the correct IPs (as per the database entries) ... and once they weren't seen on the network, it was out of my hands, as either VMWare (as per the interesting article that MarXtar posted above), an AV-product that's overly protective (whether "as configured" or "by default without giving the user an option") or a plethora of other factors could each be at fault.

                                     

                                    ... we can only trace as far as "we're asking the OS to send a UDP packet to X" (which is pretty much as basic an OS operation as you can get) ... after which point the OS stack and "anything that chooses to interfere" gets in the way.

                                     

                                    If you're running a virtualised Core, I'd certainly recommend you try out the above VMWare article, as it seems to have quite a bit of merit to it.

                                     

                                    ==========

                                     

                                    Let me know if I need to try and rummage through some old archives ... at worst, I *should* (I hope) still have some screenshots of the evaluated traces ...

                                     

                                    Right - hope that helps / all makes sense? Apologies for the wall of text, but had quite a bit to get back on.
