I am having problems PXE booting ESX VMs on 2017.3. This morning I had VMs that would not provision because they could not get an IP address for the template. I spent the morning working with various drivers and finally got that to work. Then I went to reload a couple more VMs that happened to be on VM version 8. Knowing that I have had previous lockup problems with Windows 10 on VM version 8, I opted to upgrade the hardware to the currently available version 11. Now when I try to PXE boot that newly upgraded system, I cannot load the boot.sdi file. It ends up crashing with the message "A required device isn't connected or can't be accessed." I went through all the postings about changing the TFTP block size, but unless it takes time for that change to replicate, those options didn't help. At the moment, I am in a pickle: either stay on hardware version 8, which locks up machines randomly, or have machines that can't PXE boot. Assistance is greatly appreciated. Thanks.
I have moved this over to: All Places > All Products > Endpoint Manager and Endpoint Security (EPM) (Powered by LANDESK) > OS Deployment and Provisioning > Discussions
The original space it was posted to (Our Community and finding Advice) tends to just get visibility for Community related website requests/questions - and usually not topics regarding specific products.
If this is not the right product for your question, please reply with more details and I will help get it in front of the right audience.
This is the right place(-ish) to put this up.
This is the first time I've (personally) heard of VMware's configured hardware level causing issues, though. Interesting.
Drivers (disk & NIC being particularly egregious elements) causing trouble per se is nothing new, and sadly "one of those things" that Provisioning environments (well - WinPE) need to cope with. And we're at the mercy of the quality of the drivers from the relevant vendors.
That said, in my (limited, granted) personal experience, VMWare is USUALLY quite benign on this front.
Now BOOT.SDI is (as I understand it) effectively the "empty RAMDISK" ... so you're failing BEFORE you're even loaded into WinPE then?
As an aside, you may want to give the support folks a poke & go over your ESX configuration with them. I know that they've got access to "some" ESX versions (not sure which), but if they can replicate your specific config & replicate the specific behaviour, that would certainly be a big step towards having that resolved.
Could be that you may want to play with the BIOS options, mainly around the HDD options, as that may be where the problem's from (I figure the NIC adapter is 'good enough' to load the PXE image, so anything after this should at least be of the "I can't get an IP address" type failure if it's related to that).
Hope that helps a bit?
I'll get with my server guys on this. I had a feeling it would be something bigger than a driver, but since I spent all day getting the network driver to work right and this worked before, I thought I'd give it a try before they come back to me and tell me to investigate it from this side. The nice part is that there seems to be a workaround in that if they recreate the VM with the older hardware version, we're back in business and then once it's loaded, we can upgrade. Not ideal, but it works.
Aye - having a workaround that removes the pressure is a huge boost.
Still would be nice to figure this out, but yeah - identifying a workaround so you can "keep the lights on" as it were, is absolutely the right thing to do.
An update since I went to see one of the admins about it. It does not appear to be the VM version after all, which is the good news. He moved the machine to another host that would hit another rep and it worked. The bad news is that the problem isn't VM specific in that now I have other machines that can't PXE with the same message, and these are hitting a different rep.
A BIOS setting within the VM (based on hardware mode)?
Or something of a defect (that was fixed) within ESXi itself that may be "aberrant" because some servers are on a newer version of ESXi than others?
It gets even more strange. I moved a system from the test bench to my desk, and the only difference is that they use different switches. The one on the bench had a problem, the one on my desk boots fine. Strange thing to pop up after changing core servers.
Through various configuration trials, it seems that this is a problem on our Cisco SG-100 business class switches. On SD-100 desktop switches, it seems to work. I don't know what to make of it, but it looks like it is not strictly an Ivanti or VMware problem; perhaps it is a protocol problem or a change between versions.
Might be worth throwing at your network team?
Might be that Cisco have a known defect (and ideally a fix) for the firmware, or perhaps a setting that's different on the two switches?
Wireshark may help (to get a comparison with how packets look when they go back & forth potentially) both before & after the switch ... but that'd be something to work on with Cisco support.
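For anyone doing that packet comparison, here is a minimal Python sketch of what the TFTP block size negotiation looks like on the wire, based on the standard TFTP option extension (read request with a "blksize" option, answered by an option acknowledgment). The function names are made up for illustration, and this is not how the Ivanti PXE components are implemented; it only helps with recognising the two packets in a capture.

```python
import struct

OP_RRQ = 1   # TFTP read request opcode
OP_OACK = 6  # TFTP option acknowledgment opcode


def build_rrq(filename, blksize, mode="octet"):
    """Build a TFTP read request that asks for a specific block size."""
    parts = [filename, mode, "blksize", str(blksize)]
    # Every field in the request is a NUL-terminated ASCII string.
    body = b"".join(p.encode("ascii") + b"\x00" for p in parts)
    return struct.pack("!H", OP_RRQ) + body


def parse_oack(packet):
    """Return the options a server acknowledged, as a dict of strings."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OP_OACK:
        raise ValueError("not an OACK packet (opcode %d)" % opcode)
    if not packet.endswith(b"\x00"):
        raise ValueError("OACK options are not NUL-terminated")
    fields = packet[2:].split(b"\x00")[:-1]  # drop the empty tail
    if len(fields) % 2:
        raise ValueError("odd number of option fields")
    return {
        fields[i].decode("ascii").lower(): fields[i + 1].decode("ascii")
        for i in range(0, len(fields), 2)
    }
```

In a capture taken on each side of the switch, the thing to compare is the block size the client requests, the block size the server acknowledges, and whether the data packets that follow actually arrive on the far side.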
Yeah, it is that time. I just wanted to round out the discussion with my findings.
Oh no problem - useful info if anyone else will run into something similar (and I'll highlight your enlightening post as the answer as well).
Absolutely the right thing to do.
What's interesting is that I run into the same issue in my lab. It doesn't happen ALL of the time and it usually only happens after a reboot to flash the BIOS during PE. Most of the time it boots right back to PE, but every once in a while it runs into that error and just powers the machine down.
It turns out in my case, this was related to the TFTP block size. In 9.6, it was apparently fixed at 1456, and in 2017.3 it is variable. However, it does not start out at the most compatible size, it starts out at the largest size. I had changed it previously, but when it didn't appear to work, I changed it back. It didn't work because I had to either A) wait for the polling interval for the reps to update, or B) restart the LANDesk Targeted Multicast service, and then try it.
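As a rough illustration of why a smaller block size tends to be the "most compatible" one, here is a short Python sketch of the arithmetic. It assumes a standard 1500-byte Ethernet MTU and an IPv4 header without options; the 16384 value in the usage note is only a hypothetical "large" size, not the actual 2017.3 default. Whether fragmentation is what the switches in this thread choked on is also an assumption, not something confirmed above.

```python
IP_HEADER = 20        # IPv4 header without options (assumption)
UDP_HEADER = 8        # UDP header
TFTP_DATA_HEADER = 4  # 2-byte opcode + 2-byte block number


def max_unfragmented_blksize(mtu=1500):
    """Largest TFTP block size that still fits in one IP packet."""
    return mtu - IP_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def fits_in_one_frame(blksize, mtu=1500):
    """True if a full data block travels without IP fragmentation."""
    return blksize <= max_unfragmented_blksize(mtu)


def blocks_needed(file_size, blksize):
    """Number of data packets for a transfer of file_size bytes.

    A TFTP transfer ends with a block shorter than blksize, so a file
    that is an exact multiple still needs one final (empty) block.
    """
    return file_size // blksize + 1
```

With these assumptions, fits_in_one_frame(1456) is true while fits_in_one_frame(16384) is false, so every full data block at the larger size has to be split into IP fragments, and any device on the path that mishandles fragments can break the download of boot.sdi.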