Yeah - race conditions are a lot of fun with PXE booting.
As far as the PXE rep is concerned - it's "sniffing" the network for those DHCP requests at boot time (from a device attempting to PXE boot) and then tells the relevant device (after checking in with the Core) whether to boot into Provisioning (if the Core knows that the device is due to run a task).
I've *no* idea why your Gateway would be getting involved in your PXE process ... unless (for some reason) you're forwarding PXE requests across the WAN / entire corporate network? I suspect that it's not so much "the Gateway" as "forwarded traffic". Either of those scenarios would certainly cause problems (and you want to find out what's going on ... if your Gateway is forwarding DHCP requests & offers - you will want to know why that is ... that can scupper all manner of PXE boot processes, as you're experiencing).
If your gateway is responding to DHCP requests separately / off its own bat, you will want to look into that too.
IF the Gateway happens to be a "backup DHCP" / sort of redundancy factor - then that'll also risk breaking PXE (due to the race condition) ... it's a pretty time sensitive process & does not play well with stuff like multiple DHCP servers.
... as for the other matters, I'm having to rummage in the memories of "times long past" ... so this stuff is to be taken with a pinch of salt.
- PXE on DHCP server?
If I recall correctly, it's mainly a "not really recommended" type of scenario. It's not technically impossible (a couple of clunky workarounds / instructions are needed), but it's not ideal. Given that the PXE rep doesn't need to be a server, it's usually easier to just not do it (since it requires special sauce stuff for DHCP configuration).
That said, it's not IMPOSSIBLE and some smart alec figured out how to get it to work "most of the time" (I seem to recall there were situations where things weren't going as expected) ... instructions like those to "crowbar things into working when normally they shouldn't" are usually written with a mindset of "you've got limited servers in your lab - this is how you make them play nice". It's usually not a best practice for live environments.
- Single PXE on multiple subnets?
So the reason for THAT is to mitigate race conditions. Most people treat "Subnet == Broadcast domain" (which isn't always technically accurate - the two aren't necessarily the same thing). The sort of thing this is intended to address is people "simply forwarding" DHCP requests across routers / WAN - which then causes race conditions / time-outs and all manner of other nonsense.
Just easier to have a rep per broadcast domain ... stay sane.
<And yes, I've had to deal with environments where the network config even did stuff like "yeah, we forward even UDP traffic over the WAN / routers" ... which caused all manner of interesting nonsense to occur ... >
- IP Helper & CISCO
This should be the "ip helper-address" setting on Cisco's routers / switches - it relays certain UDP broadcasts (DHCP among them) to a specific address on another network. This *CAN* be helpful (when you need to forward stuff) but can get you into a whole heap of trouble too (BECAUSE you're forwarding stuff like UDP broadcast traffic which was never really expected to go beyond a router).
It's the sort of "strong-arm" approach to solving network protocol forwarding limitations ... usually not the best idea, but on occasion necessary.
... does that help at all?
That is a great explanation Paul, here are my findings so far:
I checked my firewall: we have 2 subnets, .20 and .30
Our DHCP is in .20 and all .30 requests are being relayed to it, I think that's why we get that .30.1 response.
We are not installing a PXE rep on our DHCP server, so that leaves us with 2 options:
1. A PXE rep per broadcast domain (so I would need 1 in the .20, another in .30, and in any other subnets, of which we have about 7)
2. 1 PXE rep for multiple subnets. For this, I have the following from the documentation:
- Set Option 60 in combination with option 43 if the PXE representative is not on the DHCP server, but also not in the broadcast range of the client
OK, I know how to set option 60, that's easy, but how about option 43? Option 43 seems to be very hard to configure. Where do I start? I'm completely lost with this, any ideas?
Does the article on option 43 linked in the comments help a bit? Seems like it requires an actual "proper networking head" for this:
<The link still worked when I tried it ...>
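In case it helps to demystify option 43 a little: per the DHCP options RFC it's just a string of "sub-options", each one being a code byte, a length byte and then the data, usually finished off with an end marker (255). Below is a rough Python sketch of how such a value gets put together. Pinch of salt again: the sub-option numbers (6 = discovery control, 8 = boot server list) and their values are how I remember the PXE spec, the 192.0.2.10 address is a made-up placeholder for your PXE rep, and the function names are just mine. What your PXE rep actually expects has to come from the linked article / vendor documentation.

```python
import ipaddress


def encode_suboption(code, data):
    """Encode one sub-option as: 1-byte code, 1-byte length, then the data."""
    if not 0 < code < 255:
        raise ValueError("sub-option code must be between 1 and 254")
    if len(data) > 255:
        raise ValueError("sub-option data must fit in one length byte")
    return bytes([code, len(data)]) + data


def build_option43(suboptions):
    """Join (code, data) pairs into one option 43 value, ending with 255."""
    body = b"".join(encode_suboption(code, data) for code, data in suboptions)
    return body + b"\xff"


# Hypothetical PXE rep address - replace with your own.
pxe_rep = ipaddress.IPv4Address("192.0.2.10")

payload = build_option43([
    # ASSUMPTION: sub-option 6 (discovery control), value 0x07 =
    # no broadcast, no multicast, only use the servers listed below.
    (6, bytes([0x07])),
    # ASSUMPTION: sub-option 8 (boot servers) =
    # 2-byte server type (0), 1-byte address count, then the addresses.
    (8, (0).to_bytes(2, "big") + bytes([1]) + pxe_rep.packed),
])

# Many DHCP admin tools want this entered as hex bytes.
print(payload.hex(":"))  # 06:01:07:08:07:00:00:01:c0:00:02:0a:ff
```

The point is mostly that the "hard" part of option 43 is getting the byte layout right - once you know which sub-options your PXE rep needs, the value itself is mechanical to build.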