Massive props for getting it done anyway. For others reading: in general a switch should never run dhcpd itself, but it will normally/often relay DHCP for you; your Aristas would 100% have supported relaying, though in this case it sounds like it might even be flat L2. Normally you'd host dhcpd on a server.
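For example, on EOS the relay is just a helper address on the gateway interface for the subnet (the interface and addresses here are made up):

    interface Vlan100
       ip address 10.0.100.1/24
       ! relay DHCP broadcasts from this subnet to the dhcpd server
       ip helper-address 10.0.0.5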
Some general feedback in case it's helpful:
-20K on contractors seems insane if we're talking about rack and stack for 10 racks. Many datacentres can be persuaded to do it for free as part of you agreeing to sign their contract. Your contractors should at least be using a server lift of some kind, again often kindly provided by the facility. If this included paying for server configuration and so on, then ignore that comment (bargain!).
-I would almost never expect to actually pay a setup fee to the datacentre either (beyond something nominal like $500 per rack); if you are going to pay that fee, it had better include rack and stack.
-A crash cart should not be used for an install of this size; the servers should be plugged into the network and then automatically configured by a script/iPXE. It might sound intimidating or hard, but it's not, and it doesn't even require IPMI (though frankly I would strongly, strongly recommend IPMI if you don't already have it). I would use managed switches for the management network too, for sure.
-Consider two switches, especially if they are second hand. Even across the street, the cost of the cluster being unusable for a few days while you source and install a replacement is probably still thousands.
-Personally I'm not a big fan of the whole JBOD architecture and would have just filled my boots with single-socket 4U Supermicro chassis. To each their own, but JBOD's main benefit is a very small financial saving at the cost of quite a lot of drawbacks IMO. YMMV.
-Depending on who you use for GPUs, getting a private link or 'peering' to them might save you some cost and provide higher capacity.
-I'm kind of shocked that FMT2 didn't turn out much cheaper than your current colo; I would have expected less than those figures, possibly with the 100G DIA included (normally about $3,000/month, no setup fee).
def agree on the setup fees, that was just a price crunch to get it done within the weekend. (too short-notice for professional services, too sensitive for craigslist, so basically just paying a bunch of folks we already knew and trusted)
for iPXE do you have any reference material you'd recommend? we had 3 people, each with reasonably substantial server experience, try for like 6 hours each, and for whatever reason it turned out to be too difficult.
I have done a ton of iPXE boot setups in the past. We use iPXE at our DC location for imaging, system recovery, etc. In fact, I just finished up a new boot image that creates a 100MB virtual floppy drive used for BIOS updates. Reach out and I can provide the entire setup if you like (pxe config files, boot loaders, scripts, etc).
Similarly, I'm happy to share my iPXE scripts. It's just one of those things that you need to understand the fundamentals of before you start. It's about a hundred lines of bash to set up.
I assume it was their first time setting up iPXE? There are a lot of hangnails with it depending on the infra you're using it in.
For 10 racks it might not make sense.
Honestly, with 10 servers, a PXE setup is probably overkill. If you're getting used servers (and maybe even if not), you might need to poke them with a KVM to set the boot options so that PXE is an option, and you might want to configure the BMC/IPMI from the console too, and then configure anything for serial-over-IPMI / BIOS console on serial ports. Do that in your office, since your colo is across the street, and then you may as well do the OS install too. Then when you rack them, it should just work, and crash cart if not. But PXE is fun, so...
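Once the BMCs are on the network, a couple of ipmitool one-liners cover most of that console poking (the host, user, and password here are placeholders):

    # force the next boot to PXE, then power cycle
    ipmitool -I lanplus -H 10.0.1.21 -U admin -P changeme chassis bootdev pxe
    ipmitool -I lanplus -H 10.0.1.21 -U admin -P changeme power cycle
    # watch the console over serial-over-LAN instead of a crash cart
    ipmitool -I lanplus -H 10.0.1.21 -U admin -P changeme sol activate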
For PXE / iPXE, there are several stages of boot. You have your NIC's option ROM, which might be, but probably is not, iPXE. That will hit DHCP to get its own IP and also request info about where to pull boot files from. You'll need to give it a TFTP server IP and a filename (dhcpd config below).
I serve iPXE executables to non-iPXE clients. When iPXE starts up, it asks DHCP again, but now you can give it an HTTP boot script. The simplest thing is to have something like this:
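(A minimal sketch; the server IP and kernel/initrd paths are placeholders for whatever you actually host.)

    #!ipxe
    dhcp
    kernel http://10.0.0.1/boot/vmlinuz console=ttyS0,115200
    initrd http://10.0.0.1/boot/initrd.img
    boot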
You can also boot ISOs, but that's a lot easier if you're in BIOS boot rather than UEFI. Better to practice booting kernels and initrds (unless you need to boot things like firmware update ISOs). Then you'll have your installer (or whatever) booted, and you might have an unattended install set up for that, or you can just set up a rescue image that does DHCP (again!) and opens sshd so you can shell in and do whatever. Up to you.
The PXE part of my ISC dhcpd config is:
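(A sketch along the lines of the howto at [1]; the subnet, server IP, and filenames are placeholders.)

    option arch code 93 = unsigned integer 16;   # RFC 4578 client architecture

    subnet 10.0.0.0 netmask 255.255.255.0 {
      range 10.0.0.100 10.0.0.200;
      next-server 10.0.0.1;                      # tftp server

      if exists user-class and option user-class = "iPXE" {
        # second DHCP pass: iPXE itself is asking, hand it the boot script
        # (an http:// URL works here too, and is much faster for big payloads)
        filename "boot.ipxe";
      } elsif option arch = 00:07 or option arch = 00:09 {
        # UEFI clients chainload the EFI build of iPXE
        filename "ipxe.efi";
      } else {
        # legacy BIOS clients chainload undionly
        filename "undionly.kpxe";
      }
    }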
(This is mostly consolidating bits and pieces from here [1].) And I have those three files in the root of my tftp server. There's all sorts of other stuff you could do, but this should get you started. You don't really need iPXE either, but it's a lot more flexible if you need anything more, and it can load from HTTP, which is gobs faster if you have large payloads.
If you really wanted to be highly automated, your image could be fully automated: pull in config from some system and reconfigure the BMC while it's in there. But there's no need for that unless you've got tons of servers. Might be something to consider if you mass-replace your disk shelves with 4U disk servers, although it might not save a ton of time.
If you're super fancy, your colo network would have different VLANs and one of them would be the PXE setup VLAN: new servers (or servers needing reimaging) could be put into the PXE VLAN, and the setup script could move them into the prod VLAN when it's done. That's fun work, but not really needed, IMHO. Semi-automated setup scales a lot farther than people realize, a couple hundred servers at least. autopw [2] can help a lot!
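For the BMC piece, a sketch of what that fully automated step might look like from inside the PXE-booted image (the config URL, addressing, and credentials are all placeholders):

    #!/bin/sh
    # Look this box up by its serial number, then point the BMC at the
    # right address and credentials with ipmitool.
    set -e
    SERIAL=$(dmidecode -s system-serial-number)
    # hypothetical endpoint returning lines like BMC_IP=... BMC_USER=... BMC_PASS=...
    eval "$(curl -fsS "http://10.0.0.1/host-config?serial=${SERIAL}")"

    ipmitool lan set 1 ipsrc static
    ipmitool lan set 1 ipaddr "$BMC_IP"
    ipmitool lan set 1 netmask 255.255.255.0
    ipmitool lan set 1 defgw ipaddr 10.0.1.1
    ipmitool user set name 2 "$BMC_USER"
    ipmitool user set password 2 "$BMC_PASS"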
[1] https://ipxe.org/howto/dhcpd
[2] https://github.com/jschauma/sshscan/blob/master/src/autopw