We have two connections to another network in separate buildings on the same campus.
By default, traffic goes through the firewall/gateway and over the VPN tunnel; the firewall then sends an ICMP redirect so that the second and subsequent packets take the faster, direct "pipe".
Main Gateway/Firewall: 10.0.12.1 --> 10.0.2.1
Faster Connection: 10.0.12.3 --> 10.0.2.3
Yes, it would be nice if the first packet went the "direct" way, but Ubuntu and the Cisco gateway we use don't seem to handle static routes the same way our older, slower one did, and we haven't found a direct solution. The routing technically works, but NFS mounts are failing between the Ubuntu NFS servers on the 10.0.12.x network and the ESXi servers on the 10.0.2.x network.
The ESXi hosts have a static route, which they honor from the first packet.
However, the Ubuntu (and sometimes Mac) clients on the 12.x network ask the 10.0.12.1 firewall, which routes the first packet through the gateway and only then sends the ICMP redirect pointing later packets at the "direct" 10.0.12.3 connection. This makes the NFS stores inaccessible to the ESXi servers, whose first packet comes back the "long" way. We can reconnect manually, but a certain number of hours later, when the Ubuntu NFS server's redirect-learned routes expire, the problem shows up again.
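Since the stale-route behavior comes from ICMP redirects expiring, a complementary approach is to stop the Ubuntu hosts from accepting redirects at all, so routing stays deterministic. This is a sketch, not the fix this write-up settles on, and the filename is hypothetical; check that ignoring redirects fits your environment before rolling it out:

```
# /etc/sysctl.d/99-no-redirects.conf  (hypothetical filename)
# Ignore ICMP redirects; rely on explicit static routes instead.
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
```

Load it with "sudo sysctl --system". With redirects ignored, an explicit static route (like the Netplan one below) becomes the only path to 10.0.2.0/24.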
Since the Ubuntu 18.04 NFS servers use Netplan, we had to use the new YAML config file format.
$ route -v
< confirms the direct route has dropped out>
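The check above can be scripted so a cron job or monitoring hook can catch the route dropping out. A minimal sketch: the function name and the idea of piping route -n into it are ours, not from the original write-up.

```shell
#!/bin/sh
# check_direct_route DEST GATEWAY
# Reads a `route -n`-style kernel routing table on stdin and succeeds
# only if some row routes DEST via GATEWAY.
check_direct_route() {
  awk -v dst="$1" -v gw="$2" '$1 == dst && $2 == gw { found = 1 } END { exit !found }'
}

# Example use (uncomment on a live host):
# route -n | check_direct_route 10.0.2.0 10.0.12.3 \
#   && echo "direct route present" || echo "direct route MISSING"
```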
$ cd /etc/netplan/
$ sudo vi 50-cloud-init.yaml
Being VERY careful to use spaces (not tabs), and to keep the indentation perfect, add the following routes section under the interface, and save:
      routes:
        - to: 10.0.2.0/24
          via: 10.0.12.3
          metric: 10
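For context, here is where that section sits in a full 50-cloud-init.yaml. This is a sketch: the interface name (eth1) is taken from the route output later in this post, and the server's own address and default gateway are assumptions; substitute your real values.

```yaml
# /etc/netplan/50-cloud-init.yaml -- full-file sketch.
# eth1, the 10.0.12.176 address, and gateway4 are assumptions.
network:
  version: 2
  ethernets:
    eth1:
      addresses:
        - 10.0.12.176/24
      gateway4: 10.0.12.1
      routes:
        - to: 10.0.2.0/24    # the remote campus network
          via: 10.0.12.3     # the direct, faster link
          metric: 10
```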
Simply restarting the network doesn't re-read the Netplan config (that's what "sudo netplan apply" does), and we wanted to know that the route would stick across a reboot anyway.
$ sudo reboot
After the reboot, we performed another route test, and confirmed the route "stuck".
$ sudo route -v
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default _gateway 0.0.0.0 UG 0 0 0 eth1
10.0.2.0 10.0.12.3 255.255.255.0 UG 10 0 0 eth1
10.0.12.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
Returning to the ESXi hosts, we reconnect the NFS stores. List the NFS stores to confirm they aren't already "back".
[root@mwvhost1:/] esxcfg-nas -l
< shows two of the NFS servers still disconnected>
[root@mwvhost1:/] esxcfg-nas -r
< nothing output>
Check again with:
[root@mwvhost1:/] esxcfg-nas -l
< shows all the NFS shares connected properly >
If you deleted the NFS stores, you'll have to re-add them. In addition, the VCSA will remember the "inaccessible" stores under their old names, and re-added stores with the same name will get a "(1)" appended. Before you re-add the NFS stores, rename the inaccessible ones with a suffix like "dead"; the stale entries stick around until all the hosts have been rebooted.
ESXi Helpful Commands:
List static routes on ESXi servers:
[root@mwvhost1:/etc] esxcfg-route -l
Add the NFS store back to an ESXi host:
[root@mwvhost1:/etc] esxcfg-nas -a -o 192.168.22.176 -s /local2/ MW12NFS03
Review recent log:
[root@mwvhost1:/etc] tail /var/log/vobd.log
Test a NAS Port:
[root@mwvhost1:/etc] nc -z 10.0.12.176 2049
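The single nc test above can be wrapped in a loop to sweep the NFS port on every server at once. This sketch targets a regular Linux/bash shell rather than the ESXi busybox shell; the host list is an assumption, and it uses bash's /dev/tcp in place of nc so it also works where nc isn't installed:

```shell
#!/usr/bin/env bash
# Sweep the NFS TCP port (2049) across a list of NFS servers.
# The host list is hypothetical -- substitute your own servers.

check_port() {
  # Succeeds if a TCP connection to host $1, port $2 opens within 2 seconds.
  timeout 2 bash -c ">/dev/tcp/$1/$2" 2>/dev/null
}

for host in 10.0.12.176 10.0.12.186; do
  if check_port "$host" 2049; then
    echo "$host: NFS port open"
  else
    echo "$host: NFS port unreachable"
  fi
done
```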
Test that Jumbo Frames are making it through, using a jumbo ping. Use -d (Don't Fragment) so the test can't silently succeed via fragmentation, and a payload of 8972 bytes, since 8972 + 28 bytes of ICMP/IP headers = a 9000-byte frame:
[root@mwvhost1:/etc] vmkping -d -s 8972 10.0.12.186