A Small Connection Headache with an Intranet Host
A while ago at work, I ran into a slightly annoying issue. From my Mac laptop, which was on the company intranet, I used SSH to connect to another machine on the same intranet, a DGX Spark. After the connection had been working normally for a while, say one or two hours, it would suddenly drop. At first I wondered if the host had gone to sleep, so I operated the host locally and tested the connection again. Once I did that, the connection between my laptop and the host would recover.
After trying this a few times, I came up with a temporary workaround: whenever pinging the host’s IP from my laptop started returning “Request timeout”, I would go to the host, operate it directly, and ping my laptop from there. As soon as I did that, my laptop’s connection to the host would immediately go back to normal. It worked every single time, but only for about one or two hours before I had to do it again.
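For reference, here is a minimal sketch of how the "is it disconnected again?" check could be automated from the laptop. It swaps in a TCP connection attempt to the SSH port instead of ping, which is not what I actually used at the time; the function names and the default port and timeout values are assumptions for illustration.

```python
import socket
import time


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and "no route to host".
        return False


def watch(host: str, checks: int, interval: float = 60.0) -> None:
    """Check the host `checks` times and print a line whenever its state changes."""
    was_up = None
    for _ in range(checks):
        up = ssh_port_reachable(host)
        if up != was_up:
            state = "reachable" if up else "UNREACHABLE"
            print(time.strftime("%H:%M:%S"), host, state)
            was_up = up
        time.sleep(interval)
```

Calling something like `watch("192.168.1.50", checks=120)` (a placeholder IP) would cover roughly two hours at one check per minute, which is about how long the connection used to last.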
sequenceDiagram
participant Mac as Mac laptop
participant Host as DGX Spark host
Mac--xHost: SSH drops / ping times out
Note over Host: Operate the host locally
Host->>Mac: Ping my Mac from the host
Mac->>Host: Connection recovers
My teammates were using Windows, and I was the only one on a Mac. They said they had not run into this problem. On top of that, I did not have access to the company’s router configuration, so I could not see the lower-level routing details of the intranet. After going back and forth with GPT many times, the best guess was that Windows and macOS handle the ARP cache differently, which led to different connection behavior.
So I spent a while stuck in this endless loop: “Ah, it is disconnected again ⭢ plug a keyboard, mouse, and monitor into the host ⭢ ping my laptop ⭢ connection works.” But this approach was way too troublesome. When I discussed it with GPT, it suggested another method: write the host’s IP / MAC address mapping directly into my laptop as a static ARP entry, so macOS would no longer need to rely on dynamic ARP lookup. One thing to watch out for, though: at one point, I switched the host to wired networking to make downloading LLM models faster. Then I realized that the host’s Wi-Fi and wired interfaces each have their own MAC address and could each receive a different IP. So the wired interface needed its own mapping on my laptop before I could connect over it.
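As a rough sketch of what that mapping looks like in practice: macOS ships an `arp` command whose `-s` option adds a static entry for an IP and MAC pair. The helper below only builds and prints those commands, one per host interface, so they can be reviewed before running them by hand. Every address in it is a made-up placeholder, and the helper name is an assumption for illustration.

```python
import ipaddress
import re

# Matches a MAC address written as six colon-separated hex pairs.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")

# Hypothetical placeholder values: replace with the host's real IP and MAC
# for each interface (Wi-Fi and wired have different MAC addresses).
HOST_INTERFACES = {
    "wifi": ("192.168.1.50", "aa:bb:cc:dd:ee:01"),
    "wired": ("192.168.1.51", "aa:bb:cc:dd:ee:02"),
}


def static_arp_command(ip: str, mac: str) -> list[str]:
    """Build the macOS command that pins `ip` to `mac` in the ARP table."""
    ipaddress.IPv4Address(ip)  # raises ValueError if the IP is malformed
    mac = mac.lower().replace("-", ":")
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ["sudo", "arp", "-s", ip, mac]


if __name__ == "__main__":
    for name, (ip, mac) in HOST_INTERFACES.items():
        print(name, "->", " ".join(static_arp_command(ip, mac)))
```

As far as I know, entries added this way do not survive a reboot of the laptop, and they go stale if the host is handed a different IP, so this is a stopgap rather than a permanent fix.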
flowchart LR
host["<div class='dgx-gold-metallic'>DGX Spark host</div>"]
subgraph laptop["Mac laptop"]
arp["Add the host mapping to the laptop<br/>Host IP + MAC address"]
end
laptop -->|Find the host directly and connect| host
style laptop fill:#e8f2ff,stroke:#3b82f6,color:#1e3a8a
classDef dgx fill:transparent,stroke:transparent,color:#3f2a00
class host dgx
Not long after that, I found an even more convenient solution: connecting through the host’s mDNS name. In other words, the host advertises its .local hostname over mDNS (multicast DNS) on the local network. Any device on the same local network can then connect to the host directly with that name, without knowing its current IP. This completely solved the problem for me. Later, I realized that the .local name of the DGX Spark was printed right on the cover of its manual. The answer had been right in front of me the whole time, haha.
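To make the idea concrete, here is a small sketch of resolving a .local name from the laptop. It assumes the operating system's resolver handles mDNS lookups for .local names, which macOS does out of the box. The hostname `my-dgx-spark` used in the note afterwards is a made-up placeholder, not the real name from the manual.

```python
import socket


def to_mdns_name(hostname: str) -> str:
    """Append the .local suffix unless the name already has it."""
    name = hostname.strip().rstrip(".")
    return name if name.lower().endswith(".local") else name + ".local"


def resolve(hostname: str, port: int = 22) -> list[str]:
    """Return the addresses the system resolver finds, or [] if there are none."""
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

For example, `resolve(to_mdns_name("my-dgx-spark"))` should return whatever address the host currently has, whether it is on Wi-Fi or wired, which is exactly why the name is more convenient than a hard-coded IP / MAC mapping. The same name works directly as the SSH target.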
flowchart LR
subgraph lan["Same local network"]
host["<div class='dgx-gold-metallic'>DGX Spark host</div>"]
mdns(("mDNS multicast<br/>Broadcasts hostname and IP"))
laptop["Mac laptop"]
other1["Other device"]
other2["Other device"]
end
host -.->|Broadcasts| mdns
mdns -.->|Devices on the same network can discover it| laptop
mdns -.-> other1
mdns -.-> other2
laptop -->|Connect with the .local hostname| host
classDef mac fill:#e8f2ff,stroke:#3b82f6,color:#1e3a8a
classDef dgx fill:transparent,stroke:transparent,color:#3f2a00
class laptop mac
class host dgx
Postscript: I originally thought setting up mDNS would be the end of this issue. But later, after the host’s driver version was updated, the disconnect problem disappeared too. All the workarounds above could have been skipped. It was a completely unexpected, brute-force fix 🙃.