concept: interstice: embedded, phy-agnostic meshing + ip routing

2024-03-14

library link: https://cgit.nathanperry.dev/cgit/interstice (placeholder currently)

As Jake has written about, there is a lack of easy networking / communication on embedded, prompting his implementation of MUDL (and my Rust port).

To motivate more directly: it is frankly a pain in the ass to make a microcontroller talk to anything, and communication is normally at some layer ad-hoc. As a result, nothing is reused — we don't get the nicely-layered abstraction that the OSI model suggests we should, and I reimplement a COBS encoder / decoder, message buffering, etc. in pretty much every embedded project I work on. Partially that's my problem for not standardizing my own work, but I'm also pointing a finger at the embedded world for not having any networking standards outside of microcontrollers with wireless stacks — I'm generally pretty good about using nice things when they exist.

To some extent, this problem is a function of working in the embedded world. There's a tendency to see the embedded environment as categorically different from programming with an OS available. This makes sense if your first thought when you hear the word "embedded" is a PIC-8 — I can't afford a networking stack when I'm working with at most 4KB RAM.

However, we now live in a world where you can easily get megabytes of both flash and RAM on low-cost (< $10) MCUs — that's enough to store the entire disk and process image of a regular OS executable without stripping or size optimization. We can afford to have some nice things.

In particular, I would like to have a networking stack that works with regular IP — you know, what the internet is made of. And when I say "works", I don't mean supports-being-a-leaf in the way that all the wireless networking stacks do. Rather, I mean I want to implement a routed, PHY-agnostic IP network. For IP to work as-designed, you need to be able to route packets.

aside: storage cost

I write most of my firmware in Rust, which has many of the nice bells and whistles you'd expect in an OS environment as libraries. This costs memory (on the order of hundreds of KiB flash, dozens of KiB RAM), but even so, I'm typically not space-constrained.

If storage space is a concern though, flash and RAM are cheap — if it's going to make firmware development easier, I lean heavily towards just adding more space to the board, even at the shocking cost of a whole additional couple dollars.

goal

I want to be able to power on an MCU and have it automatically advertise itself as an IP-capable router over all communication interfaces:

USB: CDC-NCM, optionally CDC-ACM as serial-ppp
UART, I2C, SPI, CAN: ppp
WiFi: native
Bluetooth, LoRa, Thread: 6LoWPAN / native Thread

By router I mean it should support:

iBGP
MLD / IGMP
ICMP{,v6}
ARP / NDP (as necessary -- most of these are point-to-point)
IP{,v6}

update (4/4/24): routing

I was thinking about proposing some version of this project for an nlnet grant and so have been reading more on prior art. There are really two components to the project: the generic network-interface implementation and bridging and routing. There is extensive prior art on the routing side that I was no more than peripherally aware of. The term of art for this problem is MANET (mobile ad-hoc network), and routing implementations differ in a few dimensions — most notably between reactive and proactive paradigms, and in node-local state overhead. For my use-case, a proactive routing algorithm that does not require global network knowledge is ideal, as I value both low latency and a minimal memory footprint.

I have considered both BATMAN and Babel in some depth.

BATMAN(-advanced) operates at layer 2 and has an in-tree Linux kernel implementation. Its goal as described in its draft RFC is to discover the best next-hop for each route, rather than computing the route in its entirety. I'm hopeful that this means that memory overhead can be kept small, in accordance with the goal of operating on embedded microcontrollers.

Babel was interesting mostly because it has a reference implementation that isn't tied to the Linux kernel and states that it has a low code-size. It's a hybrid MANET implementation that seems (by reputation) to be fairly efficient, has nice loop-avoidance capabilities, and is designed to respond favorably during reconvergence events.

Based on the RFCs, BATMAN appears more suited to my needs and easier to implement, so I intend to port it to Rust, then expose an Arduino interface to the Rust library.

meshing

The system should support automatic, provisioning-free/zero-configuration meshing. That is, every device that can see any other device should connect to it and route packets for it.

In order to efficiently make use of routing tables, the network should ideally converge to an addressing scheme that divides subnets based on locality and connectivity. Graph regions with low diameter should be seen as candidates for subnet division — what we want is for this graph:

A - B - C
    |
    D
    |
E - F - G

to "understand" that the addresses for A, B, C should be in one subnet, and E, F, G should be in another, so that D can serve as an edge router between them.

To rephrase the problem in the converse sense, if we randomly assign addresses, we should expect a high degree of fragmentation in the routing tables. In the limiting / pathological case, every node's routing table contains the whole network topology — the benefit of dynamic destination routing vanishes. So we need some spatial organization of the node addressing.

I suspect such an addressing scheme can be achieved with a locally-greedy algorithm, but the price we pay is that existing node addresses may change when new nodes are added to the network. We can address this algorithmically and mitigate the impact by having old addressing information survive network reorganization for a limited period of time.

prior art

Zephyr
- Has a nice model for abstracting the L2 networking interface and allowing pluggable implementations. I don't use zephyr, but it's compelling and a great model for portability for OS concepts onto embedded platforms.
- Has CONFIG_NET_ROUTING, which unfortunately appears to be unused ("something would have to populate the routing table" — doesn't give me confidence that it's stable/definitely working)
Rust
- smoltcp: no_std-compatible userspace networking stack (name misleading -- not just TCP!)
- embedded-nal, embedded-nal-async: no_std-compatible interface traits
- embassy-net and associated drivers: no_std networking stack built on top of smoltcp, intended for use within the embassy project
  - Associated drivers include embassy-net-ppp, which runs a networking link over
Meshing

plan / project components

Meshing experiments in simulation
- Candidate meshing implementation -- develop addressing algorithm
- Simulate a bunch of agents connecting, disconnecting
- Measure network throughput, packet loss, etc.
IP transport over multiple PHYs
- Currently planning to use embassy-net-ppp
- Validate functionality across multiple PHYs
  - Highest priority: SPI, UART, I2C
  - CAN if possible
Router implementation
- Rust library to act as an IP router — own multiple smoltcp interfaces, route packets across them