embedded

2023-09-27

dev board: esp32-c3 xiao

note: reviewing this week, I realized I didn't include a video of the board working in this documentation. Please see networking for a video of a later version of this board design working.

I took this week in the direction of doing two implementations in two different languages.

Source code listings are included as a tarball here.

rust

I wrote a rust implementation first using Espressif's esp-rs tooling.

My first approach here was the esp_idf_hal crate¹, which is implemented on top of Espressif's C SDK (ESP-IDF). This was easy to get up and running -- I was able to blink an LED and poll httpbin to verify WiFi connectivity, while outputting to a terminal via the virtual USB serial port.

This took some doing, but the overhead was much lower than for the original ESP32 or the S series, since the upstream Rust compiler supports RISC-V but not Xtensa. You still need to install esp-idf itself in order to build, which is annoying, because it's a huge tentacle monster of git submodules and custom build tooling. Rust build scripts in the esp_idf_hal crate mostly handle this, but I run NixOS, so my system is as a rule not amenable to random build processes that want to mutate the system, install their own dependencies, and assume that bash lives in /bin/. I eventually hacked it in by excising several heuristic checks in the esp-idf build scripts that refused to believe they weren't already in a virtualenv.

In other news, I discovered some nice tooling here -- cargo-espflash gives a single-command compile-upload-and-monitor without also forcing you to live with a whole monolithic blob of dead weight like PlatformIO. So nice to have tooling developed by people who believe in clear, decoupled interfaces; Unix philosophy, etc. etc.

part 1.5: pure rust

esp_idf_hal was not particularly satisfying -- it is, for one thing, mostly C. While much of the world is built on C, I prefer Rust if I can get it, and there is a pure-Rust implementation of most everything esp_idf_hal gives you in esp32**_hal (in our case esp32c3_hal, though there is a crate for each Espressif MCU model). Furthermore, Rust has a burgeoning community of embedded libraries and projects that actually attempt to interoperate with each other.

esp_idf_hal does make an attempt to participate, but I didn't find it that effective -- ideally, it would handle chip-specifics and then get out of your way to make using the generic interfaces as usable as possible, but Espressif seems more interested in implementing a whole HTTP stack from scratch on their own instead, and then having you use that. Not really my cup of tea -- I am desperately hoping for a world where I can use the same HTTP client on a Linux environment as on an MCU.

Main, relevant open-source Rust projects:

rust-embedded is a GitHub org whose interface crates (plus a few closely related community crates) abstract different parts of the computational stack
- embedded-hal{,-async} (abstracting hw peripherals)
- embedded-io{,-async}
- embedded-nal{,-async} (Network Abstraction Layer)
- embedded-time
embassy: embedded async runtime, notably including
- embassy-executor (async executor itself)
- embassy-sync (synchronization primitives -- mutexes, channels)
- embassy-net (network abstraction)
smoltcp: runtime-agnostic/userspace network stack

The holy grail is to be easily able to develop a portable embedded program (as in: nearly the whole body of the program is entirely, character-by-character, identical on an ESP32 and an RP2040) that uses an async executor to schedule network I/O, disk/flash I/O, and peripheral access.

In the scheme of things, we're close, with Rust. smoltcp is parametric over both its physical medium (Ethernet, IP packets (virtual medium), etc.) and the implementation -- I can write a new raw socket interface and it will work fine. embassy-net and embedded-nal are themselves generic over their tcp and udp socket types -- even the DNS implementation. embedded-hal gets us peripheral bus access that is abstracted from the MCU -- you can actually write a generic driver in Rust against a clearly- and well-defined interface.²
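To make the "generic driver" idea concrete, here is a minimal host-runnable sketch. The `OutputPin` trait below is a made-up stand-in written for this example (the real embedded-hal crate defines its own traits, which I am not reproducing here), and `StatusLed` and `MockPin` are likewise hypothetical names. The point is only that the driver body never mentions a specific MCU.

```rust
/// Hypothetical stand-in for an embedded-hal-style output pin trait.
/// (Defined locally for the sketch; not the real crate's definition.)
trait OutputPin {
    fn set_high(&mut self);
    fn set_low(&mut self);
}

/// A "driver" written only against the trait, so it knows nothing about the MCU.
struct StatusLed<P: OutputPin> {
    pin: P,
    is_on: bool,
}

impl<P: OutputPin> StatusLed<P> {
    fn new(pin: P) -> Self {
        StatusLed { pin, is_on: false }
    }

    /// Flip the LED and report the new state.
    fn toggle(&mut self) -> bool {
        self.is_on = !self.is_on;
        if self.is_on {
            self.pin.set_high();
        } else {
            self.pin.set_low();
        }
        self.is_on
    }
}

/// Host-side fake pin that records every level it was driven to.
#[derive(Default)]
struct MockPin {
    history: Vec<bool>,
}

impl OutputPin for MockPin {
    fn set_high(&mut self) {
        self.history.push(true);
    }
    fn set_low(&mut self) {
        self.history.push(false);
    }
}
```

On real hardware, the same `StatusLed` would be handed whatever pin type the chip's HAL provides, as long as that type implements the trait.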

I mention all of this because I received a painfully clear picture of where things stand for the ESP32C3. Things are close, but the edges are still very rough. It is possible to write a program that uses an asynchronous executor to do HTTP requests and blink LEDs and at least try to connect to an MQTT broker all at the same time. However, the HTTP requests take tens of seconds each, TLS doesn't work, and I never did get the MQTT broker to connect.

The fact that it works at all, however, gives me hope -- it means that this highly distributed and mutually-abstracted implementation is mostly right.
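For readers who haven't seen what "an async executor scheduling several jobs at once" boils down to, here is a toy sketch using only the Rust standard library. It is not how embassy-executor works internally (that one is interrupt- and waker-driven); this version just re-polls every task in a loop. `run_all`, `spawn`, `worker`, and `YieldNow` are all names invented for the example.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Waker that does nothing: this toy executor re-polls every task each round.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Future that is pending exactly once, so other tasks get a turn.
struct YieldNow(bool);
impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Box a future so differently-typed tasks can share one queue.
fn spawn(fut: impl Future<Output = ()> + 'static) -> Task {
    Box::pin(fut)
}

/// Round-robin over the tasks until all of them have completed.
fn run_all(mut tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        // Keep only the tasks that are still pending after this poll.
        tasks.retain_mut(|t| t.as_mut().poll(&mut cx).is_pending());
    }
}

/// Stand-in for "blink an LED" or "do an HTTP request": log a step, then yield.
async fn worker(name: &'static str, steps: u32, log: Rc<RefCell<Vec<String>>>) {
    for i in 0..steps {
        log.borrow_mut().push(format!("{}:{}", name, i));
        YieldNow(false).await;
    }
}
```

Running two workers through `run_all` interleaves their steps on a single thread, which is the property that makes blinking and networking "at the same time" possible without an OS.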

Anyway, some other things I ran into:

const HEAP_SIZE: usize = 64 * 1024;
static mut HEAP_MEM: [u8; HEAP_SIZE] = [0u8; HEAP_SIZE];

#[global_allocator]
static HEAP: Heap = Heap::empty();

fn main() -> Result<(), _> {
    HEAP.init(unsafe { HEAP_MEM.as_ptr() } as usize, unsafe { HEAP_MEM.len() });

    // elided
}

We're looking at the initialization of my heap data structure and its backing memory. I expected this to work, but in fact I was getting clear memory corruption (behavior-dependent, occasional-crashes, store/load exception messages, etc.). Thought it could be the stack, but it didn't actually seem to be.

The actual solution to the problem was this:

SECTIONS {
    .heap (NOLOAD) : ALIGN(4) {
        __heap_start = .;
        . += 128K;
        __heap_end = .;
    } > RWDATA
}
INSERT AFTER .stack_end;

extern "C" {
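    // Symbols defined by the linker script above; only their addresses matter.
    static __heap_start: u8;
    static __heap_end: u8;
}

#[global_allocator]
static HEAP: Heap = Heap::empty();

fn main() -> Result<(), _> {
    unsafe {
        // Take the symbols' addresses without ever reading through them.
        let start = core::ptr::addr_of!(__heap_start) as usize;
        let end = core::ptr::addr_of!(__heap_end) as usize;

        // Same (start address, size in bytes) arguments as the first attempt.
        HEAP.init(start, end - start);
    }

    // elided
}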

Theoretically, HEAP_MEM in my first example should have statically allocated the space I requested, meaning no other symbol should have intruded upon it. If the problem was running out of space, the heap data structure would fail mallocs, and I'd seen that in the logs for other reasons before, so I had been fairly confident that wasn't happening.

Instead, I'm pretty sure this was UB -- there was nowhere HEAP_MEM was actually getting used for anything, so the Rust compiler wasn't able to reason about it and simply didn't reserve the memory in .bss. Specifically, HEAP_MEM.as_ptr() as usize totally erases any association to the backing array by degrading to an integer type. Hence, mallocs were stomping over other memory.

I'm fairly confident in this because after using the linker to reserve the space, the problem went away -- no more corruption.
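As an aside on the "a full heap should fail mallocs, not corrupt memory" reasoning: the sketch below is a deliberately tiny bump allocator that works purely on addresses. It is not the allocator I used (that one is a proper heap from a crate); `Bump` and its methods are hypothetical and exist only to show that an allocator which knows its true bounds refuses requests rather than handing out memory it doesn't own.

```rust
/// Hypothetical bump allocator over the address range [start, start + size).
/// It only does address arithmetic and never touches the memory itself.
struct Bump {
    next: usize,
    end: usize,
}

impl Bump {
    fn new(start: usize, size: usize) -> Self {
        Bump { next: start, end: start + size }
    }

    /// Reserve `len` bytes aligned to `align` (a power of two).
    /// Returns the address, or None once the region is exhausted.
    fn alloc(&mut self, len: usize, align: usize) -> Option<usize> {
        let aligned = (self.next + align - 1) & !(align - 1);
        let new_next = aligned.checked_add(len)?;
        if new_next > self.end {
            return None; // out of space: fail instead of stomping on a neighbour
        }
        self.next = new_next;
        Some(aligned)
    }
}
```

The corruption I saw is what you'd expect if the `(start, size)` pair handed to the allocator describes memory that something else also believes it owns: the bounds check passes, and the damage happens anyway.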

defmt

This is a nice embedded logging utility available in Rust that I discovered as part of this implementation. It's essentially a very aggressive string-interner that shoves the interned strings into a debug section and replaces them in the actual binary with their indices -- when the binary gets flashed, there's very little overhead from logging (code size, or runtime memory/I/O). You need the original binary in order to decode the logs, but defmt comes with a tool defmt-print that you can pipe them through. It supports interpolation and does compression on top of that, too.
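The interning idea is easy to sketch on a host machine. To be clear, this is not defmt's actual wire format or API; `Interner`, `encode`, and `decode` are invented for illustration, and the 6-byte frame layout is an assumption of mine. It just shows why the device only needs to ship an index plus arguments while the host keeps the strings.

```rust
use std::collections::HashMap;

/// Build-time table of format strings (lives on the host, not in flash).
#[derive(Default)]
struct Interner {
    table: Vec<&'static str>,
    index: HashMap<&'static str, u16>,
}

impl Interner {
    /// Return the index for `s`, adding it to the table on first use.
    fn intern(&mut self, s: &'static str) -> u16 {
        if let Some(&i) = self.index.get(s) {
            return i;
        }
        let i = self.table.len() as u16;
        self.table.push(s);
        self.index.insert(s, i);
        i
    }
}

/// "Device side": a log frame is a 2-byte string index plus one 4-byte argument.
fn encode(index: u16, arg: u32) -> [u8; 6] {
    let mut frame = [0u8; 6];
    frame[..2].copy_from_slice(&index.to_le_bytes());
    frame[2..].copy_from_slice(&arg.to_le_bytes());
    frame
}

/// "Host side": look the string back up and interpolate the argument.
fn decode(table: &[&str], frame: &[u8; 6]) -> String {
    let index = u16::from_le_bytes([frame[0], frame[1]]) as usize;
    let arg = u32::from_le_bytes([frame[2], frame[3], frame[4], frame[5]]);
    table[index].replacen("{}", &arg.to_string(), 1)
}
```

A log line that would otherwise cost its full text in flash and on the wire shrinks to six bytes here, which is the property that makes this style of logging cheap.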

elixir / erlang on atomvm

I have mixed feelings about erlang/elixir because of their typing-indecisiveness. They don't give you many tools, as languages, to prove correctness -- in an environment like erlang that's intentionally designed to encourage dynamism of behavior, this spooks me more than a little bit.

However, I also in my heart believe that actor systems are essentially correct as a way to build distributed systems (at least up to hard performance requirements). WiFi-enabled microcontrollers represent an environment that is not terribly computationally demanding (if you really cared about power (hence computational overhead), you wouldn't be using WiFi) and could benefit from this model.

Maybe most attractive here is that all data in Erlang is represented as a single inductive type (term), which can be binary-encoded and decoded with a single function call (:erlang.term_to_binary, :erlang.binary_to_term) -- you get this for free, for all data, always -- so the overhead of communicating to remote nodes is very low. Similarly it's two function calls to send a UDP packet:

{:ok, sock} = :gen_udp.open(port)
:gen_udp.send(sock, ip, port, data)
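To show why "everything is one inductive type" makes serialization nearly free, here is a rough Rust analogue. The tag bytes and layout are made up for this sketch and are not Erlang's real external term format; `Term`, `encode`, and `decode` are hypothetical names. One recursive function per direction covers every value you can build.

```rust
/// A tiny subset of what an Erlang term can be.
#[derive(Debug, Clone, PartialEq)]
enum Term {
    Int(i32),
    Atom(String),
    Binary(Vec<u8>),
    Tuple(Vec<Term>),
}

/// Append the encoding of `term` to `out` (made-up format: tag byte, then payload).
/// Atom names and tuple sizes are assumed to fit in one byte.
fn encode(term: &Term, out: &mut Vec<u8>) {
    match term {
        Term::Int(n) => {
            out.push(0);
            out.extend_from_slice(&n.to_be_bytes());
        }
        Term::Atom(name) => {
            out.push(1);
            out.push(name.len() as u8);
            out.extend_from_slice(name.as_bytes());
        }
        Term::Binary(bytes) => {
            out.push(2);
            out.extend_from_slice(&(bytes.len() as u32).to_be_bytes());
            out.extend_from_slice(bytes);
        }
        Term::Tuple(items) => {
            out.push(3);
            out.push(items.len() as u8);
            for item in items {
                encode(item, out);
            }
        }
    }
}

/// Split off the first `n` bytes, or None if the input is too short.
fn split(input: &[u8], n: usize) -> Option<(&[u8], &[u8])> {
    if input.len() < n { None } else { Some(input.split_at(n)) }
}

/// Decode one term; returns it with the unconsumed remainder of the input.
fn decode(input: &[u8]) -> Option<(Term, &[u8])> {
    let (&tag, rest) = input.split_first()?;
    match tag {
        0 => {
            let (b, rest) = split(rest, 4)?;
            Some((Term::Int(i32::from_be_bytes([b[0], b[1], b[2], b[3]])), rest))
        }
        1 => {
            let (&len, rest) = rest.split_first()?;
            let (name, rest) = split(rest, len as usize)?;
            Some((Term::Atom(String::from_utf8(name.to_vec()).ok()?), rest))
        }
        2 => {
            let (b, rest) = split(rest, 4)?;
            let len = u32::from_be_bytes([b[0], b[1], b[2], b[3]]) as usize;
            let (bytes, rest) = split(rest, len)?;
            Some((Term::Binary(bytes.to_vec()), rest))
        }
        3 => {
            let (&count, mut rest) = rest.split_first()?;
            let mut items = Vec::new();
            for _ in 0..count {
                let (item, tail) = decode(rest)?;
                items.push(item);
                rest = tail;
            }
            Some((Term::Tuple(items), rest))
        }
        _ => None,
    }
}
```

In Rust you have to write (or derive) this per data model; in Erlang the runtime already has it for every value, which is what makes the UDP example this short.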

The socket then even sends data to your processes as messages:

defmodule MyModule do
    use GenServer

    # ... elided ...

    # handle_info/2 receives the message and the current server state
    def handle_info({:udp, sock, addr, port, packet}, state) do
        # ... elided ...
        {:noreply, state}
    end
end

The model is really so straightforward -- it captures something deep.
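For comparison, here is roughly the same shape in Rust using a thread and a channel as the "process" and its mailbox. This is only an analogy under my own assumptions: `Msg`, `spawn_counter`, and the byte-counting behaviour are invented for the sketch, and a real BEAM process is far lighter than an OS thread.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Messages the actor understands (loosely mirrors the `{:udp, ...}` tuple).
enum Msg {
    Udp { packet: Vec<u8> },
    /// Ask for the current byte count; the answer comes back on the given channel.
    GetCount(Sender<usize>),
    Stop,
}

/// Start the actor and return its mailbox plus a handle to its final state.
fn spawn_counter() -> (Sender<Msg>, thread::JoinHandle<usize>) {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || {
        // Private state: only this thread ever touches it.
        let mut bytes_seen = 0usize;
        for msg in rx {
            match msg {
                Msg::Udp { packet } => bytes_seen += packet.len(),
                Msg::GetCount(reply) => {
                    let _ = reply.send(bytes_seen);
                }
                Msg::Stop => break,
            }
        }
        bytes_seen
    });
    (tx, handle)
}
```

Everything the actor knows changes only in response to a message, processed one at a time, which is the same guarantee a GenServer callback gives you.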

atomvm

I had come across AtomVM probably a couple of years ago looking at embedded projects and wanted to try it but never found a fit. I saw it supported ESP32 and figured I'd give it a shot for this week.

This was not easy. AtomVM is alpha-quality, open-source software and, to my knowledge, not commercially supported. It builds on esp-idf (discussed above), which I luckily already had installed. Nominally, it also supports the esp32c3 (even though the docs mostly talk about the original esp32), but the installer images don't work if you follow the instructions directly.

For some background, AtomVM is a binary firmware image that you can load into flash. It won't do anything, though, unless you put a .avm (AtomVM precompiled Erlang bytecode) image at a different flash offset. Then (after a reboot) the base image loads the bytecode and runs it -- you're off to the races.

Now that we have this context, we can get to the incorrect instructions. As you may imagine, it's critical to load each of these images at the correct offset, so that the first one is picked up by the Espressif bootloader and the second one is picked up by the first. The base offset for the esp32 is 0x1000. For all other models of esp (as far as I can tell) there is no offset -- load to 0x0. The default tooling and all the instructions say to put the image at 0x1000.

I didn't figure out this was the problem until I had cloned atomvm and fully scratch-rebuilt freshest master, where the flashimage.sh script for the esp32c3 platform actually shows (once you build it) that its offset is 0x0.
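The mistake is the kind a three-line check would have caught. The sketch below encodes only the two offsets stated above (0x1000 for the original esp32, 0x0 for the esp32c3); `Chip`, `base_image_offset`, and `check_offset` are hypothetical helpers, not part of AtomVM or any Espressif tool, and other chips are left out on purpose because I haven't verified them.

```rust
/// Chips covered by this sketch; anything else is deliberately left out.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Chip {
    Esp32,
    Esp32c3,
}

/// Flash offset for the base AtomVM image, per the offsets described in the text.
fn base_image_offset(chip: Chip) -> u32 {
    match chip {
        Chip::Esp32 => 0x1000,
        Chip::Esp32c3 => 0x0,
    }
}

/// Reject a flashing plan whose offset does not match the chip.
fn check_offset(chip: Chip, requested: u32) -> Result<(), String> {
    let expected = base_image_offset(chip);
    if requested == expected {
        Ok(())
    } else {
        Err(format!(
            "{:?}: image must go at {:#x}, not {:#x}",
            chip, expected, requested
        ))
    }
}
```

Following the published instructions on a C3 corresponds to `check_offset(Chip::Esp32c3, 0x1000)`, which fails here instead of silently producing a board that never boots.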

After I did this I was able to run erlang/elixir on the MCU. This was nice, but it was followed up by several hours of tracing back exactly how AtomVM differs from BEAM/OTP (the canonical erlang distribution, runtime, and standard library).

There were many subtle features I discovered here, like that the AtomVM folks didn't implement :gen_server.start_link/4, but they did implement :gen_server.start_link/3 and the name registry. You can call :gen_server.start_link/3 and then call :erlang.register/2 with the same name you would have passed to start_link/4, and it's as if you had called start_link/4. Why there isn't just a start_link/4 perhaps I'll never know.

elixir

Elixir seems to mostly have been added to AtomVM as an afterthought. There's almost no overhead to doing so, as the bytecode is the same, but almost none of the Elixir standard library works. This is fine, as it turns out, as most of the Elixir standard library is just an opinionated wrapper around the Erlang core.

In the abstract, I knew this, but I hadn't been so aware of how little extra Elixir gives you until I had to go without most of it. It's mostly trying to be magical and that ends up being confusing -- I gained a much clearer appreciation for what the "actual" Erlang API looks like as a part of this.

I did some patching by copy-pasting parts of the Elixir standard library into my project, then stripping out anything not supported by the AtomVM runtime. As a result, I have my own GenServer and Supervisor that work on AtomVM, though I ended up just using :supervisor directly from erlang because the implementation is really not complete on the AtomVM side -- there's a lot hard-coded -- so adapting Supervisor to all possible use-cases is a pain.

result

The final functional result for this implementation is a set of indicator LEDs that show the WiFi status (and NTP -- AtomVM will grab this for you as part of the network configuration), plus one that's controlled over a UDP socket. I also wrote a UDP client in Elixir (it's pretty much the 2 lines from above), and the two sides just pass terms directly as binaries. Really the easiest RPC I've ever done, and it'd be trivial to wire more actors into the system.

¹ crate = Rust package.

² Note: by contrast, I do not consider Arduino to be clean or well-defined.