component: rust: mudl

2024-03-14

https://cgit.nathanperry.dev/cgit/mudl

This is a work-in-progress port of Jake's MUDL library to Rust.

# Cargo.toml
[dependencies]
mudl = { git = "https://cgit.nathanperry.dev/git/mammothbane/mudl" }

rust: Codec

Multiple Rust crates implement a version of a generic codec concept.

The concept interacts with Rust's Stream and Sink traits (from the futures ecosystem), which represent asynchronous sources and sinks of values respectively: a Stream yields a possibly unbounded sequence of items that can be iterated asynchronously, and a Sink accepts items submitted to it asynchronously.

A Decoder<T> implementation wraps an io::AsyncRead and converts it to a Stream<T>. An Encoder<T> wraps an io::AsyncWrite and converts it to a Sink<T>.

Concretely, we might have a Decoder for serde_json::Value, which decodes bytes into the generic JSON value type, or an Encoder<String>, a trivial encoder that writes strings out as (for instance) UTF-8 bytes.
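
To make the shape concrete, here is a paraphrase of the Decoder/Encoder traits as they appear in crates like asynchronous_codec and tokio_util (names and details simplified; this is not the exact API of either crate, and it assumes the bytes crate for the byte buffer):

// Paraphrased codec traits; a sketch of the concept, not any crate's exact API.
use bytes::BytesMut;

pub trait Decoder {
    type Item;
    type Error;

    // Inspect the bytes buffered so far; return Ok(Some(item)) once a complete
    // frame is available, or Ok(None) to ask for more bytes from the reader.
    fn decode(&mut self, src: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error>;
}

pub trait Encoder<Item> {
    type Error;

    // Serialize `item` onto the end of `dst`; the framing layer writes it out.
    fn encode(&mut self, item: Item, dst: &mut BytesMut) -> Result<(), Self::Error>;
}

A framing adapter (Framed and friends in those crates) then pairs a codec with an AsyncRead/AsyncWrite to produce the Stream/Sink.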

This is the abstraction I would like to use for my implementation of mudl, as it's extremely generic and abstracts exactly the concept I'm aiming for. Unfortunately, existing implementations of this abstraction depend on Rust's full standard library, which isn't generally available on embedded targets.

Week of 3/14/2024

I forked asynchronous_codec and began rewriting it to support embedded-io-async's Read and Write traits instead of the standard-library-dependent AsyncRead and AsyncWrite; you can see this in the repo. Unfortunately, the signatures for Stream and Sink aren't readily compatible with the embedded traits' read and write methods. I haven't decided yet whether I'm going to adapt them to one another or write a specialized embedded_codec crate instead.
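
For reference, the shapes that have to be reconciled look roughly like this (signatures paraphrased from futures and embedded-io-async, trimmed down for illustration):

#![allow(async_fn_in_trait)]
use core::pin::Pin;
use core::task::{Context, Poll};

// futures::Stream (paraphrased): poll-based, driven through a Context so the
// executor can register a waker and come back later.
pub trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

// embedded_io_async::Read (paraphrased): a plain async fn that borrows
// &mut self for the duration of the read, with no Context or Pin in sight.
pub trait Read {
    type Error;
    async fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Error>;
}

Bridging the two means holding the in-flight read future somewhere inside a poll_next implementation, which is awkward without boxing or extra machinery; hence the temptation to define a separate embedded_codec abstraction instead.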

Week of 3/21/2024

This week I realized that mudl's approach doesn't fit my needs, so I decided to diverge from it.

The project I'm writing now (to be renamed) has the same goals (establishing a plug-and-play link over serial), but uses forward error correction (FEC) rather than checksumming and has packet segmentation and reassembly built in.

I'm using Reed-Solomon as the ECC, and still COBS for delimiting frames.

The message encoding stack is as follows:

|           Original data           |
                |
                v
            COBS Encode
                |
                v
|  COBS  |  COBS  |  COBS  |  COBS  | 0 |
                |
                v
            Reed-Solomon
            (each frame)
                |
                v
|  COBS  |  Parity  |  (one frame)
                |
                v
            COBS Encode
                |
                v
|        COBS        | 0 |

That is, we split the original packet into COBS subpackets (this can be done iteratively), calculate Reed-Solomon parity for each subpacket, then COBS-encode that. This gives a natural way to do segmentation while requiring only a single subframe in memory at any time (applications may choose to fully reassemble the complete original frame if desired, but the protocol doesn't require it).
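
As a sketch of how the encode side fits together (cobs_encode and rs_parity are hypothetical placeholders rather than real APIs, the constants are illustrative, and Vec is used here purely for brevity):

// Encode-path sketch: COBS-encode the payload, chunk it into subframes, add
// Reed-Solomon parity per subframe, COBS-wrap each (chunk + parity) so it
// contains no zero bytes, and terminate every subframe with the 0x00 sentinel.

const SUBFRAME_DATA: usize = 32; // COBS bytes carried per subframe (illustrative)
const PARITY_LEN: usize = 8;     // Reed-Solomon parity bytes per subframe (tunable)

fn cobs_encode(_src: &[u8]) -> Vec<u8> {
    unimplemented!("placeholder for a real COBS encoder")
}

fn rs_parity(_block: &[u8]) -> [u8; PARITY_LEN] {
    unimplemented!("placeholder for a real Reed-Solomon encoder")
}

/// Encode one packet into a sequence of zero-delimited subframes.
fn encode_packet(payload: &[u8], out: &mut Vec<u8>) {
    // 1. COBS-encode the whole payload.
    let encoded = cobs_encode(payload);

    // 2. Split into fixed-size subframes; each carries its own parity, so the
    //    receiver only ever needs one subframe in memory at a time.
    for chunk in encoded.chunks(SUBFRAME_DATA) {
        let parity = rs_parity(chunk);

        let mut subframe = Vec::with_capacity(chunk.len() + PARITY_LEN);
        subframe.extend_from_slice(chunk);
        subframe.extend_from_slice(&parity);

        // 3. Outer COBS pass: guarantees the subframe body (including parity,
        //    which may contain zeros) is free of 0x00 bytes.
        out.extend_from_slice(&cobs_encode(&subframe));
        out.push(0x00); // sentinel delimiting this subframe
    }
}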

The double-COBS wrapping is an attempt to mitigate corruption of the sentinel byte.

The Reed-Solomon ECC overhead can be tuned based on the physical link characteristics to mitigate packet corruption or loss without stateful negotiation or a control channel in the protocol.
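
For a sense of the numbers (standard Reed-Solomon properties, not project-specific measurements): a code with 2t parity symbols per subframe corrects up to t corrupted symbols in that subframe, or up to 2t erasures if the corrupted positions are known. So, for example, 8 parity bytes on a 32-byte subframe cost 25% overhead and absorb any 4 corrupted bytes of that subframe.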

I'm still in the middle of implementing this.