Operating Systems / RTOS

An operating system (kernel) is a software system that arbitrates hardware resources (processor time, I/O peripherals, storage, etc.) and may enforce isolation constraints. This is a general definition that doesn’t necessarily look like a desktop environment.

For instance, Linux is specifically the OS kernel and has nothing to do with the userspace environment (e.g. your Ubuntu desktop, virtual terminals, login program, init process (systemd), etc.). You can configure the Linux kernel to completely disable graphics, the concept of separate users, filesystems, and most hardware peripherals — this is still Linux. All it needs to do to serve its function as an OS is isolate processes and arbitrate access to the hardware.

An RTOS is also an operating system, but a much more minimal one, generally designed for microcontrollers. RTOSes are designed for determinism and stability above all else — an RTOS should be suitable for use in e.g. a medical device or industrial equipment, where minimal latency and consistent execution characteristics are paramount.

Typically, an RTOS runs one or more user applications, which contain tasks (comparable to OS threads) that may execute concurrently at different priorities. High-priority tasks run whenever possible, even to the point of starving out low-priority tasks. This is to permit latency-, performance-, and safety-sensitive code to be consistently prioritized above less-important work.
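
As a concrete sketch, here is roughly what two tasks at different priorities look like in FreeRTOS (the API calls are real; stack depths and priority values are illustrative, not tuned):

#include "FreeRTOS.h"
#include "task.h"

// High-priority task: preempts the logger whenever it becomes ready.
static void motor_control_task(void* params) {
    (void)params;
    for (;;) {
        // elided: time-critical control work
        vTaskDelay(pdMS_TO_TICKS(1)); // run roughly every millisecond
    }
}

// Low-priority task: only runs when nothing more important is ready.
static void logging_task(void* params) {
    (void)params;
    for (;;) {
        // elided: flush log buffers
        vTaskDelay(pdMS_TO_TICKS(100));
    }
}

int main(void) {
    // Stack depths are in words.
    xTaskCreate(motor_control_task, "motor", 256, NULL, tskIDLE_PRIORITY + 3, NULL);
    xTaskCreate(logging_task, "log", 256, NULL, tskIDLE_PRIORITY + 1, NULL);
    vTaskStartScheduler(); // does not return unless startup fails
    for (;;) {}
}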

OS Fundamentals

Concurrency / Parallelism

Short version: overlapping execution is concurrency, multicore is parallelism. You can have one without the other.

  • Neither concurrency nor parallelism: strict single-core
  • Concurrency, no parallelism: time-slicing / cooperative or preemptive multitasking
  • Parallelism, no concurrency: only arguably, if nothing is shared between parallel processes (AMP, pinning)
  • Parallelism + concurrency: SMP — OS schedules tasks to different cores, work-stealing

Either way, shared state brings race conditions — you need ways to protect resources and synchronize tasks, both on a single core and across cores.

Threads, Tasks, Scheduling

They’re generally called threads on a full general-purpose OS, and tasks generically or on an RTOS. I’ll use ‘task’ going forward.

The problem solved by multitasking is maintaining multiple simultaneous code execution points that can make progress concurrently. How? Store away all of a task’s registers when you suspend it, and load the registers saved by another task — that’s a context switch. Set up a hardware timer, and use it to prompt you to periodically switch tasks.

This is preemptive multitasking — the tasks don’t necessarily consent to the context switch. By contrast, cooperative multitasking is implemented entirely by tasks volunteering to yield — usually because they’re waiting (blocking) for something to happen (I/O, timers, IPC, lock acquisition).
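
Real context switching saves and restores register state and a per-task stack, which can’t be written portably in C, but the cooperative idea can be sketched in its simplest run-to-completion form (everything here is invented for illustration):

#include <stdio.h>

// Each "task" does one small, bounded chunk of work and then returns --
// returning is how it volunteers to yield.
typedef void (*task_fn)(void);

static void blink_step(void)  { puts("blink"); }
static void sensor_step(void) { puts("sensor"); }

static task_fn TASKS[] = { blink_step, sensor_step };

int main(void) {
    // Round-robin "scheduler": every pass gives each task one turn.
    // A preemptive kernel would instead save/restore full register state
    // on a timer interrupt, so tasks would never need to return.
    for (int tick = 0; tick < 3; tick++)
        for (unsigned i = 0; i < sizeof TASKS / sizeof TASKS[0]; i++)
            TASKS[i]();
}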

Once the OS has control back from a task, its job is to figure out who to run next. This is scheduling (a toy selection function is sketched after the list below). Considerations include:

  • Fairness: every other task should have a chance to make progress before a given task is rerun
  • Readiness: blocking tasks are not candidates for execution
  • Priority: on an RTOS, higher-priority tasks absolutely trump lower-priority ones. On a general-purpose OS, preference is given to higher-priority tasks — they execute more often and for more time.
  • Time budget / work type: CPU-bound tasks tend to take up their entire time-slice without yielding, whereas I/O-bound tasks tend to yield cooperatively. Heuristically, it’s often beneficial to grant extra time to historically I/O-bound processes to try to get them to block, as this means they have sent I/O requests and are “productively waiting”. CPU-bound tasks, by contrast, don’t have opportunity cost in this way.
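
Here’s the toy selection function promised above — readiness and priority, with round-robin fairness among equal-priority tasks (all names invented):

#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool ready;    // not blocked on I/O, a lock, a timer, ...
    int  priority; // higher number = more important (conventions vary!)
} Task;

// Pick the highest-priority ready task. Scanning from just past `last`
// gives round-robin fairness among tasks of equal priority.
static size_t pick_next(const Task* tasks, size_t n, size_t last) {
    size_t best = last;
    int best_prio = -1;
    for (size_t k = 1; k <= n; k++) {
        size_t i = (last + k) % n;
        if (tasks[i].ready && tasks[i].priority > best_prio) {
            best = i;
            best_prio = tasks[i].priority;
        }
    }
    // If best_prio is still -1, nothing is ready; a real kernel would
    // run an idle task here.
    return best;
}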

Interrupts / Exceptions / Priority

“Normally”, a processor runs in what is sometimes called thread mode, i.e. nothing special is happening and you’re just running application code. But, especially on an MCU, there are often events to respond to with tight timing deadlines — e.g. you need to do motor control on an interval, you’re bitbanging some interface over GPIO, or you want to respond to an error condition and kill power immediately. Rather than polling your I/O registers in a big loop, we have the abstraction of interrupts, which interrupt normal execution when an event happens and enter an interrupt handler or interrupt service routine (ISR) — a user-specified function (though RTOSes or vendor HALs will often intercept or wrap some of these: timer alarms, peripheral I/O, etc.).

What can trigger an interrupt handler is platform-dependent, but you’ll commonly see:

  • Hardware timers
  • Hardware peripherals
    • Discrete inputs (rising/falling edge)
    • DMA completion
    • Bus readiness
  • Exceptions: illegal instructions, software traps, division by zero, etc. (usually nonmaskable)

Interrupts have priorities — if multiple interrupts are triggered at the same time, the one with the highest logical priority executes first, then the next-highest, and so on. Higher-priority interrupts will also preempt other interrupt handlers mid-execution. Priorities are configurable to some extent, depending on the platform. On Arm Cortex-M parts, interrupts are managed by the NVIC (nested vectored interrupt controller), though your RTOS will usually have functions to configure interrupt priorities for you.
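
As a sketch, assuming a CMSIS environment (NVIC_SetPriority/NVIC_EnableIRQ are real CMSIS functions, but the IRQn names below are hypothetical — your device header defines the real ones). Note that on Cortex-M, a lower number means higher priority:

#include "device.h" // vendor device header; defines IRQn values and NVIC_*

void configure_irqs(void) {
    NVIC_SetPriority(UART0_IRQn, 1);  // more urgent
    NVIC_SetPriority(TIMER0_IRQn, 3); // less urgent; UART0 can preempt it
    NVIC_EnableIRQ(UART0_IRQn);
    NVIC_EnableIRQ(TIMER0_IRQn);
}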

It’s critical to minimize and bound the amount of work done in interrupt handlers — while you’re in one, you’re blocking execution for everything else on the processor that’s trying to run. Because of this constraint, the usual pattern is to do the bare minimum in the ISR — latch the data, set a flag, post to a queue — and defer the real work to a task.
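
A minimal sketch of that deferral with a flag (names invented for illustration; the Flags section below shows the blocking version):

#include <stdbool.h>
#include <stdint.h>

extern void process(uint16_t sample); // stand-in for the slow work

static volatile uint16_t LATEST_SAMPLE;
static volatile bool     SAMPLE_READY;

// Imagine this registered as an ADC conversion-complete handler: it only
// latches the data and raises a flag, then returns immediately.
void adc_isr(void) {
    LATEST_SAMPLE = 42; // stand-in for reading the ADC data register
    SAMPLE_READY = true;
}

// Task context: free to take as long as it needs.
void processing_task(void) {
    while (true) {
        if (SAMPLE_READY) {
            SAMPLE_READY = false;
            process(LATEST_SAMPLE);
        }
        // (Busy-polling for brevity; a real task would block on a
        // signal or queue instead -- see below.)
    }
}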

Critical Sections

A critical section in the general sense is a section of code that must run atomically to avoid race conditions. In the context of embedded computing, critical sections are implemented by masking (disabling) interrupts, such that it’s impossible to be preempted or have an interrupt run. They should be used sparingly, and instruction count should be kept low and bounded (generally, don’t write loops across unbounded, dynamically-specified amounts of data in a critical section). Interrupts have overriding priority for a good reason, so only use critical sections if you truly need to.
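
For example, a classic tiny, bounded critical section — reading a 64-bit tick counter that a timer ISR updates, without tearing. This sketch assumes an Arm Cortex-M target with CMSIS intrinsics:

#include <stdint.h>
#include "cmsis_compiler.h" // CMSIS intrinsics; exact header varies by vendor

static volatile uint32_t TICK_HI, TICK_LO; // updated by a timer ISR

uint64_t read_tick(void) {
    uint32_t primask = __get_PRIMASK(); // remember whether IRQs were enabled
    __disable_irq();                    // enter critical section
    uint64_t t = ((uint64_t)TICK_HI << 32) | TICK_LO;
    __set_PRIMASK(primask);             // restore the previous mask state
    return t;
}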

If you need protection from other tasks potentially accessing resources you need, use synchronization primitives (semaphores, mutexes, etc.) discussed below, not critical sections. The synchronization primitives sometimes use critical sections internally for their implementations, but are already designed to make minimal use of them.

Semaphores

Semaphores protect a specified, limited number of resources, and are typically implemented as integers updated with hardware atomic compare-and-swap where available (falling back to critical sections otherwise).

Acquiring a semaphore requires at least one resource to be available, and doing so decrements the number of available resources by one. If no resources are available, you have to wait.

Releasing the semaphore increments the count by one, allowing someone else to take your “slot”.
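
A sketch in the same pseudo-API style as the rest of this section (sem_acquire/sem_release are stand-ins for whatever your RTOS provides):

typedef struct Sem Sem_t;
extern void sem_acquire(Sem_t*); // blocks until a slot is free, then decrements
extern void sem_release(Sem_t*); // increments, waking a waiter if there is one

// A semaphore initialized with count 4 guards a pool of 4 DMA buffers.
extern Sem_t* DMA_BUF_SEM;
extern void* claim_buffer(void);
extern void  return_buffer(void*);
extern void  transfer(void* buf);

void do_transfer(void) {
    sem_acquire(DMA_BUF_SEM); // wait until one of the 4 buffers is free
    void* buf = claim_buffer();
    transfer(buf);
    return_buffer(buf);
    sem_release(DMA_BUF_SEM); // hand the slot back
}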

A semaphore with a maximum count of 1 (binary semaphore) behaves much like a mutex — though note that RTOS mutexes usually add ownership tracking and priority inheritance, which plain binary semaphores lack.

Mutexes

A mutual exclusion lock. Protects a resource so that only one execution context (thread/task/async context/etc.) can access it at a time.

Some pointers:

  • A common pattern is condition-variable (“condvar”) style waiting: you wait for a condition to hold, and only then do you want to take the mutex and act. Without a dedicated condvar primitive, a polling version might look like this:
extern Mutex_t* MUTEX;
extern volatile uint32_t* VALUE; // written by another task, so volatile

extern bool is_in_range(uint32_t);

// There's some other task updating VALUE and I want to know when it's in an acceptable range,
// then take some action without letting the other task make progress.
void do_something() {
    while (true) {
        // Check first (before acquiring the lock) to see if we should even try.
        if (!is_in_range(*VALUE)) goto wait;

        // Optimistically acquire the lock if we think the value is in range.
        if (!acquire(MUTEX)) goto wait;

        // I checked _outside_ the mutex protection -- the value may have changed, so I need to
        // recheck and release the lock if the condition failed.
        if (is_in_range(*VALUE)) break;

        // Condition failed, retry.
        release(MUTEX);

    wait:
        delay(10);
    }

    // Mutex acquired, and the condition held while we checked it under the lock.
    // elided: do some work

    release(MUTEX);
}

What I want to get across is that you can never assume that two lines of code execute sequentially with no interrupts or other threads intervening. You may be preempted, 10,000 interrupts may fire in sequence — you may be waiting for ten minutes between checking the value and acquiring the mutex. No guarantees with concurrency — if something can go wrong, it will, eventually. When you’re executing hundreds of millions to billions of instructions per second, eventually is usually not very long.

  • Keep a consistent mutex acquisition order if you need to nest multiple mutexes. This generally helps avoid deadlock:
static Mutex_t MUTEXES[2];
static uint32_t VALS[2];

void thread0() {
    while (true) {
        blocking_acquire(&MUTEXES[0]);
        VALS[0]++;

        blocking_acquire(&MUTEXES[1]);

        VALS[1] += 1 + VALS[0];

        release(&MUTEXES[1]);
        release(&MUTEXES[0]);
    }
}

// Do the inverse order of thread 0.
void thread1() {
    while (true) {
        blocking_acquire(&MUTEXES[1]);
        VALS[1]++;

        blocking_acquire(&MUTEXES[0]);

        VALS[0] += 1 + VALS[1];

        release(&MUTEXES[0]);
        release(&MUTEXES[1]);
    }
}

Exemplar execution order:

  • Thread 0 locks mutex 0
  • Thread 1 locks mutex 1
  • Thread 0 waits on mutex 1 (held by t1)
  • Thread 1 waits on mutex 0 (held by t0)
  • Deadlock — both threads are stuck and wait forever.

If you flip the mutex acquisition order in either thread (and move the first add appropriately), this can’t happen.
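
For instance, a fixed thread1 under the same pseudo-API:

void thread1_fixed() {
    while (true) {
        blocking_acquire(&MUTEXES[0]); // same order as thread0
        blocking_acquire(&MUTEXES[1]);

        VALS[1]++; // moved inside both locks
        VALS[0] += 1 + VALS[1];

        release(&MUTEXES[1]);
        release(&MUTEXES[0]);
    }
}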

Flags / Event / Signal

A means of waiting until some event happens, usually triggered by an interrupt or an event in another task.

Pseudocode:

static Signal_t SIGNAL;

static volatile uint32_t SHARED_STATE; // written by task0, read by task1

void task0() {
    while (true) {
        do_some_work();
        SHARED_STATE += 1;

        signal_notify(&SIGNAL);
    }
}

void task1() {
    while (true) {
        signal_wait(&SIGNAL);
        uint32_t latched_state = SHARED_STATE;

        do_some_other_work(latched_state);
    }
}

Queues / Channels

Queues (sometimes called channels) are a way to provide inter-task communication and synchronization. In an RTOS context, these are not the plain FIFO data structure you implemented in your intro Java course — they involve synchronization to permit values to be passed safely between tasks.

Functionally, queues permit decoupling the production of values from their consumption, so e.g. I can write a task that reads packets off of a serial stream from a hardware peripheral and puts them in a queue without worrying about how those values are going to be consumed.

Separately, I can write a task that consumes values from a queue and does something with them. I can write those tasks independently, and substitute either implementation at any point. This provides an enormous amount of flexibility that traditional shared-memory-type synchronization makes much more difficult.
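
A sketch of that packet pipeline (the queue API here is invented; e.g. FreeRTOS spells these xQueueSend/xQueueReceive):

#include <stdint.h>

typedef struct Queue Queue_t;
extern void queue_send(Queue_t*, const void* item);  // blocks if full
extern void queue_receive(Queue_t*, void* item_out); // blocks if empty

typedef struct {
    uint8_t kind;
    uint8_t len;
    uint8_t payload[32];
} Packet_t;

extern Queue_t* PACKET_QUEUE;
extern Packet_t read_packet_from_uart(void);
extern void handle(const Packet_t*);

// Producer: knows only about the serial port and the queue.
void uart_task(void) {
    while (true) {
        Packet_t p = read_packet_from_uart();
        queue_send(PACKET_QUEUE, &p);
    }
}

// Consumer: knows only about the queue; swap either side out freely.
void handler_task(void) {
    while (true) {
        Packet_t p;
        queue_receive(PACKET_QUEUE, &p);
        handle(&p);
    }
}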

Generically, I would say a queue is for passing structured data of known maximum size (maybe a complete parsed packet), a command, etc. Queue architectures also often have something like a “stream” type, which is for streaming data instead — rather than managing buffers yourself to parse packets, every time you read from a UART, just push all of the data into the stream, then do the parsing separately by treating it as a monolithic/uninterrupted stream of bytes.

RTOS

FreeRTOS

Was its own independent project for many years, then stewardship was taken over by Amazon in 2017. Still free and open (MIT-licensed), all written in C.

Fairly bare-bones, ergonomics aren’t great, single-application tenancy — but it has a small footprint and a good, established reputation.

Many Arduino cores (e.g. ESP32) are implemented on FreeRTOS.

Zephyr

Nice ergonomics, extensive ecosystem with support for many MCU platforms and drivers for bus peripherals. Written in C. Very easy to extend with your own drivers. Haven’t used it myself, but very compelling for greenfield projects requiring an RTOS.

Supports deploying multiple modular applications.

nRF Connect SDK (Nordic’s vendor SDK) is built fully on Zephyr.

RTIC

RTIC (Real-Time Interrupt-driven Concurrency) is the main Rust RTOS project :).