Skip to content
Snippets Groups Projects
Verified Commit 82fe0cd7 authored by rahix's avatar rahix
Browse files

feat(epicardium): Add a proper mutex implementation


In the current firmware, different locking mechanisms a littered around
the code-base.  Among them are bare FreeRTOS mutexes and the hw-locks.
The callers for these often specify timeouts but don't make much effort
in A) picking robust values for the timeout and B) recovering gracefully
from a timeout happening.  Most of the time, we return -EBUSY to _Python
code_.  This is really really bad API design.  The firmware needs to
have enough integrity to ensure these situations can't ever occur.

To combat this, add a new locking primitive: The `struct mutex`.  The
intention is to replace all other locking and synchronization APIs with
this one.  This will provide one central place to debug any sort of
locking issues.

The `struct mutex` API is based on a few assumptions about locking.
Those are detailed in `Documentation/epicardium/mutex.rst`, which is
part of this commit.  The most important one is:

    Locking can **never** fail.

By requiring this to be true, we eliminate the need for drivers to
contain (often incorrect) logic for dealing with locking fails.  This
should drastically improve the stability of the firmware in regards to
lock-related bugs.

This commit does not introduce any functional changes yet.

Signed-off-by: default avatarRahix <rahix@rahix.de>
parent ec71acfd
No related branches found
No related tags found
No related merge requests found
Mutex
=====
Ontop of FreeRTOS, we have our own mutex implementation. **Never use the
FreeRTOS mutexes directly! Always use this abstraction layer instead**. This
mutex implementation tries to make reasoning about program flow and locking
behavior easier. And most importantly tries to help with debugging possible
dead-locks.
Design
------
There are a few guiding design principles:
- Mutexes can only be used from tasks, **never** from interrupts!
- Timers can use mutexes, but only with :c:func:`mutex_trylock`, **never** with
:c:func:`mutex_lock` (Because they are not allowed to block).
- Locking can *never* fail (if it does, we consider this a fatal error ⇒ panic).
- No recursive locking.
- An unlock can only occur from the task which previously acquired the mutex.
- An unlock is only allowed if the mutex was previously acquired.
For a more elaborate explanation of the rationale behind these rules take a
look at the :ref:`mutex-design-reasons`.
Definitions
-----------
.. c:autodoc:: epicardium/modules/mutex.h
.. _mutex-design-reasons:
Reasons for this Design
-----------------------
Only from tasks
^^^^^^^^^^^^^^^
Cases where it might make sense to acquire a lock from an IRQ are very hard to
get right. They often involve the lock and unlock being in different functions
and thus the correctness of the code is resting on the assumption that those
functions are called in the correct order, also in respect to the rest of the
related code. By disallowing this entirely, we force code-flow to be more
linear and hopefully easier to reason about.
An exception is with software timers: While they technically *can* acquire a
mutex (using :c:func:`mutex_trylock`), I strongly recommend **not** doing that.
Because :c:func:`mutex_trylock` can fail, doing these kind of timers correctly
is not trivial. Instead, use a task and block it using :c:func:`mutex_lock`.
Locking can *never* fail
^^^^^^^^^^^^^^^^^^^^^^^^
From a technical standpoint this might seem unintuitive. But when looking at
it from the semantics of what it means for a lock to fail, it makes total
sense: There are two cases which could be considered failure of a lock:
1. **A dead-lock condition**. Dead-locks should not be catched by code, but by
review of the code earlier on. Tools like static- or runtime-analysis are
the proper way to detect possible dead-locks. Going the windows-route of
trying to write the OS defensive against itself is not going to end well.
Fix the code which might dead-lock instead of slapping a band-aid ontop.
2. **Another task blocking the lock for a long time**. This is a bit more
tricky: FreeRTOS provides the ability to add a timeout to locking attempts
to catch these cases. But this is also a bad idea. Think about it: What
would you even do after the locking attempt timed out? You cannot access
the mutexed data/peripheral anyway because someone else is still modifying
it. Even if you were to kill the offending task, the data/peripheral might
be in a totally unpredictable state. It is also very hard to define timeout
values that are relaxed enough for all possible code-paths acquiring the
lock. You would have to take situations like the task holding the mutex
being preempted by a long-running high priority task into account. So
instead, we provide :c:func:`mutex_trylock`: If the mutex cannot be
acquired immediately you have the chance to choose an alternative code-path
and try again later.
No recursive locking
^^^^^^^^^^^^^^^^^^^^
While recursive locking might seem to make a few cases easier to deal with, it
carries quite a few implications which make it not worthwhile. It makes
analysis of locking chains a *lot* harder and in some cases even impossible
(for proving 100% correcteness). We should strive for a design which is as
simple as possible and as obvious to understand as possible. Keeping locking
hierarchy flat helps quite a lot with that.
Only unlock from the acquiring task
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This one should be obvious but FreeRTOS does not enforce this ... By stating
that locking can never fail, we also won't ever have a situation where
force-unlocking a forgein lock might be necessary.
Only unlock once after acquisition
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As a pseude-recursive locking implementation, sometimes the
:c:func:`mutex_unlock` call is written in a way that allows unlocking an
already unlocked mutex. But this only encourages bad practices like unlocking
everything again at the end of a function "just to be sure". Code should be
written in a way where the lock/unlock pair is immediately recognizable as
belonging together.
......@@ -52,6 +52,7 @@ Last but not least, if you want to start hacking the lower-level firmware, the
debugger
pycardium-guide
memorymap
epicardium/mutex
epicardium/sensor-streams
.. toctree::
......
......@@ -52,6 +52,8 @@
#define INCLUDE_vTaskDelay 1
#define INCLUDE_uxTaskGetStackHighWaterMark 1
#define INCLUDE_xTimerPendFunctionCall 1
#define INCLUDE_xSemaphoreGetMutexHolder 1
/* Allow static allocation of data structures */
#define configSUPPORT_STATIC_ALLOCATION 1
......
......@@ -14,6 +14,7 @@ module_sources = files(
'light_sensor.c',
'log.c',
'max30001.c',
'mutex.c',
'panic.c',
'personal_state.c',
'pmic.c',
......
#include "modules/mutex.h"
#include <assert.h>
void _mutex_create(struct mutex *m, const char *name)
{
/* Assert that the mutex has not been initialized already */
assert(m->name == NULL);
/*
* The name is just the parameter stringified which is almost always a
* pointer. If it is, skip over the '&' because it adds no value as
* part of the name.
*/
if (name[0] == '&') {
m->name = &name[1];
} else {
m->name = name;
}
m->_rtos_mutex = xSemaphoreCreateMutexStatic(&m->_rtos_mutex_data);
}
void mutex_lock(struct mutex *m)
{
int ret = xSemaphoreTake(m->_rtos_mutex, portMAX_DELAY);
/* Ensure locking was actually successful */
assert(ret == pdTRUE);
}
bool mutex_trylock(struct mutex *m)
{
int ret = xSemaphoreTake(m->_rtos_mutex, 0);
return ret == pdTRUE;
}
void mutex_unlock(struct mutex *m)
{
/* Ensure that only the owner can unlock a mutex */
assert(mutex_get_owner(m) == xTaskGetCurrentTaskHandle());
int ret = xSemaphoreGive(m->_rtos_mutex);
/*
* Ensure that unlocking was successful; that is, the mutex must have
* been acquired previously (no multiple unlocks).
*/
assert(ret == pdTRUE);
}
TaskHandle_t mutex_get_owner(struct mutex *m)
{
return xSemaphoreGetMutexHolder(m->_rtos_mutex);
}
#ifndef _MUTEX_H
#define _MUTEX_H
#ifndef __SPHINX_DOC
/* Some headers are not recognized by hawkmoth for some odd reason */
#include <stddef.h>
#include <stdbool.h>
#include "FreeRTOS.h"
#include "semphr.h"
#else
typedef unsigned int size_t;
typedef _Bool bool;
typedef void *TaskHandle_t;
typedef void *SemaphoreHandle_t;
typedef int StaticSemaphore_t;
#endif /* __SPHINX_DOC */
/**
* Mutex data type.
*/
struct mutex {
/* Name of this mutex, kept for debugging purposes. */
const char *name;
/* FreeRTOS mutex data structures. */
SemaphoreHandle_t _rtos_mutex;
StaticSemaphore_t _rtos_mutex_data;
};
/**
* Create a new mutex.
*
* Call this function as early as possible, in an obvious location so it is easy
* to find. Mutexes should be defined statically so they stay alive for the
* entire run-time of the firmware.
*/
#define mutex_create(mutex) _mutex_create(mutex, #mutex)
void _mutex_create(struct mutex *m, const char *name);
/**
* Lock a mutex.
*
* If the mutex is held by another task, :c:func:`mutex_lock` will block the
* current task until the mutex is unlocked.
*
* .. warning::
*
* This function is **not** safe to use in a timer!
*/
void mutex_lock(struct mutex *m);
/**
* Try locking a mutex.
*
* If the mutex is currently locked by another task, :c:func:`mutex_trylock`
* will return ``false`` immediately. If the attmept to lock was successful, it
* will return ``true``.
*
* This funciton is safe for use in timers.
*/
bool mutex_trylock(struct mutex *m);
/**
* Unlock a mutex.
*
* You **must** call this function from the same task which originally locked
* the mutex.
*/
void mutex_unlock(struct mutex *m);
/**
* Get the current owner of the mutex.
*
* Returns the task-handle of the task currently holding the mutex. If the
* mutex is unlocked, ``NULL`` is returned.
*/
TaskHandle_t mutex_get_owner(struct mutex *m);
#endif /* _MUTEX_H */
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment