diff --git a/Documentation/epicardium/mutex.rst b/Documentation/epicardium/mutex.rst new file mode 100644 index 0000000000000000000000000000000000000000..928730b089b3caba9a3d33f0e6f4a762834031ea --- /dev/null +++ b/Documentation/epicardium/mutex.rst @@ -0,0 +1,136 @@ +Mutex +===== +Ontop of FreeRTOS, we have our own mutex implementation. **Never use the +FreeRTOS mutexes directly! Always use this abstraction layer instead**. This +mutex implementation tries to make reasoning about program flow and locking +behavior easier. And most importantly tries to help with debugging possible +dead-locks. + +Design +------ +There are a few guiding design principles: + +- Mutexes can only be used from tasks, **never** from interrupts! +- Timers can use mutexes, but only with :c:func:`mutex_trylock`, **never** with + :c:func:`mutex_lock` (Because they are not allowed to block). +- Locking can *never* fail (if it does, we consider this a fatal error ⇒ panic). +- No recursive locking. +- An unlock can only occur from the task which previously acquired the mutex. +- An unlock is only allowed if the mutex was previously acquired. + +For a more elaborate explanation of the rationale behind these rules take a +look at the :ref:`mutex-design-reasons`. + +Definitions +----------- +.. c:autodoc:: epicardium/modules/mutex.h + +.. _mutex-design-reasons: + +Reasons for this Design +----------------------- + +Locking can *never* fail +^^^^^^^^^^^^^^^^^^^^^^^^ +This might seem like a bold claim at first but in the end, it is just a matter +of definition and shifting responsibilities. Instead of requiring all code to +be robust against a locking attempt failing, we require all code to properly +lock and unlock their mutexes and thus never producing a situation where +locking would fail. + +Because all code using any of the mutexes is contained in the Epicardium +code-base, we can - *hopefully* - audit it properly behaving ahead of time and +thus don't need to add code to ensure correctness at runtime. This makes +downstream code easier to read and easier to reason about. + +History of this project has shown that most code does not properly deal with +locking failures anyway: There was code simply skipping the mutexed action on +failure, code blocking a module entirely until reboot, and worst of all: Code +exposing the locking failure to 'user-space' (Pycardium) instead of retrying. +This has lead to spurious errors where technically there would not need to be +any. + +Only from tasks +^^^^^^^^^^^^^^^ +Locking a mutex from an ISR, a FreeRTOS software timer or any other context +which does not allow blocking is complicated to do right. The biggest +difficulty is that a task might be holding the mutex during execution of such a +context and there is no way to wait for it to release the mutex. This requires +careful design of the program flow to choose an alternative option in such a +case. A common approach is to 'outsource' the relevant parts of the code into +an 'IRQ worker' which is essentially just a task waiting for the IRQ to wake it +up and then attempts to lock the mutex. + +If you absolutely do need it (and for legacy reasons), software timers *can* +lock a mutex using :c:func:`mutex_trylock` (which never blocks). I strongly +recommend **not** doing that, though. As shown above, you will have to deal +with the case of the mutex being held by another task and it is very well +possible that your timer will get starved of the mutex because the scheduler +has no knowledge of its intentions. In most cases, it is a better idea to use +a task and attempt locking using :c:func:`mutex_lock`. + +.. todo:: + + We might introduce a generic IRQ worker queue system at some point. + +No recursive locking +^^^^^^^^^^^^^^^^^^^^ +Recursive locking refers to the ability to 'reacquire' a mutex already held by +the current task, deeper down in the call-chain. Only the outermost unlock +will actually release the mutex. This feature is sometimes implemented to +allow more elegant abstractions where downstream code does not need to know +about the mutexes upstream code uses and can still also create a larger region +where the same mutex is held. + +But exactly by hiding the locking done by a function, these abstractions make +it hard to trace locking chains and in some cases even make it impossible to +create provably correct behavior. As an alternative, I would suggest using +different mutexes for the different levels of abstraction. This also helps +keeping each mutex separated and 'local' to its purpose. + +Only unlock from the acquiring task +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Because of the above mentioned mutex locking semantics, there should never be a +need to force-unlock a forgein mutex. Even in cases of failures, all code +should still properly release all mutexes it holds. One notable exceptions is +``panic()``\s which will abort all ongoing operations anyway. + +Only unlock once after acquisition +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Justified with an argument of robustness, sometimes the :c:func:`mutex_unlock` +call is written in a way that allows unlocking an already unlocked mutex. But +robustness of downstream code will not really be improved by the upstream API +dealing with arguably invalid usage. For example, this could encourage +practices like unlocking everything again at the end of a function "just to be +sure". + +Instead, code should be written in a way where the lock/unlock pair is +immediately recognizable as belonging together and is thus easily auditable to +have correct locking behavior. A common pattern to help with readability in +this regard is the *Single Function Exit* which looks like this: + +.. code-block:: cpp + + int function() + { + int ret; + mutex_lock(&some_mutex); + + ret = foo(); + if (ret) { + /* Return with an error code */ + ret = -ENODEV; + goto out_unlock; + } + + ret = bar(); + if (ret) { + /* Return the return value from foo */ + goto out_unlock; + } + + ret = 0; + out_unlock: + mutex_unlock(&some_mutex); + return ret; + } diff --git a/Documentation/index.rst b/Documentation/index.rst index 44ecfcd04f8fb330aea3ae070643895bc70d9511..2d51547a1def13ebe0239310f33e07809ac63836 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -52,6 +52,7 @@ Last but not least, if you want to start hacking the lower-level firmware, the debugger pycardium-guide memorymap + epicardium/mutex epicardium/sensor-streams .. toctree:: diff --git a/epicardium/FreeRTOSConfig.h b/epicardium/FreeRTOSConfig.h index b731f6a9abdee7556f073936d86aba649e72a8d1..22c6dfb0f050c38c2d33b7f1d64b1e7940d8ea97 100644 --- a/epicardium/FreeRTOSConfig.h +++ b/epicardium/FreeRTOSConfig.h @@ -52,6 +52,8 @@ #define INCLUDE_vTaskDelay 1 #define INCLUDE_uxTaskGetStackHighWaterMark 1 #define INCLUDE_xTimerPendFunctionCall 1 +#define INCLUDE_xSemaphoreGetMutexHolder 1 + /* Allow static allocation of data structures */ #define configSUPPORT_STATIC_ALLOCATION 1 diff --git a/epicardium/modules/meson.build b/epicardium/modules/meson.build index 022397ca26e3866ef9a72472a7621e4cd6dbfec2..8ead5e72c4cc80e3fe0601d713784028cc16645f 100644 --- a/epicardium/modules/meson.build +++ b/epicardium/modules/meson.build @@ -14,6 +14,7 @@ module_sources = files( 'light_sensor.c', 'log.c', 'max30001.c', + 'mutex.c', 'panic.c', 'personal_state.c', 'pmic.c', diff --git a/epicardium/modules/mutex.c b/epicardium/modules/mutex.c new file mode 100644 index 0000000000000000000000000000000000000000..9d58eec080977d93f69f275350a5101a15d4a64b --- /dev/null +++ b/epicardium/modules/mutex.c @@ -0,0 +1,55 @@ +#include "modules/mutex.h" + +#include <assert.h> + +void _mutex_create(struct mutex *m, const char *name) +{ + /* Assert that the mutex has not been initialized already */ + assert(m->name == NULL); + + /* + * The name is just the parameter stringified which is almost always a + * pointer. If it is, skip over the '&' because it adds no value as + * part of the name. + */ + if (name[0] == '&') { + m->name = &name[1]; + } else { + m->name = name; + } + + m->_rtos_mutex = xSemaphoreCreateMutexStatic(&m->_rtos_mutex_data); +} + +void mutex_lock(struct mutex *m) +{ + int ret = xSemaphoreTake(m->_rtos_mutex, portMAX_DELAY); + + /* Ensure locking was actually successful */ + assert(ret == pdTRUE); +} + +bool mutex_trylock(struct mutex *m) +{ + int ret = xSemaphoreTake(m->_rtos_mutex, 0); + return ret == pdTRUE; +} + +void mutex_unlock(struct mutex *m) +{ + /* Ensure that only the owner can unlock a mutex */ + assert(mutex_get_owner(m) == xTaskGetCurrentTaskHandle()); + + int ret = xSemaphoreGive(m->_rtos_mutex); + + /* + * Ensure that unlocking was successful; that is, the mutex must have + * been acquired previously (no multiple unlocks). + */ + assert(ret == pdTRUE); +} + +TaskHandle_t mutex_get_owner(struct mutex *m) +{ + return xSemaphoreGetMutexHolder(m->_rtos_mutex); +} diff --git a/epicardium/modules/mutex.h b/epicardium/modules/mutex.h new file mode 100644 index 0000000000000000000000000000000000000000..1c3055996ac2142bee634d9fd9f4c09b3233d69d --- /dev/null +++ b/epicardium/modules/mutex.h @@ -0,0 +1,82 @@ +#ifndef _MUTEX_H +#define _MUTEX_H + +#ifndef __SPHINX_DOC +/* Some headers are not recognized by hawkmoth for some odd reason */ +#include <stddef.h> +#include <stdbool.h> + +#include "FreeRTOS.h" +#include "semphr.h" +#else +typedef unsigned int size_t; +typedef _Bool bool; + +typedef void *TaskHandle_t; +typedef void *SemaphoreHandle_t; +typedef int StaticSemaphore_t; +#endif /* __SPHINX_DOC */ + + +/** + * Mutex data type. + */ +struct mutex { + /* Name of this mutex, kept for debugging purposes. */ + const char *name; + + /* FreeRTOS mutex data structures. */ + SemaphoreHandle_t _rtos_mutex; + StaticSemaphore_t _rtos_mutex_data; +}; + +/** + * Create a new mutex. + * + * Call this function as early as possible, in an obvious location so it is easy + * to find. Mutexes should be defined statically so they stay alive for the + * entire run-time of the firmware. + */ +#define mutex_create(mutex) _mutex_create(mutex, #mutex) +void _mutex_create(struct mutex *m, const char *name); + +/** + * Lock a mutex. + * + * If the mutex is held by another task, :c:func:`mutex_lock` will block the + * current task until the mutex is unlocked. + * + * .. warning:: + * + * This function is **not** safe to use in a timer! + */ +void mutex_lock(struct mutex *m); + +/** + * Try locking a mutex. + * + * If the mutex is currently locked by another task, :c:func:`mutex_trylock` + * will return ``false`` immediately. If the attmept to lock was successful, it + * will return ``true``. + * + * This funciton is safe for use in timers. + */ +bool mutex_trylock(struct mutex *m); + +/** + * Unlock a mutex. + * + * You **must** call this function from the same task which originally locked + * the mutex. + */ +void mutex_unlock(struct mutex *m); + +/** + * Get the current owner of the mutex. + * + * Returns the task-handle of the task currently holding the mutex. If the + * mutex is unlocked, ``NULL`` is returned. + */ +TaskHandle_t mutex_get_owner(struct mutex *m); + +#endif /* _MUTEX_H */