Threading
stdlib threading only. No asyncio, no multiprocessing on the hot path. Subprocess isolation is reserved for "bridges" - heavy-data sources like webcams or ultrasound that would otherwise saturate a thread.
What runs where
flowchart LR
subgraph MAIN["Main thread"]
UI["GUI event loop<br/>ImGui render<br/>button clicks"]
end
subgraph DAEMONS["Daemon threads (one each)"]
direction TB
AT["Acquisition thread (per Stream)<br/>source.read → ring buffer<br/>→ display snapshot<br/>→ optional Zarr append"]
PT["Predict thread (only with Pipeline)<br/>extract → predict<br/>→ pipeline.predictions"]
TT["Training thread (transient)<br/>train → pipeline.model"]
OT["Output thread (per Output)<br/>drain _latest → transport @ hz"]
end
subgraph BRIDGE["Subprocess (optional)"]
BR["Bridge process<br/>webcam decode → Zarr<br/>→ publish LSL clock"]
end
AT --> CTX["ctx.streams[name]<br/>(threading.Lock)"]
CTX --> PT
CTX --> UI
PT -.->|push| OT
BR -.->|LSL clock| AT
| Thread | Owner | Job |
|---|---|---|
| Main | OS | GUI event loop, ImGui render, button clicks |
| Acquisition (one per Stream) | Stream | source.read → ring buffer → display snapshot → optional Zarr append |
| Predict | Pipeline | extract → predict → write to pipeline.predictions |
| Training (transient) | Pipeline | train → assign to pipeline.model |
| Output (one per Output) | user-owned Output | drain _latest to its destination at hz |
| Bridge subprocess (optional) | Bridge | webcam decode → Zarr → publish LSL clock |
Every non-main thread is a daemon, so the program exits cleanly even if a thread is mid-iteration when the user closes the window. The predict thread is started via app.before_run_hooks (at app.run(), not on the first Predict click). On cleanup it is signalled through a threading.Event flag and then joined with a short timeout. Acquisition and output threads simply exit with the process; their stop() methods set a sentinel but don't join().
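The predict thread's lifecycle described above can be sketched with stdlib threading alone. This is a minimal illustration, not the framework's code: PredictWorker, its hz parameter and the ticks counter are hypothetical names invented for the example.

```python
import threading
import time


class PredictWorker:
    """Hypothetical stand-in for the predict thread: daemon + Event shutdown flag."""

    def __init__(self, hz: float = 100.0) -> None:
        self._period = 1.0 / hz
        self._stop = threading.Event()
        self.ticks = 0
        # daemon=True: the interpreter will not wait for this thread at exit.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # Event.wait doubles as the pacing delay and the shutdown check:
        # it returns True as soon as stop() sets the flag.
        while not self._stop.wait(self._period):
            self.ticks += 1  # placeholder for extract -> predict

    def stop(self, timeout: float = 0.5) -> bool:
        """Set the flag, join with a short timeout, report whether the thread exited."""
        self._stop.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()


worker = PredictWorker(hz=200.0)
worker.start()
time.sleep(0.05)
stopped = worker.stop()
```

Because the loop waits on the Event instead of sleeping, a stop request is noticed within one tick rather than after a full sleep interval.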
Why the GIL doesn't bite
Python's Global Interpreter Lock would be a problem if any of these threads spent most of its CPU time in pure Python. None of them does:
- NumPy releases the GIL inside its C routines (np.copy, np.fft.rfft, np.dot). The ring buffer copy in get_window is GIL-released.
- PyTorch CUDA runs on the GPU; the Python thread is mostly waiting for the kernel.
- LSL (pylsl and mne_lsl) is backed by a native library; pull_chunk releases the GIL.
- ImGui / ImPlot are C++; the render thread spends its time in C++ frame building.
- OpenCV (used by WebCamBridge) releases the GIL in cv2.VideoCapture.read.
The Python interpreter is mostly orchestrating - handing buffers between threads. The actual compute happens in GIL-released code, so the threads run in parallel on multi-core machines.
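The effect can be reproduced with the standard library alone. In the sketch below, hashlib is a stand-in for NumPy and is not something the framework is known to use; CPython documents that hashlib releases the GIL for inputs larger than 2047 bytes. The buffer sizes and the digest helper are assumptions made for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Four independent 8 MiB buffers, standing in for per-stream sample windows.
buffers = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]


def digest(buf: bytes) -> str:
    # hashlib drops the GIL while hashing buffers larger than 2047 bytes,
    # so several of these calls can execute on different cores at once.
    return hashlib.sha256(buf).hexdigest()


# Threaded run: Python only hands buffers to workers; the hashing is native code.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, buffers))

# Sequential run for comparison; the results must be identical.
sequential = [digest(buf) for buf in buffers]
```

Wrapping each run in time.perf_counter() calls on a multi-core machine should show the threaded version finishing sooner, although the exact ratio depends on the hardware.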
Synchronisation
The shared Context is the only synchronisation surface, and it relies on three properties:
- Reference assignment is atomic in CPython. pipeline.model = trained doesn't tear; readers either see the old reference or the new one. Same for pipeline.predictions = {...}.
- Ring buffer reads/writes hold a threading.Lock. ~1–5 µs overhead per access - invisible against the µs–ms latencies of real workloads.
- ctx.state transitions go through App.start_* / App.stop_*. Callers don't poke ctx.state = "..." directly. State changes are point-in-time; there's no "stop in progress" intermediate state.
No condition variables, no queues, and the only threading.Event is the predict thread's shutdown flag. If a thread needs the latest value, it reads the field; if a thread needs to update, it writes the field.
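The first two properties can be sketched as follows. This is an assumption-laden illustration: RingBuffer and Pipeline here are hypothetical minimal classes built on collections.deque, and the framework's own buffer almost certainly differs (for example, by storing NumPy arrays).

```python
import threading
from collections import deque


class RingBuffer:
    """Hypothetical lock-guarded ring buffer; the real class may differ."""

    def __init__(self, capacity: int) -> None:
        self._data = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def append(self, sample: float) -> None:
        with self._lock:
            self._data.append(sample)

    def get_window(self, n: int) -> list:
        # The lock covers only the copy; callers compute on their own list.
        with self._lock:
            snapshot = list(self._data)
        return snapshot[-n:] if n > 0 else []


class Pipeline:
    """Hypothetical holder for a field that is swapped by reference."""

    def __init__(self) -> None:
        self.predictions = {}


buf = RingBuffer(capacity=1000)
pipeline = Pipeline()


def acquire() -> None:
    for i in range(5000):
        buf.append(float(i))


def predict() -> None:
    for _ in range(200):
        window = buf.get_window(10)
        mean = sum(window) / len(window) if window else 0.0
        # Build a new dict, then publish it with one reference assignment.
        pipeline.predictions = {"mean": mean}


threads = [threading.Thread(target=acquire), threading.Thread(target=predict)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_window = buf.get_window(3)
```

The reader never mutates the published dict in place; it always replaces the whole reference, which is what makes the lock-free read safe.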
The GPU contention rule
Predicting and training never run at the same time. This is a hard rule, not a soft hint:
Click Train while predicting is running → predicting stops first. Training completes → user clicks Predict again to resume.
Why: PyTorch CUDA streams can be juggled, but the engineering complexity isn't worth it for a single-user experiment app. Predict throughput drops to zero during training, which lasts seconds-to-minutes; that's acceptable for the use case. The state machine refuses parallel attempts so you can't accidentally OOM the GPU.
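A minimal sketch of such a state machine, assuming transitions are only ever triggered from the main thread (button clicks). AppState, its method names and the three state strings are hypothetical; the source only says that transitions go through App.start_* / App.stop_*.

```python
class AppState:
    """Hypothetical sketch of the Train/Predict exclusion rule."""

    def __init__(self) -> None:
        self.state = "idle"  # one of: "idle", "predicting", "training"

    def start_predict(self) -> bool:
        if self.state == "training":
            return False  # refused: the GPU is busy training
        self.state = "predicting"
        return True

    def stop_predict(self) -> None:
        if self.state == "predicting":
            self.state = "idle"

    def start_train(self) -> bool:
        if self.state == "training":
            return False  # refused: a training run is already active
        # Predicting stops first; it is not resumed automatically afterwards.
        self.stop_predict()
        self.state = "training"
        return True

    def finish_train(self) -> None:
        if self.state == "training":
            self.state = "idle"


app = AppState()
app.start_predict()
app.start_train()              # predicting -> idle -> training
refused = app.start_predict()  # rejected while training is active
app.finish_train()             # back to idle; nothing resumes by itself
resumed = app.start_predict()  # the user's explicit second Predict click
```

Each transition is a single assignment, which matches the point-in-time state changes described earlier: there is no intermediate "stopping" state to observe.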
Bridge subprocesses
Heavy-data sources break the GIL-release assumption: webcam decoding, even with OpenCV, can saturate the Python thread enough to disrupt the predict thread. The escape hatch is a subprocess.
The bridge process:
- Owns its own Python interpreter (no GIL contention with the main app).
- Writes frames directly to a Zarr array - that's the persistence step.
- Publishes an LSL "clock" stream so the main app knows what frame number is current.
The main app never touches the camera frames; it just reads frame-number stamps from the LSL stream and looks up frames in the Zarr if the experiment needs them. A process_launcher panel shows the bridge's start/stop state.
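The division of labour can be illustrated with the standard library. In this sketch a stdout pipe stands in for the LSL clock stream and no frames are actually stored; both are simplifications chosen for the example, and the CHILD script is hypothetical rather than the real bridge.

```python
import subprocess
import sys

# Child program: pretends to persist frames and reports each frame number.
# In the real bridge this would be webcam decode -> Zarr write -> LSL clock sample.
CHILD = r"""
import sys
for frame_number in range(5):
    # (the frame itself would be written to storage here)
    sys.stdout.write(f"{frame_number}\n")
    sys.stdout.flush()
"""

# A separate interpreter means a separate GIL: heavy work in the child
# cannot stall threads in the parent.
with subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdout=subprocess.PIPE,
    text=True,
) as proc:
    # The parent only ever sees frame numbers, never frame data.
    frame_numbers = [int(line) for line in proc.stdout]

exit_code = proc.returncode
```

The only thing crossing the process boundary is a small integer per frame, so the parent's cost stays constant no matter how large the frames are.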
Common mistakes
See also: full Troubleshooting index, organised by symptom across every subsystem.
- time.sleep in @pipeline.predict. It blocks the predict thread. The framework already paces ticks at predict_hz; if you need rate-limiting, change predict_hz or use a state machine that returns the previous prediction on stale ticks.
- Touching ImGui from a non-main thread. ImGui is not thread-safe. Widgets only run inside @app.ui, which executes on the main (render) thread. Predict-thread code must NOT call imgui.*.
- Long blocking I/O inside a Source's read. It pauses the acquisition thread, which pauses ring-buffer updates, which delays prediction. Sources should poll without blocking (e.g. pull_chunk(timeout=0.0) on a pylsl.StreamInlet).
- Holding the ring buffer lock during compute. The lock is taken inside get_window for the copy and released before the function returns, so user code never holds a buffer lock.
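For the first item, one way to rate-limit without sleeping is to cache the last result and return it on stale ticks. The sketch below is framework-independent: RateLimitedPredictor, min_interval and the injectable clock are hypothetical names, and how such a callable would be registered with @pipeline.predict is not shown because the source does not specify that signature.

```python
import time
from typing import Callable, Optional


class RateLimitedPredictor:
    """Hypothetical wrapper: recompute at most once per `min_interval` seconds."""

    def __init__(
        self,
        compute: Callable[[], float],
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._min_interval = min_interval
        self._clock = clock
        self._last_time: Optional[float] = None
        self._last_value: Optional[float] = None

    def __call__(self) -> Optional[float]:
        now = self._clock()
        if self._last_time is None or now - self._last_time >= self._min_interval:
            # Fresh tick: do the expensive work and remember the result.
            self._last_value = self._compute()
            self._last_time = now
        # Stale tick: return the cached value immediately, without sleeping.
        return self._last_value


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
calls = []


def expensive() -> float:
    calls.append(fake_now[0])
    return float(len(calls))


predict = RateLimitedPredictor(expensive, min_interval=1.0, clock=lambda: fake_now[0])

results = []
for t in (0.0, 0.25, 0.5, 1.0, 1.5, 2.0):
    fake_now[0] = t
    results.append(predict())
# results == [1.0, 1.0, 1.0, 2.0, 2.0, 3.0]; expensive() ran at t = 0.0, 1.0, 2.0
```

The predict thread keeps ticking at its normal rate; only the expensive computation is throttled.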