Summary
Self-supervised learning is expected to improve representation quality in tiny sEMG models without increasing deployment cost, enabling more robust cross-subject generalization and label-efficient adaptation under fixed system budgets.
Motivation
Wearable sEMG gesture recognition systems often underperform in real-world deployment — not because models are too small, but because learned representations degrade under strong inter-subject variability, scarce personalization data, and strict energy and compute constraints.
In embedded systems, increasing model capacity or training complexity is rarely feasible. Performance is therefore limited by how much transferable structure a model can extract from limited data under a fixed inference budget.
While self-supervised learning (SSL) has shown promise for biosignal modeling, many existing studies rely on increased model capacity, relaxed compute budgets, or large-scale pretraining. This makes it unclear whether SSL itself improves representation quality under realistic deployment constraints.
Research Question
Can self-supervised pretraining improve the representation quality learned by very small sEMG encoders, such that cross-subject generalization and label-efficient user adaptation improve without increasing inference cost or model capacity?
Hypothesis
For a fixed tiny encoder architecture and a fixed inference budget, introducing self-supervised learning during pretraining improves:
- cross-subject representation quality
- label-efficient user adaptation
compared to purely supervised training, while leaving deployment cost unchanged.
Method
The project is designed as a causally controlled study that isolates the role of representation learning.
Encoder architecture, optimization budget, and inference cost are held constant across all experimental variants. Self-supervised learning is introduced as a single, explicit intervention via masked signal modeling during pretraining.
Contrastive SSL is deliberately excluded due to its additional training dependencies, which could confound causal interpretation in resource-constrained settings.
Representation quality is assessed using frozen encoders with linear probes and low-data adaptation scenarios, decoupling representational effects from classifier capacity or fine-tuning flexibility.
Evaluation
Evaluation emphasizes deployment realism over benchmark maximization:
- Leave-one-subject-out generalization to unseen users
- Post-deployment adaptation using only 5–30 seconds of labeled calibration data
- Explicit control of normalization, data splits, and temporal leakage
- Reporting of system metrics (parameter count, MACs, batch-1 latency) alongside task performance
The objective is to assess whether SSL meaningfully alters representation behavior under constraints typical of wearable and embedded systems.
Implications
If the hypothesis holds, the implication is not that model capacity becomes irrelevant, but that representation learning can be made more efficient within a fixed system envelope.
While grounded in wearable sEMG gesture decoding (NinaPro DB2), the underlying question and methodology are relevant to a broader class of resource-constrained edge AI systems with:
- temporal sensor data
- strong domain or subject variability
- limited labeled data
- fixed inference budgets
These include wearable health monitoring, inertial sensing, bioacoustics, environmental monitoring, neurotechnology, and TinyML systems.
Any transfer beyond sEMG and NinaPro DB2 requires independent empirical verification.
Artifacts
Planned outputs emphasize auditability and system relevance:
- Reproducible experimental codebase with leakage-safe evaluation protocols
- Concise technical report documenting design decisions and observed trends
- Explicit reporting of system metrics alongside task performance
- ONNX export and post-training INT8 quantization analysis