Uni-Modal Pipeline for Data-Agnostic Sign Language Recognition
This project presents a complete, end-to-end system designed to bridge the communication gap for the Deaf and Hard of Hearing (DHH) community. It translates continuous sign language captured from a webcam into natural spoken language in real time, using a pipeline of deep learning and NLP models.
| Metric | Value |
|---|---|
| Model Accuracy | 98.5% |
| F1-Score | 0.978 |
| Vocabulary Size | 3 signs |
| Feature Dimensions | 126 |
System Architecture
The system is designed as a modular pipeline in which each component handles a specific task; this separation of concerns keeps the system scalable and robust. The four stages are listed below, followed by a minimal sketch of how they fit together.
1. Data Acquisition
2. Feature Extraction
3. Bi-LSTM Classifier
4. Inference & NLP
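The sketch below shows how these four stages could be chained in Python. It is a minimal illustration under assumed interfaces: `extract`, `classify`, and `to_text` stand in for the project's actual feature-extraction, Bi-LSTM, and NLP components, and the 30-frame window length is an assumption.

```python
from collections import deque
from typing import Callable, Iterable, Iterator

def run_pipeline(
    frames: Iterable,        # 1. Data Acquisition (e.g. webcam frames)
    extract: Callable,       # 2. Feature Extraction -> 126-dim vector per frame
    classify: Callable,      # 3. Bi-LSTM Classifier over a window of frames
    to_text: Callable,       # 4. Inference & NLP -> natural-language output
    window_size: int = 30,   # frames per prediction window (assumption)
) -> Iterator[str]:
    """Slide a fixed-length window of per-frame features through the classifier."""
    window = deque(maxlen=window_size)
    for frame in frames:
        window.append(extract(frame))
        if len(window) == window_size:
            yield to_text(classify(list(window)))
```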
Data & Hand-Landmark Features
A high-quality, diverse dataset is the foundation of our model. We captured thousands of video clips and used Google's MediaPipe to extract a 126-dimensional hand-landmark feature vector from every frame, capturing the movement of both hands.
Vocabulary Used for Training
The model was trained on a vocabulary of 3 common, conversational words to enable the construction of basic sentences.
| Category | Words |
|---|---|
| Greetings & Basics | What, you, name |
A special "_blank_" class was also included, containing non-sign movements and periods of rest. This is crucial for enabling the system to segment continuous gestures accurately in a live video stream.
Feature Vector Composition (126 Dims)
Each frame's vector concatenates 21 landmarks × 3 coordinates (x, y, z) for each of the two hands: 2 × 21 × 3 = 126 values.
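The following is a minimal sketch of per-frame feature extraction with MediaPipe's Hands solution. The zero-padding of undetected hands and the left/right ordering are assumptions about how the 126-dimensional vector is assembled.

```python
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_keypoints(rgb_frame: np.ndarray, hands) -> np.ndarray:
    """Return a 126-dim vector: 2 hands x 21 landmarks x 3 coords (x, y, z)."""
    results = hands.process(rgb_frame)
    left, right = np.zeros(63), np.zeros(63)
    if results.multi_hand_landmarks:
        for landmarks, handedness in zip(results.multi_hand_landmarks,
                                         results.multi_handedness):
            coords = np.array([[lm.x, lm.y, lm.z]
                               for lm in landmarks.landmark]).flatten()
            if handedness.classification[0].label == "Left":
                left = coords
            else:
                right = coords
    return np.concatenate([left, right])      # shape: (126,)

# Usage: process webcam frames inside a Hands context manager.
# with mp_hands.Hands(static_image_mode=False, max_num_hands=2) as hands:
#     features = extract_keypoints(frame_rgb, hands)
```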
Bi-LSTM Model & Training
The core of our system is a Bidirectional Long Short-Term Memory (Bi-LSTM) network. This architecture is ideal for understanding sequential data like sign language because it processes information in both forward and reverse time, giving it a deeper contextual understanding of each gesture.
The model was trained for 100 epochs using the Adam optimizer. The chart shows the training and validation metrics, demonstrating effective learning without significant overfitting, thanks to regularization techniques like Dropout and Early Stopping.
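A hedged sketch of a Keras model matching this description is shown below: Bidirectional LSTM layers over (sequence_length, 126) inputs, Dropout for regularization, the Adam optimizer, and Early Stopping over up to 100 epochs. The layer widths, dropout rate, and 30-frame sequence length are illustrative assumptions rather than the project's exact configuration.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dropout, Dense
from tensorflow.keras.callbacks import EarlyStopping

SEQUENCE_LENGTH = 30   # frames per sample (assumption)
NUM_FEATURES = 126     # per-frame feature dimensions
NUM_CLASSES = 4        # "what", "you", "name", "_blank_"

model = Sequential([
    Bidirectional(LSTM(64, return_sequences=True),
                  input_shape=(SEQUENCE_LENGTH, NUM_FEATURES)),
    Dropout(0.3),
    Bidirectional(LSTM(32)),
    Dropout(0.3),
    Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Training: up to 100 epochs, stopping early if the validation loss plateaus.
# X_train, y_train, X_val, y_val are assumed to be prepared elsewhere.
# model.fit(X_train, y_train,
#           validation_data=(X_val, y_val),
#           epochs=100,
#           callbacks=[EarlyStopping(monitor="val_loss", patience=10,
#                                    restore_best_weights=True)])
```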
Evaluation & Results
The model's performance was rigorously evaluated on a held-out test set. We used a confusion matrix to analyze class-specific performance and standard metrics to quantify its overall effectiveness.
Illustrative Confusion Matrix
A confusion matrix visualizes the model's performance on individual classes. This illustrative subset shows performance on commonly confused signs.
The main diagonal (top-left to bottom-right) holds correct predictions; off-diagonal cells show misclassifications.
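A confusion matrix like this can be computed and plotted with scikit-learn; the sketch below assumes integer-encoded true and predicted labels in the label order shown.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

LABELS = ["what", "you", "name", "_blank_"]   # assumed label order

def plot_confusion(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Rows are true labels, columns are predictions; the diagonal holds correct cases."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(len(LABELS))))
    ConfusionMatrixDisplay(cm, display_labels=LABELS).plot()
    return cm
```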
Key Performance Metrics
| Metric | Value | Description |
|---|---|---|
| Accuracy | 98.5% | Overall percentage of correct predictions. |
| Precision | 97.9% | Of all positive predictions, how many were truly positive. |
| Recall | 97.8% | Of all actual positives, how many were identified. |
| F1-Score | 97.8% | The harmonic mean of Precision and Recall. |
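These metrics can be reproduced with scikit-learn; the macro averaging across the four classes in the sketch below is an assumption about how the headline figures were aggregated.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def summarize(y_true, y_pred) -> dict:
    """Compute accuracy plus macro-averaged precision, recall, and F1."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision,
            "recall": recall,
            "f1": f1}
```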