Uni-Modal Pipeline for Data-Agnostic Sign Language Recognition

This project presents a complete, end-to-end system designed to bridge the communication gap for the Deaf and Hard of Hearing (DHH) community. It translates continuous sign language from a webcam into natural, spoken language in real time, using a pipeline of deep learning and NLP models.

Model Accuracy: 98.5%
F1-Score: 0.978
Vocabulary Size: 3 signs
Feature Dimensions: 126

System Architecture

The system is designed as a modular pipeline in which each component handles a specific task. This separation of concerns keeps each stage independently testable and easy to extend or replace.

1. Data Acquisition

2. Feature Extraction

3. Bi-LSTM Classifier

4. Inference & NLP

Data & Hand-Landmark Features

A high-quality, diverse dataset is the foundation of the model. We captured thousands of video clips and used Google's MediaPipe to extract a 126-dimensional hand-landmark feature vector from every frame, capturing the movement of both hands.

Vocabulary Used for Training

The model was trained on a vocabulary of 3 common, conversational words to enable the construction of basic sentences.

Category              Words
Greetings & Basics    What, you, name

A special "_blank_" class was also included, containing non-sign movements and periods of rest. This is crucial for enabling the system to segment continuous gestures accurately in a live video stream.
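As a minimal illustration of how the blank class supports segmentation, the sketch below collapses a stream of per-window predictions into discrete signs, treating "_blank_" outputs as gesture boundaries. The segment() helper and the example stream are hypothetical, not the project's actual code.

```python
# Hypothetical sketch: use "_blank_" predictions as boundaries to segment a
# continuous stream of per-window classifier outputs into discrete signs.

def segment(predictions, blank="_blank_"):
    """Collapse runs of identical non-blank predictions into one sign each."""
    signs, previous = [], blank
    for label in predictions:
        if label != blank and label != previous:
            signs.append(label)
        previous = label
    return signs

# Example stream of raw per-window outputs from the classifier.
stream = ["_blank_", "What", "What", "_blank_", "you", "you",
          "_blank_", "name", "name", "_blank_"]
print(segment(stream))  # ['What', 'you', 'name']
```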

Feature Vector Composition (126 Dims)
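The 126 dimensions are consistent with MediaPipe's hand model: 21 landmarks per hand, three coordinates (x, y, z) per landmark, for two hands (2 × 21 × 3 = 126). The sketch below shows one way such a vector could be assembled per frame; it assumes the standard mediapipe Hands solution, and the helper name is illustrative.

```python
import numpy as np
import mediapipe as mp

# Sketch: build a 126-dim feature vector (2 hands x 21 landmarks x 3 coords)
# from one RGB frame using MediaPipe Hands. Missing hands are zero-padded.

def extract_hand_features(frame_rgb, hands):
    left = np.zeros(21 * 3)
    right = np.zeros(21 * 3)
    results = hands.process(frame_rgb)
    if results.multi_hand_landmarks:
        for landmarks, handedness in zip(results.multi_hand_landmarks,
                                         results.multi_handedness):
            coords = np.array([[p.x, p.y, p.z] for p in landmarks.landmark]).flatten()
            if handedness.classification[0].label == "Left":
                left = coords
            else:
                right = coords
    return np.concatenate([left, right])  # shape: (126,)

# Usage sketch: one detector instance reused across frames.
# with mp.solutions.hands.Hands(max_num_hands=2) as hands:
#     features = extract_hand_features(frame_rgb, hands)
```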

Bi-LSTM Model & Training

The core of our system is a Bidirectional Long Short-Term Memory (Bi-LSTM) network. This architecture is well suited to sequential data like sign language because it processes each sequence of frames in both the forward and backward directions, giving it context from both the start and the end of a gesture.

The model was trained for 100 epochs using the Adam optimizer. The chart shows the training and validation metrics, demonstrating effective learning without significant overfitting, thanks to regularization techniques like Dropout and Early Stopping.
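A minimal Keras sketch of a model with this shape is shown below. The sequence length, layer widths, and dropout rate are assumptions for illustration, not the project's exact configuration; the feature dimension (126) and the four output classes (three signs plus "_blank_") follow from the sections above.

```python
from tensorflow.keras import layers, models, callbacks

SEQ_LEN, N_FEATURES, N_CLASSES = 30, 126, 4  # sequence length is an assumption

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Dropout(0.3),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(N_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Stop training when validation loss stalls, keeping the best weights.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True)

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=[early_stop])
```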

Evaluation & Results

The model's performance was rigorously evaluated on a held-out test set. We used a confusion matrix to analyze class-specific performance and standard metrics to quantify its overall effectiveness.

Illustrative Confusion Matrix

A confusion matrix helps visualize the model's performance on individual classes. This is an illustrative subset showing performance on commonly confused signs.

The main diagonal (top-left to bottom-right) shows correct predictions. Off-diagonal cells show misclassifications.
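The sketch below shows how such a matrix and the metrics that follow can be computed with scikit-learn; the label order and the tiny y_true / y_pred arrays are placeholders standing in for the real test-set predictions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

labels = ["What", "you", "name", "_blank_"]  # assumed class order

# Placeholder test-set labels and predictions (class indices into `labels`).
y_true = np.array([0, 1, 2, 3, 0, 1, 2, 3])
y_pred = np.array([0, 1, 2, 3, 0, 1, 2, 2])

cm = confusion_matrix(y_true, y_pred, labels=list(range(len(labels))))
print(cm)  # rows = actual class, columns = predicted class

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print(f"accuracy={accuracy:.3f}  precision={precision:.3f}  "
      f"recall={recall:.3f}  f1={f1:.3f}")
```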

Key Performance Metrics

Accuracy: 98.5% (overall percentage of correct predictions)
Precision: 97.9% (of all positive predictions, the fraction that were truly positive)
Recall: 97.8% (of all actual positives, the fraction that were correctly identified)
F1-Score: 97.8% (the harmonic mean of precision and recall)
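For reference, the metrics are related as follows; plugging the reported precision and recall into the F1 definition reproduces the reported score.

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
    = \frac{2 \times 0.979 \times 0.978}{0.979 + 0.978} \approx 0.978
```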