Machine Learning & NLP
Interactive sentiment analysis using DistilBERT, featuring real-time predictions, model performance metrics, and explainability visualizations.
About the Model
This implementation uses DistilBERT, a distilled version of BERT fine-tuned on the Stanford Sentiment Treebank (SST-2) dataset. The model runs entirely in the browser using Transformers.js, providing real-time sentiment analysis without server calls.
66M
Parameters
91.8%
Accuracy
~100ms
Inference Time
67K+
Training Samples
Interactive Sentiment Classifier
Type or paste text below to analyze its sentiment in real-time using DistilBERT.
0 characters
Or try an example:
Model Performance
Evaluation metrics on the Stanford Sentiment Treebank (SST-2) test set.
Accuracy
91.8%
Overall correctness of predictions
Precision (Weighted)
91.8%
Ratio of correct positive predictions
Recall (Weighted)
91.8%
Ratio of actual positives correctly identified
F1 Score (Weighted)
91.8%
Harmonic mean of precision and recall
Per-Class Performance
| Class | Precision | Recall | F1 Score |
|---|---|---|---|
| Positive | 92.0% | 92.5% | 92.2% |
| Negative | 91.5% | 91.1% | 91.3% |
Dataset Information
Sample Distribution
Model Information
Class Distribution
Confusion Matrix
Visualization of prediction accuracy showing true positives, true negatives, false positives, and false negatives.
Predicted
Actual
3730
55.4%
301
4.5%
253
3.8%
2451
36.4%
Legend
Color intensity indicates the percentage of total predictions.
Matrix Insights
Feature Analysis
Visualize the most important words and features that influence sentiment predictions.
positive Sentiment Word Cloud
Size indicates relative importance in sentiment classification. Hover over words to see their importance scores.
Top 10 Most Predictive Positive Features
How it works: The model learns which words are most strongly associated with positive sentiment during training. Words with higher importance scores have a greater influence on the model's predictions. The word cloud visualizes these features, with larger words indicating higher importance.
Live Predictions on Real Data
Test the model on actual Reddit posts from the database and compare predictions with ground truth.
Technical Implementation
Architecture & Approach
The sentiment analysis system leverages a pre-trained DistilBERT model fine-tuned on the SST-2 dataset. DistilBERT is a smaller, faster, and lighter version of BERT that retains 97% of BERT's language understanding while being 60% faster and 40% smaller.
- •Model: distilbert-base-uncased-finetuned-sst-2-english
- •Framework: Transformers.js for client-side inference
- •Task: Binary sentiment classification (positive/negative)
- •Deployment: Fully client-side, no backend API calls required
Training & Evaluation
The model was fine-tuned on the Stanford Sentiment Treebank v2 (SST-2), a benchmark dataset for binary sentiment classification containing movie reviews labeled as positive or negative.
Dataset
- • 67,349 total samples
- • 80% training, 10% validation, 10% test
- • Balanced class distribution
- • Movie review domain
Performance
- • 91.8% accuracy on test set
- • 92.2% weighted F1-score
- • Low false positive/negative rates
- • Fast inference (~100ms)
Integration with Portfolio Data
The model integrates with the existing Reddit data pipeline to provide live sentiment predictions on real posts. This demonstrates the practical application of ML models in production environments.
- •Fetches random Reddit posts from PostgreSQL database
- •Client-side inference using browser-loaded model
- •Compares predictions with ground truth (when available)
- •Real-time feedback on model performance