Back to Projects
Computer Vision

Activity Vision

A productivity monitoring tool that uses OCR and image classification to categorize workflow states from screen captures in real-time.

Activity Vision

Latency

<100ms

Users

5k+

Technologies

PythonOpenCVTensorFlowTesseract OCRRedisElectron

Overview

Activity Vision is a privacy-first productivity monitoring solution that uses computer vision to understand work patterns without invasive keystroke logging or continuous screen recording.

The Problem

Remote work monitoring tools face a dilemma:

  • Too invasive = employee distrust
  • Too lax = no actionable insights

We needed to find a balance that respects privacy while providing useful productivity data.

Technical Solution

Snapshot Analysis

Instead of recording screens, we take periodic snapshots and immediately process them locally:

class ActivityClassifier:
    def __init__(self):
        self.ocr_engine = TesseractEngine()
        self.classifier = load_model('activity_classifier.h5')

    def analyze_screenshot(self, image):
        # Extract text for app identification
        text = self.ocr_engine.extract(image)

        # Classify activity type
        features = self.extract_features(image, text)
        activity = self.classifier.predict(features)

        # Delete image immediately
        del image

        return activity, confidence

Privacy by Design

  • Images are processed locally and immediately deleted
  • Only activity categories are sent to the server
  • No keystroke logging or clipboard monitoring
  • User can pause tracking anytime

Activity Categories

The model classifies work into categories:

  1. Coding: IDE usage, terminal commands
  2. Communication: Email, Slack, meetings
  3. Research: Documentation, web browsing
  4. Design: Creative tools, presentations
  5. Administrative: Spreadsheets, documents

Model Training

We trained on 50,000+ anonymized screenshots with:

  • Multi-task learning (app + activity type)
  • Data augmentation for screen variations
  • Transfer learning from ImageNet

Edge Deployment

For privacy and latency, the model runs entirely on-device:

  • Model size: 15MB (quantized)
  • Inference: <100ms on CPU
  • Memory: <200MB RAM

Results

  • 5,000+ active users across 50 companies
  • 100ms average processing time
  • 94% accuracy in activity classification
  • Zero privacy complaints in 2 years

Future Improvements

  • Attention pattern analysis (with consent)
  • Meeting engagement scoring
  • Burnout prediction alerts
  • Integration with project management tools