Part of the Agentic Family · Zero Dependencies · Browser Native

Agentic Sense

Give your AI agent eyes. 478 face landmarks, 52 blendshapes, hand tracking, body pose, object detection — all running locally in the browser via MediaPipe. Single file, zero dependencies.

Live Demo · GitHub →
Core Output

Raw perception, not opinions

Every frame, agentic-sense outputs structured data about every face, hand, and body in view. Your agent decides what it means.

```json
{
  "faceCount": 1,
  "faces": [{
    "head": { "yaw": -0.023, "pitch": 0.041, "facing": true },
    "eyes": { "avgEAR": 0.312, "ipd": 0.089, "iris": { ... } },
    "blendshapes": { "jawOpen": 0.03, "smileL": 0.12, ... 38 values },
    "interpretation": {
      "expression": "smiling",
      "focus": { "score": 82, "level": "high" },
      "gaze": { "region": "center", "looking": true },
      "blinkRate": 14,
      "distance": 0.8
    }
  }],
  "hands": [{ "gesture": "Open_Palm", "fingers": { ... } }],
  "body": { "joints": { ... }, "shoulderWidth": 0.34 },
  "segmentation": { "personRatio": 0.42 },
  "objects": [{ "label": "laptop", "confidence": 0.91 }]
}
```
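Because the library emits raw perception rather than conclusions, mapping a frame to an agent-level signal is your code. A minimal sketch, assuming the frame shape shown above; `classifyEngagement` is a hypothetical helper, not part of the library:

```javascript
// Hypothetical helper: collapse a SenseFrame (shape as in the sample
// output above) into a coarse engagement label an agent can act on.
function classifyEngagement(frame) {
  if (!frame || frame.faceCount === 0) return 'absent'
  const { focus, gaze, expression } = frame.faces[0].interpretation
  if (!gaze.looking) return 'distracted'
  if (focus.score >= 70) {
    return expression === 'smiling' ? 'engaged-positive' : 'engaged'
  }
  return 'passive'
}

// Mock frame mirroring the sample output.
const frame = {
  faceCount: 1,
  faces: [{ interpretation: {
    expression: 'smiling',
    focus: { score: 82, level: 'high' },
    gaze: { region: 'center', looking: true },
  } }],
}
console.log(classifyEngagement(frame)) // → 'engaged-positive'
```

The thresholds here are arbitrary; the point is that interpretation beyond what `detect()` provides lives in your agent, not the sensor.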
👁
478 Landmarks
Full face mesh with iris tracking. Head pose, eye corners, lip contour — sub-pixel precision at 30fps.
🎭
52 Blendshapes
Every facial muscle as a 0–1 weight. jawOpen, smileL, browDown — the raw signal for expression analysis.
Hand Tracking
21 landmarks per hand, gesture recognition (8 built-in), per-finger extension state. Up to 2 hands.
🦴
Body Pose
33 skeletal landmarks. Shoulder width, torso length, joint positions with visibility scores.
🔒
Fully Local
Zero network requests after model load. MediaPipe WASM in-browser. Camera feed never leaves the device.
📦
Single File
~480 lines of JS. No npm, no build step. One import and you're sensing.
Try It

See it in action

Click to activate your camera. All processing happens locally.

Architecture

Library returns data, you draw

AgenticSense wraps MediaPipe into a single class. detect() gives you structured data. Overlay, dashboard, synthesis — all optional, in the demo folder.

AgenticSense
agentic-sense.js (~480 lines)
init({ face, hands, pose, segment, objects })
detect() → SenseFrame
rawResults → MediaPipe objects
↑ the library
Interpretation
expression classifier · focus scorer · gaze estimator · blink detector · head pose EMA
↑ built into detect()
Demo Only
overlay.js · dashboard.js · synthesis engine · camera switcher
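Since the library returns normalized data and leaves rendering to you, a demo-style overlay reduces to a coordinate mapping plus a canvas loop. A sketch under the assumption (standard for MediaPipe) that landmark coordinates are normalized to [0, 1]; the `landmarks` field name on a face is assumed, not taken from the sample output:

```javascript
// Map a normalized MediaPipe-style landmark ({x, y} in [0, 1])
// to canvas pixel coordinates.
function toPixels(landmark, width, height) {
  return { x: landmark.x * width, y: landmark.y * height }
}

// Sketch of an overlay pass: dot every landmark on a 2D canvas.
// (Hypothetical `face.landmarks` array; browser-only, needs a canvas ctx.)
function drawFace(ctx, face, width, height) {
  ctx.fillStyle = '#0f0'
  for (const lm of face.landmarks) {
    const { x, y } = toPixels(lm, width, height)
    ctx.fillRect(x - 1, y - 1, 2, 2)
  }
}

console.log(toPixels({ x: 0.5, y: 0.25 }, 640, 480)) // → { x: 320, y: 120 }
```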
Quick Start

Three steps to perceive

```javascript
// 1. Import
import { AgenticSense } from 'agentic-sense'

// 2. Init
const video = document.getElementById('cam')
const sense = new AgenticSense(video)
await sense.init({ wasmPath: './mediapipe/', face: true, hands: true })

// 3. Sense
function loop() {
  const frame = sense.detect()
  if (frame?.faceCount > 0) {
    console.log(frame.faces[0].interpretation.expression)  // 'smiling'
    console.log(frame.faces[0].interpretation.focus.score) // 82
    console.log(frame.faces[0].blendshapes.jawOpen)        // 0.031
  }
  if (frame?.hands.length > 0) {
    console.log(frame.hands[0].gesture) // 'Open_Palm'
  }
  requestAnimationFrame(loop)
}
loop()
```
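The interpretation layer mentions a head pose EMA. For readers unfamiliar with the technique, here is a minimal sketch of exponential-moving-average smoothing on a per-frame scalar; the smoothing factor and helper are illustrative, not the library's internals:

```javascript
// Exponential moving average: each new sample is blended with the
// running value, damping per-frame jitter in noisy signals like yaw.
function makeEMA(alpha = 0.3) {
  let value = null
  return (sample) => {
    value = value === null ? sample : alpha * sample + (1 - alpha) * value
    return value
  }
}

const smoothYaw = makeEMA(0.5)
smoothYaw(0.0)               // first sample passes through
console.log(smoothYaw(0.4))  // → 0.2 (halfway toward the new sample)
```

A higher `alpha` tracks fast head motion more closely; a lower one gives a steadier but laggier pose estimate.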
Ecosystem

Part of the agentic family

🧠
agentic-core
LLM + vision. The brain.
👁
agentic-sense
Perception engine. The eyes.
agentic-act
Intent → action. The will.
🎨
agentic-render
Dynamic UI generation. The expression.
🗣️
agentic-voice
TTS + STT. The voice.
💭
agentic-memory
Context + retrieval. The memory.
📦
agentic-store
SQLite persistence. Long-term storage.
🦀
agentic-claw
Runtime + skills. The body.