Playing around with Visual AI

See a demo of a toy that detects objects in a camera feed, crops them, and sends the crops to LLaVA for image processing with structured output.

Overview

I’m not gung-ho to do this, but I heard you’re looking for speakers. I built a little toy that uses a Python package called imageai to detect objects in the camera feed (via a YOLOv3 model). It then crops each detected object and sends the crop to LLaVA (MS model) for some light image processing with structured output. I would be willing to demo it and walk through the source.

If that’s something of interest, I could talk about that. Probably would take no more than 20 minutes.
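
To give a flavor of the glue code between the steps above, here is a rough sketch of the two bits that sit between detection and the model call. It is not lifted from the repo: the function names, the 10% padding, and the "label"/"description" keys are all made up for illustration, and it assumes the detector reports each box as [x1, y1, x2, y2] pixel corners.

```python
import json


def padded_crop_box(box, image_size, pad_percent=10):
    """Grow a detection box by pad_percent on each side, clamped to the image.

    box is assumed to be (x1, y1, x2, y2) pixel corners (an assumption about
    the detector's output format); image_size is (width, height).
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    # Integer padding proportional to the box size on each axis.
    dx = (x2 - x1) * pad_percent // 100
    dy = (y2 - y1) * pad_percent // 100
    return (
        max(0, x1 - dx),
        max(0, y1 - dy),
        min(width, x2 + dx),
        min(height, y2 + dy),
    )


def parse_structured_reply(reply_text, required_keys=("label", "description")):
    """Parse a model reply expected to be a JSON object with certain keys.

    The key names are hypothetical; a real prompt would define its own schema.
    """
    data = json.loads(reply_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data
```

The tuple that padded_crop_box returns is in (left, upper, right, lower) order, which is the shape Pillow's Image.crop accepts, should the crop be done with Pillow.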

Links

https://github.com/brettschneider/ai-object-detection
Yolo-v3 detection crops objects for Llava multi-modal analysis and metadata extraction.

Tech stack