GPT-4 Vision isn't just a gimmick. We've been given a new superpower, and so we must "deal with it".
This is probably as big a moment as when ChatGPT first arrived, maybe bigger. Machine Vision for the masses (and more).
I tried doing some very loose sketches, and it really struggled to identify them until they were coloured in. Humans could easily tell what they were. But to work out what uses it has, we need to know what capabilities it does and does not have.
Pick a question and see what you can learn!
- can it use TINY images? (I assume they are much faster; see the sketch after this list)
- can it tell you what has changed between two images?
- can it measure distances? (with perspective?)
- can it make 3D models from instructions?
- can it "learn" to recognise people / similar objects (within the same context window)?
- what limits are there to exhaustive listing / exhaustive description?
- is it better at details or overviews?
- can it read maps / graphs / text?
- how smart is it about DIY / X-rays / mechanics?
- can it follow wires??
- (can it find Lego?)
- is there a formal reference system you can use (X/Y coordinates)?
- can it give coordinates in large grids or grid-like layouts? (how un-grid-like can they be?)
- e.g. a film strip, or window panes
- can it navigate a 2D maze turn by turn? a 3D maze? can that be insanely complex?
- can it write eBay descriptions (including condition)?
- can it estimate food weight?
- can it estimate strength / angles / volume?
- can it create programs from screenshots? can it use programs? play games? control an RC car / robot?
- what kind of language / instructions work best when talking about images?
- what other questions do we need to ask?
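If you want to poke at the tiny-image question yourself, here's a rough sketch of one way to do it, assuming you have API access to a GPT-4 vision model through the official `openai` Python package. The model name, the `detail="low"` setting, and the 64px thumbnail size are my own assumptions about what "tiny" might mean, not anything confirmed anywhere.

```python
# a rough sketch, assuming API access to a GPT-4 vision model via the
# official `openai` Python package (model name and parameters may differ)
import base64
import io

from openai import OpenAI
from PIL import Image

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def describe(image_path: str, max_side: int = 64) -> str:
    """Downscale an image to a tiny thumbnail and ask the model what it sees."""
    img = Image.open(image_path).convert("RGB")
    img.thumbnail((max_side, max_side))  # shrink in place, keeping aspect ratio

    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    b64 = base64.b64encode(buf.getvalue()).decode()

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed model name; use whatever vision model you have
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe everything you can identify in this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}",
                               "detail": "low"}},  # "low" requests the cheaper low-resolution pass
            ],
        }],
        max_tokens=300,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(describe("sketch.jpg"))
```

Comparing the answer on a 64px thumbnail against the same question on the full-resolution image should give a quick feel for how much detail (and speed) you gain or lose with tiny images.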