I've had access to Dall-E 3, Vision and voice chat features, and I've been blown away by how impressive each of the new features is. Dall-E 3 seems roughly comparable to Midjourney in overall image quality, but does a much better job at understanding the prompt. The vision model continues to surprise me with how well it is able to understand images at a seemingly human level of comprehension. And the voice chat is such an intuitive and captivating way of interacting with ChatGPT that it felt like I was talking to one of the AI assistants from the movie "Her".
However, it's unfortunate that these amazing new features cannot be used together. Up until gaining access to these features, I had been using the Advanced Data Analysis model as my default, which is great for helping with programming tasks. I can only imagine how revolutionary ChatGPT will be when a cohesive multi-modal model is released, hopefully in the near future, with all these capabilities available from the start.
What would you want to try if such a cohesive model were released? I can already imagine some use cases where you could set up iterative improvement for things like interface design, which some people have already gotten to work with just the base vision model.