There's an entire field of study called computer vision that focuses on making machines see the world the way humans do.
It uses machine learning to interpret digital images. The technology isn't perfect yet, so other sensors are still required.