A new survey paper by Ma et al. (2024) reviews methods for combining large language models (LLMs) with 3D data representations, such as point clouds and Neural Radiance Fields (NeRFs), for tasks including scene understanding, captioning, and navigation.