David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James R. Glass: Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input. CoRR abs/1804.01452 (2018)