Related Publications and Presentations

Dissemination Activities

Our presentations at the Digging Into Data Conference can be found here.

In conjunction with our Argentina meeting in August 2018, we gave a series of dissemination talks about ACLEW to South American researchers at the Encuentro sobre Lenguaje, Cognición e Interacción social en la primera infancia. You can find the slides here. Video recordings are also available for the three talks:

Speech processing tools for the study of infants' language environments (Metze/Schuller)
Examining cross-linguistic and cross-cultural similarities and differences in the language environments of infants and toddlers (Soderstrom)
The ACLEW Annotation Scheme (AAS): A forward-looking system for annotating recordings of naturalistic speech (Casillas)

Journal Publications

Alam, F., Rosemberg, C. R., Garber, L., & Stein, A. (2021, online). Variation sets in the speech directed to toddlers in Argentinian households. SES and type of activity effects. Journal of Child Language.
Stein, A., Menti, A. B., & Rosemberg, C. R. (2021, online). Socioeconomic status differences in the linguistic environment: a study with Spanish-speaking populations in Argentina. Early Years.
Soderstrom, M., Casillas, M., Gornik, M., Bouchard, A., MacEwan, S., Shokrkon, A., & Bunce, J. (2021). English-Speaking Adults' Labeling of Child- and Adult-Directed Speech Across Languages and Its Relationship to Perception of Affect. Frontiers in Psychology, 12, 708887.
Soderstrom, M., Casillas, M., Bergelson, E., Rosemberg, C., Alam, F., Warlaumont, A. S., & Bunce, J. (2021). Developing A Cross-Cultural Annotation System and MetaCorpus for Studying Infants’ Real World Language Experience. Collabra: Psychology, 7(1), 23445.
Räsänen, O., Seshadri, S., Lavechin, M., Cristia, A., & Casillas, M. (2021). ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings. Behavior Research Methods, 53(2), 818-835.
Cristia, A., Lavechin, M., Scaff, C., Soderstrom, M., Rowland, C. F., Räsänen, O., ... & Bergelson, E. (2021). A thorough evaluation of the Language Environment Analysis (LENA™) system. Behavior Research Methods, 53, 467-486.
Rosemberg, C. R., Alam, F., Audisio, C. P., Ramirez, M. L., Garber, L., & Migdalek, M. J. (2020). Nouns and verbs in the linguistic environment of Argentinian toddlers: Socioeconomic and context-related differences. First Language, 40(2), 192-217.
Cychosz, M., Romeo, R., Soderstrom, M., Scaff, C., Ganek, H., Cristia, A., ... & Weisleder, A. (2020). Longform recordings of everyday life: Ethics for best practices. Behavior Research Methods, 52(5), 1951-1969.
Cristia, A., Bulgarelli, F., & Bergelson, E. (2020). Accuracy of the Language Environment Analysis System Segmentation and Metrics: A systematic review. Journal of Speech, Language and Hearing Research, 63(4), 1093-1105.
Cristia, A. (2020). Language input and outcome variation as a test of theory plausibility: The case of early phonological acquisition. Developmental Review, 57, 100914.
Casillas, M., Brown, P., & Levinson, S. C. (2020). Early language experience in a Tseltal Mayan village. Child Development, 91(5), 1819-1835. doi:10.1111/cdev.13349.
Zhang, Z., Han, J., Qian, K., Janott, C., Guo, Y., & Schuller, B. (2019). Snore-GANs: Improving Automatic Snore Sound Classification with Synthesized Data. IEEE Journal of Biomedical and Health Informatics, 24(1), 300-310.
Han, J., Zhang, Z., Ren, Z., & Schuller, B. W. (2019). EmoBed: Strengthening Monomodal Emotion Recognition via Training with Crossmodal Emotion Embeddings. IEEE Transactions on Affective Computing, 12(3), 553-564.
Han, J., Zhang, Z., & Schuller, B. (2019). Adversarial training in affective computing and sentiment analysis: Recent advances and perspectives. IEEE Computational Intelligence Magazine, 14(2), 68-81.
Räsänen, O., Seshadri, S., Karadayi, J., Riebling, E., Bunce, J., Cristia, A., Metze, F., Casillas, M., Rosemberg, C., Bergelson, E., & Soderstrom, M. (2019). Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech. Speech Communication, 113, 63-80.
Casillas, M., & Cristia, A. (2019). A step-by-step guide to collecting and analyzing long-format speech environment (LFSE) recordings. Collabra: Psychology, 5(1), 24.
Räsänen, O., Kakouros, S. & Soderstrom, M. (2018). Is infant-directed speech interesting because it is surprising? — Linking properties of IDS to statistical learning and attention at the prosodic level. Cognition, 178, 193–206.
Bergelson*, E., Casillas*, M., Soderstrom, M., Seidl, A., Warlaumont, A. S., & Amatuni, A. (2018). What do North American babies hear? A large-scale cross-corpus analysis. Developmental Science, 22(1), e12724. *Joint first author.
Zhang, Z., Han, J., Xu, X., Deng, J., Ringeval, F., & Schuller, B. (2018). Leveraging Unlabelled Data for Emotion Recognition with Enhanced Collaborative Semi-Supervised Learning. IEEE Access 6.1, 22196–22209.
Zhang, Z., Han, J., Coutinho, E., & Schuller, B. (2018). Dynamic difficulty awareness training for continuous emotion prediction. IEEE Transactions on Multimedia, 21(5), 1289-1301.

Conference Presentations and Proceedings

Lavechin, M., Gill, M. P., Bousbib, R., Bredin, H., & Garcia-Perera, L. P. (2020, in press). End-to-end Domain-Adversarial Voice Activity Detection. Interspeech 2020.
Lavechin, M., Bousbib, R., Bredin, H., Dupoux, E., & Cristia, A. (2020, in press). An open-source voice type classifier for child-centered daylong recordings. Interspeech 2020.
MacDonald, K., Räsänen, O., Casillas, M., & Warlaumont, A. S. (2020). Measuring prosodic predictability in children’s home language environments. Proceedings of Cognitive Science Society.
Rosemberg, C. R., Alam, F., Garber, L., Stein, A., Bunce, J., Migdalek, M. J., & Soderstrom, M. (2019). Conjuntos de variación en el habla dirigida al niño en el entorno natural del hogar: efectos de la educación materna a través de culturas. IX Congreso Internacional de Adquisición del Lenguaje, Asociación de Estudios de Adquisición del Lenguaje, Madrid, Spain, September 4-6.
Bunce, J., Bergelson, E., Warlaumont, A., & Casillas, M. (2019, July). Daylong data: Raw audio to transcript via automated & manual open-science tools. CogSci 2019 preconference.
Lavechin, M., Gill, M. P., Bousbib, R., Bredin, H., & Garcia-Perera, L. P. (2019). End-to-end Domain-Adversarial Voice Activity Detection. arXiv preprint arXiv:1910.10655.
Ryant, N., Church, K., Cieri, C., Cristia, A., Du, J., Ganapathy, S., & Liberman, M. (2019). The Second DIHARD Diarization Challenge: Dataset, task, and baselines. arXiv preprint arXiv:1906.07839.
Seidl, A., Warlaumont, A. S., & Cristia, A. (2019). Towards detection of canonical babbling by citizen scientists: Performance as a function of clip length. Proc. Interspeech 2019, 3579-3583.
Han, J., Zhang, Z., Ren, Z., & Schuller, B. (2019, May). Implicit Fusion by Joint Audiovisual Training for Emotion Recognition in Mono Modality. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5861-5865). IEEE.
Rizos, G., & Schuller, B. (2019, May). Modelling Sample Informativeness for Deep Affective Computing. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3482-3486). IEEE.
Zhang, Z., Wu, B., & Schuller, B. (2019, May). Attention-augmented End-to-end Multi-task Learning for Emotion Prediction from Speech. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6705-6709). IEEE.
Soderstrom, M., Casillas, M., Bergelson, E., Kirby, J., Rosemberg, C., Stein, A., ... & Bunce, J. (2019). Quantifying child directed speech cross-culturally across development. The Journal of the Acoustical Society of America, 145(3), 1763-1763.
Al Futaisi, N., Zhang, Z., Cristia, A., Warlaumont, A., & Schuller, B. (2019). VCMNet: Weakly Supervised Learning for Automatic Infant Vocalisation Maturity Analysis. In W. Gao, H. M. L. Meng, M. Turk, S. R. Fussell, B. Schuller, Y. Song, & K. Yu (Eds.), 2019 International Conference on Multimodal Interaction (ICMI ’19) (pp. 205-209). ACM, New York, NY, USA. DOI: https://doi.org/10.1145/3340555.3353751
Bunce, J., Casillas, M., Bergelson, E., Kirby, J., Rosemberg, C., Stein, A., Warlaumont, A., & Soderstrom, M. (2019). A cross-cultural examination of child-directed speech across development. Society for Research in Child Development.
Casillas, M., & Elliot, M. (2019). Carrying practices and infant object handling in two non-WEIRD communities. Society for Research in Child Development.
Räsänen, O., Seshadri, S. & Casillas, M. (2018). Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions. Proc. Interspeech-2018, Hyderabad, India.
Zhang, Z., Han, J., Qian, K., & Schuller, B. (2018). Evolving Learning for Analysing Mood-Related Infant Vocalisation. Proceedings of Interspeech.
Zhang, Z., Warlaumont, A., Schuller, B., Yetish, G., Scaff, C., Colleran, H., Stieglitz, J., & Cristia, A. (2018). Developing computational measures of vocal maturity from daylong recordings. Proceedings of the 16th annual conference of the French Phonology Network (RFP).
Han, J., Zhang, Z., Schmitt, M., Ren, Z., Ringeval, F., & Schuller, B. (2018). Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech. Proc. Interspeech 2018, 3082-3086. DOI: 10.21437/Interspeech.2018-996.
Han, J., Zhang, Z., Ren, Z., Ringeval, F., & Schuller, B. (2018). Towards conditional adversarial training for predicting emotions from speech. Proceedings of ICASSP.
Karadayi, J., Scaff, C., Stieglitz, J., & Cristia, A. (2018). Diarization in maximally ecological recordings: Data from Tsimane children. Proceedings of SLTU.
Cristia, A., Ganesh, S., Casillas, M., & Ganapathy, S. (2018). Talker diarization in the wild: The case of child-centered daylong audio-recordings. Proceedings of Interspeech.
Le Franc, A., Riebling, E., Karadayi, J., Wang, Y., Scaff, C., Metze, F., & Cristia, A. (2018). The ACLEW DiViMe: An easy-to-use diarization tool. Proceedings of Interspeech.
Zhang, Z., Cristia, A., Warlaumont, A., & Schuller, B. (2018). Automated Classification of Children's Linguistic versus Non-Linguistic Vocalisations. Proceedings of Interspeech.
Räsänen, O., Kakouros, S., & Soderstrom, M. (2017). Connecting stimulus-driven attention to the properties of infant-directed speech – Is exaggerated intonation also more surprising? Proc. 39th Annual Conference of the Cognitive Science Society, London, UK, pp. 998–1003.
Casillas, M., Amatuni, A., Seidl, A., Soderstrom, M., Warlaumont, A. S., & Bergelson, E. (2017). What do Babies Hear? Analyses of Child- and Adult-Directed Speech. Proceedings of Interspeech 2017, pp. 2093–2097. DOI: 10.21437/Interspeech.2017-1409.
Casillas, M., Bergelson, E., Warlaumont, A. S., Cristia, A., Soderstrom, M., VanDam, M., & Sloetjes, H. (2017). A New Workflow for Semi-Automatized Annotations: Tests with Long-Form Naturalistic Recordings of Children's Language Environments. Proceedings of Interspeech 2017, pp. 2098–2102. DOI: 10.21437/Interspeech.2017-1418.
Casillas, M., Bergelson, E., Seidl, A., Soderstrom, M., & Warlaumont, A. (2017). Characterizing North American Child-Directed Speech by Age, Gender and SES. Boston University Conference on Language Development. (poster)
