Working at the Intersection of Data Science and NLP
Linda Moreau, MITRE
Dr. Linda Moreau provided a glimpse of work that occurs at the intersection of data science and natural language processing. Through a series of case studies in which algorithms originally developed for English were retargeted to other languages, she illustrated how seemingly low-level linguistic processing determines the success of data science algorithms. Topics included cross-cultural name matching, Arabic proximity search, Chinese information retrieval (IR) for term highlighting, Korean word similarity, and emoji processing for sentiment analysis.
Dr. Linda Moreau (née Van Guilder) is a Principal Computational Linguist at the MITRE Corporation with 25 years of industry experience developing and deploying language technologies. She periodically serves as an adjunct professor in Georgetown University's Department of Linguistics. Dr. Moreau earned her Ph.D. from Georgetown in 2007, with dissertation research focused on the applicability of cross-language speech perception to computational problems such as name matching for cross-cultural identity resolution. Throughout her career, she has worked in a number of specialty areas within Natural Language Processing (NLP), including information extraction, automatic summarization, Arabic handwriting recognition, machine translation, and identity resolution. Her current work blends natural language systems engineering and NLP, with the goal of maximizing the success of data science analytic techniques as they are incorporated into workflows that process multilingual data.