[personal profile] alexr_rwx
Announcing the Final Examination of
Alexander James Rudnick
for the Degree of Doctor of Philosophy in Computer Science
Tuesday, November 6, 2018, 2:00pm
Indiana Memorial Union, Walnut Room
Cross-Lingual Word Sense Disambiguation for Low-Resource Hybrid Machine Translation


This thesis argues that cross-lingual word sense disambiguation (CL-WSD) can be used to improve lexical selection for machine translation when translating from a resource-rich language into an under-resourced one, especially when relatively little bitext is available. In CL-WSD, we perform word sense disambiguation, considering the senses of a word to be its possible translations into some target language, rather than using a sense inventory developed manually by lexicographers.

Explicitly trained classifiers that draw on source-language context and on resources for the source language can help machine translation systems make better decisions when selecting target-language words. This holds especially when the alternative is hand-written lexical selection rules developed by researchers with linguistic knowledge of the source and target languages. It also holds when lexical selection would otherwise be performed by a statistical machine translation system and relatively little target-language text is available for training language models.
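
To make the idea of "translations as senses" concrete, here is a minimal sketch of CL-WSD framed as ordinary text classification. It is not the Chipa implementation: the tiny training set, the placeholder labels (`TGT_FINANCIAL_BANK`, `TGT_BENCH`), and the choice of a Naive Bayes classifier with add-one smoothing are all assumptions made purely for illustration. The point is only that the classifier's labels are candidate target-language translations of an ambiguous Spanish word, and its features come from the surrounding Spanish context.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy data: Spanish contexts of the ambiguous word "banco",
# each labeled with a placeholder for its target-language translation.
# The labels are illustrative stand-ins, not real Guarani or Quechua words.
TRAINING = [
    (["deposité", "dinero", "en", "el", "banco"], "TGT_FINANCIAL_BANK"),
    (["el", "banco", "aprobó", "el", "préstamo"], "TGT_FINANCIAL_BANK"),
    (["me", "senté", "en", "el", "banco", "del", "parque"], "TGT_BENCH"),
    (["un", "banco", "de", "madera", "en", "la", "plaza"], "TGT_BENCH"),
]


def train(examples):
    """Count labels and per-label context words (bag-of-words features)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, label in examples:
        label_counts[label] += 1
        for word in context:
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(context, model):
    """Return the most probable translation label under Naive Bayes
    with add-one smoothing; words outside the vocabulary are ignored."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in context:
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
```

With this toy model, `classify(["banco", "dinero", "préstamo"], model)` selects the financial label and `classify(["banco", "parque", "madera"], model)` selects the bench label. A real system would learn from word-aligned bitext and richer source-language features rather than four hand-written sentences.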

In this work, I present the Chipa system for CL-WSD and apply it to the task of translating from Spanish to Guarani and Quechua, two indigenous languages of South America. I demonstrate several extensions to the basic Chipa system, including techniques that allow us to benefit from the wealth of available unannotated Spanish text and existing text analysis tools for Spanish, as well as approaches for learning from bitext resources that pair Spanish with languages unrelated to our intended target languages. Finally, I provide proof-of-concept integrations of Chipa with two existing machine translation systems that have completely different architectures.

Outline of Current Studies
Major: Computer Science
Minor(s): Computational Linguistics
Educational Career
B.S., Georgia Institute of Technology, 2005
M.S., Georgia Institute of Technology, 2007

Committee in Charge
Emeritus Associate Professor Michael E. Gasser, Co-chair, (XXX) XXX-XXXX, Computer Science
Professor Sandra Kübler, Co-chair, (XXX) XXX-XXXX, Linguistics
Associate Professor David J. Crandall, Computer Science
Assistant Teaching Professor John S. DeNero, U.C. Berkeley EECS
Associate Professor Markus Dickinson, Linguistics

Any member of the Graduate Faculty may attend. As a courtesy, please notify the committee chairperson in advance.

Date: 2018-10-14 09:19 pm (UTC)
From: [personal profile] brainwane
Congrats and best wishes!
