My expertise lies in formal and computational models of syntax and other levels of linguistic representation. I have an active interest in both theoretical issues and practical applications.
I am currently working on email summarization. Compared to other summarization tasks, email summarization poses special problems because of the dialogic nature of the source and the informal or incomplete language often employed in email. Research issues include developing a corpus for training, identifying the correct syntactic or semantic units for extraction, choosing the features (including email-specific ones) to be used in machine learning, and determining how best to recompose the extracted units into a smooth and readable summary.
In addition, I am working on multilingual modeling in two projects. One project aims at finding a good linguistic representation for the lexicon, morphology, and syntax of Arabic dialects. Arabic dialects pose a problem for natural language processing because the spoken dialects are not written, and the written language is not natively spoken. Thus, standard corpus-based approaches do not work. A second project, joint with five other sites, aims at devising a workable "interlingual" annotation for the semantics of a parallel text corpus.