Research


Research Summary

I am part of the research team at Comcast Applied AI Research (Washington DC), working on various problems at the intersection of Natural Language Processing (NLP), Machine Learning (ML), and Information Retrieval (IR). My team develops NLP models for the voice-enabled entertainment operating system X1, used by millions of customers every day. Before joining Comcast in June 2015, I was a computer scientist at the Speech, Language, and Multimedia group at BBN Technologies, developing novel approaches for modeling natural language, and for reasoning with these models to answer questions.

I graduated from the PhD program of the Department of Computer Science at the University of Maryland in 2013, where I had been working since 2008 on ideas that improve Machine Translation (MT) using techniques from Cross-Language Information Retrieval (CLIR), and vice versa. I also gained experience working with large data, primarily using MapReduce to deal with scaling. I am an alumnus of the Cloud Computing Center (CCC) and the Computational Linguistics and Information Processing (CLIP) lab, where I had the chance to collaborate with many great researchers, including my advisor Jimmy Lin, as well as Douglas W. Oard and Philip Resnik.

Publications

Refereed Papers

29) Challenges and Opportunities in Understanding Spoken Queries Directed at Modern Entertainment Platforms.

Ferhan Ture, Jinfeng Rao, Raphael Tang, and Jimmy Lin

In Proc. of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019).

[pdf]


28) Yelling at Your TV: An Analysis of Speech Recognition Errors and Subsequent User Behavior on Entertainment Systems.

Raphael Tang, Ferhan Ture, and Jimmy Lin

In Proc. of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019).

[pdf]


27) Streaming Voice Query Recognition using Causal Convolutional Recurrent Neural Networks.

Raphael Tang, Gefei Yang, Hong Wei, Yajie Mao, Ferhan Ture, and Jimmy Lin

arXiv (Submitted 12/19/2018)

[pdf]


26) Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search.

Jinfeng Rao, Wei Yang, Yuhao Zhang, Ferhan Ture, and Jimmy Lin

In Proc. of Association for the Advancement of Artificial Intelligence (AAAI 2019).

[pdf]


25) Multi-Task Learning with Neural Networks for Voice Query Understanding on an Entertainment Platform.

Jinfeng Rao, Ferhan Ture, and Jimmy Lin

In Proc. of International Conference on Knowledge Discovery & Data Mining (KDD 2018).

[pdf]


24) What Do Viewers Say to Their TVs? An Analysis of Voice Queries to Entertainment Systems.

Jinfeng Rao, Ferhan Ture, and Jimmy Lin

In Proc. of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018).

[pdf]


23) Talking to Your TV: Context-Aware Voice Search with Hierarchical Recurrent Neural Networks.

Jinfeng Rao, Ferhan Ture, Hua He, Oliver Jojic, and Jimmy Lin

In Proc. of International Conference on Information and Knowledge Management (CIKM 2017).

[pdf]


22) No Need to Pay Attention: Simple Recurrent Neural Networks Work! (for Answering "Simple" Questions).

Ferhan Ture and Oliver Jojic

In Proc. of Empirical Methods in NLP (EMNLP 2017).

[pdf]


21) Integrating Lexical and Temporal Signals in Neural Ranking Models for Searching Social Media Streams.

Jinfeng Rao, Hua He, Haotian Zhang, Ferhan Ture, Royal Sequiera, Salman Mohammed, and Jimmy Lin

In Proc. of SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR 2017).

[pdf]


20) Mining Temporal Statistics of Query Terms for Searching Social Media Posts.

Jinfeng Rao, Ferhan Ture, Xing Niu and Jimmy Lin

To appear in International Conference on the Theory of Information Retrieval (ICTIR 2017).

[pdf]


19) Learning to Translate for Multilingual Question Answering.

Ferhan Ture and Elizabeth Boschee

In Proc. of Conference on Empirical Methods in Natural Language Processing (EMNLP 2016).

[pdf]


18) Ask Your TV: Real-Time Question Answering with Recurrent Neural Networks.

Ferhan Ture and Oliver Jojic

In Proc. of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016) - Industry Track.

[pdf] [presentation]


17) Structured TV Shows --- "You have been Chopped".

Ferhan Ture, Jonghyun Choi, Hongcheng Wang and Vamsi Potluru

In ICML Workshop on Multi-View Representation Learning (MVRL 2016).

[pdf] [poster]


16) Exploiting Representations from Statistical Machine Translation for Cross-Language Information Retrieval.

Ferhan Ture and Jimmy Lin

In ACM Transactions on Information Systems (TOIS). Volume 32, Issue 4, 2014.

[pdf]


15) Learning to Translate: A Query-Specific Combination Approach for Cross-Lingual Information Retrieval.

Ferhan Ture and Elizabeth Boschee

In Proc. of Conference on Empirical Methods in Natural Language Processing (EMNLP 2014).

[pdf]


14) Towards Efficient Large-Scale Feature-Rich Statistical Machine Translation.

Vladimir Eidelman, Ke Wu, Ferhan Ture, Philip Resnik and Jimmy Lin

In Proc. of Workshop on Statistical Machine Translation (WMT 2013).


13) Mr. MIRA: Open-Source Large-Margin Structured Learning on MapReduce.

Vladimir Eidelman, Ke Wu, Ferhan Ture, Philip Resnik and Jimmy Lin

In Proc. of Annual Meeting of the Association for Computational Linguistics (ACL 2013).


12) Flat vs. Hierarchical Translation Models for Cross-Language Information Retrieval.

Ferhan Ture and Jimmy Lin

In Proc. of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2013).

[presentation]


11) Combining Statistical Translation Techniques for Cross-Language Information Retrieval.

Ferhan Ture, Jimmy Lin, and Douglas W. Oard

In Proc. of International Conference on Computational Linguistics (COLING 2012).

[pdf]


10) Looking Inside the Box: Context-Sensitive Translation for Cross-Language Information Retrieval.

Ferhan Ture, Jimmy Lin and Douglas W. Oard

In Proc. of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012).

[pdf] [poster]


9) Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling.

Ferhan Ture and Jimmy Lin

In Proc. of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012).

[pdf] [presentation] [code] [data]


8) Encouraging Consistent Translation Choices.

Ferhan Ture, Douglas W. Oard and Philip Resnik

In Proc. of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012).

[pdf] [presentation]


7) No Free Lunch: Brute Force vs Locality-Sensitive Hashing for Cross-Lingual Pairwise Similarity.

Ferhan Ture, Tamer Elsayed and Jimmy Lin

In Proc. of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2011).

[pdf] [presentation] [code]


6) cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models.

Chris Dyer, Adam Lopez, Juri Ganitkevitch, Jonathan Weese, Ferhan Ture, Phil Blunsom, Hendra Setiawan, Vladimir Eidelman and Philip Resnik

In Proc. of Association for Computational Linguistics (ACL 2010 - Demonstration Track).

[software]


5) HAPLO-ASP: Haplotype Inference using Answer Set Programming.

Esra Erdem, Ozan Erdem, and Ferhan Ture

In Proc. of Logic Programming and Nonmonotonic Reasoning (LPNMR 2009).

[abstract] [pdf]


4) Comparing ASP, CP, ILP on two Challenging Applications: Wire Routing and Haplotype Inference.

Elvin Coban, Esra Erdem and Ferhan Ture (alphabetical order)

In Proc. of Logic and Search (LaSh 2008).

[abstract] [pdf]


3) Efficient Haplotype Inference with Answer Set Programming.

Esra Erdem and Ferhan Ture (alphabetical order)

In Proc. of Association for the Advancement of Artificial Intelligence (AAAI 2008).

[abstract] [presentation] [pdf]


2) Solving challenging grid puzzles with answer set programming.

Merve Cayli, Ayse Gul Karatop, Emrah Kavlak, Hakan Kaynar, Ferhan Ture and Esra Erdem

In Proc. of Answer Set Programming (ASP 2007).

[abstract] [pdf]


1) Learning Morphological Disambiguation Rules for Turkish.

Deniz Yuret and Ferhan Ture

In Proc. of North American Chapter of the Association for Computational Linguistics (NAACL 2006).

[abstract] [presentation] [pdf]


Technical Reports

1) Brute-Force Approaches to Batch Retrieval: Scalable Indexing with MapReduce, or Why Bother?

Tamer Elsayed, Ferhan Ture, and Jimmy Lin

Technical Report HCIL-2010-23, University of Maryland, College Park, October 2010.

[pdf]


Theses

Searching to Translate, and Translating to Search: When Information Retrieval Meets Machine Translation.

Ferhan Ture

Doctoral Dissertation, University of Maryland, College Park. May 2013.

[presentation] [pdf]


A Hybrid Machine Translation System from Turkish to English.

Ferhan Ture

Masters Thesis, Sabanci University, Turkey. July 2008.

[abstract] [presentation] [pdf]


Projects

Current Research Projects



Past Research Projects

2011-2013

Using Translation Models to Improve CLIR

Joint work with Doug Oard and Jimmy Lin.


Translation models can provide better translations for the task of CLIR by using larger translation units (e.g. phrases) and wider context. We explore ways that a translation grammar and decoder can improve effectiveness and efficiency of CLIR systems.


2010-2012

Encouraging Translation Consistency

Joint work with Doug Oard and Philip Resnik.


We revisit the one-sense-per-discourse heuristic in the context of translation, and argue that the translation of a token should be consistent throughout a discourse. We’ve implemented variants of this heuristic as a feature in the translation model and have shown significant BLEU improvements for both Arabic-English and Chinese-English translation.


2009-2013

Cross-lingual Pairwise Similarity Computation in Large Collections of Documents

Joint work with Jimmy Lin.


Pairwise similarity is the task of efficiently finding similar pairs of documents in a large collection. We can extend this to cross-lingual domains such as Wikipedia, to detect similar documents written in different languages. We explore various approaches to implement this idea and propose applying it to the collection of bilingual parallel text.
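As a rough illustration of the pairwise similarity task, here is a minimal brute-force sketch. It assumes documents in both languages have already been mapped into a shared term space (that projection step is not shown), and it uses plain term-frequency vectors with cosine similarity. The function names and the threshold value are hypothetical and are not taken from the project's code.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_pairs(docs, threshold=0.5):
    """Compare every pair of documents and keep those at or above threshold.

    docs maps a document id to its text. The texts are assumed to share
    one vocabulary already (e.g. after translation), which is an
    assumption of this sketch.
    """
    vectors = {doc_id: Counter(text.lower().split())
               for doc_id, text in docs.items()}
    pairs = []
    for i, j in combinations(sorted(vectors), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs
```

The brute-force loop is quadratic in the number of documents, which is why scaling to large collections calls for parallelization or approximation.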


2009-2010

Parallel Conditional Random Field (CRF) Training for Machine Translation Systems

Joint work with Chris Dyer, Jimmy Lin, and Philip Resnik.


We parallelize CRF training, a supervised learning method that is a good combination of discriminative and generative learning approaches. The feature set one can use in a CRF model is very flexible, and the model can be trained using an EM-like approach. MT applications require CRF models that scale, which is the motivation to parallelize the training process with MapReduce.


2009-2010

Learning a Sentiment Lexicon from the Web

Joint work with Jimmy Lin.


We exploit a large amount of data (50 million English web pages from the ClueWeb09 collection) to learn a sentiment lexicon in an unsupervised manner. Using emoticons as annotations, we propose an approach to determine subjectivity by calculating various term statistics.
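To make the idea of emoticons as annotations concrete, here is a small sketch. The emoticon sets, the smoothed log-ratio score, and the function name are all assumptions chosen for illustration. The project description only says that "various term statistics" were used, so this is one plausible statistic, not necessarily the one the project computed.

```python
import math
from collections import Counter

# Hypothetical emoticon sets used as noisy labels.
POSITIVE = {":)", ":-)", ":D"}
NEGATIVE = {":(", ":-("}


def learn_lexicon(texts, smoothing=1.0):
    """Score each word by the smoothed log-ratio of its relative frequency
    in positively labeled vs. negatively labeled texts."""
    pos_counts, neg_counts = Counter(), Counter()
    for text in texts:
        tokens = text.split()
        has_pos = any(t in POSITIVE for t in tokens)
        has_neg = any(t in NEGATIVE for t in tokens)
        if has_pos == has_neg:
            continue  # no emoticon, or conflicting emoticons: skip
        words = [t.lower() for t in tokens
                 if t not in POSITIVE and t not in NEGATIVE]
        (pos_counts if has_pos else neg_counts).update(words)

    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values()) + smoothing * len(vocab)
    neg_total = sum(neg_counts.values()) + smoothing * len(vocab)
    return {
        word: math.log(((pos_counts[word] + smoothing) / pos_total)
                       / ((neg_counts[word] + smoothing) / neg_total))
        for word in vocab
    }
```

Words with a score well above zero lean positive, and words well below zero lean negative. Texts without a usable emoticon contribute nothing, so no manual labeling is required.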


2009

Learning Decision Lists in Parallel for Morphological Disambiguation in Turkish

Joint work with Jimmy Lin.


The goal is to apply “cloud computing” to parallelize the process of learning decision lists. Our approach was intended to scale our previous morphological disambiguator (joint work with Deniz Yuret) to much larger data sets, and possibly perform bootstrapping to gain from unannotated data.


2007-2008

A Hybrid Machine Translation (MT) System from Turkish to English

Supervised by Prof. Kemal Oflazer.


We have created a hybrid Turkish-to-English MT system, which maps Turkish text to all possible English translations, and builds an English language model that selects the most probable one. Mapping is done at the sentence level via a parallel grammar implemented using the Avenue transfer engine, and the SRILM Toolkit is used to create a language model.


2006-2009

Formal Approaches to Haplotype Inference

Supervised by Dr. Esra Erdem.


In this project, we develop new formal approaches to solving the Haplotype Inference problem, by means of various declarative programming paradigms, such as Answer Set Programming, Constraint Programming and Integer Linear Programming.


2006-2008

AI Planning for Genome Rearrangement

Supervised by Dr. Esra Erdem.


We view the genome rearrangement problem as the problem of planning rearrangement events that transform one genome into the other, represent it as a planning problem, and use TLPlan to solve it.


Fall 2006

Automated Reasoning about Challenging Grid Puzzles

Supervised by Dr. Esra Erdem. Joint work with Merve Cayli, Ayse Gul Karatop, Emrah Kavlak, and Hakan Kaynar.

 

In this project, we study challenging grid puzzles (of complexity NP) that are interesting for answer set programming, from the viewpoints of representation and computation.


Spring 2006

Perturbation Theory and WKB Approximation Methods

Supervised by Prof. Ali Mostafazadeh.


In this work, we analyzed the method of asymptotic approximations, and applied the WKB Approximation Method to solve the Schroedinger Equation.


2005-2006 

Morphological Analysis and Disambiguation of Turkish Language

Supervised by Dr. Deniz Yuret.


We developed a learning-based morphological disambiguator for Turkish. The stand-alone disambiguator can be downloaded here.


2005-2006 

Prediction of Lagrangian Trajectories in the Ocean

Supervised by Dr. Mine Caglar.


We approximate the trajectory of an object lost in the ocean, using a mathematical model that applies interpolation and regression techniques to discrete time-position data.
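The following sketch shows one simple way interpolation and regression can be applied to discrete time-position data. It assumes samples of the form (t, x, y) with distinct timestamps, uses linear interpolation between samples, and fits an ordinary least-squares line per coordinate for extrapolation. These modeling choices and all function names are illustrative assumptions, not the model used in the project.

```python
def interpolate(samples, t):
    """Linearly interpolate the (x, y) position at time t.

    samples is a list of (t, x, y) tuples with distinct timestamps.
    """
    ordered = sorted(samples)
    for (t0, x0, y0), (t1, x1, y1) in zip(ordered, ordered[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))
    raise ValueError("t is outside the sampled time range")


def fit_line(ts, values):
    """Ordinary least-squares fit of values = slope * t + intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_v = sum(values) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, values))
             / sum((t - mean_t) ** 2 for t in ts))
    return slope, mean_v - slope * mean_t


def extrapolate(samples, t):
    """Predict the (x, y) position at time t from per-coordinate line fits."""
    ts = [s[0] for s in samples]
    slope_x, icpt_x = fit_line(ts, [s[1] for s in samples])
    slope_y, icpt_y = fit_line(ts, [s[2] for s in samples])
    return (slope_x * t + icpt_x, slope_y * t + icpt_y)
```

Interpolation fills in positions between observations, while the regression fit gives a rough estimate of where the object may drift beyond the last observation.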