BIU Machine Learning and Data Science

Learning Club

Seminar in machine learning, data science, and applications at Bar-Ilan University

Announcements are made in the following group: (link).

Supported by the BIU Data Science Institute.

We currently meet on Sundays, with talks beginning at 12:00; see each listing below for the exact time and place.

Google Calendar

Stay tuned and subscribe to our calendar by pressing the plus button at the bottom right corner :)


Upcoming Talks

Feb. 24th 2019, Sun. 12:00, Sagie Benaim (webpage).

Tel-Aviv University (PhD Student).

Location: Gonda Building (901), Room 101.

New Capabilities in Unsupervised Image to Image Translation

Abstract:

In Unsupervised Image to Image Translation, we are given unmatched sets of images from domains A and B, and our task is to generate, given an image from domain A, its analogous image in domain B.

In the first part of the talk, I'll describe a new capability which allows us to perform such translation when only a single image is present in domain A. Specifically, given a single image x from domain A and a set of images from domain B, our task is to generate the analog of x in B. We argue that this task could be a key AI capability that underlies the ability of cognitive agents to act in the world, and present empirical evidence that existing unsupervised domain translation methods fail on this task.

In the second part of the talk, I'll describe a new capability which allows us to disentangle the "common" and "domain-specific" information of domains A and B. This allows us to generate, given a sample a in A and a sample b in B, an image in domain B which contains the "common" information of a and the "domain-specific" information of b. For example, ignoring occlusions, B can be "people with glasses" and A can be "people without"; the "common" information is "faces", while the "domain-specific" information of B is "glasses". At test time, we add the glasses of a person in domain B to any person in domain A.
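
To make the decomposition concrete, here is a toy architectural sketch of the shared/specific idea in PyTorch. It is our illustration, not the speaker's model: the layer sizes are hypothetical and the encoders are untrained.

```python
# A minimal sketch of shared/specific disentanglement (hypothetical sizes).
import torch
import torch.nn as nn

enc_common   = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))  # "common" content
enc_specific = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 8))   # B-only traits
dec_B        = nn.Sequential(nn.Linear(32 + 8, 128), nn.ReLU(), nn.Linear(128, 784))

a = torch.randn(1, 784)  # e.g., a face without glasses (domain A)
b = torch.randn(1, 784)  # e.g., a face with glasses (domain B)

# Combine the "common" code of a with the "domain-specific" code of b
# and decode into domain B: a's face wearing b's glasses.
out = dec_B(torch.cat([enc_common(a), enc_specific(b)], dim=1))
```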

Lastly, time permitting, I'll describe the application of these techniques in the context of Singing Voice Separation, where the training data contains a set of mixed-music samples (singing plus instrumental) and an unmatched set of instrumental music.


Mar. 3rd 2019, Sun. 12:00, Daniel Soudry (webpage).

Technion - Israel Institute of Technology.

Location: Gonda Building (901), Room 101.


Mar. 10th 2019, Sun. 12:00, Yair Weiss (webpage).

The Hebrew University of Jerusalem.

Location: Gonda Building (901), Room 101.


Mar. 17th 2019, Sun. 12:00, Tomer Galanti (webpage).

Tel-Aviv University (PhD Student).

Location: Gonda Building (901), Room 101.


Mar. 31st 2019, Sun. 12:00, Sivan Sabato (webpage).

Ben-Gurion University.

Location: Gonda Building (901), Room 101.


Apr. 28th 2019, Sun. 12:00, Raja Giryes (webpage).

Tel-Aviv University.

Location: Gonda Building (901), Room 101.


May 12th 2019, Sun. 12:00, Lihi Zelnik-Manor (webpage).

Technion - Israel Institute of Technology.

Location: Gonda Building (901), Room 101.


June 2nd 2019, Sun. 12:00, Oren Freifeld (webpage).

Ben-Gurion University.

Location: Gonda Building (901), Room 101.

Previous Talks

Only presentations from this year are shown.

Jan. 6th 2019, Sun. 12:00, Roy Bar-Haim (webpage).

Debating Technologies group, IBM Research AI - Haifa.

Location: Gonda Building (901), Room 101.

Stance Classification and Sentiment Analysis in IBM Project Debater

Abstract:

Project Debater is the first AI system that was shown to debate humans in a meaningful manner in a full live debate. Development of the system started in 2012, as the next AI Grand Challenge pursued by IBM Research, following the demonstration of Deep Blue in chess in 1997 and Watson in Jeopardy! in 2011. The system was revealed in June 2018, in two full live debates against expert human debaters, and received massive media attention.

In this talk I will first give a high-level view of the project and its core technologies. I will then focus on one of its most challenging parts – understanding the stance of arguments. I will survey several of our works on stance classification and sentiment analysis of arguments, which resulted in several publications, language resources and datasets.

In the last part of the talk, I will present our recent work on learning sentiment composition, a fundamental sentiment analysis problem. Previous work relied on manual rules and manually-created lexical resources such as negator lists, or learned a composition function from sentiment-annotated phrases or sentences. We propose a new approach for learning sentiment composition from a large, unlabeled corpus, which only requires a word-level sentiment lexicon for supervision.

Bio:

Roy Bar-Haim is a Research Staff Member in IBM Research – Haifa. Over the last six years, he has been leading a global team of research scientists working on core components in Project Debater. Roy also serves as the Haifa lab’s co-chair of the Natural Language Processing Professional Interests Community (NLP PIC). Before joining IBM, he led NLP and ML research teams in several startups. He has published in, and reviewed for, top NLP and AI conferences and journals. He serves on the elite standing reviewer team of TACL (Transactions of the Association for Computational Linguistics) and was an area co-chair at the COLING 2016 conference. Roy received his B.Sc and M.Sc degrees from the Technion, and his Ph.D from Bar-Ilan University, all in computer science.

Dec. 16th 2018, Sun. 12:00, Aryeh Kontorovich (webpage).

Ben-Gurion University.

Location: Gonda Building (901), Room 101.

Vignettes on sample compression

Abstract:

Sample compression is a natural and elegant learning framework, which allows for storage and runtime savings as well as sharp generalization bounds. In this talk, I will survey a few recent collaborations that touch upon various aspects of sample compression. Central among these is the development of a new algorithm for learning in arbitrary metric spaces based on a margin-regularized 1-nearest neighbor, which we call OptiNet. The latter is strongly universally Bayes-consistent in all essentially-separable metric probability spaces. OptiNet is the first learning algorithm to enjoy this property; by comparison, k-NN and its variants are not Bayes-consistent, except under additional structural assumptions, such as an inner product, a norm, finite doubling dimension, or a Besicovitch-type property. I will then talk about sample compression in the context of regression, extensions to non-uniform margins, and, time permitting, generalization lower bounds.
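
For intuition, here is a minimal sketch of sample compression for 1-nearest-neighbor classification; it uses a simple greedy condensing pass for illustration only (OptiNet itself builds a margin-regularized net and is not reproduced here).

```python
# Greedy condensing: keep only points the compressed set misclassifies.
import numpy as np

def condense(X, y):
    keep = [0]
    for i in range(1, len(X)):
        dists = np.linalg.norm(X[keep] - X[i], axis=1)
        if y[keep[np.argmin(dists)]] != y[i]:  # 1-NN on the kept set errs here
            keep.append(i)
    return X[keep], y[keep]

X = np.random.randn(200, 2)
y = (X[:, 0] > 0).astype(int)
Xc, yc = condense(X, y)  # the compressed set; its size enters compression bounds
```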


Nov. 25th 2018, Sun. 12:00, Tamir Hazan (webpage).

Technion - Israel Institute of Technology.

Location: Gonda Building (901), Room 101.

Direct Optimization through argmax for Discrete Variational Auto-Encoder

Abstract:

Reparameterization of variational auto-encoders is an effective method for reducing the variance of their gradient estimates. However, when the latent variables are discrete, a reparameterization is problematic due to discontinuities in the discrete space. In this work, we extend the direct loss minimization technique to discrete variational auto-encoders. We first reparameterize a discrete random variable using the arg max function of the Gumbel-Max perturbation model. We then use direct optimization to propagate gradients through the non-differentiable arg max using two perturbed arg max operations.
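
As a reference point, here is a minimal sketch of the Gumbel-Max reparameterization itself (not the paper's direct-optimization estimator): a categorical sample is the arg max of the logits plus i.i.d. Gumbel noise.

```python
import numpy as np

def sample_gumbel_max(logits, rng):
    """Draw a one-hot categorical sample via the Gumbel-Max trick."""
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    one_hot = np.zeros_like(logits)
    one_hot[np.argmax(logits + gumbel)] = 1.0                  # perturbed arg max
    return one_hot

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.2, 0.7]))
freq = sum(sample_gumbel_max(logits, rng) for _ in range(10_000)) / 10_000
print(freq)  # approximately [0.1, 0.2, 0.7]
```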

Nov. 11th 2018, Sun. 12:00, Ohad Shamir (webpage).

Weizmann Institute of Science.

Location: Gonda Building (901), Room 101.

Optimization Landscape of Neural Networks: Where Do the Local Minima Hide?

Abstract:

Training neural networks is a highly non-convex optimization problem, which is often successfully solved in practice, but the reasons for this are poorly understood. Much recent work has focused on showing that these non-convex problems do not suffer from poor local minima. However, this has only been provably shown under strong assumptions or in highly restrictive settings. In this talk, I’ll describe some recent results on this topic, both positive and negative. On the negative side, I’ll show how local minima can be ubiquitous even when optimizing simple, one-hidden-layer networks under favorable data distributions. On the flip side, I’ll discuss how looking at other architectures (such as residual units), or modifying the question, can lead to positive results under mild assumptions.

June 21st 2018, Thu 10:00, Amir Globerson (webpage).

Tel Aviv University (Faculty).

Location: Gonda Building (901), Room 101.

Deep Learning: Optimization, Generalization and Architectures

Abstract:

Three key challenges in deep learning are: understanding why optimization works despite non-convexity, understanding why generalization is possible despite training very large models with limited data, and understanding architecture design. In this talk I will discuss our recent work on these questions.

June 14th 2018, Thu 10:00, Eran Malach (webpage) (slides).

The Hebrew University of Jerusalem (PhD Student).

Location: Gonda Building (901), Room 101.

A Provably Correct Algorithm for Deep Learning that Actually Works

Abstract:

We describe a layer-by-layer algorithm for training deep convolutional networks, where each step involves gradient updates for a two-layer network followed by a simple clustering algorithm. Our algorithm stems from a deep generative model that generates images level by level, where lower-resolution images correspond to latent semantic classes. We analyze the convergence rate of our algorithm assuming that the data is indeed generated according to this model (as well as additional assumptions). While we do not pretend to claim that the assumptions are realistic for natural images, we do believe that they capture some true properties of real data. Furthermore, we show that our algorithm actually works in practice (on the CIFAR dataset), achieving results in the same ballpark as vanilla convolutional neural networks trained by stochastic gradient descent.
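
One stage of such a scheme might look as follows. This is an illustrative sketch under our own simplifications (fully-connected blocks and k-means clustering), not the speaker's algorithm.

```python
# One stage: gradient updates for a shallow block, then clustering its
# outputs to produce pseudo-labels for the next stage.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def train_stage(x, pseudo_labels, hidden=128, n_clusters=10, steps=200):
    block = nn.Sequential(nn.Linear(x.shape[1], hidden), nn.ReLU())
    head = nn.Linear(hidden, n_clusters)  # discarded after this stage
    opt = torch.optim.Adam(list(block.parameters()) + list(head.parameters()))
    for _ in range(steps):
        loss = nn.functional.cross_entropy(head(block(x)), pseudo_labels)
        opt.zero_grad(); loss.backward(); opt.step()
    feats = block(x).detach()
    # Cluster the learned representation to label the next stage.
    next_labels = torch.as_tensor(KMeans(n_clusters).fit_predict(feats.numpy()))
    return feats, next_labels

x = torch.randn(256, 64)
labels = torch.as_tensor(KMeans(10).fit_predict(x.numpy()))  # initial clustering
feats, labels = train_stage(x, labels)  # repeat layer by layer
```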

June 13th 2018, Mon. 12:00, Yinyin Liu.

Head of Data Science, Intel AIPG.

Location: Building 216, Room 201.

Abstract:

The Intel AI Lab, within the AI Products Group, was formed last year with the goal of pursuing fundamental and applied AI research, and the long-term vision of building brain-like capabilities. The lab is focused on developing and implementing state-of-the-art algorithms in topics such as natural language processing, vision, audio, reinforcement learning, recommendation systems, and robotic learning. Key vertical areas include autonomous driving, federal, and retail. We partner both internally, with Intel Labs and other groups, and externally, with universities and companies. The output includes open-source software releases, publications at top AI conferences, and marketing demos. The work also helps our partner Intel teams build better hardware and software products. In this talk I will give an overview of the AI Lab.

Bio:

Yinyin Liu is the head of data science for AIPG at Intel, where she works with a team of data scientists on applying deep learning and Intel Nervana technologies to business applications across different industry domains and driving the development and design of the Intel Nervana platform. She and the Intel Nervana team have developed open source deep learning frameworks, such as neon and Intel Nervana Graph, bringing state-of-the-art models on image recognition, image localization, and natural language processing into the frameworks. Yinyin has research experience in computer vision, neuromorphic computing, and robotics.

June 11th 2018, Mon. 11:00, Zachary Chase Lipton (webpage).

Carnegie Mellon University (CMU).

Location: Gonda Building (901), Room 101.

Detecting and Correcting for Label Shift with Black Box Predictors

Abstract:

Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets), cause symptoms (observations), we focus on label shift, where the label marginal p(y) changes but the conditional p(x|y) does not. We propose Black Box Shift Estimation (BBSE) to estimate the test distribution p(y). BBSE exploits arbitrary black box predictors to reduce dimensionality prior to shift correction. While better predictors give tighter estimates, BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible. We prove BBSE's consistency, bound its error, and introduce a statistical test that uses BBSE to detect shift. We also leverage BBSE to correct classifiers. Experiments demonstrate accurate estimates and improved prediction, even on high-dimensional datasets of natural images.
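
The estimation step reduces to a small linear system. A minimal sketch, assuming a hard-label predictor with classes 0..k-1 (our simplification of the method):

```python
import numpy as np

def bbse_weights(preds_source, y_source, preds_target, k):
    # Joint confusion matrix on held-out source data: C[i, j] = P(f(x)=i, y=j)
    C = np.zeros((k, k))
    for p, y in zip(preds_source, y_source):
        C[p, y] += 1.0
    C /= len(y_source)
    # Predicted-label marginal on the unlabeled target set: mu[i] = P(f(x)=i)
    mu = np.bincount(preds_target, minlength=k) / len(preds_target)
    # Requires an invertible confusion matrix; w[j] estimates p_target(y=j) / p_source(y=j)
    return np.linalg.solve(C, mu)
```

The resulting importance weights can then reweight the training loss to correct the classifier.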

Bio:

Zachary Chase Lipton is an assistant professor at Carnegie Mellon University. His research spans both core machine learning methods and their social impact, concentrating on machine learning for healthcare, data-efficient deep learning, temporal structure, and learning under domain adaptation. This work addresses diverse application areas, including diagnosis, dialogue systems, and product recommendation. He is the founding editor of the Approximately Correct blog and the lead author of Deep Learning – The Straight Dope, an open-source interactive book teaching deep learning through Jupyter notebooks. Find him on Twitter (@zacharylipton) or GitHub (@zackchase).

June 7th 2018, Thu 10:00, Uri Shalit (webpage).

Technion – Israel Institute of Technology (Faculty).

Location: Gonda Building (901), Room 101.

Learning to Act from Observational Data: Machine Learning and Causal Inference in Healthcare

Abstract:

The proliferation of data collection in the health, commercial, and economic spheres brings with it opportunities for extracting new knowledge leading to concrete policy implications. An example that motivates my research is using electronic healthcare records to individualize medical practice.

The scientific challenge lies in the fact that standard prediction models such as supervised machine learning are often not enough for decision making from this so-called “observational data”: Supervised learning does not take into account causality, nor does it account for the feedback loops that arise when predictions are turned into actions. On the other hand, existing causal-inference methods are not adapted to dealing with the rich and complex data now available, and often focus on populations, as opposed to individual-level effects.

In my talk, I will discuss the challenges of applying machine learning in the clinical healthcare setting, and show how we apply recent ideas from machine learning and specifically deep-learning to individual-level causal-inference and action.

May 31st 2018, Thu 10:00, Or Sharir (webpage) (slides).

The Hebrew University of Jerusalem (PhD Student).

Location: Gonda Building (901), Room 101.

On the Expressive Power of ConvNets and RNNs as a Function of their Architecture

Abstract:

The driving force behind convolutional and recurrent networks — two of the most successful deep learning architectures to date — is their expressive power. Despite its wide acceptance and vast empirical evidence, formal analyses supporting this belief are scarce. The primary notions for formally reasoning about expressiveness are efficiency and inductive bias. Efficiency refers to the ability of a network architecture to realize functions that require an alternative architecture to be much larger. Inductive bias refers to the prioritization of some functions over others given prior knowledge regarding a task at hand. Through an equivalence to hierarchical tensor decompositions, we study the expressive efficiency and inductive bias of various architectural features in convolutional networks (depth, width, pooling geometry, inter-connectivity, overlapping receptive fields, etc.) as well as the long-term memory capacity of deep recurrent networks. Our results shed light on the demonstrated effectiveness of modern networks and, in addition, provide new tools for network design.

May 3rd 2018, Thu 10:00, Jonathan Berant (webpage).

Tel Aviv University (Faculty).

Location: Gonda Building (901), Room 101.

Talking to your Virtual Assistant about anything

Abstract:

Conversational interfaces and virtual assistants are now part of our lives thanks to services such as Amazon Alexa, Google Voice, and Microsoft Cortana. Thus, translating natural language queries and commands into an executable form, also known as semantic parsing, is one of the prime challenges in natural language understanding today. In this talk I would like to highlight the main challenges and limitations in the field of semantic parsing, and to describe ongoing work that addresses them. First, semantic parsers require information to be stored in a knowledge base (KB), which substantially limits their coverage and applicability. Conversely, the web has huge coverage, but the search engines that access it do not handle language compositionality well. We propose to treat the web as a KB and compute answers to complex questions in broad domains by decomposing the question into a sequence of simple questions, extracting answers with a search engine, and recomposing the answers to obtain a final result. Second, deploying virtual assistants in many domains (cars, homes, calendar, etc.) requires the ability to quickly develop semantic parsers. However, most past work trains semantic parsers from scratch for each domain, disregarding training data from other domains. We propose a zero-shot approach for semantic parsing, where we decouple the structure of language from the contents of the domain and learn a domain-independent semantic parser.

Bio:

Dr. Jonathan Berant has been a senior lecturer in the School of Computer Science at Tel-Aviv University since October 2016, working on various natural language understanding problems. Jonathan received his PhD from Tel-Aviv University in 2012 and was a post-doctoral fellow at Stanford's Computer Science Department from 2012 to 2015, and then at Google Research from 2015 to 2016. He was an Azrieli fellow and an IBM fellow during his graduate studies, and a Rothschild fellow during his post-doctoral period. His work has been recognized by a best paper award (authored by a student) at ACL 2011, a best paper award at EMNLP 2014, and best paper nominations at ACL 2013 and ACL 2014. Since his appointment as senior lecturer at Tel-Aviv University he has won grants from the ISF (2016), the BSF (2017), and the Samsung runway project (2017).

Apr 26th 2018, Thu 10:00, Idan Schwartz (webpage).

Technion – Israel Institute of Technology (PhD Student).

Location: Gonda Building (901), Room 101.

High-Order Attention Models for Visual Question Answering

Abstract:

The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
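
As a rough illustration of a high-order (here, third-order) correlation, consider the following sketch; the dimensions and the simple product-based potential are our assumptions, not the paper's exact model.

```python
import torch

regions = torch.randn(36, 512)  # image region features
words   = torch.randn(14, 512)  # question word features
answer  = torch.randn(512)      # answer-candidate feature

# Third-order potential: element-wise products summed over a shared space.
corr = torch.einsum('id,jd,d->ij', regions, words, answer)  # (36, 14)

# Marginalize the potential to get attention over each modality.
region_attn = torch.softmax(corr.sum(dim=1), dim=0)  # over 36 regions
word_attn   = torch.softmax(corr.sum(dim=0), dim=0)  # over 14 words

attended_image = region_attn @ regions  # weighted sums fed to the classifier
attended_words = word_attn @ words
```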

Mar 22nd 2018, Thu 10:00, Eliya Nachmani.

Facebook AI Research.

Location: Gonda Building (901), Room 101.

Synthesizing and Cloning Human Voices

Abstract:

Text-to-speech (TTS) systems transform written text into speech. In this talk we present a new neural TTS model for voices that are sampled in the wild. We introduce a new network architecture, VoiceLoop, which is simpler than those in the existing literature and is based on a novel shifting-buffer working memory. Our solution is able to deal with unconstrained voice samples without requiring aligned phonemes or linguistic features. We also show how we can control the emotional variability of the generated speech by priming the network buffer.
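
The buffer mechanics are easy to picture; here is a toy sketch (illustrative only, not the VoiceLoop implementation):

```python
import numpy as np

def shift_buffer(S, u):
    """Push a new vector u into the (buffer_len, d) memory S, drop the oldest."""
    return np.vstack([u, S[:-1]])

S = np.zeros((10, 256))        # shifting-buffer working memory
for _ in range(100):
    u = np.random.randn(256)   # new representation computed at this step
    S = shift_buffer(S, u)     # the whole buffer conditions the next output
```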

We further show that TTS systems have the potential to generalize from one speaker to another given a relatively short sample of a new voice. We present a method that is designed to capture a new speaker from a short untranscribed audio sample. This is done by employing an additional network that, given an audio sample, places the speaker in an embedding space. This network is trained as part of the speech synthesis system using various consistency losses. Our results demonstrate greatly improved performance on both the dataset speakers and, more importantly, when fitting new voices, even from very short samples.

Mar 15th 2018, Thu 10:00, Ofra Amir (webpage).

Technion – Israel Institute of Technology (Faculty).

Location: Gonda Building (901), Room 101.

Distilling relevant information to support human-human and human-agent collaboration

Abstract:

One of today's biggest challenges is the heightened complexity and information overload stemming from increasingly interacting systems consisting of both humans and machines. In this talk, I will describe work that aims to address this challenge in two settings: human teamwork and human-agent collaboration. In the context of human teamwork, I will present our work towards developing intelligent systems that reduce coordination overhead in distributed teams by personalizing the information that is shared with team members. We developed an algorithm that determines what information about others' activities is most relevant to each team member, and showed through a user study that such personalized information sharing resulted in higher productivity and reduced workload for team members, without detrimental effects on the quality of the team's work.

In the context of human-agent collaboration, I will describe our work towards the development of methods for summarizing agent behavior, with the goal of enabling users to better understand the capabilities of agents they interact with. We developed HIGHLIGHTS, an algorithm that extracts important trajectories from the execution trace of an agent to generate a succinct description of key agent behaviors. Our experiments show that study participants were more successful at assessing agents' capabilities when shown summaries generated by HIGHLIGHTS compared to baseline summaries.
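
A minimal sketch of the trajectory-extraction idea: as in the published HIGHLIGHTS algorithm, a state's importance is taken to be the spread of its Q-values, while the rest of the bookkeeping here is simplified.

```python
import numpy as np

def importance(q_values):
    """A state is important when the choice of action matters."""
    return np.max(q_values) - np.min(q_values)

def summarize(trace, k=5, context=3):
    """Keep short windows around the k most important states of an execution trace."""
    scores = [importance(q) for _, q in trace]  # trace: [(state, q_values)]
    top = sorted(np.argsort(scores)[-k:])
    return [trace[max(0, i - context): i + context + 1] for i in top]

trace = [(f"s{t}", np.random.rand(4)) for t in range(100)]
summary = summarize(trace)  # a succinct description of key behaviors
```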

Jan 4th 2018, Thu 16:00, Yonatan Belinkov (webpage).

Massachusetts Institute of Technology (PhD Student).

Location: Building 211, Room 101.

Understanding Internal Representations in Deep Learning Models for Language and Speech Processing

Abstract:

Language technology has become pervasive in everyday life, powering applications like Apple’s Siri or Google’s Assistant. Neural networks are a key component in these systems thanks to their ability to model large amounts of data. Contrary to traditional systems, models based on deep neural networks (a.k.a. deep learning) can be trained in an end-to-end fashion on input-output pairs, such as a sentence in one language and its translation in another language, or a speech utterance and its transcription. The end-to-end training paradigm simplifies the engineering process while giving the model flexibility to optimize for the desired task. This, however, often comes at the expense of model interpretability: understanding the role of different parts of the deep neural network is difficult, and such models are often perceived as “black-box”. In this work, we study deep learning models for two core language technology tasks: machine translation and speech recognition. We advocate an approach that attempts to decode the information encoded in such models while they are being trained. We perform a range of experiments comparing different modules, layers, and representations in the end-to-end models. Our analyses illuminate the inner workings of end-to-end machine translation and speech recognition systems, explain how they capture different language properties, and suggest potential directions for improving them. The methodology is also applicable to other tasks in the language domain and beyond.
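
A common instrument for this kind of analysis is a probing classifier: freeze the end-to-end model, extract a layer's hidden states, and train a light classifier to predict a linguistic property from them. The sketch below is generic (with made-up sizes), not the speaker's exact setup.

```python
import torch
import torch.nn as nn

hidden_states = torch.randn(1000, 512)    # representations from a frozen layer
pos_tags = torch.randint(0, 17, (1000,))  # e.g., part-of-speech labels

probe = nn.Linear(512, 17)                # a simple linear probe
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(200):
    loss = nn.functional.cross_entropy(probe(hidden_states), pos_tags)
    opt.zero_grad(); loss.backward(); opt.step()
# Probe accuracy indicates how much of the property the layer encodes.
```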


Jan 2nd 2018, Tue 14:00, Masashi Sugiyama (website).

RIKEN / The University of Tokyo (Faculty).

Location: Gonda Building (901), Room 101.

Machine Learning from Weak Supervision - Towards Accurate Classification with Low Labeling Costs

Abstract:

Machine learning from big training data is achieving great success. However, there are various application domains that prohibit the use of massive labeled data. In this talk, I will introduce our recent advances in classification from weak supervision, including classification from two sets of unlabeled data, classification from positive and unlabeled data, a novel approach to semi-supervised classification, and classification from complementary labels. Finally, I will briefly introduce the activities of RIKEN Center for Advanced Intelligence Project.
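
As one concrete instance of this family, here is a minimal sketch of a non-negative risk estimator for learning from positive and unlabeled data; the sigmoid loss and a known class prior are assumptions of the sketch, not of the talk.

```python
import torch

def nn_pu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk with the sigmoid loss l(z) = sigmoid(-z)."""
    loss = lambda z: torch.sigmoid(-z)
    risk_pos = loss(scores_pos).mean()        # positives classified as +1
    risk_pos_neg = loss(-scores_pos).mean()   # positives classified as -1
    risk_unl_neg = loss(-scores_unl).mean()   # unlabeled classified as -1
    # The unbiased negative risk can dip below zero on finite samples; clamp it.
    neg_risk = torch.clamp(risk_unl_neg - prior * risk_pos_neg, min=0.0)
    return prior * risk_pos + neg_risk

risk = nn_pu_risk(torch.randn(100), torch.randn(500), prior=0.3)
```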


Bio:

Prof. Sugiyama is the director of the RIKEN Center for Advanced Intelligence Project (AIP) and a professor in the Department of Complexity Science and Engineering at the University of Tokyo.

He was born in Osaka, Japan, in 1974. He received the degrees of Bachelor of Engineering, Master of Engineering, and Doctor of Engineering in Computer Science from the Tokyo Institute of Technology, Japan, in 1997, 1999, and 2001, respectively. In 2001, he was appointed Assistant Professor at the same institute, and he was promoted to Associate Professor in 2003. He moved to the University of Tokyo as Professor in 2014. Since 2016, he has concurrently served as Director of the RIKEN Center for Advanced Intelligence Project. He received an Alexander von Humboldt Foundation Research Fellowship and conducted research at the Fraunhofer Institute, Berlin, Germany, from 2003 to 2004. In 2006, he received a European Commission Erasmus Mundus Scholarship and conducted research at the University of Edinburgh, UK. He received the Faculty Award from IBM in 2007 for his contribution to machine learning under non-stationarity; the Nagao Special Researcher Award from the Information Processing Society of Japan in 2011 and the Young Scientists' Prize of the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, Japan, in 2014 for his contribution to the density-ratio paradigm of machine learning; and the Japan Society for the Promotion of Science Award and the Japan Academy Medal in 2017 for his series of machine learning research. His research interests include theories and algorithms of machine learning and data mining, and a wide range of applications such as signal processing, image processing, and robot control.

Dec 24th 2017, Sun 11:00, Yuval Pinter (webpage).

Georgia Institute of Technology (PhD Student).

Location: Cyber-Center Meeting Room.

Integrating Distributional and Compositional Approaches to Word Embeddings

Abstract:

In recent years, nearly all applications in Natural Language Processing have become dominated by Machine Learning methods that use dense, low-dimensional, word vectors (known as embeddings) as their building blocks, including Machine Translation, Question Answering, Sentiment Analysis, and many more. These word embeddings, particularly suited for use in neural nets, are typically obtained via techniques that stem from a distributional approach, i.e. learning to maximize vector similarity of words that tend to appear in similar contexts within large textual corpora. This powerful approach has its limits, a major one being representation of words not seen in the training corpus, known as the out-of-vocabulary (OOV) problem.

In my talk, I will present some approaches that tackle the OOV problem by complementing distributional embedding methods with a compositional view of word structure. I will focus on our recent algorithm, MIMICK (EMNLP 2017), which produces OOV embeddings by re-learning distributionally-trained vectors from the way words are spelled, using a character-level Recurrent Neural Net (RNN). I will show the merits of our model across a diverse array of 23 languages on a sequence-tagging task. I will discuss the implications of our results based on attributes of different languages and datasets, as well as some new findings relating to the architecture choices underlying the MIMICK model.
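
In outline, the model is a character-level recurrent regressor onto pretrained word vectors. A minimal PyTorch sketch with hypothetical layer sizes (not the authors' code):

```python
import torch
import torch.nn as nn

class Mimick(nn.Module):
    def __init__(self, n_chars, char_dim=20, hidden=64, emb_dim=100):
        super().__init__()
        self.chars = nn.Embedding(n_chars, char_dim)
        self.rnn = nn.LSTM(char_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, emb_dim)

    def forward(self, char_ids):       # (batch, word_length)
        h, _ = self.rnn(self.chars(char_ids))
        return self.out(h[:, -1])      # predicted word embedding

model = Mimick(n_chars=128)
chars = torch.randint(0, 128, (4, 10))     # a batch of 4 spellings
target = torch.randn(4, 100)               # their pretrained embeddings
loss = nn.MSELoss()(model(chars), target)  # minimized over in-vocabulary words;
                                           # at test time, run OOV spellings through
```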


Dec 21st 2017, Thu 12:00, Karen Livescu (webpage).

Toyota Technological Institute at Chicago (Faculty).

Location: Building 216, Colloquium room.

How should we use domain knowledge in the era of deep learning? (A perspective from speech processing)

Abstract:

Deep neural networks are the new default machine learning approach in many domains, such as computer vision, speech processing, and natural language processing. Given sufficient data for a target task, end-to-end models can be learned with fairly simple, almost universal algorithms. Such models learn their own internal representations, which in many cases appear to be similar to human-engineered ones. This may lead us to wonder whether domain-specific techniques or domain knowledge are needed at all.

This talk will provide a perspective on these issues from the domain of speech processing. It will discuss when and how domain knowledge can be helpful, and describe two lines of work attempting to take advantage of such knowledge without compromising the benefits of deep learning. The main application will be speech recognition, but the techniques discussed are general.

Oct 25th 2017, Wed 10:30, Omer Levy (website).

University of Washington (Post Doc).

Location: Gonda Building (901), Room 101.

What does an LSTM Learn?

Abstract:

Long short-term memory (LSTM) was designed to address the problem of vanishing gradients in a simple recurrent neural network (S-RNN) by introducing a memory cell that records information via addition. We observe, on a variety of natural language tasks, that replacing the embedded S-RNN with a simple linear transformation does not degrade performance, implying that the S-RNN's role in an LSTM is redundant. We conjecture that the modeling power of an LSTM stems directly from the memory cell, and examine the value that it stores at each iteration. Our analysis reveals that the memory cell dynamically computes an element-wise weighted sum over its inputs, suggesting that this more restricted function space is the main driving force behind the success of LSTMs.
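
The "element-wise weighted sum" observation can be checked numerically by unrolling the cell update c_t = f_t * c_{t-1} + i_t * z_t. A minimal sketch with random gate values (our illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 3
f = rng.uniform(size=(T, d))   # forget gates in (0, 1)
i = rng.uniform(size=(T, d))   # input gates
z = rng.normal(size=(T, d))    # content inputs (candidate cell values)

c = np.zeros(d)                # recurrent form
for t in range(T):
    c = f[t] * c + i[t] * z[t]

# Unrolled form: a weighted sum of the inputs, with weights set by the gates.
c_sum = sum(np.prod(f[j + 1:], axis=0) * i[j] * z[j] for j in range(T))
assert np.allclose(c, c_sum)
```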

Bio:

I am a post-doc in the Department of Computer Science & Engineering at the University of Washington, working with Prof. Luke Zettlemoyer. Previously, I completed my PhD at Bar-Ilan University under the guidance of Prof. Ido Dagan and Dr. Yoav Goldberg. I am interested in designing algorithms that mimic the basic language abilities of humans, and in using them to realize semantic applications such as question answering and summarization that help people cope with information overload. I am also interested in deepening our qualitative understanding of how machine learning is applied to language and why it succeeds (or fails), in the hope that better understanding will foster better methods.