(HWT)
- What is the topic and why is it important?
- State the problem as simply as you can
- First paragraphs can be less dry
- Where is the thesis going?

(Aiala)
- Problem statement
- Motivation
- Important concepts
- Objectives
- Description (of the process and of the product)
- Contributions
- Organization of the document

What is the problem I want to solve? Identifying hedge cues and their scope? Identifying hedges? Scope? Hedges have a scope (the proposition affected by the hedge?).

This problem has already been studied (a lot), so what would my contribution be? The use of a methodology (which could be extended) that combines machine learning with expert knowledge (a hybrid system).

I should read Medlock's thesis. It presents research questions.

So?
- I should present the problem of hedging and scope detection, together with its motivation (for example, recognizing whether an extracted relation is not under the shadow of a hedge; look for other examples). Show through examples the different cases that exist, and how they relate to lexical, syntactic and even semantic information.

Then I present the research directions:
- Study how recognition can improve by incorporating the results of text analysis
- Study how to incorporate expert knowledge into the supervised learning method used initially
- Define a methodology and a software architecture suitable for solving the problem, aiming for something that can be extended to similar natural language processing problems.

Then a description of the proposed solution (an outline of the methodology and of the solution, perhaps some results, although I have my doubts about that). Finally, the organization of the thesis.

References

In this report we investigate the application of classification methods to four topical NLP tasks.
Our investigation covers a wide range of techniques, focusing predominantly, though not exclusively, on probabilistic approaches, and exploring not only the classification models themselves but also related issues such as data annotation, sample representation and evaluation strategies. In some cases we investigate tasks that are relatively new to the NLP community, while in others we explore better-known tasks, seeking a deeper understanding of both the problems and proposed solutions. Our focus is on analysis of the behaviour of classification models as they are applied to each of the NLP tasks, rather than on theoretical mathematical analysis of the models themselves, though some understanding of statistics and probability theory will be necessary. Our aim is to investigate why different methods are appropriate for particular NLP tasks, as well as demonstrating their effectiveness through experiment.

...

When the samples in a given classification domain are ordered sequentially with strong interdependencies, as is the case for instance in part-of-speech tagging for natural language, or phoneme labeling in continuous speech recognition, a somewhat different classification paradigm is often used to the one we have presented so far. Models that are specifically designed to deal with sequential data are known as sequential classifiers and can also be designated within the generative/discriminative probabilistic/non-probabilistic framework. A popular generative probabilistic sequential classifier is the Hidden Markov Model (HMM) which utilises the Markov assumption to label sequence data efficiently (Rabiner 1989, Cappé, Moulines & Rydén 2005). More recently Conditional Random Fields (CRFs) have been introduced with the ability to condition the category of the focus sample on features of arbitrary dependency distance (Lafferty, McCallum & Pereira 2001, Sutton & McCallum 2006). In our work we focus almost exclusively on non-sequential classification models.

...

Hedge classification is the task of automatically labelling sections of written text according to whether or not they contain expressions of linguistic 'affect' used by the author to convey speculativity. It is a relatively new task within the NLP community and is currently of growing interest due to the expansion of research into the application of NLP techniques to scientific literature from the fields of biomedicine/genetics (bioinformatics). NLP techniques are often used to identify and extract experimental results reported in scientific literature and thus rely on the veracity of the reported findings. However, experimental conclusions are often hedged by the authors if they consider them to be only speculative or potentially unreliable, and it is useful to be able to automatically identify when this is the case. In our formulation, the task involves classification at the sentence level, and therefore differs in granularity from the other tasks.

Hedges: A study in meaning criteria and the logic of fuzzy concepts (lakoff1973)

Yet students of language, especially psychologists and linguistic philosophers, have long been attuned to the fact that natural language concepts have vague boundaries and fuzzy edges and that, consequently, natural language sentences will very often be neither true, nor false, nor nonsensical, but rather true to a certain extent and false to a certain extent, true in certain respects and false in other respects.

...

For me, some of the most interesting questions are raised by the study of words whose meaning implicitly involves fuzziness, words whose job is to make things fuzzier or less fuzzy. I will refer to those words as 'hedges'.

...

Sort of is a predicate modifier, but one of a type that has not been previously studied in formal semantics in that its effects can only be described in terms of membership functions for fuzzy sets.
It takes values that are true or close to true and makes them false while uniformly raising values in the low to mid range of the scale, leaving the very low range of the scale constant.

...

Dana Scott (PC) has suggested a method for setting up fuzzy propositional logic in a way that shows its relation to modal and many-valued logic in general. The Kripke semantics for modal logics is based on the notion of a 'possible world', that is, a complete and consistent assignment of truth values to every proposition, in other words, a classical (two-valued) valuation. A model for classical modal logic contains a set of possible worlds (that is, a set of classical valuations) and a two-place alternativeness relation between worlds (that is, a relation between classical valuations). Scott has suggested a semantics for propositional fuzzy logic that looks like a modal semantics, though as we will see below, it differs from classical modal semantics.

Hedging is the expression of tentativeness and possibility in language use and it is crucial to scientific writing where statements are rarely made without subjective assessments of truth.

...

Hedges indicate interpretations and allow writers to convey their attitude to the truth of the statements they accompany, thereby presenting unproven claims with caution and softening categorical assertions.

(Hedges apply to "claims", "assertions" or "facts".)

...

This paper examines the importance of hedging in cell and molecular biology research articles and characterizes its extent and major forms of realisation in this genre. It is based on a corpus of 75,000 words taken from 26 RAs in the six leading journals in the field, identified by expert informants and the Journal Citation Reports.

...

used to refer to devices which qualify the writer's expression.

...

Essentially it represents an absence of certainty and is used here to describe any linguistic item or strategy employed to indicate either a) a lack of commitment to the truth value of an accompanying proposition or b) a desire not to express that commitment categorically.

...

The term does not therefore include other attitudinal markers or devices which convey the writer's conviction; items are only hedges in their epistemic sense and only then when they mark uncertainty.

...

My corpus shows that hedging represents more than one word in every 50 and this is supported by numerous studies looking at "authorial comment" which have all found one hedge every two or three sentences.

...

Categories: lexical verbs (23%), adverbial constructions (21%), adjectives (18.8%), modal verbs (16.6%), reference to limiting conditions, modal nouns, reference to a model, theory or methodology or admission to a lack of knowledge (17%).

...

Hedging is principally a lexical phenomenon.

...

Lexical verbs constituted the greatest range of items with 38 different forms represented by indicate, suggest, appear and propose (55.7% of the instances).

S1.7 These results indicate that in monocytic cell lineage, HIV-1 could mimic some differentiation/activation stimuli allowing nuclear NF-KB expression.
S6.8 Thus, it appears that the T-cell-specific activation of the proenkephalin promoter is mediated by NF-kappa B.
S141.8 We propose that a generally permissive enhancer/promoter interaction is of evolutionary benefit for higher eukaryotes: by enhancer shuffling, genes could be easily brought under a new type of inducibility/cell type specificity.
S144.8 These results suggest that Oct-2 participates in transcriptional regulation during T-cell activation.

All four verbs, and particularly indicate and suggest, appear to be more prominent in scientific writing.

Modal adjectives [...]
The most frequently occurring modal adjectives were likely, possible, most and consistent with.

S149.7 This is most likely due to the presence of the lymphoid-specific OTF-2 factor.
S410.17 It is possible that the CRE site is responsible for induction of bcl-2 expression in other cell types, particularly those in which protein kinase C is involved.
(most and consistent with are not marked as hedges in BioScope)

Adverbial constructions

Over 36 adverbial forms were identified [...] which included downtoners such as quite, almost and usually which lower the effect of the force of the verb, and disjuncts that convey an attitude to the truth of a statement (probably, generally, evidently).

S572.7 Thus, cosignaling via the LT-beta and TNF-alpha receptors is probably involved in the modulation of HIV-1 replication and the subsequent determination of HIV-1 viral burden in monocytes.
(generally is not marked as a hedge in BioScope)
S653.7 Thus, depending on the cellular environment, the short- and long-term effects of Tax expression can be quite different.
(in this sentence, BioScope marks can as the hedge; in general quite is not marked, and neither is almost)

Modal verbs

Some 65% of the modal verbs in the corpus were used epistemically. Would, may and could [...] accounted for 76.6% of the total, suggesting that a more restricted range is used in scientific writing than in conversation.

Modal nouns such as possibility, assumption, estimate and tendency are less significant in the RAs.

S934.10 The possibility that Oct-2a can act as an activator and/or a repressor may have important consequences for the function of Oct-2a in B cell differentiation and other developmental processes.
S299.14 Altogether, the data do not provide any evidence to support an assumption that moderately increased body burdens of PCDDs/PCDFs in adults induce decreases in the cellular components of the human immune system.

...

Interestingly, hedges tend to reinforce one another in clusters with 43% of hedges occurring in the same sentences as at least one other device.

...

Strategic markers 15%

The most numerous strategies are those which qualify commitment by referring to experimental weaknesses or to shortcomings in the model, theory or methodology. Some examples help frame the proposition in a suitable environment of uncertainties and probabilities (The model implies that...) while other cases simply signal possible alternative explanations:

S141.8 We propose that a generally permissive enhancer/promoter interaction is of evolutionary benefit for higher eukaryotes: by enhancer shuffling, genes could be easily brought under a new type of inducibility/cell type specificity.
S416.10 Taken together, these results support our conclusion that transcriptional transactivation is a necessary signaling component of S49 cell apoptosis, although an additional role for GR-mediated transrepression cannot be excluded.

Writers also refer to deficiencies in the research model, theory or method....

According to current models the inhibitory capacity of I(kappa)B(alpha) would be mediated through the retention of Rel/NF-kappaB proteins in the cytosol.
(In BioScope, according to current models is not marked as a hedge)
S693.7 According to our results, however, the kinases responsible for the two processes are distinct entities.
(Here it is not even marked)

The variation of forms used to express these strategies and the fact they are not nearly quantifiable means their significance has been overlooked in the literature. They should, however, be considered among the hedging devices available to scientific writers.

The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes (vincze2008)

The corpus consists of three parts, namely medical free texts, biological full papers and biological scientific abstracts.

...

more than 20,000 sentences that were considered for annotation and over 10% of them actually contain one (or more) linguistic annotation suggesting negation or uncertainty.

...

The annotation is based on linguistic principles, i.e. parts of sentences which do not contain any biomedical term are also annotated if they assert the non-existence/uncertainty of something.

...

As for speculative annotation, sentences that state the possible existence of a thing, i.e. neither its existence nor its non-existence is unequivocally stated are considered speculative sentences. If a sentence is a statement, that is, it does not include any speculative element that suggests uncertainty, it is disregarded. Questions inherently suggest uncertainty – which is why they are asked –, but they will be neglected and not annotated unless they contain speculative language.

...

Also, the speculative or negative cue is always included within its scope:
This result (<suggests> that the valency of Bi in the material is smaller than + 3).

...

On the other hand, a sequence of words cannot be marked as a complex keyword if it is only one of those words that express speculative or negative content (even without the other word). Thus prepositions, determiners, adverbs and so on are not annotated as parts of the complex keyword if the keyword can have a speculative or negative content on its own:
The picture most (<likely> reflects airways disease).

...

When marking the scopes of negative and speculative keywords, we extended the scope to the largest syntactic unit possible

...

The syntactic structure of the above sentence is ambiguous. First, the airways disease is surely mild, but it is not known whether it is viral or reactive: (viral <or> reactive); or second, the airways disease is either mild and viral or reactive and not mild (mild viral <or> reactive).
Most of the sentences with similar problems cannot be disambiguated on the basis of contextual information, hence the proper treatment of such sentences remains problematic. However, we chose to mark the widest scope available: in other words, we preferred to include every possible element within the scope rather than exclude elements that should probably be included. Thus, in the previous two examples, the wider scopes were finally marked.

(Interesting as an example of difficulty for the introduction.)

...

The scope of a keyword can be determined on the basis of syntax. The scope of verbs, auxiliaries, adjectives and adverbs usually extends to the right of the keyword. In the case of verbal elements, i.e. verbs and auxiliaries, it ends at the end of the clause (if the verbal element is within a relative clause or a coordinated clause) or the sentence, hence all complements and adjuncts are included, in accordance with the principle of maximal scope size. Take the following examples:

(Baseline?)

...

The scope of attributive adjectives generally extends to the following noun phrase, whereas the scope of predicative adjectives includes the whole sentence. For example, in the following two statements:
This is a 3 month old patient who had (<possible> pyelonephritis) with elevated fever.
(Atelectasis in the right mid zone is, however, <possible>).

...

Complex keywords such as either ... or have one scope:
Mild perihilar bronchial wall thickening may represent (<either> viral infection <or> reactive airways disease).

(Oh! I am not taking this into account in my analysis.)

...

The main exception that changes the original scope of the keyword is the passive voice. The subject of the passive sentence was originally the object of the verb, that is, it should be within its scope. This is why the subject must also be marked within the scope of the verb or auxiliary. For instance,
(A small amount of adenopathy <cannot be> completely <excluded>).

...

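The "Baseline?" note above could be made concrete with a very small sketch. The Python below is only an assumption about what such a baseline might look like, not the method of the BioScope authors or of this thesis: it takes the quoted rule that the scope of a verbal cue runs from the cue to the end of the sentence, and ignores clause boundaries, attributive adjectives and the passive exception. The names `tokenize`, `baseline_scope` and `bracket` are hypothetical helpers, and the cue position is assumed to be given.

```python
import re

# Sentence-final punctuation is left outside the scope, as in the quoted examples.
SENTENCE_FINAL = {".", "?", "!"}


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation marks."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


def baseline_scope(tokens, cue_index):
    """Return (start, end) token indices, both inclusive.

    Baseline assumption: the scope starts at the cue and extends to the
    right up to the last token before the sentence-final punctuation.
    """
    if not 0 <= cue_index < len(tokens):
        raise IndexError("cue_index is outside the sentence")
    end = len(tokens) - 1
    while end > cue_index and tokens[end] in SENTENCE_FINAL:
        end -= 1
    return cue_index, end


def bracket(tokens, cue_index):
    """Render the predicted scope with (...) and the cue with <...>."""
    start, end = baseline_scope(tokens, cue_index)
    pieces = []
    for i, tok in enumerate(tokens):
        piece = f"<{tok}>" if i == cue_index else tok
        if i == start:
            piece = "(" + piece
        if i == end:
            piece = piece + ")"
        pieces.append(piece)
    return " ".join(pieces)


tokens = tokenize("This result suggests that the valency of Bi "
                  "in the material is smaller than + 3.")
print(bracket(tokens, tokens.index("suggests")))
```

On the "suggests" example the predicted scope matches the quoted annotation. On the passive example ("A small amount of adenopathy cannot be completely excluded") the same heuristic would start the scope at the cue and leave the subject out, which is exactly the kind of case where syntactic information or expert rules would have to correct the baseline.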
Another example of scope change is the case of raising verbs (seem, appear, be expected, be likely etc.). These can have two different syntactic patterns, as the following examples suggest:
It seems that the treatment is successful.
The treatment seems to be successful.
In the first case, the scope of seems starts right with the verb. If this was the case in the second pattern, the treatment would not be included in the scope, but it should be like that shown in the first pattern. Hence in the second sentence, the scope must be extended to the subject as well:
It (<seems> that the treatment is successful).
(The treatment <seems> to be successful).

...

However, we have come across several cases where the presence of a speculative/negative keyword does not imply a hedge/negation. That is, some of the cue candidates do not denote speculation or negation in all their occurrences. In other words, they are ambiguous.

...

1. This part contains 1954 documents, each having a clinical history and an impression part, the latter being denser in negated and speculative parts.
2. Another part of the corpus consists of full scientific articles. 5 articles from FlyBase (the same data were used by Medlock and Briscoe [8] for evaluating sentence-level hedge classifiers) and 4 articles from the open access BMC Bioinformatics website were downloaded and annotated for negations, uncertainty and their scopes.
3. We therefore decided to include quite a lot of texts from the abstracts of scientific papers. This is why we included the abstracts of the Genia corpus [12].
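The bracketed examples quoted in these notes mark scopes with parentheses and cues with angle brackets. To experiment with them, a small reader for that inline notation may be handy. The sketch below is written in Python and rests on a loud assumption: every parenthesis and angle bracket in the input is annotation markup, so sentences with literal parentheses would need escaping first. The function name `parse_annotated` is hypothetical, and this says nothing about the format in which the corpus itself is distributed.

```python
def parse_annotated(text):
    """Parse 'It (<seems> that ...).' into (plain sentence, cues, scopes).

    Assumption: '(' and ')' always delimit a scope and '<' and '>' always
    delimit a cue; literal brackets in the sentence are not supported.
    """
    plain = []         # characters of the sentence with the markup removed
    cues = []          # cue strings, in order of appearance
    scopes = []        # scope strings, in order of their closing bracket
    open_scopes = []   # stack of offsets into `plain` where open scopes start
    cue_start = None   # offset into `plain` where the current cue starts

    for ch in text:
        if ch == "(":
            open_scopes.append(len(plain))
        elif ch == ")":
            if not open_scopes:
                raise ValueError("closing ')' without an open scope")
            start = open_scopes.pop()
            scopes.append("".join(plain[start:]))
        elif ch == "<":
            cue_start = len(plain)
        elif ch == ">":
            if cue_start is None:
                raise ValueError("closing '>' without an open cue")
            cues.append("".join(plain[cue_start:]))
            cue_start = None
        else:
            plain.append(ch)

    if open_scopes or cue_start is not None:
        raise ValueError("unbalanced annotation markup")
    return "".join(plain), cues, scopes


sentence, cues, scopes = parse_annotated(
    "(A small amount of adenopathy <cannot be> completely <excluded>).")
print(sentence)
print(cues)
print(scopes)
```

Because scopes are recorded when their closing bracket is read, nested scopes come out innermost first. The quoted principle that a cue is always included within its scope can then be checked with something like `all(any(c in s for s in scopes) for c in cues)`.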
Doubt and Certainty in ESL Textbooks (holmes1988)

Learning to express doubt and certainty in English is a complex task, but an important one since epistemic devices also function pragmatically as politeness markers.

...

Modal expressions are complex because they express such a wide range of meanings and because the linguistic devices used do not relate to particular meanings in a convenient one-to-one relationship. There are many ways of expressing doubt and certainty in English including modal verbs [...] adjectives [...] tag questions and fall-rise intonation, as well as a wide selection of non-linguistic and paralinguistic devices such as facial expression, hesitations, and stutters.

...

Discussions of politeness in the sociolinguistic literature refer to epistemic devices used in this way as hedges.

...

linguistic devices which can be used both as means of expressing the extent of the speaker's confidence about the validity of a proposition (i.e. to express epistemic modality), and as pragmatic devices modifying the illocutionary force of utterances for interpersonal reasons (i.e. politeness strategies). For simplicity I will henceforth refer to such forms, whatever their primary function, as epistemic devices.

...

Given, for example, that epistemic modality may be expressed prosodically, syntactically, lexically, and through discourse markers, it is undoubtedly true that lexical devices are the easiest of these to acquire.

...

The former section focuses on epistemic modal verbs, adjectives, and adverbials, while the latter also provides examples of lexical verbs and nouns used to express the speaker's doubt or certainty.

...

In order to evaluate the books, information on the range and frequency of lexical expressions of doubt and certainty in native speakers' usage was collected from four different corpuses. [...] This was supplemented by lexical items identified in the research literature, and by native speakers' contributions.
Given the nature of language and language use, an inventory of such items can never be complete, of course, but this approach identified well over 350 relevant lexical items.

...

In this paper I have combined information on the range and frequency of lexical devices used to express doubt and certainty in order to evaluate the treatment of epistemic modality in four ESL textbooks. [...] Many, for instance, devote an unjustifiably large amount of attention to modal verbs, neglecting alternative linguistic strategies for expressing doubt and certainty. Native speakers do not confine themselves to modal verbs, and it is important therefore that textbooks present learners with alternative syntactic and lexical devices selected from those occurring most frequently in relevant spoken and written texts.