Research meeting on October 18

On October 18, the RELISCO network will organise a research meeting for all members. Our guest lecturer, Serge Verlinde (University of Leuven), will give a talk:

Title: "Lexical error correction: yes, but how?"
Spelling and grammar checkers, whether integrated in software or available online, are well known. Language learners, however, make many lexical errors. Is it also possible to correct these errors automatically? Which approach should we choose?

Lecture on October 17

On October 17, the RELISCO network will organise a lecture at the Facultade de Filoloxía (Salón de Graos) of the Universidade da Coruña, at 1 pm.

Serge Verlinde (University of Leuven)

Title: "Interactive Language Toolbox: from old-fashioned dictionaries to state-of-the-art writing assistants"
Abstract: The internet hosts many websites that provide interesting information on words. Unfortunately, many of these resources remain unknown or underused. On the Interactive Language Toolbox website, we provide user-friendly access to a large number of sites for Dutch, English and French, with specific tools for translation and reviewing (spelling, grammar and lexicon).

Lecture on October 9

On October 9, the RELISCO network will organise a lecture in the Aula de Grados of the Facultad de Informática of the Universidade da Coruña, at 6 pm.

Éric Villemonte de la Clergerie (INRIA)

Title: "Designing and improving FRMG, a wide coverage French meta-grammar"
I will present how the notion of metagrammar has been used to develop a large Tree-Adjoining Grammar (TAG) for French and focus on the description of some syntactic phenomena. The parser derived from the grammar has been tried on larger and larger corpora, and the second part of the talk will survey the long-term effort that is needed to improve coverage, efficiency and accuracy. In particular, I will focus on recent experiments done to significantly improve the accuracy using machine learning techniques and existing syntactic annotations.

University-Enterprises Workshop on July 9

On July 9, the RELISCO network will organise a workshop with companies in the Salón de Grados of the Facultad de Filología in Santiago de Compostela. The presentations will deal with Opinion Mining. We will shortly post a program listing the participating companies and the content of the talks.

Seminar on June 15

On June 15, the RELISCO network will hold a seminar at the Facultad de Informática of the University of A Coruña (room 2.1a). We will shortly post a detailed program with all the participants and the corresponding schedule. The abstracts of the talks are given below:

Xavier Carreras (UPF)
Authors: Xavier Carreras, Michael Collins and Terry Koo
Title: "A TAG formalism for Parsing and Translation"

Syntactic parsing is the fundamental problem of determining the structure of natural language sentences. It is a challenging task, because syntactic structures of natural languages are recursive, and there is a significant degree of ambiguity in determining how different parts of a sentence combine together syntactically. In any computational model for parsing, the choice of grammar formalism is critical to both the representational power of the model and its computational efficiency. In this talk I will describe a variant of a Tree Adjoining Grammar (TAG) that can use a wide variety of rich features and, at the same time, has efficient algorithms. I will present two applications of our TAG. The first is a discriminative parser, a generalization of Conditional Random Fields for structured prediction that extends the framework to syntactic parsing. The second application is machine translation, where we frame the problem as a parsing task. The TAG-based translation system makes direct use of syntactic structures in modeling differences in word order between different languages, and in modeling the grammaticality of translation output. In both applications we show improvements over state-of-the-art systems.

André Martins (Carnegie Mellon University)
Authors: André Martins, Noah Smith, Mário Figueiredo, Eric Xing and Pedro Aguiar
Title: "Turbo Parsing and Constrained Inference with AD^3"

In the first part of this talk, I will present AD^3 (Alternating Direction Dual Decomposition), a new decoding algorithm for approximate LP-MAP inference in constrained factor graphs. The LP-MAP approximation consists in ignoring global effects caused by the cycles of the graph, and can be seen as a linear relaxation of the original problem. The proposed algorithm can handle arbitrary first-order logic constraints and is suitable for massive decompositions, unlike previously proposed dual decomposition algorithms. As an intermediate step, it requires solving small quadratic programs, for which I provide closed-form solutions or efficient procedures.
In the second part of the talk, I will apply this methodology to dependency syntax with rich-feature models. I will start by formulating dependency parsing as a concise integer linear program, which is relaxed for tractability. A constrained factor graph is then constructed for this problem and the relaxation is shown to be equivalent to LP-MAP inference in such a graph. The resulting framework is called "turbo parsing," and includes as particular cases other parsers proposed in the literature. Finally, I will apply AD^3 to solve the relaxation. Experiments on 14 languages yield state-of-the-art results.

Carlos Gómez Rodríguez (Universidade da Coruña)
Authors: Carlos Gómez Rodríguez and Daniel Fernández-González
Title: "Undirected Parsing and Buffer Transitions: Two Approaches to Improve Transition-Based Dependency Parsers"

A dependency parser is a system that can be used to automatically obtain the structure of natural language sentences, as expressed by directed links (dependencies) between words. One of the most widely used types of dependency parser is the transition-based parser, which achieves this by using a non-deterministic state machine and a model that scores transitions between its states. In this talk, I will present two different approaches to modifying existing transition-based dependency parsers in order to improve their accuracy.
In the first approach, we transform the dependency parsers into variants which build an undirected graph rather than a (directed) dependency structure. The undirected graph is then converted into a directed dependency tree in a post-processing step. This technique alleviates error propagation, as undirected parsers do not need to observe the single-head constraint.
The second approach consists of enriching the parsers with simple transitions that act on buffer nodes. We define two sets of such transitions: projective buffer transitions, which create left or right links of length one between the first two buffer nodes; and non-projective buffer transitions, which create links involving the second buffer node and the topmost stack node, allowing a limited form of non-projectivity.
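
For readers unfamiliar with transition-based parsing, the following is a minimal sketch of a stack-and-buffer state machine. It assumes the well-known arc-standard transition system, which is not necessarily the one used in the talk, and it takes the transition sequence as given instead of scoring transitions with a trained model. The function name `parse` and the example sentence are illustrative assumptions.

```python
def parse(words, transitions):
    """Apply arc-standard transitions; return {dependent index: head index}."""
    stack = [0]                              # index 0 is an artificial ROOT node
    buffer = list(range(1, len(words) + 1))  # word indices start at 1
    heads = {}
    for t in transitions:
        if t == "SHIFT":                     # move the first buffer node onto the stack
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":                # second-topmost node depends on the topmost
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        elif t == "RIGHT-ARC":               # topmost node depends on the second-topmost
            dep = stack.pop()
            heads[dep] = stack[-1]
        else:
            raise ValueError(f"unknown transition: {t}")
    return heads


# Hand-written transition sequence for a toy sentence (a real parser would
# choose each transition with a scoring model).
sentence = ["She", "reads", "books"]
sequence = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
print(parse(sentence, sequence))
```

Each word receives exactly one head, which is the single-head constraint mentioned in the abstract.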

Pablo Gamallo (Universidade de Santiago de Compostela)
Title: "A Depurative Strategy for Dependency Parsing with Finite State Transducers"


We describe a dependency parsing strategy based on finite state transducers, which minimizes the complexity of rules/transducers by using a technique we call /depurative/. Depurative parsing is driven by the "single-head" constraint of Dependency Grammar, and can be seen as an alternative method to the standard /constructive/ strategy. It simplifies the input string by progressively identifying and removing those words that were recognized as /dependents/ by each transducer. At the end of the depurative process, if all the dependencies in the sentence were identified, the input string should contain just one token representing the main head of the sentence. This finite-state strategy was inspired by the /Right/ and /Left Reduce/ operations used in deterministic dependency parsing.
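
The depurative idea can be illustrated with a small sketch. It is not the authors' implementation: plain Python pattern rules stand in for finite state transducers, and the rule set, tag names and example sentence are hypothetical. Each rule recognizes a dependent next to its head, records the dependency, and removes the dependent from the input.

```python
# Hypothetical cascade of rules: (left tag, right tag, side that is the dependent).
RULES = [
    ("ADJ", "NOUN", "left"),    # adjective depends on the following noun
    ("DET", "NOUN", "left"),    # determiner depends on the following noun
    ("VERB", "NOUN", "right"),  # object noun depends on the preceding verb
    ("NOUN", "VERB", "left"),   # subject noun depends on the following verb
]


def depurative_parse(tokens):
    """tokens: list of (word, tag). Returns (dependencies, remaining tokens)."""
    tokens = list(tokens)
    deps = []                                  # (dependent, head) pairs
    for left_tag, right_tag, dep_side in RULES:
        i = 0
        while i < len(tokens) - 1:
            left, right = tokens[i], tokens[i + 1]
            if left[1] == left_tag and right[1] == right_tag:
                if dep_side == "left":
                    deps.append((left[0], right[0]))
                    del tokens[i]              # remove the recognized dependent
                else:
                    deps.append((right[0], left[0]))
                    del tokens[i + 1]
                i = max(i - 1, 0)              # re-check the newly adjacent pair
            else:
                i += 1
    return deps, tokens


sentence = [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"),
            ("chased", "VERB"), ("a", "DET"), ("cat", "NOUN")]
deps, remaining = depurative_parse(sentence)
print(deps)
print(remaining)
```

When every dependency has been found, `remaining` holds a single token, the main head of the sentence.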

Seminar by Ignacio Bosque

On May 4, Professor Ignacio Bosque (UCM and RAE) will give a seminar at the Facultad de Filología y Traducción of the Universidad de Vigo on "La integridad léxica y los componentes de la gramática" (Lexical integrity and the components of grammar).

Seminar Program

The program of the seminar "Recuperación da Información e PLN" (Information Retrieval and NLP), which will be held on May 2 at the Facultad de Informática of the UDC, is now available.

Seminar on May 2

On May 2, the RELISCO network will hold a seminar at the Facultad de Informática of the University of A Coruña. We will shortly post a detailed program with all the participants and the corresponding schedule. See below the abstracts of the talks that will be given by two of our visitors:

Gaël Dias (University of Caen Basse-Normandie)
Title: "Information Digestion"
The World Wide Web (WWW) is a huge information network within which searching for relevant quality contents remains an open question. The ambiguity of natural language is traditionally the main reason that prevents search engines from retrieving information according to users' needs. However, globalized access to the WWW via weblogs or social networks has highlighted new problems. Web documents tend to be subjective, they mainly refer to current events to the detriment of past events, and their ever-growing number contributes to the well-known problem of Information Overload. In this presentation, we present our contributions to digesting information in real-world heterogeneous text environments (i.e. the Web), thus easing users' efforts to find relevant quality information. Within this context, we will specifically focus on presenting language-independent methodologies to extract implicit and explicit knowledge from real-world texts, thus making Multilingual Information Digestion possible.

Luis Pérez Freire (Gradiant: Centro Tecnolóxico de Telecomunicacións de Galicia)
Authors: Daniel González Jiménez, Luis Pérez Freire
Title: "Content-based multimedia information retrieval: current research challenges in high-level understanding"
Multimedia information retrieval is about the extraction of knowledge from all kinds of multimedia contents. Content-based multimedia information retrieval (CBMIR) is the field that addresses techniques for knowledge extraction from multimedia contents when tags or text annotations are not available. Even when text descriptions are available, CBMIR can increase accuracy and provide a deeper level of understanding. This talk will provide an overview of CBMIR techniques for audiovisual contents and the hottest research topics related to high-level understanding, mainly focusing on the analysis of human signals for inferring affective states, identity, demographics, and actions.

Milagros Fernández Gavilanes (Grupo COLE - Universidade de Vigo)
Title: "Un modelo de recuperación semántica conceptual" (A conceptual semantic retrieval model)
We introduce a framework for acquiring and representing information using natural language processing techniques, which allows linguistic knowledge to be integrated into information retrieval applications on the basis of a well-defined mathematical model. The practical goal is to ease the maintenance of the resulting application, as well as its traceability, genericity, accessibility to all kinds of users, and predictable behaviour. The mathematical interpretation of the semantics rests on the notion of conceptual graph, which serves as the basis for indexing and subsequently locating texts by means of an approximate pattern-matching mechanism based on graph projection and generalization.

Adrián Blanco González (Grupo COLE - Universidade de Vigo)
Title: "Evaluación del modelo de RI conceptual" (Evaluation of the conceptual IR model)
The conceptual information retrieval model has usually been justified by the advantage it offers in this type of application: the ease of integrating linguistic knowledge on the basis of a well-defined mathematical model, together with its traceability, genericity, accessibility to all kinds of users, and predictable behaviour. However, it has also often been argued that expectations were overestimated and that in practice performance fell short of them, so that the effort required for implementation did not pay off. In this work, we try to dispel this doubt by evaluating the characteristics of the conceptual model in detail, in order to show that its operational capabilities are superior to those of the classical models in use or, in the worst case, comparable.

Jesús Vilares Ferro (Grupo LYS - Universidade da Coruña)
Title: "Subword-Level Pseudo-Translation for CLIR Using Parallel Corpora"
Cross-Language Information Retrieval (CLIR) is a particular case of IR where queries and documents are in different languages, thus requiring the use of Machine Translation (MT) techniques to make matching possible. Word and phrase-level translation approaches have been commonly used in this context. However, translation at the character n-gram level (or pseudo-translation, properly speaking) appears as an alternative for retrieval purposes. This is a knowledge-light approach which avoids the need for word normalization during indexing or translation, and which also copes with out-of-vocabulary words. Moreover, since such a solution does not rely on language-specific processing, needing only a parallel corpus as input, it can be used with languages of very different natures even when linguistic information and resources are scarce or unavailable.
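
As a small illustration of the subword units involved, the sketch below splits text into overlapping character n-grams. It only shows the segmentation step, not the alignment of n-grams over a parallel corpus described in the abstract. The function name `char_ngrams`, the choice of n = 4 and the example words are assumptions made for illustration.

```python
def char_ngrams(text, n=4):
    """Return the overlapping character n-grams of each word in text."""
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)   # short words are kept whole
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


english = char_ngrams("tomatoes")
spanish = char_ngrams("tomates")
print(english)
print(spanish)
print(sorted(set(english) & set(spanish)))  # n-grams shared by both words
```

Because the units are plain character sequences, no stemming or lemmatization is needed, and related word forms still share part of their n-grams.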

Daniel Fernández González (Grupo COLE - Universidade de Vigo)
Title: "Análisis de dependencias basado en transiciones" (Transition-based dependency parsing)
One of the syntactic representations that has attracted most interest within the natural language processing community in recent years is dependency analysis. Motivated by this, different dependency parsers have emerged, among which we highlight transition-based parsers. To improve the accuracy of the analyses produced by the latter, two different approaches have been pursued: on the one hand, developing undirected transition-based parsers and, on the other, extending the parsers' set of transitions.

RELISCO, bronze sponsor of EACL 2012

posted 26 Mar 2012, 10:31 by Miguel A. Alonso

EACL 2012 is the 13th Conference of the European Chapter of the Association for Computational Linguistics. Papers accepted at the conference after a thorough blind peer review process will present substantial, original, and unpublished research on all areas of computational linguistics, broadly conceived to include disciplines such as psycholinguistics, speech, information retrieval and multimodal language processing. The conference welcomes theoretical, empirical, and application-orientated papers as well as papers targeting emerging domains such as bioinformatics and social media.

EACL meetings are non-profit international conferences. EACL is proud to keep its registration fees very low and affordable to all researchers. This makes EACL meetings one of the best opportunities in Europe for discussing recent, original and promising ideas in computational linguistics. This would not be possible without sponsorship.

LATA 2012 Conference

posted 3 Mar 2012, 07:58 by Miguel A. Alonso

The 6th International Conference on Language and Automata Theory and Applications (LATA 2012) will be held at the Faculty of Informatics, A Coruña, from 5 to 9 March 2012. The LATA conferences are an annual scientific event in which researchers present scientific papers in the field of theoretical computer science and its applications. The proceedings are published in the Lecture Notes in Computer Science series of Springer.

This year's organizers are the Research Group on Language in the Information Society (LYS) of the University of A Coruña and the Research Group on Mathematical Linguistics (GRLMC) of the Universitat Rovira i Virgili of Tarragona.

This edition, attended by scientists from 20 different countries, features 41 papers, plus three invited lectures and two tutorials given by Eugene Asarin (Université Paris Diderot and CNRS, France), Bernard Boigelot (Université de Liège, Belgium), Jack H. Lutz (Iowa State University, USA), Gilles Dowek (INRIA, France) and Rod Downey (Victoria University, New Zealand).

