Word sense disambiguation

Word sense disambiguation is the process of identifying which sense of a word is being used in a given sentence, when the word has more than one sense. However, the task has some problems.

The first problem is that the different meanings of a word are sometimes very close to each other, so it is difficult to know which one is being used. Another problem is that these systems are evaluated by humans, and humans do not always agree on what the sense of each word is, so it is hard to say what the right answer is for the computer.

There are two different approaches: deep and shallow.

Deep approaches rely on an explanation of each sense of the word, but this kind of knowledge is not available in a computer-readable format. Shallow approaches, however, analyse the surrounding words and decide which of the different meanings is being used, but this is a problem if the surrounding words also have more than one sense.
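The shallow approach can be illustrated with a small sketch. This is only a simplified overlap method under my own assumptions: the sense inventory for "bank", the glosses, the stopword list and the function names are all invented for the example and do not come from any real system.

```python
# Hypothetical mini sense inventory: each sense of "bank" has a short gloss.
SENSES = {
    "bank": {
        "financial": "an institution that keeps money and gives loans to customers",
        "river": "the land along the side of a river or lake",
    }
}

# Small hand-made stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "that", "in", "on", "at", "is"}


def content_words(text):
    """Lower-case the text, strip punctuation and drop stopwords."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - STOPWORDS


def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most words with the context."""
    context = content_words(sentence) - {word}
    scores = {
        sense: len(context & content_words(gloss))
        for sense, gloss in SENSES[word].items()
    }
    best = max(scores, key=scores.get)
    return best, scores


best, scores = disambiguate("bank", "She sat on the bank of the river and watched the water")
print(best, scores)
```

The sketch takes the surrounding words at face value, so it shows the weakness mentioned above: if the context words are ambiguous themselves, the overlap count can point to the wrong sense.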


* Retrieved from Wikipedia, the free encyclopedia, on Sep. 06, 11:29

Categorisation is the process of recognising, differentiating and understanding ideas. In this process, objects that are related in the same way are put into categories or groups. There are many categorisation techniques, but the most general ones are:
* classical categorisation
* conceptual clustering
* prototype theory

Classical categorisation:
This type of categorisation started with Plato, who grouped objects based on their similar properties. The method was later used by Aristotle, who applied it to separate living beings into groups. In this type of categorisation, the groups or categories have to be defined and each object has to be in one of the groups; no object can be without a category.

Conceptual clustering:
In this type of categorisation, we first describe the objects and then classify them according to their descriptions. The difference from the classical one is that here each category has its own description, and objects can belong to more than one category.

Prototype theory:

In prototype theory, some members of a category are more central than others: when asked for a piece of furniture, people are more likely to say chair than stool, and when asked for a bird, they are more likely to say eagle than penguin. This is because we have models for each category.

Prototype theory also includes basic level categorisation, that is, saying chair instead of kitchen chair or furniture.
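The idea that some members are more central than others can be sketched as a similarity score against a model of the category. The feature sets below are invented for illustration, and using Jaccard similarity as the measure of typicality is my own assumption, not something prototype theory prescribes.

```python
# Hypothetical model (prototype) of the category "bird".
BIRD_PROTOTYPE = {"flies", "has_feathers", "lays_eggs", "has_wings", "sings"}

# Hypothetical feature sets for two members of the category.
MEMBERS = {
    "eagle": {"flies", "has_feathers", "lays_eggs", "has_wings", "hunts"},
    "penguin": {"swims", "has_feathers", "lays_eggs", "has_wings"},
}


def typicality(features, prototype):
    """Jaccard similarity: shared features divided by all features."""
    return len(features & prototype) / len(features | prototype)


# Rank the members from most central to least central.
ranking = sorted(
    MEMBERS,
    key=lambda name: typicality(MEMBERS[name], BIRD_PROTOTYPE),
    reverse=True,
)
print(ranking)
```

With these made-up features the eagle shares more with the model than the penguin does, so it comes out as the more central member, although both still belong to the category.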

* Wikipedia, the free encyclopedia, article about categorisation, retrieved on Sep. 05, 12:30
* Wikipedia, the free encyclopedia, article about prototype theory, retrieved on Sep. 05, 12:50

Answer extraction, or Question Answering (QA), is a type of information retrieval. Given a collection of documents, the system should be able to answer questions written in natural language. QA needs more complex natural language processing technology than other types of document retrieval.

Question answering systems are among the most complicated systems in information retrieval, because they have to find a fragment of text that answers a question asked in natural language. These systems have to recognise question types such as who, how, why, ...

A good QA system needs a good search engine that selects the documents that contain the answer. If we are searching the web, where there are lots of documents, it is common to find parts of the answer in different documents, but this has its benefits, because we can choose the answers that appear most often.

There are two different methods: deep and shallow.

Shallow: Some methods use keyword techniques to find passages and sentences in documents and then filter them based on the presence of the desired answer. The ranking is then based on syntactic characteristics like word order.

Deep: Sometimes keyword searching is not enough, and we need a system that includes named-entity recognition, word sense disambiguation and so on. If the question asked is a why or how question, we will also need this kind of system.
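The shallow method can be sketched in a few lines. This is only a rough illustration under my own assumptions: the stopword list, the rule that a "when" question needs a four-digit year in the passage, the example passages and the function names are all hypothetical, and the score is a plain keyword overlap rather than a ranking based on word order.

```python
import re

# Hand-made stopword list (an assumption for this example).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "is", "was", "when", "who", "how", "why"}

# Hypothetical answer filter: a "when" question expects a year in the passage.
ANSWER_PATTERNS = {"when": re.compile(r"\b\d{4}\b")}


def keywords(text):
    """Return the lower-cased words of the text without stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def rank_passages(question, passages):
    """Filter passages by expected answer, then rank by keyword overlap."""
    q_words = keywords(question)
    words = question.lower().split()
    pattern = ANSWER_PATTERNS.get(words[0]) if words else None
    scored = []
    for passage in passages:
        # Drop passages that cannot contain the desired kind of answer.
        if pattern is not None and not pattern.search(passage):
            continue
        overlap = len(q_words & keywords(passage))
        if overlap > 0:
            scored.append((overlap, passage))
    return sorted(scored, key=lambda item: item[0], reverse=True)


passages = [
    "The research centre was founded in 1988.",
    "The centre works on speech technology.",
    "The magazine is sold in Berlin.",
]
print(rank_passages("When was the research centre founded?", passages))
```

A why or how question would get nothing useful from a filter like this, which is where the deep methods come in.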


* Retrieved from Wikipedia, the free encyclopedia, on Sep. 05, 10:51

Topics list (Q2)

In my opinion, these are the 10 most interesting topics to write about:

• answer extraction
• spell checking
• topic detection
• word sense disambiguation
• speaker recognition
• automatic hyperlinking
• categorisation
• summarisation
• natural language parsing
• morphological analysis


* Language Technology World’s page, retrieved on September 5th, 11:32

Machine translation, sometimes abbreviated as MT, is a sub-field of computational linguistics that investigates the use of computer programs to translate text from one natural language to another. At its most basic level, MT only substitutes the words of one natural language with the words of another. More complex translations can be made by using text corpora, which make it possible to recognise whole sentences and to translate idioms, for example. Machine translation can be rule-based, corpus-based or context-based.
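The basic level described above, plain word substitution, can be sketched as follows. The tiny Spanish to English dictionary, the phrase table and the function name are invented for this example; real systems are of course far larger and more careful.

```python
# Hypothetical word-level dictionary (Spanish to English).
WORDS = {
    "el": "the",
    "gato": "cat",
    "come": "eats",
    "pescado": "fish",
    "tomar": "take",
    "pelo": "hair",
}

# Hypothetical phrase table, as could be learned from a corpus.
PHRASES = {"tomar el pelo": "pull someone's leg"}


def translate(sentence, use_phrases=False):
    """Translate word by word; optionally replace known phrases first."""
    text = sentence.lower()
    if use_phrases:
        # Naive string replacement; it assumes the English output words
        # are not themselves keys of the word dictionary.
        for source, target in PHRASES.items():
            text = text.replace(source, target)
    # Unknown words are left unchanged.
    return " ".join(WORDS.get(word, word) for word in text.split())


print(translate("el gato come pescado"))
print(translate("tomar el pelo"))
print(translate("tomar el pelo", use_phrases=True))
```

The second call shows why word substitution alone is not enough for idioms, and the third shows how a phrase table taken from a corpus could help.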

Nowadays translation is very important, because of the large amount of information available and the need for cross-lingual communication.

Translation between Romance languages like Spanish and Portuguese is of very good quality, but this changes if the languages are typologically very different, like Spanish and English. Another factor that influences the quality of the translation is how specialised the text is. If a translator is specialised in meteorological texts, it will not be suitable for translating a sports text.

When translating a text, you have to take into account morphology, syntax and semantics, and also style and pragmatics. Nowadays the tendency is to integrate all the methodologies (linguistic, statistical and others) in the database of a corpus.

Online machine translators:
* OpenTrad
* Systran
* ProMT
* Lucy
* Google Translate
* Translated
* WorldLingo


* Wikipedia, the free encyclopedia, article "Traducción automática", retrieved on April 30, 2009, at 11:47

Hans Uszkoreit and Yorick Wilks (Q1)

There are many relevant researchers, such as Martin Kay, Yorick Wilks, Hans Uszkoreit, Fabian M. Suchanek and Silviu Cucerzan, but I’m going to talk about two of them: Hans Uszkoreit and Yorick Wilks.


Hans Uszkoreit is Professor of Computational Linguistics at Saarland University. At the same time he serves as Scientific Director at the German Research Centre for Artificial Intelligence (DFKI), where he heads the DFKI Language Technology Laboratory. He is also a co-opted Professor in the Computer Science Department.
Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin. He co-founded the Berlin city magazine Zitty, for which he worked as a part-time editor and writer. During his time in Austin he also worked as a research associate in a large machine translation project at the Linguistics Research Center. In 1984 Uszkoreit received his Ph.D. in linguistics from the University of Texas at Austin. From 1982 until 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, California. While working at SRI, he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In December 1986 he returned to Stuttgart to work for IBM Germany as a project leader in the project LILOG (Linguistic and Logical Methods for the Understanding of German Texts).
In 1988 Uszkoreit was appointed to a newly created chair of Computational Linguistics at Saarland University and started the Department of Computational Linguistics and Phonetics. In 1989 he became the head of the newly founded Language Technology Lab at DFKI. He has been a co-founder and principal investigator of the Special Collaborative Research Division (SFB 378) “Resource-Adaptive Cognitive Processes” of the DFG (German Science Foundation).
Uszkoreit is a Permanent Member of the International Committee of Computational Linguistics (ICCL), Member of the European Academy of Sciences, Past President of the European Association for Logic, Language and Information, Member of the Executive Board of the European Network of Language and Speech, Member of the Board of the European Language Resources Association (ELRA), and serves on several international editorial and advisory boards. He is co-founder and Board Member of XtraMind Technologies GmbH, Saarbruecken, acrolinx gmbh, Berlin and Yocoy Technologies GmbH, Berlin. Since 2006, he has served as Chairman of the Board of Directors of the international initiative dropping knowledge.


Yorick Wilks was born in 1939 and is a British computer scientist, Professor of Artificial Intelligence at the University of Sheffield, and a Senior Research Fellow at the Oxford Internet Institute.

Wilks was educated at Torquay Boys’ Grammar School, before attending Pembroke College, and he obtained his Ph.D. in 1968 under Professor R. B. Braithwaite for his thesis Argument and Proof. His main early contribution in the 1970s was called “Preference Semantics”. That early work was hand-coded with semantic entries as was normal at the time.
Yorick Wilks has been elected a fellow of the American and European Associations for Artificial Intelligence and of the British Computer Society, and a member of the UK Computing Research Committee. In 1991 he received a Defense Advanced Research Projects Agency grant on interlingual pragmatics-based machine translation, and in 1994 he received a grant from the Engineering and Physical Sciences Research Council to carry out research in the field of large-scale information extraction (LaSIE). In the 1990s Professor Wilks also became interested in modelling human-computer dialogue. He is currently the Director of the EU-funded Companions Project on creating long-term computer companions for people. He was awarded the Antonio Zampolli prize in honour of his lifetime work at the LREC’2008 conference on May 28, 2008, and the Lifetime Achievement Award at the ACL’2008 conference on June 18, 2008. In 2009, he was awarded the British Computer Society’s Lovelace Medal.

In 1998, Wilks became head of the Department of Computer Science of the University of Sheffield, where he had started working in 1993 as Professor of Artificial Intelligence, a post that he still holds. In 1993 he became the founding director of the Institute of Language, Speech and Hearing (ILASH). Yorick Wilks also heads the Natural Language Processing Group of the University of Sheffield.


* Yorick Wilks, from Wikipedia, retrieved on March 14, 13:02. http://en.wikipedia.org/wiki/Yorick_Wilks
* Hans Uszkoreit, from his own webpage, retrieved on March 14, 12:47. http://hans.uszkoreit.net/

I’m going to talk about three of the most important centres in Europe: the Edinburgh Language Technology Group, the German Research Centre for Artificial Intelligence, and the National Centre for Language Technology of Ireland.

Edinburgh Language Technology Group (LTG)

This research group has been working in Natural Language Engineering since 1990. At first it was part of the Human Communication Research Centre, but nowadays it belongs to the Institute for Communicating and Collaborative Systems of the University of Edinburgh, one of the largest communities in Europe. Their objective is to build practical solutions to real problems in text processing.

The German Research Centre for Artificial Intelligence (DFKI)

This research centre is one of the biggest institutes in the world working on software technology based on Artificial Intelligence (AI).

It was founded in 1988 and nowadays has sites in Kaiserslautern, Saarbrücken, Bremen and Berlin.
Microsoft, SAP, BMW and DaimlerChrysler are some of the companies that we can find among the DFKI shareholders. The DFKI works in all areas of Artificial Intelligence, including image and pattern recognition, knowledge management, intelligent visualization and simulation, deduction and multi-agent systems, speech and language technology, intelligent user interfaces, business informatics and robotics.

DFKI took part in the national project called Verbmobil, a project that tried to translate speech bidirectionally for German/English and German/Japanese.
Nowadays the research centre has more than 90 ongoing projects.


The National Centre for Language Technology (NCLT)

This centre works in the fields of machine translation, natural language parsing, grammar induction, question answering, sentiment analysis, computer-aided language learning, software localisation, speech recognition and speech synthesis. Its researchers come from the School of Computing, the School of Applied Languages and Intercultural Studies and the School of Electronic Engineering.
We have to mention that this centre is affiliated with the Centre for Next Generation Localisation.


* Edinburgh Language Technology Group (LTG), retrieved on February 28, 12:15. http://www.ltg.ed.ac.uk/

* German Research Centre for Artificial Intelligence, in Wikipedia, retrieved on February 28, 12:32. http://en.wikipedia.org/wiki/DFKI

* National Centre for Language Technology in Ireland, retrieved on February 28, 12:57. http://www.nclt.dcu.ie/

