Posts filed under 'hlt'

word sense disambiguation

Word sense disambiguation is the process of identifying which sense of a word is being used in a given sentence when the word has more than one sense. The task raises some problems, however.

The first problem is that the different senses of a word are sometimes very close, so it is difficult to know which one is being used. Another problem is that these systems are evaluated against human judgements, and humans do not always agree on the sense of each word, so there is no single right answer for the computer to find.

There are two different approaches, deep and shallow.

Deep approaches give an explanation of each sense of the word, but this is not feasible in a computer-readable format. Shallow approaches, by contrast, analyse the surrounding words and decide which of the different meanings is intended, but this becomes a problem if the surrounding words themselves have more than one sense.
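To make the shallow approach more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the sense inventory `SENSES`, the stopword list and the function names are hypothetical, and the scoring (counting words shared between the sentence and a short description of each sense) is just one simple way of using the surrounding words.

```python
import re

# Hypothetical mini sense inventory: each sense of "bank" has a short gloss.
SENSES = {
    "bank": {
        "financial": "an institution that accepts deposits and lends money",
        "river": "the sloping land beside a river or stream of water",
    }
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "of", "or", "that", "the", "to", "in", "on", "at", "by", "is"}


def tokens(text):
    """Lowercase the text and return its non-stopword words as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = tokens(sentence) - {word}
    scores = {
        sense: len(context & tokens(gloss))
        for sense, gloss in SENSES[word].items()
    }
    # On a tie, max() returns the first sense listed in the inventory.
    return max(scores, key=scores.get)


print(disambiguate("bank", "She deposits money at the bank"))
print(disambiguate("bank", "They fished from the bank of the river"))
```

The sketch also shows the weakness mentioned above: if the surrounding words are themselves ambiguous, or share no words with any description, the overlap count gives no useful signal.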

References:

* Wikipedia, the free encyclopedia, retrieved on September 6, 11:29

September 6, 2009

CATEGORISATION

Categorisation is the process of recognising, differentiating and understanding ideas. In this process, objects that are related in the same way are put into categories or groups. There are many categorisation techniques, but the most general ones are:
* classical categorisation
* conceptual clustering
* prototype theory

Classical categorisation:
This type of categorisation started with Plato, who grouped objects based on their similar properties. The method was later used by Aristotle, who applied it to separate living beings into groups. In this type of categorisation the groups or categories must be defined and each object has to belong to one of them; no object can be left without a category.

Conceptual clustering:
In this type of categorisation, we first describe the objects and then classify them according to their descriptions. The difference from the classical approach is that here each category has its own description. Objects can also belong to more than one category.

Prototype theory:

In prototype theory, some members of a category are more central than others: people are more likely to say chair than stool when asked for a piece of furniture, or eagle than penguin when asked for a bird. This is because we have a model, or prototype, for each category.

Prototype theory also includes basic-level categorisation, that is, saying chair rather than kitchen chair or furniture.

References:
* Wikipedia, the free encyclopedia, article about categorisation, retrieved on September 5, 12:30
* Wikipedia, the free encyclopedia, article about prototype theory, retrieved on September 5, 12:50

September 5, 2009

ANSWER EXTRACTION

Answer extraction, or question answering (QA), is a type of information retrieval. Given a collection of documents, the system should be able to answer questions written in natural language. QA requires more complex natural language processing technology than other types of document retrieval.

Question answering systems are among the most complex systems in information retrieval, because they have to find a fragment of text that answers a question posed in natural language. These systems have to recognise question types such as who, how, why, ...

A good QA system needs a good search engine to select the documents that contain the answer. When searching the web, where there are lots of documents, it is common to find parts of the answer in different documents, but this has its benefits, because we can choose the answers that appear most often.

There are two different methods, deep and shallow.
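The idea of choosing the answer that appears most often can be sketched in a few lines of Python. This is only a rough illustration: the function name `most_frequent_answer` and the example candidates are my own assumptions, and the candidates are assumed to have already been extracted from different documents.

```python
from collections import Counter


def most_frequent_answer(candidates):
    """Return the candidate answer that appears most often (case-insensitive)."""
    counts = Counter(c.strip().lower() for c in candidates)
    answer, _count = counts.most_common(1)[0]
    return answer


# Hypothetical candidate answers found in three different documents.
print(most_frequent_answer(["Paris", "paris ", "Lyon"]))
```

A real system would need a better way of deciding that two strings are the same answer; here they are only lowercased and stripped of surrounding spaces.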

Shallow: Some methods use keyword-based techniques to find passages and sentences in documents and then filter them based on the presence of the desired answer. The ranking is then based on syntactic characteristics such as word order.

Deep: Sometimes keyword searching is not enough, and we need systems that include named-entity recognition, word sense disambiguation, ... Why and how questions also need this kind of system.
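As a rough illustration of the shallow method, the Python sketch below ranks passages by how many keywords they share with the question. The names `keywords` and `rank_passages`, the stopword list and the example passages are all assumptions of mine. The sketch covers only the keyword step, not the ranking by word order or the deep techniques.

```python
import re

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "is", "are", "in", "of", "who", "what", "where", "when", "how", "why"}


def keywords(text):
    """Lowercase the text and return its non-stopword words as a set."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def rank_passages(question, passages):
    """Sort passages by the number of question keywords they contain, best first."""
    question_keywords = keywords(question)
    return sorted(
        passages,
        key=lambda passage: len(question_keywords & keywords(passage)),
        reverse=True,
    )


# Hypothetical passages retrieved by a search engine.
passages = ["Bananas are yellow.", "The Eiffel Tower is in Paris."]
print(rank_passages("Where is the Eiffel Tower?", passages)[0])
```

Keyword overlap alone cannot tell a passage that answers the question from one that merely repeats its words, which is why the deep methods are needed for harder questions.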

References:

* Wikipedia, the free encyclopedia, retrieved on September 5, 10:51

September 5, 2009

topics list (Q2)

In my opinion, these are the 10 topics that would be most interesting to write about:

• answer extraction
• spell checking
• topic detection
• word sense disambiguation
• speaker recognition
• automatic hyperlinking
• categorisation
• summarisation
• natural language parsing
• morphological analysis

References:

* Language Technology World’s page, retrieved on September 5, 11:32
http://www.lt-world.org/

September 5, 2009

MACHINE TRANSLATION (Q3)

Machine translation, sometimes abbreviated as MT, is a sub-field of computational linguistics that investigates the use of computer programs to translate text from one natural language to another. At its most basic level, MT simply substitutes the words of one natural language with the words of another. More complex translations can be produced using text corpora, which make it possible to recognise whole sentences and to translate idioms, for example. Machine translation can be rule-based, corpus-based or context-based.

Nowadays translation is very important because of the large quantity of information and the need for cross-lingual communication.

Translation between Romance languages such as Spanish and Portuguese is of very good quality, but this changes if the languages are typologically very different, like Spanish and English. Another factor that influences the quality of the translation is how specialised the text is. If a translator is specialised in meteorological texts, it will not be suitable for translating a sports text.

When translating a text, you have to take into account its morphology, syntax and semantics, as well as style and pragmatics. Nowadays the tendency is to integrate all the methodologies (linguistic, statistical and others) in the database of a corpus.
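The basic level of MT, plain word substitution, can be sketched in Python as follows. The tiny Spanish to English dictionary `LEXICON` and the function name `translate_word_by_word` are hypothetical, chosen only for illustration.

```python
# Hypothetical toy Spanish-to-English dictionary (an assumption for this sketch).
LEXICON = {"el": "the", "gato": "cat", "come": "eats", "pescado": "fish"}


def translate_word_by_word(sentence, lexicon=LEXICON):
    """Replace each word with its dictionary entry; unknown words stay unchanged."""
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())


print(translate_word_by_word("El gato come pescado"))
print(translate_word_by_word("El gato negro"))
```

In the second call the unknown word is left untranslated and the Spanish word order is kept, which shows why word substitution alone is not enough and why corpora are used to handle whole sentences and idioms.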

On-line machine translators
o OpenTrad
o Systran
o ProMT
o Lucy
o Google Translate
o Translated
o WorldLingo

References

* Wikipedia, the free encyclopedia, article “Traducción automática”, retrieved on April 30, 2009 at 11:47

April 30, 2009

Hans Uszkoreit and Yorick Wilks (Q1)

There are many relevant researchers, such as Martin Kay, Yorick Wilks, Hans Uszkoreit, Fabian M. Suchanek and Silviu Cucerzan, but I’m going to talk about two of them: Hans Uszkoreit and Yorick Wilks.

HANS USZKOREIT

Hans Uszkoreit is Professor of Computational Linguistics at Saarland University. At the same time he serves as Scientific Director at the German Research Centre for Artificial Intelligence (DFKI) where he heads the DFKI Language Technology Laboratory. By cooptation he is also Professor of the Computer Science Department.
Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin. He co-founded the Berlin city magazine Zitty, for which he worked as a part-time editor and writer. During his time in Austin he also worked as a research associate in a large machine translation project at the Linguistics Research Center. In 1984 Uszkoreit received his Ph.D. in linguistics from the University of Texas. From 1982 until 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, California. While working at SRI, he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In December 1986 he returned to Stuttgart to work for IBM Germany as a project leader in the project LILOG (Linguistic and Logical Methods for the Understanding of German Texts).
In 1988 Uszkoreit was appointed to a newly created chair of Computational Linguistics at Saarland University and started the Department of Computational Linguistics and Phonetics. In 1989 he became the head of the newly founded Language Technology Lab at DFKI. He has been a co-founder and principal investigator of the Special Collaborative Research Division (SFB 378) “Resource-Adaptive Cognitive Processes” of the DFG (German Science Foundation).
Uszkoreit is a Permanent Member of the International Committee of Computational Linguistics (ICCL), Member of the European Academy of Sciences, Past President of the European Association for Logic, Language and Information, Member of the Executive Board of the European Network of Language and Speech, Member of the Board of the European Language Resources Association (ELRA), and serves on several international editorial and advisory boards. He is co-founder and Board Member of XtraMind Technologies GmbH, Saarbruecken, acrolinx gmbh, Berlin and Yocoy Technologies GmbH, Berlin. Since 2006, he has served as Chairman of the Board of Directors of the international initiative dropping knowledge.

YORICK WILKS

Yorick Wilks was born in 1939 and is a British computer scientist, Professor of Artificial Intelligence at the University of Sheffield, and a Senior Research Fellow at the Oxford Internet Institute.

Wilks was educated at Torquay Boys’ Grammar School, before attending Pembroke College, and he obtained his Ph.D. in 1968 under Professor R. B. Braithwaite for his thesis Argument and Proof. His main early contribution in the 1970s was called “Preference Semantics”. That early work was hand-coded with semantic entries as was normal at the time.
Yorick Wilks has been elected a fellow of the American and European Associations for Artificial Intelligence and of the British Computer Society, and a member of the UK Computing Research Committee. In 1991 he received a Defense Advanced Research Projects Agency grant on interlingual pragmatics-based machine translation, and in 1994 he received a grant from the Engineering and Physical Sciences Research Council to carry out research in the field of large-scale information extraction (LaSIE). In the 1990s Professor Wilks also became interested in modelling human-computer dialogue. He is currently the Director of the EU-funded Companions Project on creating long-term computer companions for people. He was awarded the Antonio Zampolli prize in honor of his lifetime work at the LREC’2008 conference on May 28, 2008, and the Lifetime Achievement Award at the ACL’2008 conference on June 18, 2008. In 2009, he was awarded the British Computer Society’s Lovelace Medal.

In 1998, Wilks became head of the Department of Computer Science at the University of Sheffield, where he had started working in 1993 as Professor of Artificial Intelligence, a post that he still holds. In 1993 he became the founding director of the Institute of Language, Speech and Hearing (ILASH). Yorick Wilks also heads the Natural Language Processing Group at the University of Sheffield.

REFERENCES

* Yorick Wilks, from Wikipedia, retrieved on March 14, 13:02 http://en.wikipedia.org/wiki/Yorick_Wilks
* Hans Uszkoreit, from his own webpage, retrieved on March 14, 12:47 http://hans.uszkoreit.net/

March 14, 2009

RESEARCH CENTRES of HUMAN LANGUAGE TECHNOLOGIES in EUROPE (questionnaire 1)

I’m going to talk about three of the most important centres in Europe: the Edinburgh Language Technology Group, the German Research Centre for Artificial Intelligence, and the National Centre for Language Technology in Ireland.

Edinburgh Language Technology Group (LTG)

This research group has been working in natural language engineering since 1990. It was originally part of the Human Communication Research Centre, but nowadays it belongs to the Institute for Communicating and Collaborative Systems at the University of Edinburgh, one of the largest communities in Europe. Its objective is to build practical solutions to real problems in text processing.

The German Research Centre for Artificial Intelligence (DFKI)

This research centre is one of the biggest institutes in the world working on software technology based on Artificial Intelligence (AI).

It was founded in 1988 and nowadays has sites in Kaiserslautern, Saarbrücken, Bremen and Berlin.
Microsoft, SAP, BMW and DaimlerChrysler are among the DFKI shareholders. The DFKI works in all areas of Artificial Intelligence, including image and pattern recognition, knowledge management, intelligent visualization and simulation, deduction and multi-agent systems, speech and language technology, intelligent user interfaces, business informatics and robotics.

DFKI took part in the national project Verbmobil, which aimed to translate speech bidirectionally for German/English and German/Japanese.
Nowadays the centre has more than 90 ongoing projects.

NATIONAL CENTRE FOR LANGUAGE TECHNOLOGY, IN IRELAND

This centre works in the fields of machine translation, natural language parsing, grammar induction, question answering, sentiment analysis, computer-aided language learning, software localisation, speech recognition and speech synthesis. Its researchers are drawn from the School of Computing, the School of Applied Languages and Intercultural Studies and the School of Electronic Engineering.
We should also mention that this centre is affiliated with the Centre for Next Generation Localisation.

REFERENCES

* Edinburgh Language Technology Group (LTG), retrieved on February 28, 12:15. http://www.ltg.ed.ac.uk/

* German Research Centre for Artificial Intelligence, in Wikipedia, retrieved on February 28, 12:32. http://en.wikipedia.org/wiki/DFKI

* National Centre for Language Technology in Ireland, retrieved on February 28, 12:57. http://www.nclt.dcu.ie/

February 28, 2009

