WORD SENSE DISAMBIGUATION
Word sense disambiguation (WSD) is the process of identifying which sense of a word is being used in a given sentence when the word has more than one sense. The task raises some problems.
The first problem is that the different senses of a word are sometimes very close, so it is difficult to tell which one is being used. Another problem is that these systems are evaluated against human judgements, and humans do not always agree on which sense a word has, so there is not always a single right answer for the computer to find.
There are two different approaches, deep and shallow.
Deep approaches try to give a full explanation of each sense of the word, but this kind of knowledge is very hard to encode in a computer-readable format. Shallow approaches, in contrast, analyse the surrounding words and decide which of the different senses is meant, but this becomes a problem when the surrounding words also have more than one sense.
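As a rough illustration of the shallow approach, the sketch below implements a simplified variant of the Lesk algorithm: it picks the sense whose dictionary gloss shares the most words with the sentence. Lesk is only one well-known shallow technique, chosen here as an assumption; the sense inventory BANK_SENSES, the stopword list and the function name simplified_lesk are all made up for this example.

```python
# Small, hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = frozenset({"the", "of", "and", "a", "an", "on", "or", "that", "in", "to"})

# Hypothetical two-sense inventory for the word "bank".
BANK_SENSES = {
    "bank/finance": "an institution that accepts deposits of money and makes loans",
    "bank/river": "the sloping land beside a river or lake",
}


def content_words(text):
    """Lowercase the text, strip simple punctuation and drop stopwords."""
    words = {w.lower().strip(".,;:!?") for w in text.split()}
    return words - STOPWORDS


def simplified_lesk(word, sentence, sense_glosses):
    """Return the sense whose gloss overlaps most with the surrounding words."""
    context = content_words(sentence) - {word.lower()}
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "She sat on the bank of the river", BANK_SENSES))
print(simplified_lesk("bank", "He deposits money in the bank every month", BANK_SENSES))
```

The weakness mentioned above shows up directly: if the context words are themselves ambiguous, or simply do not appear in any gloss, the overlap counts carry little information.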
References:
* Wikipedia, the free encyclopedia, retrieved on Sep. 06, 11:29
September 6, 2009
CATEGORISATION
Categorisation is the process of recognising, differentiating and understanding ideas and objects. In this process, objects that share a relation are placed in categories or groups. There are many categorisation techniques, but the most general ones are:
* classical categorisation
* conceptual clustering
* prototype theory
Classical categorisation:
This type of categorisation started with Plato, who grouped objects based on their similar properties. The method was later used by Aristotle, who applied it to separate living beings into groups. In this type of categorisation, the groups or categories must be clearly defined and each object has to be in exactly one of the groups; no object can be left without a category.
Conceptual clustering:
In this type of categorisation, we first describe the objects and then classify them according to their descriptions. The difference from the classical approach is that here each category has its own description. Objects can also belong to more than one category.
Prototype theory:
In prototype theory, some members of a category are more central than others: when asked for an example of furniture, people are more likely to say chair than stool, and when asked for a bird, eagle rather than penguin. This is because we have a model (a prototype) for each category.
Prototype theory also involves basic-level categorisation: we tend to say chair rather than kitchen chair or furniture.
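The idea of graded membership can be sketched in a few lines of code. The example below scores how typical an item is by comparing its features with a category prototype using Jaccard overlap. Both the feature sets and the choice of Jaccard overlap are assumptions made for illustration; prototype theory itself does not prescribe a particular measure.

```python
def typicality(item_features, prototype_features):
    """Jaccard overlap between an item's features and the category prototype."""
    item, proto = set(item_features), set(prototype_features)
    if not item and not proto:
        return 0.0
    return len(item & proto) / len(item | proto)


# Hypothetical feature sets, invented for this example.
BIRD_PROTOTYPE = {"feathers", "wings", "flies", "lays_eggs", "beak"}
EAGLE = {"feathers", "wings", "flies", "lays_eggs", "beak", "hunts"}
PENGUIN = {"feathers", "wings", "swims", "lays_eggs", "beak"}

for name, features in (("eagle", EAGLE), ("penguin", PENGUIN)):
    print(name, round(typicality(features, BIRD_PROTOTYPE), 2))
```

Under these made-up features the eagle scores higher than the penguin, so both are birds but the eagle is the more central member of the category.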
References:
* Wikipedia, the free encyclopedia, article about categorisation, retrieved on Sep. 05, 12:30
* Wikipedia, the free encyclopedia, article about prototype theory, retrieved on Sep. 05, 12:50
September 5, 2009
ANSWER EXTRACTION
Answer extraction, or Question Answering (QA), is a type of information retrieval. Given a collection of documents, the system should be able to answer questions written in natural language. QA needs more complex natural language processing technology than other types of document retrieval.
Question answering systems are among the most complex systems in information retrieval, because they have to find a fragment of text that answers a question posed in natural language. These systems have to recognise question types such as who, how, why, ...
A good QA system needs a good search engine that selects the documents that contain the answer. If we are searching the web, where there are lots of documents, it is common to find parts of the answer in different documents, but this has its benefits, because we can choose the answers that appear most often.
There are two different methods, deep and shallow.
Shallow: Some methods use keyword-based techniques to find passages and sentences in documents and then filter them based on the presence of the desired type of answer. The ranking is then based on syntactic features such as word order.
Deep: Sometimes keyword searching is not enough, and we need systems that include named-entity recognition, word sense disambiguation, ... If the question is a why or how question, we will also need this kind of system.
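The shallow method and the idea of choosing the answer that appears most often can be sketched as follows. This is a minimal illustration under assumptions, not a description of any real QA system: the stopword list, the function names rank_passages and most_frequent_answer, and the sample passages are all invented for the example.

```python
from collections import Counter

# Small, hypothetical stopword list that also drops the question word.
STOPWORDS = {"who", "what", "when", "where", "the", "a", "an", "of", "in", "is"}


def tokenize(text):
    """Split on whitespace, lowercase and strip simple punctuation."""
    return [w.strip(".,;:!?").lower() for w in text.split()]


def rank_passages(question, passages):
    """Rank passages by how many question keywords they contain."""
    keywords = set(tokenize(question)) - STOPWORDS
    scored = [(len(keywords & set(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored if score > 0]


def most_frequent_answer(candidates):
    """Pick the candidate answer that was extracted most often."""
    counts = Counter(c.strip().lower() for c in candidates)
    return counts.most_common(1)[0][0] if counts else None


passages = [
    "Plato founded the Academy in Athens.",
    "The weather in Athens is warm.",
    "Bananas are yellow.",
]
print(rank_passages("Who founded the Academy?", passages))
print(most_frequent_answer(["Plato", "plato", "Aristotle"]))
```

Counting keywords says nothing about why or how something happened, which is one reason those question types call for the deeper methods.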
References:
* Wikipedia, the free encyclopedia, retrieved on Sep. 05, 10:51
September 5, 2009
TOPICS LIST (Q2)
In my opinion, these are the 10 most interesting topics to write about:
• answer extraction
• spell checking
• topic detection
• word sense disambiguation
• speaker recognition
• automatic hyperlinking
• categorisation
• summarisation
• natural language parsing
• morphological analysis
References:
* Language Technology World, retrieved on September 5th, 11:32
http://www.lt-world.org/
September 5, 2009
MACHINE TRANSLATION (Q3)
Machine translation, sometimes abbreviated as MT, is a sub-field of computational linguistics that investigates the use of computer programs to translate text from one natural language to another. At its most basic level, MT simply substitutes the words of one natural language with the words of another. More complex translations can be made by using text corpora, which make it possible to recognise whole sentences and to translate idioms, for example. Machine translation can be rule-based, corpus-based or context-based.
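The most basic level described above, plain word substitution, can be sketched in a few lines. The Spanish-English lexicon ES_EN and the function name word_for_word are hypothetical and exist only for this illustration; no real MT system is this simple.

```python
# Hypothetical toy lexicon, invented for this example.
ES_EN = {"el": "the", "gato": "cat", "negro": "black", "duerme": "sleeps"}


def word_for_word(sentence, lexicon):
    """Replace each word with its lexicon entry; unknown words are kept as-is."""
    return " ".join(lexicon.get(token, token) for token in sentence.lower().split())


print(word_for_word("El gato negro duerme", ES_EN))  # the cat black sleeps
```

The output keeps the Spanish word order ("cat black" instead of "black cat"), which shows why substitution alone is not enough and why rules or corpora are needed.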
Nowadays translation is very important because of the large amount of information available and the need for cross-lingual communication.
Translation between Romance languages like Spanish and Portuguese is of very good quality, but this changes if the languages are typologically very different, like Spanish and English. Another factor that influences the quality of the translation is how specialised the text is. If a translation system is specialised in meteorological texts, it will not be suitable for translating a sports text.
When translating a text, you have to take into account morphology, syntax and semantics, as well as style and pragmatics. Nowadays the tendency is to integrate all the methodologies (linguistic, statistical and others) in the database of a corpus.
On-line machine translators
o OpenTrad
o Systran
o ProMT
o Lucy
o Google Translate
o Translated
o WorldLingo
References
* Wikipedia, the free encyclopedia, article “Traducción automática”, retrieved on April 30, 2009 at 11:47
April 30, 2009
Hans Uszkoreit and Yorick Wilks (Q1)
There are many relevant researchers like Martin Kay, Yorick Wilks, Hans Uszkoreit, Fabian M. Suchanek and Silviu Cucerzan, but I’m going to talk about two of them, Hans Uszkoreit and Yorick Wilks.
HANS USZKOREIT
Hans Uszkoreit is Professor of Computational Linguistics at Saarland University. At the same time he serves as Scientific Director at the German Research Centre for Artificial Intelligence (DFKI), where he heads the DFKI Language Technology Laboratory. By cooptation he is also Professor in the Computer Science Department.
Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin. He co-founded the Berlin city magazine Zitty, for which he worked as a part-time editor and writer. In 1984 Uszkoreit received his Ph.D. in linguistics from the University of Texas. During his time in Austin he also worked as a research associate in a large machine translation project at the Linguistics Research Center. From 1982 until 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, California. While working at SRI, he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In December 1986 he returned to Stuttgart to work for IBM Germany as a project leader in the project LILOG (Linguistic and Logical Methods for the Understanding of German Texts).
In 1988 Uszkoreit was appointed to a newly created chair of Computational Linguistics at Saarland University and started the Department of Computational Linguistics and Phonetics. In 1989 he became the head of the newly founded Language Technology Lab at DFKI. He has been a co-founder and principal investigator of the Special Collaborative Research Division (SFB 378) “Resource-Adaptive Cognitive Processes” of the DFG (German Science Foundation).
Uszkoreit is a Permanent Member of the International Committee of Computational Linguistics (ICCL), Member of the European Academy of Sciences, Past President of the European Association for Logic, Language and Information, Member of the Executive Board of the European Network of Language and Speech, Member of the Board of the European Language Resources Association (ELRA), and serves on several international editorial and advisory boards. He is co-founder and Board Member of XtraMind Technologies GmbH, Saarbruecken, acrolinx gmbh, Berlin and Yocoy Technologies GmbH, Berlin. Since 2006, he has served as Chairman of the Board of Directors of the international initiative dropping knowledge.
YORICK WILKS
Yorick Wilks was born in 1939 and is a British computer scientist, Professor of Artificial Intelligence at the University of Sheffield, and a Senior Research Fellow at the Oxford Internet Institute.
Wilks was educated at Torquay Boys’ Grammar School before attending Pembroke College, and he obtained his Ph.D. in 1968 under Professor R. B. Braithwaite for his thesis Argument and Proof. His main early contribution, in the 1970s, was called “Preference Semantics”. That early work was hand-coded with semantic entries, as was normal at the time.
Yorick Wilks has been elected a fellow of the American and European Associations for Artificial Intelligence and of the British Computer Society, and a member of the UK Computing Research Committee. In 1991 he received a Defense Advanced Research Projects Agency (DARPA) grant on interlingual pragmatics-based machine translation, and in 1994 he received a grant from the Engineering and Physical Sciences Research Council to carry out research on large-scale information extraction (LaSIE). In the 1990s Professor Wilks also became interested in modelling human-computer dialogue. He is currently the Director of the EU-funded Companions Project on creating long-term computer companions for people. He was awarded the Antonio Zampolli prize in honour of his lifetime work at the LREC’2008 conference on May 28, 2008, and the Lifetime Achievement Award at the ACL’2008 conference on June 18, 2008. In 2009, he was awarded the British Computer Society’s Lovelace Medal.
In 1998, Wilks became head of the Department of Computer Science of the University of Sheffield, where he had started working in 1993 as Professor of Artificial Intelligence, a post that he still holds. In 1993 he became the founding director of the Institute of Language, Speech and Hearing (ILASH). Yorick Wilks also heads the Natural Language Processing Group of the University of Sheffield.
REFERENCES
* Yorick Wilks, from Wikipedia, retrieved on March 14, 13:02 http://en.wikipedia.org/wiki/Yorick_Wilks
* Hans Uszkoreit, from his own webpage, retrieved on March 14, 12:47 http://hans.uszkoreit.net/
March 14, 2009
RESEARCH CENTRES of HUMAN LANGUAGE TECHNOLOGIES in EUROPE (questionnaire 1)
I’m going to talk about three of the most important centres in Europe: the Edinburgh Language Technology Group, the German Research Centre for Artificial Intelligence, and the National Centre for Language Technology of Ireland.
Edinburgh Language Technology Group (LTG)
This research group has been working in Natural Language Engineering since 1990. It was originally part of the Human Communication Research Centre, but nowadays it belongs to the Institute for Communicating and Collaborative Systems of the University of Edinburgh, one of the largest communities of its kind in Europe. Its objective is to build practical solutions to real problems in text processing.
The German Research Centre for Artificial Intelligence (DFKI)
This research centre is one of the biggest institutes in the world working on software technology based on Artificial Intelligence (AI).
It was founded in 1988 and nowadays has sites in Kaiserslautern, Saarbrücken, Bremen and Berlin.
Microsoft, SAP, BMW and DaimlerChrysler are among the companies that we can find among the DFKI shareholders. The DFKI works in all areas of Artificial Intelligence, including image and pattern recognition, knowledge management, intelligent visualization and simulation, deduction and multi-agent systems, speech and language technology, intelligent user interfaces, business informatics and robotics.
DFKI worked on the national project called Verbmobil, a project that tried to translate speech bidirectionally for German/English and German/Japanese.
Nowadays the centre has more than 90 ongoing projects.
National Centre for Language Technology, in Ireland
This centre works in the fields of machine translation, natural language parsing, grammar induction, question answering, sentiment analysis, computer-aided language learning, software localisation, speech recognition and speech synthesis. Its researchers come from the School of Computing, the School of Applied Languages and Intercultural Studies and the School of Electronic Engineering.
It is worth mentioning that this centre is affiliated with the Centre for Next Generation Localisation.
REFERENCES
* Edinburgh Language Technology Group (LTG), retrieved on February 28, 12:15. http://www.ltg.ed.ac.uk/
* German Research Centre for Artificial Intelligence, in Wikipedia, retrieved on February 28, 12:32. http://en.wikipedia.org/wiki/DFKI
* National Centre for Language Technology in Ireland, retrieved on February 28, 12:57. http://www.nclt.dcu.ie/
February 28, 2009
WEB 2.0
Web 2.0 is the second generation of the internet, or of the web. It encourages user participation and the exchange of information. This is achieved by using social websites, communication tools and folksonomies.
The aim of Web 2.0, compared with Web 1.0, is to obtain interactive web pages that make use of the information in content created by the community and that are also visually attractive.
The name Web 2.0 was first used by Dale Dougherty of O’Reilly Media at a conference he gave together with Craig Cline of MediaLive. It came out of a brainstorming session in which they were talking about the rebirth and evolution of the web.
According to Tim O’Reilly: “You can visualize Web 2.0 as a set of principles and practices that tie together a veritable solar system of sites that demonstrate some or all of those principles, at a varying distance from that core.”
Thus, the concepts used in Web 1.0 and Web 2.0 can be distinguished as follows.
Web 1.0 –> Web 2.0
DoubleClick –> Google AdSense
Ofoto –> Flickr
Akamai –> BitTorrent
mp3.com –> Napster
Britannica Online –> Wikipedia
personal websites –> blogging
evite –> upcoming.org and EVDB
domain name speculation –> search engine optimization
page views –> cost per click
screen scraping –> web services
publishing –> participation
content management systems –> wikis
directories (taxonomy) –> tagging (‘folksonomy’)
stickiness –> syndication
WEB PAGES USED:
* Wikipedia article on Web 2.0
* Fundación Telefónica article on Web 2.0
February 5, 2009
KEVIN KELLY
Kevin Kelly was born in Pennsylvania in 1952 and graduated from Westfield High School in Westfield, New Jersey, in 1970. Although he dropped out of the University of Rhode Island after only one year, his writing has appeared in the New York Times, Esquire, The Economist and other periodicals, in addition to the books he has written and the magazines he has edited, founded or helped to found.
When he was 27, Kevin Kelly was a photojournalist, and one night he could not get into his hostel in Jerusalem because he was late for a curfew. He slept at the supposed site of Jesus’ crucifixion and, in the morning, had a religious experience. He decided to live as if he had only six months left to live. He went away and lived peacefully with his parents, gave away his money anonymously, visited his friends, and returned home to “die” on Halloween night.
In 1981, Kelly founded the Walking Journal. He is a former editor of Whole Earth Review, Signal and some of the later editions of the Whole Earth Catalog. He has been a director of the Point Foundation, which sponsored the first Hackers Conference in 1984, before the word hacker had a negative connotation.
Kelly is involved in a campaign to make a complete inventory of all living species on Earth, known as the Linnaean Enterprise. The goal is to produce, within one generation (25 years), a web-based catalogue of all species.
Kelly lives in Pacifica, California, a small coastal town south of San Francisco. He is a devout Christian, is married and has three children.
************************************************************
Bibliography: Wikipedia article on Kevin Kelly
February 2, 2009
DEBATE 3
Even though HyperText Markup Language (HTML) has been the most widespread markup language since the appearance of the Web, recent steps towards semantic integration have created the need for a new tool capable of managing data. This led to the creation of eXtensible Markup Language (XML).
While the syntax of XML and HTML is similar (they were both based on SGML), their functions and characteristics differ:
Both HTML and XML place tags around an element to describe it. HTML uses tags to determine the visual display (e.g. font size), whereas tags in XML indicate the category of each element (e.g. “city”, “date”, “name”…). This helps structure the content of the text.
Most HTML users focus only on getting the page to look the way they want it to, even if the structure behind it is left disorganized. With XML, documents won’t show up unless they’re correctly constructed (well formed), thus forcing a consistent structure to be respected. The format of XML documents makes them portable to different platforms and allows structured data coming from other sources to be combined easily.
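The well-formedness requirement can be checked with a standard XML parser. The sketch below uses Python’s xml.etree.ElementTree module, which raises a ParseError when a document is not well formed; the helper name is_well_formed and the sample documents are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<book><title>Hamlet</title></book>"))  # properly nested tags
print(is_well_formed("<book><title>Hamlet</book>"))          # <title> is never closed
```

An HTML browser would typically try to display the second document anyway, whereas an XML parser refuses it.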
HTML tags are predefined and limited. XML, in contrast, lets users create their own tags to classify elements with more precision. As an example, if we had every book written by Shakespeare marked as such, we’d be able to get a list of all of them. With the current methods, however, a search for “books” and “Shakespeare” gives us a mix of works written by him and works about him.
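The Shakespeare example can be sketched with user-defined tags. In the catalogue below, the tag names library, book, title and author are our own invention, as are the function titles_by_author and the book “A Life of Shakespeare” with its author; the point is only that an author tag lets a program tell books by Shakespeare apart from books about him.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue with made-up tags; the second book is invented.
CATALOGUE = """
<library>
  <book><title>Hamlet</title><author>William Shakespeare</author></book>
  <book><title>A Life of Shakespeare</title><author>Jane Doe</author></book>
  <book><title>Macbeth</title><author>William Shakespeare</author></book>
</library>
"""


def titles_by_author(xml_text, author):
    """List the titles of all <book> elements whose <author> matches exactly."""
    root = ET.fromstring(xml_text.strip())
    return [book.findtext("title")
            for book in root.iter("book")
            if book.findtext("author") == author]


print(titles_by_author(CATALOGUE, "William Shakespeare"))
```

A plain keyword search for “Shakespeare” would also match the biography, because the word appears in its title.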
The internet is now heading towards eXtensible HyperText Markup Language (XHTML). XHTML is a hybrid of XML and HTML, in which information is described in one layer and the formatting needed to present it in a browser is applied separately.
References:
Objetivos y usos del XML (2001, June 21). In DesarrolloWeb.com, by Miguel Ángel Álvarez. Retrieved December 12, 2008.
¿Cómo se diferencia el XML del HTML? (2003, November 11). In Maestros del Web, by Christian Van Der Henst S.. Retrieved December 10, 2008.
¿Por qué XML? (2003, April 17). In GAMAROD. Retrieved December 13, 2008.
Tutorial de XML en Flash (2004, April 5). In Cristalab, by Freddie. Retrieved December 10, 2008.
XML (2008, November 28). In Wikipedia, the free encyclopedia. Retrieved December 11, 2008.
Ainhize Leon, Aiora Juaristi, Ana Cristina Guerra, Ayanta García, Maialen Etxeberria, María Losada.
January 12, 2009