Archive for September, 2009

word sense disambiguation

Word sense disambiguation is the process of identifying which sense of a word is being used in a given sentence when the word has more than one sense. However, the task has some problems.

The first problem is that the different senses of a word are sometimes very close to each other, so it is difficult to know which one is being used. Another problem is that these systems are evaluated against human judgements, and humans do not always agree on the sense of each word, so there is no single right answer for the computer to match.

We have two different approaches, deep and shallow.

Deep approaches try to use an explanation of each sense of the word, but this kind of knowledge is not available in a computer-readable format. Shallow approaches, in contrast, analyse the surrounding words and decide from them which of the different senses is meant, but this becomes a problem when the surrounding words also have more than one sense.
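The shallow idea of looking at the surrounding words can be sketched in a few lines of Python. This is only an illustration, not the method of any particular system: the sense inventory `SENSES`, its clue words, the `disambiguate` function and the window size are all assumptions made up for the example.

```python
# Hypothetical sense inventory: each sense of "bank" is described by a few
# clue words that tend to appear near it. The lists are invented for this sketch.
SENSES = {
    "bank": {
        "financial institution": {"money", "account", "loan", "deposit"},
        "river side": {"river", "water", "shore", "fishing"},
    }
}


def disambiguate(word, sentence, window=3):
    """Pick the sense whose clue words overlap most with the nearby words."""
    tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
    position = tokens.index(word)
    # Words within `window` positions of the target, excluding the target itself.
    context = set(tokens[max(0, position - window):position]
                  + tokens[position + 1:position + 1 + window])
    scores = {sense: len(clues & context)
              for sense, clues in SENSES[word].items()}
    # On a tie, max() returns the first sense in the inventory.
    best = max(scores, key=scores.get)
    return best, scores


sense, scores = disambiguate("bank", "He sat on the bank of the river fishing.")
print(sense)  # river side
```

A sentence such as "She opened an account at the bank by the river." contains clues for both senses, so the scores tie and the choice becomes arbitrary. This shows how fragile the approach is when the context is not clear.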

References:

* Wikipedia, the free encyclopedia, retrieved on Sep. 06, 11:29

September 6, 2009

CATEGORISATION

Categorisation is the process of recognising, differentiating and understanding ideas. In this process, objects that are related to each other are put into categories or groups. There are many categorisation techniques, but the most general ones are:
* classical categorisation
* conceptual clustering
* prototype theory

Classical categorisation:
This type of categorisation started with Plato, who grouped objects based on their similar properties. The method was later used by Aristotle, who applied it to separate living beings into groups. In this type of categorisation, the groups or categories must be defined, and each object has to belong to one of them; no object can be left without a category.
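As a rough illustration of classical categorisation, each category can be defined by the properties that every member must have, and each object must then fall into exactly one category. The category names, the property sets and the `classify` function below are hypothetical and were invented for this sketch.

```python
# Hypothetical category definitions: the properties every member must have.
CATEGORIES = {
    "bird": {"has feathers", "lays eggs"},
    "mammal": {"has fur", "feeds milk"},
    "fish": {"has gills", "has fins"},
}


def classify(properties):
    """Return the single category whose required properties are all present."""
    matches = [name for name, required in CATEGORIES.items()
               if required <= properties]
    # Classical categorisation leaves no object without a category,
    # and in this sketch no object may match two categories either.
    if len(matches) != 1:
        raise ValueError(f"expected exactly one category, found {matches}")
    return matches[0]


print(classify({"has feathers", "lays eggs", "swims"}))  # bird
```

An object that matches none of the definitions raises an error here, which mirrors the rule that nothing may be left uncategorised.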

Conceptual clustering:
In this type of categorisation, we first describe the objects and then classify them according to their descriptions. The difference from the classical approach is that here there is one description for each category, and objects can belong to more than one category.

Prototype theory:

In prototype theory, some members of a category are more central than others. When asked for an example of furniture, people are more likely to say chair than stool, and when asked for a bird, they are more likely to say eagle than penguin. This is because we have a model, or prototype, for each category.

Prototype theory also includes basic-level categorisation: people tend to say chair rather than the more specific kitchen chair or the more general furniture.
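The idea that some members are more central than others can be sketched by measuring how similar each member is to a prototype. Everything in this example is an assumption made for illustration: the feature sets, the bird prototype and the choice of Jaccard similarity as the measure of typicality.

```python
# Hypothetical feature sets, invented for illustration only.
BIRD_PROTOTYPE = {"flies", "has feathers", "lays eggs", "sings", "small"}

MEMBERS = {
    "robin": {"flies", "has feathers", "lays eggs", "sings", "small"},
    "eagle": {"flies", "has feathers", "lays eggs"},
    "penguin": {"has feathers", "lays eggs", "swims"},
}


def typicality(features, prototype):
    """Jaccard similarity between a member's features and the prototype."""
    return len(features & prototype) / len(features | prototype)


# Most typical member first.
ranking = sorted(MEMBERS,
                 key=lambda name: typicality(MEMBERS[name], BIRD_PROTOTYPE),
                 reverse=True)
print(ranking)  # ['robin', 'eagle', 'penguin']
```

With these invented features the eagle is closer to the prototype than the penguin, which matches the intuition that an eagle is a more typical bird.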

References:
* Wikipedia, the free encyclopedia, article about categorisation, retrieved on Sep. 05, 12:30
* Wikipedia, the free encyclopedia, article about prototype theory, retrieved on Sep. 05, 12:50

September 5, 2009

ANSWER EXTRACTION

Answer extraction, or Question Answering (QA), is a type of information retrieval. Given a collection of documents, the system should be able to answer questions written in natural language. QA requires more complex natural language processing technology than other types of document retrieval.

Question answering systems are among the most complex systems in information retrieval, because they have to find a fragment of text that answers a question asked in natural language. These systems have to recognise question types such as who, how and why.

A good QA system needs a good search engine to select the documents that contain the answer. When searching the web, where there are many documents, it is common to find parts of the answer in different documents. This has a benefit, because we can choose the answers that appear most often.
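Choosing the answer that appears most often across documents can be sketched as a simple vote. The candidate list below is made up, and the function name `most_frequent_answer` is hypothetical; a real system would first have to extract these candidates from the retrieved documents.

```python
from collections import Counter

# Hypothetical candidate answers, one per retrieved document (invented data).
candidates = ["Paris", "paris", "Lyon", "Paris", "Marseille", "Lyon"]


def most_frequent_answer(candidates):
    """Return the most common answer and its number of votes."""
    # Normalise case and whitespace so "Paris" and "paris" count together.
    counts = Counter(answer.strip().lower() for answer in candidates)
    answer, votes = counts.most_common(1)[0]
    return answer, votes


print(most_frequent_answer(candidates))  # ('paris', 3)
```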

We have two different methods, deep and shallow.

Shallow: Some methods use keyword-based techniques to find passages and sentences in the documents, and then filter them based on the presence of the desired type of answer. The candidates are then ranked based on syntactic features such as word order.

Deep: Sometimes keyword searching is not enough, and we need systems that include named-entity recognition, word sense disambiguation, and similar techniques. Questions that ask why or how also need this kind of system.
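A minimal sketch of the keyword step in a shallow method is shown here. It only scores passages by how many question keywords they contain; it does not do the syntactic ranking by word order mentioned above. The stopword list, the example passages and the function names are assumptions made for the example.

```python
# A tiny, hypothetical stopword list; a real system would use a larger one.
STOPWORDS = {"the", "a", "an", "of", "is", "was", "in", "did",
             "who", "what", "when", "where"}


def keywords(text):
    """Lowercase the words, strip punctuation and drop stopwords."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def rank_passages(question, passages):
    """Keep passages that share keywords with the question, best first."""
    question_keywords = keywords(question)
    scored = [(len(question_keywords & keywords(p)), p) for p in passages]
    # sorted() is stable, so passages with equal scores keep their order.
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [passage for score, passage in ordered if score > 0]


passages = [
    "Madrid is the capital of Spain.",
    "Miguel de Cervantes wrote Don Quixote.",
    "The novel was published in two parts.",
]
ranked = rank_passages("Who wrote the novel Don Quixote?", passages)
print(ranked[0])  # Miguel de Cervantes wrote Don Quixote.
```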

References:

* Wikipedia, the free encyclopedia, retrieved on Sep. 05, 10:51

September 5, 2009

topics list (Q2)

In my opinion, these are the 10 most interesting topics to write about:

• answer extraction
• spell checking
• topic detection
• word sense disambiguation
• speaker recognition
• automatic hyperlinking
• categorisation
• summarisation
• natural language parsing
• morphological analysis

References:

* Language Technology World’s page, retrieved on September 5th, 11:32
http://www.lt-world.org/

September 5, 2009

