Sunday, November 14, 2010

Semantic Web – Latent Semantic Indexing

Latent semantic indexing is mathematically complex, and understanding it fully usually takes a solid background in math.
There are a few ways to index pages and retrieve the ones relevant to a user's query.
The obvious method is to match the words in a user's search query against the same words in the text of the available pages.
Unfortunately, simple word matching gives inaccurate results. One reason is that there are many ways for a user to express the concept they are searching for.
This is known as synonymy. Another reason is that many words have more than one meaning, which is known as polysemy.
Because of synonymy, the words in a user's search may not match the text on the relevant pages, so those pages will be overlooked.
Because of polysemy, the terms in a user's query will often match those on irrelevant pages.
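Both failures are easy to reproduce with a few lines of Python. This is only a minimal sketch of simple word matching: the three pages, their names, and the `keyword_match` helper are made up for illustration and are not taken from any real search engine.

```python
# Hypothetical pages, invented for this example.
pages = {
    "auto-review": "the new automobile has a quiet engine",
    "river-walk": "we sat on the bank of the river",
    "savings": "the bank raised interest on savings accounts",
}

def keyword_match(query, pages):
    """Return the names of pages that share at least one word with the query."""
    query_words = set(query.lower().split())
    return [name for name, text in pages.items()
            if query_words & set(text.lower().split())]

# Synonymy: the relevant page says "automobile", so a search for "car" misses it.
print(keyword_match("car", pages))   # []

# Polysemy: "bank" matches the river page as well as the finance page.
print(keyword_match("bank", pages))  # ['river-walk', 'savings']
```

A searcher who wanted car reviews gets nothing back, and a searcher who wanted banking information gets a page about a river.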
Latent Semantic Indexing, or LSI, is an attempt to overcome this search problem. It does so by looking at the patterns in which words are spread across all of the indexed pages.
Pages that have many words in common are considered semantically close in meaning.
Pages that have few words in common are considered semantically distant. The end result is a reasonably accurate similarity value calculated for every content word and phrase.
The LSI database then returns the pages it judges to be relevant to a user's query.
The LSI algorithm does not need to understand the meaning of words, and it does not require an exact match to return useful web pages to the user.
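To make the idea concrete, here is a rough sketch of LSI on a toy collection, using only the Python standard library. Several things here are assumptions for illustration: the five pages are invented, raw word counts are used with no weighting, only `K = 2` latent dimensions are kept, and a simple power iteration stands in for the singular value decomposition routine that a real system would use. It is not how any particular search engine implements LSI.

```python
import math

# Hypothetical mini-collection: three pages about vehicles, two about finance.
pages = [
    "car engine road",
    "automobile engine road",
    "automobile road wheel",
    "bank money loan",
    "bank money interest",
]

def term_counts(text):
    """Count how often each word occurs in a piece of text."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def overlap(a, b):
    """Dot product of two sparse word-count dictionaries."""
    return sum(n * b.get(word, 0) for word, n in a.items())

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    norm = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def top_eigenpairs(matrix, k, iterations=500):
    """Largest k eigenpairs of a symmetric matrix (power iteration + deflation)."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [1.0 + 0.1 * i for i in range(n)]  # fixed start keeps runs repeatable
        for _ in range(iterations):
            vec = [dot(row, vec) for row in work]
            norm = math.sqrt(dot(vec, vec))
            vec = [x / norm for x in vec]
        value = dot(vec, [dot(row, vec) for row in work])
        pairs.append((value, vec))
        # Remove the component just found so the next pass finds the next one.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return pairs

counts = [term_counts(p) for p in pages]
# Page-by-page matrix of shared word counts (A^T A for a term-by-page matrix A).
gram = [[float(overlap(a, b)) for b in counts] for a in counts]
K = 2  # number of latent dimensions kept; an assumption that suits this toy data
pairs = top_eigenpairs(gram, K)
# Each page becomes a point in K-dimensional latent space.
page_vectors = [[vec[j] for _, vec in pairs] for j in range(len(pages))]

def lsi_scores(query):
    """Fold the query into the latent space and score every page by cosine."""
    q = term_counts(query)
    shared = [float(overlap(q, c)) for c in counts]
    q_vector = [dot(vec, shared) / value for value, vec in pairs]
    return [cosine(q_vector, pv) for pv in page_vectors]

keyword_hits = [overlap(term_counts("car"), c) for c in counts]
scores = lsi_scores("car")
for page, hits, score in zip(pages, keyword_hits, scores):
    print(f"{page!r}: shared words={hits}, LSI score={score:.2f}")
```

For the query "car", only the first page shares a word with the query, yet all three vehicle pages should score close to 1 and the two finance pages close to 0. The page that says "automobile" is pulled in because it appears alongside "engine" and "road", the same words that surround "car", and not because the code knows the two words mean the same thing.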
