Lexxe Search Technology
What is Lexxe?
Lexxe is a third-generation Internet search engine featuring Natural Language Processing technologies. It is fully automatic, with no human editing involved. Most of its answers come from unstructured texts and webpages on the Internet.
Lexxe uses computational linguistics to generate more relevant results than those from conventional search engines. Lexxe achieves this by analysing and extracting meaning from the search query. Such an approach represents the next major advance in search engine technology.
Keywords and Questions
People generally use search engines to do keyword searches (e.g. "bill gates"). But in many situations, it is more natural to ask a question ("who is bill gates"). This is especially true with voice interfaces to search engines, such as when searching from a mobile phone.
Lexxe recognises when a query is actually a question, and then seeks to find the answer from the web, extracting candidate answers from web pages as required. If the query is not a question, then Lexxe will do a keyword search - but will also do some extra processing to return results that are more likely to be relevant, and will attempt to group results into semantically-related clusters.
Query Processing
Lexxe handles the query differently depending on whether it is a question or a keyword search. So Lexxe's first task is to decide whether the query is a question.
Handling Questions: Generally speaking, Lexxe does best when asked a question about a fact. To be treated as a question, the query needs to be in a form that Lexxe recognises. For example, if the query begins with "who", "which", "what", "why" or "how", then it's probably a question. Similarly, queries that begin with a modal verb (e.g. must, may, can, etc.) or an auxiliary verb (e.g. is, have, did, etc.) are recognised as probably being questions. Otherwise, the query will be treated as a keyword search. Over time, Lexxe will handle a greater variety of questions.
Lexxe does best with questions of 10 words or fewer. It also does better with questions that have short answers, such as "who is tony blair", rather than "how do I change the oil in a ford mustang".
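The first-word test described above can be sketched in a few lines of Python. This is only an illustration, not Lexxe's actual implementation: the function name classify_query is hypothetical, and the word lists contain just the examples given in the text, so a real system would need fuller lists.

```python
# Illustrative word lists: only the examples named in the text above.
QUESTION_WORDS = {"who", "which", "what", "why", "how"}
MODAL_VERBS = {"must", "may", "can"}
AUXILIARY_VERBS = {"is", "have", "did"}


def classify_query(query: str) -> str:
    """Return "question" or "keyword" based on the query's first word."""
    words = query.lower().split()
    if not words:
        # An empty query cannot be a question; fall back to keyword search.
        return "keyword"
    first = words[0]
    if first in QUESTION_WORDS or first in MODAL_VERBS or first in AUXILIARY_VERBS:
        return "question"
    return "keyword"
```

With these lists, "who is bill gates" is classified as a question, while "bill gates" falls through to a keyword search.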
Keyword Query Processing: Even though keyword queries aren't questions, Lexxe can often extract extra meaning from them in order to find results that are more relevant. In particular, Lexxe will often be able to group words from the query into meaningful phrases.
For example, take the keyword query "the lord of the rings assistant director". Lexxe is able to generate results by explicitly matching the phrases "the lord of the rings" and "assistant director", and to give priority to web pages that contain those exact phrases. Matches for things like "head of the rings" and "make-up assistant" will be less likely to turn up in the results. This technique significantly improves the relevance of the results.
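One simple way to picture this phrase grouping is a greedy longest-match against a list of known phrases, followed by a count of exact phrase hits per page. The sketch below assumes such a list exists; KNOWN_PHRASES, group_phrases and phrase_score are hypothetical names, and how Lexxe really identifies phrases is not described here.

```python
# Hypothetical phrase list; a real system would draw on a much larger resource.
KNOWN_PHRASES = {
    ("the", "lord", "of", "the", "rings"),
    ("assistant", "director"),
}
MAX_PHRASE_LEN = max(len(p) for p in KNOWN_PHRASES)


def group_phrases(query: str) -> list[str]:
    """Split a query into known phrases, preferring the longest match."""
    words = query.lower().split()
    phrases = []
    i = 0
    while i < len(words):
        # Try the longest possible chunk first; a single word always matches.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if n == 1 or chunk in KNOWN_PHRASES:
                phrases.append(" ".join(chunk))
                i += n
                break
    return phrases


def phrase_score(page_text: str, phrases: list[str]) -> int:
    """Count how many query phrases occur verbatim in the page text."""
    text = " ".join(page_text.lower().split())
    return sum(1 for phrase in phrases if phrase in text)
```

Under these assumptions, a page mentioning "head of the rings" and "make-up assistant" scores zero for the example query, while a page containing both exact phrases scores two. The substring test is deliberately crude and ignores word boundaries.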
Identifying the Answer
When searching web pages for an answer to a question, Lexxe will often find many candidate answers. Lexxe then applies techniques to winnow these candidate answers to find those which are most likely to be relevant. For example, take the question "who is barack obama". Lexxe will realise that the answer to this question will most likely be a title or a profession, such as "president", "lawyer" or "politician", and Lexxe will favour those candidate answers that include such terms.
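The idea of favouring candidates that match the expected answer type can be sketched as follows. The mapping from question word to expected terms is an assumption made for illustration (only the three terms from the example above are included), and rank_candidates is a hypothetical name rather than anything Lexxe exposes.

```python
# Assumed mapping from a question word to terms the answer is likely to contain.
EXPECTED_ANSWER_TERMS = {
    "who": {"president", "lawyer", "politician"},
}


def rank_candidates(question: str, candidates: list[str]) -> list[str]:
    """Order candidates so those containing expected answer terms come first."""
    words = question.lower().split()
    expected = EXPECTED_ANSWER_TERMS.get(words[0], set()) if words else set()

    def score(candidate: str) -> int:
        # Number of expected terms that appear as whole words in the candidate.
        return len(expected & set(candidate.lower().split()))

    # sorted() is stable, so candidates with equal scores keep their input order.
    return sorted(candidates, key=score, reverse=True)
```

For "who is barack obama", a candidate such as "a lawyer and politician" would be ranked ahead of one such as "born in Hawaii".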
But one of the hardest problems is to identify the word or phrase that is the best answer to the question. For example, for the Barack Obama question, a good answer would be "President of the United States". Answers such as "president of" or "president of the united" would not be nearly as good - but it's not easy to teach a machine how to make this distinction. Lexxe incorporates a robust "Phrase Recognition" ability to determine what phrase is likely to be the best answer.
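To give a feel for the problem, here is one crude heuristic for discarding truncated candidates: drop any candidate that is a word-level prefix of a longer candidate seen at least as often. This is a stand-in chosen for illustration, not Lexxe's "Phrase Recognition" method, which is not described here; the function name complete_phrases and the idea of passing in occurrence counts are assumptions.

```python
def complete_phrases(counts: dict[str, int]) -> list[str]:
    """Keep candidates that are not fragments of an equally frequent longer one.

    `counts` maps each candidate phrase to how often it was seen.
    """
    kept = []
    for phrase, count in counts.items():
        words = phrase.lower().split()
        is_fragment = any(
            len(other.split()) > len(words)
            and other.lower().split()[:len(words)] == words
            and other_count >= count
            for other, other_count in counts.items()
        )
        if not is_fragment:
            kept.append(phrase)
    return kept
```

If "president of", "president of the united" and "President of the United States" are all seen equally often, only the last survives, because the two shorter strings never occur without the longer one. A shorter candidate that occurs more often than any of its extensions, such as "president" on its own, is kept as a phrase in its own right.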
Presenting Results
Lexxe displays three kinds of results - "answers", "clusters" and "web page snippets".
Answers: At the top of the search results Lexxe provides one or more answers, assuming the query was a question. That is, Lexxe doesn't just find a web page that might contain the answer you are looking for, it actually provides an explicit answer as well.
Clusters: Sometimes search terms are ambiguous. For example, a search for "thunderbird" could be expected to turn up matches about email clients, cars, and other things besides. Lexxe attempts to group semantically-related results into clusters, which often makes it easier to find those matches which are most relevant to your search.
Web Page Snippets: This part is much the same as other search engines' results. However, Lexxe often returns better ranked and more relevant web results.
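As a rough picture of clustering, the sketch below groups result snippets by word overlap (Jaccard similarity) with the first snippet of each existing cluster. Word overlap is only a crude proxy for semantic relatedness and is not claimed to be how Lexxe builds its clusters; the function names and the 0.3 threshold are assumptions chosen for the example.

```python
def tokens(text: str) -> set[str]:
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_snippets(snippets: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedily assign each snippet to the first sufficiently similar cluster."""
    clusters: list[list[str]] = []
    for snippet in snippets:
        snippet_tokens = tokens(snippet)
        for cluster in clusters:
            # Compare against the cluster's first snippet as its representative.
            if jaccard(snippet_tokens, tokens(cluster[0])) >= threshold:
                cluster.append(snippet)
                break
        else:
            # No cluster was similar enough, so start a new one.
            clusters.append([snippet])
    return clusters
```

Because every result for "thunderbird" contains the search term itself, the threshold has to be high enough that sharing that one word is not sufficient to merge an email-client snippet with a car snippet.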
Future Developments
The current Lexxe alpha site is mainly a proof-of-concept. Future versions will not just be faster, but will also be smarter - able to handle a greater variety of questions, and to provide answers with greater accuracy.
For more information, please read Why Lexxe? and Questions and Answers about Lexxe and 3rd Generation Search Technology.