Questions and Answers about Lexxe and 3rd Generation Search Technology
Q: What is the difference between the 2nd and 3rd generation search engines?
Q: Why do search engines need Natural Language Processing technologies?
Q: What Natural Language Processing technologies are used in Lexxe as a 3rd generation search engine?
Q: Can you briefly illustrate how Lexxe's short question answering works?
Q: Apart from Natural Language Processing capacity, what other intelligence does Lexxe have?
Q: Why does Lexxe allow only 10 words or less for Natural Language queries?
Q: Can you show us some example questions Lexxe can answer?
Q: When I cannot find the answer with a short question, what should I do?
Q: How much will humans rely on search engines in the future? And why?
Q: How do you see search engines evolving in the next 50 years?
Q: What is the difference between the 2nd and 3rd generation search engines?
A: The main difference is in their algorithms. The 2nd generation search engines, such as Google, use a "Symbolic Computing" approach to match the key words a user types in against the words of the texts they have indexed. The point here is that words are treated as a set of symbols, not as words that carry meaning for human users. Although some very simple linguistic measures, like stemming (trying to catch all forms of a word) and structured data search (answering weather forecast questions with answers stored in a database), are used within the 2nd generation search engines, by nature they are still "Symbolic Computing" machines.
The 3rd generation search engines apply Natural Language Processing (a.k.a. Computational Linguistics) technologies in search, because search is seen as a language understanding process in the first place. An important principle in the design of the search algorithms is language first, computing second. We call this approach "Linguistic Computing" for search; it is paradigmatically different and a level higher in system difficulty and complexity. Although some may argue that "Linguistic Computing" still falls into the category of "Symbolic Computing", its unique features, such as syntactic and semantic processing, do mark a watershed between search technology generations.
This novel approach is applicable to both the traditional "key word-based" search method and the new Natural Language query method. The "Linguistic Computing" method came naturally as a replacement for the "Symbolic Computing" method, which has failed users as they have become more and more sophisticated in search and have come to expect more from search engines. The 3rd generation search engines offer more accurate and consistent search results than the 2nd generation search engines through their intelligence in language understanding.
Q: Why do search engines need Natural Language Processing technologies?
A: Natural Language Processing is one of the main areas of modern Artificial Intelligence. It deals with the computer understanding of human languages. What the 2nd generation search engines are weak at is exactly what Natural Language Processing is good at. Although Natural Language Processing is still far from fully successful, some mature technologies can be used to solve search problems.
For example, from experiments at Lexxe, we found that the Phrase Recognition technology used in query processing and search pattern formation helps increase the accuracy of search results by 20-40% using the same database. In some cases, the top ten results are completely different: the Phrase Recognition method gets them all correct, while the 2nd generation search engine gets nearly all of them wrong, even though the correct results were available in the same database. It is therefore worth stressing that Natural Language Processing techniques offer more consistent results. Furthermore, Phrase Recognition is a linguistic computing method that does not discriminate against less famous websites, which in fact helps retrieve a lot of correct information that popularity-based reference methods keep out of the results.
Getting exact answers through Natural Language queries can raise the quality and efficiency of the search experience. No longer having to read through links and texts to get the answers is a luxury to enjoy. Lexxe is not yet 100% robust, but it is breaking new ground and delivering increasingly accurate results.
Q: What Natural Language Processing technologies are used in Lexxe as a 3rd generation search engine?
A: A series of NLP technologies are used in Lexxe. Phrase Recognition and Short Question Answering are the two main ones.
Firstly, the Phrase Recognition method allows the search engine to understand whether the key words form one or more phrases. (Key word query is still a main user search method in Lexxe, even though Lexxe has introduced an innovative short question answering feature.) If phrases can be spotted in the key words, search will be more accurate than if they are treated as individual words. Treating them as individual words is why users sometimes find bizarre search results either mixed in among the correct ones or ranked very near the top. The results are sometimes good and sometimes bad, and not very consistent. Let's consider the key word query "chilly chicken fillet burger". Is it one phrase, or two phrases, or three, or just a few individual words? It is easy for a human to understand the meaning, because we have the linguistic ability to know its syntactic structure and semantic meaning subconsciously, but to a search engine it is just a sequence of strings of symbols. To get closer to a human analysis of the key words, a 3rd generation search engine conducts a linguistic analysis of the key words before matching the right texts. Different results of the linguistic analysis lead directly to different rankings of texts. Instead of using one formula for all searches, like a 2nd generation search engine, a 3rd generation search engine needs to anticipate and conduct different search methods based on the linguistic analysis of the key words. A poor phrase recognition process, or none at all, accounts for a large portion of search inaccuracy today. Although some formulas do consider word proximity or word ordering, the quality is still not consistent or satisfactory enough. To put it another way, if one does not understand the question clearly enough, how can he or she guarantee a correct answer?
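As a rough illustration of why phrase recognition changes what gets matched, the sketch below segments a key word query into phrases by greedy longest match against a small phrase lexicon. This is not Lexxe's actual method, which is not described here; the lexicon `KNOWN_PHRASES` and the function `segment_query` are hypothetical names invented for this example.

```python
# Hypothetical phrase lexicon (an assumption for this sketch); a real
# system would derive phrases from linguistic analysis, not a fixed set.
KNOWN_PHRASES = {
    ("chicken", "fillet"),
    ("chicken", "fillet", "burger"),
}


def segment_query(query, phrases=KNOWN_PHRASES):
    """Split a key word query into phrases using greedy longest match."""
    words = query.lower().split()
    max_len = max((len(p) for p in phrases), default=1)
    segments = []
    i = 0
    while i < len(words):
        # Try the longest span first, shrinking down to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n == 1 or candidate in phrases:
                segments.append(" ".join(candidate))
                i += n
                break
    return segments


if __name__ == "__main__":
    print(segment_query("chilly chicken fillet burger"))
```

With this toy lexicon the query is treated as the single word "chilly" followed by the phrase "chicken fillet burger", so a matcher could require the three-word phrase to appear together instead of scoring four unrelated words.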
Secondly, users can ask Lexxe short questions in order to receive short answers. This is a major Natural Language Processing feature in Lexxe. Developing this new functionality serves two purposes. One is to get a precise request from the user of what he or she wants, so as to dig out exactly the answer to the question without other unwanted information. For example, "when is Queen Elizabeth's birthday" will result in a date being retrieved and returned by Lexxe. In all 2nd generation search engines, the words "when" and "is" will be deleted, because they are stop words. In fact "when" is extremely important to Lexxe, because it signals a search for a time format. If one used the key word method, the query would be "Queen Elizabeth's birthday", and what one would get is a list of results mainly talking about Queen Elizabeth's birthday celebrations instead of the actual birthday of Queen Elizabeth. The other purpose is that short answers spare users from reading through pages to find the answers and then confirming them with their own intelligence. It simply saves time. Naturally, short question answering forms a similarly large portion of all queries for search engines. We even argue that for human beings it is easier and more natural to pose questions in Natural Language, and that key word queries are a derived form of Natural Language queries. It is probably because 2nd generation search engines are not able to understand Natural Language questions and find answers in unstructured data (e.g. texts) that users have forgotten they can ask a search engine questions. The 3rd generation search engines are trying to bridge this technical gap.
Another Natural Language Processing technology worth mentioning is the "clustering" of results. "Clustering" provides a classification of the search results and additional directions for further search. Some 2nd generation search engines, like Vivisimo and Mooter, have already used this technology. What is unique about Lexxe's clustering is that it is not a hierarchical categorization of the search results, which would only narrow them down to a couple of texts within the search results. Instead, Lexxe regenerates a new search with new clusters.
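The contrast between dropping "when" as a stop word and using it as a signal can be sketched as follows. The stop-word list, the `ANSWER_TYPE` table and both function names are assumptions made for illustration; they are not Lexxe's actual tables or code.

```python
# Illustrative stop-word list and question-word table (assumptions for
# this sketch only).
STOP_WORDS = {"when", "is", "the", "a", "of", "who", "where"}

ANSWER_TYPE = {
    "when": "TIME",
    "who": "PERSON",
    "where": "PLACE",
}


def keyword_view(query):
    """What remains of the query once stop words are stripped."""
    words = query.lower().rstrip("?").split()
    return [w for w in words if w not in STOP_WORDS]


def question_view(query):
    """Keep the question word as a signal for the expected answer type."""
    words = query.lower().rstrip("?").split()
    if not words:
        return "UNKNOWN", []
    return ANSWER_TYPE.get(words[0], "UNKNOWN"), words[1:]


if __name__ == "__main__":
    q = "when is Queen Elizabeth's birthday"
    print(keyword_view(q))
    print(question_view(q))
```

In the key word view the information that a date is wanted is lost, while the question view keeps it as an expected answer type that later steps could use to filter candidate answers.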
Q: Can you briefly illustrate how Lexxe's short question answering works?
A: Let's look at "which countries does Thailand share border with?" as an example. First of all, Lexxe carries out a sentential conversion of the query from question type to statement type. That is, it produces a new sentence like "Thailand shares border with which countries." Secondly, it needs to decide what to remove from the new statement. In this case, "which countries" is the question part, consisting of a question word and a head noun, so the two words are removed. Thirdly, Lexxe needs to know on which side of the remaining pattern, and how far away from it, the possible answers may occur. Sometimes the answer occurs on one side of the statement pattern and sometimes on both sides. The distance of the answer from the pattern is also configured. When these are ready, Lexxe conducts search engine retrieval and obtains the first 100 results.
Fourthly, if a sentence in the top 100 results contains the pattern "Thailand shares border with", then up to, say, n words on the right hand side of the pattern will be retrieved. Fifthly, linguistic and statistical processing will be carried out on the result to retrieve meaningful phrases (groups of words, such as "Laos and Myanmar"), if there are any. Sixthly, statistically significant words and phrases will be scored and ranked in descending order of importance. Finally, the best answer and those that are very close in score to the best one will be selected as the answer to the question.
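The later steps of this walk-through (matching the pattern, taking up to n words beside it, then ranking candidates) can be sketched in a simplified form. This is only an illustration under stated assumptions: it works on single words and not on phrases, it replaces the "linguistic and statistical processing" with plain frequency counts, and the sentences in `SENTENCES`, the `tolerance` parameter and all function names are made up for the example.

```python
import re
from collections import Counter

# Toy stand-in for retrieved result sentences (invented for this sketch).
SENTENCES = [
    "Thailand shares border with Laos and Myanmar.",
    "Thailand shares border with Myanmar, Laos, Cambodia and Malaysia.",
    "To the south, Thailand shares border with Malaysia.",
    "Bangkok is the capital of Thailand.",
]


def extract_windows(sentences, pattern, n_words, side="right"):
    """Collect up to n_words next to the pattern in each matching sentence."""
    windows = []
    for sentence in sentences:
        text = sentence.lower()
        pos = text.find(pattern)
        if pos == -1:
            continue  # sentence does not contain the statement pattern
        if side == "right":
            words = re.findall(r"[a-z]+", text[pos + len(pattern):])[:n_words]
        else:
            words = re.findall(r"[a-z]+", text[:pos])[-n_words:]
        windows.append(words)
    return windows


def rank_candidates(windows, ignore=frozenset({"and", "the", "to", "in"})):
    """Count in how many windows each candidate word occurs."""
    counts = Counter(
        word
        for window in windows
        for word in dict.fromkeys(window)  # de-duplicate within a window
        if word not in ignore
    )
    return counts.most_common()


def select_answers(ranked, tolerance=0.8):
    """Keep the best candidate and those scoring close to it."""
    if not ranked:
        return []
    best = ranked[0][1]
    return [word for word, count in ranked if count >= tolerance * best]


if __name__ == "__main__":
    windows = extract_windows(SENTENCES, "thailand shares border with", n_words=5)
    print(select_answers(rank_candidates(windows)))
```

Here the `side` and `n_words` arguments play the role of the configured answer side and distance, and `tolerance` is one possible way to express "very close in score to the best one".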
Q: Apart from Natural Language Processing capacity, what other intelligence does Lexxe have?
A: Lexxe is developing its ability to recognize addresses, check Australian share prices and understand a large number of names of professions, such as doctor, comedian, cook, etc., in order to become robust in answering questions such as "who is Roald Dahl?". Lexxe is also quite capable of distinguishing between "who" questions that ask for a person's name and those that ask what a person is. It will take some time to be perfect.
Q: Why does Lexxe allow only 10 words or less for Natural Language queries?
A: The shorter the query is, the more likely it is for a search engine to match documents with it, but the harder it is to get exactly the right answer. The longer the query is, the less likely it is for a search engine to find a match in the first place. However, if there is a match, it will be easier to find the right answer. From a statistical point of view, the bigger a set of elements, the more complicated its combinations can get, and hence the harder it is to get an exact match. Again, this problem in search can be solved through Natural Language Processing, namely by substituting synonyms for one another to form multiple searches and so retrieve more results. Otherwise too few results, or even none, can be expected, as is the case today. This is one of the important research issues at Lexxe.
Ten words or less is just an optimized question length for Lexxe at the moment. We will try to overcome this restriction step by step in the future.
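The synonym substitution mentioned in this answer could be sketched as below: each word with known synonyms is swapped in turn, and every combination becomes a separate search. The synonym table and the function name `expand_query` are hypothetical; how Lexxe actually selects synonyms is not described here.

```python
from itertools import product

# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "tallest": ["tallest", "highest"],
    "mountain": ["mountain", "peak"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Generate query variants by substituting synonyms word by word."""
    words = query.lower().split()
    # Words without an entry in the table are kept unchanged.
    options = [synonyms.get(word, [word]) for word in words]
    return [" ".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    for variant in expand_query("tallest mountain in the world"):
        print(variant)
```

The number of variants is the product of the number of alternatives per word, so it grows quickly with query length, which is consistent with the point above that longer queries produce more complicated combinations.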
Q: Can you show us some example questions Lexxe can answer?
A: You can find some of the example questions on Lexxe's help page.
Q: When I cannot find the answer with a short question, what should I do?
A: Question answering is best used for finding factual answers. When you cannot find an answer, there may be a number of reasons.
- Check your spelling.
- Remember to put a question word at the beginning of a question, such as "who", "what", "which", "when", "where", "why" and "how".
- Ask precisely. For example, don't ask "What is the tallest mountain?" if you want to know the tallest mountain in the world. Instead, ask "What is the tallest mountain in the world?", because "the tallest mountain" alone may mean the tallest mountain in a country or continent, etc. Other potential answers will make the one you want stand out less.
- Be a bit formal when choosing words for queries on formal matters. For example, "who is the head of Australia?" → "who is the head of state of Australia?".
- Change the wording or word order of your question, e.g. "when is Prince Charles birthday?" → "when is the birthday of prince charles?".
- There might be no answer to the question at all. Try to use key words to search instead.
Q: How much will humans rely on search engines in the future? And why?
A: No one can have a better memory than a computer, and no one can be as fast as a computer when it retrieves information. We all live in a time when information is crucial to our life and work. Therefore, it is not difficult to imagine that humans will increasingly rely on search engines on a daily basis. The more advanced and accurate the search engines become, the more humans will rely on them.
Q: How do you see search engines evolving in the next 50 years?
A: It fundamentally depends on how successfully Natural Language Processing technology develops from now on. The question then is whether Natural Language Processing can be successful. Consider the Human Genome Project, which set out to identify all the approximately 20,000-25,000 genes in human DNA and took about 13 years and billions of US dollars: given comparable funding and time, Natural Language Processing can reach its goal too. Over the past two decades, Natural Language Processing has achieved some remarkable progress, e.g. in Part-of-speech Tagging, Parsing and Word Sense Disambiguation.
But generally speaking, search engines will go from the current 2nd generation to a Natural Language Processing-driven 3rd generation featuring question answering in Natural Language. Search engines may become dialogue machines with much better communication skills, just like human-to-human conversation. One will be able to talk (in both written and spoken form, depending on the progress in speech recognition) to a dialogue machine on topics from history to physics and from arts to genetics. By then the Internet will be a million times bigger than the current one, and computer speed will be about 100,000 times faster than the Pentium 4 in just a few years from now (according to Intel research). It will be a lot easier to get questions answered than it is now. If the Internet-connected dialogue machine is portable, it is even better. Every time one has a question, an answer could be instantly provided by the dialogue machine.
[Note: We will continuously publish users' questions and our answers on this web page. If you have any questions, please don't hesitate to contact us at contact@lexxe.com. Thank you for your interest in Lexxe.]