Proper query syntax
Introduction:
Before formatting search terms, please consider the following:
There are two types of terms:
- a single term, i.e. a single word such as "test" or "hello"
- a phrase, i.e. a group of words surrounded by double quotes such as "hello dolly"
Multiple terms can be combined with Boolean operators, and single terms can contain wildcards.
Wildcard searches:
? - The single-character wildcard matches terms in which the ? is replaced by any one character. For example: te?t matches terms such as test or text
* - The multiple-character wildcard matches zero or more characters. Wildcards can also be used in the middle of a term. For example: comp*er matches terms such as computer or composer
You cannot use a * or ? symbol as the first character of a search.
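The wildcard rules above can be illustrated with a small Python sketch. It uses the standard fnmatch module only to mimic the matching behaviour on single terms; the function name wildcard_matches is hypothetical, and this is not how the search system itself is implemented.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern: str, term: str) -> bool:
    """Illustration only: test one term against a ?/* wildcard pattern."""
    # Mirror the rule that a search may not start with a wildcard.
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character of a search")
    # fnmatchcase treats ? as exactly one character and * as a run of characters.
    return fnmatchcase(term, pattern)


print(wildcard_matches("te?t", "test"))         # True
print(wildcard_matches("te?t", "text"))         # True
print(wildcard_matches("te?t", "tet"))          # False
print(wildcard_matches("comp*er", "computer"))  # True
print(wildcard_matches("comp*er", "composer"))  # True
```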
Boolean operators:
The AND operator matches documents where both terms exist anywhere in the text of a single document. The symbol && can be used in place of the word AND. To search for documents that contain "jakarta apache" and "Apache Lucene" use the query: "jakarta apache" AND "Apache Lucene"
The OR operator links two terms and finds a matching document if either of the terms exists in a document. The symbol || can be used in place of the word OR. To search for documents that contain either "jakarta apache" or just "jakarta" use the query: "jakarta apache" OR jakarta
The NOT operator excludes documents that contain the term after NOT. The symbol ! can be used in place of the word NOT. To search for documents that contain "jakarta apache" but not "Apache Lucene" use the query: "jakarta apache" NOT "Apache Lucene"
The "+" (required) operator requires that the term after the "+" symbol exist somewhere in the field of a single document. To search for documents that must contain "jakarta" and may contain "lucene" use the query: +jakarta lucene
The "-" (prohibit) operator excludes documents that contain the term after the "-" symbol. To search for documents that contain "jakarta apache" but not "Apache Lucene" use the query: "jakarta apache" -"Apache Lucene"
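To make the operator descriptions concrete, here is a minimal Python sketch that applies the example queries to four invented documents. The documents, the helper names has and search, and the case-insensitive substring matching are all assumptions made for illustration; the real system may tokenize and match text differently.

```python
# Invented sample documents, keyed by a hypothetical document id.
DOCS = {
    1: "The Jakarta Apache project hosts Apache Lucene",
    2: "Jakarta Apache news",
    3: "Apache Lucene release notes",
    4: "Jakarta travel guide",
}


def has(doc_id: int, term: str) -> bool:
    """Assumed matching rule: case-insensitive substring test."""
    return term.lower() in DOCS[doc_id].lower()


def search(predicate) -> list:
    """Return the ids of all documents for which the predicate holds."""
    return [doc_id for doc_id in DOCS if predicate(doc_id)]


# "jakarta apache" AND "Apache Lucene"
and_hits = search(lambda d: has(d, "jakarta apache") and has(d, "Apache Lucene"))
# "jakarta apache" OR jakarta
or_hits = search(lambda d: has(d, "jakarta apache") or has(d, "jakarta"))
# "jakarta apache" NOT "Apache Lucene"  (same effect as -"Apache Lucene")
not_hits = search(lambda d: has(d, "jakarta apache") and not has(d, "Apache Lucene"))
# +jakarta lucene: jakarta is required, lucene is optional
plus_hits = search(lambda d: has(d, "jakarta"))

print(and_hits)   # [1]
print(or_hits)    # [1, 2, 4]
print(not_hits)   # [2]
print(plus_hits)  # [1, 2, 4]
```

In this sketch the optional term lucene does not change which documents match; it is left out of the predicate for that reason.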
Grouping:
You can group clauses using parentheses to form subqueries. To search for documents that contain "website" and either "jakarta" or "apache" use the query: (jakarta OR apache) AND website. This eliminates any confusion and makes sure that website must exist and that either jakarta or apache must exist as well.
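The effect of grouping can be sketched in Python by comparing two ways of parenthesizing the same three terms. The sample texts and function names are hypothetical, and whole-word matching on lowercased text is an assumption made only for this illustration.

```python
def words(text: str) -> set:
    """Assumed tokenization: lowercase, split on whitespace."""
    return set(text.lower().split())


def grouped(text: str) -> bool:
    """(jakarta OR apache) AND website"""
    w = words(text)
    return ("jakarta" in w or "apache" in w) and "website" in w


def other_grouping(text: str) -> bool:
    """jakarta OR (apache AND website)"""
    w = words(text)
    return "jakarta" in w or ("apache" in w and "website" in w)


samples = ["jakarta website", "apache website", "jakarta mailing list", "lucene website"]

print([grouped(s) for s in samples])         # [True, True, False, False]
print([other_grouping(s) for s in samples])  # [True, True, True, False]
```

The third sample shows the difference: without "website" it fails the grouped query, but it would match if the parentheses were placed around the last two terms instead.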
Special characters:
The system supports escaping special characters that are part of the query syntax. The current list of special characters is:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
To escape any of these characters, place a \ before it. For example, to search for (1+1):2 use the query: \(1\+1\)\:2
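When queries are built from user input, the escaping can be automated. The following Python sketch is one possible helper (the name escape_query is hypothetical). It assumes that escaping every single & and | is acceptable, even though only the doubled forms && and || appear in the list of special characters.

```python
# Characters that are part of the query syntax. & and | are escaped one by
# one, which also covers && and || (an assumption of this sketch).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')


def escape_query(text: str) -> str:
    """Put a backslash before every special character in text."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_query("(1+1):2"))  # \(1\+1\)\:2
```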
Fuzzy searches:
The system supports fuzzy searches. To do a fuzzy search, add the tilde symbol, "~", to the end of a single-word term. For example, to search for a term similar in spelling to "roam" use the fuzzy search: roam~. An optional parameter specifies the required similarity. The value is between 0 and 1; the closer it is to 1, the more similar a term must be in order to match. For example: roam~0.8. If the parameter is not given, the default of 0.5 is used.
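The text does not say how similarity is measured. As a rough illustration only, the Python sketch below assumes it is 1 minus the edit distance divided by the length of the shorter term, and that a term matches when its similarity is at least the threshold. Both the formula and the function names are assumptions, not a description of the system's actual scoring.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed measure: 1 - distance / length of the shorter term."""
    if not a or not b:
        return 1.0 if a == b else 0.0
    return 1.0 - edit_distance(a, b) / min(len(a), len(b))


def fuzzy_matches(query: str, term: str, min_similarity: float = 0.5) -> bool:
    """Assumed rule: match when similarity reaches the threshold."""
    return similarity(query, term) >= min_similarity


print(similarity("roam", "foam"))          # 0.75
print(fuzzy_matches("roam", "foam"))       # True  (like roam~)
print(fuzzy_matches("roam", "foam", 0.8))  # False (like roam~0.8)
print(fuzzy_matches("roam", "rain"))       # False
```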
Resources:
Original query syntax resource: Jakarta Lucene Query Parser Syntax.