Querying
We'll quickly set up the same index from the Indexing Tutorial:
import pink.cozydev.protosearch.{Field, IndexBuilder}
import pink.cozydev.protosearch.analysis.Analyzer
case class Book(author: String, title: String)
val books: List[Book] = List(
  Book("Beatrix Potter", "The Tale of Peter Rabbit"),
  Book("Beatrix Potter", "The Tale of Two Bad Mice"),
  Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
  Book("Dr. Seuss", "Green Eggs and Ham"))
val analyzer = Analyzer.default.withLowerCasing
val index = IndexBuilder.of[Book](
  (Field("title", analyzer, stored=true, indexed=true, positions=true), _.title),
  (Field("author", analyzer, stored=true, indexed=true, positions=false), _.author),
).fromList(books)
def search(q: String): List[Book] =
  index.search(q)
    .map(hits => hits.map(h => books(h.id)))
    .fold(_ => Nil, identity)
Now we can use our search function to explore some different query types!
Term Queries
The following term queries represent some of the primitive operations on the term dictionary. In general they specify a way to match one or more terms, and then the query as a whole matches documents containing those terms.
Term Query
The most basic query is a single term query. Only documents that contain the term will match.
search("fish")
// res0: List[Book] = List(
// Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish")
// )
Prefix Query
A prefix query specifies all terms with a given prefix, and then matches all documents containing those terms.
search("egg*")
// res1: List[Book] = List(Book("Dr. Seuss", "Green Eggs and Ham"))
Range Query
A range query similarly specifies a range of terms, and then matches all documents containing those terms.
search("[fi TO gz]") // matching 'fish' and 'green'
// res2: List[Book] = List(
// Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
// Book("Dr. Seuss", "Green Eggs and Ham")
// )
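To make the idea of "match terms first, then documents" concrete, here is a minimal sketch of a toy term dictionary over the four titles. It is not how protosearch is implemented: the `TermDictionarySketch` object, its `tokenize` helper, and the `term`, `prefix`, and `range` functions are all hypothetical names invented for illustration, and the tokenizer is a simplified stand-in for the real analyzer. It assumes Scala 2.13 or later.

```scala
import scala.collection.immutable.SortedMap

object TermDictionarySketch {
  // Toy stand-in for the tutorial's title field: the index in the Vector is the document id.
  val titles: Vector[String] = Vector(
    "The Tale of Peter Rabbit",
    "The Tale of Two Bad Mice",
    "One Fish, Two Fish, Red Fish, Blue Fish",
    "Green Eggs and Ham"
  )

  // Hypothetical tokenizer (an assumption, not the library's Analyzer):
  // lowercase the text, then split on anything that is not a letter.
  def tokenize(text: String): List[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toList

  // Sorted term dictionary: term -> ids of the documents containing that term.
  val dictionary: SortedMap[String, Set[Int]] =
    titles.zipWithIndex
      .flatMap { case (title, id) => tokenize(title).map(term => term -> id) }
      .foldLeft(SortedMap.empty[String, Set[Int]]) { case (acc, (term, id)) =>
        acc.updated(term, acc.getOrElse(term, Set.empty[Int]) + id)
      }

  // Union the document ids of every selected term.
  private def docsFor(selected: Iterator[Set[Int]]): Set[Int] =
    selected.foldLeft(Set.empty[Int])(_ union _)

  // Term query: look up exactly one term.
  def term(t: String): Set[Int] =
    dictionary.getOrElse(t, Set.empty[Int])

  // Prefix query: select every term starting with the prefix.
  def prefix(p: String): Set[Int] =
    docsFor(dictionary.iterator.collect { case (t, ids) if t.startsWith(p) => ids })

  // Range query: select every term between lo and hi, both ends inclusive.
  def range(lo: String, hi: String): Set[Int] =
    docsFor(dictionary.rangeFrom(lo).rangeTo(hi).valuesIterator)
}
```

In this sketch, `TermDictionarySketch.range("fi", "gz")` selects the terms `fish` and `green` and so returns the ids of the two Dr. Seuss titles, mirroring the `[fi TO gz]` example above.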
Phrase Query
A phrase query is made up of one or more terms surrounded by double quotes, and matches documents containing those terms in exactly that order.
search("\"red fish, blue fish\"")
// res3: List[Book] = List(
// Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish")
// )
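Phrase matching depends on where each term occurs, not just whether it occurs. The sketch below illustrates one way to check a phrase against term positions; it assumes (without confirming from the library) that this is the kind of information the `positions=true` setting on the title field records. `PhraseSketch`, `tokenize`, `positions`, and `matchesPhrase` are hypothetical names, and the tokenizer is a simplified stand-in for the real analyzer. It assumes Scala 2.13 or later.

```scala
object PhraseSketch {
  // Hypothetical tokenizer (an assumption, not the library's Analyzer):
  // lowercase the text, then split on anything that is not a letter.
  def tokenize(text: String): Vector[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toVector

  // Positional postings for a single document: term -> positions where it occurs.
  def positions(text: String): Map[String, Vector[Int]] =
    tokenize(text).zipWithIndex.groupMap(_._1)(_._2)

  // A phrase matches when, for some start position of its first term,
  // the i-th phrase term occurs at position start + i for every i.
  def matchesPhrase(text: String, phrase: String): Boolean = {
    val pos = positions(text)
    val terms = tokenize(phrase)
    terms.headOption.exists { first =>
      pos.getOrElse(first, Vector.empty[Int]).exists { start =>
        terms.zipWithIndex.forall { case (t, i) =>
          pos.getOrElse(t, Vector.empty[Int]).contains(start + i)
        }
      }
    }
  }
}
```

Under these assumptions, "red fish, blue fish" matches the Dr. Seuss title because the four terms occur at consecutive positions, while "blue fish, red fish" does not, even though the title contains all of the same terms.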
Boolean Queries
Boolean queries allow us to combine multiple queries with boolean logic, using the OR, AND, and NOT combinators.
search("fish OR ham")
// res4: List[Book] = List(
// Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
// Book("Dr. Seuss", "Green Eggs and Ham")
// )
search("red AND blue")
// res5: List[Book] = List(
// Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish")
// )
search("tale AND NOT mice")
// res6: List[Book] = List(Book("Beatrix Potter", "The Tale of Peter Rabbit"))
Group Query
As queries get more complex, it can be helpful to group parts together with parentheses.
search("(red OR blue OR green) AND (fish OR mice OR ham)")
// res7: List[Book] = List(
// Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
// Book("Dr. Seuss", "Green Eggs and Ham")
// )
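One way to picture the boolean and group queries above is as set operations on the document ids matched by each sub-query, with parentheses becoming nesting in a query tree. The following is a minimal sketch under that assumption, not protosearch's actual query representation: the `BooleanSketch` object, its `Query` types, the hand-written `postings` map, and `eval` are all hypothetical.

```scala
object BooleanSketch {
  // Hypothetical query tree; nesting plays the role of parentheses.
  sealed trait Query
  final case class Term(value: String) extends Query
  final case class Or(left: Query, right: Query) extends Query
  final case class And(left: Query, right: Query) extends Query
  final case class Not(query: Query) extends Query

  // Ids of the four books in the tutorial's list, in order.
  val allDocs: Set[Int] = Set(0, 1, 2, 3)

  // Hand-written postings for a few title terms: term -> matching document ids.
  val postings: Map[String, Set[Int]] = Map(
    "tale"  -> Set(0, 1),
    "mice"  -> Set(1),
    "fish"  -> Set(2),
    "red"   -> Set(2),
    "blue"  -> Set(2),
    "green" -> Set(3),
    "ham"   -> Set(3)
  )

  // OR is union, AND is intersection, NOT is the complement within allDocs.
  def eval(q: Query): Set[Int] = q match {
    case Term(t)    => postings.getOrElse(t, Set.empty[Int])
    case Or(l, r)   => eval(l) union eval(r)
    case And(l, r)  => eval(l) intersect eval(r)
    case Not(inner) => allDocs diff eval(inner)
  }
}
```

In this sketch, `eval(And(Term("tale"), Not(Term("mice"))))` yields `Set(0)`, the id of "The Tale of Peter Rabbit", which lines up with the `tale AND NOT mice` example.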
Field Query
A field query restricts a query to a single named field. Field queries work with term-level queries such as term and prefix queries:
search("author:seuss")
// res8: List[Book] = List(
// Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
// Book("Dr. Seuss", "Green Eggs and Ham")
// )
search("author:beatri*")
// res9: List[Book] = List(
// Book("Beatrix Potter", "The Tale of Peter Rabbit"),
// Book("Beatrix Potter", "The Tale of Two Bad Mice")
// )
Additionally, field queries can take more complex boolean queries if specified in a group:
search("author:([b TO e] AND NOT dr*)")
// res10: List[Book] = List(
// Book("Beatrix Potter", "The Tale of Peter Rabbit"),
// Book("Beatrix Potter", "The Tale of Two Bad Mice")
// )
Regex Query
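Regex queries match terms against a regular expression, which allows more flexible matching than prefix or range queries. No term in our index matches the pattern in the example below, so the search returns an empty list.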
search("/jump.*")
// res11: List[Book] = List()