Querying

We'll quickly set up the same index from the Indexing Tutorial:

import pink.cozydev.protosearch.{Field, IndexBuilder}
import pink.cozydev.protosearch.analysis.Analyzer

case class Book(author: String, title: String)
val books: List[Book] = List(
  Book("Beatrix Potter", "The Tale of Peter Rabbit"),
  Book("Beatrix Potter", "The Tale of Two Bad Mice"),
  Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
  Book("Dr. Seuss", "Green Eggs and Ham"))

val analyzer = Analyzer.default.withLowerCasing
val index = IndexBuilder.of[Book](
  (Field("title", analyzer, stored=true, indexed=true, positions=true), _.title),
  (Field("author", analyzer, stored=true, indexed=true, positions=false), _.author),
).fromList(books)

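// Run a query and map each hit back to its Book, using the hit id as a position in `books`.
// The error case is folded into an empty list, so a failed query returns Nil.
def search(q: String): List[Book] =
  index.search(q)
    .map(hits => hits.map(h => books(h.id)))
    .fold(_ => Nil, identity)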

Now we can use our search function to explore some different query types!

Term Queries

The following term queries represent some of the primitive operations on the term dictionary. In general they specify a way to match one or more terms, and then the query as a whole matches documents containing those terms.

Term Query

The most basic query is a single term query. Only documents that contain the term will match.

search("fish")
// res0: List[Book] = List(
//   Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish")
// )
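
To make the "term dictionary" idea concrete, here is a minimal sketch of an inverted index over the four titles. It is a hand-rolled illustration using only the Scala standard library, not protosearch's implementation. The names sketchTitles, sketchTokenize, sketchTermDict and lookupTerm are hypothetical, and the tokenizer is a naive stand-in for the real analyzer.

```scala
// Illustrative only: a hand-rolled inverted index, not protosearch's implementation.
val sketchTitles: Vector[String] = Vector(
  "The Tale of Peter Rabbit",
  "The Tale of Two Bad Mice",
  "One Fish, Two Fish, Red Fish, Blue Fish",
  "Green Eggs and Ham")

// Naive analysis: lowercase, then split on anything that is not a letter or digit.
def sketchTokenize(text: String): List[String] =
  text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toList

// Term dictionary: each distinct term maps to the ids of the documents containing it.
val sketchTermDict: Map[String, Set[Int]] =
  sketchTitles.zipWithIndex
    .flatMap { case (title, id) => sketchTokenize(title).map(term => (term, id)) }
    .groupBy(_._1)
    .map { case (term, pairs) => (term, pairs.map(_._2).toSet) }

// A term query is then a single dictionary lookup.
def lookupTerm(term: String): Set[Int] =
  sketchTermDict.getOrElse(term, Set.empty[Int])

val fishDocs: Set[Int] = lookupTerm("fish") // Set(2): only the fish title
```

The other term queries in this section can be read the same way: each one first picks a set of terms from the dictionary, and the matching documents follow from those terms.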

Prefix Query

A prefix query, written with a trailing *, selects all terms that start with the given prefix, and then matches all documents containing any of those terms.

search("egg*")
// res1: List[Book] = List(Book("Dr. Seuss", "Green Eggs and Ham"))
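
One way to picture the term selection step is as a scan over a sorted term dictionary. The sketch below assumes the dictionary is kept in sorted order, which is a common design but not something this page states about protosearch. The names prefixSketchTerms and expandPrefix are hypothetical.

```scala
import scala.collection.immutable.SortedSet

// Illustrative only: the lowercased title terms from the four books, in sorted order.
val prefixSketchTerms: SortedSet[String] = SortedSet(
  "and", "bad", "blue", "eggs", "fish", "green", "ham", "mice",
  "of", "one", "peter", "rabbit", "red", "tale", "the", "two")

// Terms sharing a prefix sit next to each other in sorted order, so we can
// jump to the first candidate and stop at the first term without the prefix.
def expandPrefix(prefix: String): List[String] =
  prefixSketchTerms.iteratorFrom(prefix).takeWhile(_.startsWith(prefix)).toList

val eggTerms: List[String] = expandPrefix("egg") // List("eggs")
val tTerms: List[String] = expandPrefix("t")     // List("tale", "the", "two")
```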

Range Query

A range query similarly selects all terms that fall between two bounds, written [lower TO upper], and then matches all documents containing any of those terms.

search("[fi TO gz]") // matching 'fish' and 'green'
// res2: List[Book] = List(
//   Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
//   Book("Dr. Seuss", "Green Eggs and Ham")
// )
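
The same sorted-dictionary picture works for ranges. This sketch assumes both bounds are inclusive and that terms are compared as plain strings. Both are assumptions made for illustration, and rangeSketchTerms and expandRange are hypothetical names.

```scala
import scala.collection.immutable.SortedSet

// Illustrative only: the lowercased title terms from the four books, in sorted order.
val rangeSketchTerms: SortedSet[String] = SortedSet(
  "and", "bad", "blue", "eggs", "fish", "green", "ham", "mice",
  "of", "one", "peter", "rabbit", "red", "tale", "the", "two")

// Assumption: both bounds are inclusive and compared as plain strings.
// Start at the first term >= lower and stop once a term exceeds upper.
def expandRange(lower: String, upper: String): List[String] =
  rangeSketchTerms.iteratorFrom(lower).takeWhile(_.compareTo(upper) <= 0).toList

val fiToGz: List[String] = expandRange("fi", "gz") // List("fish", "green")
```

Here "eggs" sorts before "fi" and "ham" sorts after "gz", so only 'fish' and 'green' are selected.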

Phrase Query

A phrase query is made up of one or more terms surrounded by double quotes, and matches documents containing those terms in exactly that order.

search("\"red fish, blue fish\"")
// res3: List[Book] = List(
//   Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish")
// )
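
Phrase matching needs to know where each term occurs, not just whether it occurs. The following is a minimal sketch of that idea using per-term positions within a single document. The names phraseTokenize, termPositions and matchesPhrase are hypothetical, and the tokenizer is a naive stand-in for the real analyzer.

```scala
// Naive analysis: lowercase, then split on anything that is not a letter or digit.
def phraseTokenize(text: String): List[String] =
  text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toList

// For one document, record every position at which each term occurs.
def termPositions(tokens: List[String]): Map[String, Set[Int]] =
  tokens.zipWithIndex
    .groupBy(_._1)
    .map { case (term, pairs) => (term, pairs.map(_._2).toSet) }

// The phrase matches if some occurrence of its first term is followed by
// the remaining terms at consecutive positions.
def matchesPhrase(positions: Map[String, Set[Int]], phrase: List[String]): Boolean =
  phrase match {
    case Nil => false
    case first :: rest =>
      positions.getOrElse(first, Set.empty[Int]).exists { start =>
        rest.zipWithIndex.forall { case (term, offset) =>
          positions.getOrElse(term, Set.empty[Int]).contains(start + offset + 1)
        }
      }
  }

val fishPositions = termPositions(phraseTokenize("One Fish, Two Fish, Red Fish, Blue Fish"))
val inOrder = matchesPhrase(fishPositions, phraseTokenize("red fish, blue fish"))    // true
val outOfOrder = matchesPhrase(fishPositions, phraseTokenize("blue fish, red fish")) // false
```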

Boolean Queries

Boolean queries allow us to combine multiple queries together with boolean logic, using the OR, AND, and NOT combinators.

search("fish OR ham")
// res4: List[Book] = List(
//   Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
//   Book("Dr. Seuss", "Green Eggs and Ham")
// )
search("red AND blue")
// res5: List[Book] = List(
//   Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish")
// )
search("tale AND NOT mice")
// res6: List[Book] = List(Book("Beatrix Potter", "The Tale of Peter Rabbit"))
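
If we think of each term as the set of document ids that contain it, the three combinators line up with ordinary set operations. This is a conceptual sketch with hand-written postings, not a description of how protosearch evaluates queries. The names boolAllDocs, boolPostings and docsFor are hypothetical.

```scala
// Illustrative only: document ids 0 to 3 stand for the four books, in order.
val boolAllDocs: Set[Int] = Set(0, 1, 2, 3)

// Hand-written postings for the terms used in the examples above.
val boolPostings: Map[String, Set[Int]] = Map(
  "tale" -> Set(0, 1), "mice" -> Set(1),
  "fish" -> Set(2), "red" -> Set(2), "blue" -> Set(2),
  "ham" -> Set(3))

def docsFor(term: String): Set[Int] =
  boolPostings.getOrElse(term, Set.empty[Int])

// OR is union, AND is intersection, NOT is the complement within all documents.
val fishOrHam: Set[Int] = docsFor("fish") | docsFor("ham")                          // Set(2, 3)
val redAndBlue: Set[Int] = docsFor("red") & docsFor("blue")                         // Set(2)
val taleAndNotMice: Set[Int] = docsFor("tale") & boolAllDocs.diff(docsFor("mice"))  // Set(0)
```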

Group Query

As queries get more complex, it can be helpful to group parts together with parentheses.

search("(red OR blue OR green) AND (fish OR mice OR ham)")
// res7: List[Book] = List(
//   Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
//   Book("Dr. Seuss", "Green Eggs and Ham")
// )
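
Grouping decides the shape of the query tree. A small sketch with a hypothetical query type makes that visible: each parenthesized group becomes its own subtree, which is evaluated before the enclosing AND. SketchQuery, SketchTerm, SketchOr, SketchAnd and evalSketch are names invented for this illustration and are not protosearch's query types.

```scala
// Illustrative only: a tiny query tree with hypothetical node names.
sealed trait SketchQuery
final case class SketchTerm(value: String) extends SketchQuery
final case class SketchOr(parts: List[SketchQuery]) extends SketchQuery
final case class SketchAnd(parts: List[SketchQuery]) extends SketchQuery

// Hand-written postings: which of the four books (ids 0 to 3) contain each term.
val groupPostings: Map[String, Set[Int]] = Map(
  "red" -> Set(2), "blue" -> Set(2), "green" -> Set(3),
  "fish" -> Set(2), "mice" -> Set(1), "ham" -> Set(3))

// Evaluate bottom-up: each group is resolved to a set of ids before it is combined.
def evalSketch(query: SketchQuery): Set[Int] = query match {
  case SketchTerm(value) => groupPostings.getOrElse(value, Set.empty[Int])
  case SketchOr(parts)   => parts.map(evalSketch).foldLeft(Set.empty[Int])(_ | _)
  case SketchAnd(parts)  => parts.map(evalSketch).reduceOption(_ & _).getOrElse(Set.empty[Int])
}

// (red OR blue OR green) AND (fish OR mice OR ham)
val groupedQuery: SketchQuery = SketchAnd(List(
  SketchOr(List(SketchTerm("red"), SketchTerm("blue"), SketchTerm("green"))),
  SketchOr(List(SketchTerm("fish"), SketchTerm("mice"), SketchTerm("ham")))))

val groupedDocs: Set[Int] = evalSketch(groupedQuery) // Set(2, 3)
```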

Field Query

The field query allows a user to specify which field a query should match. Field queries work with term and prefix queries:

search("author:seuss")
// res8: List[Book] = List(
//   Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
//   Book("Dr. Seuss", "Green Eggs and Ham")
// )
search("author:beatri*")
// res9: List[Book] = List(
//   Book("Beatrix Potter", "The Tale of Peter Rabbit"),
//   Book("Beatrix Potter", "The Tale of Two Bad Mice")
// )

Additionally, field queries can take more complex boolean queries if specified in a group:

search("author:([b TO e] AND NOT dr*)")
// res10: List[Book] = List(
//   Book("Beatrix Potter", "The Tale of Peter Rabbit"),
//   Book("Beatrix Potter", "The Tale of Two Bad Mice")
// )
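
A simple way to think about field queries is that every field has its own term dictionary, and the field name picks which dictionary to look in. The sketch below uses hand-written entries and hypothetical names (fieldSketchIndex, lookupInField). It does not cover how a query without a field name is resolved.

```scala
// Illustrative only: one small term dictionary per field, with hand-written entries.
val fieldSketchIndex: Map[String, Map[String, Set[Int]]] = Map(
  "author" -> Map(
    "beatrix" -> Set(0, 1), "potter" -> Set(0, 1),
    "dr" -> Set(2, 3), "seuss" -> Set(2, 3)),
  "title" -> Map(
    "tale" -> Set(0, 1), "fish" -> Set(2), "ham" -> Set(3)))

// Look the term up only in the named field's dictionary.
def lookupInField(field: String, term: String): Set[Int] =
  fieldSketchIndex.get(field).flatMap(_.get(term)).getOrElse(Set.empty[Int])

val seussAsAuthor: Set[Int] = lookupInField("author", "seuss") // Set(2, 3)
val seussAsTitle: Set[Int] = lookupInField("title", "seuss")   // Set(): not a title term
```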

Regex Query

Regex queries use regular expressions for more flexible matching than prefix or range queries allow.

search("/jump.*")
// res11: List[Book] = List()
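
An empty result is what we would expect if no term in these books starts with "jump". As a rough sketch of that idea, we can filter a list of terms with a compiled pattern. This assumes the pattern has to match a whole term, which is an assumption for illustration rather than a statement about protosearch. The names regexSketchTerms and expandRegex are hypothetical.

```scala
import java.util.regex.Pattern

// Illustrative only: the lowercased title terms from the four books.
val regexSketchTerms: List[String] = List(
  "and", "bad", "blue", "eggs", "fish", "green", "ham", "mice",
  "of", "one", "peter", "rabbit", "red", "tale", "the", "two")

// Assumption: the pattern must match the whole term, not just part of it.
def expandRegex(pattern: String): List[String] = {
  val compiled = Pattern.compile(pattern)
  regexSketchTerms.filter(term => compiled.matcher(term).matches())
}

val jumpTerms: List[String] = expandRegex("jump.*") // List(): no term starts with "jump"
val rTerms: List[String] = expandRegex("r.*")       // List("rabbit", "red")
```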