lucille

Lucille is a small library for parsing and representing queries using the Lucene query syntax.

Usage

This library is currently available for Scala binary versions 2.12, 2.13, and 3.

Additionally, it's available for the JVM, Scala.js, and Scala Native.

To use the latest version, include the following in your build.sbt:

// use this snippet for the JVM
libraryDependencies += "pink.cozydev" %% "lucille" % "0.0.2"

// use this snippet for JS, Native, or cross-building
libraryDependencies += "pink.cozydev" %%% "lucille" % "0.0.2"

Parsing

Lucille offers a parse function to parse a whole string into a Lucille Query structure:

import pink.cozydev.lucille.QueryParser

QueryParser.default.parse("cats OR dogs")
// res0: Either[String, pink.cozydev.lucille.Query] = Right(
//   value = Or(
//     qs = NonEmptyList(
//       head = Term(str = "cats"),
//       tail = List(Term(str = "dogs"))
//     )
//   )
// )

The default QueryParser automatically inserts an OR operation inbetween consecutive terms.

QueryParser.withDefaultOperatorOR.parse("cats dogs")
// res1: Either[String, pink.cozydev.lucille.Query] = Right(
//   value = Or(
//     qs = NonEmptyList(
//       head = Term(str = "cats"),
//       tail = List(Term(str = "dogs"))
//     )
//   )
// )

This can be changed to an AND operation via withDefaultOperatorAND:

QueryParser.withDefaultOperatorAND.parse("cats dogs")
// res2: Either[String, pink.cozydev.lucille.Query] = Right(
//   value = And(
//     qs = NonEmptyList(
//       head = Term(str = "cats"),
//       tail = List(Term(str = "dogs"))
//     )
//   )
// )

Printing

Lucille offers a printer to format Querys as Lucene query strings:

import pink.cozydev.lucille.Query
import pink.cozydev.lucille.QueryPrinter

QueryPrinter.print(Query.And(Query.Term("cats"), Query.Term("dogs")))
// res3: String = "cats AND dogs"

Because the numeric value of a query boost parameter is modelled as a Float, the query printer has a precision parameter it uses to round the boost parameter for pretty printing:

val queryWithBoost = Query.Boost(Query.Phrase("apple pi"), 3.14159265f)
// queryWithBoost: Query.Boost = Boost(
//   q = Phrase(str = "apple pi"),
//   boost = 3.1415927F
// )

// the default precision is 2
QueryPrinter.print(queryWithBoost)
// res4: String = "\"apple pi\"^3.14"

QueryPrinter.print(queryWithBoost, precision=5)
// res5: String = "\"apple pi\"^3.14159"

Last Query Rewriting

To enable a better interactive search experience, it can be helpful to rewrite the last term as a prefix term to enable partial matching on terms.

We'll write a helper function expandQ to rewrite Term queries into a query that matches either that term OR a Prefix query:

import pink.cozydev.lucille.Query

def expandQ(q: Query): Query =
  q match {
    case Query.Term(t) => Query.Or(Query.Term(t), Query.Prefix(t))
    case _ => q
  }

We can now use expandQ along with mapLastTerm to rewrite the last term of a Query into our expanded term + prefix:

QueryParser.parse("cats meo").map(mq => mq.mapLastTerm(expandQ))
// res6: Either[String, Query] = Right(
//   value = Or(
//     qs = NonEmptyList(
//       head = Term(str = "cats"),
//       tail = List(
//         Or(
//           qs = NonEmptyList(
//             head = Term(str = "meo"),
//             tail = List(Prefix(str = "meo"))
//           )
//         )
//       )
//     )
//   )
// )

This also works when the last term is part of a boolean or field query.

QueryParser.parse("cats AND do").map(mq => mq.mapLastTerm(expandQ))
// res7: Either[String, Query] = Right(
//   value = And(
//     qs = NonEmptyList(
//       head = Term(str = "cats"),
//       tail = List(
//         Or(
//           qs = NonEmptyList(
//             head = Term(str = "do"),
//             tail = List(Prefix(str = "do"))
//           )
//         )
//       )
//     )
//   )
// )

Associativity

Queries may contain a mix of AND/OR operators, e.g. cats AND dogs OR fish. It is best to add parenthesis to help indicate your intent, either (cats AND dogs) OR fish or cats AND (dogs OR fish), as both of these queries could evaluate differently. In the absence of clarifying parenthesis, Lucille parses according to the precedence of the boolean operators. The highest and most immediately binding operator is NOT, then AND and finally OR.

Consider the following examples:

NOT a AND b         ->  (NOT A) AND b
a AND NOT b         ->  A AND (NOT b)
a AND b OR x        ->  (a AND b) OR x
a AND b OR x AND y  ->  (a AND b) OR (x AND y)
a AND b AND c OR x  ->  (a AND b AND c) OR x

It's worth noting that the last example could equivalently be written as ((a AND b) AND c) OR x or (a AND (b AND c)) OR x. However, Lucille parses sequences of repeated operators into a single Query.And or Query.Or node to avoid unnecessary nesting.