Search Queries

The core concept here is search queries.

Concept

A search query is an object, which is uniquely identified by its parameters list—an unordered set of key/value pairs. The content of a search query is its results list—an ordered set of content objects. (For a search query to be correct, at any given time, its results list must contain all and only those content objects which are, at that time, matched by the specified parameters.)
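To make the identity rule concrete, here is a minimal sketch. The class name `SearchQuery`, the constructor `of`, and the "every parameter equals the object's field of the same name" matching rule are all assumptions made for illustration; they are not part of any existing system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class SearchQuery:
    # The parameters list: an unordered set of (key, value) pairs.
    # A frozenset ignores ordering and is hashable, so two queries built
    # from the same pairs compare equal and can share one cache entry.
    params: frozenset

    @classmethod
    def of(cls, **pairs: Any) -> "SearchQuery":
        # Values must be hashable (strings, numbers, tuples, ...).
        return cls(frozenset(pairs.items()))

    def results(self, objects: Iterable[dict],
                sort_key: Callable[[dict], Any]) -> List[dict]:
        # Assumed matching rule: an object matches when every parameter
        # equals the object's field of the same name.
        wanted = dict(self.params)
        hits = [o for o in objects
                if all(o.get(k) == v for k, v in wanted.items())]
        return sorted(hits, key=sort_key)  # the results list is ordered


a = SearchQuery.of(author="alice", kind="post")
b = SearchQuery.of(kind="post", author="alice")
same_query = (a == b)  # parameter order does not affect identity
```

Because identity depends only on the parameters, a system could key a cache of results lists directly on the query object.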

Content objects

What is a content object? In a basic implementation of a system like WL, a content object might be “any post or comment”. Other things could be included, though. For example, could a media asset, such as an image, a video clip, a file attachment, etc., be a content object? It could. Could a user profile be a content object? Sure, why not? (In general, the set of all content objects will, or should, be the set of all taggable entities.)

Parameters

What parameters ought our specification to permit? The following are some obvious candidates:

  • “Contains string” (for simple string search in content objects which are, or contain, string data)
  • “Contents match regexp” (as above, but regexp search instead of string search)
  • “[metadata field X] is { equal to | contains } given value”
  • for numeric-type metadata: “value of [metadata field X] is { less than | greater than } given value”
  • pre-constructed higher-order functions which build on any of the above
    • e.g. “contents contain hyperlinks to any/all of [list of content objects]”, which may be replicated via regexps, etc., but the more of these the system can provide, the better
  • nested Boolean combinations of any of the above
  • one or more fields on which to order the results list
    • e.g., by creation time; alphabetically; by karma; etc.
  • count & offset (i.e., “get only the first N entries”; “get N entries, at offset M”)

This list is not exhaustive.
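One way to picture these parameters is as small composable predicates. The sketch below is only an illustration: it assumes content objects are plain dictionaries with a `body` string and arbitrary metadata fields, and every function name in it is hypothetical.

```python
import re
from typing import Any, Callable, List, Optional

Predicate = Callable[[dict], bool]


def contains(text: str) -> Predicate:
    # "Contains string"
    return lambda obj: text in obj.get("body", "")


def matches_regexp(pattern: str) -> Predicate:
    # "Contents match regexp"
    rx = re.compile(pattern)
    return lambda obj: rx.search(obj.get("body", "")) is not None


def field_equals(name: str, value: Any) -> Predicate:
    return lambda obj: obj.get(name) == value


def field_greater_than(name: str, value: Any) -> Predicate:
    return lambda obj: name in obj and obj[name] > value


def all_of(*preds: Predicate) -> Predicate:
    # Boolean AND; nests freely with any_of and negate.
    return lambda obj: all(p(obj) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    return lambda obj: any(p(obj) for p in preds)


def negate(pred: Predicate) -> Predicate:
    return lambda obj: not pred(obj)


def run_query(objects: List[dict], where: Predicate, order_by: str,
              descending: bool = False, count: Optional[int] = None,
              offset: int = 0) -> List[dict]:
    # Filter, then order, then apply count & offset.
    hits = sorted((o for o in objects if where(o)),
                  key=lambda o: o[order_by], reverse=descending)
    end = None if count is None else offset + count
    return hits[offset:end]


posts = [
    {"id": 1, "body": "live search queries", "karma": 5},
    {"id": 2, "body": "RSS feeds and search", "karma": 12},
    {"id": 3, "body": "unrelated note", "karma": 30},
]
where = all_of(contains("search"),
               any_of(field_greater_than("karma", 10), matches_regexp(r"^live")))
top = run_query(posts, where, order_by="karma", descending=True)
```

Here `top` holds posts 2 and 1, in that order: both contain "search", post 2 passes the karma test, and post 1 passes the regexp test.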

Applications

What good is this?

Basic applications

Viewed through the lens of the search query (as defined above), an actual search feature (as in, “type some text into a text field, click Search, and be presented with a page of links to things that somehow ‘match’ your search”) is understood to be “input a definition [using some suitably defined syntax] of a parameters list into a text field, click Search, and be presented with a page which displays the results list [with each entry transcluded, linked, excerpted, etc., as appropriate] of the search query which the specified parameters list identifies”.

So far, so basic. But our definition is flexible enough to fit other contexts. For example:

We may expose an API endpoint which accepts a parameters list defined via GET parameters. This endpoint, when queried, responds with an XML document, which describes an RSS feed containing the results list of the search query which the specified parameters list identifies. We now, with no further work, have “dynamic RSS feeds”.
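As a rough sketch of such an endpoint's core, the function below turns a GET query string into an RSS 2.0 document using only the Python standard library. The sample content, the `example.com` URLs, the function name `feed_for`, and the exact-match rule are assumptions for illustration; a real endpoint would sit behind whatever web framework the site uses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl

# Hypothetical content store, already ordered newest-first.
CONTENT = [
    {"title": "Live queries", "url": "https://example.com/p/2", "tag": "search"},
    {"title": "Cooking notes", "url": "https://example.com/p/3", "tag": "food"},
    {"title": "Dynamic feeds", "url": "https://example.com/p/1", "tag": "search"},
]


def feed_for(query_string: str) -> str:
    # The GET parameters are the parameters list of the search query.
    params = dict(parse_qsl(query_string))
    hits = [o for o in CONTENT
            if all(k in o and str(o[k]) == v for k, v in params.items())]

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = "Results for " + query_string
    ET.SubElement(channel, "link").text = "https://example.com/search?" + query_string
    ET.SubElement(channel, "description").text = "Results list of a search query"
    for o in hits:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = o["title"]
        ET.SubElement(item, "link").text = o["url"]
    return ET.tostring(rss, encoding="unicode")


xml_text = feed_for("tag=search")
```

Any feed reader pointed at a URL with different GET parameters would then see a different feed, with no per-feed configuration on the server.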

We can get even fancier, though. As we said, search queries are objects. As objects, can they have persistence? Sure. As persistent objects, can search queries have other useful properties / functionality? Yes.

Advanced applications

A persistent search query, to be useful, must be a live search query. A live search query has the property that it is always correct (its results list always contains all and only those content objects which match its parameters), rather than merely being guaranteed to be correct when accessed in response to a user action (as is the case with ordinary, non-persistent search queries).

(How might this be implemented? Many possibilities exist, which are differentiated mostly by the time-granularity of their correctness guarantee.)

What good are live search queries?

A live search query might have a list of subscribers. Each subscriber is a user. When the results list of a live search query changes, each subscriber gets a notification.
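A minimal sketch of the subscriber mechanism follows. It picks the finest time-granularity, re-evaluating every live query on every write, purely for simplicity; the class names `LiveQuery` and `ContentStore` and the callback signature are assumptions, and a real system might batch or poll instead.

```python
from typing import Any, Callable, List


class LiveQuery:
    def __init__(self, predicate: Callable[[dict], bool],
                 sort_key: Callable[[dict], Any]) -> None:
        self.predicate = predicate
        self.sort_key = sort_key
        self.results: List[dict] = []
        self.subscribers: List[Callable[["LiveQuery", List[dict]], None]] = []

    def subscribe(self, notify: Callable[["LiveQuery", List[dict]], None]) -> None:
        self.subscribers.append(notify)

    def refresh(self, objects: List[dict]) -> None:
        new_results = sorted(filter(self.predicate, objects), key=self.sort_key)
        if new_results != self.results:
            # Notify only when the results list actually changed.
            self.results = new_results
            for notify in self.subscribers:
                notify(self, new_results)


class ContentStore:
    def __init__(self) -> None:
        self.objects: List[dict] = []
        self.live_queries: List[LiveQuery] = []

    def add(self, obj: dict) -> None:
        self.objects.append(obj)
        for query in self.live_queries:
            query.refresh(self.objects)


store = ContentStore()
rss_posts = LiveQuery(lambda o: o.get("tag") == "rss", sort_key=lambda o: o["id"])
store.live_queries.append(rss_posts)

inbox: List[int] = []  # stands in for a user's notification queue
rss_posts.subscribe(lambda query, results: inbox.append(len(results)))

store.add({"id": 1, "tag": "rss"})      # changes the results list: notified
store.add({"id": 2, "tag": "cooking"})  # does not match: no notification
store.add({"id": 3, "tag": "rss"})      # changes the results list: notified
```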

A search query (live or otherwise) might also have defined views. A view is a rendering of the search query’s results list (into HTML, say, and with some particular structure, formatting, etc.). Views may be built-in or user-defined. A live search query’s views would also update automatically when its results list updates. This allows results lists of search queries to be embedded as “live content” into other content (or views thereon) with negligible performance penalty.

A search query (live or otherwise) may have computed properties. These may range from a simple “count” property to all sorts of aggregative (e.g., statistical), conditional, or other properties. Computed properties may be built-in or user-defined (and a live search query would allow user-defined computed properties to be saved and permanently associated with itself). A live search query would update its computed properties automatically when its results list changed. Views on a live search query would be able to reference its computed properties. This, too, would allow powerful embedding functionality with minimal performance penalty.
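Computed properties might look something like the sketch below, in which each property is a function of the results list and is recomputed whenever that list is refreshed. The class name, the `define` and `render` methods, and the use of `str.format` as a stand-in for a view template are all assumptions for illustration.

```python
from statistics import mean
from typing import Any, Callable, Dict, List


class QueryWithProperties:
    def __init__(self, predicate: Callable[[dict], bool]) -> None:
        self.predicate = predicate
        self.results: List[dict] = []
        # Built-in computed property: "count".
        self._definitions: Dict[str, Callable[[List[dict]], Any]] = {"count": len}
        self.properties: Dict[str, Any] = {}

    def define(self, name: str, fn: Callable[[List[dict]], Any]) -> None:
        # User-defined computed property, saved with the query.
        self._definitions[name] = fn
        self._recompute()

    def update(self, objects: List[dict]) -> None:
        self.results = [o for o in objects if self.predicate(o)]
        self._recompute()

    def _recompute(self) -> None:
        self.properties = {name: fn(self.results)
                           for name, fn in self._definitions.items()}

    def render(self, template: str) -> str:
        # A view may reference computed properties by name.
        return template.format(**self.properties)


query = QueryWithProperties(lambda o: o.get("kind") == "post")
query.define("mean_karma",
             lambda rs: mean(r["karma"] for r in rs) if rs else None)
query.update([
    {"kind": "post", "karma": 4},
    {"kind": "comment", "karma": 100},
    {"kind": "post", "karma": 8},
])
summary = query.render("{count} posts")
```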

Other applications are possible.

Further reading

  1. PmWiki pagelists
  2. John Siracusa on Spotlight
  3. John Siracusa on the Finder

Further notes

Almost every page of content on GreaterWrong is (that is, can be understood as) a view on a search query.

The front page is a view on a query of the form “posts in the ‘frontpage’ category; order by { creation time, descending | ‘hotness’ value [a metadata field] }; first 20 only”. The view adds, among other things, links to similar views on other, similar queries (such as “ditto but #s 21–40”).
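The front-page query and its "ditto but #s 21–40" sibling differ only in their offset, which the following sketch tries to show. The field names `category` and `created`, the function name `front_page`, and the sample data are assumptions, not a description of how GreaterWrong is implemented.

```python
from typing import List, Optional, Tuple


def front_page(posts: List[dict], offset: int = 0,
               count: int = 20) -> Tuple[List[dict], Optional[int]]:
    # "posts in the 'frontpage' category; order by creation time, descending"
    hits = sorted((p for p in posts if p["category"] == "frontpage"),
                  key=lambda p: p["created"], reverse=True)
    page = hits[offset:offset + count]
    # The view links to the same query at the next offset, if there is more.
    next_offset = offset + count if offset + count < len(hits) else None
    return page, next_offset


# 45 front-page posts plus some others, with increasing creation times.
posts = [{"id": i, "category": "frontpage", "created": i} for i in range(45)]
posts += [{"id": 100 + i, "category": "personal", "created": i} for i in range(5)]

first_page, next_offset = front_page(posts)
second_page, _ = front_page(posts, offset=next_offset)
```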

A post page is a view on a query of the form “the first post named <name> [this will, of course, return exactly 1 post]”. The view embeds another view, this one on a search query of the form “all comments which are direct replies to this post”; for each comment in its results list, that view displays the comment, then embeds itself (but with that comment in place of the post); this leads to view recursion, a.k.a. threading.

An alternative formulation of the preceding paragraph is as follows:

A post page is a view on a query of the form “the first post named <name>”. The view embeds another view—this one on a search query of the form “all comments which are associated with this post”; that view arranges the comments hierarchically, each comment placed beneath its parent in a tree structure. (This formulation requires that views be able to call upon non-trivial logic, which is functionality that the system may or may not have.)
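The two formulations can be compared side by side in a small sketch. The comment records, their `post` and `parent` fields, and both function names are hypothetical; the point is only that a view which recursively embeds itself and a view which arranges one flat results list into a tree produce the same threading.

```python
from collections import defaultdict
from typing import Dict, List

COMMENTS = [
    {"id": 1, "post": "p1", "parent": "p1", "text": "First"},
    {"id": 2, "post": "p1", "parent": 1, "text": "Reply to first"},
    {"id": 3, "post": "p1", "parent": "p1", "text": "Second"},
    {"id": 4, "post": "p1", "parent": 2, "text": "Nested reply"},
]


def render_recursive(parent, depth: int = 0) -> List[str]:
    # First formulation: one query per level ("all replies to <parent>"),
    # with the view embedding itself for each comment it displays.
    replies = sorted((c for c in COMMENTS if c["parent"] == parent),
                     key=lambda c: c["id"])
    lines: List[str] = []
    for comment in replies:
        lines.append("  " * depth + comment["text"])
        lines.extend(render_recursive(comment["id"], depth + 1))
    return lines


def render_tree(post: str) -> List[str]:
    # Second formulation: a single query ("all comments associated with
    # this post"), which the view then arranges hierarchically.
    results = sorted((c for c in COMMENTS if c["post"] == post),
                     key=lambda c: c["id"])
    children: Dict[object, List[dict]] = defaultdict(list)
    for comment in results:
        children[comment["parent"]].append(comment)

    lines: List[str] = []

    def walk(parent, depth: int) -> None:
        for comment in children.get(parent, []):
            lines.append("  " * depth + comment["text"])
            walk(comment["id"], depth + 1)

    walk(post, 0)
    return lines


threaded = render_recursive("p1")
```

The first version issues many small queries and keeps the view trivial; the second issues one query and puts the tree-building logic in the view.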

A user page is a view on a query of the form “all posts and comments authored by <author>; order by { creation time, descending | karma value }; first 20 only”.

All of these content views, when seen in the properly general way (i.e., as views on search queries), naturally become viewable via RSS, exportable in arbitrary formats (JSON, etc.), transcludable, analyzable, etc.

Data export

An API endpoint that accepts a parameters list via GET and responds with the results list in (for instance) JSON format constitutes an extremely powerful and straightforward data export feature.
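A minimal sketch of the export logic, using only the Python standard library, might look like this. The function name `export_json`, the shape of the response, and the exact-match rule are assumptions for illustration.

```python
import json
from typing import List
from urllib.parse import parse_qsl


def export_json(query_string: str, objects: List[dict]) -> str:
    # The GET parameters are the parameters list of the search query.
    params = dict(parse_qsl(query_string))
    hits = [o for o in objects
            if all(k in o and str(o[k]) == v for k, v in params.items())]
    # Echo the parameters back so the export is self-describing.
    return json.dumps({"parameters": params, "results": hits}, indent=2)


content = [
    {"id": 1, "author": "alice", "kind": "post"},
    {"id": 2, "author": "bob", "kind": "post"},
    {"id": 3, "author": "alice", "kind": "comment"},
]
exported = export_json("author=alice&kind=post", content)
```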

Custom content views

An HTML-rendering view on a search query can be inserted into modular content areas. This allows a site administrator (or the administrator of a “subforum” or similar “site area”) to easily construct layouts with custom content views. Even more powerfully, it allows each user to construct custom layouts for pages they control (thus allowing easy and customizable construction of “user dashboards”, “personal pages”, etc.).