WordPress Search Improvements &
the Wonders of Elasticsearch

By Flynn O’Connor - @thoronas

My name is
Flynn O’Connor

  • Developed WordPress themes for 5 years.
  • Lead developer at Forge and Smith
  • Spent the last year investigating ways to improve WordPress's internal search

In the beginning...

  • WordPress search does a simple text match (SQL LIKE) on post title and post content.
  • Before WordPress 3.7, search results were ordered by post date.
  • WordPress 3.7 added basic relevance ordering to search results.

WordPress is growing

  • Text search in MySQL scales poorly.
  • It becomes practically unusable at scale.
  • Users need more fine-grained control.
  • Search can be an important part of content strategy.

What are your users doing?

By tracking and analyzing user search behaviour, we can learn:

  • Where our site navigation is failing.
  • What people are interested in.
  • How we can improve site profitability (for e-commerce sites)

Search Analytics

The cheapest and easiest way is via Google Analytics.

Where to set search analytics

What to set

By default, WordPress uses "s" as the search query parameter (e.g. ?s=beer). It is also the search parameter in WP_Query.
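Site search tracking in Google Analytics is configured with the name of the query parameter, which for WordPress is "s". As an illustration of what that parameter carries, here is a minimal sketch that pulls the search term out of a results URL; the helper name get_search_term_from_url is hypothetical.

```php
<?php
// Hypothetical helper: extract the WordPress search term ("s") from a URL,
// the same parameter Google Analytics is told to treat as site search.
function get_search_term_from_url( $url ) {
  $query_string = parse_url( $url, PHP_URL_QUERY );
  if ( ! is_string( $query_string ) ) {
    return null; // the URL has no query string at all
  }
  // parse_str() decodes the query string, turning "+" into spaces.
  parse_str( $query_string, $params );
  return isset( $params['s'] ) ? $params['s'] : null;
}

echo get_search_term_from_url( 'http://example.com/?s=craft+beer' ); // prints: craft beer
```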

Google Analytics provides fantastic in-depth search stats.

For more information about analyzing site search data, read "Internal Site Search Analysis: Simple, Effective, Life Altering!" by Avinash Kaushik.

User Behaviour

  • Search functionality can differ from site to site.
  • This will depend on your content.
  • It will also depend on your users.
    • Are your users technologically savvy?
    • Are they familiar with your content?
    • How did they find your site in the first place?

User Goals

A Taxonomy of Web Search, by Andrei Broder, is a research paper describing the three broad categories most searches fall into.

  • Navigational searches
  • Informational searches
  • Transactional searches

Precision or Recall?

Understanding user goals provides insight for tuning your search for their needs.

  • Precision: returning mostly relevant documents (few false matches)
  • Recall: returning all the relevant documents (few misses)
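Both measures are simple ratios, which makes the trade-off concrete. A minimal sketch, assuming you have a list of returned post IDs and a (hand-judged) list of relevant post IDs; the function name precision_recall is hypothetical and IDs are assumed to be unique.

```php
<?php
// Precision: share of the returned results that are relevant.
// Recall: share of all relevant documents that were returned.
function precision_recall( array $returned_ids, array $relevant_ids ) {
  $hits      = count( array_intersect( $returned_ids, $relevant_ids ) );
  $precision = count( $returned_ids ) ? $hits / count( $returned_ids ) : 0.0;
  $recall    = count( $relevant_ids ) ? $hits / count( $relevant_ids ) : 0.0;
  return array( 'precision' => $precision, 'recall' => $recall );
}

// 4 results returned, 2 of them relevant, 3 relevant posts exist in total.
$scores = precision_recall( array( 1, 2, 3, 4 ), array( 2, 4, 5 ) );
// precision = 2/4 = 0.5, recall = 2/3
```

Returning more results tends to raise recall and lower precision, which is why the user's goal should decide which one to favour.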

Search Anti-Patterns

Common patterns that result in counterproductive search experiences.

Thrashing

If users are unfamiliar with your content, they may enter keywords that don't reflect their intentions. Trying variations of flawed search terms keeps returning poor results, which leads to frustrated users.

Pogo Sticking

A user constantly jumps back and forth between the search results page and individual results.

Search Patterns

Your search engine results page (SERP) needs to be structured in a way that helps users find the results they're looking for.

  • Show enough relevant information for each result.
  • Style visited links
  • Highlight search terms
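Highlighting search terms on the SERP can be done when rendering each excerpt. A minimal sketch, assuming plain-text excerpts and a space-separated query; highlight_search_terms is a hypothetical helper, and htmlspecialchars stands in for WordPress's own escaping functions so the example runs outside WordPress.

```php
<?php
// Wrap every occurrence of any search term in <mark> tags.
// Each piece of text is escaped separately so content can't inject markup
// and the <mark> tags themselves are never escaped.
function highlight_search_terms( $text, $search_query ) {
  $terms = preg_split( '/\s+/', trim( $search_query ), -1, PREG_SPLIT_NO_EMPTY );
  if ( ! $terms ) {
    return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8' );
  }
  $quoted  = array_map( function ( $term ) { return preg_quote( $term, '/' ); }, $terms );
  $pattern = '/(' . implode( '|', $quoted ) . ')/iu';
  // Split on matches but keep them: odd indexes hold the matched terms.
  $parts = preg_split( $pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE );
  $html  = '';
  foreach ( $parts as $i => $part ) {
    $safe  = htmlspecialchars( $part, ENT_QUOTES, 'UTF-8' );
    $html .= ( $i % 2 ) ? '<mark>' . $safe . '</mark>' : $safe;
  }
  return $html;
}

echo highlight_search_terms( 'Beer & brewing: a beer guide', 'beer' );
// prints: <mark>Beer</mark> &amp; brewing: a <mark>beer</mark> guide
```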

Auto Complete

Help users find exactly what they are looking for faster.

WP Search Suggest by Konstantin Obenland

Suggest Alternative Queries

Guide the user when they are exploring your content.

Relevanssi by Mikko Saari

Advanced Functionality

Control how heavily each part of your content is weighted for relevance.

Search post meta, taxonomies and custom post types.

Keyword stemming

SearchWP by Jonathan Christopher

What are Stemmers?

Stemmers reduce words to their root, so variations of a word produce the same token:

"Computers, Computing, Computes"

[Comput]
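To make the idea concrete, here is a deliberately naive suffix-stripping stemmer that reproduces the example above. It is a toy for illustration only and not the algorithm any of the plugins use; Elasticsearch ships real stemmers such as the porter_stem token filter. The function name naive_stem is hypothetical.

```php
<?php
// Naive stemmer: strip the first matching suffix, longest suffixes first,
// as long as a stem of at least 3 characters remains.
function naive_stem( $word ) {
  $word     = strtolower( $word );
  $suffixes = array( 'ing', 'ers', 'es', 'er', 'ed', 's' );
  foreach ( $suffixes as $suffix ) {
    if ( substr( $word, -strlen( $suffix ) ) === $suffix ) {
      $stem = substr( $word, 0, -strlen( $suffix ) );
      if ( strlen( $stem ) >= 3 ) {
        return $stem;
      }
    }
  }
  return $word;
}

$tokens = array_map( 'naive_stem', array( 'Computers', 'Computing', 'Computes' ) );
// every entry is "comput", so all three words match the same token
```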

Facets

Filter search results by taxonomies or metadata.

FacetWP by Matt Gibbs

Growing beyond
internal solutions

Eventually, performance issues become too difficult to overcome within WordPress and MySQL alone.

At that point, third-party search engines that integrate with WordPress search are a better solution.

Resources

Questions?

Before we dive into Elasticsearch,
are there questions so far?

Why Use Elasticsearch

  • Very fast
  • Scales well
  • Advanced queries
  • Robust faceting
  • Great geo search!

How to integrate Elasticsearch with WordPress

  • Create an index and map how documents are stored
  • Index WordPress posts in Elasticsearch
  • Hijack WordPress default search
    • Query Elasticsearch
    • Replace WordPress search query with results

Connecting to Elasticsearch

The HTTP API

Use the WordPress HTTP API (wp_remote_request) to make remote requests from WordPress to Elasticsearch.

$url = 'http://local.wordpress.dev:9200/{index}/{type}/{action}';
$args = array('method' => 'GET');
$response = wp_remote_request( $url, $args );

Let's create an index

Creating a basic index in Elasticsearch.

$url = 'http://local.wordpress.dev:9200/wp-index';
$args = array('method' => 'PUT');
$response = wp_remote_request( $url, $args );

Mapping WordPress post data

Specify a document type called post in our index.

$mapping = array(
  'mappings' => array(
    'post' => array(
      // post property field mappings go here
    ),
  ),
  'settings' => array(
    // custom settings & analysis go here
  ),
);

Inside the post mappings

'post' => array(
  'properties' => array(
    'post_title' => array( 'type' => 'string' ),
    'post_content' => array( 'type' => 'string' ),
    'post_id' => array( 'type' => 'long' ),
    'post_date' => array(
      'type' => 'date',
      // matches the format WordPress stores post_date in (Y-m-d H:i:s)
      'format' => 'yyyy-MM-dd HH:mm:ss',
    ),
  ),
)
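Putting the mapping wrapper and the post properties together, the whole index definition can be sent as the body of the PUT request that creates wp-index. A sketch, assuming the Elasticsearch 1.x-era field types used throughout these slides; build_index_body() is a hypothetical helper, and the shard count is only an illustrative setting (it also keeps "settings" from encoding as an empty JSON array).

```php
<?php
// Hypothetical helper: the full index definition (settings + mappings).
function build_index_body() {
  return array(
    'settings' => array(
      'number_of_shards' => 1, // illustrative value only
    ),
    'mappings' => array(
      'post' => array(
        'properties' => array(
          'post_title'   => array( 'type' => 'string' ),
          'post_content' => array( 'type' => 'string' ),
          'post_id'      => array( 'type' => 'long' ),
          'post_date'    => array(
            'type'   => 'date',
            'format' => 'yyyy-MM-dd HH:mm:ss',
          ),
        ),
      ),
    ),
  );
}

$body = json_encode( build_index_body() );
// Inside WordPress this body would go to the index-creation request:
// wp_remote_request( 'http://local.wordpress.dev:9200/wp-index',
//   array( 'method' => 'PUT', 'body' => $body ) );
```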

Elasticsearch Core Field Types

  • string - text
  • integer - 32 bit integers
  • long - 64 bit integers
  • float - IEEE floats
  • double - double precision floats
  • boolean - true/false
  • date - UTC date/time
  • geo_point - Lat/Long

Advanced Field Types

Elasticsearch also supports array, object, and multi-field types.

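'post_author' => array(
  'type' => 'multi_field',
  'fields' => array(
    // analyzed: matches individual words in the author name
    'author' => array( 'type' => 'string' ),
    // not_analyzed: keeps the exact value for filtering and aggregations
    'author_raw' => array(
      'type' => 'string',
      'index' => 'not_analyzed',
    ),
  ),
)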

Dynamic templates!

What happens if you add new data fields? Dynamic templates tell Elasticsearch how to map new fields whose names match a pattern.

Example: Custom Taxonomies

"dynamic_templates" => array(
  array(
    "template_terms" => array(
      // applies to any new field under "terms", e.g. terms.category
      "path_match" => "terms.*",
      "mapping" => array(
        "type" => "object",
        "properties" => array(
          "name" => array( "type" => "string" ),
          "term_id" => array( "type" => "long" )
        )
      )
    )
  )
)
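The dynamic_templates array sits inside the document type's mapping, next to "properties", and the indexed documents then need a "terms" object with one key per taxonomy. A sketch under those assumptions; format_terms_for_es() is a hypothetical helper, and its input (taxonomy => array of term_id => name) is an assumed shape, not something WordPress returns directly.

```php
<?php
// The template lives inside the "post" type mapping, beside "properties".
$post_mapping = array(
  'post' => array(
    'dynamic_templates' => array(
      array(
        'template_terms' => array(
          'path_match' => 'terms.*',
          'mapping'    => array(
            'type'       => 'object',
            'properties' => array(
              'name'    => array( 'type' => 'string' ),
              'term_id' => array( 'type' => 'long' ),
            ),
          ),
        ),
      ),
    ),
    'properties' => array(
      'post_title' => array( 'type' => 'string' ),
    ),
  ),
);

// Hypothetical helper: each taxonomy becomes a "terms.<taxonomy>" field,
// which the "terms.*" template above will match.
function format_terms_for_es( array $terms_by_taxonomy ) {
  $formatted = array();
  foreach ( $terms_by_taxonomy as $taxonomy => $terms ) {
    foreach ( $terms as $term_id => $name ) {
      $formatted[ $taxonomy ][] = array( 'name' => $name, 'term_id' => $term_id );
    }
  }
  return $formatted;
}

$document = array(
  'post_title' => 'Hello',
  'terms'      => format_terms_for_es( array( 'genre' => array( 12 => 'Stout' ) ) ),
);
```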

Declaring a document type

The mapping we created is for a document type called 'post'. Send it to the index's _mapping endpoint:

$url = 'http://local.wordpress.dev:9200/wp-index/post/_mapping';
$mapping = array(
  'post' => array(
    'properties' => array(
      // all the fields we declared previously
    ),
  ),
);
$body = json_encode( $mapping );
$args = array( 'method' => 'PUT', 'body' => $body );
$response = wp_remote_request( $url, $args );

Passing Post Content to Elasticsearch

With the index and mapping in place, we can populate the index with content.

To do so we need to do the following:

  1. Get the posts within WordPress
  2. JSON encode the post data
  3. Send the post data to Elasticsearch

Getting the posts

Match the post data to our mapping.

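// Run inside The Loop so the template tags refer to the current post.
$post_for_ES = array(
  'post_title' => get_the_title(),
  'post_content' => get_the_content(),
  'post_id' => get_the_ID(),
  // must match the 'yyyy-MM-dd HH:mm:ss' format declared in the mapping
  'post_date' => get_the_date( 'Y-m-d H:i:s' ),
);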

Once the posts have been encoded, we have two options for sending them to Elasticsearch:

  • Single Posts - Use HTTP verb PUT
  • Multiple Posts - Use the Elasticsearch Bulk API

Single Post

// Use the post ID as the document ID so re-indexing overwrites the old copy.
$url = 'http://local.wordpress.dev:9200/wp-index/post/' . get_the_ID();
$post_content = json_encode( $post_for_ES );
$args = array( 'method' => 'PUT', 'body' => $post_content );
$response = wp_remote_request( $url, $args );

Bulk API

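When using the Bulk API, an action line (the index info) must precede each post's data, and the body is newline-delimited JSON: one JSON object per line, ending with a newline.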

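$url = 'http://local.wordpress.dev:9200/wp-index/post/_bulk';
$lines = array();
// For each post in the loop: an action line, then the document itself.
$lines[] = json_encode( array( 'index' => array( '_id' => get_the_ID() ) ) );
$lines[] = json_encode( array( 'post_title' => get_the_title() /* , etc. */ ) );
// The Bulk API expects newline-delimited JSON with a trailing newline.
$body = implode( "\n", $lines ) . "\n";
$args = array( 'method' => 'POST', 'body' => $body );
$response = wp_remote_request( $url, $args );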
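When indexing many posts at once, the body-building step is easy to pull into a helper that works on plain arrays. A sketch, assuming documents are keyed by post ID and sent to the /wp-index/post/_bulk endpoint; build_bulk_body() is a hypothetical name.

```php
<?php
// Hypothetical helper: build a Bulk API body from documents keyed by post ID.
// Each document gets an action line followed by its source line.
function build_bulk_body( array $docs_by_id ) {
  $lines = array();
  foreach ( $docs_by_id as $post_id => $doc ) {
    $lines[] = json_encode( array( 'index' => array( '_id' => $post_id ) ) );
    $lines[] = json_encode( $doc );
  }
  if ( ! $lines ) {
    return ''; // nothing to send
  }
  // Newline-delimited JSON; the final line must also end with "\n".
  return implode( "\n", $lines ) . "\n";
}

$body = build_bulk_body( array(
  1 => array( 'post_title' => 'Hello world' ),
  2 => array( 'post_title' => 'Beer' ),
) );
```

Batching posts this way (for example a few hundred per request) keeps each bulk request to a manageable size.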

Searching!

To hijack WordPress's default search functionality, we need to:

  • Grab the search query before querying the database
  • Query Elasticsearch instead
  • Parse the post ids from the results
  • Replace the search query with ES results

pre_get_posts to the rescue!

Use pre_get_posts to capture search query.

Query Elasticsearch and return an array of post IDs.

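function search_filter( $query ) {
  if ( ! is_admin() && $query->is_main_query() && $query->is_search() ) {
    $search_query = stripslashes( get_search_query( false ) );
    // elasticsearch_function() queries ES and returns an array of post IDs
    $elasticsearch_posts = elasticsearch_function( $search_query );
    // WP_Query ignores an empty post__in, so force "no results" instead
    $query->set( 'post__in', $elasticsearch_posts ? $elasticsearch_posts : array( 0 ) );
    $query->set( 'orderby', 'post__in' );
  }
}
add_action( 'pre_get_posts', 'search_filter' );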

Make sure to nuke the default WordPress search SQL; otherwise WordPress still applies its own LIKE clauses on top of the Elasticsearch results.

function clear_sql_search_clause( $search, $query ) {
  if ( ! is_admin() && $query->is_main_query() && $query->is_search() ) {
    $search = '';
  }
  return $search;
}
add_filter( 'posts_search', 'clear_sql_search_clause', 10, 2 );

Querying Elasticsearch

Querying documents in Elasticsearch utilizes several APIs that are nested in the Search API:

  • Query DSL - 39 different query types
  • Filter API - 27 different filter types
  • Aggregations API - 20 different facet types

Querying WordPress data

Querying multiple fields is a common requirement.

Example: query post title and post content.

$ES_query = array(
  'query' => array(
    // specify query type
    'multi_match' => array(
      // the query term
      'query' => 'beer',
      // what fields to search through (^2 boosts title matches)
      'fields' => array('post_title^2', 'post_content')
    )
  )
);

Take the constructed query and pass it to the HTTP API.

//search our wp-index
$url = "http://local.wordpress.dev:9200/wp-index/_search";
$method = "POST";
//pass the query we constructed in the previous slide
$body = json_encode($ES_query);
$arg = array ( 'method' => $method, 'body' => $body);
$request = wp_remote_request ($url, $arg);

Filter Queries

Apply a filter first, then run the nested query against the documents that pass it.

$ES_query = array(
  'query' => array(
    'filtered' => array(
      // the query scores the documents that pass the filter
      'query' => array(
        'multi_match' => array(
          'query' => 'beer',
          'fields' => array( 'post_title^2', 'post_content' ),
        ),
      ),
      // exact match on the not_analyzed author field
      'filter' => array(
        'term' => array(
          'post_author.author_raw' => 'Flynn',
        ),
      ),
    ),
  ),
);
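In practice the filter is usually optional, so the query body can be built conditionally. A sketch using the same multi_match and filtered query shapes as the slides above; build_es_query() is a hypothetical helper, and $author must match the stored value exactly because post_author.author_raw is not analyzed.

```php
<?php
// Hypothetical helper: plain multi_match query, wrapped in a filtered query
// only when an author filter is requested.
function build_es_query( $search_term, $author = null ) {
  $match = array(
    'multi_match' => array(
      'query'  => $search_term,
      'fields' => array( 'post_title^2', 'post_content' ),
    ),
  );
  if ( null === $author ) {
    return array( 'query' => $match );
  }
  return array(
    'query' => array(
      'filtered' => array(
        'query'  => $match,
        'filter' => array(
          'term' => array( 'post_author.author_raw' => $author ),
        ),
      ),
    ),
  );
}

$body = json_encode( build_es_query( 'beer', 'Flynn' ) );
```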

Parse the results

We've queried Elasticsearch and got results!

We need to parse the results.

//our Elasticsearch query from previous slides
$request = wp_remote_request ($url, $arg); 

//grab the body of request which has the found posts
$results = json_decode(wp_remote_retrieve_body($request));

// the matching documents are in hits.hits
$hits = $results->hits->hits;

// do what you want with the data from here. 
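For the pre_get_posts hook, what we usually want from the hits is an ordered list of post IDs. A sketch, assuming documents were indexed with the post ID as their _id (as in the indexing examples); get_post_ids_from_response() is a hypothetical helper that takes the raw response body, and the sample JSON is made up for illustration.

```php
<?php
// Hypothetical helper: raw _search response body -> ordered post IDs.
function get_post_ids_from_response( $json_body ) {
  $results = json_decode( $json_body );
  if ( ! isset( $results->hits->hits ) ) {
    return array(); // invalid JSON, error response or unexpected shape
  }
  $ids = array();
  foreach ( $results->hits->hits as $hit ) {
    $ids[] = (int) $hit->_id; // _id comes back as a string
  }
  return $ids;
}

// Made-up sample response, already sorted by score.
$sample = '{"hits":{"total":2,"hits":[{"_id":"42","_score":1.3},{"_id":"7","_score":0.8}]}}';
$ids = get_post_ids_from_response( $sample ); // array( 42, 7 )
```

Keeping the Elasticsearch order matters because the pre_get_posts hook sorts by post__in.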

Just the beginning

Some examples of Elasticsearch functionality

  • Improve e-commerce product searching with price ranges
  • Search for posts within Google Maps boundaries
  • Real-time autocomplete of almost any data type
  • Create custom analyzers for unique search applications

Aggregations

When querying Elasticsearch, you can specify particular data to be returned as facets or aggregations.

Useful for getting aggregate data on:

  • Number of posts in each category
  • Number of products within price ranges
  • Geo distance from a location
  • Combine aggregations to make custom aggregates
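A sketch of a search body that requests two of the aggregations listed above, plus reading the buckets back. The fields category_raw and price are hypothetical: they assume you have indexed a not_analyzed category name and a numeric price per post. size 0 asks for aggregations only, without hits, and the response JSON is a made-up sample.

```php
<?php
$ES_query = array(
  'size' => 0,
  'aggs' => array(
    // number of posts in each category
    'posts_per_category' => array(
      'terms' => array( 'field' => 'category_raw' ),
    ),
    // number of products within price ranges ("to" is exclusive)
    'price_ranges' => array(
      'range' => array(
        'field'  => 'price',
        'ranges' => array(
          array( 'to' => 20 ),
          array( 'from' => 20, 'to' => 50 ),
          array( 'from' => 50 ),
        ),
      ),
    ),
  ),
);

// Reading the category buckets from a made-up sample response body.
$response = '{"aggregations":{"posts_per_category":{"buckets":[{"key":"Stout","doc_count":12},{"key":"IPA","doc_count":5}]}}}';
$decoded  = json_decode( $response, true );
$counts   = array();
foreach ( $decoded['aggregations']['posts_per_category']['buckets'] as $bucket ) {
  $counts[ $bucket['key'] ] = $bucket['doc_count'];
}
// $counts is array( 'Stout' => 12, 'IPA' => 5 )
```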

Third-Party APIs

WordPress.com related posts are available through Jetpack.

Swiftype provides managed Elasticsearch functionality.

Either can be customized using Elasticsearch APIs.

Useful Tools

Resources

Thank you for listening!

Questions?