sirensolutions/siren-join

Name: siren-join

Owner: Siren

Description: [This is the old, single node version for Elasticsearch 2.x, see the latest "Siren Federate" plugin for distributed Elasticsearch 5.x and 6.x capabilities]

Created: 2015-10-13 16:09:31.0

Updated: 2018-05-17 03:03:30.0

Pushed: 2017-11-07 10:24:29.0

Homepage: http://siren.io

Size: 478

Language: Java

README

:warning: This project (Siren “Join”) is superseded by the new Siren “FEDERATE” plugin (AKA Vanguard).

Siren Federate supports fully distributed Elasticsearch joins that scale with the number of machines. It can also perform joins across multiple backends, making JDBC datasources appear as if they were Elasticsearch indexes.

Siren Federate is available for Elasticsearch 5.x, and soon 6.x

For more information and downloads see http://siren.io

(Superseded) The SIREn Join Plugin for Elasticsearch 2.x

This plugin extends Elasticsearch with new search actions and a filter query parser that make it possible to perform a “Filter Join” between two sets of documents (in the same index or in different indexes).

The Filter Join is essentially a (left) semi-join between two sets of documents based on a common attribute, where the result contains only the attributes of one of the joined sets. The join is used to filter one document set based on a second document set, hence its name. It is equivalent to the EXISTS() operator in SQL.
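To make the semi-join semantics concrete, here is a minimal in-memory sketch in Python. It is not how the plugin is implemented; the function name `filter_join` and the sample documents are hypothetical and only illustrate that the result carries attributes from one side of the join.

```python
def filter_join(left_docs, right_docs, left_field, right_field, right_predicate):
    """Illustrative semi-join: keep the left documents whose left_field value
    appears in right_field of at least one right document that matches
    right_predicate. Only left documents are returned."""
    # Collect the join keys from the filtered right-hand set.
    keys = {
        doc[right_field]
        for doc in right_docs
        if right_field in doc and right_predicate(doc)
    }
    # Use the collected keys as a filter on the left-hand set.
    return [doc for doc in left_docs if doc.get(left_field) in keys]


# Hypothetical sample data, shaped like the index1/index2 example in this README.
index1 = [
    {"id": 1, "foreign_key": 10},
    {"id": 2, "foreign_key": 11},
    {"id": 3, "foreign_key": 12},
]
index2 = [
    {"id": 10, "tag": "aaa"},
    {"id": 11, "tag": "bbb"},
    {"id": 12, "tag": "aaa"},
]

result = filter_join(index1, index2, "foreign_key", "id", lambda d: d["tag"] == "aaa")
print(result)
```

Note that no attribute of `index2` (such as `tag`) appears in the result, which is what distinguishes a semi-join from a full join.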

Compatibility

The following table shows the compatibility between releases of Elasticsearch and the SIREn Join plugin:

Elasticsearch|SIREn Join
---|---
2.4.5|2.4.5
2.4.4|2.4.4
2.4.3|2.4.3
2.4.2|2.4.2-1
2.4.1|2.4.1-1
2.3.5|2.3.5-1
2.3.4|2.3.4-1
2.3.3|2.3.3-1
2.2.0|2.2.0-1
2.1.2|2.1.2
2.1.1|2.1.1
1.7.x|1.0

Installing the Plugin
Online Download

You can use the following command to download the plugin from the online repository:

$ bin/plugin install solutions.siren/siren-join/2.4.4
Offline Download
Manual

Alternatively, you can assemble it via Maven (you must build it as a non-root user):

$ git clone git@github.com:sirensolutions/siren-join.git
$ cd siren-join
$ mvn package

This creates a single Zip file that can be installed using the Elasticsearch plugin command:

$ bin/plugin install file:/PATH-TO-SIRENJOIN-PROJECT/target/releases/siren-join-2.4.4.zip
Interacting with the Plugin

You can now start Elasticsearch and see that our plugin gets loaded:

$ bin/elasticsearch
...
[2013-09-04 17:33:27,443][INFO ][node    ] [Andrew Chord] initializing ...
[2013-09-04 17:33:27,455][INFO ][plugins ] [Andrew Chord] loaded [siren-join], sites []
...

To uninstall the plugin:

$ bin/plugin remove siren-join
Usage
Coordinate Search API

This plugin introduces two new search actions: _coordinate_search, which replaces the _search action, and _coordinate_msearch, which replaces the _msearch action. Both actions are wrappers around the original Elasticsearch actions and therefore support the same API. You must use these actions with the filterjoin filter, because the original Elasticsearch actions do not support it.

Parameters
Example

In this example, we join the documents from index1 with the documents of index2. The query first filters the documents of index2 with type type using the query { "terms" : { "tag" : [ "aaa" ] } }. It then retrieves the ids of the matching documents from the field id, specified by the parameter path. The list of ids is then used as a filter and applied to the field foreign_key of the documents from index1.

{
  "bool" : {
    "filter" : {
      "filterjoin" : {
        "foreign_key" : {
          "indices" : ["index2"],
          "types" : ["type"],
          "path" : "id",
          "query" : {
            "terms" : {
              "tag" : [ "aaa" ]
            }
          }
        }
      }
    }
  }
}
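As a rough sketch of how this query could be sent to the coordinate search action, the Python snippet below builds (but does not send) an HTTP request using only the standard library. Several details are assumptions that are not stated in this README: the host `http://localhost:9200`, the URL shape `/index1/_coordinate_search` (inferred from the action replacing `_search`), and wrapping the query in a top-level `"query"` key as the regular search API does. The helper name `build_coordinate_search_request` is hypothetical.

```python
import json
import urllib.request

# The filterjoin query from the example above.
FILTERJOIN_QUERY = {
    "bool": {
        "filter": {
            "filterjoin": {
                "foreign_key": {
                    "indices": ["index2"],
                    "types": ["type"],
                    "path": "id",
                    "query": {"terms": {"tag": ["aaa"]}},
                }
            }
        }
    }
}


def build_coordinate_search_request(base_url, index, query):
    """Build a request object for <base_url>/<index>/_coordinate_search.

    Assumption: the action accepts the same body as _search, i.e. the
    query nested under a top-level "query" key.
    """
    url = f"{base_url.rstrip('/')}/{index}/_coordinate_search"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local node address; adjust to your cluster.
request = build_coordinate_search_request(
    "http://localhost:9200", "index1", FILTERJOIN_QUERY
)
print(request.get_method(), request.full_url)
```

To actually execute the request against a running node with the plugin installed, pass `request` to `urllib.request.urlopen` and decode the JSON response.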
Response Format

The response returned by the coordinate search API is identical to the response returned by Elasticsearch's search API, augmented with additional information about the execution of the relational query plan. This additional information is stored in the field named coordinate_search at the root of the response (see the example below). The object contains the following parameters:

{
  "coordinate_search": {
    "actions": [
      {
        "relations": {
          "from": {
            "indices": ["index2"],
            "types": ["type"],
            "field": "id"
          },
          "to": {
            "indices": null,
            "types": null,
            "field": "foreign_key"
          }
        },
        "size": 2,
        "size_in_bytes": 20,
        "is_pruned": false,
        "cache_hit": false,
        "terms_encoding" : "long",
        "took": 313
      }
    ]
  },
...
}
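A client might want to log this execution information for each join. The sketch below, assuming the response has already been parsed into a Python dict, extracts one summary line per entry in `actions`. The function name `summarize_actions` is hypothetical, and the values are passed through as-is because their units and exact meaning are not described here.

```python
def summarize_actions(response):
    """Return one human-readable line per action found under
    response["coordinate_search"]["actions"]; missing keys yield None."""
    actions = response.get("coordinate_search", {}).get("actions", [])
    lines = []
    for action in actions:
        relations = action.get("relations", {})
        from_field = relations.get("from", {}).get("field")
        to_field = relations.get("to", {}).get("field")
        lines.append(
            f"{from_field} -> {to_field}: "
            f"size={action.get('size')}, "
            f"size_in_bytes={action.get('size_in_bytes')}, "
            f"is_pruned={action.get('is_pruned')}, "
            f"cache_hit={action.get('cache_hit')}, "
            f"terms_encoding={action.get('terms_encoding')}, "
            f"took={action.get('took')}"
        )
    return lines


# The example response from above, as a parsed dict.
sample_response = {
    "coordinate_search": {
        "actions": [
            {
                "relations": {
                    "from": {"indices": ["index2"], "types": ["type"], "field": "id"},
                    "to": {"indices": None, "types": None, "field": "foreign_key"},
                },
                "size": 2,
                "size_in_bytes": 20,
                "is_pruned": False,
                "cache_hit": False,
                "terms_encoding": "long",
                "took": 313,
            }
        ]
    }
}

for line in summarize_actions(sample_response):
    print(line)
```

A response without the `coordinate_search` field simply produces an empty list, so the same helper can be applied to plain search responses.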
Performance Considerations
Acknowledgement

Part of this plugin is inspired and based on the pull request 3278 submitted by Matt Weber to the Elasticsearch project.


Copyright (c) 2016, SIREn Solutions. All Rights Reserved.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.