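Introduction

Overview

Elasticsearch is built on Lucene. It hides Lucene's complexity behind a simple RESTful API and a Java API (clients for other languages are also available).
Elasticsearch is a real-time, distributed search and analytics engine. It is used for full-text search, structured search, and analytics.
- Full-text search: extract part of the information in unstructured data, reorganize it so that it has some structure, and then search that structured data, which makes searching comparatively fast.
- Inverted index: a simple example is finding the articles that contain a given keyword (the intuitive approach goes the other way and scans each article for the keyword).
- Structured search: for example, "which products are in the daily chemicals category?", the equivalent of select * from products where category_id='daily chemicals'.
- Data analytics: on an e-commerce site, which 10 merchants sold the most toothpaste in the last 7 days; on a news site, which 3 sections had the most visits in the last month.
It can run as a large distributed cluster (hundreds of servers) that handles PB-scale data for large companies, or on a single machine for small ones.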
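The inverted index idea from the list above can be illustrated with a small sketch. This is not how Lucene implements it; the class name InvertedIndexSketch, the whitespace-and-punctuation tokenization, and the sample sentences are all assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Elasticsearch or Lucene.
public class InvertedIndexSketch {

    // term -> sorted set of IDs of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    // Split the text on anything that is not a letter or digit, lowercase
    // each token, and record the document ID under every token.
    public void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look up one keyword; returns the matching document IDs in ascending order.
    public List<Integer> search(String keyword) {
        TreeSet<Integer> ids = postings.get(keyword.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "I love to go rock climbing");
        index.add(2, "I love music");
        index.add(3, "Rock music is loud");
        System.out.println(index.search("rock")); // [1, 3]
        System.out.println(index.search("love")); // [1, 2]
        System.out.println(index.search("jazz")); // []
    }
}
```

The lookup goes from keyword to documents in one map access, which is the point of the structure: the cost of scanning every article is paid once at indexing time rather than on every query.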

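Use cases

Wikipedia (similar to Baidu Baike): searching for "toothpaste" returns the Wikipedia article on toothpaste; full-text search, highlighting, and search suggestions.
The Guardian (a foreign news site, similar to Sohu News): combines user behavior logs (clicks, views, favorites, comments) with social network data (opinions about a given story) for analytics, and gives the results to each article's author so they can see how the public reacted (good, bad, popular, rubbish, scorned, admired).
Stack Overflow (a foreign Q&A forum for programming errors): people post IT problems and program errors, and others discuss and answer them. Full-text search finds related questions and answers: paste in an error message and search for a matching answer.
GitHub (open-source code hosting): searches over a hundred billion lines of code.
In China: site search (e-commerce, recruitment, portals, and so on), internal IT system search (OA, CRM, ERP, and so on), and data analytics (a popular ES use case).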

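Core concepts

Elasticsearch compared with a database

Relational database (e.g. MySQL)	Non-relational database (Elasticsearch)
Database	Index
Table	Type (from 6.0 an index can contain only one type; types are deprecated from 7.0)
Row	Document (JSON format)
Column	Field
Schema (constraints)	Mapping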

Installation

1) Extract elasticsearch-6.6.0.tar.gz into /opt/module

Code

[ys@hadoop102 software]$ tar -zxvf elasticsearch-6.6.0.tar.gz -C /opt/module/

2) Create a data directory under /opt/module/elasticsearch-6.6.0

Code

[ys@hadoop102 elasticsearch-6.6.0]$ mkdir data

3) Edit the configuration file /opt/module/elasticsearch-6.6.0/config/elasticsearch.yml

Code

[ys@hadoop102 config]$ pwd
/opt/module/elasticsearch-6.6.0/config
[ys@hadoop102 config]$ vim elasticsearch.yml

yml

#-----------------------Cluster-----------------------
cluster.name: my-application
#-----------------------Node-----------------------
node.name: node-102
#-----------------------Paths-----------------------
path.data: /opt/module/elasticsearch-6.6.0/data
path.logs: /opt/module/elasticsearch-6.6.0/logs
#-----------------------Memory-----------------------
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
#-----------------------Network-----------------------
network.host: 192.168.9.102 
#-----------------------Discovery-----------------------
discovery.zen.ping.unicast.hosts: ["192.168.9.102"]

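(1) cluster.name

To form a cluster, every node must use the same cluster.name; nodes that share it join the same cluster automatically when started. If cluster.name is left unset, the default is elasticsearch; this guide sets it to my-application.

(2) node.name can be anything, but it must be unique within the cluster.

(3) An edited line must not start with a space, and the ":" after each key must be followed by a space.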

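4) Distribute the installation to hadoop103 and hadoop104, then change the following settings on each node:

Code

[ys@hadoop102 module]$ xsync elasticsearch-6.6.0/

# elasticsearch.yml on hadoop103
node.name: node-103
network.host: 192.168.9.103

# elasticsearch.yml on hadoop104
node.name: node-104
network.host: 192.168.9.104

5) Starting Elasticsearch at this point fails; the Linux system settings must be configured first (see: http://blog.csdn.net/satiling/article/details/59697916)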

6) Start Elasticsearch

Code

[ys@hadoop102 elasticsearch-6.6.0]$ bin/elasticsearch

7) Test Elasticsearch

Code

[ys@hadoop102 elasticsearch-6.6.0]$ curl http://hadoop102:9200

{
 "name" : "node-102",
 "cluster_name" : "my-application",
 "cluster_uuid" : "KOpuhMgVRzW_9OTjMsHf2Q",
 "version" : {
  "number" : "6.6.0",
  "build_flavor" : "default",
  "build_type" : "tar",
  "build_hash" : "eb782d0",
  "build_date" : "2018-06-29T21:59:26.107521Z",
  "build_snapshot" : false,
  "lucene_version" : "7.3.1",
  "minimum_wire_compatibility_version" : "5.6.0",
  "minimum_index_compatibility_version" : "5.0.0"
 },
 "tagline" : "You Know, for Search"
}

8) Stop the cluster

Code

kill -9 <pid>

9) Script to start or stop all nodes

Code

[ys@hadoop102 bin]$ vi es.sh

shell

#!/bin/bash
# Start or stop Elasticsearch on every node over ssh.
# Usage: es.sh start | es.sh stop
es_home=/opt/module/elasticsearch-6.6.0
case $1 in
"start") {
    for i in hadoop102 hadoop103 hadoop104
    do
        echo "==============$i=============="
        # Launch in the background and discard console output
        ssh $i "source /etc/profile;${es_home}/bin/elasticsearch >/dev/null 2>&1 &"
    done
};;
"stop") {
    for i in hadoop102 hadoop103 hadoop104
    do
        echo "==============$i=============="
        # Find the processes started from es_home and kill them
        ssh $i "ps -ef|grep $es_home |grep -v grep|awk '{print \$2}'|xargs kill" >/dev/null 2>&1
    done
};;
esac

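Kibana visualization tool

Installing Kibana

1. Upload the Kibana archive to the virtual machine and extract it into the target directory

Code

[ys@hadoop102 software]$ tar -zxvf kibana-6.6.0-linux-x86_64.tar.gz -C /opt/module/

2. Edit the configuration to connect to Elasticsearch

Code

[ys@hadoop102 kibana]$ vim config/kibana.yml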

yml

# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601
# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "192.168.9.102"
... ...
... ...
# The URL of the Elasticsearch instance to use for all your queries.
elasticsearch.url: "http://192.168.9.102:9200"

3. Start Kibana

Code

[ys@hadoop102 kibana]$ bin/kibana

4. Open hadoop102:5601 in a browser to start working

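Operations

Command-line operations

Core data types

String types: text (analyzed), keyword (not analyzed)
Numeric types: long, integer, short, byte, double, float, half_float, scaled_float
Date type: date

Mapping

1. Manual creation

Create the mapping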

Code

PUT my_index1
{
  "mappings": {
    "_doc":{
      "properties":{
        "username":{
          "type": "text", 
          "fields": {
            "pinyin":{
              "type": "text"
            }
          }
        }
      }
    }
  }
}

Create a document

Code

PUT my_index1/_doc/1
{
  "username":"haha heihei"
}

Query

Code

GET my_index1/_search
{
  "query": {
    "match": {
      "username.pinyin": "haha"
    }
  }
}

2. Automatic creation

Insert a document directly

Code

PUT /test_index/doc/1
{
  "username":"alfred",
  "age":1,
  "birth":"1991-12-15"
}

View the mapping

Code

GET /test_index/doc/_mapping

{
  "test_index": {
    "mappings": {
      "doc": {
        "properties": {
          "age": {
            "type": "long"
          },
          "birth": {
            "type": "date"
          },
          "username": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

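IK analyzer

Analyzers matter most for Chinese text. ES has two string types, keyword and text. keyword is not analyzed at all, while text with the default analyzer splits Chinese into individual characters, each of which becomes its own term. Neither is suitable for Chinese search in production.

Analyzing with keyword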

Code

GET _analyze
{
  "keyword":"我是程序员"
}

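Result: the request returns an error, because keyword is not a parameter that _analyze accepts.

Analyzing with the text type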

Code

GET _analyze
{
  "text":"我是程序员"
}

Result:

Code

{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<IDEOGRAPHIC>",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "程",
      "start_offset": 2,
      "end_offset": 3,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "序",
      "start_offset": 3,
      "end_offset": 4,
      "type": "<IDEOGRAPHIC>",
      "position": 3
    },
    {
      "token": "员",
      "start_offset": 4,
      "end_offset": 5,
      "type": "<IDEOGRAPHIC>",
      "position": 4
    }
  ]
}
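The shape of the response above (one token per Chinese character, each with offsets and a position) can be reproduced in plain Java. This is only a sketch of the observed behavior, not Lucene's tokenizer: the class SingleCharTokens and its Token type are hypothetical, and characters that are not CJK ideographs are simply skipped here, which the real analyzer does not do. The input string is written with Unicode escapes and is the same five-character sentence used in the request above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Elasticsearch or Lucene.
public class SingleCharTokens {

    // Mirrors the fields of one entry in the _analyze response.
    public static final class Token {
        public final String token;
        public final int startOffset;
        public final int endOffset;
        public final int position;

        Token(String token, int startOffset, int endOffset, int position) {
            this.token = token;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.position = position;
        }

        @Override
        public String toString() {
            return token + "[" + startOffset + "," + endOffset + ")@" + position;
        }
    }

    // Emits every ideographic code point as its own token.
    // Offsets are counted in UTF-16 chars, positions in emitted tokens.
    public static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            if (Character.isIdeographic(codePoint)) {
                tokens.add(new Token(text.substring(i, i + length), i, i + length, position++));
            }
            i += length;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same sentence as in the _analyze request, written as Unicode escapes
        for (Token t : tokenize("\u6211\u662f\u7a0b\u5e8f\u5458")) {
            System.out.println(t);
        }
    }
}
```

Because every character becomes a separate term, a search for a multi-character word can only be answered by combining single-character matches, which is why a dictionary-based analyzer is preferred for Chinese.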

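Installing the IK analyzer

1) Download the version that matches the installed ES version

2) Unzip elasticsearch-analysis-ik-6.6.0.zip into a folder named ik under the plugins directory of the ES installation (any folder name works)

Code

[ys@hadoop102 plugins]$ mkdir ik

Code

[ys@hadoop102 software]$ unzip elasticsearch-analysis-ik-6.6.0.zip -d /opt/module/elasticsearch-6.6.0/plugins/ik/

3) Distribute the plugins directory to the other nodes

Code

[ys@hadoop102 elasticsearch-6.6.0]$ xsync plugins/

4) Restart Elasticsearch to load the IK analyzer

5) Test IK

ik_smart: coarsest segmentation (fewest tokens)
ik_max_word: finest-grained segmentation

Code

GET _analyze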
{
  "analyzer": "ik_smart",
  "text":"我是程序员"
}

Code

{
    "tokens" : [
        {
            "token" : "我",
            "start_offset" : 0,
            "end_offset" : 1,
            "type" : "CN_CHAR",
            "position" : 0
        },
        {
            "token" : "是",
            "start_offset" : 1,
            "end_offset" : 2,
            "type" : "CN_CHAR",
            "position" : 1
        },
        {
            "token" : "程序员",
            "start_offset" : 2,
            "end_offset" : 5,
            "type" : "CN_WORD",
            "position" : 2
        }
    ]
}

Tokens produced by ik_max_word for the same text:

Code

"我","是","程序员","程序","员"

Searching documents [key topic]

Add data to Elasticsearch

Code

PUT /atguigu/doc/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": ["sports", "music"]
}

Query data

Code

# HTTP method index/type/document ID
GET /atguigu/doc/1

Response

Code

{
  "_index": "atguigu",
  "_type": "doc",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": { // the original JSON document
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": [
      "sports",
      "music"
    ]
  }
}

Querying metadata

Code

GET _cat/indices

Retrieve all documents

Code

# HTTP method index/type/_search
GET /atguigu/doc/_search

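Exact match on the whole field value [filter]

A term query is not analyzed, so it targets the about.keyword sub-field that dynamic mapping creates. Against the analyzed about text field, the full sentence would match nothing.

Code

GET atguigu/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "about.keyword": "I love to go rock climbing"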
        }
      }
    }
  }
}

Analyzed match on a field [match]

Code

GET atguigu/_search
{
  "query": {
    "match": {
      "about": "I"
    }
  }
}

Fuzzy match on a field [fuzzy]

Code

GET  test/_search
{
  "query": {
    "fuzzy": {
      "aa": {
        "value": "我是程序"
      }
    }
  }
}

Aggregations

Code

GET test/_search
{
  "aggs": {
    "groupby_aa": {
      "terms": {
        "field": "aa",
        "size": 10
      }
    }
  }
}

Paginated search

Code

GET movie_index/movie/_search
{
  "query": { "match_all": {} },
  "from": 1,
  "size": 1
}
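In the request above, from is a zero-based offset into the hit list and size is the page length, so "from": 1 with "size": 1 returns the second hit. A small sketch of turning a 1-based page number into that pair, assuming a hypothetical helper class PageRequest that is not part of any ES client:

```java
// Hypothetical helper; shows only the arithmetic behind from/size paging.
public class PageRequest {

    // from is a zero-based offset, so page N (1-based) starts at (N - 1) * size.
    public static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // Builds a match_all search body for the requested page.
    public static String searchBody(int page, int size) {
        return "{\"query\":{\"match_all\":{}},\"from\":" + fromOffset(page, size)
                + ",\"size\":" + size + "}";
    }

    public static void main(String[] args) {
        // Page 2 with one hit per page gives the same from/size as the request above
        System.out.println(searchBody(2, 1));
    }
}
```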

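Index aliases (_aliases)

An index alias works like a shortcut or symbolic link: it can point to one or more indices and can be passed to any API that expects an index name. Aliases give a great deal of flexibility. They make it possible to:

1) Group multiple indices (for example, last_three_months)

2) Create a view over a subset of an index

3) Switch seamlessly from one index to another on a running cluster

Put simply, an alias is a more powerful kind of view.

Creating index aliases

Declare the alias when creating the index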

Code

PUT movie_chn_2020
{
  "aliases": {
    "movie_chn_2020-query": {}
  },
  "mappings": {
    "movie":{
      "properties": {
        "id":{
          "type": "long"
        },
        "name":{
          "type": "text",
          "analyzer": "ik_smart"
        },
        "doubanScore":{
          "type": "double"
        },
        "actorList":{
          "properties": {
            "id":{
              "type":"long"
            },
            "name":{
              "type":"keyword"
            }
          }
        }
      }
    }
  }
}

Add an alias to an existing index

Code

POST  _aliases
{
    "actions": [
        { "add":    { "index": "movie_chn_xxxx", "alias": "movie_chn_2020-query" }}
    ]
}

An alias can also carry a filter that narrows the query scope, which creates a view over a subset of the index

Code

POST  _aliases
{
    "actions": [
        { "add":
            { "index": "movie_chn_xxxx",
              "alias": "movie_chn0919-query-zhhy",
              "filter": {
                "term": { "actorList.id": "3" }
              }
            }
        }
    ]
}

Querying through an alias works exactly like querying a regular index

Code

GET movie_chn_2020-query/_search

Remove an alias from an index

Code

POST  _aliases
{
    "actions": [
        { "remove":    { "index": "movie_chn_xxxx", "alias": "movie_chn_2020-query" }}
    ]
}

Switch an alias seamlessly to another index

Code

POST /_aliases
{
    "actions": [
        { "remove": { "index": "movie_chn_xxxx", "alias": "movie_chn_2020-query" }},
        { "add":    { "index": "movie_chn_yyyy", "alias": "movie_chn_2020-query" }}
    ]
}
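The switch above is seamless because the remove and add actions travel in a single _aliases request and are applied together. A sketch of building that request body in Java, assuming a hypothetical helper class AliasSwap (plain string building, no client library); the resulting string would be sent as the body of POST /_aliases:

```java
// Hypothetical helper; builds the JSON body only and does not talk to a cluster.
public class AliasSwap {

    // Body for POST /_aliases that moves `alias` from oldIndex to newIndex.
    // Both actions are placed in one request so the switch happens in one step.
    public static String swapBody(String alias, String oldIndex, String newIndex) {
        return "{\"actions\":["
                + action("remove", oldIndex, alias) + ","
                + action("add", newIndex, alias)
                + "]}";
    }

    private static String action(String op, String index, String alias) {
        return "{\"" + op + "\":{\"index\":\"" + escape(index)
                + "\",\"alias\":\"" + escape(alias) + "\"}}";
    }

    // Minimal JSON string escaping; enough for typical index and alias names.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(swapBody("movie_chn_2020-query", "movie_chn_xxxx", "movie_chn_yyyy"));
    }
}
```

Sending the two actions as separate requests would leave a short window in which the alias points at no index, so queries through it would fail.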

List all aliases

Code

GET _cat/aliases?v

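Index templates

An index template is, as the name suggests, a mold for creating indices. It defines rules that build the mappings and settings of indices for a particular business need, so indices created from it are predictably consistent.

A common scenario: splitting indices

Splitting an index means dividing one business index into several by time interval, for example turning order_info into order_info_20200101, order_info_20200102, and so on.

This has two benefits:

1. Flexibility for structural changes: Elasticsearch does not allow the mapping of existing fields to be modified, yet in practice an index's structure and settings inevitably change. With split indices, only the index for the next interval needs to change while existing indices stay as they are.

2. Narrower query range: queries rarely cover the whole time span, so splitting physically reduces the amount of data scanned, which also improves performance.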
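The naming scheme described above can be sketched in Java. The class DailyIndexNames and its methods are hypothetical helpers; the only thing taken from the text is the base_yyyyMMdd pattern such as order_info_20200101.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; derives per-day index names from a base name.
public class DailyIndexNames {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Name of the index that holds the data for one day.
    public static String indexFor(String base, LocalDate day) {
        return base + "_" + day.format(DAY);
    }

    // Names of the indices for every day in [from, to], both ends inclusive.
    // Searching only these indices limits the scan to the requested period.
    public static List<String> indicesBetween(String base, LocalDate from, LocalDate to) {
        List<String> names = new ArrayList<>();
        for (LocalDate day = from; !day.isAfter(to); day = day.plusDays(1)) {
            names.add(indexFor(base, day));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = indicesBetween("order_info",
                LocalDate.of(2020, 1, 1), LocalDate.of(2020, 1, 3));
        // A comma-separated list of index names can be used in a search path
        System.out.println("GET /" + String.join(",", names) + "/_search");
    }
}
```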

Create a template

Code

PUT _template/template_movie2020
{
  "index_patterns": ["movie_test*"],                  
  "settings": {                                               
    "number_of_shards": 1
  },
  "aliases" : { 
    "{index}-query": {},
    "movie_test-query":{}
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "movie_name": {
          "type": "text",
          "analyzer": "ik_smart"
        }
      }
    }
  }
}

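Here "index_patterns": ["movie_test*"] means that when data is written to an index whose name starts with movie_test and that index does not exist yet, ES creates it automatically from this template.

In "aliases", the {index} placeholder resolves to the name of the index that is actually created.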
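The two rules just described, wildcard matching on the index name and {index} substitution in alias names, can be mimicked with a short sketch. TemplatePatternSketch is a hypothetical class that only imitates the behavior described here; it is not how ES implements template resolution.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of template matching and alias name expansion.
public class TemplatePatternSketch {

    // Treats `*` as "any run of characters" and every other character literally.
    public static boolean matches(String indexPattern, String indexName) {
        String[] parts = indexPattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), indexName);
    }

    // Replaces the {index} placeholder with the name of the created index.
    public static String aliasFor(String aliasTemplate, String indexName) {
        return aliasTemplate.replace("{index}", indexName);
    }

    public static void main(String[] args) {
        String index = "movie_test_2020xxxx";
        System.out.println(matches("movie_test*", index));   // true
        System.out.println(aliasFor("{index}-query", index)); // movie_test_2020xxxx-query
    }
}
```

With the template above, an index named movie_test_2020xxxx would therefore be reachable through two aliases: its own movie_test_2020xxxx-query and the shared movie_test-query.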

Test:

Code

POST movie_test_2020xxxx/_doc
{
  "id":"333",
  "name":"zhang3"
}

List the templates that exist in the system

Code

GET _cat/templates

View the details of a template

Code

GET _template/template_movie2020
or
GET _template/template_movie*

Java API operations

Maven dependencies:

xml

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.5</version>
</dependency>

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpmime</artifactId>
    <version>4.3.6</version>
</dependency>

<dependency>
    <groupId>io.searchbox</groupId>
    <artifactId>jest</artifactId>
    <version>5.3.3</version>
</dependency>

<dependency>
    <groupId>net.java.dev.jna</groupId>
    <artifactId>jna</artifactId>
    <version>4.5.2</version>
</dependency>

<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>2.7.8</version>
</dependency>

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>6.6.0</version>
</dependency>

Writing a single document

java

import com.ys.bean.Stu;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Index;

import java.io.IOException;

public class ESWriter {

    public static void main(String[] args) throws IOException {

        // 1. Create the ES client
        // 1.1 Create the client factory
        JestClientFactory jestClientFactory = new JestClientFactory();

        // 1.2 Build the connection configuration
        HttpClientConfig config = new HttpClientConfig.Builder("http://hadoop102:9200").build();
        jestClientFactory.setHttpClientConfig(config);

        // 1.3 Obtain the client
        JestClient jestClient = jestClientFactory.getObject();

        // 2. Write the data
        // 2.1 Create the Action object (an Index action)
        Stu stu = new Stu("004", "少爷");
        Index index = new Index.Builder(stu)
                .index("stu_temp_01")
                .type("_doc")
                .id("1004")
                .build();

        // 2.2 Execute the write
        jestClient.execute(index);

        // 3. Release resources
        jestClient.shutdownClient();

    }

}

Writing documents in bulk

java

import com.ys.bean.Stu;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Bulk;
import io.searchbox.core.Index;

import java.io.IOException;

public class ESWriterByBulk {

    public static void main(String[] args) throws IOException {

        // 1. Create the ES client
        // 1.1 Create the client factory
        JestClientFactory jestClientFactory = new JestClientFactory();

        // 1.2 Build the connection configuration
        HttpClientConfig config = new HttpClientConfig.Builder("http://hadoop102:9200").build();
        jestClientFactory.setHttpClientConfig(config);

        // 1.3 Obtain the client
        JestClient jestClient = jestClientFactory.getObject();

        // 2. Bulk write
        // 2.1 Prepare the data
        Stu stu1 = new Stu("008", "麻瓜");
        Stu stu2 = new Stu("009", "海格");

        // 2.2 Create the Bulk.Builder
        Bulk.Builder builder = new Bulk.Builder();

        // 2.3 Create one Index action per document
        Index index1 = new Index.Builder(stu1).id("1008").build();
        Index index2 = new Index.Builder(stu2).id("1009").build();

        // 2.4 Set the default index name and type name for all actions
        builder.defaultIndex("stu_temp_01");
        builder.defaultType("_doc");

        // 2.5 Add the Index actions to the Bulk
        builder.addAction(index1);
        builder.addAction(index2);

        // 2.6 Build the Bulk object
        Bulk bulk = builder.build();

        // 2.7 Execute the bulk write
        jestClient.execute(bulk);

        // 3. Close the connection
        jestClient.shutdownClient();

    }
}

Reading data (the query is built with builder classes rather than a raw JSON string, which would be hard to read)

java

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

import java.io.IOException;
import java.util.List;
import java.util.Map;

public class ESReader {

    public static void main(String[] args) throws IOException {

        // 1. Obtain the client
        // 1.1 Create the client factory
        JestClientFactory jestClientFactory = new JestClientFactory();

        // 1.2 Build the connection configuration
        HttpClientConfig config = new HttpClientConfig.Builder("http://hadoop102:9200").build();
        jestClientFactory.setHttpClientConfig(config);

        // 1.3 Obtain the client
        JestClient jestClient = jestClientFactory.getObject();

        // 2. Read the data
        // 2.0 Build the query: a bool filter on class_id, first 2 hits
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
        boolQueryBuilder.filter(new TermQueryBuilder("class_id", "190218"));
        searchSourceBuilder.query(boolQueryBuilder);

        searchSourceBuilder.from(0);
        searchSourceBuilder.size(2);

        // 2.1 Create the Search object from the generated JSON
        Search search = new Search.Builder(searchSourceBuilder.toString())
                .addIndex("student")
                .addType("_doc")
                .build();

        // 2.2 Execute the query
        SearchResult searchResult = jestClient.execute(search);

        // 2.3 Parse the searchResult
        System.out.println("Found " + searchResult.getTotal() + " documents");
        // Mapping each JSON hit to a Map is a common approach
        List<SearchResult.Hit<Map, Void>> hits = searchResult.getHits(Map.class);
        for (SearchResult.Hit<Map, Void> hit : hits) {
            Map source = hit.source;
            for (Object key : source.keySet()) {
                System.out.println(hit.id + ":" + key.toString() + ":" + source.get(key).toString());
            }
            System.out.println("*************");
        }

        // 3. Release resources
        jestClient.shutdownClient();
    }

}

java

// Stu.java (the bean imported above as com.ys.bean.Stu)
package com.ys.bean;

public class Stu {

    private String id;
    private String name;

    public Stu() {
    }

    public Stu(String id, String name) {
        this.id = id;
        this.name = name;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        Stu stu = (Stu) o;

        if (id != null ? !id.equals(stu.id) : stu.id != null) return false;
        return name != null ? name.equals(stu.name) : stu.name == null;
    }

    @Override
    public int hashCode() {
        int result = id != null ? id.hashCode() : 0;
        result = 31 * result + (name != null ? name.hashCode() : 0);
        return result;
    }

    @Override
    public String toString() {
        return "Stu{" +
                "id='" + id + '\'' +
                ", name='" + name + '\'' +
                '}';
    }
}