쿠...sal: [Web] ElasticSearch query examples

How to send ElasticSearch queries / using aggregations / examples / complex queries / differences between MongoDB aggregation and ElasticSearch

index, index type

First, let's pick up the few terms needed for querying. ~~In ElasticSearch an index is a database and a type (index type) plays the role of a table.~~ Judging from running SQL through elasticsearch-sql-cli.bat, it seems more accurate to think of an index as a table. An index can have multiple types, and a type can hold a single document or many documents.[see also 3] - http://localhost:9200/_plugin/head/

Tools for sending queries

Built-in tool (embedded tool)

http://localhost:9200/_plugin/head/

Personally, I think this one is the best.

gist(gist)

It lets you write queries in a web page and send them directly to ElasticSearch. I use it a lot. You can keep several queries written down and pick one to run, and you can also see the history of previously sent queries.

Sense

This one is available as a Chrome extension named Sense. - Chrome extension Sense : Sense (Beta) - Chrome Web Store

Elastic HQ(http://www.elastichq.org/)

This one is good for getting an overview of the database structure. Think of it as a typical DB client.

Tokenizer test

http://localhost:9200/logstash-2015.04.26/

GET _analyze?analyzer=standard&field=requestHeaders.Host&text=test.com

Note that the index name must be given exactly; wildcards such as * do not work.

### Using ElasticSearch from Java

Query examples

Index information

index list

curl http://localhost/_cat/indices

To view information about my_index

curl http://localhost/my_index

curl -X GET "localhost:9200/_cat/indices/twi*?v&s=index&pretty"

cat indices | Elasticsearch Guide [6.8] | Elastic

Result size

By default a query returns only 10 results (size defaults to 10), so to see more you either pass a larger number or use search_type scan. See the following. - elasticsearch query to return all records - Stack Overflow

A filter can be applied using "term" or "fields". term : Term Query

GET _search
{
    "size": 5,
    "query": {
        "term": {
            "ip": "10.10.224.6"
        }
    }
}

"fields" returns only the listed fields. A value returned through "fields" is wrapped in an array ([]). For example, the value

"host": "daum.net"

comes back as

"host": ["daum.net"]

when "fields" : ["host"] is specified.

GET _search
{
    "from":0,
    "size":50,
    "sort":{"_score":{"order":"desc"}},
    "fields":["@version"],
    "explain":true
}

query.match

GET _search
{
    "from":0,
    "size":50,
    "query": {
        "match" : {
            "host" : "new-ats-edge01"
        }
    }
}
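The from and size parameters in the request above can be stepped to page through more than the default 10 results. The following is a minimal sketch that only builds the request bodies and does not talk to a cluster; the helper name paged_bodies and the page size are assumptions made for illustration.

```python
import json


def paged_bodies(query, total, page_size=10):
    """Yield search bodies that cover `total` hits in pages of `page_size`."""
    for start in range(0, total, page_size):
        yield {
            "from": start,
            "size": min(page_size, total - start),
            "query": query,
        }


# Hypothetical example: 25 hits fetched 10 at a time.
bodies = list(paged_bodies({"match": {"host": "new-ats-edge01"}}, total=25))
for body in bodies:
    print(json.dumps(body))
```

Each printed body can be sent as a separate _search request.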

query.fuzzy

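fuzzy matches terms that are within a Levenshtein edit distance of the search term and ranks the results accordingly. Be aware that completely unrelated results can come back, and low-ranked hits may not contain what you were looking for at all. Tuning fuzziness may help.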
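To make the Levenshtein distance concrete, here is a small sketch that computes it in plain Python and builds a fuzzy query body with an explicit fuzziness. The helper names and the candidate host names are hypothetical, and the ranking only illustrates the idea of edit distance; it is not how ElasticSearch scores hits internally.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def fuzzy_body(field, value, fuzziness):
    """Build a fuzzy query body; `fuzziness` is the allowed edit distance."""
    return {"query": {"fuzzy": {field: {"value": value, "fuzziness": fuzziness}}}}


# Hypothetical indexed terms, ranked by distance from the search term.
target = "new-ats-edge01"
candidates = ["old-ats-core01", "new-ats-edge02", "new-ats-edge01"]
ranked = sorted(candidates, key=lambda term: levenshtein(target, term))
print(ranked)
print(fuzzy_body("host", target, 1))
```

A lower fuzziness narrows the set of terms that can match, which is one way to cut down on unrelated results.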

Filter usage examples

search - elasticsearch query dates by range - Stack Overflow : finding items within a date range

Searching for a string

Partial Matching

Searching with multiple conditions

GET uh-as-202107/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "id": "4220"
          }
        },
        {
          "range": {
            "session.time": {
              "gte": 1626102000000,
              "lte": 1626188386000
            }
          }
        },
        {
          "range": {
            "session.created_at": {
              "gte": 1626015600000,
              "lte": 1626101999000
            }
          }
        }
      ]
    }
    
  }
}
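The range clauses above take epoch milliseconds. As a sketch, the values can be derived from datetime objects instead of being typed by hand; the helper names epoch_millis and range_clause are assumptions, and the datetimes below are simply the UTC equivalents of the session.time bounds in the query.

```python
from datetime import datetime, timezone


def epoch_millis(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def range_clause(field, start, end):
    """Build an inclusive range clause on an epoch-millisecond field."""
    return {"range": {field: {"gte": epoch_millis(start), "lte": epoch_millis(end)}}}


start = datetime(2021, 7, 12, 15, 0, 0, tzinfo=timezone.utc)
end = datetime(2021, 7, 13, 14, 59, 46, tzinfo=timezone.utc)

query = {
    "bool": {
        "must": [
            {"match": {"id": "4220"}},
            range_clause("session.time", start, end),
        ]
    }
}
print(query)
```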

aggregation

Differences from MongoDB aggregation

By default, if no query is set alongside an aggregation, it behaves as if query : match_all had been specified. - Scoping Aggregations

In MongoDB, the aggregation itself provides $match, so $group, $sort and so on run over the matched documents. ElasticSearch aggregation, in contrast, contains only the aggregating part, and what $match does in MongoDB is left to the regular query. So if you want to restrict the scope of an aggregation, use query.
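The difference can be sketched by putting the two request shapes side by side. The translation helper below is hypothetical and only handles one narrow case (an equality $match followed by a counting $group); the field names Host and severity are assumptions for illustration.

```python
import json

# MongoDB style: filtering ($match) is a stage inside the aggregation pipeline.
mongo_pipeline = [
    {"$match": {"Host": "localhost.com"}},
    {"$group": {"_id": "$severity", "count": {"$sum": 1}}},
]


def to_es_body(pipeline):
    """Translate a [$match, $group-count] pipeline into an ES search body.

    Only this two-stage shape with equality matches is handled.
    """
    match_stage, group_stage = pipeline
    term_clauses = [{"term": {f: v}} for f, v in match_stage["$match"].items()]
    group_field = group_stage["$group"]["_id"].lstrip("$")
    return {
        "size": 0,  # return only aggregation results, no hits
        # ElasticSearch style: the filtering lives in "query" ...
        "query": {"bool": {"must": term_clauses}},
        # ... and "aggs" holds nothing but the grouping itself.
        "aggs": {"by_" + group_field: {"terms": {"field": group_field}}},
    }


es_body = to_es_body(mongo_pipeline)
print(json.dumps(es_body, indent=2))
```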

How to fill dates that have no documents with 0 in a date_histogram aggregation

Elasticsearch date histogram aggregation - filling in the empty buckets | Sean McGary

Use "min_doc_count" : 0 together with "extended_bounds".

"extended_bounds": {
    "min": 1426498200000,
    "max": 1426498560000
}
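Put together, the two settings sit inside the date_histogram itself. The sketch below builds such an aggregation with the bounds shown above; the helper name is an assumption, and the bucket count is only the arithmetic for a 1-minute interval over those bounds, assuming no documents fall outside them.

```python
def date_histogram_with_gaps(field, interval, min_ms, max_ms):
    """Build a date_histogram that also returns empty buckets in [min, max]."""
    return {
        "date_histogram": {
            "field": field,
            "interval": interval,
            "min_doc_count": 0,
            "extended_bounds": {"min": min_ms, "max": max_ms},
        }
    }


agg = date_histogram_with_gaps("event_timestamp", "1m", 1426498200000, 1426498560000)

# Number of 1-minute buckets between the two bounds, both ends included.
expected_buckets = (1426498560000 - 1426498200000) // 60_000 + 1
print(agg, expected_buckets)
```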

Aggregation examples

Caveats about terms

Contrary to what you might expect, the terms aggregation groups by the analyzed values when the field is indexed as analyzed. To get grouping in the usual sense, you need to additionally index the field as not_analyzed.[ref. 4]

GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color"
            }
        }
    }
}

GET /cars/transactions/_search?search_type=count
{
    "query" : {
        "match_all" : {}
    },
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color"
            }
        }
    }
}
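Why an analyzed field produces surprising terms buckets can be simulated without a cluster. In the sketch below, simple_analyze is a rough stand-in for an analyzer (lowercase, split on non-alphanumerics), not the real standard analyzer, and the color values are made up for illustration.

```python
import re
from collections import Counter


def simple_analyze(text):
    """Rough stand-in for an analyzer: lowercase and split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


colors = ["Dark Red", "Dark Blue", "Dark Red"]

# Analyzed field: buckets are built from the individual tokens.
analyzed_buckets = Counter(tok for c in colors for tok in simple_analyze(c))

# not_analyzed field: each whole value is a single term.
raw_buckets = Counter(colors)

print(dict(analyzed_buckets))
print(dict(raw_buckets))
```

The first result groups by fragments such as dark, while the second matches the grouping most people expect.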

The default is query_then_fetch

GET _search?search_type=query_then_fetch
{...}

GET _search?search_type=count
{
    "aggregations": {
        "min_price": {
          "terms": {
            "field": "destPort"
          }
        }
  }
}

grouping, terms: run an aggregation named my_aggregate and show the result grouped by the event_timestamp field.

GET _search?search_type=count
{
    "aggregations": {
       "my_aggregate" : {
           "terms" : {
               "field" : "event_timestamp"
            }
        }
    }
}

Run an aggregation named my_aggregate, restricted to the event_timestamp range from ~ to.

GET _search?search_type=count
{
    "aggregations": {
       "my_aggregate" : {
       
            "range":{
                "field" : "event_timestamp",
                "ranges": [
                    {"from":"2015-03-16T09:29:47.000Z",
                    "to": "2015-03-16T09:31:47.000Z"}
                ]
            }
        }
    }
}

Range Aggregation: run an aggregation named aggs_stats over the event_timestamp ranges from ~ to, and compute the count (value_count) for each range.

GET _search?search_type=count
{
    "aggs": {
       "aggs_stats" : {
            "range":{
                "field" : "event_timestamp",
                "ranges": [
                    {"from":"2015-03-16T09:31:47.000Z",
                    "to": "2015-03-16T09:31:47.000Z"},
                    {"from":"2015-03-16T09:29:47.000Z",
                    "to": "2015-03-16T09:31:47.000Z"}
                ]
            },
            "aggs" : {
                "time_count" : {
                    "value_count" : { "field" : "event_timestamp" }
                }
            }
        }
    }
}

Aggregate as aggs_stats and show the result grouped by event_timestamp.

GET _search?search_type=count
{
    "aggs": {
       "aggs_stats" : {
           "terms":{
             "field" : "event_timestamp"
           }
       }
    }
}

Run an aggregation named aggs_stats: build a date histogram on the field named event_timestamp with an interval of day, and also show buckets with 0 documents (Date Histogram Aggregation).

GET _search?search_type=count
{
    "aggs": {
       "aggs_stats" : {
           "date_histogram":{
             "field" : "event_timestamp",
             "interval" : "day",
             "min_doc_count" : 0
           }
       }
    }
}

Group the documents whose Host is localhost.com into 1-minute buckets and show them.

GET _search?search_type=count
{
    "from":0,
    "size":50,

    "aggs":{
   
        "aaa" : {
            "filter": {
                "term": {
                   "Host": "localhost.com"
                }
            },
            "aggs": {
                "bbb" : {
                    "date_histogram":{
                        "field" : "event_timestamp",
                        "interval" : "1m"
                    }
                }
            }
        }
    }
}

Group on the event_timestamp field with an interval of 1 minute (1m), restricted to the documents whose auditLogTrailer.messages.severity is CRITICAL.

GET _search?search_type=count
{
    "from": 0,
    "size": 50,
    "query": {
        "match": {
            "auditLogTrailer.messages.severity": "CRITICAL"
        }
    },
    "aggs": {
        "aaa": {
            "date_histogram": {
                "field": "event_timestamp",
                "interval": "1m"
            }
        }
    }
}

GET _search?search_type=count
{
    "from": 0,
    "size": 50,
    "query": {
        "match": {
            "auditLogTrailer.messages.severity": "CRITICAL"
        }
    },
    "aggs": {
        "aaa" : {
            "filter": {
                "term": {
                   "Host": "localhost.com"
                }
            },
            "aggs": {
                "bbb" : {
                    "date_histogram":{
                        "field" : "event_timestamp",
                        "interval" : "1m"
                    }
                }
            }
        }
    }
}

Fetch the documents whose Host is localhost.com and whose auditLogTrailer.messages.severity is CRITICAL. Then take the data within the range and group it in 1-minute units.

GET _search?search_type=count
{
    "from": 0,
    "size": 50,
    "query": {
        "filtered": {
            "query": {
                "match": {
                    "auditLogTrailer.messages.severity": "CRITICAL"
                }
            },
            "filter": {
                "term": {
                    "Host": "localhost.com"
                }
            }
        }
    },
    "aggs": {
        "aaa": {
            "range": {
                "field": "event_timestamp",
                "ranges": [
                    {
                        "from": "2015-03-16T09:30:00.000Z",
                        "to": "2015-03-16T09:31:00.000Z"
                    }
                ]
            },
            "aggs": {
                "bbb": {
                    "date_histogram": {
                        "field": "event_timestamp",
                        "interval": "1m"
                    }
                }
            }
        }
    }
}

Query for the CRITICAL documents among those whose Host is localhost.com and whose event_timestamp is in the range from ~ to. Then group the results by event_timestamp at a 1-minute interval.

GET _search?search_type=count
{
    "from": 0,
    "size": 50,
    "query": {
        "filtered": {
            "query": {
                "match": {
                    "auditLogTrailer.messages.severity": "CRITICAL"
                }
            },
            "filter": {
                "and": {
                    "filters": [
                        {
                            "term": {
                                "Host": "localhost.com"
                            }
                        },
                        {
                            "range": {
                                "event_timestamp": {
                                    "from": "2015-03-16T09:30:00.000Z",
                                    "to": "2015-03-16T09:31:00.000Z"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "aaa": {
            "date_histogram": {
                "field": "event_timestamp",
                "interval": "1m"
            }
        }
    }
}
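The filtered query and the and filter used above belong to older ElasticSearch releases; they were deprecated in 2.0 and removed in 5.0 in favor of a bool query with a filter clause. The following is a hedged sketch of a rewrite helper; the function name filtered_to_bool is hypothetical, and it only covers the shapes used in this post. The range clause is copied unchanged.

```python
import json


def filtered_to_bool(query):
    """Rewrite {"filtered": {...}} as a bool query; other queries pass through."""
    if "filtered" not in query:
        return query
    filtered = query["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = [filtered["query"]]
    flt = filtered.get("filter")
    if flt is not None:
        # Flatten the old {"and": {"filters": [...]}} wrapper into a list.
        if "and" in flt:
            inner = flt["and"]
            clauses = inner["filters"] if isinstance(inner, dict) else inner
        else:
            clauses = [flt]
        bool_query["filter"] = list(clauses)
    return {"bool": bool_query}


old_query = {
    "filtered": {
        "query": {"match": {"auditLogTrailer.messages.severity": "CRITICAL"}},
        "filter": {
            "and": {
                "filters": [
                    {"term": {"Host": "localhost.com"}},
                    {"range": {"event_timestamp": {
                        "from": "2015-03-16T09:30:00.000Z",
                        "to": "2015-03-16T09:31:00.000Z",
                    }}},
                ]
            }
        },
    }
}
new_query = filtered_to_bool(old_query)
print(json.dumps(new_query, indent=2))
```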

Post filter: querying for items that are red and gucci

curl -XGET localhost:9200/shirts/_search -d '
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "color": "red"   }},
            { "term": { "brand": "gucci" }}
          ]
        }
      }
    }
  }
}'

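Fetch the documents with Host localhost.com whose event_timestamp is in the range from ~ to, then run two aggregations on them (activity_timeline, severity_count). activity_timeline buckets event_timestamp at a fixed interval (5m in the query below) and, within each bucket, groups again by severity. severity_count groups by severity. Note that the query below only contains the range filter; the Host term filter would go in the same filters array.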

GET _search?search_type=count
{
    "from": 0,
    "size": 50,
    "query": {
        "filtered": {
            "filter": {
                "and": {
                    "filters": [
                        
                        {
                            "range": {
                                "event_timestamp": {
                                    "from": "2015-03-16T09:29:00.000Z",
                                    "to": "2015-03-16T09:32:00.000Z"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "activity_timeline": {
            "date_histogram": {
                "field": "event_timestamp",
                "interval": "5m",
                "min_doc_count": 0,
                "extended_bounds": {
                    "min": 1423906200000,
                    "max": 1426498200000
                }
            },
            "aggs": {
                "bbb": {
                    "terms": {
                        "field": "auditLogTrailer.messages.severity"
                    }
                }
            }
        },
        "severity_count":{
            "terms": {
                "field": "auditLogTrailer.messages.severity"
            }
        }
    }
}
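The two-aggregation layout above (one nested, one flat) can be assembled from small pieces. This is a sketch only; the helper names terms and with_sub_aggs are assumptions, and just the aggs part of the request is built.

```python
import json


def terms(field):
    """Build a terms aggregation on `field`."""
    return {"terms": {"field": field}}


def with_sub_aggs(agg, **sub_aggs):
    """Return a copy of `agg` with nested sub-aggregations attached."""
    nested = dict(agg)
    nested["aggs"] = sub_aggs
    return nested


severity_field = "auditLogTrailer.messages.severity"
timeline = {"date_histogram": {"field": "event_timestamp", "interval": "5m"}}

aggs = {
    # Per time bucket, break the documents down by severity.
    "activity_timeline": with_sub_aggs(timeline, bbb=terms(severity_field)),
    # Independently, count documents per severity over the whole result set.
    "severity_count": terms(severity_field),
}
print(json.dumps(aggs, indent=2))
```

Sibling keys under aggs are computed independently, while a nested aggs block is evaluated once per parent bucket.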

Fetch only specific fields (Fields)

GET _search
{
    "from": 0,
    "size": 50,
    "fields": ["requestHeaders.Host", "auditLogTrailer.messages.msg",
    "event_date_milliseconds","sourceIp", "auditLogTrailer.messages.severity"
    ],
    "query" : { "match_all": {}}
}

partial_fields is similar to fields, but it allows wildcards, and the paths do not have to be leaf nodes.

GET _search
{
    
    "query" : { "match_all": {}},
    "partial_fields" : {
        "my_data" : {
            "include" : ["auditLogTrailer.*","requestHeaders", "uniqueId", "event_date_milliseconds", "sourceIp", "rawSectionH"],
            "exclude" : "modsecSeverities.*"
        }
    }
}

Applying a filter (exists, Missing Filter)

GET _search
{
    "from": 0,
    "size": 50,
    "filter": {
        "exists" : { "field" : "auditLogTrailer.messages.tag" }
    },
    "sort": [
        {
            "event_date_milliseconds": {
                "order": "desc"
            }
        }
    ],
    "query": {
        "match_all": {}
    },
    "partial_fields": {
        "data": {
            "include": [
                "auditLogTrailer.messages.tag",
                "auditLogTrailer.messages.severity",
                "requestHeaders.Host",
                "uniqueId",
                "event_date_milliseconds",
                "sourceIp",
                "rawSectionH"
            ]
        }
    }
}

Fetch only specific fields, and among those documents return the ones that match a given condition.

GET _search
{
    "from": 0,
    "size": 50,
    "fields": [
        "requestHeaders.Host",
        "auditLogTrailer.messages.msg",
        "event_date_milliseconds",
        "sourceIp",
        "auditLogTrailer.messages.severity"
    ],
    "query": {
        "filtered": {
            "filter": {
                "and": {
                    "filters": [
                        {
                            "term": {
                                "event_date_milliseconds": 1426498259000.208
                            }
                        }
                    ]
                }
            }
        }
    }
}

Fetch documents that contain text similar to like_text in the given fields. Note that fuzzy_like_this only works on text fields; a numeric field raises an Exception.

GET _search
{
    "from": 0,
    "size": 50,
    "fields": [
        "requestHeaders.Host",
        "auditLogTrailer.messages.msg",
        "auditLogTrailer.messages.tag",
        "event_date_milliseconds",
        "sourceIp",
        "auditLogTrailer.messages.severity"
    ],
    "query": {
        "fuzzy_like_this": {
            "fields": [
                "requestHeaders.Host",
                "auditLogTrailer.messages.msg",
                "sourceIp",
                "auditLogTrailer.messages.severity",
                "auditLogTrailer.messages.tag"
            ],
            "like_text": "xss",
            "max_query_terms": 12
        }
    }
}

Find the documents whose uniqueId is "248cF1AmidgVjcxc8BAcAcAc" and return only one of them.

GET _search
{
    "from":0,
    "size":1,
   
    "query":{
        "match":{
            "uniqueId" : "248cF1AmidgVjcxc8BAcAcAc"
        }
    }
}

GET _search?search_type=count
{
    "from": 0,
    "size": 50,
    "query": {
        "filtered": {
            "query": {
                "match": {
                    "auditLogTrailer.messages.severity": "CRITICAL"
                }
            },
            "filter": {
                "and": {
                    "filters": [
                        {
                            "range": {
                                "event_timestamp": {
                                    "from": "2015-03-16T09:30:00.000Z",
                                    "to": "2015-03-16T09:31:00.000Z"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "aaa": {
            "stats": {
                "field": "event_timestamp"
            }
        }
    }
}

Group the documents whose severity is CRITICAL by event_timestamp and fetch the single most recent one.

GET _search
{
    "from": 0,
    "size": 50,
    "query": {
        "filtered": {
            "query": {
                "match": {
                    "auditLogTrailer.messages.severity": "CRITICAL"
                }
            }
        }
    },
    "aggs": {
        "top-tags": {
            "terms":{
                "field": "event_timestamp",
                "size" : 1,
                "order" : {"_term" : "desc"}
            }
        }
    }
}

Sort by event_timestamp and fetch the CRITICAL ones. Return only the fields [...].

GET _search
{
    "from": 0,
    "size": 5,
    "sort": [
       {
          "event_timestamp": {
             "order": "desc"
          }
       }
    ],
    "fields": ["requestHeaders.Host", "auditLogTrailer.messages.msg",
    "event_date_milliseconds","sourceIp", "auditLogTrailer.messages.severity", "auditLogTrailer.messages.id", "event_date_milliseconds"],
    "query": {
        "filtered": {
            "query": {
                "match": {
                    "auditLogTrailer.messages.severity": "CRITICAL"
                }
            }
        }
    }
}

Restrict the data to a date range and sort it in ascending order. Aggregate (aggs) these values, and return only certain fields of the resulting hits.

GET _search
{
    "sort": [
       {
          "event_date_milliseconds": {
             "order": "asc"
          }
       }
    ],
   
    "query": {
        "filtered": {
           
            "filter": {
                "and": {
                    "filters": [
                        {
                            "range": {
                                "event_timestamp": {
                                    "from": "2015-03-16T09:25:00.000Z",
                                    "to": "2015-03-16T09:30:00.000Z"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "data": {
            "date_histogram": {
                "field": "event_timestamp",
                "interval": "5m"
            },
            "aggs": {
                "by_host": {
                    "terms": {
                        "field": "requestHeaders.Host.raw"
                    },
                    "aggs": {
                        "by_host_ip": {
                            "terms": {
                                "field": "sourceIp"
                            },
                            "aggs":{
                                "by_host_ip_tag": {
                                    "terms": {
                                        "field": "auditLogTrailer.messages.tag.raw"
                                    }
                                }
                               
                            }
                        }
                    }
                }
            }
        }
    },
    "partial_fields": {
       "data": {
          "include": ["auditLogTrailer.messages.tag",
          "auditLogTrailer.messages.severity",
          "auditLogTrailer.messages.msg",
          "sourceIp",
          "requestHeaders.Host",
          "event_date_milliseconds"]
       }
    }
}

When you want to pull out other fields along with the aggregation results, use top_hits.

GET _search?search_type=count
{
    "sort": [
       {
          "event_date_milliseconds": {
             "order": "asc"
          }
       }
    ], 
    
    "query": {
        "filtered": {
            
            "filter": {
                "and": {
                    "filters": [
                        {
                            "range": {
                                "event_timestamp": {
                                    "from": "2015-03-16T09:25:00.000Z",
                                    "to": "2015-03-16T09:30:00.000Z"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "data": {
            "date_histogram": {
                "field": "event_timestamp",
                "interval": "5m"
            },
            "aggs": {
                "by_host": {
                    "terms": {
                        "field": "requestHeaders.Host.raw"
                    },
                    "aggs": {
                        "by_host_ip": {
                            "terms": {
                                "field": "sourceIp"
                            },
                            "aggs":{
                                "by_host_ip_tag": {
                                    "terms": {
                                        "field": "auditLogTrailer.messages.tag.raw"
                                        
                                    },
                                    "aggs" : {
                                        "top_tag_hits" : {
                                            "top_hits" : {
                                                "_source": {
                                                    "include": [
                                                        "auditLogTrailer.messages.severity",
                                                        "auditLogTrailer.messages.msg"
                                                    ]
                                                }
                                            }
                                        }
                                    }
                                    
                                }
                                
                            }
                        }
                    }
                }
            }
        }
    }
}
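The response to a nested aggregation like the one above is itself nested: every bucket carries a key, a doc_count, and one entry per sub-aggregation. The sketch below flattens such a result into rows and collects the _source of any top_hits found at the leaf. The helper name and the sample response are hypothetical and heavily trimmed.

```python
def flatten_buckets(agg_result, top_hits_name, path=()):
    """Yield (keys, doc_count, sources) for every leaf bucket of nested aggs."""
    for bucket in agg_result.get("buckets", []):
        keys = path + (bucket["key"],)
        # Sub-aggregations that produced buckets of their own.
        children = [v for v in bucket.values()
                    if isinstance(v, dict) and "buckets" in v]
        if children:
            for child in children:
                yield from flatten_buckets(child, top_hits_name, keys)
        else:
            hits = bucket.get(top_hits_name, {}).get("hits", {}).get("hits", [])
            yield keys, bucket["doc_count"], [h["_source"] for h in hits]


# Hypothetical, trimmed-down aggregation result for illustration.
sample = {
    "buckets": [
        {
            "key": 1426497900000,
            "doc_count": 3,
            "by_host": {
                "buckets": [
                    {
                        "key": "localhost.com",
                        "doc_count": 2,
                        "top_tag_hits": {
                            "hits": {"hits": [{"_source": {"severity": "CRITICAL"}}]}
                        },
                    },
                    {"key": "test.com", "doc_count": 1},
                ]
            },
        }
    ]
}

rows = list(flatten_buckets(sample, "top_tag_hits"))
for row in rows:
    print(row)
```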

Show only the documents that have the field auditLogTrailer.messages. (Exists Filter)

GET _search
{
    "filter": {
        "exists": {
           "field": "auditLogTrailer.messages"
        }
    }
}

Using multi_match

GET _search
{
  "from":10,
  "size": 20,
  "sort" : [ {
    "event_date_milliseconds" : {
      "order" : "desc"
    }
  } ],
  "query" : {
    "filtered" : {
      "query" : {
        "multi_match" : {
          "fields" : [ 
              "requestHeaders.Host",
              "auditLogTrailer.messages.msg",
              "sourceIp",
              "auditLogTrailer.messages.severity",
              "auditLogTrailer.messages.tag" ],
          "query" : "com"
          
        }
      },
      "filter" : {
        "exists" : {
          "field" : "auditLogTrailer.messages.tag"
        }
      }
    }
  }
  
}

multi_match with phrase_prefix: in this case only values that match from the beginning are returned. If the string "abc.de.fg.com" is still "abc.de.fg.com" after tokenizing, searching for fg with phrase_prefix returns nothing. In that case wildcard or regex seems to be the better choice.

GET _search
{
    "from": 0,
    "size": 50,
    "sort": [
        {
            "event_date_milliseconds": {
                "order": "desc"
            }
        }
    ],
    "query": {
        "filtered": {
            "query": {
                "multi_match": {
                    "fields": [
                        "requestHeaders.Host",
                        "auditLogTrailer.messages.msg",
                        "sourceIp",
                        "auditLogTrailer.messages.severity",
                        "auditLogTrailer.messages.tag"
                    ],
                    "query": "local",
                    "type" : "phrase_prefix"
                }
            },
            "filter": {
                "exists": { "field" : "auditLogTrailer.messages.tag" }
            }
        }
    },
    "partial_fields": {
        "data": {
            "include": [
                "auditLogTrailer.messages.tag",
                "auditLogTrailer.messages.severity",
                "requestHeaders.Host",
                "uniqueId",
                "event_date_milliseconds",
                "sourceIp",
                "rawSectionH"
            ]
        }
    }
}
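The prefix-versus-wildcard behaviour described for "abc.de.fg.com" can be modelled in a few lines. This is a simplified model over a list of tokens, not how Lucene evaluates these queries, and it assumes the host name was indexed as one single token; the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def prefix_match(tokens, text):
    """True if any indexed token starts with `text` (single-word prefix case)."""
    return any(tok.startswith(text) for tok in tokens)


def wildcard_match(tokens, pattern):
    """True if any indexed token matches a *-style wildcard pattern."""
    return any(fnmatchcase(tok, pattern) for tok in tokens)


# The host name is assumed to be indexed as one single token.
tokens = ["abc.de.fg.com"]

print(prefix_match(tokens, "abc"))     # prefix of the token
print(prefix_match(tokens, "fg"))      # middle of the token
print(wildcard_match(tokens, "*fg*"))  # wildcard on both sides
```

Only the wildcard form finds fg in the middle of the token, which is why wildcard or regex is suggested for that case.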

Using filter with a wildcard query

GET _search
{
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "fields": [
                        "requestHeaders.Host",
                        "rawSectionH"
                    ],
                    "query": "*com*"
                }
            },
            "filter": {
                "and": {
                    "filters": [
                        {
                            "exists": {
                                "field": "auditLogTrailer.messages.tag"
                            }
                        },
                        {
                            "term": {
                                "requestHeaders.Host": "testdomain.com"
                            }
                        },
                        {
                            "regexp": {
                                "sourceIp": "[0-9.]*10.10.100.161[0-9.]*"
                            }
                        },
                        {
                            "terms": {
                                "auditLogTrailer.messages.severity": [
                                    "error"
                                ]
                            }
                        },
                        {
                            "regexp": {
                                "auditLogTrailer.messages.tag.raw": ".*WASC-13.*"
                            }
                        },
                        {
                            "range": {
                                "event_date_milliseconds": {
                                    "from": 1430006739000,
                                    "to": 1430006765000
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}
