ANALYZE導(dǎo)致的阻塞問(wèn)題分析

小仙女本仙人 2022-03-06

展開(kāi)全文

背景

問(wèn)題描述

DBA同學(xué)收到qps大量下降的告警，qps從2w下降到1w，然后又自動(dòng)恢復(fù)了。
基于Analysis Report信息，發(fā)現(xiàn)有很多 STATE:Waiting for table flush 的狀態(tài)，因此我們分析了問(wèn)題期間的snapshot，將有問(wèn)題的狀態(tài)羅列整理如下：

0. Server version: 5.7.23-ctrip-log MySQL Community Server - (GPL)
1. qps 下降的比較多
2. thread_running 比較多
3. STATE:Waiting for table flush 比較多
4. Waiting for table flush 的表： 同一個(gè)表
5. 此時(shí)間段有定時(shí)任務(wù)： Analyze table 操作

基于以上，我們很快便懷疑和Analyze table 有關(guān)

為什么要Analyze table

當(dāng)優(yōu)化器決定選擇使用哪個(gè)索引來(lái)查詢時(shí)，它將使用該表存儲(chǔ)引擎提供的統(tǒng)計(jì)信息。如果統(tǒng)計(jì)信息不是最新的，那么優(yōu)化器在創(chuàng)建執(zhí)行計(jì)劃時(shí)可能會(huì)選擇錯(cuò)誤的索引，這會(huì)導(dǎo)致性能下降。

為了防止這種情況發(fā)生，存儲(chǔ)引擎支持自動(dòng)和手動(dòng)的統(tǒng)計(jì)信息更新。雖然自動(dòng)統(tǒng)計(jì)信息更新通常工作得很好，但在某些情況下它們不能正確地完成工作。

MySQL對(duì)于索引信息統(tǒng)計(jì)是通過(guò)采樣統(tǒng)計(jì)的方法，相比整表一條條記錄采集來(lái)說(shuō)花費(fèi)的代價(jià)小很多。采樣統(tǒng)計(jì)的時(shí)候，InnoDB默認(rèn)使用N個(gè)page（默認(rèn)16k大?。?，統(tǒng)計(jì)其中的不同值，然后取平均數(shù)，最后乘以這個(gè)索引對(duì)應(yīng)的頁(yè)面數(shù)，最后得到索引的基數(shù)（cardinality）。

1）自動(dòng)更新：

當(dāng)innodb_stats_persistent=ON，表示統(tǒng)計(jì)信息存儲(chǔ)到磁盤(pán)，算法如下：采樣統(tǒng)計(jì)20個(gè)16k page(innodb_stats_persistent_sample_pages)，當(dāng)更新的記錄數(shù)超過(guò)1/10的時(shí)候，自動(dòng)更新統(tǒng)計(jì)信息；

當(dāng)innodb_stats_persistent=OFF，表示統(tǒng)計(jì)信息存儲(chǔ)到內(nèi)存，算法如下：采樣統(tǒng)計(jì)8個(gè)16k page(innodb_stats_transient_sample_pages)，當(dāng)更新的記錄數(shù)超過(guò)1/16的時(shí)候，自動(dòng)更新統(tǒng)計(jì)信息；

2）手動(dòng)更新：

自己寫(xiě)腳本，定期調(diào)用Analyze table

3）綜合考慮：

我們選擇的方法，就是定期更新統(tǒng)計(jì)信息。原因如下：

a）可控制低峰期執(zhí)行  
b）自動(dòng)更新統(tǒng)計(jì)信息，如果沒(méi)有達(dá)到觸發(fā)條件，則一直不會(huì)更新

問(wèn)題重現(xiàn)與分析

問(wèn)題重現(xiàn)

時(shí)間	conn1	conn2	conn3
1		show open tables like 'mychecksums' 【In_use=0】
2		ANALYZE NO_WRITE_TO_BINLOG TABLE mychecksums;【成功】
3			select * from mychecksums limit 1;【成功】
4	SELECT SLEEP(100) FROM mychecksums;【進(jìn)入sleep】
5		show open tables like 'mychecksums' 【In_use=1】
6		ANALYZE NO_WRITE_TO_BINLOG TABLE mychecksums;【成功】
7			select * from mychecksums limit 1;【阻塞】
8	【KILL】		查詢結(jié)果【成功】
9		ANALYZE NO_WRITE_TO_BINLOG TABLE mychecksums;【成功】
10	SELECT SLEEP(100) FROM mychecksums;【進(jìn)入sleep】
11		show open tables like 'mychecksums' 【In_use=1】
12			select * from mychecksums limit 1;【成功】

上述模擬表明：

Analyze table mychecksums之前，如果mychecksums table沒(méi)有被打開(kāi)使用（m_open_in_progress=0），那么后續(xù)query執(zhí)行成功。
Analyze table mychecksums之前，如果mychecksums table已經(jīng)被打開(kāi)使用了（m_open_in_progress=1），那么后續(xù)query會(huì)被阻塞。
- 如果將之前引用該表的conn kill掉，那么被阻塞的query立刻執(zhí)行成功
Analyze table mychecksums之后，如果mychecksums table被打開(kāi)使用了（m_open_in_progress=1），那么后續(xù)query執(zhí)行成功。

阻塞堆棧

conn3 select * from mychecksums limit 1

Thread 22 (Thread 0x7f4cbc246700 (LWP 2981)):   conn3 select * from mychecksums limit 1
#0  0x00007f4d870c4cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000c1bf59 in native_cond_timedwait (abstime=0x7f4cbc242700, mutex=0x7f1c27005098, cond=0x7f1c270050c8) at xx/mysql-5.7.23/mysql-5.7.23/include/thr_cond.h:129
#2  my_cond_timedwait (abstime=0x7f4cbc242700, mp=0x7f1c27005098, cond=0x7f1c270050c8) at xx/mysql-5.7.23/mysql-5.7.23/include/thr_cond.h:182
#3  inline_mysql_cond_timedwait (src_file=0x142f2a8 "xx/mysql-5.7.23/mysql-5.7.23/sql/mdl.cc", src_line=1861, abstime=<optimized out>, mutex=0x7f1c27005098, that=<optimized out>) at xx/mysql-5.7.23/mysql-5.7.23/include/mysql/psi/mysql_thread.h:1229
#4  MDL_wait::timed_wait (this=this@entry=0x7f1c27005098, owner=owner@entry=0x7f1c27005000, abs_timeout=abs_timeout@entry=0x7f4cbc242700, set_status_on_timeout=set_status_on_timeout@entry=true, wait_state_name=<optimized out>) at xx/mysql-5.7.23/mysql-5.7.23/sql/mdl.cc:1861
#5  0x0000000000d6d5b4 in TABLE_SHARE::wait_for_old_version (this=this@entry=0x7f1c270b5130, thd=thd@entry=0x7f1c27005000, abstime=abstime@entry=0x7f4cbc242700, deadlock_weight=deadlock_weight@entry=0) at xx/mysql-5.7.23/mysql-5.7.23/sql/table.cc:4611
#6  0x0000000000c71e71 in tdc_wait_for_old_version (deadlock_weight=0, wait_timeout=31536000, table_name=<optimized out>, db=<optimized out>, thd=0x7f1c27005000) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_base.cc:2927
#7  open_table (thd=thd@entry=0x7f1c27005000, table_list=table_list@entry=0x7f1c2713b578, ot_ctx=ot_ctx@entry=0x7f4cbc242bf0) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_base.cc:3517
#8  0x0000000000c79402 in open_and_process_table (flags=0, ot_ctx=0x7f4cbc242bf0, has_prelocking_list=false, prelocking_strategy=0x7f4cbc242c80, counter=0x7f1c27007240, tables=0x7f1c2713b578, lex=<optimized out>, thd=0x7f1c27005000) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_base.cc:5144
#9  open_tables (thd=thd@entry=0x7f1c27005000, start=start@entry=0x7f4cbc242c78, counter=<optimized out>, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f4cbc242c80) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_base.cc:5755
#10 0x0000000000c79d5a in open_tables_for_query (thd=thd@entry=0x7f1c27005000, tables=tables@entry=0x7f1c2713b578, flags=flags@entry=0) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_base.cc:6530
#11 0x00000000007769fb in execute_sqlcom_select (thd=thd@entry=0x7f1c27005000, all_tables=0x7f1c2713b578) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_parse.cc:5099
#12 0x0000000000ccc3c4 in mysql_execute_command (thd=thd@entry=0x7f1c27005000, first_level=first_level@entry=true) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_parse.cc:2814
#13 0x0000000000cce9dd in mysql_parse (thd=thd@entry=0x7f1c27005000, parser_state=parser_state@entry=0x7f4cbc243d00) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_parse.cc:5554
#14 0x0000000000ccf55d in dispatch_command (thd=thd@entry=0x7f1c27005000, com_data=com_data@entry=0x7f4cbc2445a0, command=COM_QUERY) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_parse.cc:1484
#15 0x0000000000cd0f7f in do_command (thd=thd@entry=0x7f1c27005000) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_parse.cc:1025
#16 0x0000000000d93030 in handle_connection (arg=arg@entry=0x7f24d193d820) at xx/mysql-5.7.23/mysql-5.7.23/sql/conn_handler/connection_handler_per_thread.cc:308
#17 0x000000000127a504 in pfs_spawn_thread (arg=0x7f24d195dd20) at xx/mysql-5.7.23/mysql-5.7.23/storage/perfschema/pfs.cc:2444
#18 0x00007f4d870c0e25 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f4d85b7d34d in clone () from /lib64/libc.so.6

根因分析

復(fù)現(xiàn)這個(gè)問(wèn)題，最主要的幾點(diǎn)因素是：

1. 長(zhǎng)事務(wù)
2. 并發(fā)事務(wù)
3. 在長(zhǎng)事務(wù)結(jié)束之前，跑analyze table 

無(wú)論長(zhǎng)事務(wù)還是并發(fā)事務(wù)，都是為了讓 m_open_in_progress=1 的概率提高，產(chǎn)生鎖的概率就提高了。 

我們這次遇到的問(wèn)題，就是并發(fā)事務(wù)多造成的

過(guò)程

1. 打開(kāi)一張表，讓其m_open_in_progress=1
2. Analyze table mychecksums
    2.1 打開(kāi)并修改表的統(tǒng)計(jì)信息
    2.2 tdc_remove_table： 讓該表從table definition cache中移除，讓其失效。移除之前，必須等待之前的引用標(biāo)記結(jié)束。
        2.2.1 此時(shí)，如果后續(xù)有新的query來(lái)訪問(wèn)該表，其內(nèi)存對(duì)象版本不一致，導(dǎo)致等待，阻塞發(fā)生。（mysql_admin_table-->tdc_remove_table::execute
）（REPAIR TABLE，CHECK TABLE同理）
        2.2.2 移除成功
        2.2.3 之后訪問(wèn)不受影響
    2.3 query_cache.invalidate(thd, table, FALSE)

相關(guān)代碼

sql/sql_admin.cc

     if (table->table)
    {
      if (table->table->s->tmp_table)
      {
        /*
          If the table was not opened successfully, do not try to get
          status information. (Bug#47633)
        */
        if (open_for_modify && !open_error)
          table->table->file->info(HA_STATUS_CONST);
      }
      else if (open_for_modify || fatal_error)
      {
        tdc_remove_table(thd, TDC_RT_REMOVE_UNUSED,
                         table->db, table->table_name, FALSE);
        /*
          May be something modified. Consequently, we have to
          invalidate the query cache.
        */
        table->table= 0;                        // For query cache
        query_cache.invalidate(thd, table, FALSE);
      }

sql/sql_base.cc

/**
   Remove all or some (depending on parameter) instances of TABLE and
   TABLE_SHARE from the table definition cache.
   @param  thd          Thread context
   @param  remove_type  Type of removal:
                        TDC_RT_REMOVE_ALL     - remove all TABLE instances and
                                                TABLE_SHARE instance. There
                                                should be no used TABLE objects
                                                and caller should have exclusive
                                                metadata lock on the table.
                        TDC_RT_REMOVE_NOT_OWN - remove all TABLE instances
                                                except those that belong to
                                                this thread. There should be
                                                no TABLE objects used by other
                                                threads and caller should have
                                                exclusive metadata lock on the
                                                table.
                        TDC_RT_REMOVE_UNUSED  - remove all unused TABLE
                                                instances (if there are no
                                                used instances will also
                                                remove TABLE_SHARE).
                        TDC_RT_REMOVE_NOT_OWN_KEEP_SHARE -
                                                remove all TABLE instances
                                                except those that belong to
                                                this thread, but don't mark
                                                TABLE_SHARE as old. There
                                                should be no TABLE objects
                                                used by other threads and
                                                caller should have exclusive
                                                metadata lock on the table.
   @param  db           Name of database
   @param  table_name   Name of table
   @param  has_lock     If TRUE, LOCK_open is already acquired
   @note It assumes that table instances are already not used by any
   (other) thread (this should be achieved by using meta-data locks).
*/

void tdc_remove_table(THD *thd, enum_tdc_remove_table_type remove_type,
                      const char *db, const char *table_name,
                      bool has_lock)
{
  char key[MAX_DBKEY_LENGTH];
  size_t key_length;
  TABLE_SHARE *share;

  if (! has_lock)
    table_cache_manager.lock_all_and_tdc();
  else
    table_cache_manager.assert_owner_all_and_tdc();

  DBUG_ASSERT(remove_type == TDC_RT_REMOVE_UNUSED ||
              thd->mdl_context.owns_equal_or_stronger_lock(MDL_key::TABLE,
                                 db, table_name, MDL_EXCLUSIVE));

  key_length= create_table_def_key(thd, key, db, table_name, false);

  if ((share= (TABLE_SHARE*) my_hash_search(&table_def_cache,(uchar*) key,
                                            key_length)))
  {
    /*
      Since share->ref_count is incremented when a table share is opened
      in get_table_share(), before LOCK_open is temporarily released, it
      is sufficient to check this condition alone and ignore the
      share->m_open_in_progress flag.
      Note that it is safe to call table_cache_manager.free_table() for
      shares with m_open_in_progress == true, since such shares don't
      have any TABLE objects associated.
    */
    if (share->ref_count)
    {
      /*
        Set share's version to zero in order to ensure that it gets
        automatically deleted once it is no longer referenced.
        Note that code in TABLE_SHARE::wait_for_old_version() assumes
        that marking share as old and removal of its unused tables
        and of the share itself from TDC happens atomically under
        protection of LOCK_open, or, putting it another way, that
        TDC does not contain old shares which don't have any tables
        used.
      */
      if (remove_type != TDC_RT_REMOVE_NOT_OWN_KEEP_SHARE)
        share->version= 0;
      table_cache_manager.free_table(thd, remove_type, share);
    }
    else
    {
      DBUG_ASSERT(remove_type != TDC_RT_REMOVE_NOT_OWN_KEEP_SHARE);
      (void) my_hash_delete(&table_def_cache, (uchar*) share);
    }
  }

  if (! has_lock)
    table_cache_manager.unlock_all_and_tdc();
}

sql/sql_base.cc

/**
  Check if table's share is being removed from the table definition
  cache and, if yes, wait until the flush is complete.
  @param thd             Thread context.
  @param db              Database name.
  @param table_name      Table name.
  @param wait_timeout    Timeout for waiting.
  @param deadlock_weight Weight of this wait for deadlock detector.
  @retval false   Success. Share is up to date or has been flushed.
  @retval true    Error (OOM, our was killed, the wait resulted
                  in a deadlock or timeout). Reported.
*/

static bool tdc_wait_for_old_version(THD *thd, const char *db,
                                     const char *table_name, ulong wait_timeout,
                                     uint deadlock_weight) {
  TABLE_SHARE *share;
  bool res = false;

  mysql_mutex_lock(&LOCK_open);
  if ((share = get_cached_table_share(db, table_name)) &&
      share->has_old_version()) {
    struct timespec abstime;
    set_timespec(&abstime, wait_timeout);
    res = share->wait_for_old_version(thd, &abstime, deadlock_weight);
  }
  mysql_mutex_unlock(&LOCK_open);
  return res;
}

sql/sql_base.h

/**
  Get an existing table definition from the table definition cache.
  Search the table definition cache for a share with the given key.
  If the share exists, check the m_open_in_progress flag. If true,
  the share is in the process of being opened by another thread,
  so we must wait for the opening to finish. This may make the share
  be destroyed, if open_table_def() fails, so we must repeat the search
  in the hash table. Return the share.
  @note While waiting for the condition variable signaling that a
        table share is completely opened, the thread will temporarily
        release LOCK_open. Thus, the caller cannot rely on LOCK_open
        being held for the duration of the call.
  @param thd        thread descriptor
  @param db         database name
  @param table_name table name
  @retval NULL      a share for the table does not exist in the cache
  @retval != NULL   pointer to existing share in the cache
*/

TABLE_SHARE *get_cached_table_share(THD *thd, const char *db,
                                    const char *table_name)
{
  char key[MAX_DBKEY_LENGTH];
  size_t key_length;
  TABLE_SHARE *share= NULL;
  mysql_mutex_assert_owner(&LOCK_open);

  key_length= create_table_def_key((THD*) 0, key, db, table_name, 0);
  while ((share= reinterpret_cast<TABLE_SHARE*>(
                     my_hash_search(&table_def_cache,
                       reinterpret_cast<uchar*>(const_cast<char*>(key)),
                       key_length))))
  {
    if (!share->m_open_in_progress)
      break;

    DEBUG_SYNC(thd, "get_cached_share_cond_wait");
    mysql_cond_wait(&COND_open, &LOCK_open);
  }
  return share;
}

sql/table.cc

/**
  Wait until the subject share is removed from the table
  definition cache and make sure it's destroyed.
  @note This method may access the share concurrently with another
  thread if the share is in the process of being opened, i.e., that
  m_open_in_progress is true. In this case, close_cached_tables() may
  iterate over elements in the table definition cache, and call this
  method regardless of the share being opened or not. This works anyway
  since a new flush ticket is added below, and LOCK_open ensures
  that the share may not be destroyed by another thread in the time
  between finding this share (having an old version) and adding the flush
  ticket. Thus, after this thread has added the flush ticket, the thread
  opening the table will eventually call free_table_share (as a result of
  releasing the share after using it, or as a result of a failing
  open_table_def()), which will notify the owners of the flush tickets,
  and the last one being notified will actually destroy the share.
  @param mdl_context     MDL context for thread which is going to wait.
  @param abstime         Timeout for waiting as absolute time value.
  @param deadlock_weight Weight of this wait for deadlock detector.
  @pre LOCK_open is write locked, the share is used (has
       non-zero reference count), is marked for flush and
       this connection does not reference the share.
       LOCK_open will be unlocked temporarily during execution.
  @retval FALSE - Success.
  @retval TRUE  - Error (OOM, deadlock, timeout, etc...).
*/

bool TABLE_SHARE::wait_for_old_version(THD *thd, struct timespec *abstime,
                                       uint deadlock_weight)
{
  MDL_context *mdl_context= &thd->mdl_context;
  Wait_for_flush ticket(mdl_context, this, deadlock_weight);
  MDL_wait::enum_wait_status wait_status;

  mysql_mutex_assert_owner(&LOCK_open);
  /*
    We should enter this method only when share's version is not
    up to date and the share is referenced. Otherwise our
    thread will never be woken up from wait.
  */
  DBUG_ASSERT(version != refresh_version && ref_count != 0);

  m_flush_tickets.push_front(&ticket);

  mdl_context->m_wait.reset_status();

  mysql_mutex_unlock(&LOCK_open);

  mdl_context->will_wait_for(&ticket);

  mdl_context->find_deadlock();

  DEBUG_SYNC(thd, "flush_complete");

  wait_status= mdl_context->m_wait.timed_wait(thd, abstime, TRUE,
                                              &stage_waiting_for_table_flush);

  mdl_context->done_waiting_for();

  mysql_mutex_lock(&LOCK_open);

  m_flush_tickets.remove(&ticket);

  if (m_flush_tickets.is_empty() && ref_count == 0)
  {
    /*
      If our thread was the last one using the share,
      we must destroy it here.
    */
    destroy();
  }

  DEBUG_SYNC(thd, "share_destroyed");

  /*
    In cases when our wait was aborted by KILL statement,
    a deadlock or a timeout, the share might still be referenced,
    so we don't delete it. Note, that we can't determine this
    condition by checking wait_status alone, since, for example,
    a timeout can happen after all references to the table share
    were released, but before the share is removed from the
    cache and we receive the notification. This is why
    we first destroy the share, and then look at
    wait_status.
  */
  switch (wait_status)
  {
  case MDL_wait::GRANTED:
    return FALSE;
  case MDL_wait::VICTIM:
    my_error(ER_LOCK_DEADLOCK, MYF(0));
    return TRUE;
  case MDL_wait::TIMEOUT:
    my_error(ER_LOCK_WAIT_TIMEOUT, MYF(0));
    return TRUE;
  case MDL_wait::KILLED:
    return TRUE;
  default:
    DBUG_ASSERT(0);
    return TRUE;
  }
}

問(wèn)題抽象

生產(chǎn)環(huán)境中，會(huì)遇到這種場(chǎng)景：平常SQL語(yǔ)句跑的很快，怎么突然就慢了呢？

資源不足

大家都知道，一臺(tái)機(jī)器的資源是有限的，這里的資源包括但不限于 cpu資源、io資源、thread資源、cache資源等。當(dāng)query突然猛增之后，這些資源就有可能出現(xiàn)不足，從而導(dǎo)致原來(lái)很快的SQL性能變差。

這類問(wèn)題的通用解決方案一般是：降級(jí)、限流

優(yōu)化器bug

大家都知道優(yōu)化器會(huì)基于最小代價(jià)選擇一個(gè)合適的索引去引擎層獲取數(shù)據(jù)。
這里的代價(jià)包含但不限于：Rows-exam、排序、臨時(shí)表等等

bug之所以為bug，就是沒(méi)道理可講，具體內(nèi)容就不啰嗦了，簡(jiǎn)單列幾個(gè)平常優(yōu)化過(guò)程中的解決方案：

1）刪除重復(fù)索引
2）添加合適的索引
3）force index ：這種方法不通用，因?yàn)閷?duì)業(yè)務(wù)侵入性比較大，且上線需要時(shí)間

這類問(wèn)題的通用解決方案一般是：定期check 重復(fù)索引和冗余索引

統(tǒng)計(jì)信息不準(zhǔn)確

為什么統(tǒng)計(jì)信息會(huì)不準(zhǔn)確呢
- 因?yàn)镸ySQL是通過(guò)采樣物理數(shù)據(jù)頁(yè)的方式來(lái)計(jì)算的，當(dāng)數(shù)據(jù)分布不均時(shí)，就會(huì)有統(tǒng)計(jì)信息不準(zhǔn)的情況
- 另外，MySQL的delete操作，只是將物理記錄標(biāo)記為刪除，真正的刪除是purge線程干的活。當(dāng)某些數(shù)據(jù)因?yàn)槟撤N原因（比如：RR事務(wù)隔離級(jí)別，保證數(shù)據(jù)一致性），不能被purge的時(shí)候，新的insert不是Inplace，而是占用新的空間，這樣也會(huì)導(dǎo)致統(tǒng)計(jì)信息不準(zhǔn)確

如果只是基于Rows-exam為成本代價(jià)來(lái)判斷，那么是可以通過(guò)Analyze table 更新統(tǒng)計(jì)信息來(lái)讓優(yōu)化器做出正確的選擇。
但是真實(shí)場(chǎng)景往往很復(fù)雜，目前也沒(méi)有太好的辦法來(lái)判斷出哪些table的統(tǒng)計(jì)信息需要重新更新，只能通過(guò)查看該執(zhí)行計(jì)劃的rows和真實(shí)查詢的記錄數(shù)的差別來(lái)簡(jiǎn)單的判斷

解決方案

運(yùn)維層

* 主要原因
    - long query
        - 監(jiān)控long query ， 發(fā)現(xiàn)long query 記錄到日志，并后期優(yōu)化
    - high concurrent query
        - 白名單機(jī)制： 先加入白名單，后優(yōu)化降低并發(fā)度  

* 按需check
    - 時(shí)間維度（全量）
        - 每天一次，改為每周一次
    - 數(shù)量維度（增量）
        - 由于沒(méi)有太好的辦法判斷哪些表需要Analyze，所以變通的方法是：從slow中找出慢的表，對(duì)它進(jìn)行check

源碼層
- 因?yàn)閍nalyze table，既沒(méi)有修改table definition，也沒(méi)有修改table data，那么我們?cè)诖瞬僮髦兄苯犹^(guò)tdc_remove_table即可
- 已在我們5.7.29分支上合并了這部分代碼，主要邏輯如下

* sql/sql_admin.cc

 if (table->table)
{
    const bool skip_flush=
    (operator_func == &handler::ha_analyze)
    && (table->table->file->ha_table_flags() & HA_ONLINE_ANALYZE);
  if (table->table->s->tmp_table)
  {
    /*
      If the table was not opened successfully, do not try to get
      status information. (Bug#47633)
    */
    if (open_for_modify && !open_error)
      table->table->file->info(HA_STATUS_CONST);
  }
  else if ((!skip_flush && open_for_modify) || fatal_error)
  {
    tdc_remove_table(thd, TDC_RT_REMOVE_UNUSED,
                     table->db, table->table_name, FALSE);
    /*
      May be something modified. Consequently, we have to
      invalidate the query cache.
    */
    table->table= 0;                        // For query cache
    query_cache.invalidate(thd, table, FALSE);
  }

感謝

本文主要參與者如下：余聰、王棟、余建鋒、圣勇進(jìn)、徐云磊、蘭春

本站是提供個(gè)人知識(shí)管理的網(wǎng)絡(luò)存儲(chǔ)空間，所有內(nèi)容均由用戶發(fā)布，不代表本站觀點(diǎn)。請(qǐng)注意甄別內(nèi)容中的聯(lián)系方式、誘導(dǎo)購(gòu)買(mǎi)等信息，謹(jǐn)防詐騙。如發(fā)現(xiàn)有害或侵權(quán)內(nèi)容，請(qǐng)點(diǎn)擊一鍵舉報(bào)。

轉(zhuǎn)藏 分享

QQ空間 QQ好友新浪微博微信

獻(xiàn)花（0） +1

來(lái)自：小仙女本仙人 > 《待分類》

舉報(bào)/認(rèn)領(lǐng)