A Hash Table Implementation in C (Part 2)

lchjczw 2013-01-08


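Last time we took a rough look at implementing a hash table with separate chaining. Today we look at another way to resolve hash collisions: each hash value gets its own hash bucket with a fixed capacity, so only a fixed number of collisions can be handled. For example, 1048576 buckets with 4 entries each give 4M entries in total. The two approaches rest on the same idea: every hash value owns a collision table, and the records that collide on that value are stored together in it.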

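Without further ado, the code below should make most of this clear.

For the complete code, see the blog of KRISTENSSON, a lecturer at the University of St Andrews.

Only the main fragments are excerpted here.

Main data structures: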

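/* Forward typedefs so the structs can refer to each other by short name */
typedef struct Pair Pair;
typedef struct Bucket Bucket;
typedef struct StrMap StrMap;

/* One key-value entry; both strings are heap-allocated copies */
struct Pair {
    char *key;
    char *value;
};

/* One hash bucket: an array of `count` pairs that share a bucket index */
struct Bucket {
    unsigned int count;
    Pair *pairs;
};

/* The map itself: `count` buckets, fixed at construction time */
struct StrMap {
    unsigned int count;
    Bucket *buckets;
};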
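
The excerpts do not show how a map is created. As a minimal sketch, assuming a hypothetical constructor `sm_new_sketch` and destructor `sm_delete_sketch` (both names are illustrative and not taken from the original code), the bucket array could be allocated once with a user-supplied count, with every bucket starting out empty:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Pair { char *key; char *value; } Pair;
typedef struct Bucket { unsigned int count; Pair *pairs; } Bucket;
typedef struct StrMap { unsigned int count; Bucket *buckets; } StrMap;

/* Hypothetical constructor: allocates `capacity` empty buckets. */
StrMap *sm_new_sketch(unsigned int capacity)
{
    StrMap *map;
    unsigned int i;

    if (capacity == 0) {
        return NULL;                 /* avoids a later modulo by zero */
    }
    map = malloc(sizeof(StrMap));
    if (map == NULL) {
        return NULL;
    }
    map->buckets = malloc(capacity * sizeof(Bucket));
    if (map->buckets == NULL) {
        free(map);
        return NULL;
    }
    /* Every bucket starts empty; pairs are allocated lazily on insert. */
    for (i = 0; i < capacity; i++) {
        map->buckets[i].count = 0;
        map->buckets[i].pairs = NULL;
    }
    map->count = capacity;
    return map;
}

/* Hypothetical destructor: frees every pair, every bucket, then the map. */
void sm_delete_sketch(StrMap *map)
{
    unsigned int i, j;

    if (map == NULL) {
        return;
    }
    assert(map->buckets != NULL);
    for (i = 0; i < map->count; i++) {
        Bucket *bucket = &map->buckets[i];
        for (j = 0; j < bucket->count; j++) {
            free(bucket->pairs[j].key);
            free(bucket->pairs[j].value);
        }
        free(bucket->pairs);         /* free(NULL) is a no-op */
    }
    free(map->buckets);
    free(map);
}
```

Rejecting a capacity of 0 matters because both put and get compute `hash(key) % map->count`.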

Main functions:

put:

int sm_put(StrMap *map, const char *key, const char *value)
{
    unsigned int key_len, value_len, index;
    Bucket *bucket;
    Pair *tmp_pairs, *pair;
    char *tmp_value;
    char *new_key, *new_value;

    if (map == NULL) {
        return 0;
    }
    if (key == NULL || value == NULL) {
        return 0;
    }
    key_len = strlen(key);
    value_len = strlen(value);
    /* Get a pointer to the bucket the key string hashes to */
    index = hash(key) % map->count;
    bucket = &(map->buckets[index]);
    /* Check if we can handle insertion by simply replacing
     * an existing value in a key-value pair in the bucket.
     */
    if ((pair = get_pair(bucket, key)) != NULL) {
        /* The bucket contains a pair that matches the provided key,
         * change the value for that pair to the new value.
         */
        if (strlen(pair->value) < value_len) {
            /* If the new value is larger than the old value, re-allocate
             * space for the new larger value.
             */
            tmp_value = realloc(pair->value, (value_len + 1) * sizeof(char));
            if (tmp_value == NULL) {
                return 0;
            }
            pair->value = tmp_value;
        }
        /* Copy the new value into the pair that matches the key */
        strcpy(pair->value, value);
        return 1;
    }
    /* Allocate space for a new key and value */
    new_key = malloc((key_len + 1) * sizeof(char));
    if (new_key == NULL) {
        return 0;
    }
    new_value = malloc((value_len + 1) * sizeof(char));
    if (new_value == NULL) {
        free(new_key);
        return 0;
    }
    /* Create a key-value pair */
    if (bucket->count == 0) {
        /* The bucket is empty, lazily allocate space for a single
         * key-value pair.
         */
        bucket->pairs = malloc(sizeof(Pair));
        if (bucket->pairs == NULL) {
            free(new_key);
            free(new_value);
            return 0;
        }
        bucket->count = 1;
    }
    else {
        /* The bucket wasn't empty but no pair existed that matches the provided
         * key, so create a new key-value pair.
         */
        tmp_pairs = realloc(bucket->pairs, (bucket->count + 1) * sizeof(Pair));
        if (tmp_pairs == NULL) {
            free(new_key);
            free(new_value);
            return 0;
        }
        bucket->pairs = tmp_pairs;
        bucket->count++;
    }
    /* Get the last pair in the chain for the bucket */
    pair = &(bucket->pairs[bucket->count - 1]);
    pair->key = new_key;
    pair->value = new_value;
    /* Copy the key and its value into the key-value pair */
    strcpy(pair->key, key);
    strcpy(pair->value, value);
    return 1;
}

get:

int sm_get(const StrMap *map, const char *key, char *out_buf, unsigned int n_out_buf)
{
    unsigned int index;
    Bucket *bucket;
    Pair *pair;

    if (map == NULL) {
        return 0;
    }
    if (key == NULL) {
        return 0;
    }
    index = hash(key) % map->count;
    bucket = &(map->buckets[index]);
    pair = get_pair(bucket, key);
    if (pair == NULL) {
        return 0;
    }
    if (out_buf == NULL && n_out_buf == 0) {
        return strlen(pair->value) + 1;
    }
    if (out_buf == NULL) {
        return 0;
    }
    if (strlen(pair->value) >= n_out_buf) {
        return 0;
    }
    strcpy(out_buf, pair->value);
    return 1;
}
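
Note that sm_get doubles as a size query: when called with out_buf == NULL and n_out_buf == 0 it returns the buffer size needed (string length plus the terminator) instead of a success flag. The sketch below isolates that convention in a hypothetical stand-in, `copy_value_sketch`, which works on a single already-located string rather than on a map, and adds a hypothetical helper `dup_value_sketch` that uses the two-call pattern:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in that mirrors sm_get's return convention
 * for one value that has already been found. */
int copy_value_sketch(const char *stored, char *out_buf, unsigned int n_out_buf)
{
    assert(stored != NULL);
    if (out_buf == NULL && n_out_buf == 0) {
        return (int)strlen(stored) + 1;   /* size query: length + '\0' */
    }
    if (out_buf == NULL) {
        return 0;
    }
    if (strlen(stored) >= n_out_buf) {
        return 0;                         /* buffer too small */
    }
    strcpy(out_buf, stored);
    return 1;
}

/* Two-call pattern: ask for the size, allocate, then copy.
 * The caller owns the returned buffer and must free it. */
char *dup_value_sketch(const char *stored)
{
    int needed = copy_value_sketch(stored, NULL, 0);
    char *buf = malloc((size_t)needed);

    if (buf == NULL) {
        return NULL;
    }
    if (!copy_value_sketch(stored, buf, (unsigned int)needed)) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

With the real map, the same two calls would go to sm_get with the map and key as the first two arguments.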

Hash function:

/*
 * Returns a hash code for the provided string.
 */
static unsigned long hash(const char *str)
{
    unsigned long hash = 5381;
    int c;

    /* Explicit comparison makes the intended assignment clear */
    while ((c = *str++) != 0) {
        hash = ((hash << 5) + hash) + c;   /* hash * 33 + c */
    }
    return hash;
}
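
Each loop iteration computes hash * 33 + c, starting from 5381; this scheme is widely known as djb2. As a worked example, here is a renamed copy, `hash_sketch`, so that it compiles on its own, together with a hypothetical helper `bucket_index_sketch` that maps a key to a bucket the same way sm_put and sm_get do. The bucket count of 10 used in the arithmetic is an arbitrary illustrative value.

```c
#include <assert.h>
#include <stddef.h>

/* Renamed copy of the article's hash function. */
unsigned long hash_sketch(const char *str)
{
    unsigned long hash = 5381;
    int c;

    assert(str != NULL);
    while ((c = *str++) != 0) {
        hash = ((hash << 5) + hash) + c;   /* hash * 33 + c */
    }
    return hash;
}

/* Hypothetical helper: hash value reduced modulo the bucket count. */
unsigned int bucket_index_sketch(const char *key, unsigned int bucket_count)
{
    assert(bucket_count > 0);
    return (unsigned int)(hash_sketch(key) % bucket_count);
}
```

For the key "a" (character code 97) the result is 5381 * 33 + 97 = 177670, which falls into bucket 177670 % 10 = 0. For "ab" it is 177670 * 33 + 98 = 5863208, which falls into bucket 8.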

The general idea is as follows:

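First, the number of hash buckets is fixed. The user supplies it when the map is constructed, and it never changes afterwards. To look up a key, the key is passed through the hash function, the hash value selects the corresponding bucket, and the pairs array inside that bucket is scanned for the key.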
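
Both sm_put and sm_get call a helper, get_pair, that is not among the excerpts. Going by the description above, it presumably walks the bucket's pairs array and compares keys. A minimal sketch under that assumption, named `get_pair_sketch` to mark it as a reconstruction rather than the original code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Pair { char *key; char *value; } Pair;
typedef struct Bucket { unsigned int count; Pair *pairs; } Bucket;

/* Linear scan over one bucket; returns the matching pair or NULL. */
Pair *get_pair_sketch(Bucket *bucket, const char *key)
{
    unsigned int i;

    assert(bucket != NULL && key != NULL);
    for (i = 0; i < bucket->count; i++) {
        Pair *pair = &bucket->pairs[i];
        if (pair->key != NULL && strcmp(pair->key, key) == 0) {
            return pair;
        }
    }
    return NULL;   /* an empty bucket (count == 0) also ends up here */
}
```

The cost of this scan is proportional to the number of pairs in the bucket, which is why the bucket size determines the lookup cost.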

The two implementations look alike, but they differ in a few ways:

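With fixed-capacity hash buckets, the table may never fill up: even though free slots remain elsewhere, a new entry is rejected when its own bucket has already absorbed too many collisions. The benefit is that the maximum lookup cost is known in advance. Because the number of collisions handled per bucket is bounded, the time complexity is O(1)+O(m), where m is the bucket capacity. Note that the sm_put excerpt above grows a bucket with realloc for every new key, so it does not itself enforce such a cap; the bound applies to the fixed-capacity design described here.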
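
The trade-off can be made concrete with a sketch of a fixed-capacity bucket. All names here (FixedBucket, FixedMap, fixed_init, fixed_put, fixed_contains) and the constants are hypothetical. Keys are plain unsigned integers with a trivial modulo hash to keep the example short, so this is not the StrMap code shown earlier.

```c
#include <assert.h>
#include <stddef.h>

#define N_BUCKETS  4u   /* illustrative values only */
#define BUCKET_CAP 2u

typedef struct FixedBucket {
    unsigned int count;
    unsigned int keys[BUCKET_CAP];
} FixedBucket;

typedef struct FixedMap {
    FixedBucket buckets[N_BUCKETS];
} FixedMap;

void fixed_init(FixedMap *map)
{
    unsigned int i;

    assert(map != NULL);
    for (i = 0; i < N_BUCKETS; i++) {
        map->buckets[i].count = 0;
    }
}

/* Returns 1 if key is present; inspects at most BUCKET_CAP entries. */
int fixed_contains(const FixedMap *map, unsigned int key)
{
    const FixedBucket *bucket;
    unsigned int i;

    assert(map != NULL);
    bucket = &map->buckets[key % N_BUCKETS];
    for (i = 0; i < bucket->count; i++) {
        if (bucket->keys[i] == key) {
            return 1;
        }
    }
    return 0;
}

/* Returns 1 on success (or if key is already stored),
 * 0 if the target bucket is full. */
int fixed_put(FixedMap *map, unsigned int key)
{
    FixedBucket *bucket;

    assert(map != NULL);
    if (fixed_contains(map, key)) {
        return 1;
    }
    bucket = &map->buckets[key % N_BUCKETS];
    if (bucket->count == BUCKET_CAP) {
        return 0;   /* rejected even if other buckets are empty */
    }
    bucket->keys[bucket->count++] = key;
    return 1;
}
```

With 4 buckets of 2 entries each, the keys 0, 4 and 8 all map to bucket 0, so the third insert is rejected although only 2 of the 8 slots are in use.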

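With the linked-list implementation, a bucket's capacity is unbounded, so a new entry can always be stored as long as the table's overall capacity is not exceeded. But when collisions are severe, a bucket's list grows long and lookups slow down considerably. In the worst case the time complexity degrades to O(n), where n is the number of entries in the table. With a well-distributed hash function and non-adversarial keys, this is very unlikely and can usually be ignored.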
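
To illustrate that worst case, the sketch below uses a hypothetical singly linked chain (ChainNode, chain_push, chain_find_steps and chain_free are illustrative names, not code from the previous article) and counts how many nodes a lookup visits. If every key lands in the same bucket, finding the oldest key walks the whole chain.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ChainNode {
    unsigned int key;
    struct ChainNode *next;
} ChainNode;

/* Pushes key at the head of the chain; returns 1 on success, 0 on
 * allocation failure (the chain is left unchanged in that case). */
int chain_push(ChainNode **head, unsigned int key)
{
    ChainNode *node;

    assert(head != NULL);
    node = malloc(sizeof(ChainNode));
    if (node == NULL) {
        return 0;
    }
    node->key = key;
    node->next = *head;
    *head = node;
    return 1;
}

/* Returns how many nodes were visited up to and including the match,
 * or 0 if the key is not in the chain. */
unsigned int chain_find_steps(const ChainNode *head, unsigned int key)
{
    unsigned int steps = 0;

    for (; head != NULL; head = head->next) {
        steps++;
        if (head->key == key) {
            return steps;
        }
    }
    return 0;
}

void chain_free(ChainNode *head)
{
    while (head != NULL) {
        ChainNode *next = head->next;
        free(head);
        head = next;
    }
}
```

After pushing the keys 1 through 100 onto one chain, the most recent key is found in 1 step while the first key takes 100 steps, which is the O(n) behaviour described above.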

In a later post we will look at how some well-regarded open-source projects implement this.
