上次大致分析了一下哈希表的鏈地址法的實(shí)現(xiàn),今天來(lái)分析一下另一種解決哈希沖突的做法,即為每個(gè)Hash值,建立一個(gè)Hash桶(Bucket),桶的容量是固定的,也就是只能處理固定次數(shù)的沖突,如1048576個(gè)Hash桶,每個(gè)桶中有4個(gè)表項(xiàng)(Entry),總計(jì)4M個(gè)表項(xiàng)。其實(shí)這兩種的實(shí)現(xiàn)思路雷同,就是對(duì)Hash表中每個(gè)Hash值建立一個(gè)沖突表,即將沖突的幾個(gè)記錄以表的形式存儲(chǔ)在其中;
廢話不多說(shuō),上代碼和圖示基本能說(shuō)明清楚:
完整的代碼,請(qǐng)看:這里,一位圣安德魯斯大學(xué)的講師:KRISTENSSON博客
這里截取幾個(gè)主要的片段:
主要的數(shù)據(jù)結(jié)構(gòu):
struct Pair { char *key; char *value; };
struct Bucket { unsigned int count; Pair *pairs; };
struct StrMap { unsigned int count; Bucket *buckets; };
主要的函數(shù):
put:
int sm_put(StrMap *map, const char *key, const char *value) { unsigned int key_len, value_len, index; Bucket *bucket; Pair *tmp_pairs, *pair; char *tmp_value; char *new_key, *new_value;
if (map == NULL) { return 0; } if (key == NULL || value == NULL) { return 0; } key_len = strlen(key); value_len = strlen(value); /* Get a pointer to the bucket the key string hashes to */ index = hash(key) % map->count; bucket = &(map->buckets[index]); /* Check if we can handle insertion by simply replacing * an existing value in a key-value pair in the bucket. */ if ((pair = get_pair(bucket, key)) != NULL) { /* The bucket contains a pair that matches the provided key, * change the value for that pair to the new value. */ if (strlen(pair->value) < value_len) { /* If the new value is larger than the old value, re-allocate * space for the new larger value. */ tmp_value = realloc(pair->value, (value_len + 1) * sizeof(char)); if (tmp_value == NULL) { return 0; } pair->value = tmp_value; } /* Copy the new value into the pair that matches the key */ strcpy(pair->value, value); return 1; } /* Allocate space for a new key and value */ new_key = malloc((key_len + 1) * sizeof(char)); if (new_key == NULL) { return 0; } new_value = malloc((value_len + 1) * sizeof(char)); if (new_value == NULL) { free(new_key); return 0; } /* Create a key-value pair */ if (bucket->count == 0) { /* The bucket is empty, lazily allocate space for a single * key-value pair. */ bucket->pairs = malloc(sizeof(Pair)); if (bucket->pairs == NULL) { free(new_key); free(new_value); return 0; } bucket->count = 1; } else { /* The bucket wasn't empty but no pair existed that matches the provided * key, so create a new key-value pair. */ tmp_pairs = realloc(bucket->pairs, (bucket->count + 1) * sizeof(Pair)); if (tmp_pairs == NULL) { free(new_key); free(new_value); return 0; } bucket->pairs = tmp_pairs; bucket->count++; } /* Get the last pair in the chain for the bucket */ pair = &(bucket->pairs[bucket->count - 1]); pair->key = new_key; pair->value = new_value; /* Copy the key and its value into the key-value pair */ strcpy(pair->key, key); strcpy(pair->value, value); return 1; }
get:
int sm_get(const StrMap *map, const char *key, char *out_buf, unsigned int n_out_buf) { unsigned int index; Bucket *bucket; Pair *pair;
if (map == NULL) { return 0; } if (key == NULL) { return 0; } index = hash(key) % map->count; bucket = &(map->buckets[index]); pair = get_pair(bucket, key); if (pair == NULL) { return 0; } if (out_buf == NULL && n_out_buf == 0) { return strlen(pair->value) + 1; } if (out_buf == NULL) { return 0; } if (strlen(pair->value) >= n_out_buf) { return 0; } strcpy(out_buf, pair->value); return 1; }
哈希函數(shù):
/* * Returns a hash code for the provided string. */ static unsigned long hash(const char *str) { unsigned long hash = 5381; int c;
while (c = *str++) { hash = ((hash << 5) + hash) + c; } return hash; }
大致的思路是這樣的:
首先哈希桶的個(gè)數(shù)是固定的,有用戶構(gòu)建的時(shí)候輸入,一旦構(gòu)建,個(gè)數(shù)就已經(jīng)固定;查找的時(shí)候首先將key值通過(guò)哈希函數(shù)獲取哈希值,根據(jù)哈希值獲取到對(duì)應(yīng)的哈希桶,然后遍歷哈希桶內(nèi)的pairs數(shù)組獲??;
這兩種實(shí)現(xiàn)方法看似比較類似,但也有差異:
基于哈希桶的情況下,由于Hash桶容量的限制,所以,有可能發(fā)生Hash表填不滿的情況,也就是,雖然Hash表里面還有空位,但是新建的表項(xiàng)由于沖突過(guò)多,而不能裝入Hash表中。不過(guò),這樣的實(shí)現(xiàn)也有其好處,就是查表的最大開銷是可以確定的,因?yàn)樽疃嗵幚淼臎_突數(shù)是確定的,所以算法的時(shí)間復(fù)雜度為O(1)+O(m),其中m為Hash桶容量。
而另一種通過(guò)鏈表的實(shí)現(xiàn),由于Hash桶的容量是無(wú)限的,因此,只要沒(méi)有超出Hash表的最大容量,就能夠容納新建的表項(xiàng)。但是,一旦發(fā)生了Hash沖突嚴(yán)重的情況,就會(huì)造成Hash桶的鏈表過(guò)長(zhǎng),大大降低查找效率。在最壞的情況下,時(shí)間復(fù)雜度退化為O(n),其中n為Hash表的總?cè)萘俊.?dāng)然,這種情況的概率小之又小,幾乎是可以忽略的。
后面我們?cè)倏纯匆恍﹥?yōu)秀的開源項(xiàng)目中是如何實(shí)現(xiàn)的;
|