🔗 Complete Guide to Hashing

📋 Table of Contents

1. What is Hashing? 🤔

Definition: Hashing is a technique that uses a hash function to map keys to specific locations in a hash table, enabling fast data retrieval with average O(1) time complexity.

Key Components:

Real-World Examples:

📖 Dictionary/Phone Book:

Instead of searching through every name sequentially, you jump to a specific section (A-D, E-H, etc.) based on the first letter. The first letter acts as a "hash function" that maps names to sections.

🔐 Password Storage:

Websites don't store your actual password. They store a hashed version. When you login, your entered password is hashed and compared with the stored hash.

# Password: "mypassword123" # Stored Hash: "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"

💾 Caching Systems:

Web browsers use hashing to quickly check if a webpage is already cached. The URL is hashed to create a unique identifier for quick lookup.

🗄️ Database Indexing:

Databases use hash indices to quickly locate records based on primary keys without scanning the entire table.

2. How Hashing Works ⚙️

Key: "apple"
Hash Function
Index: 3
Table[3] = "apple"

Step-by-Step Process:

  1. Input Key: Provide a key (string, number, object)
  2. Hash Function: Apply mathematical transformation
  3. Generate Index: Get array index within table bounds
  4. Store/Retrieve: Access data at calculated index

Hash Function Properties:

# Simple Hash Function Example def simple_hash(key, table_size): """Convert string key to table index""" hash_value = 0 for char in str(key): hash_value += ord(char) # Sum ASCII values return hash_value % table_size # Ensure within bounds # Example key = "hello" table_size = 10 index = simple_hash(key, table_size) print(f"Key '{key}' maps to index {index}") # Output: Key 'hello' maps to index 2

3. Types of Hashing Techniques 🎯

3.1 Direct Addressing

Concept: Use the key itself as the array index. Works when keys are small integers within a known range.
# Direct Addressing Example class DirectAddressTable: def __init__(self, max_size): self.table = [None] * max_size self.max_size = max_size def insert(self, key, value): if 0 <= key < self.max_size: self.table[key] = value def search(self, key): if 0 <= key < self.max_size: return self.table[key] return None def delete(self, key): if 0 <= key < self.max_size: self.table[key] = None # Usage table = DirectAddressTable(1000) table.insert(42, "Answer to everything") table.insert(100, "Century") print(table.search(42)) # Output: "Answer to everything"
Direct addressing is perfect when you have a small, known range of integer keys (like ages 0-120, or student IDs 1000-9999).

3.2 Division Method (Mod Operator)

Formula: h(k) = k mod m
Where: k = key, m = table size
def division_hash(key, table_size): """Division method hashing""" return hash(key) % table_size # Example with different keys table_size = 7 # Prime number (recommended) keys = ["apple", "banana", "cherry", "date", 123, 456] print("Division Method Results:") for key in keys: index = division_hash(key, table_size) print(f"Key: {key} → Index: {index}") # Output: # Key: apple → Index: 2 # Key: banana → Index: 4 # Key: cherry → Index: 1 # Key: date → Index: 6 # Key: 123 → Index: 4 # Collision with banana! # Key: 456 → Index: 1 # Collision with cherry!
Choose table size as a prime number to reduce clustering and improve distribution. Avoid powers of 2.

3.3 Multiplication Method

Formula: h(k) = floor(m * (k * A mod 1))
Where: A = constant (0 < A < 1), typically A = (√5 - 1)/2 ≈ 0.618
import math def multiplication_hash(key, table_size): """Multiplication method hashing""" A = (math.sqrt(5) - 1) / 2 # Golden ratio - 1 key_hash = hash(key) if isinstance(key, str) else key return int(table_size * ((key_hash * A) % 1)) # Example table_size = 10 keys = [10, 22, 31, 4, 15, 28, 17, 88, 59] print("Multiplication Method Results:") for key in keys: index = multiplication_hash(key, table_size) print(f"Key: {key} → Index: {index}")

3.4 Universal Hashing

Concept: Randomly select hash function from a family of functions to minimize worst-case collisions.
import random class UniversalHashing: def __init__(self, table_size): self.table_size = table_size self.p = 1000000007 # Large prime self.a = random.randint(1, self.p - 1) self.b = random.randint(0, self.p - 1) def hash_function(self, key): """Universal hash function: ((a*k + b) mod p) mod m""" key_val = hash(key) if isinstance(key, str) else key return ((self.a * key_val + self.b) % self.p) % self.table_size # Example universal_hash = UniversalHashing(10) keys = ["hello", "world", "python", "hashing"] print("Universal Hashing Results:") for key in keys: index = universal_hash.hash_function(key) print(f"Key: {key} → Index: {index}")

4. Collision Handling Strategies 🚧

Collision: When two different keys produce the same hash value/index.

4.1 Chaining (Separate Chaining)

Store multiple elements at the same index using linked lists or dynamic arrays.

class HashTableChaining: def __init__(self, size=10): self.size = size self.table = [[] for _ in range(size)] # List of lists def _hash(self, key): return hash(key) % self.size def insert(self, key, value): index = self._hash(key) bucket = self.table[index] # Check if key already exists for i, (k, v) in enumerate(bucket): if k == key: bucket[i] = (key, value) # Update existing return # Add new key-value pair bucket.append((key, value)) def search(self, key): index = self._hash(key) bucket = self.table[index] for k, v in bucket: if k == key: return v return None def delete(self, key): index = self._hash(key) bucket = self.table[index] for i, (k, v) in enumerate(bucket): if k == key: del bucket[i] return True return False def display(self): for i, bucket in enumerate(self.table): print(f"Index {i}: {bucket}") # Example usage ht = HashTableChaining(7) ht.insert("apple", 100) ht.insert("banana", 200) ht.insert("orange", 300) ht.insert("grape", 400) # May cause collision ht.display() print(f"Search 'apple': {ht.search('apple')}") # Output: 100

Chaining Visualization:

Index 0: []
Index 1: [('apple', 100)]
Index 2: [('banana', 200), ('grape', 400)]  ← Collision handled
Index 3: [('orange', 300)]
Index 4: []
Index 5: []
Index 6: []
            

4.2 Open Addressing

Find alternative locations within the same table when collision occurs.

4.2.1 Linear Probing

Formula: h'(k, i) = (h(k) + i) mod m
Where: i = 0, 1, 2, ... (probe sequence)
class HashTableLinearProbing: def __init__(self, size=10): self.size = size self.keys = [None] * size self.values = [None] * size self.deleted = [False] * size # Track deleted slots def _hash(self, key): return hash(key) % self.size def _probe(self, key): index = self._hash(key) while (self.keys[index] is not None and self.keys[index] != key and not self.deleted[index]): index = (index + 1) % self.size # Linear probing return index def insert(self, key, value): index = self._probe(key) self.keys[index] = key self.values[index] = value self.deleted[index] = False def search(self, key): index = self._hash(key) while self.keys[index] is not None: if self.keys[index] == key and not self.deleted[index]: return self.values[index] index = (index + 1) % self.size return None def delete(self, key): index = self._hash(key) while self.keys[index] is not None: if self.keys[index] == key and not self.deleted[index]: self.deleted[index] = True return True index = (index + 1) % self.size return False def display(self): for i in range(self.size): status = " (DEL)" if self.deleted[i] else "" print(f"Index {i}: {self.keys[i]} -> {self.values[i]}{status}")

4.2.2 Quadratic Probing

Formula: h'(k, i) = (h(k) + c₁i + c₂i²) mod m
Where: c₁, c₂ are constants (commonly c₁=c₂=1)
class HashTableQuadraticProbing: def __init__(self, size=10): self.size = size self.keys = [None] * size self.values = [None] * size def _hash(self, key): return hash(key) % self.size def _quadratic_probe(self, key): index = self._hash(key) i = 0 while (self.keys[index] is not None and self.keys[index] != key): i += 1 index = (self._hash(key) + i * i) % self.size return index def insert(self, key, value): index = self._quadratic_probe(key) self.keys[index] = key self.values[index] = value def search(self, key): index = self._hash(key) i = 0 while self.keys[index] is not None: if self.keys[index] == key: return self.values[index] i += 1 index = (self._hash(key) + i * i) % self.size return None

4.2.3 Double Hashing

Formula: h'(k, i) = (h₁(k) + i * h₂(k)) mod m
Where: h₁, h₂ are different hash functions
class HashTableDoubleHashing: def __init__(self, size=10): self.size = size self.keys = [None] * size self.values = [None] * size def _hash1(self, key): return hash(key) % self.size def _hash2(self, key): # Second hash function (should be coprime to table size) return 7 - (hash(key) % 7) def _double_hash_probe(self, key): index = self._hash1(key) step_size = self._hash2(key) i = 0 while (self.keys[index] is not None and self.keys[index] != key): i += 1 index = (self._hash1(key) + i * step_size) % self.size return index def insert(self, key, value): index = self._double_hash_probe(key) self.keys[index] = key self.values[index] = value
Collision Resolution Comparison:
Chaining: Simple, handles high load factors well
Linear Probing: Good cache performance, but clustering issues
Quadratic Probing: Reduces clustering, moderate complexity
Double Hashing: Best distribution, more complex computation

5. Hashing in Python 🐍

5.1 Built-in `dict` Data Structure

# Dictionary (Hash Table) Operations student_grades = {} # Insertion - O(1) average student_grades["Alice"] = 95 student_grades["Bob"] = 87 student_grades["Charlie"] = 92 # Access - O(1) average print(student_grades["Alice"]) # Output: 95 # Update - O(1) average student_grades["Alice"] = 98 # Deletion - O(1) average del student_grades["Bob"] # Check existence - O(1) average if "Charlie" in student_grades: print(f"Charlie's grade: {student_grades['Charlie']}") # Dictionary methods print(student_grades.keys()) # dict_keys(['Alice', 'Charlie']) print(student_grades.values()) # dict_values([98, 92]) print(student_grades.items()) # dict_items([('Alice', 98), ('Charlie', 92)]) # Dictionary comprehension squares = {x: x**2 for x in range(1, 6)} print(squares) # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

5.2 Built-in `set` Data Structure

# Set (Hash Set) Operations fruits = set() # Addition - O(1) average fruits.add("apple") fruits.add("banana") fruits.add("cherry") fruits.add("apple") # Duplicate ignored print(fruits) # {'cherry', 'banana', 'apple'} # Check membership - O(1) average print("apple" in fruits) # True # Removal - O(1) average fruits.remove("banana") # fruits.discard("banana") # No error if not found # Set operations set1 = {1, 2, 3, 4, 5} set2 = {4, 5, 6, 7, 8} print(set1 | set2) # Union: {1, 2, 3, 4, 5, 6, 7, 8} print(set1 & set2) # Intersection: {4, 5} print(set1 - set2) # Difference: {1, 2, 3} print(set1 ^ set2) # Symmetric difference: {1, 2, 3, 6, 7, 8} # Set comprehension even_squares = {x**2 for x in range(10) if x % 2 == 0} print(even_squares) # {0, 4, 16, 36, 64}

5.3 Built-in `hash()` Function

# Using hash() function print(hash("hello")) # e.g., -5434435027328283325 print(hash(42)) # 42 print(hash(3.14)) # 322818021289917443 print(hash((1, 2, 3))) # Tuples are hashable # Hash of same object is consistent within session text = "python" print(hash(text) == hash(text)) # True # Different objects may have same hash (collision) print(hash("a") == hash("b")) # Probably False, but not guaranteed # Custom hashable class class Point: def __init__(self, x, y): self.x = x self.y = y def __hash__(self): return hash((self.x, self.y)) def __eq__(self, other): return isinstance(other, Point) and self.x == other.x and self.y == other.y def __repr__(self): return f"Point({self.x}, {self.y})" # Using custom hashable objects points = {} p1 = Point(1, 2) p2 = Point(3, 4) points[p1] = "Origin area" points[p2] = "Upper right" print(points[Point(1, 2)]) # "Origin area"
Remember: To use an object as a dictionary key, it must be hashable (implement __hash__ and __eq__ methods) and immutable.

6. Time & Space Complexity 📊

Operation Average Case Worst Case Best Case Notes
Search O(1) O(n) O(1) Worst case when all keys hash to same index
Insert O(1) O(n) O(1) May require table resizing
Delete O(1) O(n) O(1) Same as search complexity
Space O(n) O(n) O(n) n = number of key-value pairs

Load Factor Impact:

Load Factor (α) = Number of elements / Table size
• α < 0.7: Good performance
• α > 0.8: Performance degrades
• α = 1.0: Table full (open addressing impossible)
# Load factor calculation and resizing class DynamicHashTable: def __init__(self, initial_size=8): self.size = initial_size self.count = 0 self.keys = [None] * self.size self.values = [None] * self.size self.deleted = [False] * self.size def load_factor(self): return self.count / self.size def _needs_resize(self): return self.load_factor() > 0.7 def _resize(self): # Save current data old_keys = self.keys[:] old_values = self.values[:] old_deleted = self.deleted[:] # Double the size self.size *= 2 self.count = 0 self.keys = [None] * self.size self.values = [None] * self.size self.deleted = [False] * self.size # Rehash all elements for i in range(len(old_keys)): if old_keys[i] is not None and not old_deleted[i]: self.insert(old_keys[i], old_values[i]) def insert(self, key, value): if self._needs_resize(): self._resize() index = self._probe(key) if self.keys[index] is None or self.deleted[index]: self.count += 1 self.keys[index] = key self.values[index] = value self.deleted[index] = False

7. Common Interview Problems 💼

7.1 Two Sum Problem

Problem: Given an array of integers and a target sum, return indices of two numbers that add up to target.
def two_sum(nums, target): """ Time: O(n), Space: O(n) """ num_map = {} # value -> index for i, num in enumerate(nums): complement = target - num if complement in num_map: return [num_map[complement], i] num_map[num] = i return [] # Example nums = [2, 7, 11, 15] target = 9 result = two_sum(nums, target) print(f"Indices: {result}") # [0, 1] because nums[0] + nums[1] = 2 + 7 = 9

7.2 Group Anagrams

Problem: Group strings that are anagrams of each other.
from collections import defaultdict def group_anagrams(strs): """ Time: O(n * k log k), Space: O(n * k) where n = number of strings, k = max length of string """ anagram_map = defaultdict(list) for s in strs: # Sort characters to create key sorted_str = ''.join(sorted(s)) anagram_map[sorted_str].append(s) return list(anagram_map.values()) # Example strs = ["eat", "tea", "tan", "ate", "nat", "bat"] result = group_anagrams(strs) print(result) # [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

7.3 First Non-Repeating Character

def first_non_repeating_char(s): """ Time: O(n), Space: O(1) - at most 26 lowercase letters """ char_count = {} # Count frequencies for char in s: char_count[char] = char_count.get(char, 0) + 1 # Find first non-repeating for char in s: if char_count[char] == 1: return char return None # Example s = "leetcode" result = first_non_repeating_char(s) print(f"First non-repeating: '{result}'") # 'l'

7.4 Subarray Sum Equals K

def subarray_sum(nums, k): """ Count number of continuous subarrays whose sum equals k Time: O(n), Space: O(n) """ count = 0 prefix_sum = 0 sum_map = {0: 1} # prefix_sum -> frequency for num in nums: prefix_sum += num # Check if (prefix_sum - k) exists if (prefix_sum - k) in sum_map: count += sum_map[prefix_sum - k] # Update map sum_map[prefix_sum] = sum_map.get(prefix_sum, 0) + 1 return count # Example nums = [1, 1, 1] k = 2 result = subarray_sum(nums, k) print(f"Number of subarrays: {result}") # 2

7.5 Longest Consecutive Sequence

def longest_consecutive(nums): """ Time: O(n), Space: O(n) """ if not nums: return 0 num_set = set(nums) longest = 0 for num in num_set: # Check if this is start of sequence if num - 1 not in num_set: current_num = num current_length = 1 # Count consecutive numbers while current_num + 1 in num_set: current_num += 1 current_length += 1 longest = max(longest, current_length) return longest # Example nums = [100, 4, 200, 1, 3, 2] result = longest_consecutive(nums) print(f"Longest consecutive length: {result}") # 4 (sequence: 1,2,3,4)

7.6 Design HashMap

class MyHashMap: """ Design a HashMap without using built-in hash table libraries """ def __init__(self): self.size = 1000 self.table = [[] for _ in range(self.size)] def _hash(self, key): return key % self.size def put(self, key, value): index = self._hash(key) bucket = self.table[index] for i, (k, v) in enumerate(bucket): if k == key: bucket[i] = (key, value) return bucket.append((key, value)) def get(self, key): index = self._hash(key) bucket = self.table[index] for k, v in bucket: if k == key: return v return -1 def remove(self, key): index = self._hash(key) bucket = self.table[index] for i, (k, v) in enumerate(bucket): if k == key: del bucket[i] return # Usage hashmap = MyHashMap() hashmap.put(1, 1) hashmap.put(2, 2) print(hashmap.get(1)) # 1 print(hashmap.get(3)) # -1 hashmap.put(2, 1) # Update existing key print(hashmap.get(2)) # 1 hashmap.remove(2) print(hashmap.get(2)) # -1

8. Interview Tips and Tricks 🎯

🔍 Pattern Recognition: Use hashing when you need to:
• Track what you've seen before
• Count frequencies
• Find complements or pairs
• Group related items
• Check membership quickly

8.1 Common Hash Table Applications

8.2 Key Interview Phrases

# When explaining hash tables in interviews, mention: "I'll use a hash table to achieve O(1) average lookup time..." "To handle collisions, I would recommend chaining with linked lists..." "The space-time tradeoff here is O(n) extra space for O(1) operations..." "We need to consider the load factor to maintain good performance..." "Hash tables provide amortized O(1) operations..."

8.3 Common Pitfalls & How to Avoid Them

🚨 Collision Handling: Always mention how you'll handle collisions. Don't assume perfect hashing.
🚨 Key Mutability: Explain that keys must be immutable. You can't use lists as dictionary keys in Python.
🚨 Memory Usage: Hash tables trade space for time. Mention the O(n) space complexity.
🚨 Hash Function Quality: Poor hash functions can lead to clustering and worst-case O(n) performance.

8.4 Advanced Optimization Techniques

# 1. Counter for frequency problems from collections import Counter def find_anagrams(s, p): """Find all anagrams of p in s""" if len(p) > len(s): return [] p_count = Counter(p) window_count = Counter() result = [] for i in range(len(s)): # Add current character window_count[s[i]] += 1 # Remove character outside window if i >= len(p): left_char = s[i - len(p)] window_count[left_char] -= 1 if window_count[left_char] == 0: del window_count[left_char] # Check if current window is anagram if window_count == p_count: result.append(i - len(p) + 1) return result # 2. defaultdict for cleaner code from collections import defaultdict def group_by_first_letter(words): """Group words by their first letter""" groups = defaultdict(list) for word in words: groups[word[0]].append(word) return dict(groups) # 3. Using frozenset for hashable sets def find_unique_combinations(lists): """Find unique combinations from list of lists""" unique_combos = set() for lst in lists: unique_combos.add(frozenset(lst)) return [list(combo) for combo in unique_combos]

8.5 Step-by-Step Problem Solving Approach

  1. Identify the Pattern: Look for keywords like "count", "frequency", "seen before", "pairs", "duplicates"
  2. Choose Data Structure: Dict for key-value mapping, Set for membership testing
  3. Design Hash Strategy: What will be your key? What will be your value?
  4. Handle Edge Cases: Empty input, single element, no solution
  5. Optimize: Consider Counter, defaultdict, or other specialized tools

8.6 Practice Problems by Difficulty

📗 Easy Level:

📙 Medium Level:

📕 Hard Level:

8.7 Code Templates for Common Patterns

# Template 1: Frequency Counter def frequency_pattern(arr): freq = {} for item in arr: freq[item] = freq.get(item, 0) + 1 return freq # Template 2: Seen Before Pattern def seen_before_pattern(arr): seen = set() for item in arr: if item in seen: return True # Found duplicate seen.add(item) return False # Template 3: Complement Search Pattern def complement_pattern(arr, target): complements = set() for item in arr: if target - item in complements: return True # Found pair complements.add(item) return False # Template 4: Grouping Pattern def grouping_pattern(items, key_func): groups = {} for item in items: key = key_func(item) if key not in groups: groups[key] = [] groups[key].append(item) return groups
🎤 What to Say in Interviews:
1. "Let me think about using a hash table for O(1) lookups..."
2. "I'll use a dictionary to map X to Y..."
3. "This looks like a frequency counting problem..."
4. "I can optimize this from O(n²) to O(n) using hashing..."
5. "The trade-off is O(n) extra space for faster time complexity..."

8.8 Final Interview Checklist

✅ Before submitting your solution:

🌟 Sample Interview Response:

"For this problem, I notice we need to find pairs that sum to a target. This suggests using a hash table to store complements. As I iterate through the array, I'll check if the current element's complement exists in my hash table. If yes, I've found a pair. If no, I'll add the current element to the table. This gives us O(n) time complexity and O(n) space complexity, which is much better than the O(n²) brute force approach."

🎓 Master Hashing = Master Interviews! 🎓

Practice these concepts daily and you'll be ready for any hashing-related interview question!