🔗 Complete Guide to Hashing
1. What is Hashing? 🤔
Definition: Hashing is a technique that uses a hash function to map keys to specific locations in a hash table, enabling fast data retrieval with average O(1) time complexity.
Key Components:
- Hash Function: A mathematical function that converts keys into array indices
- Hash Table: An array-based data structure that stores key-value pairs
- Buckets: Individual slots in the hash table where data is stored
Real-World Examples:
📖 Dictionary/Phone Book:
Instead of searching through every name sequentially, you jump to a specific section (A-D, E-H, etc.) based on the first letter. The first letter acts as a "hash function" that maps names to sections.
🔐 Password Storage:
Websites don't store your actual password. They store a hashed version. When you login, your entered password is hashed and compared with the stored hash.
# Password entered: "password"
# Stored SHA-256 hash: "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
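A minimal sketch of the check (illustrative only: real systems use a salted, deliberately slow hash such as bcrypt or Argon2, not a single SHA-256):
import hashlib
stored_hash = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
entered = "password"  # what the user types at login
digest = hashlib.sha256(entered.encode()).hexdigest()
print(digest == stored_hash)  # True -> the password matches, without the site ever storing it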
💾 Caching Systems:
Web browsers use hashing to quickly check if a webpage is already cached. The URL is hashed to create a unique identifier for quick lookup.
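A toy sketch of the idea (the URL and page content below are made-up placeholders; a real cache also handles expiry and validation):
cache = {}  # hash table: URL -> cached response
def fetch(url):
    if url in cache:               # O(1) average membership check
        return cache[url]          # cache hit: skip the network round trip
    response = f"<html>content of {url}</html>"  # stand-in for a real download
    cache[url] = response          # store for next time
    return response
fetch("https://example.com")  # miss: "downloads" and caches the page
fetch("https://example.com")  # hit: served straight from the cache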
🗄️ Database Indexing:
Databases use hash indices to quickly locate records based on primary keys without scanning the entire table.
2. How Hashing Works ⚙️
Key: "apple"
→
Hash Function
→
Index: 3
→
Table[3] = "apple"
Step-by-Step Process:
- Input Key: Provide a key (string, number, object)
- Hash Function: Apply mathematical transformation
- Generate Index: Get array index within table bounds
- Store/Retrieve: Access data at calculated index
Hash Function Properties:
- Deterministic: Same input always produces same output
- Uniform Distribution: Keys should be evenly distributed
- Fast Computation: Should execute quickly
- Minimize Collisions: Different keys should ideally map to different indices (collisions cannot be eliminated entirely, only made rare)
# Simple Hash Function Example
def simple_hash(key, table_size):
"""Convert string key to table index"""
hash_value = 0
for char in str(key):
hash_value += ord(char) # Sum ASCII values
return hash_value % table_size # Ensure within bounds
# Example
key = "hello"
table_size = 10
index = simple_hash(key, table_size)
print(f"Key '{key}' maps to index {index}") # Output: Key 'hello' maps to index 2
3. Types of Hashing Techniques 🎯
3.1 Direct Addressing
Concept: Use the key itself as the array index. Works when keys are small integers within a known range.
# Direct Addressing Example
class DirectAddressTable:
def __init__(self, max_size):
self.table = [None] * max_size
self.max_size = max_size
def insert(self, key, value):
if 0 <= key < self.max_size:
self.table[key] = value
def search(self, key):
if 0 <= key < self.max_size:
return self.table[key]
return None
def delete(self, key):
if 0 <= key < self.max_size:
self.table[key] = None
# Usage
table = DirectAddressTable(1000)
table.insert(42, "Answer to everything")
table.insert(100, "Century")
print(table.search(42)) # Output: "Answer to everything"
Direct addressing is perfect when you have a small, known range of integer keys (like ages 0-120, or student IDs 1000-9999).
3.2 Division Method (Mod Operator)
Formula: h(k) = k mod m
Where: k = key, m = table size
def division_hash(key, table_size):
"""Division method hashing"""
return hash(key) % table_size
# Example with different keys
table_size = 7 # Prime number (recommended)
keys = ["apple", "banana", "cherry", "date", 123, 456]
print("Division Method Results:")
for key in keys:
index = division_hash(key, table_size)
print(f"Key: {key} → Index: {index}")
# Example output (string indices vary between runs because Python randomizes
# str hashes; the integer results 123 → 4 and 456 → 1 are deterministic):
# Key: apple → Index: 2
# Key: banana → Index: 4
# Key: cherry → Index: 1
# Key: date → Index: 6
# Key: 123 → Index: 4   # Collision with banana in this run!
# Key: 456 → Index: 1   # Collision with cherry in this run!
Choose table size as a prime number to reduce clustering and improve distribution. Avoid powers of 2.
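A small self-contained demo of why this matters: keys that share a common stride collide badly when the table size shares a factor with that stride.
keys = [8, 16, 24, 32, 40, 48, 56]  # all multiples of 8
for m in (8, 7):                    # power of 2 vs. prime
    print(f"m={m}: indices {[k % m for k in keys]}")
# m=8: indices [0, 0, 0, 0, 0, 0, 0]   -> every key piles into slot 0
# m=7: indices [1, 2, 3, 4, 5, 6, 0]   -> keys spread across the whole table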
3.3 Multiplication Method
Formula: h(k) = floor(m * (k * A mod 1))
Where: A = constant (0 < A < 1), typically A = (√5 - 1)/2 ≈ 0.618
import math
def multiplication_hash(key, table_size):
"""Multiplication method hashing"""
A = (math.sqrt(5) - 1) / 2 # Golden ratio - 1
key_hash = hash(key) if isinstance(key, str) else key
return int(table_size * ((key_hash * A) % 1))
# Example
table_size = 10
keys = [10, 22, 31, 4, 15, 28, 17, 88, 59]
print("Multiplication Method Results:")
for key in keys:
index = multiplication_hash(key, table_size)
print(f"Key: {key} → Index: {index}")
3.4 Universal Hashing
Concept: Randomly select a hash function from a family of functions when the table is created, so that no fixed set of keys can force worst-case collisions.
import random
class UniversalHashing:
def __init__(self, table_size):
self.table_size = table_size
self.p = 1000000007 # Large prime
self.a = random.randint(1, self.p - 1)
self.b = random.randint(0, self.p - 1)
def hash_function(self, key):
"""Universal hash function: ((a*k + b) mod p) mod m"""
key_val = hash(key) if isinstance(key, str) else key
return ((self.a * key_val + self.b) % self.p) % self.table_size
# Example
universal_hash = UniversalHashing(10)
keys = ["hello", "world", "python", "hashing"]
print("Universal Hashing Results:")
for key in keys:
index = universal_hash.hash_function(key)
print(f"Key: {key} → Index: {index}")
4. Collision Handling Strategies 🚧
Collision: When two different keys produce the same hash value/index.
4.1 Chaining (Separate Chaining)
Store multiple elements at the same index using linked lists or dynamic arrays.
class HashTableChaining:
def __init__(self, size=10):
self.size = size
self.table = [[] for _ in range(size)] # List of lists
def _hash(self, key):
return hash(key) % self.size
def insert(self, key, value):
index = self._hash(key)
bucket = self.table[index]
# Check if key already exists
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value) # Update existing
return
# Add new key-value pair
bucket.append((key, value))
def search(self, key):
index = self._hash(key)
bucket = self.table[index]
for k, v in bucket:
if k == key:
return v
return None
def delete(self, key):
index = self._hash(key)
bucket = self.table[index]
for i, (k, v) in enumerate(bucket):
if k == key:
del bucket[i]
return True
return False
def display(self):
for i, bucket in enumerate(self.table):
print(f"Index {i}: {bucket}")
# Example usage
ht = HashTableChaining(7)
ht.insert("apple", 100)
ht.insert("banana", 200)
ht.insert("orange", 300)
ht.insert("grape", 400) # May cause collision
ht.display()
print(f"Search 'apple': {ht.search('apple')}") # Output: 100
Chaining Visualization (example layout; actual indices vary with Python's randomized string hashing):
Index 0: []
Index 1: [('apple', 100)]
Index 2: [('banana', 200), ('grape', 400)] ← Collision handled
Index 3: [('orange', 300)]
Index 4: []
Index 5: []
Index 6: []
4.2 Open Addressing
Find an alternative slot within the same table when a collision occurs.
4.2.1 Linear Probing
Formula: h'(k, i) = (h(k) + i) mod m
Where: i = 0, 1, 2, ... (probe sequence)
class HashTableLinearProbing:
def __init__(self, size=10):
self.size = size
self.keys = [None] * size
self.values = [None] * size
self.deleted = [False] * size # Track deleted slots
def _hash(self, key):
return hash(key) % self.size
def _probe(self, key):
index = self._hash(key)
while (self.keys[index] is not None and
self.keys[index] != key and
not self.deleted[index]):
index = (index + 1) % self.size # Linear probing
return index
def insert(self, key, value):
index = self._probe(key)
self.keys[index] = key
self.values[index] = value
self.deleted[index] = False
def search(self, key):
index = self._hash(key)
while self.keys[index] is not None:
if self.keys[index] == key and not self.deleted[index]:
return self.values[index]
index = (index + 1) % self.size
return None
def delete(self, key):
index = self._hash(key)
while self.keys[index] is not None:
if self.keys[index] == key and not self.deleted[index]:
self.deleted[index] = True
return True
index = (index + 1) % self.size
return False
def display(self):
for i in range(self.size):
status = " (DEL)" if self.deleted[i] else ""
print(f"Index {i}: {self.keys[i]} -> {self.values[i]}{status}")
4.2.2 Quadratic Probing
Formula: h'(k, i) = (h(k) + c₁i + c₂i²) mod m
Where: c₁, c₂ are constants (commonly c₁=c₂=1)
class HashTableQuadraticProbing:
def __init__(self, size=10):
self.size = size
self.keys = [None] * size
self.values = [None] * size
def _hash(self, key):
return hash(key) % self.size
def _quadratic_probe(self, key):
index = self._hash(key)
i = 0
while (self.keys[index] is not None and
self.keys[index] != key):
i += 1
index = (self._hash(key) + i * i) % self.size
return index
def insert(self, key, value):
index = self._quadratic_probe(key)
self.keys[index] = key
self.values[index] = value
def search(self, key):
index = self._hash(key)
i = 0
while self.keys[index] is not None:
if self.keys[index] == key:
return self.values[index]
i += 1
index = (self._hash(key) + i * i) % self.size
return None
4.2.3 Double Hashing
Formula: h'(k, i) = (h₁(k) + i * h₂(k)) mod m
Where: h₁, h₂ are different hash functions
class HashTableDoubleHashing:
def __init__(self, size=10):
self.size = size
self.keys = [None] * size
self.values = [None] * size
def _hash1(self, key):
return hash(key) % self.size
def _hash2(self, key):
# Second hash function: gives a step size in 1..7 (never 0). For the probe
# sequence to be able to reach every slot, the step should be coprime to the
# table size — easiest to guarantee by making the table size prime.
return 7 - (hash(key) % 7)
def _double_hash_probe(self, key):
index = self._hash1(key)
step_size = self._hash2(key)
i = 0
while (self.keys[index] is not None and
self.keys[index] != key):
i += 1
index = (self._hash1(key) + i * step_size) % self.size
return index
def insert(self, key, value):
index = self._double_hash_probe(key)
self.keys[index] = key
self.values[index] = value
Collision Resolution Comparison:
• Chaining: Simple, handles high load factors well
• Linear Probing: Good cache performance, but suffers from primary clustering
• Quadratic Probing: Reduces primary clustering (secondary clustering remains), moderate complexity
• Double Hashing: Best distribution, more complex computation
5. Hashing in Python 🐍
5.1 Built-in `dict` Data Structure
# Dictionary (Hash Table) Operations
student_grades = {}
# Insertion - O(1) average
student_grades["Alice"] = 95
student_grades["Bob"] = 87
student_grades["Charlie"] = 92
# Access - O(1) average
print(student_grades["Alice"]) # Output: 95
# Update - O(1) average
student_grades["Alice"] = 98
# Deletion - O(1) average
del student_grades["Bob"]
# Check existence - O(1) average
if "Charlie" in student_grades:
print(f"Charlie's grade: {student_grades['Charlie']}")
# Dictionary methods
print(student_grades.keys()) # dict_keys(['Alice', 'Charlie'])
print(student_grades.values()) # dict_values([98, 92])
print(student_grades.items()) # dict_items([('Alice', 98), ('Charlie', 92)])
# Dictionary comprehension
squares = {x: x**2 for x in range(1, 6)}
print(squares) # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
5.2 Built-in `set` Data Structure
# Set (Hash Set) Operations
fruits = set()
# Addition - O(1) average
fruits.add("apple")
fruits.add("banana")
fruits.add("cherry")
fruits.add("apple") # Duplicate ignored
print(fruits) # e.g. {'cherry', 'banana', 'apple'} — set order is arbitrary
# Check membership - O(1) average
print("apple" in fruits) # True
# Removal - O(1) average
fruits.remove("banana")
# fruits.discard("banana") # No error if not found
# Set operations
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
print(set1 | set2) # Union: {1, 2, 3, 4, 5, 6, 7, 8}
print(set1 & set2) # Intersection: {4, 5}
print(set1 - set2) # Difference: {1, 2, 3}
print(set1 ^ set2) # Symmetric difference: {1, 2, 3, 6, 7, 8}
# Set comprehension
even_squares = {x**2 for x in range(10) if x % 2 == 0}
print(even_squares) # {0, 4, 16, 36, 64}
5.3 Built-in `hash()` Function
# Using hash() function
print(hash("hello")) # e.g., -5434435027328283325
print(hash(42)) # 42
print(hash(3.14)) # 322818021289917443
print(hash((1, 2, 3))) # Tuples are hashable
# Hash of same object is consistent within session
text = "python"
print(hash(text) == hash(text)) # True
# Different objects may have same hash (collision)
print(hash("a") == hash("b")) # Probably False, but not guaranteed
# Custom hashable class
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def __hash__(self):
return hash((self.x, self.y))
def __eq__(self, other):
return isinstance(other, Point) and self.x == other.x and self.y == other.y
def __repr__(self):
return f"Point({self.x}, {self.y})"
# Using custom hashable objects
points = {}
p1 = Point(1, 2)
p2 = Point(3, 4)
points[p1] = "Origin area"
points[p2] = "Upper right"
print(points[Point(1, 2)]) # "Origin area"
Remember: to use an object as a dictionary key, it must be hashable (implement __hash__ and __eq__), and its hash must never change while it is stored in the dictionary — which is why keys should be immutable.
6. Time & Space Complexity 📊
| Operation | Average Case | Worst Case | Best Case | Notes |
|-----------|--------------|------------|-----------|-------|
| Search    | O(1)         | O(n)       | O(1)      | Worst case when all keys hash to the same index |
| Insert    | O(1)         | O(n)       | O(1)      | May require table resizing |
| Delete    | O(1)         | O(n)       | O(1)      | Same as search complexity |
| Space     | O(n)         | O(n)       | O(n)      | n = number of key-value pairs |
Load Factor Impact:
Load Factor (α) = Number of elements / Table size
• α < 0.7: Good performance
• α > 0.8: Performance degrades
• α = 1.0: Table is full (open addressing can no longer insert)
# Load factor calculation and resizing
class DynamicHashTable:
def __init__(self, initial_size=8):
self.size = initial_size
self.count = 0
self.keys = [None] * self.size
self.values = [None] * self.size
self.deleted = [False] * self.size
def load_factor(self):
return self.count / self.size
def _needs_resize(self):
return self.load_factor() > 0.7
def _resize(self):
# Save current data
old_keys = self.keys[:]
old_values = self.values[:]
old_deleted = self.deleted[:]
# Double the size
self.size *= 2
self.count = 0
self.keys = [None] * self.size
self.values = [None] * self.size
self.deleted = [False] * self.size
# Rehash all elements
for i in range(len(old_keys)):
if old_keys[i] is not None and not old_deleted[i]:
self.insert(old_keys[i], old_values[i])
    def _probe(self, key):
        # Linear probing: stop at the key itself, a tombstone, or an empty slot
        index = hash(key) % self.size
        while (self.keys[index] is not None and
               self.keys[index] != key and
               not self.deleted[index]):
            index = (index + 1) % self.size
        return index
    def insert(self, key, value):
        if self._needs_resize():
            self._resize()
        index = self._probe(key)
if self.keys[index] is None or self.deleted[index]:
self.count += 1
self.keys[index] = key
self.values[index] = value
self.deleted[index] = False
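A quick usage sketch of the resizing behavior (assuming the linear-probing _probe helper above; exact numbers follow from the 0.7 threshold):
dht = DynamicHashTable(initial_size=4)
for i, word in enumerate(["a", "b", "c", "d", "e"]):
    dht.insert(word, i)
print(dht.size)           # 8 — doubled from 4 once the load factor passed 0.7
print(dht.load_factor())  # 0.625 (5 items in 8 slots)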
7. Common Interview Problems 💼
7.1 Two Sum Problem
Problem: Given an array of integers and a target sum, return indices of two numbers that add up to target.
def two_sum(nums, target):
"""
Time: O(n), Space: O(n)
"""
num_map = {} # value -> index
for i, num in enumerate(nums):
complement = target - num
if complement in num_map:
return [num_map[complement], i]
num_map[num] = i
return []
# Example
nums = [2, 7, 11, 15]
target = 9
result = two_sum(nums, target)
print(f"Indices: {result}") # [0, 1] because nums[0] + nums[1] = 2 + 7 = 9
7.2 Group Anagrams
Problem: Group strings that are anagrams of each other.
from collections import defaultdict
def group_anagrams(strs):
"""
Time: O(n * k log k), Space: O(n * k)
where n = number of strings, k = max length of string
"""
anagram_map = defaultdict(list)
for s in strs:
# Sort characters to create key
sorted_str = ''.join(sorted(s))
anagram_map[sorted_str].append(s)
return list(anagram_map.values())
# Example
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
result = group_anagrams(strs)
print(result) # [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
7.3 First Non-Repeating Character
def first_non_repeating_char(s):
"""
Time: O(n), Space: O(1) for a fixed alphabet (e.g., 26 lowercase letters); O(k) distinct characters in general
"""
char_count = {}
# Count frequencies
for char in s:
char_count[char] = char_count.get(char, 0) + 1
# Find first non-repeating
for char in s:
if char_count[char] == 1:
return char
return None
# Example
s = "leetcode"
result = first_non_repeating_char(s)
print(f"First non-repeating: '{result}'") # 'l'
7.4 Subarray Sum Equals K
def subarray_sum(nums, k):
"""
Count number of continuous subarrays whose sum equals k
Time: O(n), Space: O(n)
"""
count = 0
prefix_sum = 0
sum_map = {0: 1} # prefix_sum -> frequency
for num in nums:
prefix_sum += num
# Check if (prefix_sum - k) exists
if (prefix_sum - k) in sum_map:
count += sum_map[prefix_sum - k]
# Update map
sum_map[prefix_sum] = sum_map.get(prefix_sum, 0) + 1
return count
# Example
nums = [1, 1, 1]
k = 2
result = subarray_sum(nums, k)
print(f"Number of subarrays: {result}") # 2
7.5 Longest Consecutive Sequence
def longest_consecutive(nums):
"""
Time: O(n), Space: O(n)
"""
if not nums:
return 0
num_set = set(nums)
longest = 0
for num in num_set:
# Check if this is start of sequence
if num - 1 not in num_set:
current_num = num
current_length = 1
# Count consecutive numbers
while current_num + 1 in num_set:
current_num += 1
current_length += 1
longest = max(longest, current_length)
return longest
# Example
nums = [100, 4, 200, 1, 3, 2]
result = longest_consecutive(nums)
print(f"Longest consecutive length: {result}") # 4 (sequence: 1,2,3,4)
7.6 Design HashMap
class MyHashMap:
"""
Design a HashMap without using built-in hash table libraries
"""
def __init__(self):
self.size = 1000
self.table = [[] for _ in range(self.size)]
def _hash(self, key):
return key % self.size
def put(self, key, value):
index = self._hash(key)
bucket = self.table[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
bucket.append((key, value))
def get(self, key):
index = self._hash(key)
bucket = self.table[index]
for k, v in bucket:
if k == key:
return v
return -1
def remove(self, key):
index = self._hash(key)
bucket = self.table[index]
for i, (k, v) in enumerate(bucket):
if k == key:
del bucket[i]
return
# Usage
hashmap = MyHashMap()
hashmap.put(1, 1)
hashmap.put(2, 2)
print(hashmap.get(1)) # 1
print(hashmap.get(3)) # -1
hashmap.put(2, 1) # Update existing key
print(hashmap.get(2)) # 1
hashmap.remove(2)
print(hashmap.get(2)) # -1
8. Interview Tips and Tricks 🎯
🔍 Pattern Recognition: Use hashing when you need to:
• Track what you've seen before
• Count frequencies
• Find complements or pairs
• Group related items
• Check membership quickly
8.1 Common Hash Table Applications
- Frequency Counting: Count occurrences of elements
- Caching/Memoization: Store computed results (see the sketch after this list)
- Finding Duplicates: Track seen elements
- Complement Search: Two Sum, Three Sum problems
- Grouping: Group anagrams, similar items
- Mapping: Character mappings, transformations
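For the Caching/Memoization item above, a minimal dict-based memoization sketch (functools.lru_cache gives the same behavior out of the box):
fib_cache = {}  # argument -> previously computed result
def fib(n):
    if n in fib_cache:           # O(1) average lookup of a cached result
        return fib_cache[n]
    result = n if n < 2 else fib(n - 1) + fib(n - 2)
    fib_cache[n] = result        # store so repeated calls are O(1)
    return result
print(fib(40))  # 102334155 — linear instead of exponential number of calls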
8.2 Key Interview Phrases
# When explaining hash tables in interviews, mention:
"I'll use a hash table to achieve O(1) average lookup time..."
"To handle collisions, I would recommend chaining with linked lists..."
"The space-time tradeoff here is O(n) extra space for O(1) operations..."
"We need to consider the load factor to maintain good performance..."
"Hash tables provide amortized O(1) operations..."
8.3 Common Pitfalls & How to Avoid Them
🚨 Collision Handling: Always mention how you'll handle collisions. Don't assume perfect hashing.
🚨 Key Mutability: Explain that keys must be hashable and effectively immutable. You can't use lists as dictionary keys in Python (see the snippet after this list).
🚨 Memory Usage: Hash tables trade space for time. Mention the O(n) space complexity.
🚨 Hash Function Quality: Poor hash functions can lead to clustering and worst-case O(n) performance.
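For the Key Mutability pitfall, a quick sketch of what fails and the usual fix:
coords = {}
try:
    coords[[1, 2]] = "point A"     # a list is mutable, hence unhashable
except TypeError as e:
    print(e)                       # unhashable type: 'list'
coords[(1, 2)] = "point A"         # tuples (and frozensets) are hashable
print(coords[(1, 2)])              # point A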
8.4 Advanced Optimization Techniques
# 1. Counter for frequency problems
from collections import Counter
def find_anagrams(s, p):
"""Find all anagrams of p in s"""
if len(p) > len(s):
return []
p_count = Counter(p)
window_count = Counter()
result = []
for i in range(len(s)):
# Add current character
window_count[s[i]] += 1
# Remove character outside window
if i >= len(p):
left_char = s[i - len(p)]
window_count[left_char] -= 1
if window_count[left_char] == 0:
del window_count[left_char]
# Check if current window is anagram
if window_count == p_count:
result.append(i - len(p) + 1)
return result
# 2. defaultdict for cleaner code
from collections import defaultdict
def group_by_first_letter(words):
"""Group words by their first letter"""
groups = defaultdict(list)
for word in words:
groups[word[0]].append(word)
return dict(groups)
# 3. Using frozenset for hashable sets
def find_unique_combinations(lists):
"""Find unique combinations from list of lists"""
unique_combos = set()
for lst in lists:
unique_combos.add(frozenset(lst))
return [list(combo) for combo in unique_combos]
8.5 Step-by-Step Problem Solving Approach
- Identify the Pattern: Look for keywords like "count", "frequency", "seen before", "pairs", "duplicates"
- Choose Data Structure: Dict for key-value mapping, Set for membership testing
- Design Hash Strategy: What will be your key? What will be your value?
- Handle Edge Cases: Empty input, single element, no solution
- Optimize: Consider Counter, defaultdict, or other specialized tools
8.6 Practice Problems by Difficulty
📗 Easy Level:
- Two Sum
- Valid Anagram
- First Unique Character
- Contains Duplicate
- Missing Number
📙 Medium Level:
- Group Anagrams
- Subarray Sum Equals K
- Longest Consecutive Sequence
- Design HashMap
- Top K Frequent Elements
📕 Hard Level:
- Substring with Concatenation of All Words
- Design Twitter
- Alien Dictionary
- Insert Delete GetRandom O(1)
8.7 Code Templates for Common Patterns
# Template 1: Frequency Counter
def frequency_pattern(arr):
freq = {}
for item in arr:
freq[item] = freq.get(item, 0) + 1
return freq
# Template 2: Seen Before Pattern
def seen_before_pattern(arr):
seen = set()
for item in arr:
if item in seen:
return True # Found duplicate
seen.add(item)
return False
# Template 3: Complement Search Pattern
def complement_pattern(arr, target):
complements = set()
for item in arr:
if target - item in complements:
return True # Found pair
complements.add(item)
return False
# Template 4: Grouping Pattern
def grouping_pattern(items, key_func):
groups = {}
for item in items:
key = key_func(item)
if key not in groups:
groups[key] = []
groups[key].append(item)
return groups
🎤 What to Say in Interviews:
1. "Let me think about using a hash table for O(1) lookups..."
2. "I'll use a dictionary to map X to Y..."
3. "This looks like a frequency counting problem..."
4. "I can optimize this from O(n²) to O(n) using hashing..."
5. "The trade-off is O(n) extra space for faster time complexity..."
8.8 Final Interview Checklist
✅ Before submitting your solution:
- Explain time and space complexity
- Handle edge cases (empty input, single element)
- Consider collision scenarios
- Mention load factor if relevant
- Test with example inputs
- Discuss alternative approaches
🌟 Sample Interview Response:
"For this problem, I notice we need to find pairs that sum to a target. This suggests using a hash table to store complements. As I iterate through the array, I'll check if the current element's complement exists in my hash table. If yes, I've found a pair. If no, I'll add the current element to the table. This gives us O(n) time complexity and O(n) space complexity, which is much better than the O(n²) brute force approach."
🎓 Master Hashing = Master Interviews! 🎓
Practice these concepts daily and you'll be ready for any hashing-related interview question!