A data structure is a way of organizing and storing data so that it can be accessed and modified efficiently. Examples include arrays, linked lists, stacks, queues, trees, and graphs. The choice of data structure can significantly affect the performance of a program, depending on the operations that need to be performed.

Data structures can be classified into two main categories: linear and non-linear.

**Linear Data Structures:** In linear data structures, elements are arranged in a sequential manner, and each element is connected to its previous and next element.

**Arrays:** An array is a sequential collection of elements of the same type, stored in contiguous memory locations. Elements are accessed using an index, with the first element typically having an index of 0. Arrays have a fixed size determined at the time of declaration.

**Linked Lists:** A linked list is a linear data structure where elements, called nodes, are linked together via pointers. Each node contains a data field and a pointer/reference to the next node in the sequence. Linked lists allow dynamic memory allocation and efficient insertion and deletion of elements anywhere in the list.

**Stacks:** A stack is a linear data structure that follows the Last In, First Out (LIFO) principle. Elements are added and removed from one end called the top. Stack operations include push (adding an element to the top) and pop (removing the top element). Stacks are used in various applications, including function calls, expression evaluation, and undo mechanisms.

**Queues:** A queue is a linear data structure that follows the First In, First Out (FIFO) principle. Elements are added at one end called the rear and removed from the other end called the front. Queue operations include enqueue (adding an element to the rear) and dequeue (removing an element from the front). Queues are used in scheduling, buffering, and breadth-first search (BFS) algorithms.

**Non-linear Data Structures:** In non-linear data structures, elements are not arranged sequentially; instead, an element may be connected to multiple other elements, forming hierarchies or networks.

**Trees:** A tree is a hierarchical data structure consisting of nodes connected by edges. It has a root node at the top and child nodes branching out from it. Each node can have zero or more child nodes. Trees are used to represent hierarchical relationships, such as organization charts, file systems, and the HTML DOM.

**Binary Trees:** A binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. Binary trees are used in various applications, such as expression parsing, searching, and sorting.

**Binary Search Trees (BST):** A binary search tree is a binary tree with the property that for any node, all elements in the left subtree are smaller, and all elements in the right subtree are larger. BSTs enable efficient search, insertion, and deletion operations, typically in O(log n) time.

**AVL Trees:** An AVL tree is a self-balancing binary search tree where the difference in heights of the left and right subtrees for any node is at most one. Rotations are used to maintain balance after insertions and deletions, ensuring O(log n) time complexity for search, insertion, and deletion operations.

**B-Trees:** A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. It is designed for systems that read and write large blocks of data, such as databases and file systems.

**Graphs:** A graph is a non-linear data structure consisting of vertices (nodes) connected by edges (links). Graphs can be directed or undirected and can have weighted or unweighted edges. They are used to represent relationships between objects, such as social networks, transportation networks, and communication systems.

**Heaps:** A heap is a complete binary tree where the value of each node is either greater than or equal to (max-heap) or less than or equal to (min-heap) the values of its children. Heaps are commonly used to implement priority queues and for heap sort algorithms due to their efficient access to the maximum or minimum element.

**Hash Tables:** A hash table is a data structure that stores key-value pairs, enabling efficient data retrieval in O(1) average time. It uses a hash function to map keys to indices in an array, where values are stored. Collision resolution techniques like chaining and open addressing are used to handle hash collisions. Hash tables are widely used in applications like dictionaries, caches, and database indexing.

Some key differences between an array and a linked list:

| Feature | Array | Linked List |
|---|---|---|
| Memory allocation | Contiguous block of memory | Dynamically allocated nodes |
| Size | Static; fixed at declaration | Dynamic; can grow or shrink as needed |
| Insertion and deletion | Inefficient at the beginning or middle, since elements must shift | Efficient at any position once the node is located |
| Random access | Efficient: O(1) via index | Inefficient: O(n), sequential traversal required |
| Memory overhead | Low; no extra pointers required | Higher; each node stores pointer(s) |
| Cache friendliness | Cache-friendly due to contiguous memory access | Less cache-friendly due to scattered memory access |

A stack is a linear data structure that follows the Last In, First Out (LIFO) principle. It can be thought of as a collection of elements with two primary operations: push and pop.

- **Push:** Adds an element to the top of the stack, increasing its size. Like placing a book atop a pile.
- **Pop:** Removes the top element, reducing the stack's size. Like taking the top book off the pile. Essential for accessing the most recent data.
- **Peek:** Inspects the top element without removing it. Useful for checking the most recent item without altering the stack.
- **IsEmpty:** Returns true if the stack contains no elements. Ensures safety before a pop or peek.
- **IsFull:** Returns true if the stack is at capacity. Important for fixed-size stacks, preventing overflow.
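The operations above can be sketched with a minimal list-backed class (the `Stack` name and method names here are illustrative, not from the text; a Python list has no fixed capacity, so `IsFull` is omitted):

```python
class Stack:
    """A minimal stack sketch backed by a Python list (top = end of list)."""
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)  # add to the top

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()  # remove from the top (LIFO)

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]    # inspect without removing

    def is_empty(self):
        return not self._items

s = Stack()
s.push(1); s.push(2); s.push(3)
print(s.pop())   # 3
print(s.peek())  # 2
```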

A queue is a linear data structure that follows the First In, First Out (FIFO) principle. It represents a collection of elements where insertion occurs at the rear (also called enqueue), and removal occurs from the front (also called dequeue).

- **Linear Queue:** Elements are stored sequentially; once the rear reaches the end of the underlying array, the queue cannot accept more elements, even if dequeues have freed slots at the front.
- **Circular Queue:** Addresses this limitation by letting the front and rear indices wrap around, reusing the slots freed by dequeues and making efficient use of memory.
- **Priority Queue:** Each element has an associated priority, and the element with the highest priority is removed first. It does not follow FIFO order but rather a predefined priority order.
- **Double-ended Queue (Deque):** Allows insertion and deletion at both the front and rear ends, providing more flexibility than a standard queue. It can operate as both a queue and a stack.
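The circular queue's wrap-around behavior can be sketched as follows (the `CircularQueue` class and its field names are illustrative, not from the text):

```python
class CircularQueue:
    """Fixed-capacity circular queue sketch; indices wrap around the buffer."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.front = 0
        self.size = 0

    def enqueue(self, item):
        if self.size == self.capacity:
            raise OverflowError("queue is full")
        rear = (self.front + self.size) % self.capacity  # wrap around
        self.buf[rear] = item
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("queue is empty")
        item = self.buf[self.front]
        self.front = (self.front + 1) % self.capacity    # wrap around
        self.size -= 1
        return item

q = CircularQueue(3)
q.enqueue('a'); q.enqueue('b'); q.enqueue('c')
print(q.dequeue())  # a
q.enqueue('d')      # reuses the slot freed by the dequeue
print(q.dequeue())  # b
```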

A binary tree is a hierarchical data structure composed of nodes, where each node has at most two children, referred to as the left child and the right child. The topmost node of the tree is called the root node. Each node can have either zero, one, or two children. Binary trees are commonly used for data organization, searching, and sorting algorithms.

A BST is a binary tree with the property that for any node, all elements in the left subtree are smaller, and all elements in the right subtree are larger. This property makes BSTs efficient for search, insertion, and deletion operations, all of which take O(log n) time on average; in the worst case (a degenerate, unbalanced tree), they degrade to O(n).
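A minimal sketch of BST insertion and search (the `Node`, `insert`, and `search` names are illustrative, not from the text):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into the BST rooted at root; smaller keys go left, larger go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicates are ignored

def search(root, key):
    """Return True if key is present, following the BST ordering property."""
    if root is None:
        return False
    if root.key == key:
        return True
    if key < root.key:
        return search(root.left, key)
    return search(root.right, key)

root = None
for k in [50, 30, 70, 20, 40]:
    root = insert(root, k)
print(search(root, 40))  # True
print(search(root, 60))  # False
```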

A heap is a complete binary tree where the value of each node is either greater than or equal to (max-heap) or less than or equal to (min-heap) the values of its children. Heaps are commonly used to implement priority queues and for heap sort algorithms due to their efficient access to the maximum or minimum element.
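Python's standard library exposes a binary min-heap through the `heapq` module, which is enough to sketch the operations described above:

```python
import heapq

nums = [5, 1, 8, 3]
heapq.heapify(nums)          # rearrange the list into a min-heap in O(n)
heapq.heappush(nums, 0)      # insert while preserving the heap property
print(heapq.heappop(nums))   # 0 — the minimum is always at the root (index 0)
print(heapq.heappop(nums))   # 1
```

For a max-heap, a common idiom is to push negated values and negate them again on pop.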

Hashing is a technique used to map data of arbitrary size to fixed-size values using hash functions. Hash tables use this concept to store key-value pairs, enabling efficient data retrieval in O(1) average time. Collisions are handled using techniques like chaining and open addressing to ensure effective storage.

A graph is a non-linear data structure composed of vertices (nodes) connected by edges (links). It represents relationships between pairs of objects. Graphs can be directed, where edges have a specific direction, or undirected, where edges have no direction. Additionally, edges can be weighted, meaning they have associated values.

- **Directed Graph (Digraph):** Edges have a specific direction, indicating a one-way relationship between nodes.
- **Undirected Graph:** Edges have no direction, representing symmetric relationships between nodes.
- **Weighted Graph:** Edges have associated values, often representing distances, costs, or capacities.
- **Cyclic Graph:** Contains at least one cycle, where a sequence of edges forms a closed loop.
- **Acyclic Graph:** Does not contain any cycles. Trees are a special case of acyclic graphs.
- **Connected Graph:** Every pair of vertices is connected by a path.
- **Disconnected Graph:** Contains at least two vertices with no path connecting them.
- **Complete Graph:** Every pair of distinct vertices is connected by a unique edge.
- **Bipartite Graph:** Vertices can be divided into two disjoint sets such that every edge connects a vertex from one set to the other.
- **DAG (Directed Acyclic Graph):** A directed graph with no directed cycles. DAGs are commonly used in scheduling, topological sorting, and dependency resolution.

DFS is a graph traversal algorithm that explores as far down a branch as possible before backtracking. It uses a stack data structure, either explicitly or through recursion. DFS is useful for tasks like finding connected components, topological sorting, and solving puzzles or mazes.

```
def dfs(graph, start, visited=None):
    """Recursively visit vertices, going as deep as possible before backtracking."""
    if visited is None:
        visited = set()
    visited.add(start)
    print(start)
    for neighbor in graph[start] - visited:  # avoid shadowing the next() builtin
        dfs(graph, neighbor, visited)
    return visited

# Example usage
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': {'B'},
    'E': {'B', 'F'},
    'F': {'C', 'E'}
}
dfs(graph, 'A')
```

Dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems. It involves solving each subproblem once and storing their solutions – typically using a table – to avoid redundant computations. It is used in optimization problems like the knapsack problem, shortest path, and sequence alignment.

Recursion is a technique in which a function calls itself in order to solve a problem. Each recursive call should bring the problem closer to a base case, which is solved directly. Recursion is used in algorithms like quicksort, mergesort, and tree traversals. It can be powerful but may lead to stack overflow if not managed properly.

Big-O notation describes the upper bound of the time complexity of an algorithm, providing an estimate of the worst-case scenario as input size grows. Common Big-O complexities include O(1) for constant time, O(n) for linear time, O(log n) for logarithmic time, and O(n^2) for quadratic time. It helps compare the efficiency of algorithms.

Merge sort is a divide-and-conquer algorithm that divides the input array into halves, recursively sorts them, and then merges the sorted halves. Its time complexity is O(n log n) for all cases. Merge sort is stable and works well for large data sets but requires additional space for merging, making it less memory-efficient.
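The divide, sort, and merge steps can be sketched as follows (a minimal list-based version; an in-place variant would be more memory-efficient):

```python
def merge_sort(arr):
    """Sort arr by splitting it into halves, sorting each, then merging."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])

    # Merge the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps equal elements in order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])           # at most one of these has leftovers
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```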

BFS is a graph traversal algorithm that explores all neighbors of a vertex before moving to their neighbors. It uses a queue data structure to maintain the order of exploration. BFS is used in finding the shortest path in unweighted graphs, level-order traversal of trees, and peer-to-peer networking.

```
from collections import deque

def bfs(graph, start):
    """Visit vertices level by level using a FIFO queue."""
    visited = set()
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex not in visited:
            print(vertex, end=" ")
            visited.add(vertex)
            queue.extend(graph[vertex] - visited)

# Example usage
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': {'B'},
    'E': {'B', 'F'},
    'F': {'C', 'E'}
}
bfs(graph, 'A')
```

A trie is a tree-like data structure used to store a dynamic set of strings, where keys are usually strings. It is particularly efficient for retrieval operations. Each node represents a common prefix of some strings, and operations like insertion, deletion, and search can be performed in O(m) time, where m is the length of the key.
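A minimal trie sketch, where each edge corresponds to one character of a key (the `TrieNode`/`Trie` names are illustrative, not from the text):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # True if a key ends at this node

class Trie:
    """Minimal trie: insert and search are O(m), m = key length."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_end

t = Trie()
t.insert("cat"); t.insert("car")
print(t.search("car"))  # True
print(t.search("ca"))   # False — "ca" is only a prefix, not a stored key
```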

A linked list is a linear data structure where each element, called a node, contains a data part and a reference to the next node. Types of linked lists include singly linked lists, doubly linked lists, and circular linked lists. Linked lists are dynamic in size and efficient for insertions and deletions, especially in comparison to arrays.
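A sketch of a singly linked list with O(1) insertion at the head (the class and method names are illustrative, not from the text):

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None  # reference to the next node

class SinglyLinkedList:
    """Minimal singly linked list; new nodes are pushed onto the head."""
    def __init__(self):
        self.head = None

    def push_front(self, data):
        node = Node(data)
        node.next = self.head  # new node points at the old head
        self.head = node

    def to_list(self):
        """Walk the chain of next pointers and collect the data fields."""
        out, cur = [], self.head
        while cur:
            out.append(cur.data)
            cur = cur.next
        return out

lst = SinglyLinkedList()
for x in [3, 2, 1]:
    lst.push_front(x)
print(lst.to_list())  # [1, 2, 3]
```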

In a circular linked list, the last node points back to the first node, forming a circle. This structure allows for efficient traversal from any node, making it useful in applications like round-robin scheduling and buffer management. It can be singly or doubly linked, and traversal requires careful handling to avoid infinite loops.

A doubly linked list is a linked list where each node contains a reference to both the next and the previous node. This allows traversal in both directions, making operations like insertion and deletion more flexible compared to singly linked lists. However, it requires more memory to store the additional reference.

A skip list is a data structure that allows fast search within an ordered sequence of elements. It extends a linked list with additional layers of links, each layer skipping over more elements than the one below; the height of each node is typically chosen randomly. This hierarchical structure enables average-case time complexities of O(log n) for search, insertion, and deletion.

Quick sort is a divide-and-conquer algorithm that selects a pivot element, partitions the array around the pivot, and recursively sorts the partitions. Its average-case time complexity is O(n log n), but it can degrade to O(n^2) in the worst case. Quick sort is efficient in practice and requires less memory compared to merge sort.

```
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

arr = [3, 6, 8, 10, 1, 2, 1]
print(quick_sort(arr))
```

Bubble sort is a simple comparison-based algorithm that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. This process repeats until the list is sorted. Its time complexity is O(n^2) in the worst and average cases, making it inefficient for large lists.
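A sketch with the common early-exit optimization: if a full pass makes no swaps, the list is already sorted.

```python
def bubble_sort(arr):
    """Repeatedly swap adjacent out-of-order pairs; each pass floats the
    largest remaining element to the end."""
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):       # the last i elements are in place
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:                  # no swaps: already sorted
            break
    return arr

print(bubble_sort([4, 1, 3, 2]))  # [1, 2, 3, 4]
```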

Insertion sort builds the sorted array one item at a time by repeatedly taking the next item and inserting it into its correct position. It has a time complexity of O(n^2) in the average and worst cases but performs well for small or nearly sorted arrays, making it useful for small datasets and online algorithms.
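The insert-into-sorted-prefix step can be sketched as:

```python
def insertion_sort(arr):
    """In-place insertion sort: grow a sorted prefix one element at a time."""
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:  # shift larger elements one slot right
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key                # drop key into its correct position
    return arr

print(insertion_sort([5, 2, 9, 1, 5]))  # [1, 2, 5, 5, 9]
```

On a nearly sorted input the inner loop rarely runs, which is why insertion sort approaches O(n) in that case.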

Selection sort repeatedly finds the minimum element from the unsorted portion and swaps it with the first unsorted element. Its time complexity is O(n^2) for all cases. While it is inefficient for large lists, it has the advantage of requiring only O(1) additional memory, making it useful for memory-constrained environments.
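The find-minimum-and-swap loop can be sketched as:

```python
def selection_sort(arr):
    """Find the minimum of the unsorted suffix and swap it into place."""
    n = len(arr)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):       # scan the unsorted suffix
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]  # one swap per pass
    return arr

print(selection_sort([64, 25, 12, 22, 11]))  # [11, 12, 22, 25, 64]
```

Note that it performs at most n - 1 swaps, which can matter when writes are expensive.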

A hash collision occurs when two distinct keys produce the same hash value. Collisions are inevitable due to the pigeonhole principle. They are handled using techniques like chaining, where a linked list is used to store colliding elements, or open addressing, where collisions are resolved through probing sequences in the hash table.
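Chaining can be sketched with a list of buckets, where colliding keys share a bucket (the `ChainedHashTable` name is illustrative, not from the text; a production table would also resize as it fills):

```python
class ChainedHashTable:
    """Hash table sketch with separate chaining: each bucket is a list of
    (key, value) pairs, so colliding keys simply share a bucket."""
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # update an existing key
                return
        bucket.append((key, value))       # new key: chain onto the bucket

    def get(self, key):
        for k, v in self._bucket(key):    # linear scan within one chain
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable()
t.put("apple", 1)
t.put("banana", 2)
print(t.get("apple"))  # 1
```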

A red-black tree is a balanced binary search tree with an extra bit of storage per node to denote the color (red or black). It ensures balance through rules that constrain the tree structure, resulting in O(log n) time complexity for search, insertion, and deletion. Red-black trees are used in many systems, including the Java Collections Framework.

A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. It is designed for systems that read and write large blocks of data, such as databases and file systems. B-trees generalize binary search trees by allowing nodes to have more than two children.

An AVL tree is a self-balancing binary search tree where the heights of the left and right subtrees of every node differ by at most one. It's named after its inventors, Adelson-Velsky and Landis.

- Every node has a balance factor, which is the difference between the heights of its left and right subtrees.
- If the balance factor of any node is greater than 1 or less than -1, the tree is unbalanced.
- To maintain balance, rotations are performed after insertions and deletions to ensure that the balance factor of every node remains within the range [-1, 1].

The rotations used in AVL trees are:

- **Left Rotation:** Applied when a node is right-heavy (the RR case); the right child is rotated up to become the new subtree root.
- **Right Rotation:** Applied when a node is left-heavy (the LL case); the left child is rotated up to become the new subtree root.
- **Left-Right Rotation (LR):** Applied when a node's left child is right-heavy; a left rotation on the child is followed by a right rotation on the node.
- **Right-Left Rotation (RL):** Applied when a node's right child is left-heavy; a right rotation on the child is followed by a left rotation on the node.

A segment tree is a binary tree used for storing intervals or segments. It allows querying which of the stored segments contain a given point and updating segments efficiently. It is commonly used in range query problems like finding the sum or minimum of elements in an array segment and has a time complexity of O(log n) for both queries and updates.
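As a sketch, here is a compact iterative segment tree for range-sum queries (one common variant; other operations such as min or max only change the combining step):

```python
class SegmentTree:
    """Segment tree sketch for range-sum queries over a fixed-length array.
    Leaves live at indices n..2n-1; each internal node is the sum of its children."""
    def __init__(self, data):
        self.n = len(data)
        self.tree = [0] * (2 * self.n)
        for i, v in enumerate(data):
            self.tree[self.n + i] = v
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, i, value):
        """Set data[i] = value, then fix the sums on the path to the root: O(log n)."""
        i += self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, left, right):
        """Sum over the half-open range [left, right) in O(log n)."""
        result = 0
        left += self.n
        right += self.n
        while left < right:
            if left % 2:            # left is a right child: take it and move past
                result += self.tree[left]
                left += 1
            if right % 2:           # right is a right child: step back and take it
                right -= 1
                result += self.tree[right]
            left //= 2
            right //= 2
        return result

st = SegmentTree([1, 3, 5, 7, 9, 11])
print(st.query(1, 4))  # 15 (3 + 5 + 7)
st.update(2, 10)
print(st.query(1, 4))  # 20
```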

Backtracking is a problem-solving technique used for finding all (or some) solutions to computational problems, especially in combinatorial optimization and constraint satisfaction problems. It systematically searches for solutions by exploring all possible candidates incrementally and backtracks from the dead-end paths to find the correct solution.

- **Generate:** Produce potential candidates for the solution one at a time.
- **Test:** After generating a candidate, check whether it satisfies the problem's constraints.
- **Solution Found:** If the candidate satisfies all constraints, it is recorded as a solution.
- **Backtrack:** If a candidate violates a constraint (i.e., leads to a dead end), return to the previous decision point and explore other candidates.
- **Explore:** Continue generating, testing, and backtracking until all possibilities have been explored.

Some key points about backtracking:

- It systematically explores the search space, pruning branches that cannot lead to a solution.
- It's often implemented recursively, making it easy to backtrack to previous decision points.
- Backtracking is particularly useful for problems with a large or infinite search space, where exhaustive search is not feasible.
- Examples of problems solved using backtracking include the N-Queens problem, Sudoku, and finding all possible permutations or combinations of elements.
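The N-Queens problem mentioned above makes a compact worked example; this sketch counts valid placements, pruning any column or diagonal already attacked:

```python
def solve_n_queens(n):
    """Backtracking sketch for N-Queens: place one queen per row, pruning
    columns and diagonals already attacked. Returns the number of solutions."""
    count = 0
    cols, diag1, diag2 = set(), set(), set()

    def place(row):
        nonlocal count
        if row == n:               # every row filled: one complete solution
            count += 1
            return
        for col in range(n):
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue           # conflicts with an earlier queen: prune
            cols.add(col); diag1.add(row - col); diag2.add(row + col)
            place(row + 1)         # explore the next row
            cols.discard(col); diag1.discard(row - col); diag2.discard(row + col)  # backtrack

    place(0)
    return count

print(solve_n_queens(4))  # 2
print(solve_n_queens(8))  # 92
```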

A spanning tree of a graph is a subgraph that includes all vertices of the original graph and is a single connected tree. Minimum spanning trees (MSTs) are spanning trees with the minimum possible total edge weight. Algorithms like Kruskal’s and Prim’s are used to find MSTs, which have applications in network design and optimization.

Topological sort is a linear ordering of vertices in a directed acyclic graph (DAG) such that for every directed edge uv, vertex u comes before v. It is used in scheduling tasks, resolving symbol dependencies in linkers, and ordering cells in spreadsheets. Topological sorting can be performed using DFS or Kahn's algorithm.
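A sketch of Kahn's algorithm, which repeatedly removes vertices whose in-degree has dropped to zero (the example DAG here is illustrative):

```python
from collections import deque

def topological_sort(graph):
    """Kahn's algorithm sketch; graph maps each vertex to a list of successors."""
    in_degree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            in_degree[w] += 1

    queue = deque(v for v in graph if in_degree[v] == 0)  # no prerequisites
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            in_degree[w] -= 1
            if in_degree[w] == 0:   # all of w's prerequisites are placed
                queue.append(w)

    if len(order) != len(graph):
        raise ValueError("graph has a cycle; no topological order exists")
    return order

dag = {'A': ['C'], 'B': ['C'], 'C': ['D'], 'D': []}
print(topological_sort(dag))  # one valid order, e.g. ['A', 'B', 'C', 'D']
```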

The Floyd-Warshall algorithm is a dynamic programming algorithm used for finding the shortest paths between all pairs of vertices in a weighted graph, including negative edge weights (but with no negative cycles). It's named after its inventors, Robert Floyd and Stephen Warshall.

- **Initialization:** Build a distance matrix where dist[i][j] is the weight of the edge from vertex i to vertex j if one exists, or INF (a very large value representing infinity) if there is no edge. Set dist[i][i] = 0 for every vertex.
- **Dynamic Programming:** For each vertex k in turn, consider k as an intermediate vertex on the path from i to j: if dist[i][k] + dist[k][j] < dist[i][j], update dist[i][j] to the shorter distance.
- **Result:** After all vertices have served as intermediates, the matrix contains the shortest distances between every pair of vertices.
- **Negative Cycle Detection:** After the iterations complete, check for negative cycles (e.g., dist[i][i] < 0 for some vertex i). In the presence of a negative cycle, shortest paths are not well defined and the results are invalid.
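The triple loop at the heart of the algorithm can be sketched directly (the example matrix is illustrative):

```python
INF = float('inf')

def floyd_warshall(dist):
    """Floyd-Warshall sketch; dist is an n x n matrix with INF where there is
    no edge and 0 on the diagonal. Mutates dist into all-pairs shortest distances."""
    n = len(dist)
    for k in range(n):              # k = intermediate vertex
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

dist = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
result = floyd_warshall(dist)
print(result[0][2])  # 5 — the path 0 -> 1 -> 2 beats the missing direct edge
```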

Dijkstra’s algorithm finds the shortest path from a single source vertex to all other vertices in a graph with non-negative edge weights. It uses a priority queue to repeatedly select the vertex with the smallest tentative distance and update its neighbors. The time complexity is O((V + E) log V) with a binary heap, where V is the number of vertices and E the number of edges.
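A sketch using `heapq` as the priority queue, with the common lazy-deletion trick of skipping stale heap entries (the adjacency-dict format is illustrative):

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra sketch; graph maps vertex -> {neighbor: weight} with weights >= 0."""
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    pq = [(0, source)]                 # (tentative distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:                # stale entry superseded by a shorter path
            continue
        for v, w in graph[u].items():
            if d + w < dist[v]:        # relax the edge u -> v
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

g = {'A': {'B': 1, 'C': 4}, 'B': {'C': 2, 'D': 5}, 'C': {'D': 1}, 'D': {}}
print(dijkstra(g, 'A'))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```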

Prim's algorithm is a greedy algorithm used to find the minimum spanning tree (MST) of a connected, undirected graph with weighted edges. The minimum spanning tree is a subset of the edges that forms a tree connecting all the vertices while minimizing the total edge weight.

- **Initialization:** Start with an arbitrary vertex as the initial tree. Maintain a set S of vertices included in the MST and a set T of edges included in the MST. Initially, S contains only the starting vertex and T is empty.
- **Selecting Edges:** At each step, choose the minimum-weight edge that connects a vertex in S to a vertex not in S. This edge is guaranteed to be part of the MST; add it to T and add its destination vertex to S.
- **Updating Sets:** Repeat the selection step until all vertices are included in S, growing the MST one vertex and one edge at a time.
- **Termination:** When S contains every vertex, T is a minimum spanning tree and the algorithm terminates.
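The steps above can be sketched with a heap holding the frontier edges (the triangle graph here is illustrative):

```python
import heapq

def prim(graph, start):
    """Prim's MST sketch; graph maps vertex -> {neighbor: weight} (undirected).
    Returns the list of MST edges as (u, v, weight) tuples."""
    visited = {start}                                       # the set S
    frontier = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(frontier)
    mst = []                                                # the set T
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)                   # cheapest crossing edge
        if v in visited:
            continue                                        # both ends already in S
        visited.add(v)
        mst.append((u, v, w))
        for nxt, wt in graph[v].items():
            if nxt not in visited:
                heapq.heappush(frontier, (wt, v, nxt))
    return mst

g = {
    'A': {'B': 2, 'C': 3},
    'B': {'A': 2, 'C': 1},
    'C': {'A': 3, 'B': 1},
}
print(prim(g, 'A'))  # [('A', 'B', 2), ('B', 'C', 1)]
```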

Kruskal’s algorithm is used to find the minimum spanning tree of a graph by sorting all edges and adding them one by one to the spanning tree, as long as they do not form a cycle. The algorithm uses a union-find data structure to detect cycles. The time complexity is O(E log E), making it efficient for sparse graphs.

The Bellman-Ford algorithm finds the shortest paths from a single source vertex to all other vertices in a graph, even with negative edge weights. It relaxes all edges up to (V-1) times, where V is the number of vertices. The time complexity is O(VE), and it can detect negative weight cycles, unlike Dijkstra's algorithm.
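The relax-all-edges loop can be sketched directly over an edge list (the example edges are illustrative; note one edge weight is negative but there is no negative cycle):

```python
def bellman_ford(edges, vertices, source):
    """Bellman-Ford sketch: relax every edge |V| - 1 times, then do one more
    pass to detect a negative-weight cycle. edges is a list of (u, v, weight)."""
    dist = {v: float('inf') for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:      # relax the edge u -> v
                dist[v] = dist[u] + w
    for u, v, w in edges:                  # any further improvement means a cycle
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist

edges = [('A', 'B', 4), ('A', 'C', 5), ('B', 'C', -2), ('C', 'D', 3)]
print(bellman_ford(edges, ['A', 'B', 'C', 'D'], 'A'))
# {'A': 0, 'B': 4, 'C': 2, 'D': 5}
```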

A greedy algorithm builds up a solution piece by piece, always choosing the next piece with the most apparent benefit. While this approach does not always yield an optimal solution, it works well for problems like the fractional knapsack, Huffman coding, and activity selection. Greedy algorithms are generally simpler and faster.

The Divide and Conquer strategy is a problem-solving approach where a problem is broken down into smaller subproblems, which are solved independently. Once the subproblems are solved, their solutions are combined to solve the original problem. This approach typically involves three steps: divide, conquer, and combine.

- **Divide:** The original problem is divided into smaller, more manageable subproblems. This step continues recursively until the subproblems become simple enough to be solved directly.
- **Conquer:** Once the subproblems are sufficiently small, they are solved independently, often by applying a base case directly.
- **Combine:** Finally, the solutions to the subproblems are merged, aggregated, or otherwise processed to form the solution to the original problem.

The knapsack problem is a combinatorial optimization problem where a set of items, each with a weight and a value, must be selected to maximize the total value without exceeding a weight limit. There are different variants, including the 0/1 knapsack and the fractional knapsack. Dynamic programming and greedy algorithms are commonly used to solve it.
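A sketch of the 0/1 variant using a one-dimensional bottom-up DP table (the item data is illustrative; `dp[w]` holds the best value achievable within weight limit `w`):

```python
def knapsack(weights, values, capacity):
    """0/1 knapsack sketch via bottom-up DP in O(n * capacity) time."""
    dp = [0] * (capacity + 1)
    for wt, val in zip(weights, values):
        # Iterate weights downward so each item is used at most once.
        for w in range(capacity, wt - 1, -1):
            dp[w] = max(dp[w], dp[w - wt] + val)  # skip item vs. take item
    return dp[capacity]

print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))  # 9 (items of weight 3 and 4)
```

Iterating the inner loop upward instead would allow unlimited copies of each item, which is the unbounded-knapsack variant.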

Memoization is an optimization technique used to speed up recursive algorithms by storing the results of expensive function calls and reusing them when the same inputs occur again. It is a key component of dynamic programming, allowing algorithms to avoid redundant computations and significantly reduce time complexity.

```
def fib(n, memo=None):
    """Memoized Fibonacci; memo caches results of earlier recursive calls.
    (Using None instead of a mutable default avoids a shared-cache surprise.)"""
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]
    if n <= 2:
        return 1
    memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]

print(fib(50))  # 12586269025
```

A Least Recently Used (LRU) cache is a data structure that evicts the least recently accessed item when it reaches its capacity limit. It is implemented using a combination of a doubly linked list and a hash map to provide O(1) time complexity for both access and eviction. LRU caches are widely used in memory management and web caching.
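In Python, the doubly-linked-list-plus-hash-map combination is conveniently packaged as `collections.OrderedDict`, which gives a compact sketch (the `LRUCache` name and the `-1` miss sentinel are illustrative conventions, not from the text):

```python
from collections import OrderedDict

class LRUCache:
    """LRU cache sketch: OrderedDict order tracks recency (oldest first)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)         # mark as most recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used

lru = LRUCache(2)
lru.put(1, 'a'); lru.put(2, 'b')
lru.get(1)          # touch key 1, making key 2 the least recently used
lru.put(3, 'c')     # capacity exceeded: evicts key 2
print(lru.get(2))   # -1
print(lru.get(1))   # a
```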

An adjacency matrix is a way of representing a graph using a 2D array. For a graph with n vertices, it uses an n x n matrix where the entry at row i and column j indicates the presence and possibly the weight of an edge between vertices i and j. It is efficient for dense graphs but requires O(n^2) space.

An adjacency list is a way of representing a graph using an array of lists. Each vertex has a list of its adjacent vertices (or edges). It is space-efficient for sparse graphs and allows easy traversal of neighbors, with O(V + E) space complexity, where V is the number of vertices and E is the number of edges.

The Boyer-Moore algorithm is an efficient string-searching algorithm that skips sections of the text, rather than examining each character, using information gathered during the preprocessing phase. It uses two heuristics: the bad character rule and the good suffix rule, allowing it to achieve sublinear time complexity in practice.

The Knuth-Morris-Pratt (KMP) algorithm searches for occurrences of a word within a text by preprocessing the word to create a partial match table (or "failure function"). This allows the algorithm to skip unnecessary comparisons, resulting in O(n + m) time complexity, where n is the length of the text and m is the length of the word.
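A sketch of KMP: first build the failure table, then scan the text without ever moving backward in it (the function name and return convention are illustrative):

```python
def kmp_search(text, pattern):
    """KMP sketch; returns the start indices of all occurrences of pattern."""
    # failure[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]          # fall back along the failure links
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = failure[k - 1]          # reuse matched prefix, never rescan text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)   # full match ending at i
            k = failure[k - 1]          # continue searching for overlaps
    return matches

print(kmp_search("ababcababc", "abc"))  # [2, 7]
```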

The Rabin-Karp algorithm searches for a pattern in a text using hashing. It computes the hash value of the pattern and a substring of the text and compares these hash values. If they match, it performs a character-by-character check to confirm the match. Its average time complexity is O(n + m), but it can degrade to O(nm) with many hash collisions.

Floyd’s cycle detection algorithm, also known as the Tortoise and Hare algorithm, is used to detect cycles in a sequence of values. It uses two pointers moving at different speeds to determine if a cycle exists. If the pointers meet, a cycle is present. It is used in linked list cycle detection and other applications with O(n) time complexity.
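Applied to linked lists, the two-pointer idea can be sketched as follows (the `Node` class here is a minimal illustration):

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

def has_cycle(head):
    """Tortoise-and-hare sketch: the fast pointer moves two steps per
    iteration, the slow one moves one; they meet iff there is a cycle."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False          # fast fell off the end: the list is acyclic

a, b, c = Node(1), Node(2), Node(3)
a.next, b.next, c.next = b, c, a   # c points back to a, forming a cycle
print(has_cycle(a))  # True
c.next = None                      # break the cycle
print(has_cycle(a))  # False
```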

The A* search algorithm is used for finding the shortest path from a start node to a goal node in a weighted graph. It uses a heuristic function to estimate the cost to reach the goal, combining this with the actual cost to reach a node from the start. This makes A* efficient and optimal for many pathfinding problems.

The Ford-Fulkerson algorithm computes the maximum flow in a flow network. It uses the concept of augmenting paths and repeatedly finds paths from the source to the sink with available capacity, increasing the flow until no more augmenting paths exist. The time complexity depends on the method used to find augmenting paths, typically O(max_flow * E).

Karger’s algorithm is a randomized algorithm for finding the minimum cut of a connected graph. It repeatedly contracts random edges until only two vertices remain, with the remaining edges representing a cut. A single contraction run takes O(V^2) time, and repeating it O(V^2 log V) times finds the minimum cut with high probability.

Amortized analysis is a method for analyzing the time or space complexity of algorithms, especially those with varying costs for individual operations. It provides a way to average out the cost of a sequence of operations over time to obtain a more accurate understanding of the overall performance.

- **Aggregate Analysis:** Bound the total cost of a sequence of n operations, then divide by n to obtain the average cost per operation.
- **Accounting Method:** Overcharge cheap operations and bank the credit to pay for occasional expensive operations later.
- **Potential Method:** Define a potential function representing stored-up work; an operation's amortized cost is its actual cost plus the change in potential.
- **Example:** Appending to a dynamic array occasionally costs O(n) when the array resizes, but the amortized cost per append is O(1).