SQL Interview Questions


SQL eBook

What is SQL?

SQL (Structured Query Language) is a standard programming language specifically designed for managing and manipulating relational databases. It allows users to query, update, insert, and delete data within a database. SQL also supports various functions for sorting, filtering, and aggregating data to facilitate analysis and reporting.

What is a primary key?

A primary key is a column or a combination of columns in a table that uniquely identifies each row in that table. It must contain unique values and cannot contain NULL values. The primary key ensures entity integrity by ensuring that each record is unique and can be accurately identified.

What is a foreign key?

A foreign key is a column or a set of columns in one table that uniquely identifies rows in another table. It creates a relationship between two tables, ensuring referential integrity. The foreign key values must match values in the referenced table’s primary key or be NULL, enforcing a valid link between related records.

What is a join? Explain its types.

A join is a SQL operation that combines rows from two or more tables based on a related column. The main types of joins are INNER JOIN (returns rows with matching values in both tables), LEFT JOIN (returns all rows from the left table and matched rows from the right table), RIGHT JOIN (returns all rows from the right table and matched rows from the left table), and FULL JOIN (returns rows when there is a match in one of the tables).

What is the difference between WHERE and HAVING clauses?

In SQL, both the WHERE and HAVING clauses are used to filter data, but they operate at different stages of the query execution process.

Aspect WHERE Clause HAVING Clause
Usage Filters rows before grouping. Filters grouped rows after the GROUP BY clause.
Application Used with SELECT, UPDATE, and DELETE statements. Used exclusively with SELECT statements that have GROUP BY clauses.
Aggregation Cannot be used with aggregate functions like SUM(), AVG(), COUNT(), etc. Typically used with aggregate functions to filter grouped data based on specified conditions.
Syntax WHERE condition HAVING condition
Evaluation order Evaluated before grouping and aggregation functions. Evaluated after grouping and aggregation functions.
Impact on result Can filter individual rows of data. Filters aggregated data based on specified conditions.
Efficiency Generally more efficient because it reduces the number of rows processed before aggregation. Less efficient compared to WHERE clause as it operates on aggregated data, but necessary for filtering aggregated results.

What is an index?

An index is a database object that improves the speed of data retrieval operations on a table at the cost of additional storage space. Indexes can be created on one or more columns of a table to allow quick lookup of data. Common types include B-tree, bitmap, and unique indexes.

What is a subquery?

A subquery is a query nested within another query, typically within a SELECT, INSERT, UPDATE, or DELETE statement. Subqueries can be used to perform operations that depend on the result of another query. They can return single values (scalar subqueries), multiple values (row subqueries), or even a set of rows and columns (table subqueries).

What are aggregate functions?

Aggregate functions perform a calculation on a set of values and return a single value. Common aggregate functions include COUNT (counts rows), SUM (calculates the total), AVG (calculates the average), MAX (finds the maximum value), and MIN (finds the minimum value). These functions are often used with the GROUP BY clause.

What is the difference between DELETE and TRUNCATE?

DELETE and TRUNCATE are both SQL commands used to remove data from a table, but they differ in their functionality and how they operate.

Aspect DELETE TRUNCATE
Operation Removes specific rows from a table based on a condition or all rows if no condition is specified. Removes all rows from a table.
Logging Logged operation, each deleted row is logged in the transaction log and can be rolled back. Non-logged operation, it doesn't log each deleted row, and therefore, it cannot be rolled back.
Transaction Can be a part of a transaction and can be rolled back using ROLLBACK command. Cannot be rolled back within a transaction.
Locking Holds row-level locks during the operation, potentially causing blocking in concurrent environments. Acquires a table-level lock for the duration of the operation, reducing contention in concurrent environments.
Performance Typically slower than TRUNCATE due to logging and individual row processing. Generally faster than DELETE as it doesn't log each deletion and deallocates data pages directly.
Triggers and Constraints Executes triggers and respects constraints (e.g., foreign key constraints) associated with the table. Doesn't execute triggers and constraints aren't checked. It is a faster operation but bypasses such validations.
Usage Used when selective deletion of rows is required or when triggers and constraints need to be honored. Used when you need to quickly remove all rows from a table without logging each individual deletion.

What is a view?

A view is a virtual table based on the result set of a SQL query. It can encapsulate complex queries, simplify data access, and provide an additional layer of security. Views do not store data physically but display data stored in underlying tables. They can be used to restrict data access or present data in a specific format.

What is a stored procedure?

A stored procedure is a prepared SQL code that can be saved and reused. It encapsulates a sequence of SQL statements and logic, allowing for modular, reusable, and maintainable database code. Stored procedures can accept parameters, execute complex operations, and return results to the caller.

What is a transaction?

A transaction is a sequence of one or more SQL operations executed as a single unit of work. Transactions ensure data integrity by adhering to ACID properties. They can be explicitly started with BEGIN TRANSACTION and ended with COMMIT to save changes or ROLLBACK to discard changes if an error occurs.

What are constraints?

Constraints are rules enforced on data columns in a table to ensure data integrity and consistency. Common constraints include PRIMARY KEY (ensures unique identification), FOREIGN KEY (enforces referential integrity), UNIQUE (ensures all values in a column are unique), NOT NULL (ensures no null values), and CHECK (ensures values meet a specified condition).

What is a cursor?

A cursor is a database object used to retrieve, manipulate, and navigate through a result set row-by-row. Cursors are useful for operations that cannot be performed in a set-based manner. They allow for more granular control over data processing but can be resource-intensive and slower compared to set-based operations.

What is the difference between SQL and NoSQL?

SQL (Structured Query Language) and NoSQL (Not Only SQL) are both database management systems, but they differ in several key aspects:

Aspect SQL NoSQL
Data Structure Relational databases with structured schemas, tables, and rows. Non-relational databases with flexible schemas, including key-value pairs, documents, wide-column stores, or graph databases.
Scalability Vertically scalable, meaning you can increase capacity by adding more power (CPU, RAM, SSD) to a single server. Horizontally scalable, meaning you can handle larger data volumes and user loads by adding more servers to your NoSQL database cluster.
Schema Flexibility Requires a predefined schema with a fixed structure. Changes to schema may require downtime and migration. Offers schema flexibility, allowing changes to the structure without downtime or migrations.
Query Language Standardized query language (SQL) for data manipulation and retrieval. NoSQL databases use their own query languages, which can vary between different databases.
ACID Transactions Generally support ACID transactions (Atomicity, Consistency, Isolation, Durability). Support for ACID transactions varies among NoSQL databases, with some offering full ACID compliance and others providing eventual consistency.
Data Consistency Emphasizes strong consistency and ensures that each transaction is consistent and isolated from others. May prioritize availability and partition tolerance over consistency (CAP theorem), offering eventual consistency or other consistency models.
Use Cases Well-suited for applications requiring complex queries and transactions, such as financial systems or traditional relational databases. Ideal for applications with large volumes of unstructured or semi-structured data, real-time analytics, content management systems, and distributed systems.
Examples MySQL, PostgreSQL, Oracle, SQL Server MongoDB, Cassandra, Redis, Couchbase

What is a composite key?

A composite key is a primary key that consists of two or more columns, used together to uniquely identify a row in a table. Composite keys are useful when a single column is not sufficient to guarantee uniqueness. They ensure that each combination of values in the specified columns is unique across the table.

What is a self-join?

A self-join is a join where a table is joined with itself. This is useful for hierarchical data or when comparing rows within the same table. In a self-join, table aliases are used to distinguish the table instances involved in the join, allowing for the comparison of columns from different rows within the same table.

What is the difference between CHAR and VARCHAR?

CHAR and VARCHAR are both data types used to store character strings in SQL databases, but they have some differences:

Aspect CHAR VARCHAR
Storage Fixed-length string. Variable-length string.
Length Must be specified when defining the column. Maximum length must be specified when defining the column.
Trailing spaces Trailing spaces are padded to fill the entire length of the column. Trailing spaces are not padded.
Storage space Uses storage space equal to the specified length, regardless of actual data length. Uses storage space equal to the actual data length plus two bytes to store the length of the data.
Performance Generally faster for retrieval and comparison operations, especially for fixed-length data. May be more efficient in terms of storage space, especially for columns with varying lengths.

What is a unique key?

A unique key is a constraint that ensures all values in a column or a combination of columns are unique across the table. Unlike a primary key, a table can have multiple unique keys, and unique keys can contain NULL values (though only one NULL per column). Unique keys prevent duplicate entries while allowing NULLs.

What is an index scan?

An index scan is a database operation where the entire index is scanned to find matching rows. It is less efficient than an index seek, which directly navigates to matching rows using the index. Index scans occur when the query optimizer determines that scanning the entire index is more efficient than using a specific seek.

What is referential integrity?

Referential integrity is a database concept that ensures that relationships between tables remain consistent. When a foreign key in one table matches a primary key in another, referential integrity ensures that the foreign key value must always point to a valid, existing primary key value or be NULL. This prevents orphaned records and maintains logical relationships between tables.

What is the difference between DROP, TRUNCATE, and DELETE?

DROP, TRUNCATE, and DELETE are all SQL commands used to remove data or objects from a database:

Aspect DROP TRUNCATE DELETE
Operation Removes an entire table, view, index, or database. Removes all rows from a table. Removes specific rows from a table based on a condition.
Logging Not logged. Non-logged operation. Logged operation.
Transaction Cannot be rolled back within a transaction. Cannot be rolled back within a transaction. Can be rolled back within a transaction.
Triggers/Constraints Drops associated triggers, constraints, and indexes. Doesn't execute triggers, but constraints are checked. Executes triggers and respects constraints.
Performance Very fast, but irreversible. Very fast, but irreversible. Slower compared to TRUNCATE, but faster than DROP.

What is a common table expression (CTE)?

A common table expression (CTE) is a temporary result set defined within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement. CTEs improve query readability and organization by breaking complex queries into simpler parts. They are defined using the WITH keyword and can be referenced like a table in subsequent queries.

What is the purpose of the SQL SELECT statement?

The SQL SELECT statement is used to query and retrieve data from one or more tables. It allows specifying columns to retrieve, filtering rows with conditions, sorting results, and performing various transformations like aggregations and joins. SELECT is the most commonly used SQL statement for data retrieval and analysis.

What is a clustered index?

A clustered index determines the physical order of data in a table and is created on the primary key by default. Each table can have only one clustered index, which makes data retrieval fast by storing rows in a sorted manner. Non-clustered indexes, in contrast, store pointers to the data rather than the data itself.

What is the difference between clustered and non-clustered indexes?

Some Key differences between clustered and non-clustered indexes:

Aspect Clustered Index Non-Clustered Index
Definition Determines the physical order of data rows in the table. Does not alter the physical order of data rows in the table.
Structure The actual table rows are stored in the order defined by the index. Index contains a separate list of keys and pointers to the actual rows.
Key Only one clustered index can be created per table. Multiple non-clustered indexes can be created per table.
Searching Generally faster for retrieving large ranges of data. Typically faster for seeking individual rows.
Storage Requires additional storage space as it directly impacts the order of data rows. Requires less storage space as it only contains the keys and pointers.
Impact on INSERT/UPDATE/DELETE Slower for INSERT, UPDATE, and DELETE operations as they may need to rearrange the data to maintain the index order. Generally faster for INSERT, UPDATE, and DELETE operations as they don't require rearranging the data.
Suitable Use Cases Ideal for columns that are frequently used for range queries and have a unique key. Suitable for columns often used in search operations where the order of data retrieval is not crucial.
Examples Primary key column is typically automatically indexed as clustered. Indexes created manually on columns to improve query performance.

What is a database schema?

A database schema is a logical container for database objects such as tables, views, indexes, and procedures. It defines how data is organized and structured within the database. Schemas help to group and manage database objects logically, allowing for better organization, access control, and separation of different application areas.

What is a correlated subquery?

A correlated subquery is a subquery that references columns from the outer query, making it dependent on the outer query for its values. It is evaluated once for each row processed by the outer query. Correlated subqueries are useful for row-by-row comparisons but can be less efficient than non-correlated subqueries.

What is the difference between INNER JOIN and OUTER JOIN?

INNER JOIN and OUTER JOIN are both types of join operations used in SQL to combine rows from two or more tables based on a related column between them.

Aspect INNER JOIN OUTER JOIN
Matching Rows Returns only matching rows from both tables. Returns matching rows as well as unmatched rows.
Unmatched Rows Excludes unmatched rows from either table. Includes unmatched rows with NULL values.
Types Only one type (INNER JOIN). Three types (LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN).
Syntax SELECT columns FROM table1 INNER JOIN table2 ON join_condition; SELECT columns FROM table1 LEFT/RIGHT/FULL OUTER JOIN table2 ON join_condition;
Usage Used when only matching rows are needed. Used when all rows from one or both tables are needed, with NULL values for unmatched rows.

What is SQL injection?

SQL injection is a security vulnerability where an attacker can insert malicious SQL code into an input field, potentially gaining unauthorized access to the database or executing harmful commands. To prevent SQL injection, use parameterized queries, stored procedures, and input validation to ensure only expected data is processed.

What is the purpose of the GROUP BY clause?

The GROUP BY clause groups rows sharing a common value into summary rows, often used with aggregate functions like COUNT, SUM, AVG, MAX, and MIN. It allows for data aggregation and analysis by grouping data based on one or more columns, providing insights into patterns and trends within the dataset.

What is a derived table?

A derived table is a subquery used within the FROM clause of a SELECT statement, treated as a temporary table. Derived tables help simplify complex queries by breaking them into smaller, more manageable parts. They can improve query readability and performance by allowing intermediate results to be processed within a single query.

What is a composite index?

A composite index is an index on multiple columns in a table. It improves query performance by allowing efficient access based on combinations of the indexed columns. Composite indexes are useful for queries that filter or sort data using multiple columns. The order of columns in the index is crucial for its effectiveness.

What is the difference between RANK() and DENSE_RANK()?

RANK() and DENSE_RANK() are both window functions in SQL used to assign ranks to rows in a result set based on a specified ordering. However, they differ in how they handle ties (i.e., rows with the same value being ranked).

Aspect RANK() DENSE_RANK()
Handling Ties Leaves gaps in rank values when there are ties. Does not leave gaps in rank values; assigns consecutive ranks to ties.
Example If two rows are tied for the second rank, the next rank will be 4 (no rank 3). If two rows are tied for the second rank, the next rank will still be 3 (no gaps).
Result May produce non-consecutive ranks if there are ties. Produces consecutive ranks even if there are ties.
Syntax RANK() OVER (PARTITION BY partition_expression ORDER BY order_expression) DENSE_RANK() OVER (PARTITION BY partition_expression ORDER BY order_expression)

What is the use of the COALESCE() function?

COALESCE() returns the first non-NULL value in a list of expressions. It is used to handle NULL values and provide default values when necessary. COALESCE() is particularly useful in scenarios where NULL values need to be replaced with a meaningful alternative, ensuring that calculations and concatenations do not result in NULL.

What is a trigger?

A trigger is a special kind of stored procedure that automatically executes in response to certain events on a table or view, such as INSERT, UPDATE, or DELETE. Triggers are used to enforce business rules, maintain audit trails, and ensure data integrity by performing predefined actions whenever specified changes occur.

What is ACID compliance?

ACID stands for Atomicity, Consistency, Isolation, and Durability. These are properties that ensure reliable transaction processing in databases. Atomicity ensures that transactions are all-or-nothing, Consistency ensures that transactions bring the database from one valid state to another, Isolation ensures that transactions do not interfere with each other, and Durability ensures that committed transactions are permanently saved.

What is the difference between EXISTS and IN?

EXISTS and IN are both SQL operators used to test for the existence of rows in a subquery:

Aspect EXISTS IN
Usage Typically used with correlated subqueries. Used with a list of values or a subquery.
Evaluation Stops evaluation after finding the first matching row. Evaluates all values or rows in the list or subquery.
Performance Generally more efficient for large result sets because it stops evaluation after finding a match. May be less efficient for large result sets because it evaluates all values or rows.
Null Handling Handles NULL values explicitly or implicitly based on the subquery. NULL values in the list or subquery are treated as if they don't exist.
Subquery Columns Doesn't require selecting specific columns in the subquery. Requires selecting specific columns in the subquery to match with the main query.

What is normalization?

Normalization is a database design technique used to minimize redundancy and dependency by organizing fields and table relationships. The goal is to divide a database into smaller, related tables and define relationships between them, thereby improving data integrity and reducing data anomalies. Normalization typically involves applying normal forms (1NF, 2NF, 3NF, etc.).

What is denormalization?

Denormalization is the process of combining normalized tables into larger tables to improve database performance, typically read performance. While it introduces redundancy and can lead to anomalies, it reduces the number of joins needed to retrieve data, making queries faster in read-heavy applications.

What is the difference between a primary key and a unique key?

A primary key and a unique key are both constraints used in SQL to enforce uniqueness in a column or combination of columns in a table:

Aspect Primary Key Unique Key
Uniqueness Uniquely identifies each row in a table. Ensures uniqueness but allows NULL values.
Nullability Does not allow NULL values. Allows NULL values (except in Oracle).
Number Only one primary key constraint per table. Multiple unique key constraints per table.
Purpose Typically used as a unique identifier for each row. Used to enforce uniqueness, but with more flexibility regarding NULL values.
Implicit NOT NULL Imposes the NOT NULL constraint automatically. Does not implicitly enforce the NOT NULL constraint.
Index Creation Automatically creates a clustered or non-clustered index. Creates a non-clustered index by default.
Relationship Often used as a foreign key in related tables. Can also be used as a reference for foreign keys.

What is an OLAP database?

OLAP (Online Analytical Processing) databases are designed for complex queries and analysis of large volumes of data, typically used in data warehousing and business intelligence. OLAP databases support multi-dimensional data models, allowing for efficient data aggregation, slicing, dicing, and drilling down. They enable quick retrieval of summarized data for decision-making.

What is a materialized view?

A materialized view is a database object that stores the result of a query physically, unlike a regular view that is a virtual table. Materialized views improve query performance by precomputing and storing the results, allowing for faster data retrieval. They are especially useful for complex queries and data warehousing applications.

What is the difference between the DELETE and DROP commands?

The DELETE and DROP commands are both used in SQL, but they serve different purposes:

Aspect DELETE Command DROP Command
Purpose Removes one or more rows from a table based on a specified condition. Removes an entire object (e.g., table, index, view) from the database.
Operation Deletes data from a table. Removes the object itself.
Rollback Can be rolled back using a transaction if within a transaction block. Cannot be rolled back; the action is permanent.
Syntax DELETE FROM table_name WHERE condition; DROP object_type object_name;
Example (Table) DELETE FROM table_name WHERE condition; DROP TABLE table_name;
Example (Index) DELETE FROM table_name WHERE condition; DROP INDEX index_name;
Example (View) DELETE FROM table_name WHERE condition; DROP VIEW view_name;

What is a surrogate key?

A surrogate key is an artificial primary key, typically an integer, used to uniquely identify each row in a table. Unlike natural keys, surrogate keys have no business meaning and are not derived from application data. They are used to simplify key management and ensure uniqueness, especially in large databases with complex relationships.

What is the difference between a cross join and a full join?

A cross join and a full join are both types of joins used in SQL, but they serve different purposes and produce different result sets:

Aspect Cross Join Full Join
Result Set Cartesian product of rows from both tables. All rows from both tables, with NULL values where no match is found.
Matching Rows No condition is used to match rows. Matches rows based on a specified condition.
Syntax SELECT * FROM table1 CROSS JOIN table2; SELECT * FROM table1 FULL OUTER JOIN table2 ON join_condition;
Usage Rarely used due to its potential for producing a large result set. Used when you want to combine data from two tables while preserving unmatched rows from both tables.
Example SELECT * FROM table1 CROSS JOIN table2; SELECT * FROM table1 FULL OUTER JOIN table2 ON table1.column = table2.column;

What is the use of the CASE statement in SQL?

The CASE statement allows for conditional logic in SQL queries, similar to IF-THEN-ELSE statements in programming languages. It evaluates a list of conditions and returns a corresponding result for the first true condition. CASE is useful for creating calculated fields, transforming data, and implementing complex conditional logic within a query.

What is the difference between the UNION and JOIN operations?

The UNION and JOIN operations are both fundamental concepts in SQL, but they serve different purposes:

Aspect UNION JOIN
Purpose Combines the results of two or more SELECT statements into a single result set. Combines rows from two or more tables based on a related column.
Operation Vertically concatenates rows from different SELECT statements. Horizontally combines rows from different tables based on a specified condition.
Matching Rows Does not involve matching rows based on a condition. Selects rows from the tables that have matching values in the specified column(s).
Result Set Concatenates rows from different SELECT statements, optionally removing duplicates. Combines rows from different tables based on a specified condition.
Syntax SELECT column1 FROM table1 UNION SELECT column2 FROM table2; SELECT * FROM table1 INNER JOIN table2 ON table1.column = table2.column;
Efficiency Typically involves less processing compared to joins. May involve more processing, especially for complex join conditions and large tables.

What is the purpose of the LIKE operator?

The LIKE operator is used for pattern matching in string searches, allowing for partial matches using wildcard characters. The percent sign (%) represents zero or more characters, and the underscore (_) represents a single character. LIKE is useful for searching strings with unknown or variable characters, such as finding names with specific patterns.

How do you find duplicates in a table?

To find the duplicates in a table, first create a table with duplicate rows.

DROP TABLE IF EXISTS MEMBERS;
CREATE TABLE MEMBERS (
    id serial PRIMARY KEY,
    name TEXT
);

INSERT INTO MEMBERS (name)
VALUES
    ('Praveen'),
    ('Mahesh'),
    ('Vijay'),
    ('Nitin'),
    ('Deepak'),
    ('Manish'),
    ('Neel'),
    ('Ravi'),
    ('Diya'),
    ('Naveen'),
    ('Dinesh'),
    ('Raj')
    ('Mahesh'),
    ('Vijay'),
    ('Nitin'),
    ('Deepak');

Use a select * statement to see all of our entries.

SELECT * FROM MEMBERS;

Use the COUNT() function to find all duplicate rows.

SELECT
	-- Get the column.
	name,
	-- Count how many times this name occurs.
	count(*)
FROM
	MEMBERS
GROUP BY 
	-- Using an aggregate function forces us to group all like name together.
	NAME
HAVING 
	-- Only select values that have a count greater than one (multiple entries).
	count(*) > 1;

The results show all the names that appear more than once and their count.

How do you delete multiple entries from a table?

To delete multiple entries from a table in SQL Server, you can use the DELETE statement with the IN operator. The IN operator allows you to specify a list of values that you want to delete. For example, the following query will delete all rows from the Customers table where the Country column is equal to "USA":

DELETE FROM Customers WHERE Country IN ("USA");

You can also use the BETWEEN operator to delete a range of rows. For example, the following query will delete all rows from the Customers table where the Age column is between 20 and 30:

DELETE FROM Customers WHERE Age BETWEEN 20 AND 30;

Finally, you can also use the WHERE clause to specify more complex criteria for deleting rows. For example, the following query will delete all rows from the Customers table where the Country column is equal to "USA" and the Age column is greater than 25:

DELETE FROM Customers WHERE Country = "USA" AND Age > 25;

It is important to note that the DELETE statement will not return any rows. The rows that are deleted are simply removed from the table.

Here are some additional tips for deleting multiple entries from a table in SQL Server:

  • If you are deleting a large number of rows, you may want to use the TRUNCATE TABLE statement instead of the DELETE statement. The TRUNCATE TABLE statement is faster and more efficient for deleting large amounts of data.
  • If you are deleting rows based on complex criteria, you may want to use a subquery in the WHERE clause. A subquery is a query within a query.
  • Always make a backup of your data before you delete any rows. This will help you to recover your data if you accidentally delete the wrong rows.

What is the difference between union and union all?

The UNION and UNION ALL operators in SQL are used to combine the results of two or more SELECT statements. The main difference between the two operators is that UNION removes duplicate rows from the results, while UNION ALL does not.

Here is an example of a UNION query:

SELECT * FROM Customers
UNION
SELECT * FROM Suppliers;

This query will return all rows from the Customers table and the Suppliers table, but it will only return each row once.

Here is an example of a UNION ALL query:

SELECT * FROM Customers
UNION ALL
SELECT * FROM Suppliers;

This query will return all rows from the Customers table and the Suppliers table, including any duplicate rows.

In general, you should use UNION when you want to remove duplicate rows from the results. You should use UNION ALL when you want to keep all rows, including duplicates.

Operator Description
UNION Removes duplicate rows from the results.
UNION ALL Keeps all rows, including duplicates.

Here are some additional things to keep in mind when using UNION and UNION ALL:

  • The columns in the SELECT statements must have the same data types.
  • The order of the columns in the results is not guaranteed.
  • The UNION and UNION ALL operators can be nested.

What is the difference between rank,row_number, and dense_rank?

The RANK, ROW_NUMBER, and DENSE_RANK functions in SQL are all window functions that are used to rank rows within a partition. The main difference between the three functions is how they handle ties.

  • The RANK function assigns the same rank to all rows with the same value. This means that there may be gaps in the rankings. For example, if there are three rows with the same value, the ranks will be 1, 2, and 4.
  • The ROW_NUMBER function assigns a unique rank to each row, regardless of the values in the other rows. This means that there will be no gaps in the rankings. For example, if there are three rows with the same value, the ranks will be 1, 2, and 3.
  • The DENSE_RANK function is similar to the RANK function, but it does not skip any ranks when there are ties. This means that the rankings will be consecutive, even if there are duplicate values. For example, if there are three rows with the same value, the ranks will be 1, 2, and 2.
  • Here is a table summarizing the key differences between RANK, ROW_NUMBER, and DENSE_RANK:
Function Description
RANK Assigns the same rank to all rows with the same value.
ROW_NUMBER Assigns a unique rank to each row, regardless of the values in the other rows.
DENSE_RANK Assigns consecutive ranks to rows, even if there are duplicate values.

In general, you should use the RANK function when you want to identify the relative rank of each row, even if there are duplicate values. You should use the ROW_NUMBER function when you want to assign a unique rank to each row, regardless of the values in the other rows. You should use the DENSE_RANK function when you want to assign consecutive ranks to rows, even if there are duplicate values.

Here are some additional things to keep in mind when using RANK, ROW_NUMBER, and DENSE_RANK:

  • The RANK, ROW_NUMBER, and DENSE_RANK functions can be used with any windowing clause.
  • The RANK, ROW_NUMBER, and DENSE_RANK functions can be nested.

Find records in a table that are not present in another table?

  • Using the NOT IN operator: The NOT IN operator allows you to specify a list of values that you want to exclude from the results. For example, the following query will return all rows from the Customers table where the Country column is not equal to "USA":
    SELECT * FROM Customers
    WHERE Country NOT IN ("USA");
    
  • Using the EXISTS operator: The EXISTS operator allows you to check if a subquery returns any rows. For example, the following query will return all rows from the Customers table where there is no corresponding row in the Orders table:
    SELECT * FROM Customers
    WHERE NOT EXISTS (SELECT * FROM Orders WHERE Customers.CustomerID = Orders.CustomerID );
    
  • Using the EXCEPT operator: The EXCEPT operator returns all rows from the first SELECT statement that are not present in the second SELECT statement. For example, the following query will return all rows from the Customers table that are not present in the Orders table:
    SELECT * FROM Customers
    EXCEPT
    SELECT * FROM Orders;

The method that you choose will depend on the specific requirements of your query.

What is referential integrity?

Referential integrity is a property of relational databases that ensures consistency and accuracy of data by enforcing relationships between tables. It ensures that foreign key values in a child table match primary key values in a parent table, preventing orphaned records and maintaining valid links between related data.

What is the difference between the MERGE and UPDATE statements?

The MERGE and UPDATE statements are both SQL commands used to modify data in a table, but they have different functionalities:

Aspect MERGE Statement UPDATE Statement
Operation Performs a combination of INSERT, UPDATE, and DELETE operations in a single statement. Modifies existing data in a table based on a specified condition.
Matching Matches rows based on a specified condition for both source and target tables. Matches rows based on a specified condition in the target table only.
Actions Allows specifying different actions for matched, unmatched, and duplicate rows. Updates existing rows that match the specified condition.
Handling Handles insertions, updates, and deletions in one statement based on the specified conditions. Focuses solely on updating existing rows based on the specified condition.
Use Cases Useful for synchronizing data between two tables or updating a target table based on conditions from a source table. Used when you know exactly which rows you want to update and what values you want to set for those rows.

What is the purpose of the DISTINCT keyword?

The DISTINCT keyword removes duplicate rows from the result set of a SELECT query. It ensures that each row in the output is unique, useful for generating reports, summaries, and eliminating redundant data. DISTINCT applies to the entire row, not just a single column, ensuring overall uniqueness in the result.

What is an execution plan?

An execution plan is a detailed map of how the database engine executes a query, including the order of operations, indexes used, and join methods. It helps in understanding and optimizing query performance by revealing potential bottlenecks and inefficiencies. Execution plans can be viewed using tools like EXPLAIN or graphical query analyzers.

How to check whether FullText Search is installed or not on the SQL Server?

To check FullText Search is installed, run below query

Select SERVERPROPERTY('IsFullTextInstalled')

Find employees with greater salaries than their manager's salary.

We can use an SQL query involving a self-join on the employee's table. Here's an example query for the 'employees' table with the following data:

id name salary manager_id
1 Guddu 100000 NULL
2 Prince 80000 1
3 Baibhav 120000 1
4 Saurav 90000 2
5 Gaurav 110000 2

Example SQL query:

SELECT e1.id AS employee_id, e1.name AS employee_name, e1.salary AS employee_salary,
       e2.id AS manager_id, e2.name AS manager_name, e2.salary AS manager_salary
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.id
WHERE e1.salary > e2.salary;

What is the difference between an inner and left join?

Some key differences between an inner and left join:

Aspect Inner Join Left Join
Description Returns only the rows that have matching values Returns all rows from the left table and matching rows from the right table, where available
Syntax SELECT FROM table1 INNER JOIN table2 ON condition SELECT FROM table1 LEFT JOIN table2 ON condition
Usage Use when you only want matching rows Use when you want all rows from the left table, and matching rows from the right table, where available
Result Includes only matching rows Includes all rows from the left table; for non-matching rows from the right table, NULL values are used

Update a table and swap gender values.

To update a table and swap gender values, we can use an SQL UPDATE statement. For example we have a table named 'employees' with columns 'id', 'name', and 'gender', and we want to swap the gender values for all rows in the table:

UPDATE employees
SET gender = CASE
    WHEN gender = 'Male' THEN 'Female'
    WHEN gender = 'Female' THEN 'Male'
    ELSE gender  -- Handle any other gender values, if applicable
    END;

What is the difference between the DELETE, TRUNCATE and DROP statements?

Some key differences between the DELETE, TRUNCATE and DROP statements:

Feature DELETE TRUNCATE DROP
Purpose Removes specific rows from a table Removes all rows from a table Deletes an entire table and its structure
Functionality Deletes rows based on a condition Deletes all rows from the table Deletes the entire table and its associated objects
Impact Affects rows in the table without altering its structure Removes all data while keeping the table structure Removes the table along with its structure and associated objects
Rollback Can be rolled back within a transaction Cannot be rolled back (DDL operation) Cannot be rolled back (DDL operation)
Performance Slower compared to TRUNCATE Faster compared to DELETE N/A
Usage Used to selectively remove specific rows Used to quickly remove all rows from a table Used to completely remove a table from the database

What is the difference between the NOW() and CURRENT_DATE functions?

Some key differences between the NOW() and CURRENT_DATE functions:

Feature NOW() CURRENT_DATE
Purpose Returns the current date and time Returns the current date (without time)
Data Type Returns a timestamp with time zone Returns a date (without time)
Time Zone May include the time zone depending on the database system configuration Does not include the time zone
Usage Used when you need the current date and time, including the time zone information if needed Used when you only need the current date without the time

What is the difference between the WHERE and the HAVING clause?

Some key differences between the WHERE and the HAVING clause:

Feature WHERE HAVING
Purpose Filters rows before groups are formed Filters groups after groups are formed
Applied to Applies to individual rows in the result set Applies to groups defined by GROUP BY clause
Order of Execution Applied before GROUP BY Applied after GROUP BY
Aggregates Cannot contain aggregate functions Can contain aggregate functions
Usage Used to filter rows based on individual row conditions Used to filter groups based on aggregated conditions
Example SELECT * FROM table WHERE condition; SELECT column1, aggregate_function(column2) FROM table GROUP BY column1 HAVING condition;

From a table of names, write a query that only returns EVEN number rows.

To select only the even-numbered rows from a table of names, you can use the ROW_NUMBER() window function to assign a sequential number to each row and then filter for rows with even row numbers. Here's how you can write the query:

SELECT name
FROM (
    SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS row_num
    FROM your_table_name
) AS numbered_rows
WHERE row_num % 2 = 0;

How can we copy the contents of one table to a new table?

To copy the contents of one table to a new table, we can use the CREATE TABLE AS SELECT (CTAS) statement or the INSERT INTO SELECT statement, whether the new table already exists or not.

  • CREATE TABLE AS SELECT (CTAS):

If you want to create a new table and copy the contents of an existing table into it, you can use the CREATE TABLE AS SELECT statement:

CREATE TABLE new_table_name AS
SELECT *
FROM existing_table_name;
  • INSERT INTO SELECT:

If the new table already exists and you want to insert data into it, you can use the INSERT INTO SELECT statement:

INSERT INTO new_table_name
SELECT *
FROM existing_table_name;

In string pattern matching, what is the difference between LIKE and ILIKE?

Some key differences between LIKE and ILIKE:

Feature LIKE ILIKE
Case Sensitivity Case-sensitive (depends on database settings) Case-insensitive
Usage Used for pattern matching where case matters Used for pattern matching where case doesn't matter
Syntax column_name LIKE 'pattern' column_name ILIKE 'pattern'

What are Aggregate and Scalar functions in an RDBMS and can you provide an example of their use?

Some key differences between Aggregate and Scalar functions:

Feature Aggregate Functions Scalar Functions
Purpose Operate on sets of rows and return a single result, summarizing data Operate on individual values and return a single result based on each input value
Usage Typically used with the GROUP BY clause to perform calculations on groups of rows Used within SQL expressions to manipulate or transform data on a row-by-row basis
Examples SUM(), AVG(), COUNT(), MAX(), MIN() UPPER(), LOWER(), SUBSTRING(), DATE_FORMAT()
  • Example Aggregate function:
SELECT department, COUNT(*) AS num_employees
FROM employees
GROUP BY department;
  • Example Scalar function: 
SELECT name, UPPER(name) AS uppercase_name
FROM employees;

How can you calculate the MEDIAN of a numerical field?

Finding the middle value, or median, of numbers in an SQL database can be tricky, especially because some databases don't come with a ready-made median function. But you can still figure it out by using different SQL functions and methods.

WITH Ranked AS (
    SELECT numerical_field,
           ROW_NUMBER() OVER (ORDER BY numerical_field) AS RowAsc,
           ROW_NUMBER() OVER (ORDER BY numerical_field DESC) AS RowDesc
    FROM your_table
)

SELECT AVG(numerical_field) AS Median
FROM (
    SELECT numerical_field
    FROM Ranked
    WHERE RowAsc IN ((SELECT COUNT(*) FROM Ranked) / 2, (SELECT COUNT(*) FROM Ranked) / 2 + 1)
       OR RowDesc IN ((SELECT COUNT(*) FROM Ranked) / 2, (SELECT COUNT(*) FROM Ranked) / 2 + 1)
) AS SubQuery;

Find the second highest salary employees in each department?

This question will require us to rank salaries and partition that ranking by the department of each individual employee.

I will add a manager id column to be used in the next question.

DROP TABLE IF EXISTS employee;
CREATE TABLE employee (
	emp_id int,
	emp_name TEXT,
	manager_id int,
	department TEXT,
	salary int
);

INSERT INTO employee (emp_id, emp_name, manager_id, department, salary)
VALUES
	(1, 'jaime', 0, 'IT', 85000),
	(2, 'robert', 1, 'IT', 75000),
	(3, 'lisa', 1, 'IT', 65000),
	(4, 'chris', 1, 'IT', 55000),
	(5, 'mary', 7, 'SALES', 55000),
	(6, 'richard', 7, 'SALES', 85000),
	(7, 'jane', 0, 'SALES', 80000),
	(8, 'trevor', 7, 'SALES', 65000),
	(9, 'joan', 12, 'HR', 55000),
	(10, 'jennifer', 12, 'HR', 71000),
	(11, 'trish', 12, 'HR', 58000),
	(12, 'marge', 0, 'HR', 70000);