SQL (Structured Query Language) is the foundation of data management and analysis, making it a critical skill for database administrators, data analysts, and backend developers. Stark.ai offers a curated collection of SQL interview questions, real-world scenarios, and expert guidance to help you excel in your next technical interview.
SQL (Structured Query Language) is a standard language for managing and manipulating relational databases. It is used for querying, updating, and managing data in databases. Common SQL commands include `SELECT`, `INSERT`, `UPDATE`, `DELETE`, `CREATE`, and `ALTER`.
`INNER JOIN` returns only the rows that have matching values in both tables, while `OUTER JOIN` (`LEFT JOIN`, `RIGHT JOIN`, and `FULL JOIN`) also returns the non-matching rows from one or both tables, filling in `NULL` where there is no match.
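A minimal sketch of the difference, assuming hypothetical `employees` and `departments` tables where `employees.department_id` references `departments.id`:
```sql
-- INNER JOIN: only employees that have a matching department
SELECT e.name, d.name AS department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;

-- LEFT JOIN: every employee, with NULL department columns when no match exists
SELECT e.name, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
```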
A table is created using the `CREATE TABLE` statement. For example:
```sql
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    age INT,
    department VARCHAR(50)
);
```
A primary key is a unique identifier for records in a table. It ensures that no duplicate values exist in the column or combination of columns marked as the primary key. Each table can have only one primary key.
`GROUP BY` is used to group rows that share a common value in specified columns. It is typically used with aggregate functions like `COUNT()`, `SUM()`, `AVG()`, `MAX()`, and `MIN()` to return grouped results. For example: `SELECT department, COUNT(*) FROM employees GROUP BY department;`.
A stored procedure is a set of SQL statements that can be saved and reused. Stored procedures encapsulate business logic within the database, improve performance, and reduce network traffic. You can call them with parameters to execute complex operations. For example (SQL Server / T-SQL syntax):
```sql
CREATE PROCEDURE GetEmployeeDetails (@id INT)
AS
BEGIN
    SELECT * FROM employees WHERE id = @id;
END;
```
A transaction is a sequence of SQL operations executed as a single unit of work. Transactions ensure ACID properties (Atomicity, Consistency, Isolation, Durability), meaning that either all operations in a transaction are completed successfully, or none are. You can control transactions with `BEGIN`, `COMMIT`, and `ROLLBACK` commands.
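A sketch of transaction control, assuming a hypothetical `accounts` table; the statement that starts a transaction varies by database (`BEGIN`, `BEGIN TRANSACTION`, or `START TRANSACTION`):
```sql
BEGIN;  -- or BEGIN TRANSACTION / START TRANSACTION, depending on the database
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;      -- make both updates permanent
-- ROLLBACK; -- would instead undo both updates
```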
Normalization is the process of organizing a database to reduce redundancy and improve data integrity. It involves dividing large tables into smaller tables and defining relationships between them. The normal forms (1NF, 2NF, 3NF, etc.) are guidelines for designing a normalized database.
An index in SQL is a database object that improves the speed of data retrieval operations on a table by providing a fast lookup mechanism. Indexes are typically created on columns that are frequently queried. However, they can also slow down `INSERT` and `UPDATE` operations.
A foreign key is a column (or a set of columns) in one table that references the primary key in another table. It establishes a relationship between the two tables, ensuring referential integrity. For example, the `department_id` in an `employees` table might be a foreign key referencing the `id` column in a `departments` table.
A database is an organized collection of structured information or data, typically stored electronically in a computer system. It allows efficient data storage, retrieval, management, and analysis, enabling businesses and organizations to manage large amounts of information systematically.
A database is the actual collection of data, while a Database Management System (DBMS) is the software that allows users to define, create, maintain, and control access to the database. DBMS provides an interface between the database and its end-users or application programs.
The main database models include: Relational (SQL), Hierarchical, Network, Object-Oriented, Document, Key-Value, and Graph databases. Each model has unique characteristics and is suited to different types of data storage and retrieval requirements.
Database normalization is the process of organizing data to reduce redundancy and improve data integrity. It involves breaking down tables into smaller, more focused tables and defining relationships between them to minimize data duplication and potential anomalies.
ACID stands for Atomicity, Consistency, Isolation, and Durability. Atomicity ensures transactions are completed entirely or not at all, Consistency maintains database integrity, Isolation prevents interference between concurrent transactions, and Durability guarantees that completed transactions are permanently recorded.
A primary key is a column or combination of columns that uniquely identifies each row in a database table. It ensures that no two rows have the same identifier and provides a way to establish relationships between tables.
A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It creates a relationship between tables by referencing the primary key of another table, ensuring referential integrity.
The main types of database relationships are: One-to-One, One-to-Many, Many-to-One, and Many-to-Many. These relationships define how data is connected and how tables interact with each other in a relational database.
A database schema is a blueprint that defines the logical structure of a database, including tables, fields, relationships, views, indexes, and other database objects. It serves as a framework for organizing and representing data in a systematic manner.
Data integrity refers to the accuracy, consistency, and reliability of data stored in a database. It ensures that data remains unchanged during storage, retrieval, and processing, and is maintained through constraints, validation rules, and database design principles.
A data warehouse is a centralized repository designed to store large volumes of structured data from multiple sources. It is optimized for query and analysis, providing historical and consolidated data for business intelligence and reporting purposes.
OLTP (Online Transaction Processing) databases are optimized for handling numerous real-time transactions, while OLAP (Online Analytical Processing) databases are designed for complex analytical queries and reporting, typically used for business intelligence.
A database index is a data structure that improves the speed of data retrieval operations on a database table. It works similar to an index in a book, allowing faster lookup of rows based on the values of one or more columns.
A database view is a virtual table based on the result of a SQL statement. It provides a way to simplify complex queries, restrict access to data, aggregate information, and present data in a specific format without storing the data physically.
Denormalization is a database optimization technique where redundant data is intentionally added to improve read performance. It involves combining normalized tables to reduce the need for complex joins and speed up data retrieval at the cost of some data redundancy.
Database constraints are rules enforced on data columns to maintain data integrity. Common types include NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, and DEFAULT constraints, which ensure data accuracy and consistency.
Data modeling is the process of creating a visual representation of a database's structure. It involves defining data elements, their relationships, and rules to support business requirements. Common data modeling techniques include conceptual, logical, and physical modeling.
A stored procedure is a precompiled collection of one or more SQL statements stored in a database. It can be reused and called multiple times, accepts input parameters, performs operations, and can return results, providing a way to encapsulate complex database logic.
Database triggers are special stored procedures automatically executed when a specific event occurs in the database, such as INSERT, UPDATE, or DELETE operations. They are used to maintain data integrity, enforce business rules, and automatically perform actions in response to data changes.
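As an illustration, a MySQL-style trigger; trigger syntax differs noticeably between databases, and the `updated_at` column here is hypothetical:
```sql
CREATE TRIGGER set_updated_at
BEFORE UPDATE ON employees
FOR EACH ROW
SET NEW.updated_at = NOW();  -- stamp the row with the modification time
```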
A database cursor is a database object that allows traversal and manipulation of database records. It acts like a pointer to a specific row in a result set, enabling row-by-row processing of query results and supporting operations that require sequential data access.
Database partitioning is a technique of dividing large tables into smaller, more manageable pieces called partitions. Each partition can be managed and accessed separately, improving query performance, simplifying maintenance, and enabling more efficient data management.
A database transaction is a sequence of database operations that are treated as a single unit of work. It must be completed entirely or not at all, ensuring data consistency. Transactions follow the ACID properties and are crucial for maintaining database reliability.
Database keys include Primary Key (uniquely identifies a record), Foreign Key (links tables together), Candidate Key (potential primary key), Alternate Key (secondary unique identifier), and Composite Key (combination of multiple columns used as a key).
Database replication is the process of creating and maintaining multiple copies of a database across different servers. It improves data availability, provides load balancing, enables disaster recovery, and ensures data consistency through various replication strategies.
Database sharding is a horizontal partitioning technique that splits large databases into smaller, more manageable pieces called shards. Each shard contains a subset of the data, distributed across multiple servers to improve performance, scalability, and manageability.
The main components of a relational database include tables, rows, columns, keys, indexes, views, stored procedures, and relationships. These elements work together to organize, store, and manage structured data efficiently.
A data dictionary is a centralized repository of information about data, such as its meaning, relationships to other data, origin, usage, and format. It provides metadata about database objects, helping users and administrators understand the structure and semantics of the database.
The basic structure of a SELECT statement is SELECT columns FROM table [WHERE condition]. It allows you to retrieve data from one or more tables, with SELECT specifying which columns to retrieve, FROM indicating the table, and WHERE (optional) filtering the results.
To select all columns from a table, use the asterisk (*) wildcard in the SELECT statement. For example: SELECT * FROM table_name. This retrieves all columns and rows from the specified table.
The WHERE clause is used to filter records based on specific conditions. It allows you to retrieve only the rows that meet the specified criteria, reducing the amount of data returned and helping to pinpoint exact information.
SQL comparison operators include: = (equal), <> or != (not equal), > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to), BETWEEN (range), LIKE (pattern matching), IN (multiple possible values), and IS NULL (null value check).
The DISTINCT keyword is used to remove duplicate rows from the result set. For example: SELECT DISTINCT column_name FROM table_name. It returns only unique values in the specified column(s).
The ORDER BY clause is used to sort the result set in ascending (ASC) or descending (DESC) order. For example: SELECT * FROM table_name ORDER BY column_name DESC. By default, sorting is in ascending order if not specified.
The LIMIT clause restricts the number of rows returned in a query result. For example: SELECT * FROM table_name LIMIT 10 returns only the first 10 rows. Some databases use TOP or FETCH FIRST instead of LIMIT.
AND requires all conditions to be true, while OR requires at least one condition to be true. For example, WHERE age > 30 AND salary < 50000 returns rows meeting both conditions, while WHERE age > 30 OR salary < 50000 returns rows meeting either condition.
The LIKE operator is used for pattern matching with wildcard characters. % represents zero or more characters, _ represents a single character. For example: WHERE name LIKE 'A%' finds names starting with A, WHERE name LIKE '_oh%' finds names with second and third letters 'oh'.
The IN operator allows you to specify multiple values in a WHERE clause. It provides a shorthand for multiple OR conditions. For example: WHERE column_name IN (value1, value2, value3) is equivalent to WHERE column_name = value1 OR column_name = value2 OR column_name = value3.
The BETWEEN operator selects values within a given range. It is inclusive of both boundary values. For example: WHERE age BETWEEN 20 AND 30 returns rows where age is 20, 30, or any value in between.
NULL values are handled using IS NULL and IS NOT NULL operators. For example: WHERE column_name IS NULL finds rows with null values, while WHERE column_name IS NOT NULL finds rows with non-null values. Standard comparison operators do not work with NULL.
CHAR is a fixed-length string type that pads shorter strings with spaces, while VARCHAR is a variable-length string type that only uses the space needed. CHAR(10) always uses 10 characters, but VARCHAR(10) can use 1 to 10 characters.
Aliases provide alternative names for tables or columns in a query. Column aliases use AS keyword: SELECT first_name AS name. Table aliases are used for readability and in joins: FROM employees AS e.
A subquery is a query nested inside another query. It can be used in SELECT, FROM, WHERE, and HAVING clauses. For example: SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees) returns employees with above-average salary.
Common SQL data types include: INTEGER/INT (whole numbers), DECIMAL/NUMERIC (precise decimal numbers), FLOAT/REAL (approximate numeric), CHAR/VARCHAR (fixed/variable strings), DATE/DATETIME (date and time values), BOOLEAN (true/false), and BLOB (binary large objects).
The GROUP BY clause groups rows with the same values in specified columns into summary rows. It is typically used with aggregate functions like COUNT, SUM, AVG. For example: SELECT department, AVG(salary) FROM employees GROUP BY department calculates average salary per department.
The HAVING clause filters groups created by GROUP BY, similar to WHERE but applied after grouping. For example: SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 50000 shows departments with average salary above 50000.
Aggregate functions perform calculations on a set of values and return a single result. Common aggregate functions include COUNT() (number of rows), SUM() (total), AVG() (average), MAX() (maximum value), and MIN() (minimum value).
The CASE statement allows conditional logic in queries. It works like an IF-THEN-ELSE statement. Example: SELECT name, CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END AS salary_category FROM employees.
WHERE filters individual rows before grouping, while HAVING filters groups after grouping. WHERE works with individual row conditions, HAVING works with aggregate function conditions in grouped queries.
Common string functions include CONCAT() (combine strings), SUBSTRING() (extract part of string), LENGTH() (string length), UPPER() (uppercase), LOWER() (lowercase), TRIM() (remove spaces), and REPLACE() (replace substring).
SQL provides functions for date and time manipulation like CURRENT_DATE, DATE_ADD(), DATE_SUB(), DATEDIFF() to perform calculations and comparisons. Specific functions vary between database systems.
SQL supports two types of comments: single-line comments (-- comment text) and multi-line comments (/* comment text */). Comments are used to explain code and are ignored by the database engine.
Wildcard characters are used with the LIKE operator: % matches zero or more characters, _ matches a single character, and in some databases (such as SQL Server) [] matches any single character listed in the brackets and [^] matches any character not listed. They enable flexible pattern matching in queries.
DELETE removes rows one at a time, logging each row, and can be rolled back; TRUNCATE removes all rows at once and, in many databases, cannot be rolled back because it is minimally logged or implicitly commits. DELETE can use a WHERE clause; TRUNCATE always removes all data from a table.
Set operators like UNION (removes duplicates), UNION ALL (includes duplicates), INTERSECT (common rows), and EXCEPT (rows in first query not in second) combine results from multiple SELECT statements.
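A short sketch using hypothetical `customers` and `suppliers` tables, each with a `city` column; note that INTERSECT is not available in every database:
```sql
-- All cities that appear in either table, duplicates removed
SELECT city FROM customers
UNION
SELECT city FROM suppliers;

-- Cities that appear in both tables
SELECT city FROM customers
INTERSECT
SELECT city FROM suppliers;
```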
The typical SQL query execution order is: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT. This means conditions are applied before grouping, aggregations, sorting, and limiting results.
DDL (Data Definition Language) is a subset of SQL used to define and manage database structures. The main DDL commands are CREATE, ALTER, DROP, and TRUNCATE, which are used to create, modify, delete, and remove database objects like tables, indexes, and schemas.
The CREATE TABLE statement is used to create a new table in a database. It specifies the table name, column names, data types, and optional constraints. For example: CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(100), salary DECIMAL(10,2)).
Table constraints are rules enforced on data columns to maintain data integrity. Common constraints include PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, CHECK, and DEFAULT. They define rules for data that can be inserted into a table.
ALTER TABLE is used to modify an existing table structure. Common operations include adding, modifying, or dropping columns, adding or removing constraints. For example: ALTER TABLE employees ADD COLUMN email VARCHAR(100), or ALTER TABLE employees DROP COLUMN phone_number.
DROP TABLE completely removes a table and all its data from the database. For example: DROP TABLE employees. This command permanently deletes the table structure and all associated data, and cannot be undone unless you have a backup.
DROP removes the entire table structure and data, while TRUNCATE removes all rows from a table but keeps the table structure intact. DROP is a DDL command that deletes the table itself; TRUNCATE quickly removes all data without logging individual row deletions.
Indexes are created using the CREATE INDEX statement. For example: CREATE INDEX idx_last_name ON employees(last_name). Indexes improve query performance by allowing faster data retrieval. They can be unique or non-unique and can be created on one or multiple columns.
A schema is a named collection of database objects like tables, views, indexes, and stored procedures. It provides a way to logically group and organize database objects. You can create a schema using CREATE SCHEMA statement and manage object ownership and permissions.
A view is a virtual table based on the result of a SELECT statement. It doesn't store data physically but provides a way to simplify complex queries. Created using CREATE VIEW: CREATE VIEW high_salary_employees AS SELECT * FROM employees WHERE salary > 50000.
A PRIMARY KEY constraint uniquely identifies each record in a table. It can be defined during table creation: CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(100)), or added later: ALTER TABLE employees ADD PRIMARY KEY (id).
A FOREIGN KEY creates a relationship between two tables by referencing the primary key of another table. It ensures referential integrity. Example: CREATE TABLE orders (id INT, customer_id INT, FOREIGN KEY (customer_id) REFERENCES customers(id)).
The UNIQUE constraint ensures that all values in a column are different. Unlike PRIMARY KEY, a table can have multiple UNIQUE constraints. Example: CREATE TABLE users (id INT, email VARCHAR(100) UNIQUE).
Temporary tables are created using CREATE TEMPORARY TABLE or CREATE TEMP TABLE. They exist only for the duration of a session. Example: CREATE TEMPORARY TABLE temp_sales (product_id INT, total_sales DECIMAL).
The CHECK constraint is used to limit the value range that can be placed in a column. Example: CREATE TABLE employees (age INT CHECK (age >= 18 AND age <= 65)), which ensures the age is between 18 and 65.
The DEFAULT constraint provides a default value for a column when no value is specified. Example: CREATE TABLE products (id INT, price DECIMAL DEFAULT 0.00), which sets the default price to 0 if not explicitly provided.
Table renaming syntax varies between database systems. For example, in MySQL: RENAME TABLE old_table TO new_table; in PostgreSQL: ALTER TABLE old_table RENAME TO new_table; in SQL Server: EXEC sp_rename 'old_table', 'new_table'.
A composite key is a primary key composed of multiple columns. It's used when a single column cannot uniquely identify a record. Example: CREATE TABLE order_items (order_id INT, product_id INT, PRIMARY KEY (order_id, product_id)).
Auto-increment columns automatically generate unique numeric values when a new record is inserted. Syntax varies by database: MySQL uses AUTO_INCREMENT, SQL Server uses IDENTITY, PostgreSQL uses SERIAL.
Table inheritance allows creating a new table based on an existing table, inheriting its columns and characteristics. This is supported differently across database systems, with PostgreSQL providing direct support for table inheritance.
A clustered index determines the physical order of data in a table. Each table can have only one clustered index. It sorts and stores the data rows in the table based on their key values, which affects the way data is physically stored.
A sequence is a database object that generates a series of numeric values. Example: CREATE SEQUENCE emp_seq START WITH 1 INCREMENT BY 1. It can be used to generate unique identifier values for tables.
Database normalization is the process of organizing data to reduce redundancy and improve data integrity. It involves creating tables, defining relationships, and applying constraints to minimize data duplication and potential anomalies.
Computed columns are columns whose values are calculated dynamically from other columns in the table. Example: total_price DECIMAL GENERATED ALWAYS AS (quantity * unit_price) STORED.
Table partitioning divides large tables into smaller, more manageable pieces. The syntax varies by database system. It allows improving query performance and management of large datasets by splitting them into logical segments.
Cascading actions define what happens to dependent records when a referenced record is updated or deleted. Options include CASCADE (propagate changes), SET NULL, SET DEFAULT, and NO ACTION.
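A sketch of a cascading foreign key, assuming a hypothetical `customers` table with an `id` primary key:
```sql
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id)
        REFERENCES customers(id)
        ON DELETE CASCADE   -- deleting a customer also deletes that customer's orders
        ON UPDATE NO ACTION
);
```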
Data types define the type of data a column can store, its size, and potential constraints. They ensure data integrity, optimize storage, and define how data can be processed and manipulated in the database.
A materialized view is a database object that contains the results of a query. Unlike regular views, it stores the query results physically. Syntax varies by database system, but generally involves CREATE MATERIALIZED VIEW with a SELECT statement.
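A PostgreSQL-style sketch (other databases use different syntax), assuming a hypothetical `orders` table:
```sql
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT DATE_TRUNC('month', order_date) AS month,
       SUM(amount) AS total
FROM orders
GROUP BY DATE_TRUNC('month', order_date);

REFRESH MATERIALIZED VIEW monthly_sales;  -- recompute and store the results again
```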
Data Manipulation Language (DML) is a subset of SQL used to manipulate data within database objects. The main DML commands are INSERT, UPDATE, DELETE, and MERGE. These commands allow adding, modifying, removing, and combining data in database tables.
The INSERT statement adds new rows to a table. There are multiple ways to use it: INSERT INTO table_name (column1, column2) VALUES (value1, value2), or INSERT INTO table_name VALUES (value1, value2) to insert values for all columns in order.
You can insert multiple rows in a single INSERT statement by listing multiple sets of values: INSERT INTO table_name (column1, column2) VALUES (value1, value2), (value3, value4), (value5, value6). Another method is using INSERT INTO ... SELECT to insert rows from another table.
The UPDATE statement modifies existing records in a table. Its basic syntax is UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition. The WHERE clause is optional but recommended to specify which rows to update.
The DELETE statement removes one or more records from a table. Its syntax is DELETE FROM table_name WHERE condition. If no WHERE clause is specified, all rows in the table will be deleted. It's important to use a precise WHERE clause to avoid unintended data loss.
The MERGE statement performs INSERT, UPDATE, or DELETE operations in a single statement based on a condition. It's useful for synchronizing two tables. The statement allows you to compare a source table with a target table and perform different actions depending on whether a match is found.
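A MERGE sketch, assuming hypothetical `target_customers` and `staging_customers` tables; MERGE is supported in SQL Server, Oracle, and several other systems, but not in MySQL:
```sql
MERGE INTO target_customers t
USING staging_customers s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.name = s.name, t.email = s.email
WHEN NOT MATCHED THEN
    INSERT (id, name, email) VALUES (s.id, s.name, s.email);
```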
You can insert data from one table into another using the INSERT INTO ... SELECT statement. For example: INSERT INTO target_table (column1, column2) SELECT column1, column2 FROM source_table WHERE condition.
Upsert is the process of inserting a new record or updating an existing one if it already exists. Different databases implement this differently. Some use MERGE, while others use specific syntax like INSERT ... ON DUPLICATE KEY UPDATE in MySQL.
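A sketch of both styles, assuming a hypothetical `users` table with a unique key on `id`:
```sql
-- MySQL
INSERT INTO users (id, email) VALUES (1, 'a@example.com')
ON DUPLICATE KEY UPDATE email = VALUES(email);

-- PostgreSQL
INSERT INTO users (id, email) VALUES (1, 'a@example.com')
ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email;
```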
The main risks include accidentally updating or deleting unintended records if the WHERE clause is incorrect. Always use transactions, have a backup, and test complex UPDATE or DELETE statements in a safe environment before running them on production data.
To update multiple columns, list them in the SET clause separated by commas: UPDATE table_name SET column1 = value1, column2 = value2, column3 = value3 WHERE condition.
You can use a subquery in an UPDATE statement to set values based on conditions from another table. For example: UPDATE employees SET salary = salary * 1.1 WHERE department_id IN (SELECT id FROM departments WHERE location = 'New York').
TRUNCATE quickly removes all rows from a table, resetting its storage. DELETE removes rows one by one and can be rolled back. TRUNCATE is faster but less flexible: it cannot use a WHERE clause and, in most databases, cannot be rolled back.
To insert or update NULL values, use the NULL keyword. For example: INSERT INTO table_name (column1, column2) VALUES (value1, NULL). You can also use UPDATE table_name SET column1 = NULL WHERE condition.
You can update a table based on conditions from another table using JOIN. For example: UPDATE table1 t1 JOIN table2 t2 ON t1.id = t2.id SET t1.column1 = t2.column2 WHERE condition.
Best practices include: using transactions, writing precise WHERE clauses, backing up data before major changes, using prepared statements to prevent SQL injection, and testing complex operations in a safe environment before production.
To insert rows with default values, either omit the column in the column list or explicitly use the DEFAULT keyword. For example: INSERT INTO table_name (column1) VALUES (DEFAULT), or INSERT INTO table_name DEFAULT VALUES.
The OUTPUT clause allows you to return information about the rows affected by an INSERT, UPDATE, or DELETE statement. It can capture both the old and new values of the modified rows, useful for logging or auditing purposes.
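A SQL Server sketch of OUTPUT, using the hypothetical `employees` table from earlier examples:
```sql
-- Return the rows that were removed, useful for logging or auditing
DELETE FROM employees
OUTPUT DELETED.id, DELETED.name
WHERE department = 'Sales';
```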
Bulk insert is a method of inserting multiple rows efficiently. Different databases have different methods, such as using INSERT with multiple VALUES, BULK INSERT command, or database-specific bulk loading utilities.
A self-update involves updating a table based on its own existing values. For example: UPDATE table_name SET column1 = column1 * 1.1 WHERE condition.
The RETURNING clause (supported in some databases like PostgreSQL) allows you to retrieve values of rows affected by INSERT, UPDATE, or DELETE statements. It's similar to the OUTPUT clause in other databases.
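A PostgreSQL sketch of RETURNING against the same hypothetical `employees` table:
```sql
UPDATE employees
SET salary = salary * 1.1
WHERE department = 'Engineering'
RETURNING id, salary;  -- hand back the new salaries of the updated rows
```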
When working with large datasets, consider performance implications, use transactions, potentially break the operation into smaller chunks, create appropriate indexes, and be cautious of lock contention in multi-user environments.
Maintain data integrity through foreign key constraints, check constraints, transactions, using appropriate data types, and implementing validation logic. Always ensure that DML operations don't violate defined database constraints.
Parameterized queries use placeholders for values, which helps prevent SQL injection and improves query performance by allowing query plan reuse. Different databases have different syntax for parameterization.
INSERT IGNORE (a MySQL extension) will skip rows that would cause errors (like duplicate key violations) instead of failing the entire insert operation. It's useful when you want to insert multiple rows and don't want the entire operation to fail if some rows have issues; other databases offer similar options, such as PostgreSQL's ON CONFLICT DO NOTHING.
Conditional delete uses a WHERE clause to specify which rows to remove. For example: DELETE FROM table_name WHERE condition. You can also use subqueries or joins to create more complex deletion conditions.
Foreign key constraints can restrict DML operations. For example, you cannot delete a parent record if child records exist unless CASCADE delete is specified. INSERT and UPDATE must ensure that referenced values exist in the parent table.
You can copy data between tables using INSERT INTO ... SELECT, CREATE TABLE ... AS SELECT, or using database-specific bulk copy utilities. The method depends on whether you want to copy structure, data, or both.
A JOIN clause is used to combine rows from two or more tables based on a related column between them. Its basic purpose is to create a result set that shows how data in different tables is related.
The four main types of JOIN operations in SQL are INNER JOIN, LEFT (OUTER) JOIN, RIGHT (OUTER) JOIN, and FULL (OUTER) JOIN. Each type determines how records from the joined tables are combined in the result set.
INNER JOIN returns only the matching rows from both tables, while LEFT JOIN returns all rows from the left table and matching rows from the right table. If there's no match, NULL values are returned for the right table columns.
A foreign key constraint is a column that references the primary key of another table. It maintains referential integrity and ensures data consistency between related tables.
A self-join is when a table is joined with itself. It's useful when a table contains hierarchical or self-referential data, such as an employee table where each employee has a manager who is also an employee.
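A self-join sketch, assuming the `employees` table has a hypothetical `manager_id` column that references `employees.id`:
```sql
SELECT e.name AS employee,
       m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;  -- LEFT JOIN keeps employees without a manager
```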
A cross join produces a Cartesian product of two tables, combining each row from the first table with every row from the second table. The result contains all possible combinations of rows from both tables.
A NATURAL JOIN automatically joins tables based on columns with the same name in both tables, while an INNER JOIN requires explicit specification of the join conditions using the ON clause.
A composite key is a combination of two or more columns that uniquely identify a row. In relationships, it can be used as a foreign key to reference another table where the same combination of columns serves as the primary key.
Referential integrity ensures that relationships between tables remain consistent. It prevents actions that would destroy relationships, such as deleting a record that's referenced by other records or adding a reference to a nonexistent record.
The main types of relationships are one-to-one (1:1), one-to-many (1:N), and many-to-many (M:N). Each type determines how records in one table relate to records in another table.
A many-to-many relationship is implemented using a junction table (also called bridge or associative table) that contains foreign keys referencing the primary keys of both related tables.
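A junction-table sketch for a hypothetical students/courses relationship:
```sql
CREATE TABLE student_courses (
    student_id INT,
    course_id  INT,
    PRIMARY KEY (student_id, course_id),               -- each student/course pair appears once
    FOREIGN KEY (student_id) REFERENCES students(id),
    FOREIGN KEY (course_id)  REFERENCES courses(id)
);
```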
ON DELETE CASCADE automatically deletes related records in the child table when a parent record is deleted, while ON DELETE SET NULL sets the foreign key fields to NULL in the child table when the parent record is deleted.
A recursive relationship is when a table has a relationship with itself, where a record in the table references another record in the same table, such as an employee having a manager who is also an employee.
Cardinality defines the numerical relationship between records in related tables, specifying how many records in one table can be related to a record in another table (e.g., one-to-one, one-to-many, many-to-many).
An anti-join returns records from the first table that have no matching records in the second table. It can be implemented using NOT EXISTS, NOT IN, or LEFT JOIN with a NULL check on the second table's columns.
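A NOT EXISTS sketch, assuming hypothetical `customers` and `orders` tables:
```sql
-- Customers who have never placed an order
SELECT c.*
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
```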
NULL values in JOIN conditions require special attention as they don't match anything, even other NULLs. You may need to use IS NULL in the join condition or COALESCE to handle NULL values appropriately.
A surrogate key is an artificial primary key, typically an auto-incrementing number, used instead of a natural key. It's useful when natural keys are complex, subject to change, or non-existent.
Normalization is the process of organizing data to reduce redundancy and improve data integrity. It often results in creating more tables with relationships between them, requiring JOINs to retrieve related data.
There is no difference - LEFT OUTER JOIN and LEFT JOIN are synonymous in SQL. The word OUTER is optional and both produce the same result, returning all records from the left table and matching records from the right table.
Indexes on JOIN columns can significantly improve JOIN performance by reducing the need for full table scans. The database can use indexes to quickly locate matching rows between tables.
A non-equi join is a join that uses comparison operators other than equality (!=, >, <, etc.). It's useful when you need to match records based on ranges or conditions rather than exact matches.
A one-to-one relationship is implemented using a unique foreign key constraint in one table that references the primary key of another table. This ensures that each record in one table corresponds to exactly one record in the other table.
A circular reference occurs when tables form a cycle of foreign key relationships. It can be prevented by careful database design, breaking the cycle, or using alternative relationship patterns.
The USING clause is a shorthand for joining tables when the columns have the same name in both tables. It simplifies the JOIN syntax by eliminating the need for an ON clause with equality comparison.
When joining tables with composite foreign keys, all components of the key must be included in the JOIN condition using AND operators to ensure the correct matching of records.
Denormalization is the process of adding redundant data to tables to reduce the need for JOINs. While it can improve query performance, it introduces data redundancy and potential consistency issues.
Complex JOINs can be optimized by proper indexing, joining tables in the most efficient order, using appropriate JOIN types, and considering denormalization where necessary. The execution plan should be analyzed to identify performance bottlenecks.
UNION combines rows from two or more queries vertically (adding rows), while JOIN combines tables horizontally (adding columns). UNION requires the same number and compatible types of columns in all queries.
Referential actions specify what happens when a referenced record is deleted or updated. Types include CASCADE, SET NULL, SET DEFAULT, and NO ACTION, each defining different behaviors for maintaining referential integrity.
Logical joins describe the desired relationship between tables in the query, while physical joins refer to the actual methods used by the database engine to combine the data, such as nested loops, hash joins, or merge joins.
The GROUP BY clause is used to group rows that have the same values in specified columns into summary rows. It is typically used with aggregate functions to perform calculations on each group of rows rather than the entire table.
The five basic aggregate functions in SQL are COUNT(), SUM(), AVG(), MAX(), and MIN(). These functions perform calculations across a set of rows and return a single value.
COUNT(*) counts all rows including NULL values, while COUNT(column_name) counts only non-NULL values in the specified column. This can lead to different results when the column contains NULL values.
The HAVING clause is used to filter groups based on aggregate function results. It's similar to WHERE but operates on groups rather than individual rows and can use aggregate functions in its conditions.
When DISTINCT is used with aggregate functions (e.g., COUNT(DISTINCT column)), it counts or aggregates only unique values in the specified column, eliminating duplicates before performing the aggregation.
WHERE filters individual rows before grouping, while HAVING filters groups after grouping. HAVING can use aggregate functions in its conditions, but WHERE cannot because it processes rows before aggregation occurs.
NULL values are handled differently by different aggregate functions: COUNT(*) includes them, COUNT(column) ignores them, SUM and AVG ignore them, and MAX and MIN ignore them. This can significantly impact calculation results.
A window function performs calculations across a set of rows related to the current row, unlike regular aggregation which groups rows into a single output row. Window functions preserve the individual rows while adding aggregate calculations.
Running totals can be calculated using window functions with the OVER clause and ORDER BY, such as SUM(value) OVER (ORDER BY date). This creates a cumulative sum while maintaining individual row details.
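A running-total sketch over a hypothetical `orders` table:
```sql
SELECT order_date,
       amount,
       SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
```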
GROUPING SETS allows you to specify multiple grouping combinations in a single query. It's a shorthand for combining multiple GROUP BY operations with UNION ALL, producing multiple levels of aggregation simultaneously.
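A GROUPING SETS sketch over the hypothetical `employees` table, assuming a `job_title` column:
```sql
-- Totals by department, totals by job title, and a grand total, in one result set
SELECT department, job_title, SUM(salary) AS total_salary
FROM employees
GROUP BY GROUPING SETS ((department), (job_title), ());
```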
Division by zero can be handled using NULLIF or CASE statements within aggregate functions. For example, AVG(value/NULLIF(divisor,0)) prevents division by zero errors by converting zero divisors to NULL.
The CUBE operator generates all possible combinations of grouping columns, producing a cross-tabulation report. It's useful for generating subtotals and grand totals across multiple dimensions in data analysis.
The mode can be found using COUNT and GROUP BY, then selecting the value with the highest count using ORDER BY COUNT(*) DESC and LIMIT 1 or ranking functions like ROW_NUMBER().
ROLLUP generates hierarchical subtotals based on the specified columns' order, while CUBE generates all possible combinations. ROLLUP is used for hierarchical data analysis, creating subtotals for each level.
Percentages within groups can be calculated using window functions, such as SUM(value) OVER (PARTITION BY group) to get the group total, then dividing individual values by this total and multiplying by 100.
ROW_NUMBER() assigns unique numbers, RANK() assigns same number to ties with gaps, and DENSE_RANK() assigns same number to ties without gaps. They're used for different ranking scenarios within groups.
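A side-by-side sketch of the three ranking functions on the hypothetical `employees` table:
```sql
SELECT department,
       name,
       salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
       RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
FROM employees;
```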
Groups with specific patterns can be found using HAVING with aggregate functions to filter groups based on conditions like COUNT(), MIN(), MAX(), or custom calculations that identify the desired patterns.
FIRST_VALUE and LAST_VALUE are window functions that return the first and last values in a window frame, respectively. They're useful for comparing current rows with initial or final values in a group.
Timezone differences can be handled by converting timestamps to a standard timezone using AT TIME ZONE or converting to UTC before grouping. This ensures consistent grouping across different timezones.
LAG() accesses data from previous rows while LEAD() accesses data from subsequent rows in a result set. Both are window functions useful for comparing current rows with offset rows within groups.
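A LAG/LEAD sketch over a hypothetical `orders` table:
```sql
SELECT order_date,
       amount,
       LAG(amount)  OVER (ORDER BY order_date) AS previous_amount,
       LEAD(amount) OVER (ORDER BY order_date) AS next_amount
FROM orders;
```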
Outliers can be identified using window functions to calculate statistical measures like standard deviation within groups, then using WHERE or HAVING to filter values that deviate significantly from the group's average.
ORDER BY in window functions determines the sequence of rows for operations like running totals, moving averages, and LAG/LEAD functions. It's crucial for time-series analysis and sequential calculations.
Moving averages are calculated using window functions with ROWS or RANGE in the OVER clause, such as AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW).
ROWS defines the window frame based on physical row count, while RANGE defines it based on logical value ranges. ROWS is used for fixed-size windows, RANGE for value-based windows.
Group concatenation can be achieved using STRING_AGG() or GROUP_CONCAT() (depending on the database system), which combines values from multiple rows into a single string within each group.
The FILTER clause allows conditional aggregation by specifying which rows to include in the aggregate calculation. It's more readable than CASE expressions and can improve performance.
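A FILTER sketch (PostgreSQL and other databases that implement the standard clause), against the hypothetical `employees` table:
```sql
SELECT department,
       COUNT(*)                               AS total_employees,
       COUNT(*) FILTER (WHERE salary > 50000) AS high_earners
FROM employees
GROUP BY department;
```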
Median calculation varies by database system. Common approaches include using PERCENTILE_CONT(0.5), specialized functions like MEDIAN(), or calculating it manually using window functions and row numbers.
Aggregate functions group rows into a single result row, while analytic functions (window functions) perform calculations across rows while maintaining individual row details in the result set.
Data pivoting can be achieved using aggregate functions with CASE expressions or the PIVOT operator (if supported by the database). This transforms row values into columns, creating cross-tabulated results.
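A CASE-based pivot sketch over a hypothetical `sales` table with `department`, `quarter`, and `amount` columns:
```sql
-- One row per department with quarterly totals as columns
SELECT department,
       SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1_sales,
       SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2_sales
FROM sales
GROUP BY department;
```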
A subquery is a query nested inside another query. Its basic purpose is to return data that will be used in the main query as a condition or as a derived table. Subqueries can be used in SELECT, FROM, WHERE, and HAVING clauses.
A correlated subquery references columns from the outer query and is executed once for each row processed by the outer query. A non-correlated subquery is independent of the outer query and executes once for the entire query.
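A sketch of both forms against the hypothetical `employees` table:
```sql
-- Non-correlated: the subquery runs once for the whole statement
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Correlated: the subquery references the outer row and runs per row
SELECT name FROM employees e
WHERE salary > (SELECT AVG(salary)
                FROM employees
                WHERE department = e.department);
```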
A derived table is a subquery in the FROM clause that acts as a temporary table for the duration of the query. It must have an alias and can be used like a regular table in the main query.
Scalar subqueries are subqueries that return exactly one row and one column. They're used when a single value is needed, such as in comparisons or calculations, and can appear in SELECT, WHERE, or HAVING clauses.
EXISTS checks whether a subquery returns any rows. It's often used with correlated subqueries to test for the existence of related records, returning TRUE if the subquery returns any rows and FALSE if it doesn't.
IN compares a value against a list of values returned by the subquery, while EXISTS checks for the presence of any rows. EXISTS often performs better with large datasets as it stops processing once a match is found.
Subqueries in the SELECT clause must return a single value per row of the outer query. They're often used to calculate values or retrieve related data from other tables for each row in the result set.
Correlated subqueries can impact performance as they execute once for each row in the outer query. This can be inefficient for large datasets and might be better replaced with JOINs or other query constructs.
NULL values require special handling in subqueries, especially with NOT IN operations. It's important to either filter out NULLs or use NOT EXISTS instead of NOT IN to avoid unexpected results due to NULL comparison behavior.
A CTE is a named temporary result set that exists within the scope of a single statement. Unlike subqueries, CTEs can be referenced multiple times within a query and can be recursive. They often improve readability and maintenance.
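A CTE sketch that computes a per-department average once and reuses it, against the hypothetical `employees` table:
```sql
WITH dept_avg AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT e.name, e.salary, d.avg_salary
FROM employees e
JOIN dept_avg d ON e.department = d.department
WHERE e.salary > d.avg_salary;
```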
Tables can be updated using subqueries in the SET clause or WHERE clause. The subquery can provide values for the update or identify which rows to update. Care must be taken with correlated subqueries to avoid updating the same table being referenced.
A lateral join allows subqueries in the FROM clause to reference columns from preceding items in the FROM clause. This enables row-by-row processing with access to outer query columns, similar to correlated subqueries.
Subqueries can be used with aggregate functions to compare individual values against group results, such as finding rows where a value exceeds the average. They can appear in HAVING clauses or as scalar subqueries in the SELECT list.
Subqueries have limitations: they cannot contain ORDER BY unless it is paired with a row-limiting clause such as TOP, LIMIT, or FETCH; they cannot be combined with UNION/INTERSECT/EXCEPT in certain contexts; and they must return a single value in scalar contexts. Performance can also be a limitation with complex nested queries.
Subqueries can be used in INSERT statements to populate new records based on existing data. They can provide values for specific columns or complete rows, and can be combined with SELECT statements to insert multiple rows.
ANY/SOME compares a value with each value returned by a subquery, returning TRUE if any comparison is true. It's often used with comparison operators like '>', '<', or '=' to find matches against multiple values.
The ALL operator compares a value with every value returned by a subquery, returning TRUE only if all comparisons are true. It's useful for finding values that satisfy conditions against an entire set of results.
A nested subquery is a subquery within another subquery. While they can solve complex problems, each level of nesting can impact performance and readability. They should be used judiciously and potentially refactored using JOINs or CTEs.
Subquery performance can be optimized by: using EXISTS instead of IN for large datasets, avoiding correlated subqueries when possible, using JOINs instead of subqueries where appropriate, and ensuring proper indexing on referenced columns.
A materialized subquery is one where the results are computed once and stored temporarily, rather than being recomputed for each row. This can improve performance for complex subqueries referenced multiple times in the main query.
Subqueries in DELETE statements can identify which records to remove based on complex conditions or relationships with other tables. Care must be taken with correlated subqueries to avoid affecting the subquery's results during deletion.
While both can relate data from multiple tables, subqueries create a nested query structure while joins combine tables horizontally. Joins often perform better but subqueries can be more readable for certain operations like existence checks.
Subqueries can contain window functions to perform calculations before the results are used in the main query. This is often done in derived tables or CTEs where the window function results need further processing.
A recursive subquery is used in a CTE to query hierarchical or graph-like data structures. It combines a base case with a recursive part to traverse relationships like organizational charts or bill of materials.
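A recursive CTE sketch over the hypothetical `employees` table with a `manager_id` column; note that SQL Server omits the RECURSIVE keyword:
```sql
WITH RECURSIVE org_chart AS (
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL                    -- base case: top-level managers
    UNION ALL
    SELECT e.id, e.name, e.manager_id, o.level + 1
    FROM employees e
    JOIN org_chart o ON e.manager_id = o.id     -- recursive step: direct reports
)
SELECT * FROM org_chart;
```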
Error handling in subqueries involves checking for NULL results, handling no-data scenarios, ensuring single-row returns for scalar subqueries, and using CASE expressions or COALESCE to handle exceptional cases.
Subqueries operate within the same transaction as the main query, but complex subqueries can affect lock duration and concurrency. Correlated subqueries may hold locks longer due to row-by-row processing.
Subqueries in dynamic SQL must be properly formatted and escaped. They can be used to create flexible queries based on runtime conditions, but care must be taken to prevent SQL injection and ensure proper parameter handling.
Inline views are subqueries in the FROM clause that create temporary result sets. They're useful for breaking down complex queries, pre-aggregating data, or applying transformations before joining with other tables.
Subqueries in CASE expressions must return scalar values and can be used to create conditional logic based on queries against other tables or aggregated data. They're useful for complex categorical assignments or calculations.
CASE statements in WHERE clauses allow for complex conditional logic. For example: WHERE CASE WHEN price > 100 THEN discount ELSE full_price END > 50. This enables dynamic comparison values based on multiple conditions.
LIKE uses simple wildcard patterns with % and _, while REGEXP enables complex pattern matching using regular expressions. REGEXP provides more powerful pattern matching capabilities including character classes, repetitions, and alternations.
Fuzzy matching can be implemented using functions like SOUNDEX, LEVENSHTEIN distance, or custom string similarity functions. These help find approximate matches when exact matching isn't suitable, useful for handling typos or variations in text.
BETWEEN tests if a value falls within a range, inclusive of boundaries. It handles different data types (numbers, dates, strings) appropriately, but care must be taken with timestamps and floating-point numbers for precise comparisons.
Related records can be filtered using EXISTS/NOT EXISTS, IN/NOT IN with subqueries, or LEFT JOIN with NULL checks. EXISTS often performs better for large datasets as it stops processing once a match is found.
NULL values require special handling: IS NULL/IS NOT NULL for direct comparison, COALESCE/NULLIF for substitution, and careful consideration with NOT IN operations as NULL affects their logic differently than normal values.
Array containment can be checked using ANY/ALL operators, ARRAY_CONTAINS function (in supported databases), or JSON array functions. For databases without native array support, you might need to split strings or use junction tables.
WHERE filters individual rows before grouping and cannot use aggregate functions, while HAVING filters groups after aggregation and can use aggregate functions. HAVING is specifically designed for filtering grouped results.
Dynamic filtering can be implemented using CASE statements, dynamic SQL with proper parameterization, or by building WHERE clauses conditionally. Always use parameterized queries to prevent SQL injection.
Window functions like ROW_NUMBER, RANK, or LAG can be used in subqueries or CTEs to filter based on row position, ranking, or comparison with adjacent rows. They're useful for tasks like finding top N per group.
Temporal filtering uses date/time functions and operators to handle ranges, overlaps, and specific periods. Consider timezone handling, date arithmetic, and proper indexing for performance.
Performance varies based on indexing, data distribution, and filter complexity. Using appropriate indexes, avoiding functions on indexed columns, and choosing the right operators (EXISTS vs IN) can significantly impact performance.
Hierarchical filtering uses recursive CTEs to traverse parent-child relationships. The recursive query combines a base case with a recursive step to filter based on tree structures like organizational charts.
Bitmap indexes are specialized indexes that work well for low-cardinality columns. They can improve filtering performance on multiple conditions through bitmap operations, but may not be suitable for frequently updated data.
JSON data can be filtered using JSON path expressions, JSON extraction functions, and comparison operators. Different databases provide specific functions like JSON_VALUE, JSON_QUERY, or ->> operators for JSON manipulation.
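For illustration, assuming a hypothetical `events` table with a JSON `payload` column; the `->>` operator shown is PostgreSQL syntax, with a SQL Server equivalent in comments:
```sql
-- ->> extracts a JSON field as text.
SELECT id, payload ->> 'customer' AS customer
FROM events
WHERE payload ->> 'status' = 'shipped';

-- SQL Server equivalent:
-- SELECT id, JSON_VALUE(payload, '$.customer') AS customer
-- FROM events
-- WHERE JSON_VALUE(payload, '$.status') = 'shipped';
```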
Indexes support efficient data retrieval in filtering operations. Composite indexes, covering indexes, and filtered indexes can be designed to optimize specific filtering patterns and improve query performance.
Overlapping ranges can be handled using combinations of comparison operators, BETWEEN, or specialized range types. Consider edge cases and ensure proper handling of inclusive/exclusive bounds.
Best practices include using appropriate indexes, avoiding functions on filtered columns, considering partitioning, using efficient operators, and implementing pagination or batch processing for large result sets.
Full-text search can be implemented using full-text indexes, CONTAINS/FREETEXT predicates, or specialized functions. Consider relevance ranking, word stemming, and stop words for effective text search.
Subqueries and joins can both be used for filtering, but they have different performance characteristics. Joins often perform better for large datasets, while subqueries can be more readable for existence checks.
Complex date/time filtering involves date arithmetic functions, DATEADD/DATEDIFF, handling of fiscal periods, and consideration of business calendars. Proper indexing strategies are crucial for performance.
XML filtering uses XPath expressions, XML methods like exist(), value(), and nodes(). Consider proper indexing of XML columns and the performance impact of complex XML operations.
Multi-tenant filtering requires consistent application of tenant identifiers, proper indexing strategies, and consideration of row-level security. Use parameters or context settings to ensure tenant isolation.
Soft delete filtering typically uses flag columns or deletion timestamps. Consider impact on indexes, constraints, and query performance. May require careful handling in joins and aggregate operations.
Aggregate-based filtering uses subqueries or window functions to compute aggregates, then filters based on these results. Consider performance implications and appropriate use of HAVING vs WHERE clauses.
Versioned data filtering involves temporal tables, effective dates, or version numbers. Consider overlap handling, current version retrieval, and historical data access patterns.
Geospatial filtering uses spatial data types and functions for operations like distance calculations, containment checks, and intersection tests. Consider spatial indexes for performance optimization.
Dynamic pivot filtering involves generating SQL dynamically based on pivot columns, using CASE expressions or PIVOT operator, and handling varying numbers of columns. Consider performance and maintenance implications.
Materialized views can pre-compute complex filtering conditions for better performance. Consider refresh strategies, storage requirements, and query rewrite capabilities of the database.
Row-level security implements access control at the row level using security predicates, column masks, or policy functions. Consider performance impact, maintenance overhead, and security implications.
A window function performs calculations across a set of table rows related to the current row. Unlike regular aggregate functions that group rows into a single output row, window functions retain the individual rows while adding computed values based on the specified window of rows.
The OVER clause defines the window or set of rows on which the window function operates. It can contain PARTITION BY to divide rows into groups, ORDER BY to sequence rows, and frame specifications to limit the rows within the partition.
ROW_NUMBER() assigns unique sequential numbers to rows, RANK() assigns the same rank to ties with gaps in sequence, and DENSE_RANK() assigns the same rank to ties without gaps. For example, ROW_NUMBER: 1,2,3,4; RANK: 1,2,2,4; DENSE_RANK: 1,2,2,3.
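A compact sketch, assuming a hypothetical `exam_scores(student, score)` table:
```sql
SELECT
    student,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,    -- 1,2,3,4 even for ties
    RANK()       OVER (ORDER BY score DESC) AS rnk,        -- 1,2,2,4 (gap after ties)
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk   -- 1,2,2,3 (no gap)
FROM exam_scores;
```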
Running totals can be calculated using SUM as a window function with an ORDER BY clause: SUM(value) OVER (ORDER BY date). This creates a cumulative sum where each row contains the total of all previous rows plus the current row.
PARTITION BY divides rows into groups for window function calculations while maintaining individual rows in the result set. GROUP BY collapses rows into single summary rows. PARTITION BY is used within window functions, while GROUP BY is used with aggregate functions.
LAG accesses data from previous rows and LEAD accesses data from subsequent rows in the result set. Both functions can specify an offset and a default value. Example: LAG(price, 1, 0) OVER (ORDER BY date) returns the previous row's price or 0 if none exists.
Window frames define the set of rows within a partition using ROWS or RANGE with frame boundaries like UNBOUNDED PRECEDING, CURRENT ROW, or N PRECEDING/FOLLOWING. They control which rows are included in window function calculations.
Moving averages are calculated using AVG with a window frame specification: AVG(value) OVER (ORDER BY date ROWS BETWEEN n PRECEDING AND CURRENT ROW). This computes the average of the current row and n previous rows.
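A sketch of a 7-day moving average, assuming a hypothetical `daily_sales(sale_date, amount)` table:
```sql
SELECT
    sale_date,
    amount,
    AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW   -- current day plus the 6 preceding days
    ) AS moving_avg_7d
FROM daily_sales;
```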
FIRST_VALUE returns the first value in a window frame, and LAST_VALUE returns the last value. They're useful for comparing current rows with initial or final values in a group, like finding the first or last price in a time period.
Percentiles can be calculated using PERCENTILE_CONT or PERCENTILE_DISC functions with window specifications. PERCENTILE_CONT provides continuous interpolated values, while PERCENTILE_DISC returns actual values from the dataset.
ROWS defines the frame based on physical row count, while RANGE defines it based on logical value ranges. ROWS uses exact row positions, while RANGE groups rows with the same ORDER BY values together.
Percent of total is calculated by dividing the current row's value by the sum over the entire partition: (value * 100.0) / SUM(value) OVER (PARTITION BY group). This shows each row's value as a percentage of its group total.
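For example, assuming a hypothetical `sales(department, amount)` table:
```sql
SELECT
    department,
    amount,
    amount * 100.0 / SUM(amount) OVER (PARTITION BY department) AS pct_of_department,
    amount * 100.0 / SUM(amount) OVER ()                        AS pct_of_grand_total
FROM sales;
```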
NTILE divides ordered rows into a specified number of roughly equal groups (buckets). For example, NTILE(4) OVER (ORDER BY value) assigns numbers 1-4 to rows, creating quartiles. It's useful for creating equal-sized groupings of ordered data.
NULL values in window functions can be handled using IGNORE NULLS option with LAG/LEAD/FIRST_VALUE/LAST_VALUE, or by using COALESCE/ISNULL functions. The treatment of NULLs affects frame boundaries and calculation results.
Window functions may require sorting operations and memory for frame processing. Performance can be improved by proper indexing on PARTITION BY and ORDER BY columns, limiting frame sizes, and considering materialized views for complex calculations.
Year-over-year growth can be calculated using LAG to get previous year's value and percentage calculation: (current_value - LAG(value, 1) OVER (ORDER BY year)) * 100.0 / LAG(value, 1) OVER (ORDER BY year).
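Put together, assuming a hypothetical `yearly_revenue(year, revenue)` table:
```sql
SELECT
    year,
    revenue,
    (revenue - LAG(revenue) OVER (ORDER BY year)) * 100.0
        / LAG(revenue) OVER (ORDER BY year) AS yoy_growth_pct   -- NULL for the first year
FROM yearly_revenue;
```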
When ties occur in ORDER BY, window functions handle them according to their specific behavior: ROW_NUMBER assigns distinct numbers to tied rows in a non-deterministic order, RANK and DENSE_RANK assign the same value to all tied rows, and RANGE-based frames treat tied rows as a single group while ROWS-based frames do not.
Exclusive frames (BETWEEN n PRECEDING AND 1 PRECEDING) exclude the current row, while inclusive frames (BETWEEN n PRECEDING AND CURRENT ROW) include it. This affects calculations like moving averages and running totals.
Multiple window functions can be used in the same query with different OVER clauses. You can also define named windows using WINDOW clause and reference them to avoid repetition and maintain consistency.
Median can be calculated using PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY group), or by combining ROW_NUMBER with aggregation to find the middle value in ordered sets.
CUME_DIST calculates cumulative distribution (relative position) of a value, while PERCENT_RANK calculates relative rank. Both return values between 0 and 1, useful for statistical analysis and percentile calculations.
Date/time windows can use RANGE with date intervals or ROWS with specific counts. Consider timezone handling, date arithmetic, and appropriate frame specifications for time-based analysis.
Gap analysis uses LAG/LEAD to compare consecutive values, identifying missing or irregular values in sequences. Common applications include finding missing sequence numbers or time gaps in event data.
Window functions cannot be nested directly, cannot be used in WHERE clauses, and may have performance implications on large datasets. They're also not available in all SQL databases or versions.
Window functions can be used before or after PIVOT operations to perform calculations across pivoted columns. This requires careful consideration of partitioning and ordering to maintain data relationships.
Running totals with resets use PARTITION BY to define reset boundaries and ORDER BY for sequence. SUM(value) OVER (PARTITION BY reset_column ORDER BY date) calculates totals that reset based on the partition column.
Anomaly detection uses window functions to calculate statistics (avg, stddev) over windows of data, then identifies values that deviate significantly from these statistics using comparison operations.
Window function results can be stored in materialized views for performance, but this requires careful consideration of refresh strategies and storage requirements. Not all databases support window functions in materialized views.
Rolling calculations use window frames with fixed sizes (e.g., ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) combined with aggregate functions. This enables calculations like moving averages, rolling sums, or sliding window analysis.
Window functions in stored procedures require careful string construction for dynamic SQL, proper parameter handling, and consideration of performance impact. Error handling and SQL injection prevention are crucial.
An index is a data structure that improves the speed of data retrieval operations by providing quick access to rows in a database table. It creates a pointer to data based on the values of specific columns, similar to a book's index, reducing the need for full table scans.
A clustered index determines the physical order of data in a table and can only exist once per table. Non-clustered indexes create a separate structure that points to the data and multiple can exist per table. Clustered indexes are typically faster for retrievals but slower for inserts.
Index selectivity is the ratio of unique values to total rows in an indexed column. High selectivity (many unique values) makes an index more effective as it better narrows down the result set. Low selectivity indexes might be ignored by the query optimizer.
Composite indexes include multiple columns in a specific order. They're useful for queries that filter or sort by multiple columns, following the leftmost principle. The order of columns should match common query patterns and consider column selectivity.
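A minimal sketch on a hypothetical `orders` table; the index serves filters on `customer_id` alone or on `customer_id` plus `order_date`, but not on `order_date` alone:
```sql
CREATE INDEX ix_orders_customer_date
    ON orders (customer_id, order_date);   -- leftmost column first, per common query patterns
```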
NULL values in indexed columns can affect performance by increasing index size and complexity. Some databases store NULL values in the index, while others don't. Understanding NULL handling is crucial for optimal index design and query performance.
Index fragmentation occurs when the logical order of index pages doesn't match their physical order, or when pages have empty space. It can degrade performance by causing extra I/O operations. Regular maintenance (rebuilding or reorganizing) helps maintain optimal performance.
Execution plans show how SQL Server processes a query, including index usage, join types, and estimated costs. Analyze plans to identify full table scans, inefficient joins, or missing indexes. Use this information to optimize queries through index creation or query restructuring.
Covering indexes include all columns needed by a query in the index itself, eliminating the need to access the table. They improve performance by reducing I/O but increase storage space and maintenance overhead. Use them for frequently run queries that access a limited set of columns.
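A sketch using the INCLUDE syntax of SQL Server (PostgreSQL 11+ offers the same keyword); the table and column names are hypothetical:
```sql
-- A query selecting order_date and total_amount filtered by customer_id
-- can be answered entirely from this index, with no table lookup.
CREATE INDEX ix_orders_customer_covering
    ON orders (customer_id)
    INCLUDE (order_date, total_amount);
```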
Index maintenance operations (rebuilding, reorganizing) can impact performance by consuming resources and blocking operations. Schedule maintenance during low-usage periods, consider online operations, and balance frequency against database performance needs.
Filtered indexes include only a subset of rows based on a predicate. They're smaller and more efficient for queries matching the filter condition. Use them when queries frequently access a specific subset of data or for implementing row-level security.
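A sketch of a filtered index (SQL Server terminology; PostgreSQL calls this a partial index), assuming a hypothetical `orders` table with a `status` column:
```sql
-- Only open orders are indexed, keeping the index small and targeted.
CREATE INDEX ix_orders_open
    ON orders (customer_id, order_date)
    WHERE status = 'open';
```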
Monitor blocking using dynamic management views, identify long-running transactions or lock escalation issues. Solutions include optimizing transaction duration, using appropriate isolation levels, implementing row versioning, or adjusting index design.
Parameter sniffing occurs when SQL Server reuses an execution plan optimized for specific parameter values. It can lead to poor performance when data distribution varies significantly. Solutions include using RECOMPILE hints or local variables.
Statistics provide the query optimizer with data distribution information to choose efficient execution plans. Outdated or missing statistics can lead to poor plan choices. Regular updates and appropriate sampling rates are crucial for optimal performance.
Index foreign key columns to improve JOIN performance and maintain referential integrity efficiently. Consider column order in composite indexes, include frequently queried columns, and evaluate the impact on write operations.
Strategies include partitioning, batch processing, minimizing logging, using appropriate isolation levels, and considering index impact. For maintenance operations, use minimal logging, tempdb optimization, and parallel execution when possible.
Index intersection occurs when the query optimizer uses multiple indexes to satisfy a query. While it can be efficient for some queries, too many index intersections might indicate the need for a better composite index.
Temporal table indexing requires consideration of both current and history tables. Index historical columns based on query patterns, consider filtered indexes for active records, and maintain appropriate statistics for both tables.
Bitmap indexes use bit arrays to track row locations for specific values. They're efficient for low-cardinality columns and complex AND/OR operations but perform poorly with frequent updates. Common in data warehousing scenarios.
Use covering indexes for common report queries, consider indexed views, implement partitioning for large tables, and evaluate materialized views. Balance real-time needs against data freshness requirements.
GUID clustered keys can cause page splits and fragmentation due to random value insertion. This impacts performance through increased I/O and maintenance overhead. Consider sequential GUIDs or alternative key designs for better performance.
Use appropriate indexing for parent-child relationships, consider materialized path or nested sets models, implement covering indexes for common traversal patterns, and evaluate graph database features for complex hierarchies.
Full-text indexes for text search, filtered indexes for non-NULL values, and careful evaluation of included columns. Consider partial indexing strategies and impact on maintenance operations.
Monitor deadlocks using trace flags or extended events, analyze deadlock graphs, optimize transaction patterns, adjust isolation levels, and ensure consistent access order for resources. Consider index design impact on lock types.
Index key compression reduces storage space by eliminating redundant key values. It's beneficial for indexes with many duplicate values or long key values, but increases CPU usage. Evaluate compression benefits against performance impact.
Use appropriate indexes for join conditions, consider batch processing, implement proper transaction handling, and evaluate MERGE statement alternatives. Monitor lock escalation and consider impact on existing indexes.
Align indexes with partition scheme, consider local vs. global indexes, implement filtered indexes for partition elimination, and maintain statistics at the partition level. Balance maintenance overhead against query performance needs.
Implement proper parameter handling, consider filtered indexes for common conditions, use dynamic SQL carefully, and evaluate index impact of different search patterns. Consider using OPTION (RECOMPILE) for highly variable queries.
Include date/time columns in appropriate index position, consider partitioning for historical data, implement sliding window maintenance, and evaluate impact of timezone handling on query performance.
Consider elastic resources, monitor DTU/vCore usage, implement appropriate scaling strategies, evaluate cost-based optimization, and understand cloud-specific indexing limitations. Balance performance against cloud resource costs.
The four ACID properties are: Atomicity (transactions are all-or-nothing), Consistency (transactions maintain database integrity), Isolation (concurrent transactions don't interfere with each other), and Durability (committed transactions are permanent).
Atomicity ensures that all operations in a transaction either complete successfully or roll back entirely. It's maintained through transaction logs and rollback mechanisms that undo partial changes if any part of the transaction fails.
The standard isolation levels are: READ UNCOMMITTED (lowest), READ COMMITTED, REPEATABLE READ, and SERIALIZABLE (highest). Each level provides different protection against read phenomena like dirty reads, non-repeatable reads, and phantom reads.
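A minimal sketch in SQL Server syntax (the exact placement of the SET statement varies by database); the `accounts` table is hypothetical:
```sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;
    -- Rows read here stay locked (or versioned) so a second read inside this
    -- transaction returns the same values.
    SELECT balance FROM accounts WHERE id = 1;
COMMIT TRANSACTION;
```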
A deadlock occurs when two or more transactions are waiting for each other to release locks. Prevention strategies include consistent access order, minimizing transaction duration, using appropriate isolation levels, and implementing deadlock detection.
Pessimistic concurrency control locks resources when accessed, preventing concurrent modifications. Optimistic concurrency allows multiple users to access data and checks for conflicts at commit time. Each approach has different performance and concurrency implications.
A dirty read occurs when a transaction reads data that hasn't been committed by another transaction. READ COMMITTED and higher isolation levels prevent dirty reads by ensuring transactions only read committed data.
SNAPSHOT isolation provides transaction-consistent views of data using row versioning. It allows readers to see a consistent snapshot of data as it existed at the start of the transaction, without blocking writers.
Durability ensures that committed transactions survive system failures. It's guaranteed through write-ahead logging (WAL), where transaction logs are written to stable storage before changes are considered complete.
A phantom read occurs when a transaction re-executes a query and sees new rows that match the search criteria. SERIALIZABLE isolation level prevents phantom reads by using range locks on the query predicates.
Savepoints mark a point within a transaction that can be rolled back to without affecting the entire transaction. They allow partial rollback of transactions while maintaining atomicity of the overall transaction.
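A sketch using SQL Server's SAVE TRANSACTION syntax (standard SQL uses SAVEPOINT name and ROLLBACK TO SAVEPOINT name); the tables are hypothetical:
```sql
BEGIN TRANSACTION;

    INSERT INTO orders (id, customer_id) VALUES (1, 100);

    SAVE TRANSACTION before_items;                     -- mark a savepoint

    INSERT INTO order_items (order_id, sku) VALUES (1, 'ABC-1');

    ROLLBACK TRANSACTION before_items;                 -- undo only the order_items insert

COMMIT TRANSACTION;                                    -- the orders insert is kept
```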
Lock escalation converts many fine-grained locks into fewer coarse-grained locks to reduce system overhead. While it conserves resources, it can reduce concurrency by holding broader locks than necessary.
Distributed transactions use two-phase commit protocol: prepare phase ensures all participants can commit, commit phase finalizes changes. Additional coordination and recovery mechanisms handle network failures and participant unavailability.
Long-running transactions can hold locks for extended periods, reducing concurrency, increasing deadlock probability, and consuming system resources. They can also impact transaction log space and recovery time.
Row versioning maintains multiple versions of data rows, allowing readers to see consistent data without blocking writers. It's used in SNAPSHOT isolation and READ COMMITTED SNAPSHOT, improving concurrency at the cost of additional storage.
SQL Server uses shared (S), exclusive (X), update (U), intent, and schema locks. Each type serves different purposes in controlling concurrent access to resources while maintaining transaction isolation.
Transaction timeouts can be handled using SET LOCK_TIMEOUT, implementing application-level timeouts, monitoring long-running transactions, and implementing retry logic with appropriate error handling.
A non-repeatable read occurs when a transaction reads the same row twice and gets different values due to concurrent updates. REPEATABLE READ and higher isolation levels prevent this by maintaining read locks until transaction completion.
Constraint violations trigger automatic rollback of the current transaction to maintain database consistency. Error handling should catch these exceptions and manage the rollback process appropriately.
Transaction logging records all database modifications in a sequential log file. It's crucial for maintaining ACID properties, enabling rollback operations, and recovering from system failures.
Implement retry logic by catching specific error conditions, using exponential backoff, setting appropriate timeout values, and ensuring idempotency. Consider deadlock victims and transient failures separately.
The transaction coordinator manages the two-phase commit protocol, ensures all participants either commit or roll back, handles recovery from failures, and maintains transaction state information.
SQL Server supports nested transactions through @@TRANCOUNT, but only the outermost COMMIT physically commits the work; inner COMMITs merely decrement the counter, and a ROLLBACK at any level rolls back the entire outer transaction unless a savepoint is used.
Higher isolation levels provide stronger consistency guarantees but can reduce concurrency and performance. Lower levels offer better concurrency but risk data anomalies. Choose based on application requirements.
Use system views like sys.dm_tran_locks, extended events, SQL Profiler, monitor transaction logs, analyze deadlock graphs, and track lock waits. Implement appropriate alerts and monitoring strategies.
Checkpointing writes dirty buffer pages to disk and records the operation in transaction logs. It reduces recovery time after system failure and manages log space by allowing log truncation.
Use appropriate batch sizes, implement checkpoint logic, consider isolation level impact, manage transaction log growth, and implement error handling with partial commit capability when appropriate.
Break into smaller transactions when possible, use appropriate isolation levels, implement progress monitoring, consider batch processing, and ensure proper error handling and recovery mechanisms.
Use appropriate isolation levels, implement optimistic concurrency when suitable, minimize transaction duration, use proper indexing strategies, and consider row versioning for read-heavy workloads.
Explicit transactions are manually controlled using BEGIN, COMMIT, and ROLLBACK statements. Implicit transactions automatically commit after each statement or are controlled by connection settings. Explicit transactions offer more control but require careful management.
The main types of constraints in SQL are: PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL, and DEFAULT constraints. Each type enforces different rules to maintain data integrity and relationships between tables.
A PRIMARY KEY constraint enforces uniqueness and doesn't allow NULL values, while a UNIQUE constraint allows NULLs (a single NULL in SQL Server, multiple NULLs in most other databases), and a table can have many UNIQUE constraints but only one PRIMARY KEY. In SQL Server, the PRIMARY KEY also creates a clustered index by default.
Referential actions include: CASCADE (propagate changes), SET NULL (set to NULL), SET DEFAULT (set to default value), and NO ACTION/RESTRICT (prevent changes). These actions determine how child records are handled when parent records are updated or deleted.
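A sketch with hypothetical tables showing two different referential actions:
```sql
-- Deleting a department also deletes its employees.
CREATE TABLE employees (
    id            INT PRIMARY KEY,
    department_id INT,
    CONSTRAINT fk_emp_department
        FOREIGN KEY (department_id) REFERENCES departments (id)
        ON DELETE CASCADE
);

-- Deleting a customer keeps the orders but clears the reference.
CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT,
    CONSTRAINT fk_order_customer
        FOREIGN KEY (customer_id) REFERENCES customers (id)
        ON DELETE SET NULL
);
```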
CHECK constraints enforce domain integrity by limiting the values that can be entered into a column based on a logical expression. They can validate data against specific rules, ranges, or patterns before allowing inserts or updates.
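For example, hypothetical CHECK constraints enforcing a range and a cross-column rule:
```sql
CREATE TABLE products (
    id       INT PRIMARY KEY,
    price    DECIMAL(10, 2) CHECK (price >= 0),                       -- column-level rule
    discount DECIMAL(10, 2),
    CONSTRAINT chk_discount_range CHECK (discount BETWEEN 0 AND 90),  -- named table-level rule
    CONSTRAINT chk_discount_lt_price CHECK (discount IS NULL OR discount < price)
);
```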
Complex business rules can be implemented using combinations of CHECK constraints, computed columns, and trigger-based validation. Consider performance impact, maintainability, and the balance between constraint and application-level validation.
Constraints impact INSERT, UPDATE, and DELETE performance due to validation overhead. Foreign keys can affect join performance, while CHECK constraints may impact DML operations. Proper indexing and constraint design are crucial for optimal performance.
Handle constraint violations through error catching, appropriate error messages, transaction management, and retry logic where appropriate. Consider using TRY-CATCH blocks and implementing specific handling for different constraint violation types.
DRI uses database constraints to enforce data integrity rules automatically. It's important because it ensures consistent enforcement of rules, reduces application code complexity, and maintains data quality at the database level.
Composite key constraints involve multiple columns and require careful consideration of column order, indexing strategy, and impact on related foreign keys. Consider performance implications and ensure all components are necessary.
Temporal constraints enforce rules based on date/time values, such as valid periods or sequential relationships. They can be implemented using CHECK constraints, triggers, or temporal tables with system-versioning.
Different constraints handle NULL values differently: PRIMARY KEY doesn't allow NULL, UNIQUE allows NULLs (one in SQL Server, multiple in most other databases), CHECK constraints pass when the condition evaluates to unknown and so need explicit NULL handling, and FOREIGN KEY allows NULL unless the column is also declared NOT NULL.
Constraints can affect bulk loading, index rebuilds, and partition maintenance. Consider disabling/re-enabling constraints for large operations, verify constraint integrity after maintenance, and plan for appropriate maintenance windows.
Cross-table constraints can be implemented using foreign keys, CHECK constraints with subqueries (where supported), or triggers. Consider performance impact and maintenance implications of different approaches.
Use consistent, descriptive naming conventions that identify constraint type, affected tables/columns, and purpose. Consider including prefixes for constraint types and ensure names are unique within the database.
Implement soft delete using filtered foreign keys, check constraints on active/inactive status, or triggers. Consider impact on queries, indexes, and maintenance operations when choosing an approach.
Consider partition alignment, constraint checking overhead, and maintenance operations. Ensure constraints work effectively across partitions and understand impact on partition switching operations.
Dynamic defaults can be implemented using computed columns, triggers, or application logic. Consider performance impact, maintainability, and whether logic belongs at database or application level.
In data warehouses, constraints help ensure data quality, maintain relationships between fact and dimension tables, and support slowly changing dimensions. Balance constraint enforcement with load performance requirements.
Hierarchical constraints can use self-referencing foreign keys, CHECK constraints for level limitations, or specialized solutions like closure tables. Consider query performance and maintenance complexity.
Test constraints with boundary values, NULL cases, and complex scenarios. Verify constraint behavior during concurrent operations, test referential actions, and ensure proper error handling.
Plan constraint modifications carefully, use appropriate transaction isolation, consider impact on existing data, and implement proper validation before and after changes. Maintain backup constraints where necessary.
Consider constraint checking on both primary and secondary servers, impact on replication performance, and handling of constraint violations during replication. Ensure consistent constraint definition across servers.
Use row-level security, filtered indexes, or tenant-specific schemas. Implement appropriate constraints for tenant isolation and consider performance impact of different approaches.
Create clear, actionable error messages that help identify the specific violation. Consider using custom error messages with CHECK constraints and appropriate error handling in applications.
Conditional constraints can be implemented using CHECK constraints with CASE expressions, filtered indexes, or triggers for more complex conditions. Consider performance and maintainability trade-offs.
Consider lock contention, deadlock potential, and validation performance. Choose appropriate constraint types and implement proper indexing to support constraint checking efficiently.
Use CHECK constraints for basic validation, triggers for complex rules, and consider using computed columns for derived values. Implement appropriate error handling and validation reporting.
Document constraint purposes, business rules implemented, maintenance procedures, and testing requirements. Include information about dependencies, performance implications, and modification procedures.
Plan constraint deployment carefully, script modifications idempotently, verify constraint state after migration, and consider impact on existing data. Include rollback procedures in migration plans.
Stored procedures can perform actions and return multiple result sets but don't necessarily return values, while functions must return a value/table and can be used in SELECT statements. Functions are more limited in what they can do (e.g., can't modify data in most cases) but are more flexible in queries.
SQL supports scalar functions (return a single value), table-valued functions (return a result set), and aggregate functions (operate on sets of values). User-defined functions are typically scalar or table-valued, though some databases also allow user-defined aggregates, while built-in functions come in all three types.
Error handling in stored procedures uses TRY-CATCH blocks, ERROR_NUMBER(), ERROR_MESSAGE(), and RAISERROR/THROW statements. Implement appropriate error logging, transaction management, and status returns to calling applications.
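A sketch of the pattern in T-SQL, with hypothetical table and procedure names:
```sql
CREATE PROCEDURE TransferFunds
    @from_id INT, @to_id INT, @amount DECIMAL(10, 2)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        BEGIN TRANSACTION;
            UPDATE accounts SET balance = balance - @amount WHERE id = @from_id;
            UPDATE accounts SET balance = balance + @amount WHERE id = @to_id;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW;   -- re-raise the original error to the caller
    END CATCH;
END;
```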
Benefits include: better security through encapsulation and permissions, reduced network traffic, code reuse, easier maintenance, cached execution plans, and the ability to implement complex business logic at the database level.
Optimize by using appropriate indexes, avoiding parameter sniffing issues, implementing proper error handling, using SET NOCOUNT ON, minimizing network roundtrips, and considering query plan reuse. Monitor and analyze execution plans for potential improvements.
Parameter sniffing occurs when SQL Server reuses a cached plan optimized for specific parameter values. Handle it using OPTION (RECOMPILE), local variables, or dynamic SQL in specific cases. Consider data distribution when choosing a solution.
Implement dynamic SQL using sp_executesql with parameterization to prevent SQL injection. Properly escape identifiers, validate inputs, and consider performance implications. Avoid string concatenation with user inputs.
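A minimal sp_executesql sketch; the search value travels as a parameter rather than being concatenated into the SQL string:
```sql
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT id, name FROM employees WHERE department = @dept';

EXEC sp_executesql
    @sql,
    N'@dept NVARCHAR(50)',       -- parameter definition
    @dept = N'Engineering';      -- value bound safely at execution time
```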
Use appropriate data types, provide default values when sensible, validate inputs, use meaningful parameter names, document parameters clearly, and consider NULL handling. Implement proper parameter validation logic.
Implement explicit transactions with proper error handling, consider nested transaction levels, use appropriate isolation levels, and handle deadlock scenarios. Ensure proper cleanup in error cases.
Inline table-valued functions return table results based on a single SELECT statement. They often perform better than multi-statement functions because they can be treated like views and participate in query optimization.
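A sketch of an inline table-valued function in T-SQL, reusing the `employees` table from earlier:
```sql
CREATE FUNCTION dbo.GetEmployeesByDepartment (@department VARCHAR(50))
RETURNS TABLE
AS
RETURN
(
    SELECT id, name, age
    FROM employees
    WHERE department = @department
);

-- Usage: SELECT * FROM dbo.GetEmployeesByDepartment('Sales');
```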
Handle large results using pagination, batch processing, table-valued parameters, temporary tables, or table variables. Consider memory usage, network bandwidth, and client application capabilities.
Consider EXECUTE permissions, ownership chaining, module signing, dynamic SQL security, and principle of least privilege. Implement proper input validation and avoid SQL injection vulnerabilities.
Implement logging using dedicated log tables, error handling blocks, and appropriate detail levels. Consider performance impact, retention policies, and monitoring requirements.
sp_executesql supports parameterization and better plan reuse, while EXEC is simpler but more vulnerable to SQL injection. sp_executesql is preferred for dynamic SQL due to security and performance benefits.
Handle concurrency using appropriate isolation levels, locking hints, transaction management, and deadlock prevention strategies. Consider implementing retry logic for deadlock victims.
Keep functions deterministic when possible, avoid excessive complexity, consider performance impact in queries, use appropriate return types, and document behavior clearly. Avoid side effects in functions.
Use schema versioning, naming conventions, source control, and proper documentation. Consider backward compatibility, deployment strategies, and rollback procedures.
CLR stored procedures are implemented in .NET languages and useful for complex calculations, string operations, or external resource access. Consider security implications and performance overhead compared to T-SQL.
Handle NULLs using ISNULL/COALESCE, appropriate function logic, and clear documentation of NULL behavior. Consider impact on query optimization and result accuracy.
Consider transaction handling, error propagation, parameter passing, and performance impact. Manage transaction scope and error handling appropriately across nested calls.
Implement paging using OFFSET-FETCH, ROW_NUMBER(), or other ranking functions. Consider performance with large datasets, sort stability, and total count requirements.
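A sketch of offset-based paging with OFFSET-FETCH (SQL Server 2012+, PostgreSQL, Oracle 12c+), fetching page 3 at 20 rows per page:
```sql
SELECT id, name
FROM employees
ORDER BY name, id            -- include a unique column so the sort (and paging) is stable
OFFSET 40 ROWS
FETCH NEXT 20 ROWS ONLY;
```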
Consider scope, reuse, indexing strategy, and cleanup of temporary tables. Balance between table variables and temporary tables based on size and complexity.
Implement progress reporting, batch processing, appropriate transaction handling, and monitoring capabilities. Consider timeout handling and cancellation support.
Scalar functions operate on a single value and return a single value, while aggregate functions operate on sets of values and return a single summary value. Scalar functions can be used in SELECT lists and WHERE clauses.
Implement retry logic using WHILE loops, error handling, appropriate wait times, and maximum retry counts. Consider transient error conditions and implement appropriate backoff strategies.
Consider execution order, deterministic operations, identity column handling, and timestamp handling. Ensure procedures work consistently across primary and secondary servers.
Implement appropriate encryption, use secure parameter passing, avoid logging sensitive data, and consider data masking requirements. Follow security best practices for handling confidential information.
Document purpose, parameters, return values, error conditions, dependencies, and usage examples. Include version history, performance considerations, and any special handling requirements.
Design procedures to produce the same result regardless of multiple executions. Use appropriate checks, handle existing data, and implement proper transaction management for consistency.
A view is a virtual table based on a SELECT query. Benefits include data abstraction, security through column/row filtering, query simplification, and data consistency. Views can hide complexity and provide a secure interface to underlying tables.
A regular view is a stored query that executes each time it's referenced, while a materialized view stores the result set physically. Materialized views offer better performance for complex queries but require storage and maintenance for data freshness.
An indexed view physically stores its result set with a unique clustered index. It's useful for queries with expensive computations or aggregations that are frequently accessed but rarely updated. Consider maintenance overhead and storage requirements.
Temporary tables (#temp) are stored in tempdb, maintain statistics, and support indexes, while table variables (@table) also live in tempdb but have no statistics and limited indexing options. Temp tables persist until dropped or the session ends, while table variables are scoped to the batch, function, or procedure that declares them.
Optimize views by avoiding SELECT *, using appropriate indexes, limiting subquery usage, considering indexed views for frequent queries, and ensuring base table optimization. Consider the impact of view nesting and complexity on query performance.
Views are updateable if they reference only one base table, don't include GROUP BY, DISTINCT, or aggregates, and don't use complex joins. Updates must map to single base table rows and respect all constraints.
Local temp tables (#table) are visible only to the creating session, while global temp tables (##table) are visible to all sessions. Use global temp tables for cross-session data sharing, but consider concurrency and cleanup implications.
Implement security using GRANT/DENY permissions, row-level security, column filtering, and schema binding when needed. Views can provide controlled access to sensitive data while hiding underlying table structures.
WITH SCHEMABINDING prevents changes to referenced objects that would affect the view's definition. It's required for indexed views and helps maintain data integrity by preventing unauthorized schema changes.
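A sketch of an indexed view in SQL Server that depends on SCHEMABINDING; names are hypothetical, and COUNT_BIG(*) is required when GROUP BY is present:
```sql
CREATE VIEW dbo.DepartmentHeadcount
WITH SCHEMABINDING
AS
SELECT department, COUNT_BIG(*) AS employee_count
FROM dbo.employees               -- two-part name required by SCHEMABINDING
GROUP BY department;
GO

-- Materializes the view and makes it an indexed view.
CREATE UNIQUE CLUSTERED INDEX ix_department_headcount
    ON dbo.DepartmentHeadcount (department);
```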
Maintain consistency through refresh strategies (complete or incremental), appropriate refresh timing, and tracking of base table changes. Consider performance impact and business requirements for data freshness.
Implement explicit cleanup in stored procedures, use appropriate scope management, consider session handling, and implement error handling for cleanup. Monitor tempdb usage and implement regular maintenance procedures.
Minimize view nesting to avoid performance issues, consider materialization for complex views, analyze execution plans, and maintain clear documentation. Balance abstraction benefits against performance impact.
Partitioned views combine data from multiple tables using UNION ALL. Consider partition elimination, constraint requirements, and performance impact. Ensure proper indexing and maintenance strategies.
Implement RLS using filtered views, inline table-valued functions, or security policies. Consider performance impact, maintenance requirements, and security boundary effectiveness.
View resolution affects query optimization, with nested views potentially causing performance issues. Consider materialization, indexing strategies, and query plan analysis for optimal performance.
Implement dynamic filtering using parameterized views, inline table-valued functions, or CROSS APPLY. Consider performance impact and maintenance requirements of different approaches.
Use consistent naming conventions, document purpose and dependencies, maintain version history, and include performance considerations. Clear documentation helps maintain and troubleshoot views effectively.
Manage concurrency using appropriate isolation levels, proper transaction handling, and consideration of scope. Implement proper error handling and deadlock mitigation strategies.
Consider publication requirements, filter complexity, maintenance overhead, and performance impact. Ensure views work consistently across replicated environments.
Implement changes using proper version control, testing procedures, and impact analysis. Consider dependent objects, security implications, and backward compatibility.
CTEs provide better readability, are scope-limited to a single statement, and don't require cleanup. Temporary tables offer persistence, reuse, and index support. Choose based on use case requirements.
Consider proper indexing, statistics maintenance, batch processing, and memory management. Monitor tempdb performance and implement appropriate cleanup strategies.
Consider query patterns, update frequency, storage requirements, and maintenance overhead. Ensure proper statistics maintenance and monitor performance impact.
Use views to provide transparent access to archived data, implement partitioned views for historical data, and consider performance implications of cross-archive queries.
Implement appropriate error handling for view operations, consider impact of base table errors, and provide meaningful error messages. Handle NULL values and edge cases appropriately.
Configure proper tempdb files and sizes, monitor usage patterns, implement appropriate cleanup, and consider file placement and IO patterns.
Consider performance impact, maintenance windows, dependency management, and error handling. Implement appropriate logging and monitoring for ETL operations.
Manage schema changes through proper version control, impact analysis, and testing procedures. Consider dependent objects and implement appropriate update strategies.
Implement comprehensive testing including performance, security, data accuracy, and edge cases. Consider impact of data volume and maintain test cases for regression testing.
Understand concepts like SELECT, JOIN, GROUP BY, and subqueries.
Work on creating and optimizing database schemas and queries.
Dive into indexing, partitioning, and transaction management.
Expect hands-on challenges to design, query, and optimize databases.
Join thousands of successful candidates preparing with Stark.ai. Start practicing SQL questions, mock interviews, and more to secure your dream role.
Start Preparing now