https://habrahabr.ru/post/321504/
https://blog.jooq.org/2017/04/20/how-to-calculate-multiple-aggregate-functions-in-a-single-query/
https://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column
http://mysql.rjweb.org/doc.php/groupwise_max
SELECT P.parent_name, GROUP_CONCAT(C.child_name)FROM Parent P INNER JOIN Child C ON P.id = C.parent_id GROUP BY P.parent_name
https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL
https://habrahabr.ru/post/323498/ Universal catalog
https://habrahabr.ru/post/324012/ same as above using Graph DB
https://www.youtube.com/watch?v=mgipNdAgQ3o 10 SQL tricks
https://periscopedata.com/blog//how-joins-work.html
nested loop
hash join: build a temporary hash table to index the values of the field whose equality is being tested. This saves time at the cost of memory. The outer table applies a hash function over the comparison fields and navigates to the matching stored values created in the earlier step, reducing the number of records it must traverse
merge join: first, both tables of the join are sorted on the join attribute. This can be done with just two passes through each table via an external merge sort. Finally, the result tuples are generated as the next ordered element is pulled from each table and the join attributes are compared.
http://www.infoworld.com/resources/52753/sql/sql-unleashed-17-tips-for-faster-sql-queries#tk.ifw-infsb
https://habrahabr.ru/post/320916/
http://www.slideshare.net/LukasEder1/10-sql-tricks-that-you-didnt-think-were-possible
https://news.ycombinator.com/item?id=13417326
http://db.cs.berkeley.edu/papers/vldb09-madskills.pdf
https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL#Common_table_expression
http://stackoverflow.com/questions/39004165/sql-find-intervals-of-step-function
http://stackoverflow.com/questions/39859923/sql-how-to-find-all-local-min-in-column
https://modern-sql.com/feature/with/performance
https://technet.microsoft.com/en-us/library/bb522495(v=sql.105).aspx
Window functions
https://blog.statsbot.co/sql-window-functions-tutorial-b5075b87d129
http://www.silota.com/docs/recipes/sql-histogram-summary-frequency-distribution.html
https://drill.apache.org/docs/sql-window-functions-introduction/
https://www.postgresql.org/docs/current/static/functions-window.html
https://blog.jooq.org/2013/11/03/probably-the-coolest-sql-feature-window-functions/
http://db.cs.berkeley.edu/papers/vldb09-madskills.pdf
http://tapoueh.org/blog/2013/08/20-Window-Functions
https://blog.jooq.org/2014/08/12/the-difference-between-row_number-rank-and-dense_rank/
http://www.silota.com/docs/recipes/sql-linear-regression.html
Oracle online
https://livesql.oracle.com/apex/livesql/file/index.html
SELECT column, another_column FROM mytable UNION / UNION ALL / INTERSECT / EXCEPT SELECT other_column, yet_another_column FROM another_table ORDER BY column DESC LIMIT n;
union, intersect, except removes duplicates
https://blog.jooq.org/2016/05/27/correlated-subqueries-are-evil-and-slow-or-are-they/
https://news.ycombinator.com/item?id=11755116
http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
http://www.sqlhammer.com/the-b-in-b-tree-indexing-sql-server/
PIVOT
https://modern-sql.com/use-case/pivot
https://news.ycombinator.com/item?id=11636728
http://neovintage.org/2016/05/04/the-bool-aggregate-functions/
http://cadlg.com/2016/04/understanding-analytic-functions/
select targetStartAt, EXTRACT(YEAR from targetStartAt) from Test;
https://blog.treasuredata.com/blog/2016/07/22/rolling-retention-done-right-in-sql/
https://danmartensen.svbtle.com/sql-performance-of-join-and-where-exists
http://rickbergfalk.github.io/sqlpad/
http://schemaspy.sourceforge.net schema browser
https://modern-sql.com/feature/with
https://www.periscopedata.com/blog/sql-summary-statistics.html
http://danmartensen.svbtle.com/sql-performance-of-join-and-where-exists
https://github.com/hudak/sql-parser
http://incubator.apache.org/projects/calcite.html
https://github.com/z16385947/sql-parser
https://en.wikipedia.org/wiki/Correlated_subquery
UPDATE FROM SELECT
UPDATE student_marks AS t1
INNER JOIN student_profile AS t2
ON t1.student_id = t2.student_id
SET t1.student_name = t2.student_name
WHERE t1.student_id <= 100;
In the SQL99 update of the SQL standard, the ANSI committee added a collection of grouping and reporting clauses to SQL to support the technology known as OLAP for "OnLine Analytical Processing". OLAP technologies, still used today, are ways to summarize and explore large data sets by grouping them in a multi-dimensional space. The three features added to SQL, ROLLUP, CUBE, and GROUPING SETS, provide different ways of summarizing data sets on several axes at once.
ROLLUP is the easiest to understand, since it simply provides subtotals and master totals for the whole set provided. For example, this query would provide counts for each city, then subtotals for each country, and finally a grand total of subscribers.
SELECT country, city, count(*) FROM lwn_subscribers GROUP BY ROLLUP ( country, city ); country | city | count ----------+---------------+------- Australia | Brisbane | 561 Australia | Melbourne | 302 Australia | Perth | 219 Australia | | 1082 USA | New York | 591 USA | Portland | 617 USA | San Francisco | 610 USA | | 1818 | | 2900
CUBE is less immediately understandable, because it's mainly intended to feed multi-dimensional data into an external OLAP tool like Mondrian. It expresses all possible totals for all combinations of specified columns, otherwise known as a "combinatorial explosion". For example, this query would provide totals for each country and each subscription level, and each combination of country and level.
SELECT country, level, count(*) FROM lwn_subscribers GROUP BY CUBE ( country, level ); country | level | count ----------+--------------+------- Australia | leader | 140 Australia | professional | 490 Australia | starving | 394 Australia | supporter | 58 Australia | | 1082 USA | leader | 301 USA | professional | 765 USA | starving | 602 USA | supporter | 150 USA | | 1818 | | 2900 | leader | 441 | professional | 1255 | starving | 996 | supporter | 208
Finally, GROUPING SETS are sort of the "parent class" of both ROLLUP and CUBE. It allows you to select subtotal and total groups and combinations manually to display, in case you need more specific output than ROLLUP orCUBE provide.
For example, say I wanted totals for level and city, then level, and then a grand total, I would write:
SELECT city, level, count(*) FROM lwn_subscribers GROUP BY GROUPING SETS ((city, level),(level),()); city | level | count ---------------+--------------+------- Brisbane | leader | 94 Melbourne | leader | 44 New York | leader | 105 Perth | leader | 2 Portland | leader | 94 San Francisco | leader | 102 | leader | 441 Brisbane | professional | 236 Melbourne | professional | 121 New York | professional | 250 ...
The benefit for users in these query constructs is the ability to get totals, subtotals, and combinations in one query and one scan over the table, which in the past would have required multiple queries against the database.
Transaction isolation level
http://en.wikipedia.org/wiki/Isolation_(database_systems)
http://habrahabr.ru/company/infopulse/blog/261097/
http://habrahabr.ru/company/infopulse/blog/261101/
https://habrahabr.ru/post/317884/
LLVM + SQL
http://blog.cloudera.com/blog/2013/02/inside-cloudera-impala-runtime-code-generation/
https://news.ycombinator.com/item?id=8470754
http://devblogs.nvidia.com/parallelforall/mapd-massive-throughput-database-queries-llvm-gpus/
EXISTS versus IN
http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
Joins
http://sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/
http://blogs.msdn.com/b/craigfr/archive/2006/08/16/702828.aspx
SEMI-JOIN between two tables returns rows from the first table where one or more matches are found in the second table. The difference between a semi-join and a conventional join is that rows in the first table will be returned at most once. Even if the second table contains two matches for a row in the first table, only one copy of the row will be returned. Semi-joins are written using the EXISTS or IN constructs.
http://www.dbspecialists.com/files/presentations/semijoins.html
http://kb.askmonty.org/en/semi-join-subquery-optimizations
In Oracle 9i a semi-join can be performed using the nested loops, hash join, or merge join algorithms.
EXCEPT and INTERSECT for semi - joins:
http://msdn.microsoft.com/en-us/library/ms191255.aspx
ANTI-JOINS are written using the NOT EXISTS or NOT IN constructs. These two constructs differ in how they handle nulls
If the subquery of the NOT IN clause retrieves a row with a null value, the entire NOT IN clause always evaluate to false; hence ho make NOT EXISTS and NOT IN works in the same way the NULL should be explicitly eliminated in NOT IN subquery:
SELECT * FROM A WHERE x NOT IN ( SELECT d .. WHERE d IS NOT NULL)
list of departments with no employees:
http://awads.net/wp/2007/05/01/back-to-basics-anti-joins-and-semi-joins/
Analytic Functions
http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm
WITH clause
http://docs.oracle.com/cd/E11882_01/server.112/e25555/tdpdw_sql.htm#TDPDW0073
Advanced SQL
http://ebobby.org/2012/11/11/Using-Sets-For-Many-To-Many-Relationships.html
http://www.sqlexamples.info/sqladvanced.htm
http://www.optimaldba.com/papers/UKOUGSQLMasterClass.pdf
CUBE ROLLUP
the ROLLUP extension produces group subtotals from right to left and a grand total. If "n" is the number of columns listed in the ROLLUP, there will be n+1 levels of subtotals.
ROLLUP (YEAR, MONTH, DAY)
With a ROLLUP, it will have the following outputs:
YEAR, MONTH, DAY
YEAR, MONTH
YEAR
()
With CUBE, it will have the following:
YEAR, MONTH, DAY
YEAR, MONTH
YEAR, DAY
YEAR
MONTH, DAY
MONTH
DAY
()
CUBE essentially contains every possible rollup scenario for each node whereas ROLLUP will keep the hierarchy in tact (so it won't skip MONTH and show YEAR/DAY, whereas CUBE will)
WINDOW FUNCTION OVER PARTITION BY
a window function is a subquery over the results of the main query, in which you define the way to subset the returned columns over which to create a "window," and combine that with a set of operations to perform for each window. For example, imagine you had a query that returned a list of people, and for each person there was a row that had an address and the date they moved there. If you only wanted the three most recent addresses for each person, you could do something like this:
with ranked_addresses as ( select p.id, p.name, a.address, a.move_date, row_number() over ( partition by p.id order by a.move_date desc ) as address_rank from person p join address a on p.id = a.person_id ) select id, name, address, move_date from ranked_addresses where address_rank >= 3 order by address_rank;
You can see that we are creating an additional returned column that contains an autoincrementing integer, and each person has their own window or partition within which that column resets to 1. However, there are a ton of other operations you can perform besides just a simple row number. You can do rank and dense_rank, any aggregate operation (max, sum, avg, count, etc.), as well as things like comparing to the next or previous row within the window using lead or lag.
https://buildingvts.com/understanding-postgres-window-functions-697bc0ff2ed4#.gd91sagvo
http://msdn.microsoft.com/en-us/library/ms189461.aspx
http://srinathsview.blogspot.com/2015/02/why-we-need-sql-like-query-language-for.html
http://www.sql-workbench.net/dbms_comparison.html
http://www.reddit.com/r/programming/comments/2wfvjd/our_sql_interview_questions/
http://rob.conery.io/2015/02/24/embracing-sql-in-postgres/
http://truongtx.me/2014/02/28/tree-structure-query-with-postgresql/
http://engineering.tilt.com/7-postgresql-data-migration-hacks/
https://news.ycombinator.com/item?id=9111362
https://news.ycombinator.com/item?id=9111866
If you want to check if two columns are different, you can use is distinct from:
select * from userswhere has_address is distinct from has_photo-- result: users with an address or -- photo, but not both
One handy way to filter rows is with a subselect. For example, if you wanted the users who did not have any packages, you could write a query like this:
select * from users where id not in (select user_id from packages)
But if one of the rows in packages has a null user_id, this query will return no results!
If the condition is inverted, the query works fine. This time we'll look for users with packages.
select * from users where id in (select user_id from packages)
http://blog.jooq.org/2015/02/13/10-sql-articles-everyone-must-read/
http://use-the-index-luke.com/blog/2015-02/modern-sql
http://www.slideshare.net/MarkusWinand/modern-sql
Time Series
http://akumuli.org/akumuli/2015/01/16/motivation/
CTE WITH
http://www.iron.io/blog/2015/09/legible-efficient-sql-with-ctes.html
http://research.neustar.biz/2014/05/19/writing-analytics-sql-with-common-table-expressions/
https://www.sqlite.org/lang_with.html
Removing duplicates
http://desple.com/post/108909223762/recursion-with-sql
https://news.ycombinator.com/item?id=8934456
http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back
http://en.wikipedia.org/wiki/Nested_set_model
http://habrahabr.ru/post/263629/
http://mikhailstadnik.com/hierarchical-data-structures-and-performance
http://habrahabr.ru/post/193166/
http://habrahabr.ru/post/153861/
http://habrahabr.ru/post/166699/
http://www.slideshare.net/billkarwin/models-for-hierarchical-data
http://leopard.in.ua/2013/07/11/storing-trees-in-rdbms/
http://www.rittmanmead.com/2007/06/obiee-data-modeling-tips-3-ragged-hierarchies/
http://edwblog.com/architecture/ragged-hierarchy-navigation/112
http://ask.sqlservercentral.com/questions/32055/ragged-hierarchy-dimension-definition.html
http://scn.sap.com/thread/1882261
http://www.zaachi.com/en/items/modified-preorder-tree-traversal-algoritmus-1.html
http://www.sitepoint.com/hierarchical-data-database-2/
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
http://troels.arvin.dk/db/rdbms/links/#hierarchical
http://techportal.ibuildings.com/2009/09/07/graphs-in-the-database-sql-meets-social-networks/
http://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html
http://www.slideshare.net/quipo/trees-in-the-database-advanced-data-structures
http://rsdn.ru/article/db/db_nav1.xml http://rsdn.ru/article/db/Hierarchy.xml
http://www.reddit.com/r/programming/comments/e30ab/the_simplest_way_to_do_treebased_queries_in_sql/
http://en.wikipedia.org/wiki/Hierarchical_query
http://www.gfilter.net/?Post=Trees-in-relational-databases
http://msdn.microsoft.com/en-us/library/ms186243.aspx
http://wiki.postgresql.org/wiki/CTEReadme
http://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html
http://habrahabr.ru/blogs/development/47280/
https://communities.bmc.com/communities/docs/DOC-9902
http://blog.webagesolutions.com/archives/category/microsoft
Celko. Trees and Hierarchies in SQL for Smarties
Finding Median in SQL
http://datamonkey.pro/blog/median_in_sql/
SELECT x.price
FROM positions x, positions y
GROUP BY x.price
HAVING SUM(SIGN(1-SIGN(y.price-x.price))) = (COUNT(*)+1)/2;
Question 1: SQL Query to find second highest salary of Employee
There are many ways to find second highest salary of Employee in SQL, you can either use SQL Join or Subquery to solve this problem. Here is SQL query using Subquery :
select MAX(Salary) from Employee WHERE Salary NOT IN (select MAX(Salary) from Employee );
See How to find second highest salary in SQL for more ways to solve this problem.
Question 2: SQL Query to find Max Salary from each department.
You can find maximum salary for each department by grouping all records by DeptId and then using MAX() function to calculate maximum salary in each group or each department.
SELECT DeptID, MAX(Salary) FROM Employee GROUP BY DeptID.
This questions become more interesting if Interviewer will ask you to print department name instead of department id, in that case you need to join Employee table with Department using foreign key DeptID, make sure you do LEFT OUTER JOIN to include departments without any employee as well. Here is the query
SELECT DeptName, MAX(Salary) FROM Employee e LEFT JOIN Department d ON e.DeptId = d.DeptID;
Question 3: Write SQL Query to display current date.
SQL has built in function called GetDate() which returns current timestamp. This will work in Microsoft SQL Server, other vendors like Oracle and MySQL also has equivalent functions.
SELECT GetDate();
Question 4: Write an SQL Query to check whether date passed to Query is date of given format or not.
SQL has IsDate() function which is used to check passed value is date or not of specified format ,it returns 1(true) or 0(false) accordingly. Remember ISDATE() is a MSSQL function and it may not work on Oracle, MySQL or any other database but there would be something similar.
SELECT ISDATE('1/08/13') AS "MM/DD/YY";
It will return 0 because passed date is not in correct format.
Question 5: Write a SQL Query to print the name of distinct employee whose DOB is between 01/01/1960 to 31/12/1975.
This SQL query is tricky but you can use BETWEEN clause to get all records whose date fall between two dates.
SELECT DISTINCT EmpName FROM Employees WHERE DOB BETWEEN ‘01/01/1960’ AND ‘31/12/1975’;
Question 6: Write an SQL Query find number of employees according to gender whose DOB is between 01/01/1960 to 31/12/1975.
SELECT COUNT(*), sex from Employees WHERE DOB BETWEEN '01/01/1960' AND '31/12/1975' GROUP BY sex;
Question 7: Write an SQL Query to find employee whose Salary is equal or greater than 10000.
SELECT EmpName FROM Employees WHERE Salary>=10000;
Question 8: Write an SQL Query to find name of employee whose name Start with ‘M’
SELECT * FROM Employees WHERE EmpName like 'M%';
Question 9: find all Employee records containing the word "Joe", regardless of whether it was stored as JOE, Joe, or joe.
SELECT * from Employees WHERE UPPER(EmpName) like '%JOE%';
Question 10: Write a SQL Query to find year from date.
Here is how you can find Year from a Date in SQL Server 2008
SELECT YEAR(GETDATE()) as "Year";
Question 11: Write SQL Query to find duplicate rows in a database and to delete them
You can use following query to select distinct records :
SELECT * FROM emp a WHERE rowid = (SELECT MAX(rowid) FROM EMP b WHERE a.empno=b.empno)
to Delete:
DELETE FROM emp a WHERE rowid != (SELECT MAX(rowid) FROM emp b WHERE a.empno=b.empno);
Question 12 : There is a table which contains two column Student and Marks, you need to find all the students, whose marks are greater than average marks i.e. list of above average students.
This query can be written using sub query as shown below :
SELECT student, marks from table where marks > SELECT AVG(marks) from table)
Question 13 : How do you find all employees which are also manager? .
You have given an standard employee table with an additional column mgr_id, which contains employee id of manager.
You need to know about self join to solve this problem. In Self Join, you can join two instances of same table to find out additional details as shown below
SELECT e.name, m.name FROM Employee e, Employee m WHERE e.mgr_id = m.emp_id;
this will show employee name and manger name in two column e.g.
name manager_name
John David
One follow-up is to modify this query to include employees which doesn't have manager. To solve that, instead of using inner join, just use left outer join, this will also include employees without managers.
Question 14 : You have a composite index of three columns, and you only provide value of two columns in WHERE clause of a select query? Will Index be used for this operation? For example if Index is on EmpId, EmpFirstName and EmpSecondName and you write query like
SELECT * FROM Employee WHERE EmpId=2 and EmpFirstName='Radhe'
If the given two columns are secondary index column then index will not invoke, but if the given 2 columns contain primary index(first col while creating index) then index will invoke. In this case Index will be used because EmpId and EmpFirstName are primary columns.