sql

SELECT P.parent_name, GROUP_CONCAT(C.child_name)FROM Parent P INNER JOIN Child C ON P.id = C.parent_id GROUP BY P.parent_name

https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL

https://habrahabr.ru/post/323498/ Universal catalog

https://habrahabr.ru/post/324012/ same as above using Graph DB

https://www.youtube.com/watch?v=mgipNdAgQ3o 10 SQL tricks

https://periscopedata.com/blog//how-joins-work.html

http://hyperpolyglot.org/db

nested loop

hash join: build a temporary hash table to index the values of the field whose equality is being tested. This saves time at the cost of memory. The outer table applies a hash function over the comparison fields and navigates to the matching stored values created in the earlier step, reducing the number of records it must traverse

merge join: first, both tables of the join are sorted on the join attribute. This can be done with just two passes through each table via an external merge sort. Finally, the result tuples are generated as the next ordered element is pulled from each table and the join attributes are compared.

https://www.analyticsvidhya.com/blog/2017/01/46-questions-on-sql-to-test-a-data-science-professional-skilltest-solution/

http://www.infoworld.com/resources/52753/sql/sql-unleashed-17-tips-for-faster-sql-queries#tk.ifw-infsb

https://habrahabr.ru/post/320916/

http://www.slideshare.net/LukasEder1/10-sql-tricks-that-you-didnt-think-were-possible

https://news.ycombinator.com/item?id=13417326

http://db.cs.berkeley.edu/papers/vldb09-madskills.pdf

https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL#Common_table_expression

http://stackoverflow.com/questions/39004165/sql-find-intervals-of-step-function

http://stackoverflow.com/questions/39859923/sql-how-to-find-all-local-min-in-column

http://modern-sql.com/

https://modern-sql.com/feature/with/performance

https://technet.microsoft.com/en-us/library/bb522495(v=sql.105).aspx

Window functions

https://blog.statsbot.co/sql-window-functions-tutorial-b5075b87d129

http://www.silota.com/docs/recipes/sql-histogram-summary-frequency-distribution.html

https://drill.apache.org/docs/sql-window-functions-introduction/

https://www.postgresql.org/docs/current/static/functions-window.html

https://blog.jooq.org/2013/11/03/probably-the-coolest-sql-feature-window-functions/

http://db.cs.berkeley.edu/papers/vldb09-madskills.pdf

http://tapoueh.org/blog/2013/08/20-Window-Functions

https://blog.jooq.org/2014/08/12/the-difference-between-row_number-rank-and-dense_rank/

https://blog.jooq.org/2016/10/31/a-little-known-sql-feature-use-logical-windowing-to-aggregate-sliding-ranges/

http://www.silota.com/docs/recipes/sql-linear-regression.html

Oracle online

https://livesql.oracle.com/apex/livesql/file/index.html

http://bigbash.it/

SELECT column, another_column FROM mytable UNION / UNION ALL / INTERSECT / EXCEPT SELECT other_column, yet_another_column FROM another_table ORDER BY column DESC LIMIT n;

union, intersect, except removes duplicates

https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html

https://blog.jooq.org/2016/05/27/correlated-subqueries-are-evil-and-slow-or-are-they/

https://news.ycombinator.com/item?id=11755116

http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/

http://www.sqlhammer.com/the-b-in-b-tree-indexing-sql-server/

PIVOT

https://modern-sql.com/use-case/pivot

https://news.ycombinator.com/item?id=11636728

http://neovintage.org/2016/05/04/the-bool-aggregate-functions/

http://cadlg.com/2016/04/understanding-analytic-functions/

select targetStartAt, EXTRACT(YEAR from targetStartAt) from Test;

https://blog.treasuredata.com/blog/2016/07/22/rolling-retention-done-right-in-sql/

https://danmartensen.svbtle.com/sql-performance-of-join-and-where-exists

http://rickbergfalk.github.io/sqlpad/

http://schemaspy.sourceforge.net schema browser

http://modern-sql.com/

https://modern-sql.com/feature/with

https://www.periscopedata.com/blog/sql-summary-statistics.html

http://danmartensen.svbtle.com/sql-performance-of-join-and-where-exists

https://github.com/hudak/sql-parser

http://incubator.apache.org/projects/calcite.html

https://github.com/z16385947/sql-parser

https://en.wikipedia.org/wiki/Correlated_subquery

http://www.vertabelo.com/blog/notes-from-the-lab/why-sql-is-neither-legacy-nor-low-level-but-simply-awesome

UPDATE FROM SELECT

UPDATE student_marks AS t1

INNER JOIN student_profile AS t2

ON t1.student_id = t2.student_id

SET t1.student_name = t2.student_name

WHERE t1.student_id <= 100;

In the SQL99 update of the SQL standard, the ANSI committee added a collection of grouping and reporting clauses to SQL to support the technology known as OLAP for "OnLine Analytical Processing". OLAP technologies, still used today, are ways to summarize and explore large data sets by grouping them in a multi-dimensional space. The three features added to SQL, ROLLUP, CUBE, and GROUPING SETS, provide different ways of summarizing data sets on several axes at once.

ROLLUP is the easiest to understand, since it simply provides subtotals and master totals for the whole set provided. For example, this query would provide counts for each city, then subtotals for each country, and finally a grand total of subscribers.

CUBE is less immediately understandable, because it's mainly intended to feed multi-dimensional data into an external OLAP tool like Mondrian. It expresses all possible totals for all combinations of specified columns, otherwise known as a "combinatorial explosion". For example, this query would provide totals for each country and each subscription level, and each combination of country and level.

Finally, GROUPING SETS are sort of the "parent class" of both ROLLUP and CUBE. It allows you to select subtotal and total groups and combinations manually to display, in case you need more specific output than ROLLUP orCUBE provide.

For example, say I wanted totals for level and city, then level, and then a grand total, I would write:

The benefit for users in these query constructs is the ability to get totals, subtotals, and combinations in one query and one scan over the table, which in the past would have required multiple queries against the database.

Transaction isolation level

http://en.wikipedia.org/wiki/Isolation_(database_systems)

http://habrahabr.ru/company/infopulse/blog/261097/

http://habrahabr.ru/company/infopulse/blog/261101/

https://habrahabr.ru/post/317884/

LLVM + SQL

http://blog.cloudera.com/blog/2013/02/inside-cloudera-impala-runtime-code-generation/

https://news.ycombinator.com/item?id=8470754

http://devblogs.nvidia.com/parallelforall/mapd-massive-throughput-database-queries-llvm-gpus/

EXISTS versus IN

http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

Joins

http://sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/

http://blogs.msdn.com/b/craigfr/archive/2006/08/16/702828.aspx

SEMI-JOIN between two tables returns rows from the first table where one or more matches are found in the second table. The difference between a semi-join and a conventional join is that rows in the first table will be returned at most once. Even if the second table contains two matches for a row in the first table, only one copy of the row will be returned. Semi-joins are written using the EXISTS or IN constructs.

http://www.dbspecialists.com/files/presentations/semijoins.html

http://kb.askmonty.org/en/semi-join-subquery-optimizations

In Oracle 9i a semi-join can be performed using the nested loops, hash join, or merge join algorithms.

EXCEPT and INTERSECT for semi - joins:

http://msdn.microsoft.com/en-us/library/ms191255.aspx

ANTI-JOINS are written using the NOT EXISTS or NOT IN constructs. These two constructs differ in how they handle nulls

If the subquery of the NOT IN clause retrieves a row with a null value, the entire NOT IN clause always evaluate to false; hence ho make NOT EXISTS and NOT IN works in the same way the NULL should be explicitly eliminated in NOT IN subquery:

SELECT * FROM A WHERE x NOT IN ( SELECT d .. WHERE d IS NOT NULL)

list of departments with no employees:

http://awads.net/wp/2007/05/01/back-to-basics-anti-joins-and-semi-joins/

Analytic Functions

http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm

WITH clause

http://docs.oracle.com/cd/E11882_01/server.112/e25555/tdpdw_sql.htm#TDPDW0073

Advanced SQL

http://ebobby.org/2012/11/11/Using-Sets-For-Many-To-Many-Relationships.html

http://www.sqlexamples.info/sqladvanced.htm

http://www.optimaldba.com/papers/UKOUGSQLMasterClass.pdf

CUBE ROLLUP

the ROLLUP extension produces group subtotals from right to left and a grand total. If "n" is the number of columns listed in the ROLLUP, there will be n+1 levels of subtotals.

ROLLUP (YEAR, MONTH, DAY)

With a ROLLUP, it will have the following outputs:

YEAR, MONTH, DAY

YEAR, MONTH

YEAR

()

With CUBE, it will have the following:

YEAR, MONTH, DAY

YEAR, MONTH

YEAR, DAY

YEAR

MONTH, DAY

MONTH

DAY

()

CUBE essentially contains every possible rollup scenario for each node whereas ROLLUP will keep the hierarchy in tact (so it won't skip MONTH and show YEAR/DAY, whereas CUBE will)

WINDOW FUNCTION OVER PARTITION BY

a window function is a subquery over the results of the main query, in which you define the way to subset the returned columns over which to create a "window," and combine that with a set of operations to perform for each window. For example, imagine you had a query that returned a list of people, and for each person there was a row that had an address and the date they moved there. If you only wanted the three most recent addresses for each person, you could do something like this:

with ranked_addresses as ( select p.id, p.name, a.address, a.move_date, row_number() over ( partition by p.id order by a.move_date desc ) as address_rank from person p join address a on p.id = a.person_id ) select id, name, address, move_date from ranked_addresses where address_rank >= 3 order by address_rank;

You can see that we are creating an additional returned column that contains an autoincrementing integer, and each person has their own window or partition within which that column resets to 1. However, there are a ton of other operations you can perform besides just a simple row number. You can do rank and dense_rank, any aggregate operation (max, sum, avg, count, etc.), as well as things like comparing to the next or previous row within the window using lead or lag.

https://buildingvts.com/understanding-postgres-window-functions-697bc0ff2ed4#.gd91sagvo

http://msdn.microsoft.com/en-us/library/ms189461.aspx

http://www.amazon.com/Microsoft-Server-High-Performance-Window-Functions/dp/0735658366/ref=sr_1_4?ie=UTF8&qid=1340718027&sr=8-4&keywords=itzik+ben-gan

http://srinathsview.blogspot.com/2015/02/why-we-need-sql-like-query-language-for.html

http://www.sql-workbench.net/dbms_comparison.html

http://www.reddit.com/r/programming/comments/2wfvjd/our_sql_interview_questions/

http://rob.conery.io/2015/02/24/embracing-sql-in-postgres/

http://truongtx.me/2014/02/28/tree-structure-query-with-postgresql/

http://engineering.tilt.com/7-postgresql-data-migration-hacks/

https://news.ycombinator.com/item?id=9111362

https://news.ycombinator.com/item?id=9111866

http://www.sql-ex.ru/

If you want to check if two columns are different, you can use is distinct from:

select * from userswhere has_address is distinct from has_photo-- result: users with an address or -- photo, but not both

Not in Null

One handy way to filter rows is with a subselect. For example, if you wanted the users who did not have any packages, you could write a query like this:

select * from users where id not in (select user_id from packages)

But if one of the rows in packages has a null user_id, this query will return no results!

If the condition is inverted, the query works fine. This time we'll look for users with packages.

select * from users where id in (select user_id from packages)

http://blog.jooq.org/2015/02/13/10-sql-articles-everyone-must-read/

http://use-the-index-luke.com/blog/2015-02/modern-sql

http://www.slideshare.net/MarkusWinand/modern-sql

Time Series

http://akumuli.org/akumuli/2015/01/16/motivation/

CTE WITH

http://www.iron.io/blog/2015/09/legible-efficient-sql-with-ctes.html

http://research.neustar.biz/2014/05/19/writing-analytics-sql-with-common-table-expressions/

https://www.sqlite.org/lang_with.html

Removing duplicates

http://gomorpheus.com/blog/2014-11-02-how-to-determine-the-best-approach-to-removing-duplicates-in-joined-tables

http://desple.com/post/108909223762/recursion-with-sql

https://news.ycombinator.com/item?id=8934456

http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back

Hierarhical data and SQL

http://en.wikipedia.org/wiki/Nested_set_model

http://habrahabr.ru/post/263629/

http://mikhailstadnik.com/hierarchical-data-structures-and-performance

http://habrahabr.ru/post/193166/

http://habrahabr.ru/post/153861/

http://habrahabr.ru/post/166699/

http://www.slideshare.net/billkarwin/models-for-hierarchical-data

http://leopard.in.ua/2013/07/11/storing-trees-in-rdbms/

http://www.rittmanmead.com/2007/06/obiee-data-modeling-tips-3-ragged-hierarchies/

http://edwblog.com/architecture/ragged-hierarchy-navigation/112

http://ask.sqlservercentral.com/questions/32055/ragged-hierarchy-dimension-definition.html

http://scn.sap.com/thread/1882261

http://kamfonas.com/wordpress/wp-content/uploads/2007/08/recursive-hierarchies-no-columns.htm

http://www.zaachi.com/en/items/modified-preorder-tree-traversal-algoritmus-1.html

http://www.sitepoint.com/hierarchical-data-database-2/

http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/

http://troels.arvin.dk/db/rdbms/links/#hierarchical

http://techportal.ibuildings.com/2009/09/07/graphs-in-the-database-sql-meets-social-networks/

http://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html

http://www.slideshare.net/quipo/trees-in-the-database-advanced-data-structures

http://intelligent-enterprise.informationweek.com/001020/celko.jhtml;jsessionid=QNMV12AIPMGXPQE1GHRSKHWATMY32JVN

http://rsdn.ru/article/db/db_nav1.xml http://rsdn.ru/article/db/Hierarchy.xml

http://www.reddit.com/r/programming/comments/e30ab/the_simplest_way_to_do_treebased_queries_in_sql/

http://en.wikipedia.org/wiki/Hierarchical_query

http://www.gfilter.net/?Post=Trees-in-relational-databases

http://msdn.microsoft.com/en-us/library/ms186243.aspx

http://wiki.postgresql.org/wiki/CTEReadme

http://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html

http://habrahabr.ru/blogs/development/47280/

https://communities.bmc.com/communities/docs/DOC-9902

http://blog.webagesolutions.com/archives/category/microsoft

Celko. Trees and Hierarchies in SQL for Smarties

Finding Median in SQL

http://datamonkey.pro/blog/median_in_sql/

SELECT x.price

FROM positions x, positions y

GROUP BY x.price

HAVING SUM(SIGN(1-SIGN(y.price-x.price))) = (COUNT(*)+1)/2;

http://java67.blogspot.com/2013/04/10-frequently-asked-sql-query-interview-questions-answers-database.html

Question 1: SQL Query to find second highest salary of Employee

There are many ways to find second highest salary of Employee in SQL, you can either use SQL Join or Subquery to solve this problem. Here is SQL query using Subquery :

select MAX(Salary) from Employee WHERE Salary NOT IN (select MAX(Salary) from Employee );

See How to find second highest salary in SQL for more ways to solve this problem.

Question 2: SQL Query to find Max Salary from each department.

You can find maximum salary for each department by grouping all records by DeptId and then using MAX() function to calculate maximum salary in each group or each department.

SELECT DeptID, MAX(Salary) FROM Employee GROUP BY DeptID.

This questions become more interesting if Interviewer will ask you to print department name instead of department id, in that case you need to join Employee table with Department using foreign key DeptID, make sure you do LEFT OUTER JOIN to include departments without any employee as well. Here is the query

SELECT DeptName, MAX(Salary) FROM Employee e LEFT JOIN Department d ON e.DeptId = d.DeptID;

Question 3: Write SQL Query to display current date.

SQL has built in function called GetDate() which returns current timestamp. This will work in Microsoft SQL Server, other vendors like Oracle and MySQL also has equivalent functions.

SELECT GetDate();

Question 4: Write an SQL Query to check whether date passed to Query is date of given format or not.

SQL has IsDate() function which is used to check passed value is date or not of specified format ,it returns 1(true) or 0(false) accordingly. Remember ISDATE() is a MSSQL function and it may not work on Oracle, MySQL or any other database but there would be something similar.

SELECT ISDATE('1/08/13') AS "MM/DD/YY";

It will return 0 because passed date is not in correct format.

Question 5: Write a SQL Query to print the name of distinct employee whose DOB is between 01/01/1960 to 31/12/1975.

This SQL query is tricky but you can use BETWEEN clause to get all records whose date fall between two dates.

SELECT DISTINCT EmpName FROM Employees WHERE DOB BETWEEN ‘01/01/1960’ AND ‘31/12/1975’;

Question 6: Write an SQL Query find number of employees according to gender whose DOB is between 01/01/1960 to 31/12/1975.

SELECT COUNT(*), sex from Employees WHERE DOB BETWEEN '01/01/1960' AND '31/12/1975' GROUP BY sex;

Question 7: Write an SQL Query to find employee whose Salary is equal or greater than 10000.

SELECT EmpName FROM Employees WHERE Salary>=10000;

Question 8: Write an SQL Query to find name of employee whose name Start with ‘M’

SELECT * FROM Employees WHERE EmpName like 'M%';

Question 9: find all Employee records containing the word "Joe", regardless of whether it was stored as JOE, Joe, or joe.

SELECT * from Employees WHERE UPPER(EmpName) like '%JOE%';

Question 10: Write a SQL Query to find year from date.

Here is how you can find Year from a Date in SQL Server 2008

SELECT YEAR(GETDATE()) as "Year";

Question 11: Write SQL Query to find duplicate rows in a database and to delete them

You can use following query to select distinct records :

SELECT * FROM emp a WHERE rowid = (SELECT MAX(rowid) FROM EMP b WHERE a.empno=b.empno)

to Delete:

DELETE FROM emp a WHERE rowid != (SELECT MAX(rowid) FROM emp b WHERE a.empno=b.empno);

Question 12 : There is a table which contains two column Student and Marks, you need to find all the students, whose marks are greater than average marks i.e. list of above average students.

This query can be written using sub query as shown below :

SELECT student, marks from table where marks > SELECT AVG(marks) from table)

Question 13 : How do you find all employees which are also manager? .

You have given an standard employee table with an additional column mgr_id, which contains employee id of manager.

You need to know about self join to solve this problem. In Self Join, you can join two instances of same table to find out additional details as shown below

SELECT e.name, m.name FROM Employee e, Employee m WHERE e.mgr_id = m.emp_id;

this will show employee name and manger name in two column e.g.

name manager_name

John David

One follow-up is to modify this query to include employees which doesn't have manager. To solve that, instead of using inner join, just use left outer join, this will also include employees without managers.

Question 14 : You have a composite index of three columns, and you only provide value of two columns in WHERE clause of a select query? Will Index be used for this operation? For example if Index is on EmpId, EmpFirstName and EmpSecondName and you write query like

SELECT * FROM Employee WHERE EmpId=2 and EmpFirstName='Radhe'

If the given two columns are secondary index column then index will not invoke, but if the given 2 columns contain primary index(first col while creating index) then index will invoke. In this case Index will be used because EmpId and EmpFirstName are primary columns.

Page updated

Google Sites

Report abuse