Post date: Nov 04, 2018 1:45:21 PM
"Using a big file of words, find all the anagrams in the file, where if you do a cipher substitution for the letters, using the same cipher for both words, (the cipher varies from word to word), the resulting encoded number for each of the pair is a perfect square. Of these, find the largest square number in the file."
This was a horrible problem. The rules were completely opaque. The description that I just gave you was made up by me, by guesswork, and might not be right. It matches the result, but It certainly includes a number of requirements that were not stated. There's no way to know the problem/solution other than trial and error or backtracking to the problem from the solution. The answer, by comparison, was much easier than finding out what the question was.
CREATE FUNCTION dbo.SortString (@String varchar(50))
RETURNS varchar(50)
AS
BEGIN
DECLARE @Letters TABLE (a CHAR(1) NOT NULL)
DECLARE @Result varchar(50)
WHILE @String <> ''
BEGIN
INSERT INTO @Letters VALUES(LEFT(@String,1))
SET @String = STUFF(@String,1,1,'')
END
SELECT @Result = (SELECT '' + a FROM @Letters ORDER BY a
FOR XML PATH(''),TYPE).value('text()[1]','nvarchar(max)')
RETURN @Result
END
GO
CREATE FUNCTION dbo.DistinctCharacters (@String varchar(50))
RETURNS tinyint
AS
BEGIN
DECLARE @Letters TABLE (a CHAR(1) NOT NULL)
DECLARE @Result tinyint
WHILE @String <> ''
BEGIN
INSERT INTO @Letters VALUES(LEFT(@String,1))
SET @String = STUFF(@String,1,1,'')
END
SELECT @Result = COUNT(DISTINCT a)
FROM @Letters
RETURN @Result
END
GO
CREATE TABLE #Words (w varchar(500) NOT NULL)
BULK INSERT #Words FROM 'E:\words.txt' WITH (ROWTERMINATOR = ',')
ALTER TABLE #Words ADD sorted varchar(50) NULL
ALTER TABLE #Words ADD dist tinyint NULL
UPDATE #Words
SET w = REPLACE(w, '"', '')
UPDATE #Words
SET sorted = dbo.SortString(w)
, dist = dbo.DistinctCharacters(w)
-- The rules say a word cannot be an anagram of itself. This rules out single letters.
-- I'm assuming that a letter can't be two digits, so over 10 is right out.
-- And the 3 words with more than 10 unique letters are ruled out because this is decimal, not hexadecimal, math.
DELETE FROM #Words WHERE dist < 2 OR dist > 9
-- There is no point in keeping any words that aren't an anagram of another word
DELETE FROM #Words
WHERE w NOT IN (
SELECT w1.w
FROM #Words w1
INNER JOIN #Words w2
ON w1.sorted = w2.sorted
AND w1.w <> w2.w)
-- Rather than calculate squares, we'll do a faster lookup.
-- The table contains nothing over 9 digits so the maximum square is under 999999999.
CREATE TABLE #Squares (s varchar(10) NOT NULL PRIMARY KEY, dist tinyint NOT NULL)
; WITH Integers (i) AS
(
SELECT 1
UNION ALL
SELECT i + 1
FROM Integers
WHERE i < 31622
)
INSERT INTO #Squares
SELECT i * i, dbo.DistinctCharacters(i * i)
FROM Integers
WHERE i > 3
OPTION (MAXRECURSION 31622)
-- Don't keep any squares that don't match any of the words in both total length and distinct length.
DELETE s
FROM #Squares s
WHERE NOT EXISTS (SELECT 1
FROM #Words w
WHERE LEN(w.w) = LEN(s.s)
AND w.dist = s.dist)
-- Capture length-based combinations. No translations have been done yet. This reduces our working set to a reasonable count.
SELECT w.w, w.sorted, s.s
INTO #PossibleAnswers
FROM #Squares s
INNER JOIN #Words w
ON w.dist = s.dist
AND LEN(w.w) = LEN(s.s)
/*
The problem states that a letter can't have the same digit as a different letter.
This can't be true for the entire file. But let's assume that it's true for a pair.
This is not actually stated, but this answer is based on it.
This has a really sloppy replace splat in it. What it does is apply the mappings derived from word 1 to word 2 and
see if it gets the square in word 2.
If so, that means the mapping is the same across the pair. And I'm hoping that is what they want.
It's a lot faster than an exhaustive check against all possible cipher solutions.
*/
SELECT MAX(CONVERT(int, pa1.s))
FROM #PossibleAnswers pa1
INNER JOIN #PossibleAnswers pa2
ON pa1.sorted = pa2.sorted
AND pa1.w <> pa2.w
/* WARNING: THIS IS UGLY */
WHERE pa2.s = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(pa2.w
,SUBSTRING(pa1.w,1,1), SUBSTRING(pa1.s,1,1))
,SUBSTRING(pa1.w,2,1), SUBSTRING(pa1.s,2,1))
,SUBSTRING(pa1.w,3,1), SUBSTRING(pa1.s,3,1))
,SUBSTRING(pa1.w,4,1), SUBSTRING(pa1.s,4,1))
,SUBSTRING(pa1.w,5,1), SUBSTRING(pa1.s,5,1))
,SUBSTRING(pa1.w,6,1), SUBSTRING(pa1.s,6,1))
,SUBSTRING(pa1.w,7,1), SUBSTRING(pa1.s,7,1))
,SUBSTRING(pa1.w,8,1), SUBSTRING(pa1.s,8,1))
,SUBSTRING(pa1.w,9,1), SUBSTRING(pa1.s,9,1))
DROP TABLE #Words
DROP TABLE #Squares
DROP TABLE #PossibleAnswers
GO
DROP FUNCTION dbo.SortString
GO
DROP FUNCTION dbo.DistinctCharacters
GO