
2017



Data Structures & Algorithms
  • List
  • Stack
  • Hashmap
  • Hashtable
  • Tree
  • Graph

Discrete Math 101

Machine Learning
  • supervised learning
  • unsupervised learning
  • overfitting
  • boosting
  • regularization

  • Experience with common learning algorithms such as 
  • logistic regression
  • decision trees
  • SVM
  • PCA
  • k-means
  • GBDT (gradient boosted decision trees)
  • XGBoost
Domains
Ranking / Personalization / 
NLP
Word embedding
Parse tree
Part of speech
LSTM
Attention
bi-directional
beam search
Beam search uses breadth-first search to build its search tree. At each level of the tree, it generates all successors of the states at the current level and sorts them in increasing order of heuristic cost. However, it stores only a predetermined number β of best states at each level (the beam width), and only those states are expanded at the next level.
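
A minimal sketch of that level-by-level pruning, assuming placeholder successors(state), heuristic_cost(state), and is_goal(state) functions (none of these names are from the source):

import heapq

def beam_search(start, successors, heuristic_cost, is_goal, beam_width):
    # Breadth-first expansion that keeps only the beam_width best states per level.
    beam = [start]
    while beam:
        # Generate all successors of the states at the current level.
        candidates = [s for state in beam for s in successors(state)]
        goals = [s for s in candidates if is_goal(s)]
        if goals:
            return min(goals, key=heuristic_cost)
        # Keep only the beam_width lowest-cost states for the next level.
        beam = heapq.nsmallest(beam_width, candidates, key=heuristic_cost)
    return None  # search exhausted without reaching a goal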


Deep Learning
Deep Learning / Neural Networks: knowledge of deep learning algorithms, including examples and features; understanding of Convolutional Neural Networks, Recurrent Neural Networks, optimization algorithms (Stochastic Gradient Descent, backpropagation through time), and other aspects of designing and training those networks.

Reinforcement Learning: knowledge of the area of machine learning concerned with sequential decision making, in which an agent takes actions in an environment while trying to maximize the sum of expected rewards. This includes sequential environment modeling and control, and it requires knowledge of the main concepts (Markov Decision Processes, value functions, Bellman equations, policy gradients, Q-learning, and actor-critic) and the ability to cast concrete problems into an RL formulation.
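
As a concrete anchor for those concepts, a minimal tabular Q-learning sketch; the environment interface (reset(), step(action) -> (next_state, reward, done), and an actions list) is an assumption, not from the source:

import random
from collections import defaultdict

def q_learning(env, episodes, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Bellman update toward the bootstrapped target
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q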

The Large Scale Design Interview
In this interview, you should combine your machine learning knowledge, theory, experience, and judgment to solve a real-world engineering problem. Questions are generally open-ended so you can dive deep into the solution. It is important to design the solution while thinking about data analysis, potential use cases, the users, scalability, limitations, and tradeoffs.





Array
Contiguous memory block, items of same size.
Row major, Column major
item at (3,4), 1-based, row-major: array_address + ((3-1) * NumColumns + (4-1)) * item_size
Constant time to access each element
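
A quick sanity check of the row-major formula as a small Python sketch (0-based indices; the example matrix is made up):

def row_major_index(row, col, num_columns):
    # Flat offset (in items) of element (row, col) in a row-major 2-D array, 0-based.
    return row * num_columns + col

# A 3 x 4 matrix flattened in row-major order:
flat = [10, 11, 12, 13,
        20, 21, 22, 23,
        30, 31, 32, 33]
assert flat[row_major_index(2, 3, num_columns=4)] == 33   # row 2, column 3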

Add/Remove
at the beginning or middle: O(n) (elements must be shifted)
at the end: O(1) (amortized, for a dynamic array)
Linked List

Stack (LIFO)
Operations
push
top
pop
isempty

Implementation
With Arrays
Just keep count of how many items in array
With Lists
Always keep pointer to top of the stack
Example
Balanced Brackets
Input: A string str consisting of ‘(‘, ‘)’, ‘[‘,‘]’ characters.
Output: Return whether or not the string’s parentheses and square brackets are balanced.
Code
def is_balanced(s):
    stack = []
    for char in s:
        if char in '([':
            stack.append(char)          # push opening brackets
        else:
            if not stack:               # closing bracket with nothing to match
                return False
            top = stack.pop()
            if (top == '[' and char != ']') or (top == '(' and char != ')'):
                return False
    return not stack                    # balanced iff nothing is left unmatched

# is_balanced("([])") -> True, is_balanced("([)]") -> False

Queue (FIFO)
Operations
enqueue
dequeue
isempty

Using 
Linked lists: head and tail pointers
Array: 
read: Array index to read from
write: Array index to write to
Empty if read = write
Full if (write + 1) % length = read
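
A minimal circular-buffer queue illustrating the read/write pointer rules above (one slot is left unused so that "full" and "empty" can be distinguished); a sketch, not a production implementation:

class RingQueue:
    def __init__(self, capacity):
        # One extra slot so that full ((write + 1) % length == read) differs from empty (read == write).
        self.buf = [None] * (capacity + 1)
        self.read = 0
        self.write = 0

    def is_empty(self):
        return self.read == self.write

    def is_full(self):
        return (self.write + 1) % len(self.buf) == self.read

    def enqueue(self, item):
        if self.is_full():
            raise OverflowError("queue is full")
        self.buf[self.write] = item
        self.write = (self.write + 1) % len(self.buf)

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        item = self.buf[self.read]
        self.read = (self.read + 1) % len(self.buf)
        return item
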
Tree
traversals
in-order
pre-order
post-order
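
A short sketch of the three traversals above on a simple node class (the Node shape is an assumption, not from the source):

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def in_order(node, visit):       # left, node, right
    if node:
        in_order(node.left, visit)
        visit(node.value)
        in_order(node.right, visit)

def pre_order(node, visit):      # node, left, right
    if node:
        visit(node.value)
        pre_order(node.left, visit)
        pre_order(node.right, visit)

def post_order(node, visit):     # left, right, node
    if node:
        post_order(node.left, visit)
        post_order(node.right, visit)
        visit(node.value)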

Heap Sort: 
Heap sort is best implemented as a binary tree stored implicitly in an array.

Max-heap:
Start with an empty array. Parent/child indexing: if the parent is at index k, the left child is at 2k+1 and the right child at 2k+2.
Append each new number at the end; if it is bigger than its parent, swap the two.
If swapped, keep swapping upward (sift-up) until the parent is larger or the root is reached.

Heap Sort: 
Build a max-heap from the input.
Repeatedly remove the top node (the current max), move the last item into its place, and sift it down to restore the heap.

Complexity:
Building the max-heap: each insertion sifts up in O(log n), and there are n insertions -> O(n log n).
Sorting: each removal sifts down in O(log n), and there are n removals -> O(n log n).
Total: O(n log n).
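
A compact in-place heap sort following the steps above; this sketch builds the heap bottom-up with sift-down (which is O(n)) rather than by repeated insertion as described above:

def sift_down(a, start, end):
    # Move a[start] down until the max-heap property holds for a[start:end].
    root = start
    while 2 * root + 1 < end:
        child = 2 * root + 1                      # left child at 2k+1
        if child + 1 < end and a[child] < a[child + 1]:
            child += 1                            # right child at 2k+2 is larger
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child

def heap_sort(a):
    n = len(a)
    # Build a max-heap bottom-up.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(a, start, n)
    # Repeatedly move the max to the end and restore the heap: O(n log n).
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a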






0.8 * 120 * 10^3 +
0.1 * (120 * 10^3 + 25*10^6 * 0.075 / 4) + 
0.075 * (120 * 10^3 + 60*10^6 * 0.075 / 4) + 
0.025 * (120 * 10^3 + 80*10^6 * 0.075 / 4)



120 * 10^3 + 
0.8 * 0 +
0.1 * (25*10^6 * 0.075 / 4) + 
0.075 * (60*10^6 * 0.075 / 4) + 
0.025 * (80*10^6 * 0.075 / 4)


Interesting Algorithms
Reservoir sampling
To sample k items from a stream when the number of items is not known in advance:
1) Create an array reservoir[0..k-1] and copy first k items of stream to it.
2) for i in [k .. end], 0-based index into the stream:
        j = randomInt(0, i)   (inclusive on both ends)
        if 0 <= j <= k-1:
            replace reservoir[j] with stream[i]
O(n)
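
A runnable version of the steps above (a sketch; uses random.randint, which is inclusive on both ends):

import random

def reservoir_sample(stream, k):
    # Uniformly sample k items from an iterable of unknown length in one pass.
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # fill the reservoir with the first k items
        else:
            j = random.randint(0, i)     # random index in [0, i]
            if j < k:
                reservoir[j] = item      # keep the new item with probability k / (i + 1)
    return reservoir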

Maximum subarray sum (Kadane's algorithm):

def max_subarray_sum(a):
    max_so_far = 0       # global max (an empty subarray counts as 0)
    max_ending_here = 0  # best sum of a subarray ending at the current position
    for x in a:
        max_ending_here += x
        if max_ending_here < 0:      # a negative prefix never helps; restart here
            max_ending_here = 0
        if max_so_far < max_ending_here:
            max_so_far = max_ending_here
    return max_so_far


Longest Common Subsequence (LCS)
LCS of "ABCDGH" and "AEDFHR" is "ADH", of length 3.
LCS of "AGGTAB" and "GXTXAYB" is "GTAB", of length 4.

/* Naive recursive LCS length of X[0..m-1] and Y[0..n-1].
   Exponential time; memoize or build a DP table for O(m*n). */
static int max(int a, int b) { return a > b ? a : b; }

int lcs(char *X, char *Y, int m, int n)
{
   if (m == 0 || n == 0)
     return 0;
   if (X[m-1] == Y[n-1])                 /* last characters match */
     return 1 + lcs(X, Y, m-1, n-1);
   else                                  /* drop one character from either string */
     return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n));
}
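
The plain recursion recomputes the same subproblems exponentially many times; a bottom-up DP table brings it down to O(m*n). A minimal Python sketch (not from the source):

def lcs_length(x, y):
    # dp[i][j] = LCS length of x[:i] and y[:j]
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

assert lcs_length("ABCDGH", "AEDFHR") == 3   # "ADH"
assert lcs_length("AGGTAB", "GXTXAYB") == 4  # "GTAB"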

Huffman code
The idea is to assign variable-length codes to input characters; the lengths of the assigned codes are based on the frequencies of the corresponding characters. The most frequent character gets the shortest code and the least frequent character gets the longest code.

    Create a leaf node for each symbol and add it to the priority queue.
    While there is more than one node in the queue:
        Remove the two nodes of highest priority (lowest probability) from the queue
        Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
        Add the new node to the queue.
    The remaining node is the root node and the tree is complete.
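
A short sketch of that queue-based construction using Python's heapq; the node representation (a bare symbol for a leaf, a (left, right) pair for an internal node) is my own choice, not from the source:

import heapq

def huffman_codes(freq):
    # Build Huffman codes from a {symbol: frequency} dict.
    # Heap entries are (frequency, tie_breaker, tree).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two lowest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):           # internal node: 0 for left, 1 for right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                 # leaf symbol
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# The most frequent symbol ends up with the shortest code:
# huffman_codes({'e': 45, 't': 13, 'a': 12, 'o': 5})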













