Algorithms http://en.wikipedia.org/wiki/List_of_data_structures
http://en.wikipedia.org/wiki/Coroutine - Coroutine
http://en.wikipedia.org/wiki/Continuation - Continuation
http://www.azillionmonkeys.com/qed/optimize.html Performance optimization
http://www.azillionmonkeys.com/qed/tech.shtml
http://en.wikipedia.org/wiki/List_of_algorithms
Local Files: Tree problems
linked list problems
http://en.wikipedia.org/wiki/List_of_numerical_analysis_software
http://www.dmoz.org/Science/Math/Software/
http://www.trumphurst.com/cpplibs/cpplibs.php
http://directory.google.com/Top/Science/Math/Software/
http://www.linuxlinks.com/Software/Programming/Libraries/Scientific/
C++ linear algebra: http://arma.sourceforge.net http://eigen.tuxfamily.org
http://www.oonumerics.org/blitz http://math.nist.gov/tnt/
Machine learning open source: http://mloss.org
Image processing: http://cimg.sourceforge.net
http://www.gnu.org/software/gsl/ GNU scientific library http://www.netlib.org/ Algorithms collection
Compare 2 files:
Read the first file into a hash table keyed by line (value = line number). Then, for each line of the second file, delete that line from the hash table; if the line is not in the table, put it into a second hash table. Afterwards the first hash table holds the lines that appear only in the first file, and the second hash table holds the lines that appear only in the second file.
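A minimal sketch of this scheme using std::unordered_map; the file names a.txt/b.txt are hypothetical, and duplicate lines within a file are not handled:

#include <fstream>
#include <string>
#include <unordered_map>

int main()
{
    std::unordered_map<std::string, int> onlyInFirst, onlyInSecond;
    std::ifstream f1("a.txt"), f2("b.txt");   // hypothetical file names
    std::string line;
    for (int n = 1; std::getline(f1, line); n++)
        onlyInFirst[line] = n;                // first file: line -> line number
    for (int n = 1; std::getline(f2, line); n++)
        if (onlyInFirst.erase(line) == 0)     // in both files: drop from table 1
            onlyInSecond[line] = n;           // only in file 2: record in table 2
    // onlyInFirst now holds lines unique to a.txt, onlyInSecond those unique to b.txt
    return 0;
}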
STL C++ performance http://www.artima.com/cppsource/lazy_builder.html
A function f is an element of the set O(n^2) if there exist a constant factor c and an integer n0 such that for all n greater than or equal to n0 the following holds:
f(n) ≤ c·n^2.
The function n^2 is then called an asymptotic upper bound for f. In general, the notation f(n) = O(g(n)) says that the function f is asymptotically bounded from above by the function g.
Polynomial time algorithms:
O(1) --- constant time --- the time necessary to perform the algorithm does not change with the size of the problem.
O(n) --- linear time --- the time grows linearly with the size (n) of the problem.
O(n^2) --- quadratic time --- the time grows quadratically with the size (n) of the problem.
Sub-linear time algorithms (grow slower than linear time algorithms):
O(log n) --- logarithmic time.
Super-polynomial time algorithms (grow faster than any polynomial time algorithm):
O(n!)
O(2^n) --- exponential time --- the time required grows exponentially with the size of the problem.
Amortized analysis
Sometimes we find the statement in the manual that an operation takes amortized time O(f(n)). This means that the total time for n such operations is bounded asymptotically from above by a function g(n) and that f(n)=O(g(n)/n). So the amortized time is (a bound for) the average time of an operation in the worst case.
The special case of an amortized time of O(1) signifies that a sequence of n such operations takes only time O(n). One then refers to this as constant amortized time.
Such statements are often the result of an amortized analysis: not every one of the n operations takes the same amount of time; some operations are expensive and do a lot of "pre-work" (or "post-work"). This pays off, however, because the pre-work lets the remaining operations run fast enough that the total time O(g(n)) is not exceeded. The investment in the pre-work or post-work thus amortizes itself.
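The classic example is push_back on a doubling dynamic array; a sketch (my own illustration, not from any source above) where most pushes are O(1) stores and an occasional push does O(n) reallocation pre-work, yet n pushes still cost O(n) total:

#include <cstddef>
#include <cstring>

struct IntVec {
    int*   data = nullptr;
    size_t size = 0, cap = 0;
    void push_back(int x) {
        if (size == cap) {                     // expensive step: double the capacity
            cap = cap ? cap * 2 : 1;
            int* p = new int[cap];
            std::memcpy(p, data, size * sizeof(int));
            delete[] data;
            data = p;
        }
        data[size++] = x;                      // cheap step: plain store
    }
    ~IntVec() { delete[] data; }
};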
Binary search in a sorted array is O(log n).
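A minimal sketch of the halving loop (my own illustration):

#include <vector>

int binary_search_idx(const std::vector<int>& a, int key)
{
    int lo = 0, hi = (int)a.size() - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow of (lo + hi) / 2
        if (a[mid] == key) return mid;
        if (a[mid] < key)  lo = mid + 1;
        else               hi = mid - 1;
    }
    return -1;  // not found
}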
O(n^2): bubble, insertion, selection, and shell sorts;
O(n log n): heap, merge, and quick sorts.
Note that often interesting results can be gained by combining sorting algorithms. Think, for instance, of a quick sort implementation that calls insertion sort when it receives an array with a small number of elements or a sorting algorithm that uses radix sort for small keys and quick sort for longer keys.
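A possible sketch of the first combination, quicksort that falls back to insertion sort below a small threshold (the threshold of 16 is an arbitrary assumption, not from the source):

#include <algorithm>
#include <vector>

static void insertion_sort(std::vector<int>& a, int lo, int hi)
{
    for (int i = lo + 1; i <= hi; i++)
        for (int j = i; j > lo && a[j - 1] > a[j]; j--)
            std::swap(a[j - 1], a[j]);
}

void hybrid_quicksort(std::vector<int>& a, int lo, int hi)
{
    if (hi - lo < 16) { insertion_sort(a, lo, hi); return; }  // small range
    int pivot = a[lo + (hi - lo) / 2];
    int i = lo, j = hi;
    while (i <= j) {                       // Hoare-style partition
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) std::swap(a[i++], a[j--]);
    }
    hybrid_quicksort(a, lo, j);
    hybrid_quicksort(a, i, hi);
}
// initial call: hybrid_quicksort(v, 0, (int)v.size() - 1);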
http://en.wikipedia.org/wiki/Radix_sort http://www.cubic.org/docs/radix.htm
The number of iterations that radix sort needs to make in order to sort an array of elements depends not on the number of elements to be sorted, but on the size of a single element. This means that radix sort is an O(m*n) algorithm, where m is the number of subelements (digits) per key and n is the number of keys.
Radix sort, MSD variant, for char strings. A zero byte marks the end of a string. Recursive; the recursion depth is the string length + 1. Each recursion level needs room for two arrays of 256 pointers (front and ppNextItem below), about 2 KB on a 32-bit machine.

#include <cstring>  // memset

// list element
struct StringItem
{
    const char* str;   // pointer to the string
    StringItem* next;
};

// pList  - head of the list of string pointers
// iDigit - character position (digit) to sort by
// returns a pointer to the first element of the sorted sequence
StringItem* radix_sort_msd_for_string(StringItem* pList, unsigned int iDigit)
{
    // number of possible values of one digit (char)
    const int iRange = 256;
    // array of buckets (sub-lists)
    StringItem* front[iRange];
    memset(front, 0, sizeof(front));
    // tail pointers for appending to each bucket
    StringItem** ppNextItem[iRange];
    for (int i = 0; i < iRange; i++)
        ppNextItem[i] = &front[i];
    // distribute the list over the buckets according to the digit value
    while (pList)
    {
        StringItem* temp = pList;
        pList = pList->next;
        temp->next = NULL;  // detach from the list
        unsigned char c = (unsigned char)temp->str[iDigit];
        *ppNextItem[c] = temp;
        ppNextItem[c] = &temp->next;
    }
    // build the output list
    StringItem* pResult = NULL;
    StringItem** ppNext = &pResult;
    // bucket zero (strings that end here) is appended whole - it is already sorted
    *ppNext = front[0];
    while (*ppNext)
        ppNext = &((*ppNext)->next);
    for (int i = 1; i < iRange; i++)
    {
        // skip empty buckets
        if (!front[i])
            continue;
        if (front[i]->next == NULL)  // a single element - append directly
            *ppNext = front[i];
        else  // everything else is sorted recursively by the next digit
            *ppNext = radix_sort_msd_for_string(front[i], iDigit + 1);
        while (*ppNext)
            ppNext = &((*ppNext)->next);
    }
    return pResult;
}
Test program:

#include <fstream>
#include <iterator>
#include <string>
#include <vector>
using namespace std;

void TestAlgorithm()
{
    // open a large text file for the test
    ifstream file("demo.txt");
    if (!file)
        return;
    // read all words from the file into a vector
    istream_iterator<string> ii(file);
    istream_iterator<string> eos;
    vector<string> vecs(ii, eos);
    // fill the list (str points into vecs, which outlives the list here)
    StringItem* pList = NULL;
    StringItem** ppCurr = &pList;
    for (unsigned int i = 0; i < vecs.size(); i++)
    {
        *ppCurr = new StringItem();
        (*ppCurr)->str = vecs[i].c_str();
        (*ppCurr)->next = NULL;
        ppCurr = &(*ppCurr)->next;
    }
    // sort
    pList = radix_sort_msd_for_string(pList, 0);
    // output file
    ofstream fo("out.txt");
    ostream_iterator<string> oo(fo, " ");
    // write out and delete the list
    while (pList)
    {
        oo = pList->str;  // assigning to an ostream_iterator writes to the file
        StringItem* tmp = pList;
        pList = pList->next;
        delete tmp;
    }
}
http://en.wikipedia.org/wiki/String_searching_algorithm http://www.ics.uci.edu/~eppstein/161/kmp/
Given a pattern P and a text T, determine whether the pattern appears somewhere in the text. The naive approach gives O(m*n):

for (i = 0; T[i] != '\0'; i++)
{
    for (j = 0; T[i+j] != '\0' && P[j] != '\0' && T[i+j] == P[j]; j++)
        ;
    if (P[j] == '\0')
        break;  // found a match at offset i
}
Knuth-Morris-Pratt algorithm takes only O(m+n)
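A sketch of KMP in its usual formulation (failure table over the pattern, then one left-to-right scan of the text); this is an assumed implementation, not taken from the linked pages:

#include <string>
#include <vector>

int kmp_search(const std::string& T, const std::string& P)
{
    if (P.empty()) return 0;
    std::vector<size_t> fail(P.size(), 0);  // fail[j]: longest proper border of P[0..j]
    for (size_t j = 1, k = 0; j < P.size(); j++)
    {
        while (k > 0 && P[j] != P[k]) k = fail[k - 1];
        if (P[j] == P[k]) k++;
        fail[j] = k;
    }
    for (size_t i = 0, k = 0; i < T.size(); i++)  // single scan of the text
    {
        while (k > 0 && T[i] != P[k]) k = fail[k - 1];
        if (T[i] == P[k]) k++;
        if (k == P.size()) return (int)(i + 1 - k);  // match starts at i + 1 - k
    }
    return -1;  // no match
}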
http://www.cs.sunysb.edu/~algorith/files/approximate-pattern-matching.shtml
http://www.cs.ucr.edu/~stelo/pattern.html
http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
The suffix array is the array of the indices of suffixes sorted in lexicographical order.
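A naive sketch that matches this definition directly; O(n^2 log n) in the worst case, so it is for illustration only (real constructions are much faster):

#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

std::vector<int> suffix_array(const std::string& s)
{
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);  // suffix start indices 0..n-1
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return s.compare(a, s.size() - a, s, b, s.size() - b) < 0;
    });
    return sa;  // e.g. for "banana": 5 3 1 0 4 2
}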
http://en.wikipedia.org/wiki/Trie http://pine.cs.yale.edu/pinewiki/RadixSearch
http://www.ddj.com/architect/208800854
http://en.wikipedia.org/wiki/Ternary_search_tree http://www.nist.gov/dads/HTML/ternarySearchTree.html
http://www.ddj.com/article/printableArticle.jhtml?articleID=184410528&dept_url=/windows/
13! = 6,227,020,800, which exceeds 2^31 − 1 = 2,147,483,647, so on a 32-bit machine using ordinary 32-bit integers you cannot compute factorials beyond 12!. Using floating-point numbers we can compute up to 170! ≈ 7.3e306, but 171! will overflow a double.
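A quick check of the 32-bit limit (illustration only):

#include <cstdint>
#include <cstdio>

int main()
{
    std::int64_t f = 1;
    for (int n = 1; n <= 13; n++) {
        f *= n;
        if (f > INT32_MAX)  // first triggers at n = 13
            std::printf("%d! = %lld overflows a 32-bit int\n", n, (long long)f);
    }
    return 0;
}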
http://cis.stvincent.edu/html/tutorials/swd/extsort/extsort.html
http://en.wikipedia.org/wiki/External_sort
External mergesort algorithm. For example, to sort 900 megabytes of data using only 100 megabytes of RAM (a sketch of the merge step follows the list):
1. Read 100 MB of the data into main memory and sort it by some conventional method (usually quicksort).
2. Write the sorted data to disk.
3. Repeat steps 1 and 2 until all of the data is sorted in 100 MB chunks, which now need to be merged into one single output file.
4. Read the first 10 MB of each sorted chunk (call them input buffers) into main memory (90 MB total) and allocate the remaining 10 MB for an output buffer.
5. Perform a 9-way merge and store the result in the output buffer. Whenever the output buffer fills, write it to the final sorted file. Whenever one of the 9 input buffers empties, refill it with the next 10 MB of its associated 100 MB sorted chunk, or mark it as exhausted if the chunk has no more data and do not use it for merging.
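A minimal sketch of the k-way merge step using a min-heap; the chunk file names and the one-value-per-line format are assumptions, and the 10 MB buffering is left to the OS/stream layer:

#include <fstream>
#include <functional>
#include <queue>
#include <string>
#include <vector>

int main()
{
    const int k = 9;                               // 9 sorted chunks
    std::vector<std::ifstream> in(k);
    typedef std::pair<std::string, int> Item;      // (value, chunk index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;
    std::string s;
    for (int i = 0; i < k; i++) {
        in[i].open("chunk" + std::to_string(i) + ".txt");  // assumed names
        if (std::getline(in[i], s)) heap.push(Item(s, i)); // prime the heap
    }
    std::ofstream out("sorted.txt");
    while (!heap.empty()) {
        Item it = heap.top(); heap.pop();          // smallest value across chunks
        out << it.first << '\n';
        if (std::getline(in[it.second], s))        // refill from the same chunk,
            heap.push(Item(s, it.second));         // or leave it exhausted
    }
    return 0;
}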
class Node // Tree
{
    private Node left;
    private Node right;
    private int data;
    Node(Node l, Node r, int val) { left = l; right = r; data = val; }
    public Node getLeft()  { return left; }
    public Node getRight() { return right; }
    public int  getVal()   { return data; }
}
----------------
preorder(node)  # depth-first traversal
    print node.value
    if node.left != null then preorder(node.left)
    if node.right != null then preorder(node.right)
--------------------
inorder(node)
    if node.left != null then inorder(node.left)
    print node.value
    if node.right != null then inorder(node.right)
------
postorder(node)
    if node.left != null then postorder(node.left)
    if node.right != null then postorder(node.right)
    print node.value
In B-trees, internal nodes can have a variable number of child nodes within some pre-defined range
Traversing a tree: depth-first / breadth-first.
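The recursive routines above are depth-first; a breadth-first sketch uses a FIFO queue instead (the TreeNode type here is an assumption, mirroring the Node class above):

#include <cstdio>
#include <queue>

struct TreeNode { int data; TreeNode* left; TreeNode* right; };

void breadth_first(TreeNode* root)
{
    std::queue<TreeNode*> q;
    if (root) q.push(root);
    while (!q.empty()) {
        TreeNode* n = q.front(); q.pop();
        std::printf("%d ", n->data);     // visit in level order
        if (n->left)  q.push(n->left);   // enqueue children left to right
        if (n->right) q.push(n->right);
    }
}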
http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_rbtree.aspx
Basic binary search trees (BST) promise O(log N) search, insertion, and deletion. For the most part this is true, assuming that the data arrives in random order, but basic binary search trees have three very nasty degenerate cases where the structure stops being logarithmic and becomes a glorified linked list. The two most common of these degenerate cases are ascending and descending sorted order (the third is outside-in alternating order). Because binary search trees store their data in a way that can be considered sorted, data that arrives already sorted causes problems.
In Java and C++, the library map structures are typically implemented with a red-black tree. Red-black trees are also comparable in speed to AVL trees.
http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree
A self-balancing (height-balanced) binary search tree is a binary search tree that automatically attempts to keep its height, i.e. the number of levels of nodes beneath the root, as small as possible at all times.
The critical property of red-black trees is that the longest path from the root to a leaf is no more than twice as long as the shortest path from the root to a leaf in that tree. The result is that the tree is roughly balanced. Since operations such as inserting, deleting, and finding values require worst-case time proportional to the height of the tree, this upper bound on the height allows red-black trees to be efficient in the worst case, unlike ordinary binary search trees.
A splay tree is a self-balancing binary search tree with the additional property that recently accessed elements are quick to access again
Example: list of alphabetically ordered names. Instead of using only a single pointer to the next element, you could create a skip list by adding to each element a pointer to the first element of the next letter in the alphabet:
struct SkiplistNode
{
char * name;
SkiplistNode * nextName;
SkiplistNode * nextLetter;
};
A heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B). This implies that an element with the greatest key is always in the root node, and so such a heap is sometimes called a max-heap.
http://marknelson.us/2007/08/01/memoization/
http://burtleburtle.net/bob/hash/doobs.html
http://www.azillionmonkeys.com/qed/hash.html
http://en.wikipedia.org/wiki/Universal_hashing
http://www.gnu.org/software/gperf/
http://bretm.home.comcast.net/~bretm/hash
The hash table data structure is based on the idea of using table lookup to speed up an arbitrary mapping. For our purposes, we are interested in mapping strings into integers. We cannot use strings directly as indices into an array. However, we can define an auxiliary function that converts strings into such indices. Such a function is called a hash function. Thus we could imagine a two-step process to map a string into an integer: for a given string calculate the hash function and then use the result to access an array that contains the pre-computed value of the mapping at that offset.
http://enfranchisedmind.com/blog/2008/02/25/problems-with-hash-tables/
The birthday paradox illustrates how quickly you can run into collision problems for small hash values.
Birthdays are effectively a tiny 365-value hash function. With such a small hash range, there's a 50% chance of two people sharing the same birthday after a mere 23 people. With the 30 students in our hypothetical classroom, the odds of two students sharing a birthday rise to 70%.
With 23 people there are 23*22/2 = 253 unique pairs to compare. The chance of one pair not matching is 364/365. The chance of every pair not matching (i.e. no shared birthday) is (364/365)^253 = 0.4995.
So with 23 people there's about a 50% chance of at least one collision.
A rule of thumb for estimating the number of values you need to enter in a hash table before you have a 50 percent chance of an existing collision is to take the square root of 1.4 times the number of possible hash values:
SQRT(1.4 * 365) ≈ 23
SQRT(1.4 * 2^32) ≈ 77,543
So when using a 32-bit hash value, we have a 50% chance of a collision after about 77,000 entries, a pretty far cry from the 4 billion possible values we could store in that 32-bit value.
That is, at that point the probability of a collision on inserting one more entry (what one might think of as the collision rate) is only about 0.002%, yet there is a 50% probability that somewhere within those 77k entries at least one collision has already occurred.
A hash function h(x) maps an N-bit string x to an M-bit string, where M is smaller than N.
Variable string exclusive-or method to create a hash h in the range 0...65535 from a string x.
Author: Thomas Niemann.
unsigned char Rand8[256];  // contains a random permutation of 0..255

int Hash(char *x)
{                            // x points to the first character
    int h;
    unsigned char h1, h2;
    if (*x == 0) return 0;   // special handling of the empty string
    h1 = *x;
    h2 = *x + 1;             // initialize the two hashes differently
    x++;                     // proceed to the next character
    while (*x)
    {
        h1 = Rand8[h1 ^ *x]; // exclusive-or each hash with the character
        h2 = Rand8[h2 ^ *x]; // and put it through the randomizer
        x++;
    }                        // end of string is reached when *x == 0
    // shift h1 left 8 bits and add h2
    h = ((int)(h1) << 8) | (int)h2;
    return h;                // hash is the concatenation of h1 and h2
}
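Rand8 must be filled with a random permutation of 0..255 before Hash is called; one possible initialization (my sketch, assuming it is linked with the definitions above):

#include <algorithm>
#include <cstdio>
#include <numeric>
#include <random>

extern unsigned char Rand8[256];  // defined with Hash() above
int Hash(char *x);

int main()
{
    std::iota(Rand8, Rand8 + 256, 0);                   // 0, 1, ..., 255
    std::shuffle(Rand8, Rand8 + 256, std::mt19937(7));  // random permutation
    char word[] = "hello";
    std::printf("Hash(\"hello\") = %d\n", Hash(word));
    return 0;
}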
Collision handling: linear probing, quadratic probing, exponential search, rehashing, double hashing, chaining using linked lists or trees.
For n available slots and m occupying items,
Prob. of no collision on first insertion = 1
Prob. of no collision on 2nd insertion = ( n - 1 ) / n
Prob. of no collision on 3rd insertion = ( n - 2 ) / n
...
Prob. of no collision on mth insertion = ( n - ( m - 1 ) ) / n
So the probability of no collision on ANY of m insertions is the product of the above values, which works out to
( n - 1 )! / ( ( n - m )! * n^( m - 1 ) )
and conversely, the likelihood of at least one collision is just 1 minus the above value.
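A quick numeric check of this formula for the birthday case n = 365, m = 23 (illustration only):

#include <cstdio>

int main()
{
    const int n = 365, m = 23;
    double pNoCollision = 1.0;
    for (int i = 1; i < m; i++)              // the i-th insertion sees i occupied slots
        pNoCollision *= double(n - i) / n;
    std::printf("P(at least one collision) = %.4f\n", 1.0 - pNoCollision);
    // prints about 0.5073 for 23 birthdays, consistent with the ~50% above
    return 0;
}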
http://www.eternallyconfuzzled.com/arts/jsw_art_rand.aspx
srand((unsigned int)time(NULL));
int r = M + rand() / (RAND_MAX / (N - M) + 1);  // roughly uniform integer in [M, N); uses high-order bits, unlike rand() % (N - M)
http://en.wikipedia.org/wiki/Inverse_transform_sampling
http://cg.scs.carleton.ca/~luc/handbooksimulation1.pdf Luc Devroye
http://www.ie.boun.edu.tr/~hormannw/vortrag.pdf
The inversion method. For a univariate random variable, the inversion method is theoretically applicable: given the distribution function F and its inverse F^-1, we generate a random variate X with that distribution as F^-1(U), where U is a uniform [0, 1] random variable.
This is the method of choice when the inverse is readily computable. For example, a standard exponential random variable (which has density e^-x, x > 0) can be generated as log(1/U). The following table gives some further examples.
Table 1: Some densities with distribution functions that are explicitly invertible.
Note: U is a uniform random variate between 0 and 1.
The fact that there is a monotone relationship between U and X has interesting benefits in the areas of coupling and variance reduction. In simulation, one sometimes requires two random variates from the same distribution that are maximally anti-correlated. This can be achieved by generating the pair (F^-1(U), F^-1(1 - U)). In coupling, one has two distribution functions F and G, and a pair of (possibly dependent) random variates (X, Y) with these marginal distribution functions is needed so that some appropriate metric measuring the distance between them is minimized. For example, the Wasserstein metric d2(F, G) is the minimal value over all couplings of (X, Y) of sqrt(E{(X - Y)^2}). That minimal coupling occurs when (X, Y) = (F^-1(U), G^-1(U)) (see, e.g., Rachev, 1991). Finally, if we wish to simulate the maximum M of n i.i.d. random variables, each distributed as X = F^-1(U), then noting that the maximum of n i.i.d. uniform [0, 1] random variables is distributed as U^(1/n), we see that M can be simulated as F^-1(U^(1/n)).
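A sketch of the inversion method for the exponential example, using F^-1(u) = -log(1 - u) (equivalent in distribution to log(1/U), but safe for u = 0):

#include <cmath>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> U(0.0, 1.0);  // U in [0, 1)
    const int n = 100000;
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += -std::log(1.0 - U(gen));  // Exp(1) variate via inversion
    std::printf("sample mean = %.3f (expected 1.0)\n", sum / n);
    return 0;
}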
Deleting from a vector in linear time, in place
How to remove, in linear time, the elements of a vector that satisfy some property, working within the same vector:
Run two pointers over the array in a single loop, "collapsing" the unwanted elements. One pointer advances automatically and points at the element currently being examined. The other points at the cell where the next kept element should be written; it is advanced manually after each write. After the pass, the second pointer points at the first cell left unused; the elements from there to the end must be removed.
This is exactly what vector erase combined with remove(_if) does.
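A sketch of the erase-remove idiom implementing exactly this two-pointer pass (std::remove_if plays the two pointers; erase trims the tail):

#include <algorithm>
#include <vector>

int main()
{
    std::vector<int> v = {1, 2, 3, 4, 5, 6};
    // remove_if compacts the kept elements to the front and returns the new
    // logical end; erase then removes the leftover tail in one shot.
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](int x) { return x % 2 == 0; }),
            v.end());
    // v is now {1, 3, 5}
    return 0;
}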