Algorithms http://en.wikipedia.org/wiki/List_of_data_structures
http://en.wikipedia.org/wiki/Coroutine - Coroutine
http://en.wikipedia.org/wiki/Continuation - Continuation
http://www.azillionmonkeys.com/qed/optimize.html Performance optimization
http://www.azillionmonkeys.com/qed/tech.shtml
http://en.wikipedia.org/wiki/List_of_algorithms
Local Files: Tree problems
linked list problems
http://en.wikipedia.org/wiki/List_of_numerical_analysis_software
http://www.dmoz.org/Science/Math/Software/
http://www.trumphurst.com/cpplibs/cpplibs.php
http://directory.google.com/Top/Science/Math/Software/
http://www.linuxlinks.com/Software/Programming/Libraries/Scientific/
C++ linear algebra: http://arma.sourceforge.net http://eigen.tuxfamily.org
http://www.oonumerics.org/blitz http://math.nist.gov/tnt/
Machine learning open source: http://mloss.org
Image processing: http://cimg.sourceforge.net
http://www.gnu.org/software/gsl/ GNU scientific library http://www.netlib.org/ Algorithms collection
Compare 2 files:
Read the first file into a hash table keyed by line (value = line number). Then, for each line of the second file, delete that line from the hash table; if the line is not in the table, put it into a second hash table. Afterwards the first hash table holds the lines that appear only in the first file, and the second hash table holds the lines that appear only in the second file.
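A minimal sketch of this scheme using std::unordered_map; the file names a.txt/b.txt are hypothetical, and duplicate lines within a file are not handled:

#include <fstream>
#include <string>
#include <unordered_map>

int main()
{
    std::unordered_map<std::string, int> onlyInFirst, onlyInSecond;
    std::ifstream f1("a.txt"), f2("b.txt");   // hypothetical file names
    std::string line;
    for (int n = 1; std::getline(f1, line); n++)
        onlyInFirst[line] = n;                // first file: line -> line number
    for (int n = 1; std::getline(f2, line); n++)
        if (onlyInFirst.erase(line) == 0)     // in both files: drop from table 1
            onlyInSecond[line] = n;           // only in file 2: record in table 2
    // onlyInFirst now holds lines unique to a.txt, onlyInSecond those unique to b.txt
    return 0;
}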
STL C++ performance http://www.artima.com/cppsource/lazy_builder.html
A function f is an element of the set O(n^2) if there exist a constant factor c and an integer n0 such that for all n greater than or equal to n0 the following holds:
f(n) ≤ c·n^2.
The function n^2 is then called an asymptotic upper bound for f. In general, the notation f(n) = O(g(n)) says that the function f is asymptotically bounded from above by the function g.
Polynomial time algorithms:
O(1) --- constant time --- the time necessary to perform the algorithm does not change with the size of the problem.
O(n) --- linear time --- the time grows linearly with the size (n) of the problem.
O(n^2) --- quadratic time --- the time grows quadratically with the size (n) of the problem.
Sub-linear time algorithms (grow slower than linear time algorithms):
O(log n) --- logarithmic time.
Super-polynomial time algorithms (grow faster than any polynomial time algorithm):
O(n!)
O(2^n) --- exponential time --- the time required grows exponentially with the size of the problem.
Amortized analysis
Sometimes we find the statement in the manual that an operation takes amortized time O(f(n)). This means that the total time for n such operations is bounded asymptotically from above by a function g(n) and that f(n)=O(g(n)/n). So the amortized time is (a bound for) the average time of an operation in the worst case.
The special case of an amortized time of O(1) signifies that a sequence of n such operations takes only time O(n). One then refers to this as constant amortized time.
Such statements are often the result of an amortized analysis: not every one of the n operations takes the same amount of time; some operations are expensive and do a lot of "pre-work" (or "post-work"). This pays off, however, because the pre-work lets the remaining operations run fast enough that the total time O(g(n)) is not exceeded. The investment in the pre-work or post-work thus amortizes itself.
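The classic example is push_back on a doubling dynamic array; a sketch (my own illustration, not from any source above) where most pushes are O(1) stores and an occasional push does O(n) reallocation pre-work, yet n pushes still cost O(n) total:

#include <cstddef>
#include <cstring>

struct IntVec {
    int*   data = nullptr;
    size_t size = 0, cap = 0;
    void push_back(int x) {
        if (size == cap) {                     // expensive step: double the capacity
            cap = cap ? cap * 2 : 1;
            int* p = new int[cap];
            std::memcpy(p, data, size * sizeof(int));
            delete[] data;
            data = p;
        }
        data[size++] = x;                      // cheap step: plain store
    }
    ~IntVec() { delete[] data; }
};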
Binary search in a sorted array is O(log n).
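A minimal sketch of the halving loop (my own illustration):

#include <vector>

int binary_search_idx(const std::vector<int>& a, int key)
{
    int lo = 0, hi = (int)a.size() - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow of (lo + hi) / 2
        if (a[mid] == key) return mid;
        if (a[mid] < key)  lo = mid + 1;
        else               hi = mid - 1;
    }
    return -1;  // not found
}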
O(n^2): bubble, insertion, selection, and shell sorts;
O(n log n): heap, merge, and quick sorts.
Note that often interesting results can be gained by combining sorting algorithms. Think, for instance, of a quick sort implementation that calls insertion sort when it receives an array with a small number of elements or a sorting algorithm that uses radix sort for small keys and quick sort for longer keys.
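A possible sketch of the first combination, quicksort that falls back to insertion sort below a small threshold (the threshold of 16 is an arbitrary assumption, not from the source):

#include <algorithm>
#include <vector>

static void insertion_sort(std::vector<int>& a, int lo, int hi)
{
    for (int i = lo + 1; i <= hi; i++)
        for (int j = i; j > lo && a[j - 1] > a[j]; j--)
            std::swap(a[j - 1], a[j]);
}

void hybrid_quicksort(std::vector<int>& a, int lo, int hi)
{
    if (hi - lo < 16) { insertion_sort(a, lo, hi); return; }  // small range
    int pivot = a[lo + (hi - lo) / 2];
    int i = lo, j = hi;
    while (i <= j) {                       // Hoare-style partition
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) std::swap(a[i++], a[j--]);
    }
    hybrid_quicksort(a, lo, j);
    hybrid_quicksort(a, i, hi);
}
// initial call: hybrid_quicksort(v, 0, (int)v.size() - 1);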
http://en.wikipedia.org/wiki/Radix_sort http://www.cubic.org/docs/radix.htm
The number of iterations that radix sort needs to make in order to sort an array of elements depends not on the number of elements to be sorted, but on the size of a single element. This means that radix sort is an O(m*n) algorithm, where m is the number of subelements (digits) per key and n is the number of keys.
Radix sort, MSD variant, for char strings. A zero byte marks the end of a string. Recursive; the recursion depth is the string length + 1. Each recursion level needs room for two arrays of 256 pointers (front and ppNextItem below), about 2 KB on a 32-bit machine.

#include <cstring>  // memset

// list element
struct StringItem
{
    const char* str;   // pointer to the string
    StringItem* next;
};

// pList  - head of the list of string pointers
// iDigit - character position (digit) to sort by
// returns a pointer to the first element of the sorted sequence
StringItem* radix_sort_msd_for_string(StringItem* pList, unsigned int iDigit)
{
    // number of possible values of one digit (char)
    const int iRange = 256;
    // array of buckets (sub-lists)
    StringItem* front[iRange];
    memset(front, 0, sizeof(front));
    // tail pointers for appending to each bucket
    StringItem** ppNextItem[iRange];
    for (int i = 0; i < iRange; i++)
        ppNextItem[i] = &front[i];
    // distribute the list over the buckets according to the digit value
    while (pList)
    {
        StringItem* temp = pList;
        pList = pList->next;
        temp->next = NULL;  // detach from the list
        unsigned char c = (unsigned char)temp->str[iDigit];
        *ppNextItem[c] = temp;
        ppNextItem[c] = &temp->next;
    }
    // build the output list
    StringItem* pResult = NULL;
    StringItem** ppNext = &pResult;
    // bucket zero (strings that end here) is appended whole - it is already sorted
    *ppNext = front[0];
    while (*ppNext)
        ppNext = &((*ppNext)->next);
    for (int i = 1; i < iRange; i++)
    {
        // skip empty buckets
        if (!front[i])
            continue;
        if (front[i]->next == NULL)  // a single element - append directly
            *ppNext = front[i];
        else  // everything else is sorted recursively by the next digit
            *ppNext = radix_sort_msd_for_string(front[i], iDigit + 1);
        while (*ppNext)
            ppNext = &((*ppNext)->next);
    }
    return pResult;
}
Test program:

#include <fstream>
#include <iterator>
#include <string>
#include <vector>
using namespace std;

void TestAlgorithm()
{
    // open a large text file for the test
    ifstream file("demo.txt");
    if (!file)
        return;
    // read all words from the file into a vector
    istream_iterator<string> ii(file);
    istream_iterator<string> eos;
    vector<string> vecs(ii, eos);
    // fill the list (str points into vecs, which outlives the list here)
    StringItem* pList = NULL;
    StringItem** ppCurr = &pList;
    for (unsigned int i = 0; i < vecs.size(); i++)
    {
        *ppCurr = new StringItem();
        (*ppCurr)->str = vecs[i].c_str();
        (*ppCurr)->next = NULL;
        ppCurr = &(*ppCurr)->next;
    }
    // sort
    pList = radix_sort_msd_for_string(pList, 0);
    // output file
    ofstream fo("out.txt");
    ostream_iterator<string> oo(fo, " ");
    // write out and delete the list
    while (pList)
    {
        oo = pList->str;  // assigning to an ostream_iterator writes to the file
        StringItem* tmp = pList;
        pList = pList->next;
        delete tmp;
    }
}
http://en.wikipedia.org/wiki/String_searching_algorithm http://www.ics.uci.edu/~eppstein/161/kmp/
Given a pattern P and a text T, determine whether the pattern appears somewhere in the text. The naive approach gives O(m*n):

for (i = 0; T[i] != '\0'; i++)
{
    for (j = 0; T[i+j] != '\0' && P[j] != '\0' && T[i+j] == P[j]; j++)
        ;
    if (P[j] == '\0')
        break;  // found a match at offset i
}
Knuth-Morris-Pratt algorithm takes only O(m+n)
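A sketch of KMP in its usual formulation (failure table over the pattern, then one left-to-right scan of the text); this is an assumed implementation, not taken from the linked pages:

#include <string>
#include <vector>

int kmp_search(const std::string& T, const std::string& P)
{
    if (P.empty()) return 0;
    std::vector<size_t> fail(P.size(), 0);  // fail[j]: longest proper border of P[0..j]
    for (size_t j = 1, k = 0; j < P.size(); j++)
    {
        while (k > 0 && P[j] != P[k]) k = fail[k - 1];
        if (P[j] == P[k]) k++;
        fail[j] = k;
    }
    for (size_t i = 0, k = 0; i < T.size(); i++)  // single scan of the text
    {
        while (k > 0 && T[i] != P[k]) k = fail[k - 1];
        if (T[i] == P[k]) k++;
        if (k == P.size()) return (int)(i + 1 - k);  // match starts at i + 1 - k
    }
    return -1;  // no match
}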
http://www.cs.sunysb.edu/~algorith/files/approximate-pattern-matching.shtml
http://www.cs.ucr.edu/~stelo/pattern.html
http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
The suffix array is the array of the indices of suffixes sorted in lexicographical order.
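A naive sketch that matches this definition directly; O(n^2 log n) in the worst case, so it is for illustration only (real constructions are much faster):

#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

std::vector<int> suffix_array(const std::string& s)
{
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);  // suffix start indices 0..n-1
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return s.compare(a, s.size() - a, s, b, s.size() - b) < 0;
    });
    return sa;  // e.g. for "banana": 5 3 1 0 4 2
}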
http://en.wikipedia.org/wiki/Trie http://pine.cs.yale.edu/pinewiki/RadixSearch
http://www.ddj.com/architect/208800854
http://en.wikipedia.org/wiki/Ternary_search_tree http://www.nist.gov/dads/HTML/ternarySearchTree.html
http://www.ddj.com/article/printableArticle.jhtml?articleID=184410528&dept_url=/windows/
13! = 6,227,020,800, which exceeds 2^31 − 1 = 2,147,483,647, so on a 32-bit machine using ordinary 32-bit integers you cannot compute factorials beyond 12!. Using floating-point numbers we can compute up to 170! ≈ 7.3e306, but 171! will overflow a double.
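A quick check of the 32-bit limit (illustration only):

#include <cstdint>
#include <cstdio>

int main()
{
    std::int64_t f = 1;
    for (int n = 1; n <= 13; n++) {
        f *= n;
        if (f > INT32_MAX)  // first triggers at n = 13
            std::printf("%d! = %lld overflows a 32-bit int\n", n, (long long)f);
    }
    return 0;
}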
http://cis.stvincent.edu/html/tutorials/swd/extsort/extsort.html
http://en.wikipedia.org/wiki/External_sort
External mergesort algorithm. For example, to sort 900 megabytes of data using only 100 megabytes of RAM (a sketch of the merge step follows the list):
1. Read 100 MB of the data into main memory and sort it by some conventional method (usually quicksort).
2. Write the sorted data to disk.
3. Repeat steps 1 and 2 until all of the data is sorted in 100 MB chunks, which now need to be merged into one single output file.
4. Read the first 10 MB of each sorted chunk (call them input buffers) into main memory (90 MB total) and allocate the remaining 10 MB for an output buffer.
5. Perform a 9-way merge and store the result in the output buffer. Whenever the output buffer fills, write it to the final sorted file. Whenever one of the 9 input buffers empties, refill it with the next 10 MB of its associated 100 MB sorted chunk, or mark it as exhausted if the chunk has no more data and do not use it for merging.
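A minimal sketch of the k-way merge step using a min-heap; the chunk file names and the one-value-per-line format are assumptions, and the 10 MB buffering is left to the OS/stream layer:

#include <fstream>
#include <functional>
#include <queue>
#include <string>
#include <vector>

int main()
{
    const int k = 9;                               // 9 sorted chunks
    std::vector<std::ifstream> in(k);
    typedef std::pair<std::string, int> Item;      // (value, chunk index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;
    std::string s;
    for (int i = 0; i < k; i++) {
        in[i].open("chunk" + std::to_string(i) + ".txt");  // assumed names
        if (std::getline(in[i], s)) heap.push(Item(s, i)); // prime the heap
    }
    std::ofstream out("sorted.txt");
    while (!heap.empty()) {
        Item it = heap.top(); heap.pop();          // smallest value across chunks
        out << it.first << '\n';
        if (std::getline(in[it.second], s))        // refill from the same chunk,
            heap.push(Item(s, it.second));         // or leave it exhausted
    }
    return 0;
}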
class Node // Tree
{
    private Node left;
    private Node right;
    private int data;
    Node(Node l, Node r, int val) { left = l; right = r; data = val; }
    public Node getLeft()  { return left; }
    public Node getRight() { return right; }
    public int  getVal()   { return data; }
}
----------------
preorder(node)  # depth-first traversal
    print node.value
    if node.left != null then preorder(node.left)
    if node.right != null then preorder(node.right)
--------------------
inorder(node)
    if node.left != null then inorder(node.left)
    print node.value
    if node.right != null then inorder(node.right)
------
postorder(node)
    if node.left != null then postorder(node.left)
    if node.right != null then postorder(node.right)
    print node.value
In B-trees, internal nodes can have a variable number of child nodes within some pre-defined range
Traversing a tree: depth-first / breadth-first.
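The recursive routines above are depth-first; a breadth-first sketch uses a FIFO queue instead (the TreeNode type here is an assumption, mirroring the Node class above):

#include <cstdio>
#include <queue>

struct TreeNode { int data; TreeNode* left; TreeNode* right; };

void breadth_first(TreeNode* root)
{
    std::queue<TreeNode*> q;
    if (root) q.push(root);
    while (!q.empty()) {
        TreeNode* n = q.front(); q.pop();
        std::printf("%d ", n->data);     // visit in level order
        if (n->left)  q.push(n->left);   // enqueue children left to right
        if (n->right) q.push(n->right);
    }
}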
http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_rbtree.aspx
Basic binary search trees (BST) promise O(log N) search, insertion, and deletion. For the most part this is true, assuming that the data arrives in random order, but basic binary search trees have three very nasty degenerate cases where the structure stops being logarithmic and becomes a glorified linked list. The two most common of these degenerate cases are ascending and descending sorted order (the third is outside-in alternating order). Because binary search trees store their data in a way that can be considered sorted, data that arrives already sorted causes problems.
In Java and C++, the library map structures are typically implemented with a red-black tree. Red-black trees are also comparable in speed to AVL trees.
http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree
A self-balancing (height-balanced) binary search tree is a binary search tree that automatically attempts to keep its height, i.e. the number of levels of nodes beneath the root, as small as possible at all times.
The critical property of red-black trees is that the longest path from the root to a leaf is no more than twice as long as the shortest path from the root to a leaf in that tree. The result is that the tree is roughly balanced. Since operations such as inserting, deleting, and finding values require worst-case time proportional to the height of the tree, this upper bound on the height allows red-black trees to be efficient in the worst case, unlike ordinary binary search trees.
A splay tree is a self-balancing binary search tree with the additional property that recently accessed elements are quick to access again
Example: list of alphabetically ordered names. Instead of using only a single pointer to the next element, you could create a skip list by adding to each element a pointer to the first element of the next letter in the alphabet:
struct SkiplistNode
{
char * name;
SkiplistNode * nextName;
SkiplistNode * nextLetter;
};
A heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B). This implies that an element with the greatest key is always in the root node, and so such a heap is sometimes called a max-heap.
http://marknelson.us/2007/08/01/memoization/
http://burtleburtle.net/bob/hash/doobs.html
http://www.azillionmonkeys.com/qed/hash.html
http://en.wikipedia.org/wiki/Universal_hashing
http://www.gnu.org/software/gperf/
http://bretm.home.comcast.net/~bretm/hash
The hash table data structure is based on the idea of using table lookup to speed up an arbitrary mapping. For our purposes, we are interested in mapping strings into integers. We cannot use strings directly as indices into an array. However, we can define an auxiliary function that converts strings into such indices. Such a function is called a hash function. Thus we could imagine a two-step process to map a string into an integer: for a given string calculate the hash function and then use the result to access an array that contains the pre-computed value of the mapping at that offset.
http://enfranchisedmind.com/blog/2008/02/25/problems-with-hash-tables/
The birthday paradox illustrates how quickly you can run into collision problems for small hash values.
Birthdays are effectively a tiny 365-value hash function. With such a small hash range, there's a 50% chance of two people sharing the same birthday after a mere 23 people. With the 30 students in our hypothetical classroom, the odds of two students sharing a birthday rise to 70%.
With 23 people there are 23*22/2 = 253 unique pairs to compare. The chance of one pair not matching is 364/365. The chance of every pair not matching (i.e. no shared birthday) is (364/365)^253 = 0.4995.
So with 23 people there's about a 50% chance of at least one collision.
A rule of thumb for estimating the number of values you need to enter in a hash table before you have a 50 percent chance of an existing collision is to take the square root of 1.4 times the number of possible hash values:
SQRT(1.4 * 365) ≈ 23
SQRT(1.4 * 2^32) ≈ 77,543
So when using a 32-bit hash value, we have a 50% chance of a collision after about 77,000 entries, a pretty far cry from the 4 billion possible values we could store in that 32-bit value.
That is, at that point the probability of a collision on inserting one more entry (what one might think of as the collision rate) is only about 0.002%, yet there is a 50% probability that somewhere within those 77k entries at least one collision has already occurred.
A hash function h(x) maps an N-bit string x to an M-bit string, where M is smaller than N.
Variable string exclusive-or method to create a hash h in the range 0...65535 from a string x.
Author: Thomas Niemann.
unsigned char Rand8[256];  // contains a random permutation of 0..255

int Hash(char *x)
{                            // x points to the first character
    int h;
    unsigned char h1, h2;
    if (*x == 0) return 0;   // special handling of the empty string
    h1 = *x;
    h2 = *x + 1;             // initialize the two hashes differently
    x++;                     // proceed to the next character
    while (*x)
    {
        h1 = Rand8[h1 ^ *x]; // exclusive-or each hash with the character
        h2 = Rand8[h2 ^ *x]; // and put it through the randomizer
        x++;
    }                        // end of string is reached when *x == 0
    // shift h1 left 8 bits and add h2
    h = ((int)(h1) << 8) | (int)h2;
    return h;                // hash is the concatenation of h1 and h2
}
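Rand8 must be filled with a random permutation of 0..255 before Hash is called; one possible initialization (my sketch, assuming it is linked with the definitions above):

#include <algorithm>
#include <cstdio>
#include <numeric>
#include <random>

extern unsigned char Rand8[256];  // defined with Hash() above
int Hash(char *x);

int main()
{
    std::iota(Rand8, Rand8 + 256, 0);                   // 0, 1, ..., 255
    std::shuffle(Rand8, Rand8 + 256, std::mt19937(7));  // random permutation
    char word[] = "hello";
    std::printf("Hash(\"hello\") = %d\n", Hash(word));
    return 0;
}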
Collision handling: linear probing, quadratic probing, exponential search, rehashing, double hashing, chaining using linked lists or trees.
For n available slots and m occupying items,
Prob. of no collision on first insertion = 1
Prob. of no collision on 2nd insertion = ( n - 1 ) / n
Prob. of no collision on 3rd insertion = ( n - 2 ) / n
...
Prob. of no collision on mth insertion = ( n - ( m - 1 ) ) / n
So the probability of no collision on ANY of m insertions is the product of the above values, which works out to
( n - 1 )! / ( ( n - m )! * n^( m - 1 ) )
and conversely, the likelihood of at least one collision is just 1 minus the above value.
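A quick numeric check of this formula for the birthday case n = 365, m = 23 (illustration only):

#include <cstdio>

int main()
{
    const int n = 365, m = 23;
    double pNoCollision = 1.0;
    for (int i = 1; i < m; i++)              // the i-th insertion sees i occupied slots
        pNoCollision *= double(n - i) / n;
    std::printf("P(at least one collision) = %.4f\n", 1.0 - pNoCollision);
    // prints about 0.5073 for 23 birthdays, consistent with the ~50% above
    return 0;
}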
http://www.eternallyconfuzzled.com/arts/jsw_art_rand.aspx
srand((unsigned int)time(NULL));
int r = M + rand() / (RAND_MAX / (N - M) + 1);  // roughly uniform integer in [M, N); uses high-order bits, unlike rand() % (N - M)
http://en.wikipedia.org/wiki/Inverse_transform_sampling
http://cg.scs.carleton.ca/~luc/handbooksimulation1.pdf Luc Devroye
http://www.ie.boun.edu.tr/~hormannw/vortrag.pdf
The inversion method. For a univariate random variable, the inversion method is theoretically applicable: given the distribution function F and its inverse F^-1, we generate a random variate X with that distribution as F^-1(U), where U is a uniform [0, 1] random variable.
This is the method of choice when the inverse is readily computable. For example, a standard exponential random variable (which has density e^-x, x > 0) can be generated as log(1/U). The following table gives some further examples.
Table 1: Some densities with distribution functions that are explicitly invertible.
Note: U is a uniform random variate between 0 and 1.
The fact that there is a monotone relationship between U and X has interesting benefits in the areas of coupling and variance reduction. In simulation, one sometimes requires two random variates from the same distribution that are maximally anti-correlated. This can be achieved by generating the pair (F^-1(U), F^-1(1 - U)). In coupling, one has two distribution functions F and G, and a pair of (possibly dependent) random variates (X, Y) with these marginal distribution functions is needed so that some appropriate metric measuring the distance between them is minimized. For example, the Wasserstein metric d2(F, G) is the minimal value over all couplings of (X, Y) of sqrt(E{(X - Y)^2}). That minimal coupling occurs when (X, Y) = (F^-1(U), G^-1(U)) (see, e.g., Rachev, 1991). Finally, if we wish to simulate the maximum M of n i.i.d. random variables, each distributed as X = F^-1(U), then noting that the maximum of n i.i.d. uniform [0, 1] random variables is distributed as U^(1/n), we see that M can be simulated as F^-1(U^(1/n)).
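A sketch of the inversion method for the exponential example, using F^-1(u) = -log(1 - u) (equivalent in distribution to log(1/U), but safe for u = 0):

#include <cmath>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> U(0.0, 1.0);  // U in [0, 1)
    const int n = 100000;
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += -std::log(1.0 - U(gen));  // Exp(1) variate via inversion
    std::printf("sample mean = %.3f (expected 1.0)\n", sum / n);
    return 0;
}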
Deleting from a vector in linear time, in place
How to remove, in linear time, the elements of a vector that satisfy some property, working within the same vector:
Run two pointers over the array in a single loop, "collapsing" the unwanted elements. One pointer advances automatically and points at the element currently being examined. The other points at the cell where the next kept element should be written; it is advanced manually after each write. After the pass, the second pointer points at the first cell left unused; the elements from there to the end must be removed.
This is exactly what vector erase combined with remove(_if) does.
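A sketch of the erase-remove idiom implementing exactly this two-pointer pass (std::remove_if plays the two pointers; erase trims the tail):

#include <algorithm>
#include <vector>

int main()
{
    std::vector<int> v = {1, 2, 3, 4, 5, 6};
    // remove_if compacts the kept elements to the front and returns the new
    // logical end; erase then removes the leftover tail in one shot.
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](int x) { return x % 2 == 0; }),
            v.end());
    // v is now {1, 3, 5}
    return 0;
}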