Author: Dr. Noman Islam
Introduction
Computer applications help their users perform tasks. For certain applications, maintaining good performance is a critical requirement. Most scientific computing applications fall into this category, and many desktop applications, including games, embedded computing, and voice and video applications, demand very good performance. Performance analysis is therefore a crucial requirement for these applications. Different tools have been developed over the past few years to help users analyze the performance of an application; their goal is to understand the behavior of an application on a given platform. This article discusses one such tool, VTune Amplifier, a performance profiler for serial and parallel performance analysis.
VTune amplifier
VTune Amplifier is a premier performance profiler developed by Intel. It is available for the C, C++, C#, Fortran, Assembly and Java languages. The application assists in various kinds of code profiling, including stack sampling, thread profiling and hardware event sampling. The profiler can report the time spent in each subroutine of the application, and users can drill down further to the instruction level. The time taken by an instruction, for instance, is an indication of possible stalls in the pipeline during its execution. In addition, the tool can also be used to analyze thread performance. The event-based sampling feature provides an accurate representation of the software's actual performance with little impact on program execution. In addition, call graph profiling offers a pictorial view of program flow to help quickly identify critical functions and call sequences. This helps in gaining a high-level, algorithmic view of program execution.
Features
VTune Amplifier helps in identifying time-consuming functions, sections of code that make non-optimal use of processor time, and code portions to target for sequential and threaded performance optimization. Synchronization objects affecting application performance can also be located, and one can find whether, where, and why an application spends time on input/output operations. Users can identify and compare the performance impact of different synchronization methods, numbers of threads or algorithms on an application. They can also analyze thread activity and transitions in detail, and locate hardware-related bottlenecks in code. In VTune Amplifier, one can perform three types of analysis: algorithm analysis, hardware-level analysis and power analysis. The next sections discuss them in detail.
Algorithm analysis
Algorithm analysis is performed for tuning different types of software. The analysis is performed on data collected while the corresponding component runs, and it is used to spot issues in the algorithm being analyzed. Algorithm analysis involves interrupting the processor at a specified sampling interval to collect samples of instruction addresses, thereby identifying the functions that take most of the CPU time. Analysis can also be performed to see how well the application is threaded for a particular number of CPUs. Synchronization issues arising from locks and waits can also be identified during algorithm analysis.
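The sampling idea described above can be illustrated with a toy profiler. The sketch below is not how VTune is implemented (it relies on the CPython-specific `sys._current_frames`, and the function names are invented for the demo); it only shows the principle: record which function is executing at regular intervals, and the sample counts approximate where CPU time goes.

```python
# Toy statistical sampler: a background thread periodically records which
# function the main thread is executing; sample counts approximate CPU time.
import sys
import threading
import time
from collections import Counter

samples = Counter()

def sampler(target_tid, interval=0.002, duration=0.6):
    # Wake up every `interval` seconds and note the target thread's
    # currently executing function.
    end = time.time() + duration
    while time.time() < end:
        frame = sys._current_frames().get(target_tid)
        if frame is not None:
            samples[frame.f_code.co_name] += 1
        time.sleep(interval)

def hot_function():
    # Deliberately expensive: should attract most of the samples.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def cold_function():
    # Comparatively cheap.
    return sum(range(100))

main_tid = threading.get_ident()
t = threading.Thread(target=sampler, args=(main_tid,))
t.start()
deadline = time.time() + 0.5
while time.time() < deadline:
    hot_function()
    cold_function()
t.join()
print(samples.most_common(3))   # hot_function should dominate
```

A real profiler does the same bookkeeping with hardware timer interrupts and instruction addresses instead of Python frames, which is why its overhead can stay low.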
Advanced hardware-level analysis
The advanced hardware-level analysis is targeted at particular Intel processors and micro-architectures, and the feature is available for different types of architecture. Using it, one can identify the most significant hardware issues affecting the performance of the application, bandwidth issues, and bandwidth breakdown issues. It also helps in understanding how shared-memory cache-line contention, lock contention, and true and false sharing affect performance. One can also analyze performance issues in the core pipeline and in memory access, such as the performance impact of accessing too many memory pages or of load-driven misses in the L1 data cache and L2 cache.
Power analysis
Power analysis is based on user-mode sampling and tracing data collection. It assists in identifying behaviors that may cause unnecessary power consumption. Power analysis includes analysis of CPU sleep states, which detects when and what causes the hardware to wake up from a sleep state; this analysis uses a driver for collection and can provide specific source code references for the timers that wake up the hardware. Another type of analysis, CPU frequency analysis, explores processor frequency changes.
Conclusion
This article briefly discusses the VTune Amplifier tool. There are various other code analysis tools that one can try. For instance, Intel Parallel Inspector adds memory and thread checking to Microsoft Visual Studio, while Intel Parallel Advisor helps Microsoft Visual Studio C++ developers experiment with parallel application design and implementation by providing step-by-step proposals. Similarly, AMD CodeAnalyst is a GUI-based code profiler for x86 and x86-64 machines.
Tip
VTune Amplifier also provides a command-line interface for code analysis. To use VTune from the command line, first load the vtune module and then invoke the collector with the analysis type of interest:

module load vtune
amplxe-cl -collect $analysis_type -result-dir $yourprof_dir -- myApplication

Here, $analysis_type is the type of analysis the user chooses for profiling a particular processor subsystem, and $yourprof_dir is the directory in which the results are stored.
Author: Dr. Noman Islam
Introduction
In today’s world, the internet has become an essential component of daily life. It connects millions of computers, devices and users throughout the world, and most modern applications, including voice, video, telephony, social networking and cloud computing, are based on it. The performance of these applications depends on optimal utilization of the underlying network, so it is essential to employ networking tools to monitor and analyze the network. Over the last few years, different performance monitoring tools have emerged to investigate network performance metrics. This article introduces its readers to bing, a bandwidth measurement tool, discussing the tool’s usage, its available options and its internal working.
What is bing all about?
Bing is a network utility written by Pierre Beyssac. It enables measurement of bandwidth between two computers on a network. Unlike other tools, bing measures the real throughput between two computers that are remote from each other. So, if a link is saturated and shared among multiple users, and one user is getting only a few Kbps out of the link, bing will still be able to determine whether it is a 56 Kbps link or a 1 Mbps connection. Of course, it generates some traffic on the network by sending ICMP requests. Hence, this tool is intended for use only during network analysis and management; because of the additional load it puts on the network, it is not advisable to use bing during normal operations.
Installing bing on your computer
As bing is not available by default on Linux, it can be downloaded from http://fgouget.free.fr/bing/ping_src-0.1.4.zip. The next step is to run the makefile, a script that describes the source code files, their inter-dependencies, the compiler arguments, and the target output settings of a piece of software. Because bing needs to create raw ICMP packets, root permissions are required to install it. Type the following on the terminal:

$ make
$ su root
# make install

That’s it; bing should now be installed on the system.
Using bing to measure bandwidth
Since bing measures point-to-point bandwidth, traceroute should be run first to determine the intermediate hops along a path. traceroute is a network diagnostic tool that determines the route to a host and also measures the transmission delays of packets across the network. After the links along the path are determined, run bing by specifying the near and far ends of the link on the terminal:

bing -e10 -c1 IP1 IP2

where IP1 and IP2 specify the nearest and farthest endpoints of the link. Bing will start outputting statistics about round-trip times, packet loss and throughput estimates, including estimated and averaged throughput and minimum and average delay per packet.
bing options
Bing provides a comprehensive list of options to control packet sizes, counts, debugging, name resolution, routing, wait intervals, fill patterns and verbosity. The option -D, for instance, displays the measured throughput on every received packet. The option -i wait can be used to specify the number of seconds to wait for each ECHO_REPLY packet. Similarly, the options -s and -S can be used to specify the number of data bytes to be sent, and -v prints verbose output on the screen.
How does bing work?
Bing uses the Internet Control Message Protocol (ICMP) for its operation. ICMP is a protocol used for diagnostic and control purposes; it is also used by the Internet Protocol (IP) to generate messages in response to errors that occur during IP operations. Among the various ICMP messages is the Echo request, which is sent to a target host, and the host replies with a response packet. Bing transmits ICMP Echo request packets of various sizes to the end points and observes the resulting change in round-trip time (RTT). As the RTT varies between measurements, bing takes multiple measurements and then chooses the minimum RTT for each host and packet size.
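The arithmetic behind this technique can be sketched in a few lines. The probe sizes and RTT values below are made up for illustration; the idea, which is the essence of what bing does, is that the extra bytes of the larger probe cross the link twice (request and reply), so the RTT difference between the best small-probe and best large-probe measurements reveals the raw bit rate of the link.

```python
# Simplified version of bing's estimate: derive link bandwidth from the
# minimum RTTs observed for two different probe sizes. Values are invented.
small_size = 44      # bytes in the small probe
large_size = 108     # bytes in the large probe
rtt_small = 0.0120   # best-of-n RTT for the small probe, in seconds
rtt_large = 0.0138   # best-of-n RTT for the large probe, in seconds

# The extra (large - small) bytes traverse the link twice per exchange.
extra_bits = (large_size - small_size) * 8 * 2
bandwidth_bps = extra_bits / (rtt_large - rtt_small)
print(round(bandwidth_bps))   # roughly 568889 bits per second
```

Using the minimum RTT of many probes, rather than the average, filters out queuing delay, which only ever adds to the RTT.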
Conclusion
This article introduces the reader to the bing network monitoring tool. Besides bing, there are a number of network monitoring and analysis tools. For instance, netperf is a throughput performance measurement tool, while iperf can also be used to test the UDP bandwidth, loss and jitter of a network. Other bandwidth measurement tools include ping and pathchar. Since this article only covers the basic options of bing, interested readers can consult the bing manual for advanced options at http://fgouget.free.fr/bing/bing_src-man.shtml.
Tip
Bing is a point-to-point bandwidth measurement tool based on ping. The ping utility is commonly used to test the reachability of a host on an Internet Protocol (IP) network. In addition, ping can measure the round-trip time for messages sent from the originating host to a destination computer. To use ping, type “ping IP” on the terminal. If the host can be reached, the TTL, round-trip time and other statistics will be displayed.
Author: Dr. Noman Islam
Introduction
Over the past few years, tremendous advancements have been observed in the domain of networking, and most computing in today’s world depends on the internet. Despite these advancements, errors in data transmission are still a normal phenomenon. For instance, if a file is being downloaded from the internet, the network connection can be lost or the machine can fail suddenly, and during the transfer of a large file from one disk to another, a disk failure can occur. These errors are usually very small, but a single bit error can have serious consequences: it not only affects the quality of the data, but can often render it useless. Checksums are one of the most widely used mechanisms to verify the integrity of data. This article introduces the reader to the fundamentals of checksums and the various types of checksum algorithms currently available. It also discusses the ‘cksum’ command, a utility used to calculate the checksum of a block of data on the Linux platform.
Checksum
A checksum is a hash or small-size datum computed over a block of data. It is used for detecting errors that might occur during data transmission. To calculate a checksum, an algorithm is applied to the binary values of the packet; the result is stored with the data and transmitted. When the data is later retrieved at the other end, the checksum is calculated again and compared with the stored checksum. If the two values don’t match, an error has occurred during transmission. Checksums have been used extensively in data communication. For example, the Internet Protocol uses a checksum of the IP header for detecting errors. Similarly, Symantec uses MD5 hashing, which is a form of checksum. While booting from a CD in Linux, an option is given to test the integrity of the distribution; this integrity testing is based on checksums. Finally, the digital signatures used to provide security over the internet are also based on checksums.
Checksum algorithms
The procedure used to compute a checksum over a piece of data is called a checksum algorithm. One of the simplest approaches is sum-of-bytes, which works by summing the bytes of the message. The drawback of this approach is that no error will be detected if the entire message, data included, is received as a string of all zeros. A similar approach is to add the bytes of the message as unsigned binary numbers, discarding overflow bits, and then append the two’s complement of the total to the message as the checksum. At the receiver, all the bytes are added in the same manner, including the checksum; if the data was transmitted properly, the result is zero. The approaches discussed above can only detect simple types of transmission errors. They can’t detect errors such as reordering of bytes, insertion or deletion of zero-valued bytes, or multiple errors that cancel each other out. Advanced algorithms such as Fletcher's checksum, Adler-32 and cyclic redundancy checks (CRCs) have been proposed for detecting these errors. These algorithms address the weaknesses by considering the position of each byte in the sequence as well as its value. The next section discusses CRC in detail.
Cyclic redundancy check (CRC)
A cyclic redundancy check (CRC) uses polynomial division to compute checksums, typically 16 or 32 bits in length. CRCs are more accurate than the simpler algorithms: if even a single bit is incorrect, the CRC value will not match. The popularity of CRCs lies in their ease of implementation in hardware, their amenability to mathematical analysis, and their accuracy in detecting common errors caused by noise in transmission channels. To compute a CRC, the sender applies a 16- or 32-bit polynomial to a block of data and appends the resulting CRC to the data. The receiving end applies the same polynomial to the data and compares its result with the value appended by the sender. If both values match, the data has been received successfully; if there is an error, the sender can be asked to resend the block of data.
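As an illustration, here is a bit-at-a-time sketch of CRC-32, the 32-bit CRC used by Ethernet (cksum uses a related CRC with different processing details), checked against Python's built-in implementation:

```python
# Bitwise CRC-32 (reflected form, polynomial 0xEDB88320), verified
# against the standard-library zlib.crc32.
import zlib

def crc32_bitwise(data: bytes) -> int:
    crc = 0xFFFFFFFF                        # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):                  # one bit at a time
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                 # final XOR

data = b"123456789"
print(hex(crc32_bitwise(data)))             # 0xcbf43926, the CRC-32 check value
assert crc32_bitwise(data) == zlib.crc32(data)
```

Real implementations replace the inner bit loop with a 256-entry lookup table, and hardware versions are just a shift register with a few XOR gates, which is exactly the ease-of-implementation advantage mentioned above.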
Checksum in Linux
Linux has a checksum utility ‘cksum’ that reads the files specified as arguments and calculates a 32-bit checksum and the byte count for each file. If no files are specified, it accepts data as input from the terminal. The output shown on the terminal is the computed checksum, the number of bytes and the file name. The cksum command uses a CRC algorithm based on the Ethernet standard frame check. To use this utility, type ‘% cksum filename’ on the terminal. To display the checksums and the sizes in bytes of two files, type ‘% cksum filename1 filename2’.
md5
Message-Digest algorithm 5 (MD5) is an algorithm used to verify data integrity through the creation of a 128-bit message digest from data input. It was developed by Professor Ronald L. Rivest of MIT and is commonly used in digital signatures. Linux has an md5sum utility that calculates a message digest, or checksum, based on the MD5 algorithm. To use md5sum, type ‘% md5sum filename’ on the terminal. It will print a 128-bit fingerprint string (ece4cb124ce4099f9c4e46f948b64474, for instance). Many open-source products publish an MD5 checksum string on their official sites to ensure that the right product is provided by the original developer. To verify the integrity of a product, the MD5 checksum computed on the local system can be compared with the string available on the website.
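The same digest md5sum prints can be computed programmatically with Python's standard library, which is handy when scripting a download check. The "published" value below is simulated for the demo; in practice it would be copied from the project's site.

```python
# Compute an MD5 digest and compare it against a published checksum string.
import hashlib

digest = hashlib.md5(b"hello world\n").hexdigest()
print(digest)                 # 32 hex characters, same string md5sum prints

published = digest            # stand-in for a checksum copied from a website
print(digest == published)    # True means the data matches the published sum
```

For large files, the digest is normally built incrementally by calling `update()` on successive chunks rather than reading the whole file into memory.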
Conclusion
This article introduces its reader to the concept of checksums. Different implementations of checksums have been proposed; an ideal checksum algorithm produces a significantly different value even for small changes to the input. Linux provides several powerful administrative tools and utilities that help in managing systems effectively, and this article discusses two of them: cksum and md5sum. Besides these, different implementations of the secure hashing algorithm (SHA) are also available as Linux utilities, and sum is a utility that calculates the checksum of the files specified as arguments and also outputs the number of blocks they take on disk. Various checksum computation utilities are also available for different languages and platforms: Bitser is a free Microsoft Windows application that calculates MD5, SHA-1 and SHA-256 sums, and Jcksum is a Java library that developers can use to compute checksums with different algorithms.
Tip
Check digits and parity bits are special cases of checksums. They are suitable for small blocks of data such as SSNs, bank account numbers and other personal information. There are also some error-correcting codes based on checksums that allow the original data to be recovered in case of errors.
Author: Dr. Noman Islam
Introduction
Operating systems normally ship with basic utilities for file manipulation, directory management, games, organizers and so on. A calculator is also among the most widely used utilities supplied with any operating system. Fortunately, the Linux operating system is equipped with a desk calculator, commonly known as dc. Most Linux users prefer to do their work from the command line, and the dc utility serves this purpose very well by providing a simple interface for performing mathematical calculations. One of the advantages of calculating on the command line is that the history of all the calculations remains visible on the console. So, if someone is performing a complex calculation, he can easily consult the console to ensure he has typed the numbers correctly and in the right order. In this article, a quick introduction to dc and its various features is provided.
Introduction to dc
dc is a simple, cross-platform, command-line desk calculator for Linux based on reverse Polish notation. In reverse Polish notation (also called postfix notation), the operands are specified first, followed by the operator. So, to multiply two numbers a and b, the expression a b * is used. dc is equipped with a powerful set of features, including support for standard operators, macros, conditions and loops. Another distinctive feature of dc is its unlimited-precision arithmetic. Running dc is very simple: open the Linux terminal and type dc. A set of optional command-line arguments can also be specified; these are normally filenames from which dc reads a batch of instructions for execution. However, dc normally reads from standard input, and its output usually goes to standard output. After starting dc, one can start doing calculations. To multiply two numbers, type the line 4 5 *p q, which pushes the operands 4 and 5 onto the stack, multiplies (*) the top two elements of the stack, prints (p) the result, and finally quits (q) the current dc instance.
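The stack discipline dc follows can be mimicked in a few lines of Python, which may make the notation easier to internalize. This is a sketch of the evaluation idea only, not of dc itself (it ignores precision, registers and the rest of dc's features):

```python
# Minimal evaluator for postfix (reverse Polish) expressions, mirroring
# how dc handles "4 5 *": operands are pushed, an operator pops two
# values and pushes the result.
def rpn_eval(expression: str) -> float:
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b,
           "/": lambda a, b: a / b}
    stack = []
    for tok in expression.split():
        if tok in ops:
            b = stack.pop()              # top of stack is the right operand
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack[-1]

print(rpn_eval("4 5 *"))      # 20.0, like "4 5 * p" in dc
print(rpn_eval("6 2 - 3 *"))  # (6 - 2) * 3 = 12.0
```

Note that postfix needs no parentheses: the order of the operands and operators fully determines the computation, which is what makes the stack implementation so simple.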
Basic operations
dc supports all the classic mathematical operations: addition (+), subtraction (-), division (/), multiplication (*), remainder (%), exponentiation (^) and square root (v). To divide two numbers, one writes 4 3 / p. By default this outputs 1, since dc starts with a precision of zero decimal places; after raising the precision with the k command (5 k, for instance), the same sequence outputs 1.33333. dc prints output in base 10 by default. The output base can be changed with the o command, and the input base with the i command. For instance, 29 16 o p will print 1D, the value of 29 in hexadecimal.
Registers
All good calculators are equipped with an option for memorizing values, and dc provides registers for this purpose. A register is a storage location referenced by a single character. Users can store numerical and string values in a register and retrieve them later using dc instructions; dc supports 256 registers. To store the top of the stack in register a, write sa. Similarly, lc pushes the value of register c onto the top of the stack. A register can contain more than just a value: in fact, each register is a stack of its own, so values can be pushed onto and popped from a register. For instance, 12 Sa pushes the value 12 onto register a's stack.
Macros
Macros are sets of instructions that can be saved and invoked more than once. A macro is written as a string containing the operations to be performed and saved in a register. For instance, [d *] sm saves a macro that squares a number: d duplicates the top of the stack and * multiplies the two copies. To execute a macro, the x command is used, so the line 3 lm x p loads the macro, runs it on the value 3 and prints 9.
Conditions and loops
dc also provides the option to specify conditional statements. The command >r executes the macro in register r if the top of the stack is greater than the second element of the stack. So, [[Greater]p] sR 6 7 >R will print Greater. Among the conditionals available in dc are '>', '!>', '<', '!<' and '!='. Similar to conditions, looping can be done by defining a macro that recursively invokes itself. For instance, 5 [d1-d1<F*]dsFxp prints 120, the factorial of 5, computed recursively.
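The dense dc factorial macro above is easier to follow alongside the same recursion written out in Python. This is only a comparison sketch: each recursive call plays the role of the macro re-invoking itself while the counter stays above 1.

```python
# The recursion performed by the dc macro [d1-d1<F*], spelled out:
# keep multiplying by (n - 1) until the counter reaches 1.
def factorial(n: int) -> int:
    if n <= 1:                       # where the dc macro stops recursing
        return 1
    return n * factorial(n - 1)      # the trailing * in the macro

print(factorial(5))   # 120
```

In dc the "call stack" of the recursion is simply the operand stack itself: each macro invocation leaves one counter value behind, and the chain of trailing * operations multiplies them all together on the way back out.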
Conclusion
This tutorial provides a brief introduction to the Linux desk calculator, but it has not covered every feature of dc. For instance, arrays have not been discussed and looping has only been touched on briefly. For a comprehensive overview of the features of dc, consult the dc manual.
Tip
Besides dc, Linux also has ‘bc’, a basic calculator that performs integer calculations by default. Unlike dc, bc uses infix notation for its operation. Modern implementations of dc use bc’s library for arithmetic operations.
Author: Dr. Noman Islam
Introduction
In Linux, users often see a list of files in the /dev folder. Most of these files have names similar to those of the devices attached to the system. They are called device files, as they enable interfacing with the corresponding devices. Depending on the kernel, the installed features, and the hardware present, a user will see a different list of files. Among these devices is the null device (/dev/null), which is the topic of this article. The null device is one of the various pseudo-devices available in Linux; it is used by developers to redirect output of a program that they don’t want to see on the screen.
Linux devices
One of the distinctive features of Linux is that it treats every device as an ordinary file. A device file provides an interface to a peripheral device connected to the system. In addition, device files also provide a convenient and uniform approach to accessing resources other than physical devices: for instance, /dev/random is used to get a stream of random numbers. To see the list of major devices on your system, type the command lsdev on your terminal. Linux classifies devices into two types: character devices and block devices. Character devices process one character at a time, while block devices process blocks of characters. Examples of character devices are the keyboard, the mouse and serial modems; CD-ROMs and hard disks are examples of common block devices.
What is a null device?
However, there are a few devices in Linux that do not correspond to actual physical devices. These are called pseudo-devices; examples are /dev/null, /dev/full and /dev/random. The null device is a special pseudo-device in Linux that discards all data written to it and produces no output. It essentially serves as a data sink of unlimited capacity. All data written to /dev/null is cast off, yet the write reports success, and reads from /dev/null return end-of-file (EOF) immediately.
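Both behaviors are easy to observe from a program. The sketch below assumes a Unix-like system (Python's `os.devnull` resolves to /dev/null there):

```python
# Writes to the null device succeed and are discarded; reads hit EOF at once.
import os

with open(os.devnull, "wb") as sink:
    written = sink.write(b"x" * 1024)   # accepted, then thrown away
print(written)                          # 1024 -- the write "succeeded"

with open(os.devnull, "rb") as source:
    data = source.read()
print(data)                             # b'' -- immediate end-of-file
```

The empty read is what makes /dev/null useful as a stand-in empty input file: a program that reads from it behaves exactly as if it had been given a zero-byte file.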
When to use null device?
One can think of the null device as a recycle bin or a doorway to nowhere. It provides programmers with a handy way to funnel away output that they do not want to see on the screen or anywhere else. It is not normally used by ordinary users; rather, it is used during the testing and debugging of programs. In addition, the null device is also used as an empty file for input streams. Intruders sometimes redirect logging to /dev/null while rooting a machine. To use the null device, one needs an understanding of redirection, so the next section provides a brief introduction to it.
Standard in, out, and error
There are three basic file descriptors available in Linux: stdin, stdout and stderr. stdin refers to standard input, stdout to standard output and stderr to standard error. Besides their names, they can also be referred to by the numbers 0, 1 and 2 respectively. Normally, standard input is connected to the keyboard, while standard output and standard error refer to the terminal screen. However, one can redirect stdout and stderr to a file, stdout to stderr, and vice versa, using the standard redirection operators. The greater-than sign (>) redirects a program’s output to another location, and the less-than sign (<) is used for redirection of input. The format of a command with standard input and output redirection is % program -[options] [arguments] < inputfile > outputfile. For instance, ls -l > ls-l.txt will cause the listing of the files in the current directory to be written to a text file.
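The same redirections can be performed from inside a program rather than by the shell. The sketch below uses Python's subprocess module on a Unix-like system with an echo binary on the PATH; the file name is invented for the demo:

```python
# Program-level equivalents of "program > /dev/null" and "program > file".
import os
import subprocess
import tempfile

# Discard a command's output entirely, like "> /dev/null":
result = subprocess.run(["echo", "noisy output"], stdout=subprocess.DEVNULL)
print(result.returncode)   # 0 -- the command ran; its output went nowhere

# Send the output to a file instead, like "> out.txt":
path = os.path.join(tempfile.gettempdir(), "redirect-demo.txt")
with open(path, "w") as f:
    subprocess.run(["echo", "kept output"], stdout=f)
with open(path) as f:
    captured = f.read().strip()
print(captured)            # kept output
```

Under the hood both the shell and subprocess do the same thing: they replace file descriptor 1 of the child process before it starts, which is why the child needs no knowledge of where its output is going.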
Redirecting a program’s output to null device
To redirect a program's output to the null device, type “program > /dev/null” at the terminal. To redirect the error stream to the null device as well, the following command can be used: “program > /dev/null 2>&1”. Here 2 refers to the stderr stream and 1 refers to the stdout stream, so the command redirects stdout to /dev/null and then redirects stderr to wherever stdout is pointing.
Conclusion
This short tutorial provides an overview of the null device in Linux. Besides the null device, there are many other devices available in the /dev folder. For example, /dev/block contains block devices such as hard drives and any other devices that handle data in blocks, /dev/bus is used for devices on the bus system, and /dev/console is the device file for the display device. As a final note, most device files are accessible only by the root user; only commonly used files (the CD-ROM, standard input, standard output and so on) are accessible to ordinary users. This avoids the security issues that could arise if malicious programs had access to critical devices in the /dev folder.
Tip
In programmers’ jargon, the null device is called the bit bucket or black hole, and it also appears in jargon expressions. The iPhone Dev Team commonly uses the phrase "send donations to /dev/null" to mean, simply, that donations are not accepted.