This page collects assorted insights that appear in my lectures, seminars, and workshops. They are assembled here in no particular order, for my convenience and your reference.
Some of the sections below are very practical and contain unique knowledge, whereas others are largely lecture-type material oriented to audiences with a beginner's level of expertise. If you grab essential portions of this material, acknowledgments in your PPTs are expected, and you are also encouraged to cite my academic papers & monographs.
Molecular Dynamics versus Monte Carlo Algorithms for Molecular R&D
Molecular dynamics (MD) and Monte Carlo (MC) are two widely used computational methods for simulating the behavior of molecules and materials. While both methods aim to provide insights into the microscopic world, they employ distinct approaches and offer different advantages and limitations.
MD simulations track the motion of individual atoms or molecules over time by numerically solving Newton's equations of motion. This approach provides a detailed, time-resolved picture of the system's evolution, allowing for the study of dynamic processes and the calculation of transport properties. In contrast, MC simulations rely on random sampling to explore the configurational space of the system. By generating a series of random moves and accepting or rejecting them based on specific criteria, MC methods can efficiently sample the equilibrium distribution of the system. This approach is particularly well-suited for calculating thermodynamic properties and studying systems at equilibrium.
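The "specific criteria" mentioned above are most commonly the Metropolis acceptance rule. Below is a minimal sketch in pure Python, assuming a toy one-dimensional energy function U(x) = x^2; all names and parameters are illustrative, not a production sampler:

```python
import math
import random

def metropolis_step(x, energy, beta=1.0, step=0.5, rng=random):
    """Propose a random displacement and accept or reject it
    according to the Metropolis criterion."""
    x_new = x + rng.uniform(-step, step)
    dE = energy(x_new) - energy(x)
    # Downhill moves are always accepted; uphill moves are accepted
    # with probability exp(-beta * dE).
    if dE <= 0 or rng.random() < math.exp(-beta * dE):
        return x_new
    return x

def sample(n_steps, energy, x0=0.0):
    """Generate a Markov chain of configurations for the given energy."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x = metropolis_step(x, energy)
        samples.append(x)
    return samples

# Toy example: sample a particle in a harmonic well U(x) = x^2.
random.seed(42)
positions = sample(10000, lambda x: x * x)
```

At beta = 1 the chain samples the Boltzmann distribution exp(-x^2), so the sampled positions should scatter symmetrically around zero, illustrating how MC reaches equilibrium averages without any notion of time.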
One key difference between MD and MC lies in their treatment of time. MD simulations explicitly follow the time evolution of the system, providing information about the dynamics of molecular processes. MC simulations, on the other hand, do not have an inherent concept of time and are primarily focused on sampling the equilibrium states of the system. Another important distinction is the nature of the information obtained from each method. MD simulations provide detailed trajectories of individual particles, allowing for the calculation of time-dependent properties such as diffusion coefficients and vibrational frequencies. MC simulations, while not providing explicit dynamical information, can efficiently calculate thermodynamic properties such as free energy and entropy.
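The MD side of this comparison can be illustrated with the velocity Verlet scheme, a standard integrator for Newton's equations. The sketch below integrates a toy one-dimensional harmonic oscillator in reduced units; it is a didactic fragment, not a production integrator:

```python
def velocity_verlet(x, v, force, dt, n_steps, mass=1.0):
    """Integrate Newton's equations with the velocity Verlet scheme.
    Returns the trajectory of positions."""
    traj = [x]
    f = force(x)
    for _ in range(n_steps):
        # Half-kick, drift, recompute force, half-kick.
        v += 0.5 * dt * f / mass
        x += dt * v
        f = force(x)
        v += 0.5 * dt * f / mass
        traj.append(x)
    return traj

# Toy system: harmonic oscillator U(x) = 0.5 * k * x^2, so F(x) = -k * x.
k = 1.0
traj = velocity_verlet(x=1.0, v=0.0, force=lambda x: -k * x,
                       dt=0.01, n_steps=1000)
```

Because the scheme is symplectic, the amplitude of the oscillation stays essentially constant over the run, which is exactly the time-resolved information that MC, by construction, does not provide.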
The choice between MD and MC depends on the specific goals of the simulation. If the research question involves understanding the dynamics of a process or calculating transport properties, MD is the preferred method. If the focus is on equilibrium properties or sampling the configurational space, MC is more suitable. In some cases, hybrid methods combining MD and MC approaches can be employed to leverage the strengths of both techniques. For instance, MC steps can be incorporated into an MD simulation to enhance the sampling of rare events or to overcome energy barriers.
In summary, MD and MC are valuable computational tools for investigating molecular systems, each offering unique advantages and limitations. MD excels in providing dynamic information, while MC efficiently samples equilibrium states. The choice between these methods depends on the specific research question and the desired information.
Advantages of Python over C++ for Chemists
Python and C++ are both widely used programming languages, each with its own strengths and weaknesses. While C++ has traditionally been favored for its performance and control, Python has gained significant popularity in recent decades, particularly in scientific computing and data analysis. Several advantages of Python over C++ contribute to its growing adoption in these fields.
Python's syntax is simpler and more intuitive than C++, making it easier to learn and use, especially for semi-professional code developers. The reasonable simplicity of Python allows researchers to focus on the scientific problem at hand rather than on the intricacies of the programming procedure and memory management. Python currently boasts a vast ecosystem of libraries and frameworks specifically designed for scientific computing and data analysis. NumPy, SciPy, Pandas, and Matplotlib are just a few examples of powerful tools that simplify complex tasks and accelerate development. Python's concise syntax and high-level abstractions enable rapid prototyping and development. These advantages allow researchers to quickly test and iterate on their ideas, facilitating faster progress in research projects.
Python's dynamic typing eliminates the need for explicit variable declarations, reducing code complexity and development time. This flexibility can be particularly advantageous in exploratory data analysis and prototyping. Python has a large and active community of users and developers, providing ample resources, support, and a collaborative environment. This ensures that researchers can readily find solutions to problems and access a wealth of shared knowledge and expertise.
While C++ retains its advantages in performance-critical applications and systems programming, Python's ease of use, extensive libraries, and rapid development capabilities make it a compelling choice for scientific computing and data analysis. As research in these fields continues to grow in complexity and scale, Python's user-friendly nature and rich ecosystem are likely to further solidify its position as a preferred language for scientific exploration. The progress in large language models may change this opinion, should they learn to generate reliable code in low-level languages from human prompts. Thus far, artificial intelligence excellently memorizes the syntax and can even code simple algorithms, but it fails seriously to implement even simple scientific procedures, like bare Monte Carlo, without essential human interventions. Let us monitor the progress in the field.
The Chronology of the Major Microsoft Word Releases
MS-DOS/Windows
1983: Word 1.0 (MS-DOS)
1985: Word 2.0 (MS-DOS)
1986: Word 3.0 (MS-DOS)
1987: Word 4.0 (MS-DOS)
1989: Word 5.0 (MS-DOS)
1989: Word for Windows 1.0
1991: Word for Windows 2.0
1993: Word for Windows 6.0 (skipped versions 3, 4, and 5 to align with the Mac version numbering)
1995: Word 95 (also known as Word 7.0)
1997: Word 97
1999: Word 2000 (also known as Word 9.0)
2001: Word 2002 (also known as Word 10.0 or Word XP)
2003: Word 2003
2007: Word 2007
2010: Word 2010
2013: Word 2013
2016: Word 2016
2019: Word 2019
2021: Word 2021
Macintosh
1985: Word 1.0
1987: Word 3.0
1989: Word 4.0
1991: Word 5.0
1993: Word 6.0
1998: Word 98
2000: Word 2001
2001: Word v. X for Mac
2004: Word 2004
2008: Word 2008
2011: Word 2011
2015: Word 2016
2019: Word 2019
2021: Word 2021
This list focuses on major standalone releases. It doesn't include every minor update or versions included with Microsoft Office suites (like Office 365). Keep in mind that the version numbering sometimes differed between the Windows and Mac versions, especially in the early years.
Global Minimum versus Local Minimum Optimization
In optimization problems, finding the global minimum is a central objective. The global minimum represents the absolute lowest point in the entire search space, signifying the optimal solution. However, optimization algorithms often encounter local minima, which are points that appear to be the minimum within a limited neighborhood but are not the true global minimum. Distinguishing between global and local minima is crucial, as settling for a local minimum can lead to suboptimal solutions. A local minimum may seem like the best solution within a confined region, but a lower point may exist elsewhere in the search space. In contrast, the global minimum guarantees the absolute lowest value of the objective function.
The challenge lies in the fact that optimization algorithms typically operate by iteratively exploring the search space, making incremental moves based on local information. This local search strategy can lead to algorithms getting trapped in local minima, mistaking them for the global minimum.
To address this issue, various optimization techniques incorporate mechanisms to escape local minima and continue the search for the global minimum. First, introducing randomness into the search process can help algorithms jump out of local minima and explore different regions of the search space. Second, initiating the optimization process from multiple starting points increases the chances of finding the global minimum by exploring different trajectories. Third, employing metaheuristic algorithms, such as simulated annealing, kinetic energy perturbations (injections), or genetic algorithms, can guide the search process towards the global minimum by employing strategies inspired by natural phenomena.
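The second strategy, multiple starting points, is simple enough to sketch directly. The example below runs plain gradient descent from several starts on a tilted double-well function and keeps the lowest minimum found (the test function and all parameters are illustrative):

```python
def grad_descent(df, x0, lr=0.01, n_steps=2000):
    """Plain gradient descent; converges to the NEAREST local minimum."""
    x = x0
    for _ in range(n_steps):
        x -= lr * df(x)
    return x

def multi_start_minimize(f, df, starts, **kwargs):
    """Run a local optimizer from several starting points and
    keep the lowest minimum found across all runs."""
    candidates = [grad_descent(df, x0, **kwargs) for x0 in starts]
    return min(candidates, key=f)

# Tilted double well: two minima near x = +1 and x = -1;
# the 0.3*x term makes the minimum near x = -1 the global one.
f = lambda x: (x * x - 1.0) ** 2 + 0.3 * x
df = lambda x: 4.0 * x * (x * x - 1.0) + 0.3
best = multi_start_minimize(f, df, starts=[-2.0, -0.5, 0.5, 2.0])
```

A single run started at x = 0.5 would terminate in the local minimum near x = +1; the multi-start wrapper recovers the global minimum near x = -1 simply because at least one trajectory descends into the correct basin.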
Despite these efforts, guaranteeing convergence to the global minimum in complex optimization problems looks like an eternal challenge. The distinction between global and local minima highlights the importance of employing appropriate optimization techniques and carefully evaluating the solutions obtained. By incorporating strategies to escape local minima and explore the search space effectively, optimization algorithms can increase the likelihood of finding the true global minimum and achieving optimal solutions. The global minimum and local minima are essentially different concepts, and they call for principally different algorithms.
When We Need a Comma in Compound Sentences in English
You need a comma in a compound sentence when you're joining two independent clauses with a coordinating conjunction. An independent clause can stand alone as a complete sentence (it has a subject and a verb). Coordinating conjunctions are the words that connect the two clauses: for, and, nor, but, or, yet, so (you can remember them with the acronym FANBOYS).
Here's the basic rule: place a comma before the coordinating conjunction that joins two independent clauses. Example:
"I went to the store, and I bought a sausage."
In this example, "I went to the store" and "I bought a sausage" are both independent clauses. They are joined by the coordinating conjunction "and," so we need a comma before "and."
Let's look at examples with other coordinating conjunctions:
"The dog barked, but no one answered the door."
"She wanted to go to the party, yet she knew she had to study."
"He didn't have much money, so he couldn't buy the new phone."
Note: You do not always need a comma with "and." If "and" is connecting two things within the same clause, you don't need a comma.
Example:
"I went to the store and bought some whisky." No comma is needed here because "and" is connecting two verbs within the same clause. If you're ever unsure, it's always better to err on the side of using a comma. It's a much more common mistake to leave out a necessary comma than to include an unnecessary one.
Major Versions of Assembly Language
There are quite a few versions of assembly language, as each processor architecture has its own unique set of instructions. Some of the most common assembly languages include:
x86 assembly is used for Intel and AMD processors, commonly found in personal computers in the 2020s.
ARM assembly is used in many mobile devices and embedded systems.
MIPS assembly is used in some embedded systems and gaming consoles.
PowerPC assembly is used in some older computers and gaming consoles.
It's difficult to say exactly how many versions of assembly language exist, but there are likely hundreds or even thousands. The exact number depends on the number of different processor architectures that have been developed over the years.
Popular Modern Assemblers
NASM (Netwide Assembler) is a popular, open-source assembler targeting the x86 and x86-64 architectures.
MASM (Microsoft Macro Assembler) is a commercial assembler primarily for x86 and x86-64 architectures.
GAS (GNU Assembler) is part of the GNU Binutils and serves as the backend assembler for the GNU Compiler Collection (GCC); it supports a wide range of architectures, including x86, ARM, MIPS, and PowerPC.
TASM (Turbo Assembler) was a popular DOS-based assembler for x86 architectures.
While assemblers are the primary tools for working with assembly language, it is worth noting that some compilers, like GCC, can generate assembly code as an intermediate step in the compilation process. This can be useful for optimization and debugging purposes.
Creating DLL Libraries: Step-by-Step Guide
DLLs (Dynamic Link Libraries) are a powerful tool for modularizing code and sharing functionality across multiple applications. Here's a practical guide on how to create them, focusing on the Windows platform and C++.
1. Project Setup. Choose a development environment: Popular options include Microsoft Visual Studio, Code::Blocks, or a command-line toolchain like MinGW. Create a new DLL project. Configure your project settings to generate a DLL. This typically involves selecting a DLL project template or specifying appropriate linker flags.
2. Write Your Code. Define functions and variables: Write the functions and variables that you want to expose to other applications. Use the __declspec(dllexport) keyword: This keyword marks functions and variables that should be accessible from outside the DLL.
#include <windows.h>

// extern "C" prevents C++ name mangling, so the exported symbol
// keeps a stable, language-neutral name.
extern "C" __declspec(dllexport) int AddNumbers(int a, int b)
{
    return a + b;
}
3. Compile and Link. Build the project. Use your development environment's build process to compile and link the code into a DLL file. This typically involves using a compiler and linker. Generate the DLL: The output will be a .dll file.
4. Using the DLL. Include the header file. If you have a header file defining the exported functions, include it in your main application. Link the DLL. Link the DLL to your main application. This is usually done by specifying the DLL file or library during the linking process. Call the exported functions. Use the function names defined in the header file to call the functions from the DLL.
#include <iostream>
#include "vvc_dll.h" // Assuming the header file is named vvc_dll.h

int main()
{
    int result = AddNumbers(5, 3);
    std::cout << "Result: " << result << std::endl;
    return 0;
}
While this guide focuses on Windows, DLL creation processes and tools vary across platforms.
You can use functions like LoadLibrary and FreeLibrary to dynamically load and unload DLLs at runtime.
Ensure that your DLL and the applications using it have the necessary dependencies (other DLLs, libraries, etc.).
Consider versioning your DLLs to manage compatibility issues and updates.
Implement proper error handling mechanisms to gracefully handle potential errors during DLL loading, function calls, and unloading.
Some Modern Widely Known LLMs (as of 2025)
OpenAI GPT Series.
GPT-3. One of the most famous LLMs, with 175 billion parameters.
GPT-4. An advanced version with improved capabilities and more parameters.
GPT-3.5. An intermediate model between GPT-3 and GPT-4, offering enhanced performance and features.
Anthropic Claude.
Developed by Anthropic, Claude is designed to be helpful, harmless, and honest.
Google PaLM Series.
PaLM (Pathways Language Model). A large-scale LLM developed by Google.
PaLM 2. An updated version with enhanced capabilities.
Meta (formerly Facebook) LLaMA Series.
LLaMA (Large Language Model Meta AI). A family of open-source LLMs with various sizes ranging from 7 billion to 65 billion parameters.
LLaMA 2. An improved version with better performance and efficiency.
Alibaba Qwen.
Developed by Alibaba Cloud, Qwen is a large-scale pre-trained language model designed for a wide range of applications.
Microsoft Turing Series.
Turing-NLG. A large-scale LLM developed by Microsoft.
Turing-1. An advanced version with improved capabilities.
IBM Watson.
IBM has developed several LLMs as part of its Watson family, focusing on natural language processing and understanding.
Stability AI StableLM.
Developed by Stability AI, StableLM is designed for generating coherent and contextually relevant text.
DeepMind Gopher.
Developed by DeepMind, Gopher is a large-scale LLM with a focus on understanding and generating human-like text.
EleutherAI GPT-J and GPT-NeoX.
GPT-J. An open-source LLM with 6 billion parameters.
GPT-NeoX. An open-source LLM with 20 billion parameters.
NVIDIA Megatron-LM.
Developed by NVIDIA, Megatron-LM is a scalable LLM framework used to train large models efficiently.
Hugging Face Transformers.
Hugging Face offers a wide range of pre-trained LLMs, including models from OpenAI, Google, Meta, and others.
Databricks Dolly.
Developed by Databricks, Dolly is an open-source, instruction-tuned LLM.
Amazon SageMaker Studio Lab.
Amazon offers pre-trained LLMs and tools for developing and deploying LLMs through SageMaker.
Cohere Command.
Developed by Cohere, Command is an LLM designed for generating human-like text and answering questions.
Hardware-optimized models.
Various custom LLMs developed by different organizations are optimized for specific accelerators, such as NVIDIA A100 GPUs.
Statistical Programming in the 2020s
R is a programming language and software environment designed primarily for statistical computing, data analysis, and graphical representation. R is widely used in academia, research, and industries like finance, healthcare, and technology where data analysis is crucial. It is particularly popular among statisticians, data scientists, and researchers who need to perform complex statistical analyses or create publication-quality visualizations.
Statistical focus. R was created specifically for statistical analysis and data science, with built-in functions for statistical tests, probability distributions, and data manipulation.
Graphics capabilities. R excels at creating high-quality visualizations and plots, with packages like ggplot2 offering sophisticated data visualization options.
Package ecosystem. CRAN (Comprehensive R Archive Network) hosts thousands of specialized packages that extend R's functionality, covering everything from machine learning to geospatial analysis.
Data manipulation. R provides powerful tools for working with data frames, matrices, and other data structures.
Integration. R works well with other languages and tools like SQL, Python, and various database systems.
Manipulation of Paths on Modern Windows Desktop/Workstations
Hard link. A hard link is an additional directory entry for the same file record in the NTFS Master File Table. Both names reference identical file data and metadata. Deletion of one name does not remove the file until all hard links are removed. Hard links work only for files and only within the same NTFS volume. No reparse point is involved.
Directory junction. A junction is an NTFS reparse point that redirects a directory path to another directory on a local NTFS volume. The junction stores an absolute internal NT path to the target. During path resolution, the filesystem substitutes the stored target path. Applications typically perceive a junction as a normal directory. Junctions cannot target network locations.
Symbolic link (file or folder). A symbolic link is a reparse point that stores a substitute path string. The target may be a file or directory. The target may reside on another volume or on a network share. During path traversal, the I/O manager replaces the link with the stored path and restarts resolution. Symbolic links may be relative or absolute. Broken targets remain as links but fail upon access.
To recapitulate, hard links create multiple names for one file record. Junctions and symbolic links create indirection through reparse metadata. Hard links operate at the file record level. Junctions and symbolic links operate at the path resolution level.
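The behavioral difference is easy to demonstrate from Python, whose os module wraps these mechanisms. The sketch below shows that a hard link survives deletion of the original name while a symbolic link breaks (on Windows, creating symbolic links may additionally require administrator rights or Developer Mode; the hard-link behavior is the same on NTFS and on POSIX filesystems):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "data.txt")
    with open(target, "w") as fh:
        fh.write("payload")

    # Hard link: a second directory entry for the same file record.
    hard = os.path.join(d, "hard.txt")
    os.link(target, hard)

    # Symbolic link: a separate object storing a substitute path.
    sym = os.path.join(d, "sym.txt")
    os.symlink(target, sym)

    os.remove(target)  # remove the original name

    # The hard link still reaches the data; the symlink is now broken.
    with open(hard) as fh:
        content = fh.read()
    survives = os.path.exists(hard)     # True: data still referenced
    broken = not os.path.exists(sym)    # True: exists() follows the link
    still_a_link = os.path.islink(sym)  # True: the link object remains
```

Note that os.path.exists() follows symbolic links, so it reports False for a broken symlink even though os.path.islink() confirms the link object itself is still present.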
On Python & C/C++ Coding in Science
The blunt premise that C code is faster than C++ code is incorrect. Generated machine code determines performance, not language syntax. Modern C++ compilers produce identical assembly for equivalent constructs. Performance differences arise from abstractions, allocation patterns, and algorithmic complexity. The decision criterion should be interface stability and development cost. A C ABI provides maximal compatibility. A C interface remains stable across compilers and toolchains. Python extension mechanisms interact naturally with C linkage. Binary compatibility problems occur frequently with C++ name mangling and runtime differences.
A common architectural pattern uses C++ internally with a C wrapper. Core logic is implemented in modern C++. A thin extern "C" layer exposes functions. Python bindings call the C interface. This pattern combines expressive implementation with robust ABI stability. Pure C implementations offer minimal runtime overhead and simple build pipelines. Maintenance cost increases expectedly due to reduced abstraction facilities. Large numerical libraries historically used C for this reason. Modern projects increasingly prefer C++ for maintainability.
Python binding ecosystems reflect this tradeoff. ctypes and cffi integrate easily with C interfaces. Tools such as pybind11 target C++ directly. Direct C++ bindings simplify object exposure but increase binary fragility across compilers. The dominant performance factor remains algorithm design. Language choice rarely dominates runtime for compiled code. Memory layout, vectorization, and cache behavior dominate execution time.
A pragmatic guideline from VV Chaban: Use C++ for complex logic and long-term maintenance. Expose a C ABI for maximal interoperability. Use pure C when simplicity or legacy constraints dominate.
Low-Level Programming within Python (on Windows)
Python readily calls functions from DLL libraries. This capability is standard on Windows systems. The simplest mechanism uses the built-in ctypes module. The ctypes module loads a DLL at runtime and binds exported functions. Function signatures must be declared explicitly to ensure correct argument marshaling. Only functions with a C ABI can be called reliably.
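The mechanism can be sketched in a few lines. For portability of the illustration, the example loads the C runtime library rather than a custom DLL, but the pattern (load the library, declare the signature, call the function) is identical for any library exposing a C ABI:

```python
import ctypes
import ctypes.util

# Locate and load a shared library: a .dll on Windows, a .so on Linux.
# find_library("c") resolves the C runtime; msvcrt is the Windows fallback.
libc_name = ctypes.util.find_library("c") or "msvcrt"
libc = ctypes.CDLL(libc_name)

# Declare the signature explicitly so arguments are marshaled correctly.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

result = libc.abs(-42)  # calls the C function int abs(int)
```

For your own DLL, replace the library name with its path, e.g. ctypes.CDLL("vvc_dll.dll"), and declare argtypes/restype for each exported function before calling it.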
Another approach uses the cffi library. The cffi library provides a more expressive foreign function interface. The cffi library supports both ABI-level and API-level bindings. Complex structures and callbacks are easier with cffi.
Direct C++ DLL usage is problematic. Name mangling and ABI instability complicate symbol resolution. A C wrapper layer is standard practice for C++ libraries. Python extension modules represent a different model. Compiled extension modules are DLLs with a specific initialization entry point. Such modules are built against the CPython headers and are loaded via import like ordinary modules.
Known practical constraints include matching architecture and calling convention. A 64-bit Python interpreter requires a 64-bit DLL. Calling convention mismatches cause crashes or corrupted data. In summary, Python interoperates cleanly with DLLs exposing a C interface. The dominant tools are ctypes, cffi, or compiled extension modules. Enjoy advanced coding, VVC.
cx_Freeze Module Creates Executables from Python Source Code
cx_Freeze is a Python packaging toolchain used to convert Python applications into standalone executables for convenient and professional distribution. The tool performs static analysis of Python bytecode to determine module dependencies and bundles the Python interpreter, compiled extension modules, and required resources. The toolchain then produces platform-specific binaries.
The core mechanism relies on module graph construction. cx_Freeze inspects import statements, resolves dependency trees, and copies required packages into a build directory. The resulting executable embeds a bootloader that initializes an isolated Python runtime environment. This approach avoids a system-level Python installation requirement on target machines. The build process is typically driven by a setup script using setuptools-style configuration. The developer specifies entry points, included and excluded packages, data files, and optimization flags. During compilation, Python source files are byte-compiled to .pyc and packaged with the interpreter and shared libraries. On Windows, the output is usually an .exe with accompanying DLLs. On Linux and macOS, the output consists of native binaries with shared object dependencies.
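A typical build script follows the sketch below. The application name, entry script, and package lists are placeholders; cx_Freeze must be installed (e.g., pip install cx_Freeze):

```python
# setup.py -- build with:  python setup.py build
from cx_Freeze import setup, Executable

build_options = {
    "packages": ["numpy"],            # force-include packages missed by analysis
    "excludes": ["tkinter"],          # drop unused packages to shrink the output
    "include_files": ["config.ini"],  # bundle non-Python assets
}

setup(
    name="mytool",
    version="1.0",
    description="Example frozen application",
    options={"build_exe": build_options},
    # base=None keeps a console window; use base="Win32GUI" for GUI apps.
    executables=[Executable("app.py", base=None)],
)
```

Running the build command produces a build/ directory containing the executable, the embedded interpreter, and the resolved dependencies in a readable layout.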
cx_Freeze supports advanced configuration such as dependency inclusion overrides, namespace package handling, environment variable injection, and executable base selection (console vs GUI). The tool can bundle non-Python assets, including configuration files, templates, and binary resources. Compression options reduce distribution size, though startup latency may increase slightly. Compared with alternatives such as PyInstaller or py2exe, cx_Freeze emphasizes transparency of the build structure and compatibility with standard packaging workflows. The directory layout remains readable, which simplifies debugging missing dependencies. The trade-off involves slightly more manual configuration in complex environments.
Typical use cases include distributing internal scientific tools, GUI applications built with Tkinter or Qt, command-line utilities, and software deployed in restricted environments without Python installations. Performance of compiled executables remains essentially identical to interpreted execution, since no native code compilation occurs. This tool only packages the interpreter but is quite useful to distribute apps in a robust manner.
PyPy for Python
PyPy represents an alternative implementation of the Python language. PyPy has been designed to improve execution speed and reduce runtime overhead relative to the reference interpreter, CPython.
The primary architectural distinction lies in the use of a tracing just-in-time compiler, which dynamically translates frequently executed bytecode paths into optimized machine code. This approach contrasts with the traditional interpreter loop in CPython, which executes bytecode instructions sequentially without adaptive compilation.
The internal architecture of PyPy is based on a meta-tracing framework. Instead of writing a JIT (just-in-time) compiler directly for Python, the developers implemented an interpreter in a restricted subset of Python called RPython. From this high-level interpreter description, the toolchain automatically generates a tracing JIT. The tracing mechanism records operations along hot execution paths, constructs linear traces, performs optimization passes such as constant folding and allocation removal, and emits machine code specialized for observed runtime types. Guard instructions maintain semantic correctness by deoptimizing when assumptions fail.
Memory management in PyPy differs significantly from CPython. CPython relies on reference counting combined with a cyclic garbage collector. PyPy employs a generational moving garbage collector. This design reduces overhead associated with frequent reference count updates and can improve cache locality through object compaction. The absence of pervasive reference counting introduces subtle behavioral differences, particularly in the timing of finalizer invocation and weak reference handling.
Performance characteristics of PyPy depend on workload structure. Long-running, computation-intensive programs benefit from JIT compilation, especially when execution exhibits stable type patterns and repeated loops. Numerical kernels written in pure Python often demonstrate substantial speedups. In contrast, short scripts with limited hot loops may incur startup overhead without sufficient optimization benefit. Extension modules implemented in C can constrain performance, since compatibility with the CPython C API requires an additional abstraction layer. The cpyext compatibility subsystem introduces overhead relative to native CPython execution.
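The kind of workload that benefits is easy to illustrate. The toy kernel below is dominated by a type-stable hot loop; under CPython it executes as interpreted bytecode, whereas PyPy's tracing JIT compiles it to machine code after a few iterations. The script runs unmodified on either interpreter (the pairwise function is illustrative, not a physical model):

```python
def pairwise_energy(coords):
    """Toy Lennard-Jones-like pairwise sum: a stable, type-uniform
    hot loop that a tracing JIT can specialize effectively."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        xi, yi = coords[i]
        for j in range(i + 1, n):
            # The inner loop body executes O(n^2) times with fixed
            # float types, which is ideal for trace compilation.
            dx = xi - coords[j][0]
            dy = yi - coords[j][1]
            r2 = dx * dx + dy * dy
            total += 1.0 / (r2 * r2) - 1.0 / r2
    return total

# Distinct points, so r2 is never zero.
coords = [(0.1 * i, 0.2 * (i % 7)) for i in range(1, 200)]
energy = pairwise_energy(coords)
```

Timing this function under both interpreters (e.g., with time.perf_counter) typically shows a large speedup under PyPy, while a short script calling it once would mostly measure JIT warm-up.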
Compatibility remains a central design objective. PyPy targets high fidelity with the Python language specification. Pure Python packages generally function without modification. Binary extension support exists but may present performance penalties. Alternative interfaces, such as CFFI, often yield superior interoperability because they avoid emulating the CPython object model. The JIT infrastructure enables experimental features. Stackless execution models, lightweight co-routines, and sandboxing capabilities have been explored within the PyPy ecosystem. The separation between interpreter description and JIT generation promotes research in dynamic language implementation. Variants of the technology have been applied to languages beyond Python.
Adoption decisions require evaluation of workload profile, dependency structure, and latency constraints. Scientific environments relying heavily on C-accelerated libraries may observe limited gains. Pure Python services with sustained execution cycles often benefit. The trade-off space involves startup latency, memory footprint, extension compatibility, and steady-state throughput. In summary, PyPy exemplifies a meta-tracing approach to dynamic language optimization. The interpreter leverages runtime specialization and generational garbage collection to reduce overhead inherent in bytecode interpretation. Comparative advantage over CPython emerges under workloads with stable, repetitive execution patterns.