Code

FMEval is a library to evaluate Large Language Models (LLMs) and select the best LLM for your use case. The library powers LLM evaluations in SageMaker Studio and Amazon Bedrock.

The library can help evaluate LLMs for the following tasks:

The library is also easy to extend, allowing you to bring your own evaluation algorithms, tasks, data, and LLM-based systems to evaluate.
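As a rough illustration of what a bring-your-own evaluation can look like, the sketch below scores an arbitrary prompt-to-text system with a simple exact-match metric. It is not FMEval's actual API; the `invoke_model` callable and the `exact_match_accuracy` helper are hypothetical stand-ins.

```python
# Minimal sketch of a custom evaluation loop; `invoke_model` stands in for
# whatever LLM-based system you want to evaluate and is purely hypothetical.
from typing import Callable, List, Tuple


def exact_match_accuracy(
    invoke_model: Callable[[str], str],
    dataset: List[Tuple[str, str]],
) -> float:
    """Score an LLM-based system by exact match against reference answers."""
    hits = 0
    for prompt, reference in dataset:
        prediction = invoke_model(prompt).strip().lower()
        hits += int(prediction == reference.strip().lower())
    return hits / len(dataset)


# Trivial stand-in "model" so the example runs end to end.
dataset = [("Capital of France?", "Paris"), ("2 + 2 = ?", "4")]
print(exact_match_accuracy(lambda p: "Paris" if "France" in p else "4", dataset))
```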

Proper estimation of predictive uncertainty is fundamental in applications that involve critical decisions. Uncertainty can be used to assess the reliability of model predictions, trigger human intervention, or decide whether a model can be safely deployed in the wild.

Fortuna is a library for uncertainty quantification that makes it easy for users to run benchmarks and bring uncertainty to production systems. Fortuna provides calibration and conformal methods starting from pre-trained models written in any framework, and it further supports several Bayesian inference methods starting from deep learning models written in Flax. The API is designed to be intuitive for practitioners unfamiliar with uncertainty quantification, and is highly configurable.
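To make the conformal idea concrete, here is a minimal split conformal prediction sketch in plain NumPy, assuming you already have class probabilities from a pre-trained model and a held-out calibration set. It illustrates the general recipe rather than Fortuna's own API; the data is randomly generated.

```python
# Sketch of split conformal prediction for classification, assuming predicted
# class probabilities from a pre-trained model (any framework) are available.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration data: predicted probabilities and true labels.
cal_probs = rng.dirichlet(np.ones(3), size=500)   # shape (n_cal, n_classes)
cal_labels = rng.integers(0, 3, size=500)         # shape (n_cal,)

alpha = 0.1  # target miscoverage level (i.e. 90% coverage)

# Nonconformity score: one minus the probability assigned to the true class.
scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]

# Conformal quantile with the finite-sample correction.
n = len(scores)
q_level = np.ceil((n + 1) * (1 - alpha)) / n
q_hat = np.quantile(scores, q_level, method="higher")

# Prediction set for a new example: keep every class whose score is below q_hat.
test_probs = rng.dirichlet(np.ones(3))
prediction_set = np.where(1.0 - test_probs <= q_hat)[0]
print("prediction set:", prediction_set)
```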

Check the documentation for a quickstart, examples and references.

Bias detection and mitigation for datasets and models, from the AWS service Amazon SageMaker Clarify (see the official page).

Biases are imbalances in the training data or the prediction behavior of the model across different groups, such as age or income bracket. Biases can result from the data or algorithm used to train your model. For instance, if an ML model is trained primarily on data from middle-aged individuals, it may be less accurate when making predictions involving younger and older people. Detecting and measuring biases in your data and model is a first step toward addressing them.
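As a small, self-contained illustration of measuring such an imbalance, the sketch below computes the difference in positive label proportions between two groups on a toy dataset. The column names and data are hypothetical, and the snippet is not tied to the Clarify API.

```python
# Illustrative computation of a simple pre-training bias metric: the difference
# in positive label proportions between two groups. Column names are made up.
import pandas as pd

df = pd.DataFrame(
    {
        "age_group": ["middle", "middle", "young", "young", "senior", "senior"],
        "loan_approved": [1, 1, 1, 0, 0, 0],
    }
)

def positive_proportion(frame: pd.DataFrame, group: str) -> float:
    subset = frame[frame["age_group"] == group]
    return subset["loan_approved"].mean()

# A value close to 0 suggests balanced outcomes; large values indicate imbalance.
dpl = positive_proportion(df, "middle") - positive_proportion(df, "young")
print(f"Difference in positive proportions (middle vs. young): {dpl:.2f}")
```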

AdaTune is a library to perform gradient-based hyperparameter tuning for training deep neural networks. AdaTune currently supports tuning of the learning_rate parameter, but some of the methods implemented here can be extended to other hyperparameters such as momentum or weight_decay. AdaTune provides the following gradient-based hyperparameter tuning algorithms: HD, RTHO, and our newly proposed algorithm, MARTHE. The repository also contains other commonly used non-adaptive learning_rate schedules, such as staircase-decay, exponential-decay, and cosine-annealing-with-restarts. The library is implemented in PyTorch.
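The core idea behind HD (hypergradient descent) can be written in a few lines: for plain SGD, the hypergradient of the loss with respect to the learning rate is the dot product of the current and previous parameter gradients. The sketch below applies that update on a toy quadratic objective; it hand-rolls the rule rather than going through AdaTune's own classes, and the step counts and coefficients are arbitrary choices.

```python
# Hand-rolled sketch of the hypergradient-descent (HD) learning-rate update
# on a toy quadratic objective; only meant to show the core update rule.
import torch

torch.manual_seed(0)
w = torch.randn(10, requires_grad=True)
lr = 0.01            # learning rate being adapted online
beta = 1e-4          # hyper learning rate for the HD update
prev_grad = None

def loss_fn(w: torch.Tensor) -> torch.Tensor:
    return 0.5 * (w ** 2).sum()   # toy objective with minimum at w = 0

for step in range(100):
    loss = loss_fn(w)
    grad, = torch.autograd.grad(loss, w)

    # HD: move the learning rate along the hypergradient, which for plain SGD
    # is the dot product of the current and previous parameter gradients.
    if prev_grad is not None:
        lr = lr + beta * torch.dot(grad, prev_grad).item()

    with torch.no_grad():
        w -= lr * grad
    prev_grad = grad.detach()

print(f"final loss: {loss_fn(w).item():.4f}, adapted lr: {lr:.4f}")
```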

The package implements the three algorithms presented in the paper Forward and Reverse Gradient-Based Hyperparameter Optimization (2017):

This package contains:
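For intuition on the reverse-mode approach, the sketch below (written here in PyTorch for brevity) differentiates a validation loss through a short unrolled run of SGD to obtain the gradient with respect to the learning rate. The toy objectives and the five-step unroll are illustrative assumptions, not the interface of the package itself.

```python
# Sketch of a reverse-mode hypergradient: differentiate a validation loss
# through a short unrolled run of SGD with respect to the learning rate.
import torch

torch.manual_seed(0)
target_train = torch.randn(10)   # defines the (toy) training objective
target_val = torch.randn(10)     # defines the (toy) validation objective

lr = torch.tensor(0.1, requires_grad=True)   # hyperparameter of interest
w = torch.zeros(10, requires_grad=True)      # initial model parameters

# Unroll a few SGD steps, keeping the graph so gradients can flow back to lr.
for _ in range(5):
    train_loss = 0.5 * ((w - target_train) ** 2).sum()
    grad = torch.autograd.grad(train_loss, w, create_graph=True)[0]
    w = w - lr * grad            # functional update keeps lr in the graph

# Reverse-mode pass: gradient of the validation loss w.r.t. the learning rate.
val_loss = 0.5 * ((w - target_val) ** 2).sum()
hypergrad = torch.autograd.grad(val_loss, lr)[0]
print(f"d(val_loss)/d(lr) = {hypergrad.item():.4f}")
```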