AI Definitions

This page provides definitions and explanations of artificial intelligence terms and concepts, focusing on how AI impacts democracy, elections, and political discourse.

ADAM - Adaptive Moment Estimation - an optimization algorithm used in place of classical stochastic gradient descent for training deep learning models. Adam combines the best properties of the AdaGrad and RMSProp algorithms to provide an optimization algorithm that can handle sparse gradients on noisy problems
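
As a rough illustration of the idea, the sketch below applies the commonly published Adam update rule to a single number. The function name adam_step, the toy objective (x - 3)^2, and the learning rate of 0.05 are assumptions chosen for the example, not part of any particular library.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (illustrative helper)."""
    m = beta1 * m + (1 - beta1) * grad          # running average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem (an assumption for the demo): minimize f(x) = (x - 3)^2.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (theta - 3)                      # derivative of (x - 3)^2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)

print(round(theta, 3))
```

The two running averages are what let Adam scale each step to the recent size of the gradient, rather than using one fixed step size as plain gradient descent does.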

Algorithm - a step-by-step procedure for solving a problem or performing a computation. It is a sequence of instructions that are carried out in a predetermined order to achieve a desired outcome. Algorithms are used in all areas of computer science, as well as in mathematics, engineering, and other fields

Array - a group of related data values (called elements) that are grouped together. All the array elements must be the same data type. In some programming languages, an array is known as a list or vector

Artificial Intelligence - Intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. A term coined in 1955 by John McCarthy, Stanford’s first faculty member in AI, who defined it as “the science and engineering of making intelligent machines.”

Attention - refers to a concept inspired by human cognition, where models selectively focus on specific parts of an input to improve their understanding and decision-making. Imagine reading a complex sentence: your attention naturally shifts to different words depending on the context, paying more attention to key elements like verbs and subjects. Similarly, AI models utilize attention mechanisms to identify crucial aspects of data, be it words in a sentence, pixels in an image, or sections of audio within a speech
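
One common form of this idea is scaled dot-product attention. The sketch below is a minimal version in plain Python; the function names and the tiny key and value lists are made up for illustration, and real models work on much larger learned vectors.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Weight each value by how well its key matches the query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [sum(w * value[i] for w, value in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output

# Hypothetical inputs: three items, each with a 2-number key and value.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([4.0, 0.0], keys, values)
print(weights, output)
```

The second item's key does not match the query, so it receives the smallest weight and contributes least to the output.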

Batch - a group of training samples that a model processes together before its parameters are updated. More generally, batch processing is a method of processing data for artificial intelligence and machine learning models where data is collected and then processed in large groups (batches) at scheduled intervals, instead of continuously as it arrives
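
A minimal sketch of splitting data into batches, assuming a small in-memory list stands in for a real dataset. The helper name batches is hypothetical.

```python
def batches(data, batch_size):
    """Yield consecutive groups of at most batch_size items."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

dataset = list(range(10))                # stand-in for 10 data records
groups = list(batches(dataset, batch_size=4))
print(groups)                            # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The last batch is smaller when the dataset size is not a multiple of the batch size.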

Classifier - an algorithm that automatically assigns data points to one of a set of categories or classes. There are two principal approaches, supervised and unsupervised. In the supervised approach, the model is trained using labeled data; that is, each training example carries its class label. In the unsupervised approach, the model groups unlabeled data by similarity

CLIP - Contrastive Language-Image Pre-training - a neural network which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the “zero-shot” capabilities of GPT‑2 and GPT‑3

CNN - Convolutional Neural Network - a type of deep learning network widely used for image processing. CNNs can perform both generative and descriptive tasks, often in machine vision applications such as image and video recognition, and are also used in recommender systems and natural language processing (NLP)

cuDNN - NVIDIA CUDA Deep Neural Network - a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, attention, matmul, pooling, and normalization

Data parallelism - a type of parallelism that divides the data across multiple processors and operates on the data in parallel. It is a common technique used in machine learning to train large models more quickly. One way to implement data parallelism is to use multiple GPUs. Each GPU can be assigned a portion of the training data, and the GPUs can then train their assigned data in parallel. This can significantly reduce the training time of a large machine learning model.

Differentiation - the process of finding the derivatives of a model's output (or its loss) with respect to its input variables or parameters. Essentially, it quantifies how much the output changes for small changes in those values. This process is crucial in various applications, including training neural networks and understanding model sensitivity

Diffusion Model - a generative model built around a noising process. Forward diffusion starts with training data (e.g., images) and progressively adds Gaussian noise over a series of time steps, transforming the original data into pure noise and effectively destroying the original information. The forward diffusion process is a Markov chain, meaning each step depends only on the previous one. The model is trained to reverse this process, so it can generate new data by starting from noise and removing it step by step
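
The forward (noising) process can be sketched in a few lines. This example assumes the widely used update x_t = sqrt(1 - beta) * x_(t-1) + sqrt(beta) * noise; the fixed beta values and the two-number "image" are placeholders, not a real noise schedule or dataset.

```python
import math
import random

def forward_diffusion(x0, betas, rng):
    """Add Gaussian noise step by step; each step uses only the previous one."""
    x = list(x0)
    trajectory = [x]
    for beta in betas:
        x = [math.sqrt(1 - beta) * xi + math.sqrt(beta) * rng.gauss(0.0, 1.0)
             for xi in x]
        trajectory.append(x)
    return trajectory

rng = random.Random(0)                   # fixed seed so runs are repeatable
betas = [0.1] * 5                        # hypothetical noise schedule
trajectory = forward_diffusion([1.0, -1.0], betas, rng)
for step, x in enumerate(trajectory):
    print(step, [round(value, 3) for value in x])
```

Because each new state is computed only from the one before it, the loop is a direct picture of the Markov chain property.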

Epoch - one cycle through the full training dataset. Usually, training a neural network takes more than a few epochs

Fitness - the quantitative measure of how well a candidate solution or individual performs in solving a given problem, as defined by a fitness function. It guides optimization algorithms, such as genetic algorithms, by indicating which solutions are "fitter" and thus more likely to be selected and propagated to subsequent generations in the search for an optimal outcome
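
A small sketch of a fitness function in use. The problem (find the number closest to 3) and the candidate values are invented for the example; a real genetic algorithm would also mutate and recombine the selected candidates.

```python
def fitness(candidate):
    """Higher is better: candidates closer to 3 score higher."""
    return -(candidate - 3) ** 2

population = [0.0, 2.5, 4.0, 7.0]        # hypothetical candidate solutions
ranked = sorted(population, key=fitness, reverse=True)
survivors = ranked[:2]                   # keep the two fittest for the next generation
print(ranked, survivors)
```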

FSL - Few Shot Learning - a subset of what is sometimes referred to more generally as n-shot learning, a category of artificial intelligence that also includes one-shot learning (in which there is only one labeled example of each class to be learned) and zero-shot learning (in which there are no labeled examples at all). While one-shot learning is essentially just a challenging variant of FSL, zero-shot learning is a distinct learning problem that necessitates its own unique methodologies

GAN - Generative Adversarial Network - a machine learning (ML) model in which two neural networks compete with each other to become more accurate in their predictions. GANs typically run unsupervised and use a competitive zero-sum game framework to learn

GGML - a machine learning (ML) library written in C and C++ with a focus on Transformer inference. The project is open-source and is being actively developed by a growing community. ggml is similar to ML libraries such as PyTorch and TensorFlow

GGUF - a binary format that is optimized for quick loading and saving of models, making it highly efficient for inference purposes. GGUF is designed for use with GGML and other executors. GGUF was developed by @ggerganov who is also the developer of llama.cpp, a popular C/C++ LLM inference framework. Models initially developed in frameworks like PyTorch can be converted to GGUF format for use with those engines

GPT - Generative Pre-trained Transformer - a type of artificial intelligence model that utilizes deep learning techniques, particularly transformer architecture, to generate human-like text. GPT models are pre-trained on large datasets of text from the internet and then fine-tuned for specific tasks such as language translation, text completion, and question answering

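Gradient - the generalization of the derivative to a function that has more than one input variable. It is the vector of partial derivatives, one per input, and it points in the direction in which the function increases fastest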
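
To make this concrete, the sketch below estimates a gradient numerically by nudging each input in turn. The function f and the helper name numerical_gradient are illustrative choices; deep learning libraries compute gradients exactly with automatic differentiation instead.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate one partial derivative per input with central differences."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def f(p):
    x, y = p
    return x ** 2 + 3 * y                # exact partial derivatives: 2x and 3

grad = numerical_gradient(f, [2.0, 5.0])
print([round(g, 4) for g in grad])       # approximately [4.0, 3.0]
```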

Keras - an open-source library that provides a Python interface for artificial neural networks. Keras was first independent software, then integrated into the TensorFlow library, and later added support for more backends, including JAX and PyTorch

Latent Space - a lower-dimensional space in which the high-dimensional data is embedded. This space captures the underlying structure or features of the data, and is typically learned by a machine learning model, such as a deep learning model or an autoencoder

LLaMA - Large Language Model Meta AI - a family of large language models developed by Meta AI (formerly Facebook AI)

Logistic Regression - estimates the probability of an event occurring, such as voted or didn’t vote, based on a given data set of independent variables
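
A minimal sketch of how a fitted logistic regression turns inputs into a probability. The two features and every number below are made up for illustration; in practice the weights and bias are learned from data.

```python
import math

def predict_probability(features, weights, bias):
    """Weighted sum of the inputs, squashed into the range 0..1 by the logistic function."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model: features are (voted in last election, years at current address).
weights = [1.5, 0.1]
bias = -1.0
p_vote = predict_probability([1.0, 6.0], weights, bias)
print(round(p_vote, 3))
```

An output above 0.5 would typically be read as a prediction that the event (here, voting) occurs.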

LoRA - Low-Rank Adaptation - a technique used to adapt machine learning models to new contexts. It can adapt large models to specific uses by adding lightweight pieces to the original model rather than changing the entire model. A data scientist can quickly expand the ways that a model can be used rather than requiring them to build an entirely new model
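
The "lightweight pieces" are usually two small matrices whose product is added to a frozen weight matrix. The sketch below shows that arithmetic in plain Python; the matrix values and the helper names are invented for the example, and the 4096 and 8 in the size comparison are only illustrative dimensions.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen original weights (3 x 3)
B = [[0.5], [0.0], [1.0]]                                 # trainable, 3 x 1
A = [[1.0, 2.0, 0.0]]                                     # trainable, 1 x 3

adapted = add(W, matmul(B, A))           # W stays untouched; only A and B are trained

def full_params(d, k):
    return d * k

def lora_params(d, k, r):
    return r * (d + k)

print(adapted)
print(full_params(4096, 4096), lora_params(4096, 4096, 8))
```

For a 4096 by 4096 layer, a rank-8 adapter trains 65,536 values instead of 16,777,216, which is why adapters are cheap to store and swap.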

Loss - the penalty for a bad prediction. That is, loss is a number indicating how bad the model's prediction was on a single example. If the model's prediction is perfect, the loss is zero; otherwise, the loss is greater. Loss functions measure how far an estimated value is from its true value. A loss function maps decisions to their associated costs. Loss functions are not fixed; they change depending on the task at hand and the goal to be met.
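
Squared error is one common choice of loss function and is used here only as an example; classification tasks usually use a different one, such as cross-entropy.

```python
def squared_error(prediction, target):
    """Loss on a single example: zero when the prediction is perfect."""
    return (prediction - target) ** 2

def mean_squared_error(predictions, targets):
    """Average loss over a set of examples."""
    losses = [squared_error(p, t) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)

print(squared_error(3.0, 3.0))                        # 0.0
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))     # 2.0
```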

ML - Machine Learning - the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence

MNIST - Modified National Institute of Standards and Technology Database - a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning.

Model - a file that has been trained to recognize certain types of patterns. You train a model over a set of data, providing it an algorithm that it can use to reason over and learn from the data

MoE - Mixture of Experts - a machine learning approach that divides an artificial intelligence model into separate sub-networks (or “experts”), each specializing in a subset of the input data, to jointly perform a task

Neural Network - a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. In this sense, neural networks refer to systems of neurons, either organic or artificial in nature

NLP - Natural Language Processing - a subfield of computer science and artificial intelligence that uses machine learning to enable computers to understand and communicate with human language

NumPy - the fundamental package for scientific computing with Python

One Shot - a learning scenario in which there is only one labeled example of each class to be learned. See FSL

Overfitting - when a machine learning model has become too attuned to the data on which it was trained and therefore loses its applicability to any other dataset. A model is overfitted when it is so specific to the original data that trying to apply it to data collected in the future would result in problematic or erroneous outcomes and therefore less-than-optimal decisions.

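Parameters - the numerical values inside a model that are adjusted during training. They encode the knowledge and patterns learned from vast amounts of text data. The more parameters a model has, the more complex it can be and the more information it can potentially process and store. Model sizes are often quoted by parameter count, e.g., 1B, 7B, 13B, 30B (B = billion)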

PPO - Proximal Policy Optimization - a reinforcement learning algorithm for training an intelligent agent. Specifically, it is a policy gradient method, often used for deep RL when the policy network is very large

PyTorch - an open source machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI (formerly Facebook's AI Research lab). It is free and open-source software released under the Modified BSD license

PyTorch Lightning - a lightweight PyTorch wrapper for high-performance AI research that aims to abstract away deep learning boilerplate while leaving you full control and flexibility over your code. In PyTorch, models are commonly saved using the state_dict method, which returns the parameters of the model's layers in a dictionary format. This dictionary can then be saved to a file using the torch.save() function. The file extension may vary, but it's often .pt or .pth

Q-Learning - a reinforcement learning technique where an agent navigates an unknown environment by trial and error, learning which actions are most valuable in specific situations. Imagine a robot exploring a maze. At each intersection, it tries different paths, receiving rewards for reaching the goal and penalties for hitting walls. By repeatedly updating its estimate of how "good" each action is in each state (through the Q-value), the robot eventually learns the optimal path to navigate the maze efficiently. So, Q-learning empowers learning without a map, just through rewards and consequences, leading to optimal decision-making over time
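
The "repeatedly updating its estimate" step can be sketched as a single function. The two-state table, the learning rate alpha, and the discount factor gamma below are illustrative assumptions, not values from any specific system.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Move Q(state, action) toward reward + discounted best value of the next state."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]

# Hypothetical maze with two intersections, A and B.
q = {
    "A": {"left": 0.0, "right": 0.0},
    "B": {"left": 0.0, "right": 1.0},
}
new_value = q_update(q, "A", "right", reward=0.0, next_state="B")
print(round(new_value, 4))               # 0.09
```

Even with no immediate reward, going right from A gains value because it leads to B, where a good action is already known.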

Quantization - The process of converting models to lower precision (e.g., from 32-bit to 8-bit) which can significantly reduce memory footprint
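
A rough sketch of one simple scheme, symmetric 8-bit quantization with a single scale factor. The helper names and the sample weights are assumptions for the example; production libraries use more elaborate schemes.

```python
def quantize(values, bits=8):
    """Map floats to small integers plus one shared scale factor."""
    qmax = 2 ** (bits - 1) - 1                              # 127 for 8 bits
    scale = (max(abs(v) for v in values) / qmax) or 1.0     # avoid a zero scale
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the integers."""
    return [x * scale for x in quantized]

weights = [0.5, -1.27, 0.0, 1.0]         # made-up model weights
q, scale = quantize(weights)
restored = dequantize(q, scale)
print(q, [round(r, 4) for r in restored])
```

Each weight now fits in one byte instead of four, at the cost of a rounding error of at most half the scale factor.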

RAG - Retrieval Augmented Generation - a technique that enables large language models to retrieve and incorporate new information. With RAG, LLMs do not respond to user queries until they refer to a specified set of documents. These documents supplement information from the LLM's pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this helps LLM-based chatbots access internal company data or generate responses based on authoritative sources

RNN - Recurrent Neural Network - a neural network with loops in it, allowing information to persist from one step of a sequence to the next. An RNN is a variation of the standard feedforward network
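
The loop can be shown with a one-number hidden state. The weights below are arbitrary placeholders; a trained network would learn them and would use vectors rather than single numbers.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence, feeding the previous hidden state back in at each step."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

states = rnn_forward([1.0, 0.0, 0.0])
print([round(s, 4) for s in states])
```

The second and third inputs are zero, yet the hidden state stays above zero: the information from the first input persists through the loop.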

ReLU - Rectified Linear Unit - a widely used activation function in neural networks, particularly deep learning models. It outputs the input unchanged when the input is positive and zero otherwise. It is known for its simplicity and effectiveness in overcoming the limitations of other activation functions like the sigmoid and tanh functions.
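
In code, ReLU is a one-line function: it returns the input when the input is positive and zero otherwise.

```python
def relu(x):
    """Rectified Linear Unit: negative inputs become zero, positive inputs pass through."""
    return max(0.0, x)

outputs = [relu(x) for x in [-2.0, -0.5, 0.0, 1.5, 3.0]]
print(outputs)                           # [0.0, 0.0, 0.0, 1.5, 3.0]
```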

RL - Reinforcement Learning - a branch of machine learning where an agent interacts with an environment, trying to maximize its rewards through trial and error. An RL agent takes actions, receives feedback (rewards or penalties), and adapts its behavior to achieve its objective.

Safetensors - a file format and library used for storing and loading tensors, especially those used in machine learning models. It offers a safer and more efficient alternative to traditional Python pickle files, which can be susceptible to security vulnerabilities. Safetensors are designed to be fast, simple, and secure, making them ideal for sharing and distributing model weights

Sample - a single row of data. It contains inputs that are fed into the algorithm and an output that is compared with the prediction to calculate an error. A training dataset is composed of many rows of data, i.e., many samples. A sample may also be called an instance, an observation, an input vector, or a feature vector

Stable Diffusion - a specific type of diffusion model that was developed by Stability AI. Stable Diffusion models are known for their ability to generate high-quality images with a wide range of styles and features

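Step - one update of a model's parameters, usually computed from a single batch of training data. An epoch consists of as many steps as there are batches in the training dataset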

Stochastic Gradient Descent - an optimization algorithm used in machine learning to train models, particularly when dealing with large datasets. It's a variation of gradient descent that updates model parameters using gradients calculated from a small batch or even a single data point at each iteration, rather than the entire dataset. This makes SGD computationally efficient and suitable for large-scale datasets
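
A minimal sketch of mini-batch SGD fitting a one-parameter model y = w * x. The data, learning rate, batch size, and function name are all assumptions chosen to keep the example small.

```python
import random

def sgd_fit(data, lr=0.01, epochs=50, batch_size=2, seed=0):
    """Fit y = w * x by updating w from one small batch at a time."""
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)                            # new random order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]  # toy data with true w = 2
w = sgd_fit(data)
print(round(w, 4))
```

Each update looks at only two data points, yet the parameter still ends up very close to the value that fits the whole dataset.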

Supervised Learning - a type of machine learning in which the machine is trained on a set of labeled data. The labeled data consists of input data and output data, where the output data is the desired output for the given input data. The machine learns to predict the output data for new input data by finding patterns in the labeled data

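Tensor - an algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space. In machine learning, the term usually refers to a multi-dimensional array of numbers, generalizing scalars, vectors, and matrices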

TensorFlow - a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. In TensorFlow, models are often saved using the SavedModel format or the older HDF5 (.h5) format. SavedModel stores the model architecture, weights, and other necessary information in a directory structure. Checkpoint files (with a .ckpt extension) are also used to save the model's parameters

Token - a unit of text that a language model reads and produces. Tokens are the fundamental building blocks of language for a model. They're not simply words; they can be individual characters, sub-word units, special symbols, or even punctuation marks

Torch - an open-source machine learning library, a scientific computing framework, and a scripting language based on the Lua programming language. It provides a wide range of algorithms for deep learning and uses LuaJIT with an underlying C implementation. It was created at IDIAP at EPFL. As of 2018, Torch is no longer in active development. However, PyTorch, which is based on the Torch library, is actively developed as of June 2021

Torchaudio - a library for audio and signal processing with PyTorch. It provides I/O, signal and data processing functions, datasets, model implementations and application components

Torchvision - a Python library that provides a number of useful tools for computer vision tasks, such as image classification, object detection, and image segmentation. It is built on top of the PyTorch library

Training - the process of teaching an AI model to perform a specific task. This is done by feeding the model a large amount of data and providing it with feedback on its performance. The model then uses this data and feedback to learn how to perform the task better

Transformer - a type of neural network architecture that has been shown to be very effective for a variety of natural language processing (NLP) tasks, such as machine translation, text summarization, and question answering. Transformers were first introduced in the paper "Attention Is All You Need" by Vaswani et al. in 2017. The paper showed that transformers could achieve state-of-the-art results on a number of NLP benchmarks without the need for recurrent neural networks (RNNs), which were the dominant approach to NLP at the time. Transformers work by using a self-attention mechanism to learn long-range dependencies in sequential data. This allows transformers to learn the relationships between words in a sentence, even if the words are separated by a long distance

Unsupervised Learning - a machine learning approach that uses unlabeled data to identify patterns, relationships, and structures within the data. Unlike supervised learning, where models are trained on labeled data, unsupervised learning aims to discover hidden insights without explicit guidance

VAE - Variational Autoencoder - a type of neural network that builds upon the standard autoencoder concept but incorporates probabilistic elements. Unlike a traditional autoencoder, which aims to learn a compressed representation of data and reconstruct it as closely as possible, a VAE focuses on learning the underlying probability distribution of that data in its latent space. This allows it not only to reconstruct the original data but also to generate entirely new samples that adhere to the same style and characteristics. In essence, VAEs combine the data compression capabilities of autoencoders with the generative power of probabilistic models, making them valuable tools for tasks like image generation, music composition, and data augmentation

Vector Quantization - a technique used in signal processing and data compression to reduce the amount of data needed to represent a signal or dataset while preserving essential information. It involves dividing the continuous signal or dataset into a finite number of discrete regions, called clusters, and representing each region by a representative vector called a codeword. The set of all codewords is the codebook. This process allows for efficient storage and transmission of the data by encoding each original data point with the index of the nearest codeword in the codebook. Vector quantization finds applications in image and video compression, speech recognition, and various data compression algorithms
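
The encoding step described above can be sketched as a nearest-neighbor lookup. The three-entry codebook and the data points are invented for the example; in practice the codebook is learned from data, for instance with k-means clustering.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry closest to the vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))

codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]   # hypothetical learned codebook
data = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.9]]

indices = [nearest_codeword(v, codebook) for v in data]   # compressed form
decoded = [codebook[i] for i in indices]                  # approximate reconstruction
print(indices)                                            # [0, 1, 2]
```

Storing one small index per data point instead of the full vector is where the compression comes from.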

VIM - Vector-quantized Image Modeling - N/A

ViT-VQGAN - Vision-Transformer-based VQGAN - N/A

VQGAN - Vector Quantized Generative Adversarial Network - N/A

Weight - a parameter in a machine learning model that signifies the strength of the connection between units in different layers of the network. These weights determine the impact of input signals on the output during the process of training a neural network. Essentially, they represent the importance or contribution of each input feature to the prediction made by the model. During the training phase of a neural network, the model adjusts these weights iteratively based on the error between its predictions and the actual target values, using techniques like gradient descent optimization. The goal is to learn the optimal set of weights that minimizes the prediction error and allows the model to accurately generalize to new, unseen data. Weights are crucial parameters in neural networks and play a significant role in determining the model's performance and predictive capabilities

YAML - YAML Ain't Markup Language - a human-readable data format used for data serialization. It is used for reading and writing data independent of a specific programming language. YAML files are often configuration files, used to define the settings of a program or application

Zero-Shot - a machine learning scenario in which an AI model is trained to recognize and categorize objects or concepts without having seen any examples of those categories or concepts beforehand