The A-Z of AI for Legal Business Development

DataPlus AI
November 6, 2023

The A-Z of Generative AI for Legal BD

As AI becomes part of daily life in the legal industry, it's essential for attorneys and staff to become familiar with Generative AI concepts and terminology.

At DataPlus, we understand the need for jargon-free knowledge, so we created a list of terms and concepts for non-technical professionals.

Here is the list of terms and concepts organized alphabetically:



Actuators - Actuators are the components or modules within an AI agent that enable it to take actions based on its decision-making processes. These actions can include generating text, sending emails or notifications, controlling physical devices, or any other form of output that allows the agent to interact with its environment or users.

Agents - An agent is a system that connects an LLM with other components, such as tools and memory, so that it can perform specific tasks and take actions based on the input given by a user.

Algorithm - A set of instructions or rules that precisely defines a sequence of operations to be performed to solve a problem or accomplish a task. Algorithms are the basic building blocks of computer programs and are designed to complete tasks efficiently and accurately.
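For readers who want a concrete picture, here is a toy Python sketch of an algorithm: a precise sequence of steps that alphabetizes a list of terms, much as this glossary is organized. (This is a simple illustrative sort, not how production software would do it.)

```python
def alphabetize(terms):
    """Return the terms in alphabetical order, one insertion at a time."""
    ordered = []
    for term in terms:
        # Step through the ordered list to find where this term belongs.
        position = 0
        while position < len(ordered) and ordered[position].lower() < term.lower():
            position += 1
        ordered.insert(position, term)
    return ordered

print(alphabetize(["Token", "Agent", "Embedding"]))  # ['Agent', 'Embedding', 'Token']
```

Every step is explicit and repeatable, which is what distinguishes an algorithm from a rule of thumb.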

Algorithm Bias - Systematic and repeatable errors in a computer system that create unfair outcomes, such as arbitrarily favoring one group or result over others, often traceable to biases in the training data.

Alignment - Alignment refers to the process of ensuring that the behavior of the models is in accordance with human values, intentions, and desired outcomes. This means making sure the LLMs are safe, follow human ethical standards, and are robust enough to withstand adversarial attacks, so they aren’t easily derailed by unexpected inputs.

Artificial Intelligence (AI) - The simulation of human intelligence processes by machines, especially computer systems.

Attention Mechanisms - Components of neural networks that weigh the importance of different parts of the input data, critical in AI models like Transformers.  

Autoencoders - A type of artificial neural network used to learn efficient representations of data, typically for dimensionality reduction.



Backpropagation - A method used to train artificial neural networks. It works by calculating the error at the output of the network, and then propagating that error back through the network's layers to update the weights of the connections between neurons.

Benchmarks - A standardized set of tests used to objectively measure the performance of an LLM. It serves as a reference point against which various models can be compared. For example, the "Massive Multi-Task Language Understanding" (MMLU) benchmark is the primary benchmark to measure the knowledge acquired by an LLM during pre-training. It consists of 57 tasks ranging from mathematics, law, and science, to logic, moral reasoning, and computer science. Benchmarks vary in specificity. For example, HumanEval is the primary benchmark to evaluate the programming ability of LLMs, and the Spider benchmark is even more specific, as it evaluates performance on Text-to-SQL translation.

Bias-Variance Tradeoff - The property that models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa.



Chain-of-Thought Prompting - It refers to a prompting technique that instructs the LLM to break down a problem into a series of intermediate reasoning steps, to improve its ability to solve more complex tasks. For example, you may ask the LLM to solve a problem by specifically following the steps you have outlined in the prompt.
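To make this concrete, here is a hypothetical chain-of-thought prompt written as a Python string. The numbered steps are invented for illustration; the point is that spelling out intermediate reasoning steps tends to produce more reliable answers than asking for the result directly.

```python
# A direct prompt asks for the answer in one leap; a chain-of-thought
# prompt walks the model through the reasoning.

direct_prompt = (
    "A contract dispute involves $120,000 in damages and a 15% "
    "contingency fee. What is the fee?"
)

chain_of_thought_prompt = (
    "A contract dispute involves $120,000 in damages and a 15% "
    "contingency fee. Solve step by step:\n"
    "1. State the damages amount.\n"
    "2. Compute 15% of that amount.\n"
    "3. Report the resulting fee.\n"
)

print(chain_of_thought_prompt)
```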

ChatGPT - An AI chatbot from OpenAI, built on large language models, that can hold impressively human-like conversations.

Chunks/Chunking - A small collection of tokens that are related or have some shared meaning. Chunks are useful because they capture some semantic meaning and relationships between tokens.

Context Window - The maximum number of tokens that an LLM can take as input in a single request to generate an appropriate output. Larger context windows take longer to process and are more expensive but can allow for more complex tasks, such as summarizing entire documents.

Continuous Active Learning (CAL) - An application of AI in which the system learns to correct itself after it has learned to differentiate between responsive and nonresponsive concepts via supervised learning.

Convolutional Neural Networks (CNNs) - CNNs are a type of artificial neural network commonly used for image processing and computer vision tasks. The "convolutional" part refers to filters that slide across input images to extract features such as edges and shapes. Pooling layers then condense these features into compact summaries, which makes CNNs well suited to computer vision tasks.

Cosine Similarity - When words or sentences are encoded into vectors, the semantic similarity between these pieces of text can be gauged using the cosine similarity metric, a mathematical formula that measures the angle between the vectors. A value close to 1 means the vectors are very similar; a value close to 0 means they are unrelated.
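For the technically curious, here is a small Python sketch of the formula. The three-number "embeddings" are invented for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for word embeddings.
contract = [0.9, 0.1, 0.3]
agreement = [0.85, 0.15, 0.35]
banana = [0.1, 0.9, 0.05]

print(cosine_similarity(contract, agreement))  # close to 1: similar meaning
print(cosine_similarity(contract, banana))     # closer to 0: unrelated
```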


Data Poisoning - The act of manipulating the data used to train AI models with the intent to lead to incorrect or biased outputs.

Data Privacy - The aspect of data protection that deals with the proper handling of data—consent, notice, and regulatory obligations.

Data Sets - Large collections of data used to train generative AI models.

Deep Learning (DL) - A subset of Machine Learning that uses neural networks with many layers to learn from large amounts of data.

Deepfakes - Synthetic media in which a person's likeness is replaced with someone else's likeness, often using Generative AI.

Diffusion Models - Generative models that create images by starting from random noise and progressively refining it over repeated iterations, reversing a learned process of adding noise.



Embeddings - Embeddings refer to the numerical representation of words and phrases as vectors. These numbers aren't random; they're crafted to capture the meaning and context of each word, which enables LLMs to process the inputs more efficiently. Words with similar meanings or contexts have similar numerical vectors.

Ethics (AI) - The branch of ethics that examines the moral issues related to AI, including its development and implementation.



Federated Learning - A machine learning technique where the model is trained across many decentralized devices or servers holding local data samples, without exchanging them.

Fine-Tuning - The process of taking a pre-trained model and adjusting its parameters slightly to adapt it for a particular task.

Foundation Models - Often referred to as 'base models,' these are large machine learning models trained on massive datasets for broad applications that can be customized for more specialized tasks.



GPT - Generative Pre-trained Transformer, such as GPT-3 by OpenAI which can generate human-like text.

GPT-3 - A large language model by OpenAI capable of generating human-like text.

Generative Adversarial Networks (GANs) - A system of two neural networks contesting with each other to generate new, synthetic instances of data.

Generative AI - The broad category of AI systems that can generate new content like text, images, audio, video, etc. that is similar but not identical to the data it was trained on.

Gradient Descent - An optimization algorithm used to minimize the loss function. After calculating the gradient of the loss function with respect to the weights and biases (using backpropagation), gradient descent adjusts those weights and biases in the direction that reduces the loss the most.
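A minimal sketch of the idea, assuming a deliberately tiny model with a single weight; real training repeats this same update across millions or billions of weights.

```python
# Fit a one-parameter model (prediction = w * x) to a single example,
# minimizing the squared error (prediction - y) ** 2.

x, y = 2.0, 10.0        # one training example; the ideal weight is 5.0
w = 0.0                 # start from an uninformed weight
learning_rate = 0.1

for _ in range(100):
    prediction = w * x
    # Gradient of the squared-error loss with respect to w.
    gradient = 2 * (prediction - y) * x
    w -= learning_rate * gradient   # step "downhill" on the loss

print(round(w, 3))  # converges toward 5.0
```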



Hallucination - When a generative model produces output that sounds plausible but is factually incorrect or unsupported, failing to reflect the facts relevant to the user's intent.

Heuristics - Rules of thumb for problem-solving that are not necessarily precise or reliable. Heuristics are simple rules and strategies derived from experience, common sense, or intuition that help speed up problem-solving and find fairly good solutions, even if not the very best. Heuristics trade accuracy for speed, which is useful in many real-world applications.

Hyperparameters - High-level settings of the model and training process that are not learned. They are set by the user before training begins. Hyperparameters significantly impact the model's performance and training time. Examples include the number of layers in the neural network, the number of neurons per layer, and the type of activation functions used.



Image Generation - Using generative AI to create or edit digital images and artwork.

Image-to-Image - Generating new edited or enhanced images by providing an existing image and text prompt.

Inference - The process of using a trained AI model to make predictions.



Joint Probability Distribution - In generative models, it represents the probability across different variables together.



Knowledge Representation - The ways in which knowledge about the world is structured and stored so that an AI system can reason with it.

Knowledge Graph - Databases of facts and relationships that generative AI models can reference to improve capabilities.



Large Language Models (LLMs) - AI systems trained on massive text datasets that can generate human-like text such as GPT (Generative Pretrained Transformer).

Latent Space - An abstract representation of data learned by a neural network model: a hidden state space that encodes the most salient features of the input data.



Machine Learning (ML) - A subset of AI that involves computers learning from data to make decisions or predictions.

Multi-Agent System - Multi-agent systems involve the collaboration and interaction of multiple AI agents, each with potentially unique capabilities, knowledge, or roles. These agents work together to achieve a common goal or solve complex problems that may be beyond the scope of a single agent. Communication, coordination, and negotiation among agents are key aspects of multi-agent systems, enabling them to leverage their collective intelligence and capabilities.

Multimodal Models - Generative models that can take multiple input formats (text, audio, images, or video) and can generate outputs with single or mixed formats.



Natural Language Processing (NLP) - The field of AI that enables understanding, interaction, and generation of human language by a machine.

Neural Networks - The core machine learning models behind most modern AI. Designed to loosely mimic how the human brain works and inspired by the biological neural networks of humans and animals.

Non-Determinism & Probabilistic Outputs - Because LLMs make predictions when they generate text, their output is considered to be non-deterministic. They do not consistently generate the exact same answer each time you ask them the same question. Their output is probabilistic: each token is selected according to a probability distribution. This doesn't mean that the output is unreliable; the varying outputs can be considered "different ways of saying the same thing".



Overfitting - A model that is too closely fit to a specific set of data, making it less flexible.



Parameters - The internal variables (such as weights and biases) that an AI system adjusts during training to improve its performance. The goal of training is to tune these parameters to optimize the system's performance on a given task.

Percepts - In the context of AI agents, percepts refer to the sensory inputs or data that agents receive from their environment. These inputs can include text, images, audio, or any other form of data that allows agents to perceive and understand their surroundings.

Predictive Models - Models that predict outcomes based on input data.

Pre-Training - The process where the foundation model is initially trained by its creator, with a vast and diverse corpus of data to perform desired tasks. For LLMs, pre-training helps the model learn language syntax, facts, and underlying patterns to accurately predict the next word in a sentence.

Proactive Agents - Proactive agents are AI agents that possess the ability to take initiative and act based on their own goals, expectations, or desires. They maintain an internal state and can plan and reason about future actions. Proactive agents are capable of operating effectively in more complex and dynamic environments, where they need to anticipate changes and adapt their behavior accordingly.

Prompt - An instruction given to an AI model to generate a specific output.

Prompt Engineering - Crafting the text prompts fed into generative AI models to produce better results.



Quantization - A technique that reduces the numerical precision (the number of bits) used to store a model's weights, shrinking the model and enabling deployment on consumer devices.
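A minimal sketch of the core idea, using made-up weights: scale each 32-bit floating-point weight so it fits in the range of an 8-bit integer (-127 to 127), accepting a small rounding error in exchange for a much smaller model.

```python
# Toy weights standing in for a model's 32-bit floating-point parameters.
weights = [0.12, -0.98, 0.55, 0.0]

# Choose a scale so the largest weight maps to 127 (the 8-bit maximum).
scale = max(abs(w) for w in weights) / 127

quantized = [round(w / scale) for w in weights]      # small integers
dequantized = [q * scale for q in quantized]         # approximate originals

print(quantized)  # [16, -127, 71, 0]
```

The dequantized values are close to, but not exactly, the originals; that small loss of precision is the trade-off for the storage savings.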



Reactive Agents - Reactive agents are a type of AI agent that operates based on a simple stimulus-response mechanism. They do not maintain an internal state or memory, and their actions are determined solely by the current input they receive. Reactive agents are well-suited for performing straightforward tasks in static or predictable environments where quick responses are required.

Recurrent Neural Networks (RNNs) - A type of neural network well-suited for sequence prediction.

Reflexion & Iterative Self-improvement - This is another set of techniques used in prompting to enhance the problem-solving capabilities of LLMs by asking them to review and critique their own outputs, and to correct their answers until they are fully satisfied with the answer. This self-guided refinement technique has shown promising results in increasing the problem-solving abilities of LLMs.

Reinforcement Learning - An area of machine learning where an AI model learns to make decisions by performing certain actions and receiving rewards or penalties to achieve desired behaviors in a virtual environment.

Retrieval-Augmented Generation (RAG) - RAG is an AI framework for retrieving facts from an external knowledge base to ground large language models (LLMs) on the most accurate, up-to-date information. These knowledge bases can be structured or unstructured databases, websites, PDF documents, etc.
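The flow can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: a word-overlap count substitutes for real embedding similarity, the two documents are invented, and the final step builds a prompt rather than calling an actual LLM.

```python
# A toy "knowledge base" of two invented documents.
documents = [
    "Firm X won a patent case in 2023.",
    "Firm Y opened a London office.",
]

def score(question, doc):
    # Stand-in relevance score: count shared words.
    # Real systems compare embedding vectors instead.
    return len(set(question.lower().split()) & set(doc.lower().split()))

def retrieve(question, top_k=1):
    ranked = sorted(documents, key=lambda d: score(question, d), reverse=True)
    return ranked[:top_k]

def build_prompt(question):
    context = " ".join(retrieve(question))
    # A real system would send this grounded prompt to the LLM.
    return f"Answer using only this context: {context}\nQuestion: {question}"

print(build_prompt("What did firm X win?"))
```

Because the model is instructed to answer from the retrieved context, its output stays grounded in the knowledge base rather than in whatever it memorized during training.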


Self-Attention Mechanism - This mechanism enables the transformer model to process and understand the relationships and dependencies between words in a sentence. The learning of dependencies is a key element in many natural language processing tasks, such as reading comprehension, text summarization, and question answering. The concept of attention heads and multi-head attention are essential components that enable the model to capture different aspects of the input sequence and gain a richer understanding of the text.

Semi-Supervised Learning - A mix of supervised and unsupervised learning.

Speech-to-Text - Technologies that convert human speech into text.

Supervised Learning – The training of a model using known input and output data so that it can predict future outputs.

Synthetic Data - Artificially generated data produced by computer algorithms, used for training AI models when real data is scarce or sensitive.



Text-to-Image Generation - The process of creating images from textual descriptions, often using models like DALL-E or similar AI systems.

Text-to-Speech (TTS) - Technologies that convert text into human-like speech, increasingly using Generative AI for natural-sounding voices.

Token - A basic unit of meaning in natural language processing. When working with large amounts of text data, it's common to break the text down into smaller pieces called tokens.

Tokenization - The process of splitting text into tokens. This breaks the text into digestible chunks that are easier for an AI system to understand and process. Tokens allow the AI to analyze relationships between words, their frequency, and other patterns.
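As a rough illustration, here is a toy whitespace tokenizer in Python. Production LLMs use subword tokenizers (such as byte-pair encoding) that can split rare words into smaller pieces, but the basic idea of breaking text into units is the same.

```python
def tokenize(text):
    """Split text into lowercase word tokens, treating the period as its own token."""
    return text.lower().replace(".", " .").split()

tokens = tokenize("The firm filed the motion.")
print(tokens)  # ['the', 'firm', 'filed', 'the', 'motion', '.']
```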

Training Corpus - The collection of datasets used to pre-train the models.

Transfer Learning - Taking a pre-trained model and repurposing it for a different but related task. This is a popular approach in deep learning where pre-trained models are used as the starting point.

Transformers - A neural network architecture particularly well-suited for language-related tasks.



Underfitting - The phenomena in machine learning where a model does not learn the training data well enough, impacting its performance on new data.

Unsupervised Learning - Training a model to find patterns in data without pre-existing labels.



Variational Autoencoder (VAE) - A type of AI model architecture used for generating new data similar to the training data.

Vector Database - Specialized databases designed to handle high-dimensional vectors like word, audio, and image embeddings. They are optimized to store, retrieve and perform similarity search across vectors efficiently. They are used in various applications such as Gen-AI-powered apps, recommendation systems, and search engines.



Weights - The parameters within neural networks that transform input data within the model's layers.

Whisper - A Generative AI model by OpenAI for speech recognition, which transcribes and translates spoken audio into text.



X-risk - Existential risk from advanced AI, including concerns like misuse of generative models.

eXplainable AI (XAI) - AI that is designed to be transparent and explainable to human users.



Yield - In the context of AI, yield refers to the output or the results obtained from an AI system.



Zero-shot Learning - The ability of an AI model to solve a task without having received any explicit training for that task.
