AI Concepts Made Easy: A Beginner's Manual (Part 3 of 5)
The Power Behind Large Language Models like ChatGPT
In the previous article, we discussed how generative AI like ChatGPT can create new ideas, much as the human brain does. Now, let's explore what powers generative AI models - specifically, large language models (LLMs) - and how vectors, tensors, and transformers underpin the way LLMs represent and process language.
You’ve probably heard about the latest AI chatbot sensation, ChatGPT. Underlying this clever conversationalist is a large language model (LLM) trained by the AI research company OpenAI. LLMs are the key innovation making it possible for AI to generate human-like text and even code.
Let’s unpack what exactly large language models are and how they work.
LLMs Learn Language Like Humans Do
LLMs are AI systems trained on massive amounts of text data to predict upcoming words based on previous ones. It’s like if you gave a gifted student millions of webpages, books, and articles to read. Over time, they would unconsciously learn the patterns of language.
LLMs do the same - they analyze huge text corpora to learn nuances like context, grammar, and tone. With enough data, LLMs like the one behind ChatGPT reach impressive language fluency.
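To make this concrete, here's a minimal sketch of next-word prediction using the open-source Hugging Face Transformers library, with the small GPT-2 model standing in for illustration (ChatGPT's own model isn't publicly downloadable):

```python
# A minimal sketch of next-word prediction, assuming the Hugging Face
# `transformers` library is installed and using the small open-source GPT-2 model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model continues the prompt by repeatedly predicting the next word (token).
result = generator("The capital of France is", max_new_tokens=5, num_return_sequences=1)
print(result[0]["generated_text"])
```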
GPUs Give LLMs Supercharged Learning
Training large language models requires processing massive amounts of text data. Doing this efficiently requires specialized computer chips called GPUs (graphics processing units), which offer immense parallel processing power to handle the computational demands of LLMs.
With clusters of GPUs, companies can train LLMs on billions of text examples in months rather than decades. GPU acceleration enabled the AI breakthroughs behind chatbots like ChatGPT.
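As a rough illustration, here's a small PyTorch sketch of the kind of work a GPU accelerates. The shapes and values are made up, but the pattern - big batches of matrix math run in parallel - is what GPU clusters do at enormous scale during training:

```python
# A small PyTorch sketch showing how code moves work onto a GPU when one is available.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # use a GPU when present

# A toy batch of token embeddings: 32 sequences x 128 tokens x 768 dimensions (illustrative sizes).
batch = torch.randn(32, 128, 768).to(device)
weights = torch.randn(768, 768).to(device)

# Training runs enormous numbers of matrix multiplications like this in parallel on GPU cores.
output = batch @ weights
print(output.shape, "computed on", device)
```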
Leading LLM developers like OpenAI, Anthropic, and Meta train models by downloading and indexing vast amounts of internet data. They pre-train LLMs before releasing them publicly. Once released, prompts and feedback from everyday users of ChatGPT, Claude, and other LLMs help further improve their accuracy. However, these models still sometimes hallucinate - confidently producing fabricated answers that humans can often spot as incorrect.
Under the hood, large language models (LLMs) like GPT-3 rely on vectors, tensors, and transformers in the following ways:
- Vectors are used to convert the basic units of text data - words or sub-words (tokens) - into numeric representations that capture semantic meaning and relationships. Converting text to vectors is the first step every LLM performs, and vectors are the basic building blocks used across many AI algorithms.
For example, we can represent the word "apple" with a vector like [0.5, 1.2, -0.1, 0.4]. Each number in the vector captures part of the meaning of "apple."
Vectors allow AI systems to represent complex ideas and relationships in a simple, quantified format. Two similar words like "apple" and "orange" will have slightly different but related vectors.
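Here's a toy sketch of that idea in Python. The numbers are invented for illustration, since real LLM embeddings have hundreds or thousands of learned dimensions:

```python
# A toy illustration of word vectors; the 4-dimensional values are made up.
import numpy as np

vectors = {
    "apple":  np.array([0.5, 1.2, -0.1, 0.4]),
    "orange": np.array([0.45, 1.1, -0.2, 0.5]),
    "car":    np.array([-0.9, 0.1, 1.3, -0.7]),
}

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 means similar meaning, near 0 or negative means unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["apple"], vectors["orange"]))  # high - related words
print(cosine_similarity(vectors["apple"], vectors["car"]))     # low  - unrelated words
```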
- Tensors generalize vectors to higher dimensions. For example, an LLM encodes a sentence as a tensor that combines multiple word vectors, which lets it model the interactions between words. Vectors are 1D tensors, matrices are 2D tensors, and higher dimensionalities are possible. Tensors provide a way to represent more complex, multi-faceted data in AI models.
Example/Analogy for tensors -
Think of a tensor like a Lego house - it's a multidimensional structure built from smaller pieces that fit together. The Lego bricks are like vectors - small, simple pieces that can be used to build up. Lego plates are like 2D tensors - they let you make flat surfaces. Sticking those pieces together allows you to build a whole 3D Lego house, just like a tensor combines vectors and matrices into a larger, more complex multidimensional array. A tensor takes simple vectors and matrices as building blocks to represent intricate ideas, just like a Lego house combines bricks and plates to model a real building.
Imagine building an AI to estimate home square footage from characteristics. Home data can be stored in tensors - multidimensional arrays. A tensor combines details like bedrooms, bathrooms, floors. Looking at one factor doesn't determine size. But together they create a rich representation. The AI analyzes patterns across the tensor to learn correlations. Like 'two-story, 4-bedroom homes tend to be 2000 sq ft'. Recognizing these patterns allows estimating size of new homes. Tensors capture multidimensional data to enable prediction.
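Here's a short numpy sketch of the same idea. The values are placeholders, but it shows how vectors stack into 2D and 3D tensors:

```python
# A minimal numpy sketch of how vectors stack into higher-dimensional tensors.
import numpy as np

word = np.array([0.5, 1.2, -0.1, 0.4])               # 1D tensor: one word vector
sentence = np.stack([word, word * 0.9, word * 1.1])  # 2D tensor (matrix): 3 words x 4 dims
batch = np.stack([sentence, sentence])                # 3D tensor: 2 sentences x 3 words x 4 dims

print(word.shape, sentence.shape, batch.shape)  # (4,) (3, 4) (2, 3, 4)

# The same idea applies to the home example: each home is a vector of features
# (bedrooms, bathrooms, floors), and many homes together form a 2D tensor.
homes = np.array([
    [4, 2, 2],   # 4-bed, 2-bath, two-story
    [2, 1, 1],   # 2-bed, 1-bath, single-story
])
print(homes.shape)  # (2, 3)
```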
- Transformers are the key neural network architecture used in most major LLMs, like OpenAI's GPT-3 (Generative Pre-trained Transformer), Google's BERT (Bidirectional Encoder Representations from Transformers), and more. Transformers are designed to handle tensor operations and data flows efficiently using mechanisms like self-attention. Their flexibility has made them very popular for natural language processing (NLP), the field that lets computers understand and generate human language. NLP is what enables virtual assistants like Alexa, Siri, and Google Assistant to converse with users.
To understand transformers, imagine a group of 5 kids playing telephone.
Kid 1 is given the full sentence "The quick brown fox jumps over the lazy dog." This is the input text.
Kid 1 whispers a simplified version to Kid 2, saying "Fast brown fox jump over lazy dog." This is like a transformer converting the input into a numeric representation.
Kid 2 whispers an even more simplified version to Kid 3, saying "Fox jump dog." This is like the data flowing through the model layers, being transformed and condensed.
Kid 3 whispers to Kid 4: "Fox dog jump." The data is transformed further.
Finally, Kid 5 hears "The quick brown fox jumps over the lazy dog" from Kid 4. This is like the decoder reconstructing the full original sentence from the simplified representations.
In this analogy, each child is a layer of the transformer, passing along a transformed message: transformers simplify, process, and reconstruct data across layers, much like a game of telephone.
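The telephone game leaves out the mechanism that actually makes transformers work: self-attention. Here's a deliberately tiny numpy sketch of scaled dot-product attention, with random numbers standing in for the weights a real model would learn:

```python
# A tiny numpy sketch of the self-attention mechanism mentioned above.
# Random numbers stand in for learned weights; real transformers learn them from data.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(9, 16))   # 9 word vectors for "The quick brown fox ...", 16 dims each

Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv

# Each word "attends" to every other word, weighting the most relevant ones highest.
scores = softmax(Q @ K.T / np.sqrt(16))
attended = scores @ V
print(scores.shape, attended.shape)  # (9, 9) attention weights, (9, 16) updated word vectors
```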
In summary, vectors, tensors, and transformers enable LLMs to represent and generate nuanced language.
There are two main types of LLMs:
Open Source LLMs
Mistral AI - A company that releases open-source LLMs, such as Mistral 7B, focused on efficient conversational AI. Its models can be used through the open source Hugging Face Transformers library.
Meta's Llama 2 - An open-source LLM released by Meta for research and commercial use. It demonstrates strengths in areas like reasoning and knowledge-intensive tasks.
Many more open-source LLMs are publicly available. The benefit of these models is that the code and weights are open for anyone to use and build upon.
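For example, here's a minimal sketch of loading an open-source model with the Hugging Face Transformers library. The small GPT-2 model is used as an easy local test; a model ID such as mistralai/Mistral-7B-v0.1 can be swapped in on hardware with enough memory:

```python
# A minimal sketch of running an open-source LLM with Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # swap in e.g. "mistralai/Mistral-7B-v0.1" on suitable hardware
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open-source LLMs let anyone", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```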
Proprietary LLMs
OpenAI - Creators of ChatGPT, GPT-3, and GPT-4.
Anthropic - An AI safety startup that has developed Claude, a proprietary LLM focused on harmless, helpful, and honest dialog.
Google's LaMDA - A conversational AI model from Google that can have natural conversations on a wide range of topics.
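Unlike open-source models, proprietary LLMs are typically reached through a hosted API rather than downloadable weights. Here's a minimal sketch using the official openai Python client; the model name is illustrative, and an API key is assumed to be set in the environment:

```python
# A minimal sketch of calling a proprietary LLM through a hosted API, using the
# official `openai` Python client (v1+). Assumes OPENAI_API_KEY is set in the
# environment; the model name below is illustrative.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain tensors in one sentence."}],
)
print(response.choices[0].message.content)
```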
The Future of LLMs
Overall, large language models represent an AI breakthrough, yet are still early in their development. It will be fascinating to see how LLMs continue to evolve and shape our future. The path forward is responsible AI development. With care, LLMs could unlock new realms of human creativity and productivity.
Potential applications in the payment industry:
Risk modeling - Creating tensor representations of customer data (e.g., account history, transaction graph) can feed into risk models that assess creditworthiness, likelihood of late payments, and more. This can help make better lending decisions.
Enriching transaction data - Applying pre-trained transformer models to incomplete transaction datasets can fill in missing information such as merchant names and transaction descriptions, enriching the data to provide better business intelligence on consumer spending behavior and payment risk profiles. Continually retraining the models on new transactions improves their ability to populate empty fields in payment data.
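As a rough illustration of the enrichment idea, here's a minimal sketch that uses a pre-trained transformer to guess a spending category from a raw transaction description. The pipeline, model, and category labels are illustrative assumptions, not a production payments system:

```python
# A minimal sketch of enriching raw transaction descriptions with a pre-trained
# transformer. The zero-shot classification model and category labels are
# illustrative assumptions.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

transaction = "AMZN MKTP US*2K3L9 SEATTLE WA"
categories = ["online retail", "groceries", "travel", "utilities"]

result = classifier(transaction, candidate_labels=categories)
print(result["labels"][0], result["scores"][0])  # most likely spending category and its score
```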
Productivity-Boosting Tools:
Perplexity AI - Uses natural language processing to analyze online sources, automatically generating citations and summaries to accelerate research on any topic.
Chat PRD - An AI writing assistant that can generate requirements documents and brainstorm metrics from scratch, providing feedback and recommendations like a virtual CPO to boost product managers' productivity.
Challenges and Potential Solutions:
One significant challenge facing LLMs is hallucination, wherein a model generates plausible-sounding but false information instead of factual responses. Hallucinations can be mitigated through techniques such as reinforcement learning and retrieval-augmented generation (RAG).
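As a preview, here's a deliberately tiny sketch of the RAG idea: look up relevant reference text first, then hand it to the model alongside the question so the answer is grounded in real sources. The document store and keyword matching below are simplistic stand-ins for a real vector database:

```python
# A deliberately tiny sketch of retrieval-augmented generation (RAG).
# The documents and keyword-based retrieval are illustrative stand-ins;
# real systems compare embedding vectors in a vector database.
documents = {
    "refund_policy": "Refunds are issued within 5 business days of approval.",
    "card_fees": "International card payments carry a 2.5% processing fee.",
}

def retrieve(question):
    # Naive keyword match over the stored documents.
    return [text for key, text in documents.items()
            if any(word in text.lower() for word in question.lower().split())]

def build_prompt(question):
    # Prepend the retrieved context so the LLM can ground its answer in it.
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What fee applies to international card payments?"))
```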
Additionally, security and data privacy are crucial considerations in developing LLMs responsibly. Implementing robust security layers will be important to prevent leaks of personal or corporate data used to train models.
Stay tuned for our next article, which will dive deeper into techniques like reinforcement learning and RAG that are improving LLMs' accuracy. The goal is enabling these models to provide accurate, nuanced information while minimizing falsehoods.