
AI Whitepapers

Last updated by Simon Späti

Similar to Data Engineering Whitepapers, this page collects papers related to AI and AI agents.


# Large Language Models

LLMs are general-purpose models built from large neural networks.


# Anthropic

Anthropic, the company behind Claude.

  • Constitutional AI: Harmlessness from AI Feedback
  • Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples: This research demonstrates that poisoning attacks on LLMs require a near-constant number of malicious documents regardless of dataset size—as few as 250 poisoned samples can successfully backdoor models from 600M to 13B parameters, even though larger models train on 20× more clean data. The findings reveal that attack difficulty does not scale with model size, making data poisoning increasingly practical for large models since the adversary’s requirements remain fixed while training datasets grow proportionally larger.
    • Supporting blog post
    • A small portion of content can affect LLMs, contrary to what we previously thought. A good example is small Reddit threads that most models are trained on. But even more, you can add sudo
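The paper's core finding can be illustrated with simple arithmetic: if the number of poisoned documents needed stays near-constant (around 250 per the paper), the poisoned *fraction* of the training set shrinks as datasets grow, yet the attack still succeeds. A small sketch (the dataset sizes below are illustrative assumptions, not figures from the paper):

```python
# Illustrative sketch: with a fixed poison count, the poisoned fraction
# of the training set shrinks as the dataset grows, so the adversary's
# cost does not scale with model or dataset size.
POISON_SAMPLES = 250  # near-constant count reported in the paper


def poison_fraction(dataset_size: int, poisoned: int = POISON_SAMPLES) -> float:
    """Fraction of the training set that is poisoned."""
    return poisoned / dataset_size


# Hypothetical dataset sizes, chosen only to show the trend:
for size in (1_000_000, 10_000_000, 100_000_000):
    print(f"{size:>11,} docs -> {poison_fraction(size):.6%} poisoned")
```

The takeaway: defenses that assume an attacker must control a fixed *percentage* of training data become weaker as datasets grow, because the absolute requirement stays flat.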

# Retrieval-Augmented Generation (RAG)

RAG is a technique that provides additional context to a generative large language model, typically by retrieving relevant documents and injecting them into the prompt.
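The retrieve-then-prompt loop can be sketched in a few lines. This is a minimal toy, not a real RAG stack: it uses bag-of-words cosine similarity in place of learned embeddings, and all function names and the sample documents are assumptions for illustration.

```python
from collections import Counter
from math import sqrt

# Toy RAG sketch: score documents against a query with bag-of-words
# cosine similarity, then prepend the best matches to the prompt.


def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = vectorize(query)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "RAG retrieves documents and adds them to the prompt.",
    "Constitutional AI trains models with AI feedback.",
    "Data poisoning needs few samples.",
]
print(build_prompt("How does RAG add documents to a prompt?", docs))
```

Production systems swap the word-count vectors for dense embeddings and a vector database, but the shape of the pipeline (embed, retrieve top-k, augment the prompt) is the same.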


# AI Tests

These are benchmarks and evaluations where models are tested.

# Context

Papers on the best ways to store and manage context for LLMs.
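One simple way to store conversational context is a rolling buffer that keeps only the most recent messages fitting a token budget. A hypothetical sketch (token counting is approximated by word count; the class name and budget are assumptions):

```python
from collections import deque

# Hypothetical rolling context buffer: evict the oldest messages once
# the (approximate) token budget is exceeded.


class ContextBuffer:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.messages: deque[str] = deque()

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Drop oldest messages until we fit, but always keep the newest.
        while self._tokens() > self.max_tokens and len(self.messages) > 1:
            self.messages.popleft()

    def _tokens(self) -> int:
        # Word count stands in for a real tokenizer here.
        return sum(len(m.split()) for m in self.messages)

    def render(self) -> str:
        return "\n".join(self.messages)


buf = ContextBuffer(max_tokens=6)
buf.add("hello there friend")
buf.add("how are you")
buf.add("fine thanks")
print(buf.render())  # oldest message evicted to stay within budget
```

Real systems layer summarization or retrieval on top of eviction so old context is compressed rather than lost outright.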

# Further Lists


Origin: Data Engineering Whitepapers
References:
Created 2025-12-01