Human vs GenAI capability map (engineering view)
Abstract
This note is an engineering-facing capability map for designing LLM/agent-adjacent systems. For each target capability, it summarizes:
1) the typical limitation in current LLM/GenAI systems,
2) an engineering substitute layer,
3) common implementation techniques, and
4) the residual gap that remains in production.
This is not a neuroscience reference. The “Human implementation” column is a compact informal analogue intended to support engineering tradeoffs.
Capability map
This page renders the table from an internal _data file at build time (no CSV download link is published).
| Target capability | Human implementation (informal) | GenAI limitation | GenAI alternative | Techniques/tools | Residual gaps |
|---|---|---|---|---|---|
| Episodic memory (events) | Hippocampus→neocortex consolidation; time/place bound recollection | No native personal timeline; context reset per session | Conversation/session logs; long-term stores | RAG over chat logs, session memory DBs | Identity continuity, privacy, provenance management |
| Semantic memory (facts/meanings) | Distributed neocortical networks; schema-based generalization | Parametric knowledge frozen at train-time; recency drift | Retrieval-augmented generation (RAG) | Vector DBs, BM25, hybrid search, citations | Source reliability ranking; contradiction resolution |
| Procedural skill learning | Basal ganglia/cerebellum; practice-dependent motor programs | No motor embodiment; weak sensorimotor feedback | Code execution, simulators, tool use | Function calling, sandboxed runtimes, robotics stacks | Real-world transfer; closed-loop control robustness |
| Perception & grounding | Multisensory integration tied to action | Symbol grounding problem; limited real-world coupling | Multimodal encoders + captioning | Vision-language models, ASR/TTS, OCR | Causal scene understanding; affordances |
| Common sense | Embodied experience + social learning | Pattern recall without lived priors; brittle edge cases | External knowledge graphs, constraint checks | KG query, rules/validators, counterexample search | Coverage, open-world generality |
| Causal reasoning | Interventions & mental models of mechanisms | Correlational training; confuses cause with association | Program-of-thought, simulators, SCM plugins | Do-calculus libs, A/B design helpers, Monte-Carlo | Reliable counterfactuals; experiment design validity |
| Planning / executive control | Prefrontal cortex: goal/plan/monitor loop | Short horizon; distractibility with long contexts | Tool-augmented planners and search | Tree/graph search, ReAct, task graphs, schedulers | Credit assignment; global consistency over long tasks |
| Theory of mind | Modeling others’ beliefs/intent | Shallow ToM; degraded by prompt framing | Explicit state models per agent | Agent memory slots, belief tracking JSON | Generalization to messy multi-party settings |
| Emotion & valuation | Affect guides salience and choice | No intrinsic affect; tone ≠ valuation | Utility functions, reward models | RLHF/RLAIF, preference datasets | Stable values; misalignment under distribution shift |
| Metacognition (self-monitoring) | Confidence estimates, error awareness | Overconfidence; weak calibration | Self-critique & verifier models | Critic-solver loops, ensemble agreement | Robust calibration across domains |
| Uncertainty handling | Probabilistic reasoning + heuristics | Token probabilities are not calibrated posteriors; no explicit uncertainty output | n-best sampling + scoring | Logprob exposure, MC-dropout proxies | Actionable probabilities; interpretability |
| Mathematical reliability | Symbolic rules + working memory | Pattern errors in multi-step math | External calculators/solvers | Tool calling to CAS, Python, SMT | Spec adherence; unit/timezone sanity |
| Continual learning | Incremental updates without forgetting | Catastrophic forgetting on fine-tune | Non-parametric memory + adapters | RAG, LoRA, retrieval-first policies | Unified memory; write-time governance |
| Transfer & abstraction | Analogy; schema mapping | Surface-form bias; weak systematicity | Program induction; few-shot curricula | Synthetic contrast sets, decomposition prompts | Compositional generalization reliability |
| Spatial reasoning | Mental rotation; body-centric frames | Errors on 3D/frames/left-right | External geometry tools, images→graphs | Scene graphs, CAD/solver APIs | Embodied grounding; real-time updates |
| Temporal reasoning | Calendrics, causality over time | Date math and recency errors | Time-aware tools & validators | Calendar APIs, monotonic checks | Global timeline consistency |
| Long-context coherence | Selective attention, summarization | Loss of global state; topic drift | Memory summarizers + key-state pins | Structured scratchpads, slot attention | Scalable, faithful memory over 100k+ tokens |
| Source attribution | Cite teachers/books; episodic links | Parametric opacity; unclear provenance | Grounded outputs with citations | RAG with page-anchored quotes | Attribution on parametric knowledge |
| Tool use & action | Embodied manipulation | No direct world control | Function/tool calling via APIs | Orchestrators, agent frameworks, safety interlocks | Safe autonomy; side-effect audit |
| Ethics & norms | Socialization; reflective judgment | Rule conflicts; context brittleness | Policy layers & guardrails | Safety classifiers, allow/deny lists | Nuanced context; cultural variance |
| Creativity (novel recombination) | Divergent+convergent thinking | Mode collapse; training data echo | Constraint-guided generation | Style mixing, search over prompts | Grounded originality; plagiarism checks |
| Robustness (OOD/adversarial) | Meta-rules; anomaly detection | Susceptible to prompt injection/drift | Input sanitation + policy verifiers | Sandboxes, allowlists, detectors | General defenses without false positives |
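The "GenAI alternative" column for semantic memory and source attribution can be made concrete with a minimal retrieval-with-citation sketch. Pure-Python token overlap stands in for a real vector or BM25 index, and the corpus entries (`handbook.md#deploy` etc.) are hypothetical anchors, not real files:

```python
# Minimal retrieval-with-citation sketch for the "Semantic memory" and
# "Source attribution" rows. Cosine over term-frequency vectors stands in
# for a real vector index; document names are illustrative assumptions.
import math
from collections import Counter

CORPUS = {
    "handbook.md#deploy": "deploy the service with a canary release then monitor error rates",
    "handbook.md#rollback": "rollback immediately if the canary error rate exceeds the baseline",
    "faq.md#oncall": "the on-call engineer owns the rollback decision",
}

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_with_citation(query: str, k: int = 2):
    """Return the top-k passages with their source anchors attached."""
    q = _vec(query)
    scored = sorted(CORPUS.items(), key=lambda kv: _cosine(q, _vec(kv[1])), reverse=True)
    return [{"source": src, "text": text} for src, text in scored[:k]]
```

Keeping the source anchor attached to every retrieved passage is what makes page-anchored citation possible downstream; the ranking quality itself is deliberately naive here.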
How to use this map
1) Scope your system
- Mark which capabilities your product actually needs (memory, planning, tool-use, grounding, provenance, robustness).
- Anything you don’t need becomes an explicit non-goal (reduces complexity and attack surface).
2) Choose the substitute layer explicitly
- If the limitation is “no stable memory” → prefer retrieval + governed storage.
- If the limitation is “short horizon / drift” → prefer explicit task graphs, bounded workflows, or controller-enforced state machines.
- If the limitation is “brittle reliability” → add verification layers (validators, tests, grounded citations, structured outputs).
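A verification layer for the "brittle reliability" case can be sketched as a schema check wrapped around the generation call. The field names, retry budget, and the `generate` callable below are illustrative assumptions, not a prescribed interface:

```python
# Sketch of a verification layer: validate a model's JSON output against
# a minimal schema before it reaches downstream code. Field names and
# the retry budget are illustrative assumptions.
import json

SCHEMA = {"task": str, "priority": int, "citations": list}

def validate(raw: str) -> dict:
    """Parse and schema-check model output; raise on any violation."""
    data = json.loads(raw)  # malformed JSON fails loudly here
    for field, ftype in SCHEMA.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"field {field!r} missing or not {ftype.__name__}")
    if not data["citations"]:
        raise ValueError("grounded output requires at least one citation")
    return data

def call_with_verification(generate, retries: int = 2) -> dict:
    """Retry the (hypothetical) generate() callable until output validates."""
    last_err = None
    for _ in range(retries + 1):
        try:
            return validate(generate())
        except (ValueError, json.JSONDecodeError) as err:
            last_err = err
    raise RuntimeError(f"no valid output after {retries + 1} attempts: {last_err}")
```

The design point: the validator, not the model, decides whether an output is accepted, and rejection is a loud error rather than a silent pass-through.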
3) Treat “Residual gaps” as tracked design risk
- Residual gaps are where production systems typically fail (consistency, provenance, safety, interpretability).
- Track them as explicit risks with mitigations, not as footnotes.
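Tracking residual gaps as explicit risks can be as lightweight as a typed register. The severity scale and the example entry are assumptions for illustration, not a prescribed taxonomy:

```python
# Minimal residual-gap risk register. The severity scale and the
# example entry are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ResidualGapRisk:
    capability: str          # row from the capability map
    gap: str                 # the residual gap being tracked
    severity: int            # 1 (low) .. 5 (critical), illustrative scale
    mitigations: list = field(default_factory=list)

    def is_open(self) -> bool:
        """A risk stays open until at least one mitigation is recorded."""
        return not self.mitigations

register = [
    ResidualGapRisk(
        capability="Source attribution",
        gap="Attribution on parametric knowledge",
        severity=4,
        mitigations=["require retrieval-grounded citations for factual claims"],
    ),
]
```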
What this table is (and is not)
- Is: an engineering checklist for architecture decisions.
- Is not: a mechanistic claim about human cognition, nor evidence that LLMs “possess” a human capability; it is a tradeoff map.
Terminology (to avoid “internal language”)
- RAG (Retrieval-Augmented Generation): retrieval + generation pattern for knowledge-intensive tasks.
- BM25: a standard probabilistic lexical retrieval scoring function used in classical IR.
- ReAct: prompting/agent pattern combining reasoning traces with tool/action steps.
- MC-dropout: Monte-Carlo dropout used as an approximate uncertainty estimation technique (common in deep learning literature).
- CAS: computer algebra system.
- SMT: satisfiability modulo theories solver.
- OOD: out-of-distribution.
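As a concrete anchor for the BM25 entry, here is a minimal Okapi BM25 scorer over pre-tokenized documents. `k1=1.5` and `b=0.75` are common defaults; see Robertson & Zaragoza (referenced below) for the full probabilistic relevance framework:

```python
# Minimal Okapi BM25 scorer for the terminology entry above.
# k1=1.5 and b=0.75 are common defaults; inputs are pre-tokenized.
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc in `docs` against the tokenized `query`."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # document frequency: how many docs contain each term
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores
```

In production the "hybrid search" technique from the table typically fuses these lexical scores with vector-similarity scores rather than using either alone.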
References (primary / formal)
- Lewis et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401. https://arxiv.org/abs/2005.11401
- Yao et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629. https://arxiv.org/abs/2210.03629
- Gal & Ghahramani (2016). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. arXiv:1506.02142. https://arxiv.org/abs/1506.02142
- Robertson & Zaragoza (2009). The Probabilistic Relevance Framework: BM25 and Beyond. https://doi.org/10.1561/1500000019
- OWASP Cheat Sheet Series. AI Agent Security Cheat Sheet. https://cheatsheetseries.owasp.org/cheatsheets/AI_Agent_Security_Cheat_Sheet.html
- OpenAI Docs. Safety in building agents (prompt injection guidance). https://developers.openai.com/api/docs/guides/agent-builder-safety/
- NIST. NIST AI 600-1: Generative AI Profile. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
Suggested next read in this repo
- Articles → Model training and evaluation → Fluency vs factuality: Fluency Is Not Factuality
- Articles → Model training and evaluation → Sycophancy: Sycophancy in LLM Assistants
- Policies: Facts-only / Evidence rules (when accuracy matters): Policies