Human vs GenAI capability map (engineering view)
Abstract
This note is an engineering-facing capability map for designing LLM/agent-adjacent systems. For each target capability, it summarizes:
1) the typical limitation in current LLM/GenAI systems,
2) an engineering substitute layer,
3) common implementation techniques, and
4) the residual gap that remains in production.
This is not a neuroscience reference. The “Human implementation” column is a compact informal analogue intended to support engineering tradeoffs.
Capability map
This page renders the table from an internal _data file at build time (no CSV download link is published).
| Target capability | Human implementation (informal) | GenAI limitation | GenAI alternative | Techniques/tools | Residual gaps |
|---|---|---|---|---|---|
| Episodic memory (events) | Hippocampus→neocortex consolidation; time/place bound recollection | No native personal timeline; context reset per session | Conversation/session logs; long-term stores | RAG over chat logs, session memory DBs | Identity continuity, privacy, provenance management |
| Semantic memory (facts/meanings) | Distributed neocortical networks; schema-based generalization | Parametric knowledge frozen at train-time; recency drift | Retrieval-augmented generation (RAG) | Vector DBs, BM25, hybrid search, citations | Source reliability ranking; contradiction resolution |
| Procedural skill learning | Basal ganglia/cerebellum; practice-dependent motor programs | No motor embodiment; weak sensorimotor feedback | Code execution, simulators, tool use | Function calling, sandboxed runtimes, robotics stacks | Real-world transfer; closed-loop control robustness |
| Perception & grounding | Multisensory integration tied to action | Symbol grounding problem; limited real-world coupling | Multimodal encoders + captioning | Vision-language models, ASR/TTS, OCR | Causal scene understanding; affordances |
| Common sense | Embodied experience + social learning | Pattern recall without lived priors; brittle edge cases | External knowledge graphs, constraint checks | KG query, rules/validators, counterexample search | Coverage, open-world generality |
| Causal reasoning | Interventions & mental models of mechanisms | Correlational training; confuses cause with association | Program-of-thought, simulators, SCM plugins | Do-calculus libs, A/B design helpers, Monte-Carlo | Reliable counterfactuals; experiment design validity |
| Planning / executive control | Prefrontal cortex: goal/plan/monitor loop | Short horizon; distractibility with long contexts | Tool-augmented planners and search | Tree/graph search, ReAct, task graphs, schedulers | Credit assignment; global consistency over long tasks |
| Theory of mind | Modeling others’ beliefs/intent | Shallow ToM; degraded by prompt framing | Explicit state models per agent | Agent memory slots, belief tracking JSON | Generalization to messy multi-party settings |
| Emotion & valuation | Affect guides salience and choice | No intrinsic affect; tone ≠ valuation | Utility functions, reward models | RLHF/RLAIF, preference datasets | Stable values; misalignment under distribution shift |
| Metacognition (self-monitoring) | Confidence estimates, error awareness | Overconfidence; weak calibration | Self-critique & verifier models | Critic-solver loops, ensemble agreement | Robust calibration across domains |
| Uncertainty handling | Probabilistic reasoning + heuristics | Token probabilities are not calibrated posteriors; no explicit uncertainty output | n-best sampling + scoring | Logprob exposure, MC-dropout proxies | Actionable probabilities; interpretability |
| Mathematical reliability | Symbolic rules + working memory | Pattern errors in multi-step math | External calculators/solvers | Tool calling to CAS, Python, SMT | Spec adherence; unit/timezone sanity |
| Continual learning | Incremental updates without forgetting | Catastrophic forgetting on fine-tune | Non-parametric memory + adapters | RAG, LoRA, retrieval-first policies | Unified memory; write-time governance |
| Transfer & abstraction | Analogy; schema mapping | Surface-form bias; weak systematicity | Program induction; few-shot curricula | Synthetic contrast sets, decomposition prompts | Compositional generalization reliability |
| Spatial reasoning | Mental rotation; body-centric frames | Errors on 3D/frames/left-right | External geometry tools, images→graphs | Scene graphs, CAD/solver APIs | Embodied grounding; real-time updates |
| Temporal reasoning | Calendrics, causality over time | Date math and recency errors | Time-aware tools & validators | Calendar APIs, monotonic checks | Global timeline consistency |
| Long-context coherence | Selective attention, summarization | Loss of global state; topic drift | Memory summarizers + key-state pins | Structured scratchpads, slot attention | Scalable, faithful memory over 100k+ tokens |
| Source attribution | Cite teachers/books; episodic links | Parametric opacity; unclear provenance | Grounded outputs with citations | RAG with page-anchored quotes | Attribution on parametric knowledge |
| Tool use & action | Embodied manipulation | No direct world control | Function/tool calling via APIs | Orchestrators, agent frameworks, safety interlocks | Safe autonomy; side-effect audit |
| Ethics & norms | Socialization; reflective judgment | Rule conflicts; context brittleness | Policy layers & guardrails | Safety classifiers, allow/deny lists | Nuanced context; cultural variance |
| Creativity (novel recombination) | Divergent+convergent thinking | Mode collapse; training data echo | Constraint-guided generation | Style mixing, search over prompts | Grounded originality; plagiarism checks |
| Robustness (OOD/adversarial) | Meta-rules; anomaly detection | Susceptible to prompt injection/drift | Input sanitation + policy verifiers | Sandboxes, allowlists, detectors | General defenses without false positives |
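The "GenAI alternative" column for semantic memory and source attribution can be made concrete with a minimal retrieval-with-citation sketch. Pure-Python token overlap stands in for a real vector or BM25 index, and the corpus entries (`handbook.md#deploy` etc.) are hypothetical anchors, not real files:

```python
# Minimal retrieval-with-citation sketch for the "Semantic memory" and
# "Source attribution" rows. Cosine over term-frequency vectors stands in
# for a real vector index; document names are illustrative assumptions.
import math
from collections import Counter

CORPUS = {
    "handbook.md#deploy": "deploy the service with a canary release then monitor error rates",
    "handbook.md#rollback": "rollback immediately if the canary error rate exceeds the baseline",
    "faq.md#oncall": "the on-call engineer owns the rollback decision",
}

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_with_citation(query: str, k: int = 2):
    """Return the top-k passages with their source anchors attached."""
    q = _vec(query)
    scored = sorted(CORPUS.items(), key=lambda kv: _cosine(q, _vec(kv[1])), reverse=True)
    return [{"source": src, "text": text} for src, text in scored[:k]]
```

Keeping the source anchor attached to every retrieved passage is what makes page-anchored citation possible downstream; the ranking quality itself is deliberately naive here.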
How to use this map
1) Scope your system
- Mark which capabilities your product actually needs (memory, planning, tool-use, grounding, provenance, robustness).
- Anything you don’t need becomes an explicit non-goal (reduces complexity and attack surface).
2) Choose the substitute layer explicitly
- If the limitation is “no stable memory” → prefer retrieval + governed storage.
- If the limitation is “short horizon / drift” → prefer explicit task graphs, bounded workflows, or controller-enforced state machines.
- If the limitation is “brittle reliability” → add verification layers (validators, tests, grounded citations, structured outputs).
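A verification layer for the "brittle reliability" case can be sketched as a schema check wrapped around the generation call. The field names, retry budget, and the `generate` callable below are illustrative assumptions, not a prescribed interface:

```python
# Sketch of a verification layer: validate a model's JSON output against
# a minimal schema before it reaches downstream code. Field names and
# the retry budget are illustrative assumptions.
import json

SCHEMA = {"task": str, "priority": int, "citations": list}

def validate(raw: str) -> dict:
    """Parse and schema-check model output; raise on any violation."""
    data = json.loads(raw)  # malformed JSON fails loudly here
    for field, ftype in SCHEMA.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"field {field!r} missing or not {ftype.__name__}")
    if not data["citations"]:
        raise ValueError("grounded output requires at least one citation")
    return data

def call_with_verification(generate, retries: int = 2) -> dict:
    """Retry the (hypothetical) generate() callable until output validates."""
    last_err = None
    for _ in range(retries + 1):
        try:
            return validate(generate())
        except (ValueError, json.JSONDecodeError) as err:
            last_err = err
    raise RuntimeError(f"no valid output after {retries + 1} attempts: {last_err}")
```

The design point: the validator, not the model, decides whether an output is accepted, and rejection is a loud error rather than a silent pass-through.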
3) Treat “Residual gaps” as tracked design risk
- Residual gaps are where production systems typically fail (consistency, provenance, safety, interpretability).
- Track them as explicit risks with mitigations, not as footnotes.
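Tracking residual gaps as explicit risks can be as lightweight as a typed register. The severity scale and the example entry are assumptions for illustration, not a prescribed taxonomy:

```python
# Minimal residual-gap risk register. The severity scale and the
# example entry are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ResidualGapRisk:
    capability: str          # row from the capability map
    gap: str                 # the residual gap being tracked
    severity: int            # 1 (low) .. 5 (critical), illustrative scale
    mitigations: list = field(default_factory=list)

    def is_open(self) -> bool:
        """A risk stays open until at least one mitigation is recorded."""
        return not self.mitigations

register = [
    ResidualGapRisk(
        capability="Source attribution",
        gap="Attribution on parametric knowledge",
        severity=4,
        mitigations=["require retrieval-grounded citations for factual claims"],
    ),
]
```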
What this table is (and is not)
- Is: an engineering checklist for architecture decisions.
- Is not: a mechanistic claim about human cognition, nor evidence that LLMs “possess” a human capability; it is a tradeoff map.
Terminology (to avoid “internal language”)
- RAG (Retrieval-Augmented Generation): retrieval + generation pattern for knowledge-intensive tasks.
- BM25: a standard probabilistic lexical retrieval scoring function used in classical IR.
- ReAct: prompting/agent pattern combining reasoning traces with tool/action steps.
- MC-dropout: Monte-Carlo dropout used as an approximate uncertainty estimation technique (common in deep learning literature).
- CAS: computer algebra system.
- SMT: satisfiability modulo theories solver.
- OOD: out-of-distribution.
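As a concrete anchor for the BM25 entry, here is a minimal Okapi BM25 scorer over pre-tokenized documents. `k1=1.5` and `b=0.75` are common defaults; see Robertson & Zaragoza (referenced below) for the full probabilistic relevance framework:

```python
# Minimal Okapi BM25 scorer for the terminology entry above.
# k1=1.5 and b=0.75 are common defaults; inputs are pre-tokenized.
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc in `docs` against the tokenized `query`."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # document frequency: how many docs contain each term
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores
```

In production the "hybrid search" technique from the table typically fuses these lexical scores with vector-similarity scores rather than using either alone.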
References (primary / formal)
- Lewis et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401. https://arxiv.org/abs/2005.11401
- Yao et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629. https://arxiv.org/abs/2210.03629
- Gal & Ghahramani (2016). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. arXiv:1506.02142. https://arxiv.org/abs/1506.02142
- Robertson & Zaragoza (2009). The Probabilistic Relevance Framework: BM25 and Beyond. https://doi.org/10.1561/1500000019
- OWASP Cheat Sheet Series. AI Agent Security Cheat Sheet. https://cheatsheetseries.owasp.org/cheatsheets/AI_Agent_Security_Cheat_Sheet.html
- OpenAI Docs. Safety in building agents (prompt injection guidance). https://developers.openai.com/api/docs/guides/agent-builder-safety/
- NIST. NIST AI 600-1: Generative AI Profile. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
Suggested next read in this repo
- Articles → Model training and evaluation → Fluency vs factuality: Fluency Is Not Factuality
- Articles → Model training and evaluation → Sycophancy: Sycophancy in LLM Assistants
- Policies: Facts-only / Evidence rules (when accuracy matters): Policies