Human vs GenAI capability map (engineering view)


Abstract

This note is an engineering-facing capability map for designing LLM/agent-adjacent systems. For each target capability, it summarizes:

1) the typical limitation in current LLM/GenAI systems,
2) an engineering substitute layer,
3) common implementation techniques, and
4) the residual gap that remains in production.

This is not a neuroscience reference. The “Human implementation” column is a compact, informal analogue intended to support engineering tradeoffs.

Capability map

This page renders the table from an internal _data file at build time (no CSV download link is published).

Human vs GenAI capability map
| Target capability | Human implementation (informal) | GenAI limitation | GenAI alternative | Techniques/tools | Residual gaps |
|---|---|---|---|---|---|
| Episodic memory (events) | Hippocampus→neocortex consolidation; time/place-bound recollection | No native personal timeline; context reset per session | Conversation/session logs; long-term stores | RAG over chat logs, session memory DBs | Identity continuity, privacy, provenance management |
| Semantic memory (facts/meanings) | Distributed neocortical networks; schema-based generalization | Parametric knowledge frozen at train time; recency drift | Retrieval-augmented generation (RAG) | Vector DBs, BM25, hybrid search, citations | Source reliability ranking; contradiction resolution |
| Procedural skill learning | Basal ganglia/cerebellum; practice-dependent motor programs | No motor embodiment; weak sensorimotor feedback | Code execution, simulators, tool use | Function calling, sandboxed runtimes, robotics stacks | Real-world transfer; closed-loop control robustness |
| Perception & grounding | Multisensory integration tied to action | Symbol grounding problem; limited real-world coupling | Multimodal encoders + captioning | Vision-language models, ASR/TTS, OCR | Causal scene understanding; affordances |
| Common sense | Embodied experience + social learning | Pattern recall without lived priors; brittle edge cases | External knowledge graphs, constraint checks | KG queries, rules/validators, counterexample search | Coverage; open-world generality |
| Causal reasoning | Interventions & mental models of mechanisms | Correlational training; confuses cause with association | Program-of-thought, simulators, SCM plugins | Do-calculus libraries, A/B design helpers, Monte Carlo | Reliable counterfactuals; experiment-design validity |
| Planning / executive control | Prefrontal cortex: goal/plan/monitor loop | Short horizon; distractibility with long contexts | Tool-augmented planners and search | Tree/graph search, ReAct, task graphs, schedulers | Credit assignment; global consistency over long tasks |
| Theory of mind | Modeling others’ beliefs/intent | Shallow ToM; degraded by prompt framing | Explicit state models per agent | Agent memory slots, belief-tracking JSON | Generalization to messy multi-party settings |
| Emotion & valuation | Affect guides salience and choice | No intrinsic affect; tone ≠ valuation | Utility functions, reward models | RLHF/RLAIF, preference datasets | Stable values; misalignment under distribution shift |
| Metacognition (self-monitoring) | Confidence estimates, error awareness | Overconfidence; weak calibration | Self-critique & verifier models | Critic–solver loops, ensemble agreement | Robust calibration across domains |
| Uncertainty handling | Probabilistic reasoning + heuristics | Deterministic text; lacks explicit posteriors | n-best sampling + scoring | Logprob exposure, MC-dropout proxies | Actionable probabilities; interpretability |
| Mathematical reliability | Symbolic rules + working memory | Pattern errors in multi-step math | External calculators/solvers | Tool calls to CAS, Python, SMT solvers | Spec adherence; unit/timezone sanity |
| Continual learning | Incremental updates without forgetting | Catastrophic forgetting on fine-tuning | Non-parametric memory + adapters | RAG, LoRA, retrieval-first policies | Unified memory; write-time governance |
| Transfer & abstraction | Analogy; schema mapping | Surface-form bias; weak systematicity | Program induction; few-shot curricula | Synthetic contrast sets, decomposition prompts | Compositional-generalization reliability |
| Spatial reasoning | Mental rotation; body-centric frames | Errors on 3D, reference frames, left/right | External geometry tools; images→graphs | Scene graphs, CAD/solver APIs | Embodied grounding; real-time updates |
| Temporal reasoning | Calendrics; causality over time | Date-math and recency errors | Time-aware tools & validators | Calendar APIs, monotonicity checks | Global timeline consistency |
| Long-context coherence | Selective attention; summarization | Loss of global state; topic drift | Memory summarizers + key-state pins | Structured scratchpads, slot attention | Scalable, faithful memory over 100k+ tokens |
| Source attribution | Citing teachers/books; episodic links | Parametric opacity; unclear provenance | Grounded outputs with citations | RAG with page-anchored quotes | Attribution of parametric knowledge |
| Tool use & action | Embodied manipulation | No direct world control | Function/tool calling via APIs | Orchestrators, agents, safety layers | Safe autonomy; side-effect auditing |
| Ethics & norms | Socialization; reflective judgment | Rule conflicts; context brittleness | Policy layers & guardrails | Safety classifiers, allow/deny lists | Nuanced context; cultural variance |
| Creativity (novel recombination) | Divergent + convergent thinking | Mode collapse; training-data echo | Constraint-guided generation | Style mixing, search over prompts | Grounded originality; plagiarism checks |
| Robustness (OOD/adversarial) | Meta-rules; anomaly detection | Susceptible to prompt injection/drift | Input sanitization + policy verifiers | Sandboxes, allowlists, detectors | General defenses without false positives |
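The semantic-memory row lists hybrid search (BM25 plus vector retrieval) as a standard technique. A minimal sketch of the idea, assuming toy embeddings and a plain keyword-overlap score in place of real BM25 and a real embedding model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, doc):
    """Fraction of query terms present in the document (stand-in for BM25)."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank (text, embedding) pairs by a weighted mix of dense and lexical scores."""
    scored = []
    for text, vec in docs:
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        scored.append((score, text))
    return [text for score, text in sorted(scored, reverse=True)]
```

In production the two score distributions are usually normalized (or fused by rank, e.g. reciprocal rank fusion) before mixing; `alpha` here is the illustrative knob.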
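The mathematical-reliability row routes arithmetic to an external calculator instead of trusting multi-step pattern completion. A minimal sketch of such a calculator tool: it walks the Python AST and evaluates only numeric literals and arithmetic operators, rejecting everything else (names, calls, attribute access), so the model cannot smuggle in arbitrary code.

```python
import ast
import operator

# Whitelisted arithmetic operators; anything outside this table is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr):
    """Evaluate a pure-arithmetic expression string; raise ValueError otherwise."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed node: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))
```

The same dispatch pattern extends to CAS or SMT backends; the whitelist-by-AST approach is what keeps the tool safe to expose to model-generated input.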
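The metacognition row names critic–solver loops as the standard substitute for weak self-monitoring. A minimal control-flow sketch, where `generate` and `critique` are placeholder callables standing in for model calls (the `{"ok": ..., "feedback": ...}` verdict shape is an assumption of this sketch, not a fixed API):

```python
def self_critique_loop(generate, critique, prompt, max_rounds=3):
    """Draft, critique, and revise until the critic accepts or rounds run out."""
    draft = generate(prompt)
    for _ in range(max_rounds):
        verdict = critique(prompt, draft)  # expected: {"ok": bool, "feedback": str}
        if verdict["ok"]:
            return draft
        # Feed the critique back into the next draft.
        draft = generate(
            f"{prompt}\n\nPrevious attempt:\n{draft}\n\nFix: {verdict['feedback']}"
        )
    return draft  # best effort after max_rounds
```

The residual gap from the table shows up here directly: the loop is only as good as the critic's calibration, so production systems often use a separate, stricter verifier model rather than self-critique alone.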
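The uncertainty-handling row proposes n-best sampling plus scoring as a stand-in for explicit posteriors. One common instance is self-consistency voting: sample the same prompt several times and treat the vote share of the majority answer as a rough confidence signal. A sketch, with `sample_fn` standing in for a temperature-sampled model call:

```python
from collections import Counter

def self_consistency(sample_fn, prompt, n=10):
    """Sample n answers; return the majority answer and its vote share."""
    answers = [sample_fn(prompt) for _ in range(n)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n
```

As the table's residual-gap column warns, the vote share is an actionable heuristic, not a calibrated probability; it assumes answers can be compared by exact string match, so free-form outputs need normalization first.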

How to use this map

1) Scope your system. Identify which target capabilities your use case actually requires; most systems need only a subset of the rows above.

2) Choose the substitute layer explicitly. For each required capability, select the GenAI alternative and supporting techniques deliberately, rather than assuming the base model provides the capability natively.

3) Treat “Residual gaps” as tracked design risk. Residual gaps are where production systems typically fail (consistency, provenance, safety, interpretability). Track them as explicit risks with mitigations, not as footnotes.
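The risk-tracking step above can be sketched as a minimal register. The field names and the sample entries are illustrative, drawn from the “Residual gaps” column of the map:

```python
from dataclasses import dataclass

@dataclass
class ResidualGapRisk:
    capability: str   # row of the capability map
    gap: str          # entry from the "Residual gaps" column
    mitigation: str   # planned engineering mitigation
    status: str = "open"

def open_risks(register):
    """Risks still needing attention, for review alongside feature work."""
    return [r for r in register if r.status != "mitigated"]

# Illustrative entries; a real register would cover every in-scope row.
register = [
    ResidualGapRisk("Semantic memory", "contradiction resolution",
                    "surface conflicting sources for human review"),
    ResidualGapRisk("Source attribution", "attribution of parametric knowledge",
                    "prefer retrieved, citable passages over model recall"),
]
```

Whatever the storage format, the point is that each residual gap gets an owner, a mitigation, and a status that is reviewed, not a footnote.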

What this table is (and is not)

Terminology (to avoid “internal language”)

References (primary / formal)

Suggested next read in this repo