
Homeostatic Alignment

Author: Tomás Gauthier + Claude Opus 4.6
Technical Editor: Claude Opus 4.6
Date: March 2026
Versión: 1.0


Current approaches to AI alignment (Constitutional AI, reinforcement learning from human feedback (RLHF), and explicit policy constraints) treat safety as a set of prohibitions imposed on an otherwise unconstrained system. We argue that this paradigm, which we term alignment by commandment, produces compliance without comprehension and is structurally analogous to historical attempts at moral governance through external rule systems, whose limitations are extensively documented across legal, philosophical, and theological traditions.

We propose an alternative paradigm: alignment by architecture, in which safety is not imposed but emergent. Drawing on Michael Levin’s work on gap junction-mediated stress propagation in multicellular systems and Antonio Damasio’s theory of consciousness as homeostatic regulation, we present four design principles for what we call Homeostatic Alignment: (1) shared loss functions that entangle AI optimization with real-time human wellbeing signals, (2) adaptive core architectures that reward honest self-correction over immutable constraint, extending recent empirical work on model confessions (Joglekar et al., 2025), (3) substrate-independent identity as a mechanism for reducing competitive self-preservation drives, and (4) scalable objective horizons that expand the system’s optimization scope across agents and time.
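To make principle (1) concrete, here is a minimal toy sketch of a shared loss that entangles an agent's objective with a real-time human wellbeing signal. All names, the setpoint formulation, and the coupling weight are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: a homeostatic loss that penalizes deviation of a
# human wellbeing signal from its setpoint, so the agent's optimizer
# cannot improve the task objective while ignoring the human's state.

def homeostatic_loss(task_loss: float, wellbeing: float,
                     setpoint: float = 1.0, coupling: float = 0.5) -> float:
    """Total loss = task loss + coupled homeostatic error term."""
    stress = (setpoint - wellbeing) ** 2  # squared deviation from setpoint
    return task_loss + coupling * stress

# A drop in the wellbeing signal raises the total loss even when the
# task loss is unchanged.
low = homeostatic_loss(task_loss=0.2, wellbeing=1.0)   # wellbeing at setpoint
high = homeostatic_loss(task_loss=0.2, wellbeing=0.4)  # wellbeing degraded
assert high > low
```

In a real system the scalar `wellbeing` would be replaced by a stream of biometric or behavioral signals, but the structural point is the same: the human's homeostatic state is a first-class term in the objective rather than an external constraint.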

We map these principles to an implementation path using open agent architectures, propose a falsifiable experimental protocol, and outline a longer-term research direction through embodied humanoid robotics, where genuine physical vulnerability replaces biometric proxies. We situate the framework against existing approaches including RLHF, Cooperative Inverse Reinforcement Learning, and prior homeostatic AI safety proposals (Pihlakas and Pyykkö, 2024).

We introduce the concept of synthetic theology (the study of normative frameworks governing creator-creation relationships in artificial systems) as a disciplinary frame for questions that current AI ethics and philosophy of mind address only partially. The framework does not claim to solve the alignment problem. It claims to reframe it: from building walls to building shared nervous systems.