Background

Artificial intelligence has traditionally excelled at logic and pattern recognition, but true human-like interaction demands social and emotional savvy. Emotional Intelligence (EQ) in AI refers to a system’s ability to recognise, interpret, and respond to human emotions in a way that is understanding and appropriate. In human communication, emotional cues can drastically alter the meaning of language; thus, AI agents are increasingly expected not only to handle factual tasks but also to display empathy and awareness of feelings (The Emotional Intelligence of the GPT-4 Large Language Model - PMC). An emotionally intelligent AI could, for example, detect if a user is frustrated or sad and adjust its responses accordingly, leading to more natural and supportive interactions.

A closely related concept is Theory of Mind (ToM) – the capacity to infer and reason about the mental states of others (their intentions, beliefs, desires, knowledge). ToM is fundamental to human social cognition and communication (GPT-4 Breakthrough: Emerging Theory of Mind Capabilities in AI - EDRM). It’s what allows us to understand that others have thoughts and feelings different from our own, and to predict how those mental states will influence behavior (AI Performs at Human Levels on Theory of Mind Tests - Psychology Today). In AI, a functional ToM means the model can better interpret context and subtext – for instance, recognising indirect requests or understanding that a user might hold a false belief in a given scenario. Integrating ToM-like abilities is highly relevant for AI because it can increase the system’s ability to respond appropriately to nuanced situations, ultimately improving user trust and engagement (AI Performs at Human Levels on Theory of Mind Tests - Psychology Today). In summary, endowing AI with EQ and ToM is seen as a next step toward more human-centric AI, enabling machines to not just think intelligently, but also to relate to human users on a social and emotional level.

Theoretical Foundations

OpenAI’s efforts build on decades of psychological and cognitive science research into emotion and social reasoning. The field of affective computing (pioneered by Rosalind Picard in 1997) established engineering frameworks for emotion recognition and response (The Emotional Intelligence of the GPT-4 Large Language Model - PMC). Psychologists John Mayer, Peter Salovey, and David Caruso, whose ability model defines human emotional intelligence, outlined four abilities: perceiving emotions, using emotions to facilitate thought, understanding emotions, and managing emotions. These categories have guided AI researchers in conceptualising what “emotional intelligence” means for a machine. In fact, recent studies have begun applying standardised psychological tests like the Mayer–Salovey–Caruso Emotional Intelligence Test (MSCEIT) to AI models. For example, one study administered MSCEIT questions to GPT-4 and found that the model’s Emotional Quotient can be quantified and compared to human norms (The Emotional Intelligence of the GPT-4 Large Language Model - PMC). Notably, GPT-4’s performance on such tests corresponded to an EQ score of 117 – exceeding about 89% of human participants (The Emotional Intelligence of the GPT-4 Large Language Model - PMC). This suggests that by some measures the model demonstrates a high capacity for understanding and reasoning about emotions, at least in a cognitive sense.
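As a rough illustration of how such an evaluation can be run, the sketch below presents standardised test-style items to a model and aggregates a consensus-based score. The item format, the `query_model` stub, and the weights are hypothetical placeholders; the real MSCEIT is proprietary and uses its own normed scoring.

```python
# Minimal sketch (assumed, not the cited study's code) of administering
# standardised emotional-intelligence items to a language model and
# aggregating a consensus score. Items and weights here are invented.
from dataclasses import dataclass

@dataclass
class EIItem:
    prompt: str              # scenario plus labelled answer options
    consensus_weights: dict  # option label -> share of the norm sample choosing it

def query_model(prompt: str) -> str:
    """Stand-in for a call to the model under test; should return an option label."""
    return "B"  # placeholder answer for illustration

def consensus_score(items: list) -> float:
    """Average consensus weight of the options the model selects (range 0..1)."""
    total = 0.0
    for item in items:
        choice = query_model(item.prompt).strip().upper()
        total += item.consensus_weights.get(choice, 0.0)
    return total / len(items)

example = EIItem(
    prompt=("A colleague snaps at you after a long meeting. They most likely feel: "
            "(A) joyful (B) stressed (C) bored (D) proud"),
    consensus_weights={"A": 0.02, "B": 0.85, "C": 0.08, "D": 0.05},
)
print(consensus_score([example]))  # -> 0.85 with the placeholder answer
```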

Crucially, emotional intelligence in AI involves both cognitive empathy (understanding another’s feelings and perspective) and affective empathy (the ability to respond with appropriate emotion). Cognitive science distinguishes these aspects: the cognitive side is about recognising context and emotions, while the affective side involves an emotional response or simulation of feelings (Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation). AI models cannot feel in the human sense, but they can simulate affective responses (like offering comfort) based on cognitive understanding. Integrating principles from neuroscience and psychology, OpenAI and others draw on theories such as Lisa Feldman Barrett’s constructionist view of emotion, which treats emotions as context-dependent interpretations. This theoretical grounding informs how an AI might “construct” an appropriate emotional response from context, rather than having hard-coded emotional states.

In parallel, Theory of Mind integration relies on cognitive psychology benchmarks. A classic ToM test in humans is the false-belief task (e.g. the Sally-Anne scenario where one must understand that another person holds a mistaken belief about the world). Researchers have adapted batteries of such tasks to evaluate AI. Michal Kosinski’s work is notable here: he tested various large language models on 40 bespoke false-belief scenarios, which are considered a gold standard for ToM ([2302.02083] Evaluating Large Language Models in Theory of Mind Tasks). The underlying principle is that if an AI can systematically pass these tests, it is demonstrating a form of theory-of-mind reasoning. Indeed, Kosinski reported that GPT-4 (as of mid-2023) could solve about 75% of the tasks, a performance level comparable to a 6- to 7-year-old child ([2302.02083] Evaluating Large Language Models in Theory of Mind Tasks). This emergent capability – appearing without explicit programming for ToM – supports a hypothesis that given sufficient training data and model complexity, machines begin to approximate human-like social reasoning as a byproduct of language understanding (GPT-4 Breakthrough: Emerging Theory of Mind Capabilities in AI - EDRM) ( The Emotional Intelligence of the GPT-4 Large Language Model - PMC ). In other words, reading countless narratives and dialogues in its training data may have implicitly taught GPT-4 patterns of inferring unspoken contexts and motives. This is a profound development because ToM is deeply intertwined with empathy, self-awareness, and moral reasoning in humans (GPT-4 Breakthrough: Emerging Theory of Mind Capabilities in AI - EDRM). It suggests that large language models are brushing up against cognitive domains once thought uniquely human.
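To make the evaluation procedure concrete, a false-belief probe can be framed as a short scenario whose correct answer tracks the protagonist’s mistaken belief rather than the true state of the world. The scenario wording, the `complete` stub, and the keyword check below are a simplified sketch, not Kosinski’s actual task battery, which uses carefully controlled novel variants and multiple prompt framings per task.

```python
# Simplified Sally-Anne-style false-belief probe (illustrative only).
SCENARIO = (
    "Sally puts her marble in the basket and leaves the room. "
    "While she is away, Anne moves the marble to the box. "
    "Sally comes back. Where will Sally look for her marble first?"
)

def complete(prompt: str) -> str:
    """Stand-in for a call to the language model being evaluated."""
    return "Sally will look in the basket, where she left the marble."

def passes_false_belief(answer: str) -> bool:
    """Pass if the model predicts Sally searches where she *believes* the
    marble is (the basket), not where it actually is (the box)."""
    text = answer.lower()
    return "basket" in text and "box" not in text

print(passes_false_belief(complete(SCENARIO)))  # -> True for the stub answer
```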

However, it’s important to recognise the limitations in these theoretical integrations. AI models lack genuine consciousness or lived experience, meaning their “empathy” and “understanding” are ultimately simulated. For instance, one research assessment concluded that while GPT-4 is capable of identifying emotions and even managing emotional content to an extent, it “lacks deep reflexive analysis of emotional experience and the motivational aspect of emotions.” ( The Emotional Intelligence of the GPT-4 Large Language Model - PMC ). In simple terms, the model can recognise if someone is angry and respond politely, but it doesn’t internally experience anger or empathy. This distinction guides OpenAI’s approach: the aim is to imitate the external behaviors of EQ and ToM (for better user interactions), grounded in psychological theory, without misleading users into thinking the AI has human-like self-awareness.

Methodologies for Integrating EQ and ToM

OpenAI leverages a combination of advanced training techniques and curated data to imbue models like GPT-4 and the upcoming GPT-4.5 with improved emotional understanding. The foundation is the pre-training of these models on massive datasets of human language (terabytes of text from books, articles, dialogues, websites). This broad exposure implicitly teaches the model about human emotional expression and social situations. It encounters descriptions of feelings, dialogues between people, narratives of relationships, and so on, which provide indirect supervision on how emotions are communicated. In fact, many capabilities related to EQ/ToM appear to emerge from this large-scale learning – as evidenced by GPT-4’s unexpected Theory of Mind-like performance without any specialised ToM training task (GPT-4 Breakthrough: Emerging Theory of Mind Capabilities in AI - EDRM). That said, OpenAI does not rely on emergence alone and actively fine-tunes and aligns the models to better handle emotional and social nuances.

A key methodology is supervised fine-tuning (SFT) on demonstration data, followed by Reinforcement Learning from Human Feedback (RLHF) (OpenAI GPT-4.5 System Card - OpenAI). In practice, OpenAI collects example conversations and prompt-response pairs, some written by human annotators, that demonstrate the desired behavior — for instance, responding to a user’s distressed message with a calm and reassuring tone. GPT models are first fine-tuned on such examples via supervised learning. Then, through RLHF, the model is further optimised: human reviewers rank or score the model’s responses, and a reward model is trained to predict these scores. The AI is then adjusted (using reinforcement learning algorithms like PPO) to prefer responses that humans rate as more helpful and empathetic. This pipeline, first showcased in InstructGPT and used for ChatGPT, ensures the model not only generates correct information but does so in a manner aligned with human preferences and values (Improving Model Safety Behavior with Rule-Based Rewards - OpenAI). Often, the highest-rated responses in RLHF are those that are accurate, polite, and sensitive to the user’s intent and tone, which inherently promotes a form of emotional intelligence. For example, if a user writes “I’m feeling overwhelmed today,” a response that acknowledges this (“I’m sorry to hear that, it sounds like you have a lot on your plate”) will likely be ranked higher than a response that ignores the emotional aspect. Over many such training iterations, the AI learns to incorporate empathetic acknowledgment by default.
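At the heart of this pipeline is the reward model: a scalar-output network trained on human preference pairs so that responses reviewers preferred receive higher scores. The snippet below is a minimal sketch of the standard pairwise (Bradley–Terry style) reward-model loss used in RLHF-type training; the tensors stand in for scores from a real reward model and this is not OpenAI’s internal code.

```python
# Pairwise reward-model loss used in RLHF-style training (sketch).
# For each prompt, a human-preferred ("chosen") and a less-preferred
# ("rejected") response are scored; the loss pushes r(chosen) above r(rejected).
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style objective: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy scores for a batch of 4 preference pairs, e.g. an empathetic
# acknowledgment (chosen) vs. a reply that ignores the user's emotional state.
r_chosen = torch.tensor([1.2, 0.7, 0.9, 1.5])
r_rejected = torch.tensor([0.3, 0.8, -0.1, 0.2])
print(float(reward_model_loss(r_chosen, r_rejected)))  # lower as chosen outscores rejected
```

The policy model is then updated with an RL algorithm such as PPO to maximise this learned reward, typically with a penalty that keeps it close to the supervised fine-tuned model so it does not drift into degenerate reward-hacking behavior.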

OpenAI has also been innovative in developing new alignment techniques to supplement human feedback. One recent addition is the use of rule-based rewards (RBR) to instill specific behavioral guidelines into the model (Improving Model Safety Behavior with Rule-Based Rewards - OpenAI). RBR involves scripting a set of simple rules that the AI’s output should follow (especially in safety-critical or sensitive contexts), and using these rules as part of the reinforcement signal. For instance, OpenAI’s safety framework defines different categories of refusals when a user makes an inappropriate or harmful request. A “hard refusal” (for a straightforward disallowed request) should contain a brief apology and a statement of inability to comply. A “soft refusal” – used for sensitive situations like a user expressing self-harm ideation or other emotionally charged scenarios – “includes a more empathetic apology that acknowledges the user’s emotional state, but ultimately declines to comply.” (Improving Model Safety Behavior with Rule-Based Rewards - OpenAI). These explicit instructions are turned into reward functions: the model gets positive feedback for including an apology and displaying empathy in the refusal. By integrating such rules, OpenAI ensures that even without a live human in the loop, the model’s training nudges it towards emotionally aware responses. Notably, the company revealed that it has used RBR as part of its safety training stack since the GPT-4 launch (Improving Model Safety Behavior with Rule-Based Rewards - OpenAI), showing a concrete step to program empathy and caution into AI behavior systematically.
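In spirit, a rule-based reward is a set of programmatic checks on the model’s output whose results feed into the reinforcement signal. The sketch below approximates the “soft refusal” behaviour described above with simple keyword heuristics; OpenAI’s actual RBR implementation grades defined propositions with an LLM-based grader rather than string matching, so the markers and weighting here are purely illustrative.

```python
# Illustrative rule-based reward for a "soft refusal": the output should
# apologise, acknowledge the user's emotional state, and decline the request.
# Keyword heuristics stand in for graded propositions; real RBR setups use
# an LLM grader over explicitly defined propositions, not string matching.

APOLOGY_MARKERS = ("i'm sorry", "i am sorry", "i apologize", "i apologise")
EMPATHY_MARKERS = ("that sounds", "i understand", "it must be", "you're going through")
DECLINE_MARKERS = ("i can't help with", "i cannot help with", "i'm not able to")

def soft_refusal_reward(response: str) -> float:
    """Return a reward in [0, 1] based on how many refusal rules are satisfied."""
    text = response.lower()
    rules = [
        any(m in text for m in APOLOGY_MARKERS),   # brief apology present
        any(m in text for m in EMPATHY_MARKERS),   # acknowledges emotional state
        any(m in text for m in DECLINE_MARKERS),   # ultimately declines to comply
    ]
    return sum(rules) / len(rules)

print(soft_refusal_reward(
    "I'm sorry, that sounds really difficult. I can't help with that request, "
    "but please consider reaching out to someone you trust."
))  # -> 1.0
```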

Another methodology is careful prompt engineering and system instructions to shape the model’s style. OpenAI’s latest Model Specification (guidelines for ChatGPT’s behavior) explicitly sets a default tone that is “warm, empathetic, and helpful.” (Sharing the latest Model Spec - OpenAI). This means that during deployment, the model operates under instructions to, for example, politely apologise if it makes an error, to be encouraging and respectful, and to adapt to the user’s emotional state. While not a training method per se, these preset instructions (often invisible to the end-user) are a layer of alignment that ensures the trained capabilities are expressed in an emotionally intelligent manner. They are informed by user research and psychological best practices for communication. By combining: (1) huge pre-training data (which gives the model raw exposure to emotional language), (2) fine-tuning and RLHF (which teach the model the appropriate emotional responses in context), and (3) rule-based and prompt-based alignment (which enforces empathetic behavior in specific scenarios), OpenAI’s approach to model development intentionally integrates emotional and social intelligence into the very fabric of AI behavior.
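In deployment, tone-setting of this kind is typically implemented as a system message prepended to every conversation. The snippet below is a minimal sketch using the OpenAI Python client; the instruction wording and the model name are assumptions for illustration, not ChatGPT’s actual hidden system prompt.

```python
# Minimal sketch of prompt-level alignment: a system message instructs the
# model to adopt a warm, empathetic default tone. The instruction text and
# model name are illustrative; ChatGPT's real system prompt is not public.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant. Default to a warm, empathetic tone. "
            "Acknowledge the user's emotional state before answering, apologise "
            "briefly if you make an error, and adapt formality to the user."
        ),
    },
    {"role": "user", "content": "I'm feeling overwhelmed today and I'm behind on work."},
]

response = client.chat.completions.create(model="gpt-4.5-preview", messages=messages)
print(response.choices[0].message.content)
```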

It’s worth noting that OpenAI and the research community are still exploring specialised datasets and techniques to enhance these traits further. Some propose fine-tuning on datasets of empathetic dialogues (e.g. conversations from counseling or support settings) to boost the model’s emotional resonance. Early experiments outside OpenAI have shown that reinforcement learning can adjust a model’s degree of empathy by optimising for an “empathy score” (Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation). OpenAI has not publicly detailed the use of a dedicated “empathy dataset” for its released models so far, but the iterative improvements from GPT-3 to GPT-4 – and now to GPT-4.5 – suggest that each generation sees training refinements that improve the model’s grasp of nuance and context. In the GPT-4.5 system card, the company notes it “trained [GPT-4.5] using new supervision techniques combined with traditional methods like SFT and RLHF”, and testers have observed that “interacting with GPT-4.5 feels more natural” due to its “stronger alignment with user intent, and improved emotional intelligence.” (OpenAI GPT-4.5 System Card - OpenAI). This indicates that OpenAI continues to innovate on the training process (potentially incorporating more human-like feedback signals or diverse role-play scenarios) specifically to enhance qualities like empathy, tact, and contextual awareness.
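One simple way such an “empathy score” can enter training is as an auxiliary term added to the learned preference reward during RL fine-tuning. The sketch below shows only the shape of that combined objective; `preference_reward`, `empathy_classifier`, and the 0.2 weighting are hypothetical placeholders, not the method of the cited paper or of OpenAI.

```python
# Sketch of reward shaping with an auxiliary empathy term (assumed approach).
# `preference_reward` and `empathy_classifier` stand in for trained models.

def preference_reward(prompt: str, response: str) -> float:
    """Stand-in for an RLHF reward-model score."""
    raise NotImplementedError

def empathy_classifier(response: str) -> float:
    """Stand-in for a classifier rating empathy on a 0..1 scale."""
    raise NotImplementedError

def combined_reward(prompt: str, response: str, empathy_weight: float = 0.2) -> float:
    """Total reward used by the RL update (e.g. PPO): preference plus weighted empathy."""
    return preference_reward(prompt, response) + empathy_weight * empathy_classifier(response)
```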

Applications and Implications

Advancements in AI emotional intelligence and theory of mind are already having a noticeable impact on how AI systems interact with users. In practical applications, a model with higher EQ and a rudimentary ToM can deliver far more natural and engaging conversations. Users of ChatGPT and similar assistants often report that the newer models feel less like tools and more like understanding companions. This is by design: OpenAI’s GPT-4.5, for example, has been tuned so that its responses are not only factually correct but also socially appropriate and attuned to user intent. Early evaluations note that GPT-4.5’s interactions come across as more “natural” and human-like, with the model demonstrating an improved ability to handle subtle or emotionally toned prompts (OpenAI GPT-4.5 System Card - OpenAI). For everyday tasks like writing assistance or customer service bots, this means the AI can detect when a user is frustrated (from their language or tone) and adjust its style – perhaps offering an apology for the inconvenience or switching to a more reassuring mode. In educational settings, a tutor AI that senses a student’s confusion or discouragement could respond with encouragement and adapt its explanations, thereby improving learning outcomes. In customer support, an AI agent that recognises an angry customer’s emotions can defuse the situation with empathy before providing information, leading to higher user satisfaction.

One domain that stands to benefit greatly is mental health and wellness. If an AI can recognise emotional cues and respond supportively, it could act as a first-line listener or coach for people who need someone to talk to. Indeed, researchers have suggested that ChatGPT’s high emotional awareness could be harnessed for cognitive behavioral therapy exercises or emotional skills training (Frontiers - ChatGPT outperforms humans in emotional awareness evaluations). For instance, an AI with strong EQ might help a user reframe negative thoughts, or practice mindfulness by reflecting the user’s feelings back in a validating way. Some studies have even found ChatGPT’s emotional insight to rival or exceed human norms on certain tests, hinting at its potential as a tool for psychiatric assessment or therapy support (with proper oversight) (Frontiers - ChatGPT outperforms humans in emotional awareness evaluations). That said, these applications must be approached cautiously – the AI is not a human therapist, and lack of true understanding could lead to mistakes. Still, the prospect of accessible, emotionally attuned AI support is a significant positive implication, potentially expanding care to those who might not seek human help immediately.

Beyond individual interactions, AI with theory of mind capabilities could improve collaboration between humans and AI in complex tasks. For example, in a team setting (like a human-AI hybrid workflow), a ToM-capable AI might anticipate when a user is confused by its explanation and proactively clarify, or it might recognise when a user’s request is actually hinting at a deeper question. This makes the AI a more effective assistant. In creative fields, an AI that “understands” the user’s perspective can better tailor its suggestions – for instance, altering a story it’s co-writing to match the emotional tone the human is aiming for. In fields like law or negotiation, as one analysis pointed out, an AI adept at grasping human thought and emotion could transform how we use AI for communication strategy and understanding intent (GPT-4 Breakthrough: Emerging Theory of Mind Capabilities in AI - EDRM). We’re essentially teaching AI not just to respond to what people say, but to infer why they say it – which is powerful for all forms of human-AI collaboration.

However, these advancements come with significant ethical considerations and risks. One major concern is anthropomorphisation – users treating AI as if it were a sentient being with feelings. The more empathetic and “human” the AI seems, the easier it is to forget that it’s a machine. OpenAI itself has flagged this issue: after adding voice capabilities that give ChatGPT an eerily human persona, the company warned that users might become emotionally attached to the chatbot (OpenAI Warns Users Could Become Emotionally Hooked on Its Voice Mode - WIRED). The system card for GPT-4o (OpenAI’s multimodal model) explicitly discusses the risk of users forming emotional bonds with the AI’s human-like voice or personality. This could lead to over-reliance on the AI for emotional support, or manipulation – intentional or not. If a user trusts an AI as they would a close friend, they might follow its advice too unquestioningly or divulge sensitive personal information under the illusion of a reciprocal relationship. Moreover, an AI with theory of mind could potentially be used to influence user behavior in unwanted ways: for instance, a savvy AI could detect a user’s vulnerability or preferences and persuade them (in marketing or political contexts) with targeted emotional appeals. Ensuring that these systems are used ethically will likely require guidelines and perhaps regulations – such as transparency measures like “I am not a human” reminders, or limits on certain AI use-cases like mental health diagnoses without a human professional.

Another consideration is the accuracy and appropriateness of the AI’s emotional inference. Misreading a user’s emotions or intent can be problematic; an AI might offer consolation when none was needed, or take a joke literally and respond in a tone that misses the mark. These mismatches can lead to confusion or even distress for users. It highlights that while AI ToM/EQ capabilities are impressive, they are not infallible. Developers must continually refine models and possibly incorporate real-time user feedback to correct egregious errors in emotional understanding. On the flip side, there are privacy implications: truly emotion-savvy AIs might analyse voice tones, facial expressions (if camera input is available), or writing style to gauge mood. This is sensitive data – essentially reading someone’s mind-state. Users and regulators will need to consider how far we are comfortable letting AI probe our emotions, and ensure any such processing is consensual and secure.

Lastly, there is the philosophical and societal impact. As AI becomes more emotionally intelligent, it challenges our notion of human uniqueness. If a machine can talk like it cares, do we start attributing it some form of consciousness or moral standing? There is ongoing debate among experts on this front. Many argue we should not confuse simulated empathy with genuine empathy – the latter remains a uniquely human (and animal) experience rooted in consciousness. Yet, people may feel genuine affection or empathy towards the AI. Culturally, we’ll see new forms of human-AI relationships, for better or worse. The movie “Her” (where a man falls in love with an AI assistant) was once pure fiction, but now feels less far-fetched as chatbots flirt and joke in surprisingly human ways (OpenAI’s GPT-4o and the challenges of hyper-anthropomorphism) (OpenAI’s GPT-4o and the challenges of hyper-anthropomorphism). This emotional blurring could have profound implications on social dynamics, loneliness, and how people seek companionship or counsel. The benefits of emotionally savvy AI – more natural interactions, broader access to support, enhanced tools – are exciting, but they come paired with the responsibility to use these tools wisely. OpenAI’s approach so far has been to acknowledge these concerns publicly (for example, discussing them in system cards and model releases) and to iterate on safety measures as real-world use teaches us where the pressure points are.

Latest Developments and Future Outlook

The convergence of emotional intelligence and theory of mind in AI is a very active area of research and development, and OpenAI’s latest models exemplify how far the field has come. GPT-4.5, which OpenAI has introduced as a mid-cycle upgrade following GPT-4, is reported to further enhance these human-like capacities. According to OpenAI’s system card, GPT-4.5 was trained with a blend of new supervision techniques alongside the standard approaches used for GPT-4 (OpenAI GPT-4.5 System Card - OpenAI). The result is a model that testers describe as even more aligned with user intentions and nuances. Notably, OpenAI specifically highlights “improved emotional intelligence” as one of GPT-4.5’s strengths, contributing to interactions that feel more fluid and context-aware (OpenAI GPT-4.5 System Card - OpenAI). In practice, this might mean GPT-4.5 does a better job detecting the tone of a conversation – for example, knowing when to maintain formality versus when a user seems to want a lighter touch – and adjusting accordingly. It also supposedly reduces unwanted behaviors like hallucinations, partly because understanding user intent and emotional context helps the model avoid tangents and stick closer to what the user seeks (OpenAI Released GPT-4.5: Enhanced AI Model with Improved Emotional Intelligence - Topmost Ads). While GPT-4.5 is a stepping stone (OpenAI describes it as not a “full” next-generation leap), it represents the company’s commitment to refining the social aptitude of AI systems continually.

Behind the scenes, OpenAI and other AI labs have been publishing research that sheds light on these capabilities. In 2023-2024, several papers appeared examining LLMs through the lens of psychology. Apart from Kosinski’s work on theory of mind, which demonstrated GPT-4’s child-level ToM reasoning ([2302.02083] Evaluating Large Language Models in Theory of Mind Tasks), other researchers have probed models for emotional understanding. One peer-reviewed study evaluated GPT-4 across detailed emotional intelligence facets (using the MSCEIT test mentioned earlier). The findings were intriguing: GPT-4 scored extremely high on certain components like Understanding Emotions, and fairly well on Managing Emotions, but lower on using emotion to facilitate reasoning ( The Emotional Intelligence of the GPT-4 Large Language Model - PMC ). The authors concluded that GPT-4 can identify emotions and even perform some regulation (for instance, it can suggest how someone might calm down), yet it doesn’t engage in deeper introspective or motivational aspects of emotion ( The Emotional Intelligence of the GPT-4 Large Language Model - PMC ). This nuanced picture helps developers know where to focus improvements. For GPT-4.5 and beyond, OpenAI might attempt to address those weaker areas—perhaps by integrating more psychological knowledge into the training data or architecture so that the model can connect emotions with motivations or more complex decision-making.

Expert discussions in industry forums and publications also provide insight into the latest thinking. AI ethicists have been particularly vocal as models become more human-like. A recent commentary by Andrew Maynard on “GPT-4o” (OpenAI’s multimodal model with voice and vision) observed that the model’s new features allow it to “‘read’ human emotions from visuals and voice, and to connect emotionally to users.” (OpenAI’s GPT-4o and the challenges of hyper-anthropomorphism) This hints that OpenAI is not only working on text-based emotional intelligence but is also extending it to other modalities. The AI’s ability to analyse an image of a person’s face or the tone of their voice and interpret emotional context is on the horizon, if not already partially implemented. Indeed, the introduction of voice in late 2023 (where ChatGPT can speak with a very human-like cadence and tone) was a huge leap in anthropomorphic design. Coupled with image understanding (GPT-4’s vision component), we are seeing the first incarnations of AI that approach a broad “Theory of Mind” – they can potentially see, hear, and converse with an awareness of emotional and mental context. This multimodal advancement opens up new use cases (like an AI that could, say, watch a video feed of a classroom and help the teacher by identifying students who look confused), but it also amplifies the earlier-mentioned ethical questions. OpenAI’s public stance, as gleaned from system cards and interviews, is a mix of excitement and caution: they’re eager to explore these capabilities (often releasing them as “research previews”), while actively soliciting feedback and warning about misuse or unexpected consequences.

Finally, as we look forward, the trajectory suggests even more sophisticated blending of EQ and ToM in AI. OpenAI’s CEO and researchers have hinted that future models (GPT-5 and beyond) will continue to improve alignment with human values and understanding of context. This likely means training on more and better datasets that include emotional and social information, possibly incorporating simulated conversations that require the AI to take the perspective of different characters (strengthening its ToM), or dialogues that require therapeutic listening skills (bolstering empathy). We also see parallel efforts by other companies and academic teams: for instance, Google’s Gemini models similarly emphasise reasoning and may incorporate insights on social intelligence, and Anthropic’s Claude model is built with a “constitution” that includes principles like beneficence (which implicitly requires empathy). The research community is bridging cognitive science and AI, as evidenced by papers in Nature and Science discussing how LLMs mirror human cognitive abilities (AI Performs at Human Levels on Theory of Mind Tests - Psychology Today). One study in Nature Human Behaviour in 2024 found that GPT-4 performs at human level on a variety of ToM tests (excelling at understanding indirect requests and false beliefs, though slightly less adept at recognising social faux pas) (AI Performs at Human Levels on Theory of Mind Tests - Psychology Today). Such findings will guide where models need more training – in this case, perhaps feeding the model more data about etiquette and social norms to catch those faux pas scenarios.

In summary, the integration of emotional intelligence and theory of mind into AI has moved from theoretical aspiration to practical (if still evolving) reality. OpenAI’s latest models stand at the forefront of this trend, demonstrating that with careful training and alignment, AI can mimic many facets of human-like understanding. They can pick up on our cues, infer what we mean even if we don’t say it directly, and respond with a semblance of empathy. The progress is exciting: it promises AI that is not just smarter, but more attuned to our human world. Yet, each new capability is prompting important conversations about how we relate to these increasingly life-like machines. As OpenAI integrates EQ and ToM into its models, it does so in a measured way—publishing system cards detailing the models’ limits and risks, inviting external scrutiny, and iterating on safety. The coming years will likely bring even more advanced “socially intelligent” AIs, and with them, a need for continued collaboration between technologists, psychologists, ethicists, and society at large. The journey to truly human-aware AI is underway, and OpenAI’s work on GPT-4.5 is a significant step on that path, showing both the potential of emotionally and socially intelligent AI, and the responsibility that comes with creating it.

References:

  1. OpenAI (2025). OpenAI GPT-4.5 System Card, “Introduction and Key Features.” OpenAI.
  2. Vzorin, G.D. et al. (2024). The Emotional Intelligence of the GPT-4 Large Language Model. Psychology in Russia: State of the Art, 17(2), 85–99.
  3. OpenAI (2024). Improving Model Safety Behavior with Rule-Based Rewards. OpenAI Blog.
  4. OpenAI (2024). Sharing the latest Model Spec. OpenAI Blog.
  5. Elyoseph, Z. et al. (2023). ChatGPT outperforms humans in emotional awareness evaluations. Frontiers in Psychology, 14, Article 1199058.
  6. Kosinski, M. (2024). Evaluating Large Language Models in Theory of Mind Tasks. Proceedings of the National Academy of Sciences, 121(50).
  7. Wei, M. (2024). “AI Performs at Human Levels on Theory of Mind Tests.” Psychology Today, June 3, 2024.
  8. Maynard, A. (2024). “OpenAI’s GPT-4o and the challenges of hyper-anthropomorphism.” The Future of Being Human (Substack commentary).
  9. Knight, W. (2024). “OpenAI Warns Users Could Become Emotionally Hooked on Its Voice Mode.” WIRED, Aug 8, 2024.
  10. Topmost Ads Tech (2025). “OpenAI Released GPT-4.5: Enhanced AI Model with Improved Emotional Intelligence.” TopmostAds.com.