Eric W. Cooper’s Paper Proposal

Evaluation of an Online Non-task-oriented Dialogue System based on Objective Personality Ratings

Evaluations of conversational agents, or dialogue systems, have focused either on specific tasks requiring cognitive abilities or on the general task of being indistinguishable from a human dialogue partner. Non-task-oriented conversational agents are intended not to convey specific information but to entertain and hold the attention of their human conversation partners. Methods of evaluating such systems have often been variations of the Turing test. Yet, outside of these contests, people who converse with dialogue systems often enjoy the conversation just as much when they are told that the system is a machine. While contests for an illusion of humanness are useful as a development benchmark, that illusion is not essential to the objectives of many dialogue systems.

This research examined methods of conveying personality in a dialogue system. The most obvious way of communicating a sense of personality is through the explicit meanings of words and sentences, but personality is also conveyed through more subtle cues. Embodied agents may communicate moods and personalities through posture, gesture, and complexion. Vocal agents may do so through tone, speed, and intonation. The Japanese-language dialogue system studied here uses knowledge from web sources, such as microblogging sites and online open encyclopedias, to generate interesting responses. The goal of the project was to test whether slight changes in the probability with which each type of conversation pattern is used would affect how human users interpret the personality of the dialogue system when they are informed beforehand that it is a computer program.

The conversation system first parses the user input using morphological analysis tools. Support Vector Machines (SVMs) then classify the input according to a simplified version of the Dialog Act Markup in Several Layers (DAMSL) tag set. After analysis and classification, the system generates responses based on hierarchical keyword meanings derived from the online open encyclopedia Wikipedia and on vernacular sentence forms drawn from the microblogging site Twitter. The system then determines how each generated response would affect the pattern of the conversation, again using DAMSL tags.
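
As a rough illustration of the stages just described, the following Python sketch strings together tokenization, dialogue-act classification, and response selection. Everything in it is an assumption made for illustration: a regular-expression tokenizer stands in for morphological analysis, a few hand-written English rules stand in for the SVM classifier, and the names `tokenize`, `classify_act`, and `choose_response` are hypothetical. The actual system works on Japanese text and draws on Wikipedia and Twitter, neither of which is modeled here.

```python
import re
from typing import List, Tuple

# Words treated as signs of admiration or of a direct reply (illustrative only).
ADMIRATION_WORDS = {"wow", "amazing", "great"}
RESPONSE_WORDS = {"yes", "no", "yeah", "ok", "okay"}


def tokenize(utterance: str) -> List[str]:
    """Stand-in for morphological analysis: lowercase words plus ?/! marks."""
    return re.findall(r"[a-z']+|[?!]", utterance.lower())


def classify_act(tokens: List[str]) -> str:
    """Toy rule-based stand-in for the SVM dialogue-act classifier."""
    if not tokens:
        return "Statement"
    if tokens[-2:] == ["right", "?"] or tokens[-3:-1] == ["isn't", "it"]:
        return "Tag-Question"
    if tokens[-1] == "?":
        return "Question"
    if tokens[-1] == "!" or ADMIRATION_WORDS & set(tokens):
        return "Admiration"
    if tokens[0] in RESPONSE_WORDS:
        return "Response"
    return "Statement"


def choose_response(candidates: List[Tuple[str, str]], preferred_act: str) -> str:
    """Pick the first candidate (text, act) whose act matches; else the first one."""
    for text, act in candidates:
        if act == preferred_act:
            return text
    return candidates[0][0]


if __name__ == "__main__":
    user_act = classify_act(tokenize("It's cold today, isn't it?"))
    candidates = [("I see.", "Response"), ("Where are you now?", "Question")]
    print(user_act, "->", choose_response(candidates, "Question"))
```

The point of the sketch is only the shape of the pipeline: each generated candidate carries a dialogue-act tag, so the choice among candidates can be steered by which act is wanted next.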

Dialogues were prepared beforehand using three different settings for the frequencies of five DAMSL classifications: Statement, Question, Tag-Question, Response, and Admiration. Human subjects read the generated dialogues and evaluated the personality of the dialogue agent using ten adjectives (translated from Japanese): Obstinate, Cold, Patient, Gloomy, Capricious, Direct (or straightforward and honest), Gentle, Frail, Bright, and Calm.
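
One way to picture a "frequency setting" is as a probability distribution over the five dialogue-act types, from which the act of each system turn is drawn. The Python sketch below makes that concrete. The three settings and their weights are hypothetical placeholders, since the proposal does not report the values used, and `sample_acts` and `act_frequencies` are illustrative names rather than part of the described system.

```python
import random
from collections import Counter
from typing import Dict, List

ACTS = ["Statement", "Question", "Tag-Question", "Response", "Admiration"]

# Hypothetical weights, one list per setting, in the same order as ACTS.
# The actual frequencies used in the study are not given in the proposal.
SETTINGS: Dict[str, List[float]] = {
    "A": [0.40, 0.20, 0.10, 0.20, 0.10],
    "B": [0.30, 0.30, 0.10, 0.20, 0.10],  # slightly more questions
    "C": [0.30, 0.20, 0.10, 0.20, 0.20],  # slightly more admiration
}


def sample_acts(setting: str, n_turns: int, seed: int = 0) -> List[str]:
    """Draw a dialogue-act type for each system turn under one setting."""
    rng = random.Random(seed)
    return rng.choices(ACTS, weights=SETTINGS[setting], k=n_turns)


def act_frequencies(acts: List[str]) -> Dict[str, float]:
    """Relative frequency of each act type in a sampled sequence."""
    counts = Counter(acts)
    return {act: counts[act] / len(acts) for act in ACTS}


if __name__ == "__main__":
    for name in SETTINGS:
        freqs = act_frequencies(sample_acts(name, n_turns=1000, seed=42))
        print(name, {act: round(f, 2) for act, f in freqs.items()})
```

Under this reading, the settings differ only by small shifts in a few weights, which matches the idea of "slight" changes in how often each pattern appears.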

The results of these evaluations show that slight alterations in the frequencies of conversation patterns led to significant differences in how human observers interpreted the personalities of the conversational agents. These results demonstrate the feasibility of subtly altering the personality of non-task-oriented conversational agents and thereby changing the experience and, perhaps, the emotional responses of their human conversational partners. This research also suggests approaches to evaluating non-task-oriented conversational agents when the human users are aware that their counterparts are computer programs. As such, we think these methods offer alternatives to games such as the Turing test for evaluating the intelligence and wit of a conversational agent.

Cybernetic traditions:

  • 1) Computer science; AI; robotics
  • 2) Control systems; automation; systems engineering

1 thought on “Eric W. Cooper’s Paper Proposal”

  1. Faisal

    Very interesting, I have many questions and hope to have a conversation with you in Washington.
    Two comments: 1- Your users are judging “personality” from emotional cues (adjectives); the ten emotions correspond roughly to some Big 5 dimensions. 2- I think you are suggesting a criterion of emotional intelligence in place of the Turing test, which is non-specific in its scope.
