Distributed Models of Trust and Reputation in Group Interactions

Copyright 2000, Joshua O'Madadhain. All rights reserved.

Background

This is a lightly edited archive of work that I did in this space while at the University of Oregon Computer and Information Science Department (1999-2001), working with Steve Fickas on a project for which he, Holly Arrow of the Psychology Department, and John Orbell of the Political Science Department were the principal investigators. The essential motivation of the project was to study how people self-organize into groups to achieve tasks; groups that form in this way are called clubs in political science theory. In particular, we were interested in the processes of negotiation involved both in club formation and in the distribution of whatever good the club derives from achieving its task.

The experimental metaphor used to study these phenomena is a game called social poker. In social poker, no single player has enough cards to form a complete poker hand. Thus, the players must organize themselves into groups (such that each group has a set of cards from which a valid poker hand may be created) and then informally decide how to distribute the payoff associated with that hand. Each player then makes a private claim on that payoff, which may bear no relation to the amount they agreed to claim; if the sum of the claims is greater than the size of the payoff, then each player is penalized. Thus, each player has an incentive to identify those players who contribute to overclaiming, so that he/she can avoid forming a group with them in a subsequent round.
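
(By way of illustration, here is a minimal sketch of the claiming step in Python; the payoff value, the flat penalty, and the function name are my own illustrative assumptions, not part of the actual experimental protocol.)

    # Hypothetical sketch of the social-poker claiming step: if the private
    # claims exceed the group's payoff, every member of the group is penalized.

    def settle_claims(payoff, claims, penalty=5):
        """claims maps each player's name to his/her private claim on the payoff."""
        if sum(claims.values()) > payoff:
            # Overclaiming: everyone in the group pays the (assumed) flat penalty.
            return {player: -penalty for player in claims}
        return dict(claims)  # the claims are honored as made

    # The group agreed to split a payoff of 30 evenly, but one player overclaims.
    print(settle_claims(30, {"A": 10, "B": 10, "C": 15}))  # everyone penalized
    print(settle_claims(30, {"A": 10, "B": 10, "C": 10}))  # claims honored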

This archive's purpose is twofold: it is a mechanism for organizing my own thoughts, and an informal means of making those thoughts accessible to others who might be interested. Some parts of it are derived from others' work (see "References" below). If anyone would like to use or refer to any part of this in their own work, or would like to comment on it, please contact me.

Where I Come In

My connection to, and interest in, this endeavor centers on:

  • the study of these phenomena in an on-line environment

  • the use of software agents as facilitators in such an environment

  • the representation and communication of reputation information

At the moment, I am concentrating on the last of these three topics.

Agents (in this context, 'entities that are capable of action') do not generally have all the information that would allow them to completely predict the behavior of other agents. In particular, two agents will often interact who have never interacted before. Thus, when two or more agents engage in a transaction, they tend to rely on information about each other's reputations to guide their actions in the transaction. Agents in a computer-mediated (CM) environment must rely on reputation even more heavily than those in a face-to-face (FTF) environment, because CM-environment agents do not have access to cues such as facial expression, tone of voice, and style of dress that are available to FTF-environment agents.

Centralized versus Distributed Reputation

There are essentially two distinct types of models for managing reputation information: the central authority model and the distributed model.

In the central authority model, a single server is responsible for storing all reputation information, according to the evaluations that participating agents submit after each transaction. The weaknesses of this approach are:

  • Any centralized reputation server can be compromised, either by feeding it bogus information, or by actually cracking the server on which the information is stored. Feeding bogus information to a centralized server is not difficult:

    • X can create large numbers of email accounts and use them to send the central server messages that appear to come from different people and that promote a positive reputation for X.

    • Alternatively, X can simply arrange for other agents to submit bogus reputation information on X's behalf. Such agents are called "shills", and centralized servers (such as eBay) have already been demonstrated to be vulnerable to their influence.

  • A central server is a bottleneck which can restrict the flow of transactions.

  • Finally, a central authority enables any individual to determine his/her exact reputation at any time. This may not seem a problem at first. However, if one knows that one's reputation is poor, one can adopt a new identity to compensate...and this sort of practice should not be encouraged.

In the distributed model, each agent maintains its own reputation information. At first, this seems to be a terrible waste of space--and it would be, if each agent were to store all of the information that would have been stored by the central authority. However, it seems reasonable to suppose the following (a rough sketch of such a local store appears after the list):

  • Any agent will only need to know the reputations of a fairly small number of people (in comparison with the number of people for whom a centralized reputation server might maintain records). I would guess that the average person has done business with no more than a few tens of thousands of people, and that 90% of those dealings were probably one-time transactions for which maintaining detailed information might not be worthwhile.

  • There is a limit to the number of degrees of separation that can be in a useful chain of recommendations. (If X is told by a (friend of a)^4 friend (i.e., five friends away) that Y is a good mechanic, how much will X be inclined to trust this information?)
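
(To make the storage argument concrete, here is a minimal Python sketch of a per-agent store that caps both chain length and total size; the field names, the limits, and the pruning rule are illustrative assumptions rather than design decisions.)

    # Hypothetical per-agent reputation store: each agent keeps only its own
    # records, bounded by a maximum recommendation-chain length and a size cap.

    MAX_CHAIN_LENGTH = 3    # assumed limit on useful degrees of separation
    MAX_RECORDS = 10000     # assumed cap on how many counterparts are worth tracking

    class LocalReputationStore:
        def __init__(self):
            self.records = {}  # (agent_id, domain) -> reputation data

        def remember(self, agent_id, domain, rating, chain_length=0):
            if chain_length > MAX_CHAIN_LENGTH:
                return  # too far removed to be useful; discard it
            if len(self.records) >= MAX_RECORDS:
                self.prune()
            self.records[(agent_id, domain)] = {"rating": rating,
                                                "chain_length": chain_length}

        def prune(self):
            # Crude policy: drop the most indirect record first.
            victim = max(self.records,
                         key=lambda key: self.records[key]["chain_length"])
            del self.records[victim]

    store = LocalReputationStore()
    store.remember("Y", "car repair", rating=0.8)                   # direct experience
    store.remember("Z", "car repair", rating=0.6, chain_length=5)   # too indirect; discarded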

So What Is Reputation?

Reputation comprises the recorded judgements of agents on the quality of their interactions with other agents. After agent X and agent Y interact, X and Y each rate the interaction based on whether their respective outcomes were satisfactory. These ratings are then incorporated into X's reputation for Y and Y's reputation for X.

Some important qualities that reputation should have are:

  • Reputation is not symmetric. X's reputation for Y is not in general the same as Y's reputation for X, either because their transactions have not been equally beneficial to both, or because their evaluation criteria are different.

  • Reputation is generally specific to a domain. For example, X may believe that Y is a competent mechanic, but a poor movie reviewer.

  • Reputation information should be sharable among agents. If Y has interacted with Z, but X has no experience with Z, X should be able to ask Y for Y's opinion of Z's reputation.

Formal Definitions

  • agent: Any entity that is capable of action.

  • reputation: X has a belief that Y will execute a task in domain D if X requests this of Y. This belief is Y's reputation in domain D for X.

  • credibility/reliability: X has a belief that Y provides accurate information about tasks in domain D. This belief is Y's credibility (or reliability) in domain D for X. This is referred to by Abdul-Rahman as recommender trust.
    (Alternate definition: X has a belief that Y provides accurate information about Z's performance on tasks in domain D; this belief would then be Y's credibility with respect to Z's performance in domain D for X.)

Parameters

I am interested in exploring the questions of how to represent reputation information for each agent, how to share it among agents, and how to incorporate one agent's reputation information into the reputation information of another. To that end, I have compiled the following list of parameters whose values we may need to determine (a rough sketch of how some of them might be bundled appears after the list).

  • default reputation: the reputation that Y has in domain D for X a priori (before X has ever had a chance to test the accuracy of Y's information).

  • default credibility: analogous to default reputation. Default credibility may or may not be affected by reputation.

  • volatility of credibility/reputation: the extent to which X's evaluation of Y's credibility/reputation in domain D is affected by new evidence (either personal experience with Y, or information from others). Note that volatility may differ with different kinds of evidence.

  • time discounts: The extent to which more recent encounters are weighted more than less recent ones, in the determination of reputation and credibility.

  • relative value of experience vs. recommendations: the relative emphasis of one's own evaluations with respect to those of others, in the determination of reputation and credibility.

  • weighting based on similar values: the extent to which X believes that Y's evaluations are those that X would have made; this is presumably determined by calculating the distance from Y's evaluations to those of X. Note that this sort of measurement is very similar to that done in profiling and in collaborative filtering of email [get reference from Steve].

  • credibility cutoff: if Y's credibility in domain D for X is less than some critical value, then X will ignore any information from Y regarding D (in particular, any information that Y gives about others' reputations/credibility in D). This can be very useful, especially in the presence of the credibility distance discount (below), to keep the number of agents in a given agent's database to a manageable level.

  • confidence: X's belief that X's evaluations of reputation or credibility are well-founded; this is presumably increased for Y in domain D with the number of transactions X has with Y in D.

  • credibility distance discount: the more intermediaries there are between X and reports of Z, the lower the credibility that these reports will have for X. This discount is based on X's lack of credibility in nth-hand information.

  • novelty: the extent to which X chooses to transact with someone with whom X has had few or no transactions.
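
(To make the list above concrete, here is a rough Python sketch of how some of these parameters might be bundled and applied in a single update step. The names, default values, and the particular discounting rules are illustrative assumptions, not settled parts of the model.)

    # Hypothetical parameter bundle and a single reputation-update step.
    from dataclasses import dataclass

    @dataclass
    class ReputationParameters:
        default_reputation: float = 0.5    # a priori reputation in a new domain
        default_credibility: float = 0.5   # a priori credibility in a new domain
        volatility: float = 0.3            # how strongly new evidence moves an estimate
        time_discount: float = 0.9         # per-period weight on older evidence (not exercised below)
        experience_weight: float = 0.7     # own experience vs. others' recommendations
        credibility_cutoff: float = 0.2    # ignore sources below this credibility
        distance_discount: float = 0.8     # per-intermediary discount on recommendations

    def update_reputation(params, old_value, own_rating=None,
                          recommendation=None, source_credibility=0.0, hops=1):
        """Blend one new piece of evidence into an existing reputation estimate."""
        if own_rating is not None:
            evidence, weight = own_rating, params.experience_weight
        elif recommendation is not None:
            if source_credibility < params.credibility_cutoff:
                return old_value  # source is not credible enough; ignore it
            # Recommendations are discounted by the number of intermediaries (hops).
            weight = ((1 - params.experience_weight) * source_credibility
                      * params.distance_discount ** hops)
            evidence = recommendation
        else:
            return old_value
        # Volatility controls how far the estimate moves toward the new evidence.
        return old_value + params.volatility * weight * (evidence - old_value)

    p = ReputationParameters()
    r = p.default_reputation
    r = update_reputation(p, r, own_rating=0.9)               # direct experience
    r = update_reputation(p, r, recommendation=0.2,
                          source_credibility=0.6, hops=2)     # second-hand report
    print(round(r, 3))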

Approaches

There are (at least) two major ways of approaching this research.

One is to attempt to reproduce the complex of mechanisms that humans use in collecting and using reputation information; if successful, this might tell us some interesting and perhaps useful things about human psychology and decision-making.

The other is to attempt to design agents that have their own criteria for assembling reputation information, whose purpose is to assist human beings in making decisions about transactions. My personal inclination, at the moment, is to pursue this latter course.

Active vs. Passive Dissemination of Recommendations

An active query is an (unprompted) request for information that X makes of other agents that X trusts. Such queries fall into two categories:

  • "I want to know about Y's reputation in domain D."

  • "I am looking for someone with a good reputation in domain D." The mechanisms for queries in these two categories may be similar, or the same, but how the search for recommendations proceeds, and how it terminates, will probably differ.

How an active query proceeds at all (in other words, how agents go about requesting information of one another when their users are not engaged in a transaction) is a question for which I don't as yet have an answer.
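
(Without presuming an answer to that question, here is a rough Python sketch of what the two kinds of query, and a reply, might carry; the field names and the hop limit are illustrative assumptions.)

    # Hypothetical message formats for the two kinds of active query.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class ReputationQuery:
        asker: str
        domain: str
        subject: Optional[str] = None  # "tell me about Y" if set; "find me someone good" if None
        max_hops: int = 3              # assumed limit on how far the query may be forwarded
        path: List[str] = field(default_factory=list)  # agents the query has passed through

    @dataclass
    class ReputationReply:
        responder: str
        domain: str
        subject: str
        rating: float                  # the responder's evaluation of the subject in this domain
        path: List[str] = field(default_factory=list)  # chain along which the reply travels back

    # "I want to know about Y's reputation in domain D."
    q1 = ReputationQuery(asker="X", domain="car repair", subject="Y")
    # "I am looking for someone with a good reputation in domain D."
    q2 = ReputationQuery(asker="X", domain="car repair")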

Passive exchange of information can most readily take place around a transaction (either before or after it), in which an agent may request its counterpart to reveal the contents of its recommendation/evaluation database (or offer to reveal its own). Intuitively, this is "gossip". The user may place restrictions on the kinds of information that they wish their agent to passively collect from other agents in general.

Of course, the extent to which the user's agent will incorporate this information into its database will depend on the credibility of its counterpart agent in the various domains in which information was received.
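
(Here is a minimal Python sketch of such a filter; the record format, the user's domain restrictions, and the credibility threshold are illustrative assumptions.)

    # Hypothetical filter for passively gossiped records: keep only records in
    # domains the user cares about, offered by counterparts that are credible
    # in those domains.

    def filter_gossip(records, allowed_domains, counterpart_credibility, cutoff=0.3):
        """records: list of (subject, domain, rating) tuples offered by the counterpart.
        counterpart_credibility: our credibility for the counterpart, keyed by domain."""
        kept = []
        for subject, domain, rating in records:
            if domain not in allowed_domains:
                continue  # the user has restricted passive collection in this domain
            if counterpart_credibility.get(domain, 0.0) < cutoff:
                continue  # the counterpart is not credible enough in this domain
            kept.append((subject, domain, rating))
        return kept

    offered = [("Z", "car repair", 0.8), ("W", "movie reviews", 0.9)]
    print(filter_gossip(offered, {"car repair"}, {"car repair": 0.7}))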

Agents of Deception

It seems prudent to assume that there will be some agents that will provide false information. Obviously, if such agents make up too much of the population, then the system becomes useless (and society probably falls apart, too, so no one will care or even notice). However, agents must be prepared to cope with deliberately falsified information. Another (perhaps less depressing) way to put this is that other users will not necessarily evaluate on the same set of priorities.

Fortunately, the answer is inherent in the distributed model of reputation information: don't trust any given individual too much, i.e., one shouldn't let information from a single source have too great an effect on one's own reputation evaluations.
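
(One crude way to enforce this is to clamp the weight that any single source's reports can carry; a minimal Python sketch, with an assumed cap, follows.)

    # Hypothetical cap on how much weight any single source can carry when
    # combining reports about the same subject.

    def combine_with_cap(reports, max_weight=1.0):
        """reports: list of (source, rating, weight); each source's weight is
        clamped to max_weight before the ratings are combined."""
        capped = [(rating, min(weight, max_weight)) for _source, rating, weight in reports]
        denom = sum(w for _, w in capped)
        return sum(r * w for r, w in capped) / denom

    # A shill reporting 1.0 with enormous weight counts no more than any other source.
    print(round(combine_with_cap([("shill", 1.0, 100.0), ("a", 0.3, 1.0), ("b", 0.4, 1.0)]), 3))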

Reputation as Probability

When I first started to consider issues of reputation, it seemed obvious to me that reputation ought to be represented directly as an expectation: e.g., if Y has a "good" reputation in domain D for X, then X believes that there is an 85% probability (perhaps with some confidence interval) that Y will perform satisfactorily in a given transaction. This formulation would have had the advantage of enabling us to use various existing tools and techniques from probability and graph theory.

Partially as a result of reading Abdul-Rahman's papers, I have decided that probabilities are probably not the best representation of reputation. The basic reason is that probabilities have convenient mathematical properties...not all of which are appropriate in this context. In particular, consider transitivity. If Y has a credibility of 80% in domain D for X, and Z has a credibility of 90% in D for Y, probability theory would seem to imply that Z has, or should have, a credibility of 72% (0.9 * 0.8) in D for X. I strongly doubt that this is the most reasonable conclusion, and it is for this reason that I suggest that probabilities be used sparingly, if at all, in this context.
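
(The small computation below just illustrates how quickly naive multiplication attenuates credibility along a chain, even when every individual link is quite credible; the per-link values are illustrative.)

    # Naive multiplicative "transitivity": credibility along a recommendation
    # chain shrinks geometrically with every additional link.
    chain = [0.9, 0.8, 0.9, 0.85]   # per-link credibilities (illustrative values)
    value = 1.0
    for links, credibility in enumerate(chain, start=1):
        value *= credibility
        print("after", links, "link(s):", round(value, 3))
    # two links already give 0.72; four links give roughly 0.55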

Evaluations, User Characteristics, and Prioritization

Evaluations are subjective, based on X's level of satisfaction with the outcome of a transaction in which X has participated. However, it might be useful to consider the following: X possesses characteristics that inform these evaluations--perhaps we should try to represent some of those as well. Towle and Quinn (in their paper Knowledge Based Recommender Systems Using Explicit User Models, presented at the 2000 Austin, TX workshop on Knowledge-Based Markets) point out that people make evaluations for different reasons, and (by implication) that conflating two ratings of "good" may therefore be misleading. (Towle and Quinn also point out that some people make decisions on the basis of preference, and some on the basis of need. Should we try to capture this? It doesn't seem directly apropos for evaluations (as opposed to purchase decisions, which is what they are discussing), but it may be worth looking into further.)

Evaluations should be multidimensional: X should be able to evaluate a transaction based on several different factors. To combine this with the note above, perhaps X should also specify, as global characteristics, how important each of these factors tends to be. This could also be represented as annotations to each evaluation, or (best?) there could be global defaults, which could be modified ad hoc as necessary. (Because, of course, one's priorities are not always the same.)

Making prioritizations public knowledge is a two-edged sword. In some sense the ideal outcome is that everyone else makes theirs public, but mine remain secret. The reason is this: if I know someone else's prioritizations, that gives me a valuable context in which to interpret their recommendations. However, other people knowing my prioritizations doesn't help me...and it may hurt me, because if my prioritizations are known, then other agents (of deception) may represent themselves as having the same (or highly similar, to reduce suspicion) set of prioritizations for purposes of increasing their influence on me when I ask them for recommendations.

Concept for prioritizations: don't list the factors in order of priority; imposing a total ordering is not always apropos (two factors may have equal priority) and does not tell the whole story in any case. Instead, list their relative importance as fractions of a whole (e.g., factor A comprises 15%, B comprises 10%, C comprises 30%, and so on).
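
(A small Python sketch of this idea, with global default fractions that can be overridden ad hoc for a particular transaction; the factor names and numbers are purely illustrative.)

    # Hypothetical prioritization: factor importances expressed as fractions of
    # a whole, with per-transaction deviations from the global defaults.

    def normalize(weights):
        total = sum(weights.values())
        return {factor: w / total for factor, w in weights.items()}

    global_defaults = normalize({"price": 0.15, "quality": 0.30,
                                 "speed": 0.10, "courtesy": 0.45})

    def prioritization_for(transaction_overrides):
        merged = dict(global_defaults)
        merged.update(transaction_overrides)  # ad hoc deviations for this transaction
        return normalize(merged)              # re-normalize so the fractions sum to 1

    # For this particular transaction, speed matters much more than usual.
    print(prioritization_for({"speed": 0.40}))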

How, in most circumstances, is a reputation agent to know that its user should be prompted to give an evaluation, or that it should provide reputation/recommendation info? (The second one is a priori easier, since the easy way to deal with this is to "speak when spoken to".)

A Sketch Of A Model

The user specifies a prioritization (the relative importance of various factors, including characteristics of other users), perhaps in terms of percentages of the whole. These percentages may (and almost certainly will) vary for different domains, times, external circumstances, and people. However, if we always specify the prioritization for every transaction (possibly only specifying deviations from a predefined template), then it will be easier for other agents/users to incorporate the associated evaluations/recommendations into their own experiences. (See, however, the third paragraph of "Evaluations, User Characteristics, and Prioritization" above.)

Note for simulation: each agent should have, in addition to its preferences (prioritizations) and its parameters, a set of properties or characteristics that determine its performance in various areas to be evaluated by other agents.

After a transaction, each user is prompted by his or her agent to evaluate the transaction (see "The Big (Open) Questions"). Some objective information, e.g. a record of goods/money/services exchanged, would be nice; ideally this would be fed to the agent automatically. This evaluation would be multidimensional, with perhaps room for additional comments and keywords for aspects of the evaluation that did not fit into the given evaluation format.
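
(A rough sketch of what such an evaluation record might look like; the fields are illustrative assumptions, not a settled format.)

    # Hypothetical multidimensional evaluation record for a single transaction.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Evaluation:
        counterpart: str
        domain: str
        scores: Dict[str, float]   # e.g. {"price": 0.8, "quality": 0.4}
        exchanged: str = ""        # objective record of goods/money/services, if available
        comments: str = ""         # free-text overflow for anything the format missed
        keywords: List[str] = field(default_factory=list)

    e = Evaluation("Y", "car repair",
                   {"price": 0.8, "quality": 0.4, "speed": 0.9},
                   exchanged="$250 for brake work",
                   keywords=["slow parts delivery"])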

Reputation information is passed around either in response to active queries (asking for recommendations) or passively (enabling database exchange in the context of a transaction, or, I suppose, with random passersby, if that means anything). Note that passive information gathering may be filtered according to the user's needs and preferences, and that any information gathered by either method will be interpreted in the light of the agent's view of the credibility of its source.

In any event, the various methods for information-gathering cause some recommendations to be dumped in the agent's metaphorical lap. The agent must then, according to the parameters specified by the user, incorporate the new data with the old "known" data (i.e., experiences and recommendations), taking into account the credibility of the source(s) and the path travelled by the data.

re: the datapath: Abdul-Rahman suggests that the return path ought to be the same as the outgoing path (for active queries) so as to ensure that the return path has at least one known entity, i.e., the agent which was the first one queried. This may mean that the information is more "attenuated" than it need be, if one of those queried is also known to the originator of the query and has good credibility. Some minor graph-theoretical stuff should probably happen here to prune the return path as necessary. (Others on the way back should perhaps do this too...although note that the only way that such pruning can be done is if the entire chain is exposed, which may not be desirable for security purposes.)
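
(A minimal Python sketch of the kind of pruning described above: if some later agent on the return path is already known, and sufficiently credible, to the originator, the earlier intermediaries can be dropped. The credibility cutoff is an illustrative assumption.)

    # Hypothetical pruning of a recommendation's return path: keep only the
    # suffix of the path starting at the last agent the originator already
    # knows and trusts.

    def prune_return_path(path, known_credibility, cutoff=0.5):
        """path: agent ids from the first-queried agent to the recommender.
        known_credibility: the originator's credibility map for agents it knows."""
        for i in range(len(path) - 1, -1, -1):
            if known_credibility.get(path[i], 0.0) >= cutoff:
                return path[i:]  # everything before this trusted agent is unnecessary
        return path              # nobody on the path is known; keep the whole chain

    # X queried A, who forwarded to B, who forwarded to C (the recommender).
    # X already knows and trusts B, so A can be pruned from the chain.
    print(prune_return_path(["A", "B", "C"], {"B": 0.8}))  # ['B', 'C']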

Reputation Venues

The communication, representation, synthesis, and reconciliation of experience and recommendations is in some ways an entirely orthogonal problem to that of group formation. They are both something that might be called "applied psychology/sociology", insofar as we are trying to construct, formalize, and implement practical theories using agents as assistants. However, while having some kind of information about potential group members is useful, reputation is useful in other venues as well, and need not have anything to do with group formation.

That said, it strikes me that the problems of group formation may well be more of a can of worms than I really want to address here.

For what venues is information about reputation useful, and how will it be used?

  • identifying people with whom you want to do business (finding people to fix your roof, your car, your traffic ticket...)

  • knowing what you're dealing with when you're already "stuck" with transacting with someone

  • "distributed resume" (comprised of digests of performance reviews, grades, ...). Note that in this case the participating entities are an institution and an individual.

How much do reputation agents help under adverse conditions:

  • sparse connections between users

  • few users of the system

  • infrequent transactions

  • 'agents of deception'

One thing that people are going to need to do--and should be able to do--is create their own domains and criteria for evaluation (and perhaps specify their relevance to other domains; e.g., if someone is a good recommender for car repair, this may mean that they're a good recommender for tractor repair). The problem is, of course, that unless people use the same terms (and in the same ways), their agents won't be able to communicate meaningfully about them. (If I talk about "car repair", does it mean the same as your "automobile repair"? How about her "motor vehicle repair"?) This could be viewed as a more profound (and perhaps subtler) form of the problem of reconciling recommendations from people who don't share the same prioritization. How can, or should, this be resolved?

Related question: should there be prefabricated sets of criteria/prioritizations for domains as they are created?

We need some way of merging evaluations and recommendations into a single opinion in order for recommendations to be meaningful at all. Perhaps this means that we can use a similar process to combine evaluations from several people into a single 'consensual [in the sense of 'consensus'] opinion'. This may inform our consideration of reputation agents as facilitators of group formation. See also the concept of the "distributed resume", mentioned above.
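
(One simple way to merge several agents' evaluations of the same subject into a single consensus opinion, sketched in Python under the assumption that a credibility-weighted average is acceptable.)

    # Hypothetical merge of several evaluations of the same subject in one
    # domain into a single consensus opinion, weighted by each evaluator's
    # credibility for the merging agent.

    def consensus(evaluations, credibility):
        """evaluations: evaluator -> rating; credibility: evaluator -> weight."""
        weighted = [(rating, credibility.get(who, 0.0))
                    for who, rating in evaluations.items()]
        denom = sum(w for _, w in weighted)
        if denom == 0:
            return None  # no credible evaluators; no opinion
        return sum(r * w for r, w in weighted) / denom

    ratings = {"A": 0.9, "B": 0.6, "C": 0.2}
    cred = {"A": 0.8, "B": 0.5, "C": 0.1}
    print(round(consensus(ratings, cred), 3))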

If reputation agents are informing the decisions of people engaging in club formation, what protocol is to be used to determine how the groups actually form? (It appears that, no matter what mechanism one uses for determining group preferences for inclusion/exclusion--unless, perhaps, every time someone shows up, all current group members vote on the membership of all current group members (an n^3 algorithm!)--group members are not created equal, since the later you show up, (a) the more people must pass on your inclusion, and (b) the fewer members in whose membership you will have a say.)

Since reputation is not necessarily the same thing as a list of characteristics (but rather a subjective perception of certain of those characteristics) this may be irrelevant...but it's worth considering: the best trading partner is not necessarily someone whose characteristics and preferences mirror your own. In particular, trading works best when the two people trading each have different things to offer (generally speaking, although there are exceptions, such as "I'll scratch your back if you'll scratch mine"). On a related note, it may be that the best groups for some kinds of tasks are those that have a certain amount of heterogeneity in skills, attitudes, and so on.

The Big (Open) Questions

  • This is a highly complex system that I've outlined here. Where the heck do I start?

  • How do I meaningfully test this (once I've figured out what "this" is--see the first question) for accuracy of modelling (in the case of modelling of human behavior) or usefulness (in the case of constructing reputation agents)? The number of variables that I'm looking at here would be staggering even if many of them did not involve subjective evaluation.

  • Where is this useful? (See the discussion on reputation venues.)

  • How does an agent know when to prompt its user for an evaluation of a transaction--i.e., what gives it knowledge that a transaction has taken place? (My strong suspicion is that if the user is given the responsibility for initiating the evaluation, much of the time it just won't happen.)

  • How do active queries work?

  • How do we determine, and verify, identity?

References

The best papers that I have seen on this subject--in terms of a detailed discussion of reputation and recommendations--have been authored by Alfarez Abdul-Rahman and Stephen Hailes. They are referenced here: https://www.researchgate.net/scientific-contributions/Alfarez-Abdul-rahman-7472444

Jay Schneider, formerly a member of the Wearable Computing group of the U of Oregon CIS department, got me into this in the first place with a paper that he co-authored, entitled "Disseminating Trust Information in Wearable Communities". You can find his papers here: https://scholar.google.com/citations?user=lgMONBcAAAAJ&hl=en

Notes and stuff to be cleaned up

I have not (yet) specified how these beliefs are represented.

note that this is all exclusive of risk, utility, and other concepts related to payoff...

add hyperlinks to definitions