OAuth Practices

The purpose of this documentation is to provide an overview of becoming an OAuth service provider to product/project managers, as opposed to the specific engineering/technical issues of using the OAuth protocol.

What is OAuth?
It is an open industry standard that lets a user with an account on one website (the service provider) allow another website (the consumer) to access their data on the first website.

What does our company get out of it?
Many companies initially shy away from the idea of letting other websites access customer data, but there are a number of potential benefits.  For example, do your end-users go to your website as frequently as they go to the website of their E-mail provider or websearch site?  If not, then you may find that you can raise awareness of your company with your end-users by providing a web gadget for the personalized homepages that many E-mail/websearch sites offer, which are designed to give users a single-page snapshot of their digital life.  You can use that gadget to remind them of the services your website offers, especially to give them quick glances at new information you have for them or about them.  As an example, look at this walkthrough of how MySpace built a gadget for the iGoogle homepage (for more details see the OAuth Proxy documentation).  Because OAuth is an industry standard, iGoogle can proxy the requests from the gadget back to your website.  In fact, that integration of OAuth with gadgets is available as open source for any E-mail/websearch site providing a personalized homepage.  The OAuth protocol also provides a technical method to let you maintain a list of the specific gadgets and 3rd party sites that can connect to your servers via OAuth.

Or go a step further and think about building an OpenSocial gadget that your customers might add to their social network page.  This could provide a new way for you to build your user base by advertising through your current customers.  Even if you think your business is not that sexy, you might find that the data you have on end-users is interesting for them to publish.  For example, an online role-playing game might let end-users add a gadget that shows what level their characters have achieved and what the characters look like.  Or take a look at Skydeck who even makes a phone bill interesting.

In addition to these web gadget pages, there are also companies building websites to aggregate specific classes of data such as financial, health, travel, activity streams, etc.  If your business is in a sector where end-users frequently use multiple vendors (such as finance or travel), then you might wish you could keep the user loyal only to you, and thus keep their data only on your website.  However the other way to look at this is that if you make it easier for end-users to aggregate their data from your company into one of these aggregation sites, then it will show you are more customer focused and may lead them to do more business with you, especially if the aggregator provides links back to your site.  As an example of such an aggregator, look at Google Health or Microsoft HealthVault.

Another perspective is to consider what data your company might pull from other businesses.  For example, could you pull your customer's calendar information from their calendar provider and use it to make it easier for them to schedule visits with you?  Could you pull their list of friends and tell your customers what their friends are doing on your website?  Could you pull their photos and let them personalize the service you offer by adding their picture, such as to a loyalty card?  Or are there even higher value mashups you could do with tighter data integration?  For example, if you are a clinical trial matching website, could you pull their health summary and suggest specific trials that would be relevant to them?

What if startups are already pulling data from our site without our permission?
Many companies today provide websites that let users create/view/edit/manage their own information.  This includes content websites like Photo Hosting, collaboration sites like Social Networks, and customer portals like Online Banking.  Around the same time that buzzwords like Internet 2.0 and mashups and AJAX started to get popular, a number of startups on the web started to build businesses by enhancing the data that end-users had on other websites: they pulled the information from that website and did something with it that was fun, helpful, and sometimes even powerful.  A few common examples include:
  • Social Networks would pull a user's address book from their E-mail provider, and use that to make it easy for the end-user to invite their friends to that social network
  • Photo Printing services would pull a user's photos from their photo hosting service and let the user have those photos printed on t-shirts, mugs, calendars, photo albums, etc.
  • Financial aggregation services would pull a user's financial details from their banks, credit card companies, etc. and show a combined view of that information
  • Webmail providers would use the POP3/IMAP protocols to aggregate a user's mail from multiple E-mail boxes
If your company provides a website with information about end-users, there is a good chance an Internet startup is working on ways to pull data from your site and enhance it as part of their business.

The problem is that the range of these startups and the volume of customer interest quickly exceeded the tools provided by the websites that hosted the original content being pulled by these startups.  Some big websites had APIs that were designed to be used by desktop applications to access end-user data (e.g. the OFX standard for accessing financial data from banks), but that required users to give their bank login information to the startup, which would then access those desktop APIs.  Other big websites did not even have those types of APIs, so instead these startups got good at screen-scraping the websites to extract the information they needed (e.g. the address books that social networks would access), but those startups again needed users to provide the login information for their E-mail service.

This was a scary time for big websites.  Many were not comfortable with the idea that users could easily extract their information because it seemed to put their business model at risk.  Other big sites were uncomfortable that if they let these new startups access their customers' information, then the big site would be held liable if the startup did something bad with the data.  But all these websites agreed that it was extremely dangerous for users to give their login credentials to these unproven startups.

If this sounds like your company, then keep reading.

What can our company do?
Before OAuth was created, a number of big Internet websites started providing proprietary APIs to try to build ecosystems of 3rd party partners who would extend their platform by enhancing end-users' data.  eBay was one of the first, but AOL, Yahoo, Google, Amazon, & Microsoft all started developing what were generally called "web delegation APIs" that let an end-user tell the big website to let some smaller website access part of their data.  So your company could certainly go down the route of building another proprietary API, but read further to learn why that may not be in your best interest.  However if your company already does have a proprietary API, you may think it offers some feature that OAuth does not provide today such as federated login.  If that is the case, you should definitely get involved with the OAuth standards discussion because the protocol is quickly being extended to support the features that other big companies had in their proprietary delegation APIs.

Unfortunately a lot of startups had already built tools that either used those websites' desktop APIs or screen-scraped them, and those tools worked fine, so there was not much incentive to switch to these proprietary APIs.  But then a funny thing happened: some of the Internet 2.0 startups that had started to become successful found that they too were being screen-scraped, and even some of the startups that pulled data from a big website would in turn find that smaller, more niche startups had started pulling data from them.  Fortunately, instead of building yet another proprietary web delegation API, some of the smaller websites started the OAuth project to build a common standard.  The big Internet websites had seen how hard it was to convince developers to switch to their proprietary delegation APIs, so many of them supported this OAuth effort in the hope that if a standard emerged that developers used on one website, then they would be more likely to use it with other websites instead of asking users for their login credentials.  This created a well aligned set of incentives.

However, if you are considering becoming an OAuth service provider, realize that many of the companies contributing to the design of OAuth are more focused on making the startup developer's life simple, and not necessarily the life of the service provider.  So becoming an OAuth service provider is not necessarily easy on either the engineering side or the product side.  The purpose of this document is to try to offer some advice on the product issues of becoming an OAuth service provider.

What data should we make accessible via OAuth?
Before worrying about the technical parts of OAuth, you should first decide what data you want to allow end-users to make available to 3rd party websites/startups on that user's behalf.  One easy way to do that prioritization is to find out what startups are already screen-scraping your site, or using your desktop APIs.  There is a good chance however that you will find that those startups are pulling out the data that you consider core to your business.  One option of course is to call in the lawyers and send cease & desist letters to all the sites that are pulling this data.  Another is to implement technical mechanisms to detect these sites and try to block them.  However, you should be prepared to get a nasty response from your users and the press and government officials if you do that.  These days end-users consider the data about them to be their data, no matter how much work your business did to create that data, and if they are willing to take the risk of giving their login information to an unproven startup, then they are willing to complain vocally to the press and government if you block their access to their data.

Once your company has tried blocking this access for a while, you are likely to start to see the side effects, such as losing customers.  If you want to help your company with the transition to the idea that users own their data, then you should focus on how to leverage these startups to enhance your business.  For example, are there features your users want that your company does not have the time to build?  Could those startups act as a virtual R&D group that provides potential acquisitions without your company having to provide funding?  Or if you are more monetarily focused, then realize that the end-users those startups want to target are your company's customers, and thus you could charge them to advertise to your customers.

How do I control what startups we work with?
Assuming you do decide to start providing some support to these startups, most companies immediately hit the virtual brake again and start thinking about how they control what startups they work with.  For example, your company might want to audit them to make sure they don't do something bad with the user data.  Or maybe you want to require custom contracts with each of them.  Or maybe you just want them to identify themselves so you can keep track of how much data they access.  The OAuth protocol does provide a technical way to do this by letting you maintain a list of the 3rd party sites that you trust.
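
As a rough illustration of what such a list might look like on the service provider side, here is a minimal Python sketch.  It is not taken from the OAuth specification or from any particular library; the class name ConsumerRegistry, its methods, and the example consumer key are all hypothetical.  It assumes each 3rd party site is identified by the consumer key it presents with its requests, and it simply tracks which keys are approved and how many requests each has made.

```python
class ConsumerRegistry:
    """Hypothetical list of trusted 3rd party sites, keyed by consumer key."""

    def __init__(self):
        self._approved = {}  # consumer key -> human-readable site name
        self._usage = {}     # consumer key -> number of accepted requests

    def approve(self, consumer_key, site_name):
        """Add a 3rd party site to the trusted list."""
        self._approved[consumer_key] = site_name
        self._usage.setdefault(consumer_key, 0)

    def remove(self, consumer_key):
        """Take a site off the trusted list; its usage count is kept."""
        self._approved.pop(consumer_key, None)

    def authorize_request(self, consumer_key):
        """Accept and count the request only if the site is on the list."""
        if consumer_key not in self._approved:
            return False
        self._usage[consumer_key] += 1
        return True

    def usage(self, consumer_key):
        """Number of accepted requests made by this site so far."""
        return self._usage.get(consumer_key, 0)


registry = ConsumerRegistry()
registry.approve("photo-printer-key", "Example Photo Printing")

known = registry.authorize_request("photo-printer-key")
unknown = registry.authorize_request("some-unregistered-key")
```

A real service provider would also verify the signature on each request and store this list somewhere durable; the sketch only shows the bookkeeping that lets you identify who is calling and how much data they access.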

While those are nice goals, the thing to remember is that those startups still have the alternatives of screen-scraping or using desktop APIs.  So you might keep some of the smaller and more ethical startups from working with you to help your end-users, but you won't block the bigger, more aggressive startups and you certainly won't block the unethical startups.  However, most companies need to walk before they run, so providing OAuth enabled delegation APIs to a few startups is better than none.

Are we liable for what these startups do with the data?
This is a tough question.  Legally speaking your lawyers can write terms-of-service for your APIs and end-users that absolve your firm of any liability.  However, the bigger concern is that the press will hold your firm responsible.  One reason not to provide OAuth enabled APIs is that if a website does screen-scrape you and then does something bad with the data (like selling the end-users' logins to spammers), then your PR people can tell the press that the website must have known it was doing something illegal and the end-users were "obviously" taking on the risk by giving their login credentials to that startup.  But that problem leads to a potential answer, which is that if the users own their data then they need to take responsibility for it.

To be more specific, let's say your company does provide OAuth enabled delegation APIs, and some startup uses them and has a security hole that leaks SOME user data.  If journalists start contacting your company about this problem, one response is to get your lawyers to start suing that startup, and then to block the startup's access to any customer data your company hosts.  However, that sets the expectation that your company will act as a cop in the future, and if some users don't think the security hole was that bad then they are likely to get upset that you have restricted what they can do with their data.  So another way to handle the journalist inquiries is to thank them for the notification, and tell them you will evaluate the situation and possibly add a cautionary warning to your website for any new users who try to share data with that startup, and also possibly send an E-mail warning to your customers who have already shared data with that website.  But specifically note that if end-users ask you to make their data available to that site, your company will continue to do so.  Now of course there are some restrictions, such as if you learn that the website is involved in something like child pornography, but that can be the exception rather than the rule.

Of course, your company may find that you can create business value specifically out of playing the cop role and tightly regulating these add-on services while trying to block non-approved services legally or through technical means.  Plenty of companies like Prodigy, CompuServe, & AOL have been very successful in that approach, but over time as users become more comfortable with the risks, they tend to dislike the restrictions this model places upon them.

Another challenge with the cop approach is that it stifles innovation.  Startups who might have provided a valuable enhancement to your service will be concerned that your company might not approve their application, or that you might leak the details or steal the idea if they tell you about the application ahead of time.  It will also make it much harder for them to get venture funding.

What user interface design is required?
This User Interface doc provides some examples and suggestions.

What are some advanced applications of OAuth?
The following sections are for the advanced reader who wants to plan a bit further in the future.  While they are advanced, all 3 examples are based on technology in use today like OpenSocial, Personal Health Records, and alternate payment systems.

Claims, Reputation & Assertions
As a basic example of claims & reputation, consider the blogs or social network pages on which teenagers talk about what they have been up to.  For example, "I went to Jack's party" or "I stayed up all night playing Diablo."  However, they also use these pages to make claims about themselves such as "I just got into Harvard" or "I became a level 32 warrior on Warcraft."  Their goal is to enhance their reputation.  However these are just claims, and so are hard to verify.  With standards like OAuth & OpenSocial, that is starting to change.  An end-user could give a gaming website the ability to push data about their online characters into their activity stream (as opposed to the more common pull model we have discussed above.)  Similarly, they could give their University permission to push data into their activity stream such as awards they received, classes they finished taking, degrees they received, etc.  In these cases, the end-user is aggregating assertions about themselves that can be validated through trusted sources.

Or in another example, applications like Microsoft HealthVault & Google Health allow a user to aggregate assertions about their health status.  For example, a major lab company like Quest can push an assertion into their health profile that provides a set of lab results.  The end-user can in turn share that information with another hospital they visit in the future to avoid having to pay to retake the lab test, or they might share the information with a clinical matching service that wants to verify the person actually fits the criteria for a certain clinical trial.  The potential financial savings to society for reducing the paperwork normally needed to verify these results (or the duplication of tests and procedures) is enormous.

Consider how often your own business needs to verify claims made by a customer.  Or consider how often your company is asked to verify a claim on behalf of a customer.  Could you instead allow the customer to aggregate the information about themselves electronically, and then share it when needed?

Data Warehouses
One of the primary ways companies "verify" information about a customer today is through data warehouses, such as credit warehouses.  Your company might use one of these firms, or share data with these firms about your customers.  You don't need to read too many newspapers to understand the public's current opinion on these warehouses.  They certainly play an important role, especially as it relates to credit liquidity, but end-users want more control over their data.  One of the key things that made these data warehouses possible was the use of globally unique identifiers like the United States' social security number, as well as other personally identifiable information like address, phone #, credit card #, etc.  This allows warehouses to pull information from many companies and use that personally identifiable information to link information about the same person into a master profile.

One of the potential benefits of technology such as OAuth is the ability for users to share verifiable assertions about themselves without having to release any personally identifiable information.  The technology itself does not rely on the use of global identifiers like E-mail or social security number.  A user of an application like Google Health or Microsoft HealthVault can share their health profiles with a Diabetes Management service or an AIDS clinic without Google or Microsoft having to release any personally identifiable information.  Or a user can share their financial information with an advanced budgeting application without worrying that the site will sell their information to one of these data warehouses.
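
One way a service provider might avoid handing out a global identifier is to give each 3rd party site its own opaque identifier for the same user, so that two sites cannot link their records by comparing IDs.  The Python sketch below illustrates that idea only; it is an assumption about how a provider could choose to do this, not something the OAuth protocol defines.  The function name pairwise_id, the SERVER_SECRET value, and the example keys are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret known only to the service provider.
SERVER_SECRET = b"replace-with-a-long-random-secret"


def pairwise_id(internal_user_id, consumer_key):
    """Derive an opaque identifier for one user as seen by one 3rd party site.

    The same user gets a different identifier at each site, and the
    identifier reveals neither the internal user ID nor any personal data.
    """
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    message = f"{consumer_key}\x00{internal_user_id}".encode("utf-8")
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()


id_for_clinic = pairwise_id("user-1001", "clinic-site-key")
id_for_budget_app = pairwise_id("user-1001", "budget-app-key")
id_for_clinic_again = pairwise_id("user-1001", "clinic-site-key")
```

Each site sees a stable identifier for the user across visits, but the identifiers differ from site to site, which makes it harder to build the kind of master profile described above.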

Financial Transactions
When you make a credit card purchase at an online store, how do they verify your access to that card?  You give them the exact same details as you give to any other online store, which in turn means any other online store has the ability to impersonate you.  Some payment systems like eBay and Google Checkout have provided more privacy preserving options, but in most cases end-users still get asked for their credit card number.  Credit card companies could act as OAuth service providers, but in this case the user would delegate to the online store the ability to push assertions into their credit card account, such as the assertion that the user wants to pay the company $50 for a purchase.  Banks could do the same if you prefer to have your bank account auto debited.  And if you want to transfer money between bank accounts, or pay a credit card from a bank, then the end-user can similarly delegate the rights they want these different entities to have.  Using technology like OAuth for the delegation provides the user with much greater control (like the ability to revoke access), as well as better monitoring (detailed audit trails of who did what), and avoids the risk of an end-user's bank or credit card information getting stolen and used for fraud.  Some credit card companies are already experimenting with technology along these lines by using one-time credit card numbers, but each approach is proprietary and the user experience is clunky.  The key value of standards in this space is the potential to get wide adoption of new approaches that both better protect users' privacy and reduce fraud, while actually improving usability.
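
To make the ideas of revocable delegation and audit trails a little more concrete, here is a small Python sketch.  It is only an illustration under stated assumptions, not a description of how any card company or the OAuth protocol actually works: the names Grant, DelegationLedger, and the permission strings are hypothetical.  The user grants one online store a single permission, every attempt to use the grant is logged, and revoking the grant stops further use without the store ever holding a card number.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Grant:
    consumer: str          # who the user delegated to, e.g. an online store
    permission: str        # the single action the user allowed
    revoked: bool = False


class DelegationLedger:
    """Hypothetical record of user-granted permissions plus an audit trail."""

    def __init__(self):
        self._grants = {}     # opaque token -> Grant
        self.audit_log = []   # one entry per attempted action

    def grant(self, consumer, permission):
        """User delegates one permission; the consumer receives only a token."""
        token = secrets.token_urlsafe(16)
        self._grants[token] = Grant(consumer, permission)
        return token

    def revoke(self, token):
        """User withdraws the delegation at any time."""
        if token in self._grants:
            self._grants[token].revoked = True

    def attempt(self, token, permission, detail):
        """Check an action against the grant and record who tried what."""
        g = self._grants.get(token)
        allowed = g is not None and not g.revoked and g.permission == permission
        self.audit_log.append({
            "when": datetime.now(timezone.utc),
            "consumer": g.consumer if g else "unknown",
            "permission": permission,
            "detail": detail,
            "allowed": allowed,
        })
        return allowed


ledger = DelegationLedger()
store_token = ledger.grant("example-online-store", "push_payment_request")

first = ledger.attempt(store_token, "push_payment_request", "pay $50 for a purchase")
second = ledger.attempt(store_token, "read_statement", "list recent transactions")
ledger.revoke(store_token)
third = ledger.attempt(store_token, "push_payment_request", "pay $50 again")
```

In this sketch the first attempt is allowed, the second is refused because the store was never given that permission, and the third is refused because the user revoked the grant; all three appear in the audit log.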

Some day maybe US citizens will be able to delegate to a bank or employer the right to push information into their IRS account, and to delegate to their accountant the right to pull information out, all without needing to use their social security number.  However that one might take a few decades :-)

Anything else I should be scared about?
In today's economy, data is a valuable resource, and many companies' businesses are built around that data.  As end-users pull more and more data out of your company through screen-scraping or OAuth enabled APIs, companies will need to focus more and more on WHAT they do with that data, and less on HOW much they have.  If your business model has been focused on storing data, then be worried.  If your business model is focused on being paid to enrich data, then you are in much better shape.  But as data starts flowing more freely, realize it's easy for it to bypass you if someone else does a better/faster/cheaper job of enriching the data than your business.

Also recognize that you may need to consider your monetization strategy.  Many companies (especially consumer Internet services) monetize at the time of data presentation, i.e. the webpages of their website.  However if the end-users are pulling the data from your company and viewing it elsewhere, you lose that monetization opportunity, i.e. those eyeballs.  So look for mixed strategies where you provide some UI to visualize/manage/manipulate the data you store/enhance, and hopefully the other startups who enhance the data you provide will in turn make your service more useful to end-users so you get more of them, even if they spend less time on your site as compared to the amount of data you store for them.

Eric Sachs
Product Manager, Google Security
esachs