Google for Webmasters Tutorial: Crawling and Indexing


English captions:
00:00:02.900,00:00:15.230
Now that you know how to have Google find your site and how to block Googlebot from specific pages, the next step is making the appropriate content on your site accessible to users and Google.

00:00:15.231,00:00:31.330
Accessible, in this context, means that both Googlebot and users -- including those using screen readers or mobile devices -- can navigate from page to page and, within reason, enjoy the core content throughout your site.

00:00:31.331,00:00:41.690
It's important to make your site accessible... to ensure a good experience for your users and also to help Google understand and list more of your pages.

00:00:43.000,00:00:51.430
In striving to make your pages accessible, it's helpful to understand what Googlebot can and cannot most effectively tackle.

00:00:51.431,00:01:03.800
HTML files and other document types composed mostly of text are pretty straightforward for Googlebot. Music, images, and movies are harder for Googlebot to understand.

00:01:03.801,00:01:12.930
Dynamic pages -- those pages with frequently changing or on-the-fly-generated content -- are potentially problematic, too.

00:01:12.931,00:01:25.760
You can see your site almost as Googlebot does by viewing your site in a text browser, like Lynx, or in a different browser with images, JavaScript, and Flash turned off.
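
As a rough stand-in for the text-browser check described here, the sketch below uses only Python's standard library to reduce a page to the text a text-only client would show. The names `TextOnlyView` and `text_view` are made up for illustration, and this is not how Googlebot or Lynx actually works; it simply drops script and style content and substitutes alt text for images.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collects the text a text-only client would see (illustrative only)."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img":
            # Show the alt text in place of the image, if there is any.
            alt = dict(attrs).get("alt")
            if alt:
                self.chunks.append(f"[{alt}]")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside <script> and <style>.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_view(html):
    """Return the page's visible text as one space-separated string."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

If core content or navigation is missing from the string `text_view` returns for one of your pages, that is a hint the page depends on images or scripts to convey it.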

00:01:28.600,00:01:39.300
As previously noted, images can be tough for Google to index. There are some things you can do, however, to help us better understand the images on your site.

00:01:39.301,00:02:02.100
Annotate your image in alt text, as shown above, and optionally in plain visible text near your image. Your visible comment before or after the image can be whatever you like, but it's best to stick with a concise version for the alt-text; no need, for instance, to include the word "image" or "photo," since Googlebot already sees the image tag.

00:02:02.101,00:02:13.960
Using descriptive file names can be helpful to Google, and also to your users who may download your images: “googlebot.jpg,” for instance, instead of “photo.jpg.”

00:02:13.961,00:02:32.360
By annotating your images in these ways, you're not only helping sight-impaired users who may be accessing your site with a screen reader, but you're also giving Google a better understanding of the images and improving the chances of your images showing up for relevant queries in Google Image Search.
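
The image suggestions above can be turned into a quick self-check. The following is a minimal sketch, assuming you have a page's HTML as a string; `audit_images`, `GENERIC_NAMES`, and `REDUNDANT_ALT_WORDS` are hypothetical names chosen for this example, and the word lists are guesses you would adjust for your own site. It is a crude check: an empty alt attribute can be a deliberate choice for purely decorative images.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical list of non-descriptive file name stems.
GENERIC_NAMES = {"photo", "image", "img", "picture", "pic"}
# Words the captions suggest leaving out of alt text.
REDUNDANT_ALT_WORDS = {"image", "photo"}


class ImageAudit(HTMLParser):
    """Records (src, problem) pairs for each <img> that looks under-annotated."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        alt = (attrs.get("alt") or "").strip()
        # File name without directory or extension, e.g. "googlebot".
        name = posixpath.basename(urlparse(src).path)
        stem = posixpath.splitext(name)[0].lower()
        if not alt:
            self.issues.append((src, "missing alt text"))
        elif REDUNDANT_ALT_WORDS & set(alt.lower().split()):
            self.issues.append((src, "alt text contains 'image' or 'photo'"))
        if stem in GENERIC_NAMES:
            self.issues.append((src, "generic file name"))


def audit_images(html):
    """Return a list of (src, problem) tuples for the images in html."""
    audit = ImageAudit()
    audit.feed(html)
    audit.close()
    return audit.issues
```

An image such as `<img src="googlebot.jpg" alt="Googlebot holding flowers">` passes all three checks, while `<img src="photo.jpg">` is flagged twice.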

00:02:33.530,00:02:45.260
Along with images, many web designers like to integrate rich-media or interactive aspects into their site, often using technologies like Flash or AJAX.

00:02:45.261,00:02:52.760
While these can provide an engaging experience for users, Googlebot may have trouble discovering or following links on these sites.

00:02:52.761,00:03:04.060
For example, textual content is sometimes stored in Flash as images, making it difficult for Google to capture the words, much less understand the meaning of the pages.

00:03:04.061,00:03:14.000
With careful planning, however, sites can include dynamic and media-rich elements while still remaining reasonably accessible to users and Googlebot.

00:03:14.001,00:03:36.960
Consider structuring your site so that these elements are "extras," with your site's core information and navigation rendered in plain text for Googlebot and all users without Flash. This is otherwise known as "graceful degradation." For additional useful suggestions, check out the two blog entries listed on this page.

00:03:39.130,00:03:48.060
After you've ensured that your site is both findable and accessible, don't let your great content languish with uninspired introductions.

00:03:48.061,00:04:01.000
Think of the titles and descriptions on your pages together as an advertising billboard: You have just a few words to let people know what each page is about and convince them that it's worth a visit.

00:04:01.001,00:04:14.900
The title tag of your page is likely to be displayed anytime Google shows your page in its search results, and it's also what people will typically see in various places in their web browser and even on social sharing sites on the web.

00:04:14.901,00:04:35.260
Therefore, it's important to have a concise, descriptive title for each page on your site. Google may draw from several different sources for the descriptive snippets in search listings, including meta descriptions, so you'll also want to make sure your meta descriptions are thoughtfully drafted for each page on your site.
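
To spot pages that lack a title or meta description, something like the sketch below may help. It is only an illustration: `check_head` is a hypothetical helper, and the 60-character default for `max_title_chars` is an arbitrary assumption for this example, not a limit stated by Google.

```python
from html.parser import HTMLParser


class HeadSummary(HTMLParser):
    """Captures the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_head(html, max_title_chars=60):
    """Return (title, description, problems) for one page's HTML."""
    parser = HeadSummary()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    problems = []
    if not title:
        problems.append("missing title")
    elif len(title) > max_title_chars:
        # Assumed threshold; tune it to whatever "concise" means for your site.
        problems.append(f"title longer than {max_title_chars} characters")
    if not parser.description:
        problems.append("missing meta description")
    return title, parser.description, problems
```

Running `check_head` over every page and comparing the returned titles can also reveal pages that share the same title when each should have its own.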

00:04:35.261,00:04:44.000
Note that you can use the “Content Analysis” feature in Google Webmaster Tools to help you optimize your page titles and descriptions.

00:04:45.960,00:04:57.900
It's great having your pages in Google, but what happens when you find copies of your pages, either indexed from your site or -- with or without your permission -- on other sites?

00:04:57.901,00:05:16.060
This is known as duplicate content, and we know that most of the time it's unintentional. Your editorial, for example, ends up getting indexed on one of your site's topics pages, then on your monthly archives page... and perhaps then even on a syndicated partner's page.

00:05:16.061,00:05:24.830
In cases like this, there are steps you can take to help Google determine which is the best copy to show in search results.

00:05:24.831,00:05:43.560
With duplicate content on your own site, your best bet is to minimize the duplication in the first place. Use 301 redirects to forward visitors to a preferred page, consistently link to this preferred version, and list it in place of other versions in your XML Sitemap.
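
The two steps named here, redirecting duplicates with a 301 and listing only the preferred version in your XML Sitemap, can be sketched as follows. The paths in `PREFERRED` and the helpers `resolve` and `build_sitemap` are hypothetical; in practice the redirect would be configured in your web server or application framework rather than in a standalone function like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from duplicate paths to the preferred path.
PREFERRED = {
    "/archive/2009/06/editorial": "/editorial",
    "/topics/search/editorial": "/editorial",
}

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def resolve(path):
    """Return (status, location): 301 to the preferred path for a known
    duplicate, otherwise 200 and the path unchanged."""
    target = PREFERRED.get(path)
    if target is not None and target != path:
        return 301, target
    return 200, path


def build_sitemap(base_url, paths):
    """Build sitemap XML that lists each preferred URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    listed = []
    for path in paths:
        _, preferred = resolve(path)
        if preferred not in listed:
            listed.append(preferred)
    for path in listed:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url + path
    return ET.tostring(urlset, encoding="unicode")
```

Driving both the redirects and the Sitemap from the same mapping keeps them consistent, so a duplicate path is never listed in the Sitemap while it also redirects elsewhere.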

00:05:43.561,00:05:53.660
If you're syndicating your content, you may wish to ask your partners to include a link on each of their pages back to the original source on your domain.

00:05:53.661,00:06:13.430
And lastly, if you find someone copying your site and you want it removed from Google’s search results, you can file a Digital Millennium Copyright Act notice, otherwise known as a “DMCA” takedown request. For additional tips, check out the Webmaster Central blog post referenced here.

Attachments (1)

  • GEbS0a2JcAo-en.txt - on Jun 29, 2009 5:26 PM by Michael Wyszomierski (version 1)