<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title> &#187; Intelligent Positioning: News, articles &amp; updates 2011</title>
	<atom:link href="http://www.intelligentpositioning.com/blog/tag/htaccess/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.intelligentpositioning.com/blog</link>
	<description>SEO web development social media consulting</description>
	<lastBuildDate>Fri, 03 Feb 2012 12:24:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>X-Robots-Tag: Control Google Indexing via HTTP Headers</title>
		<link>http://www.intelligentpositioning.com/blog/2009/08/x-robots-tag-control-google-indexing-via-http-headers/</link>
		<comments>http://www.intelligentpositioning.com/blog/2009/08/x-robots-tag-control-google-indexing-via-http-headers/#comments</comments>
		<pubDate>Thu, 27 Aug 2009 09:22:54 +0000</pubDate>
		<dc:creator>Andrew Mabbott</dc:creator>
				<category><![CDATA[Design and Development]]></category>
		<category><![CDATA[htaccess]]></category>
		<category><![CDATA[noindex]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[x-robots-tag]]></category>

		<guid isPermaLink="false">http://www.ip-seo.com/latest/?p=794</guid>
		<description><![CDATA[The X-Robots-Tag header lets crawler directives be sent in HTTP response headers, so that directives such as noindex can be applied to images and other non-HTML files.]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve all got used to controlling how the major search engines index our sites with a combination of robots.txt and the robots meta tag, which adds directives such as &#8216;noindex&#8217; to individual pages. While this works well for the pages themselves, it&#8217;s not so good for indexable non-HTML content such as PDFs or embedded media, as there is no HTML &lt;meta> tag in which to insert the meta-information. In this article we take a look at a potential solution to this problem: the <em>X-Robots-Tag</em> HTTP header.<br />
<span id="more-794"></span></p>
<h2>The X-Robots-Tag Header</h2>
<p>The idea behind X-Robots-Tag is that it allows robots directives normally found in a meta element to be sent as part of the server&#8217;s HTTP response headers. In other words, the instructions are sent <em>with</em> the file rather than <em>within</em> the file, which means they can be used with any type of content. An HTTP response might look like this (as displayed by the <a target="_blank" href="https://addons.mozilla.org/en-US/firefox/addon/3829">Live HTTP Headers</a> plugin for Firefox):</p>
<p style="border:1px dashed #999; padding:5px">
<code>HTTP/1.x 200 OK<br />
Date: Thu, 27 Aug 2009 09:21:23 GMT<br />
Server: Apache/2.0.52 (Red Hat)<br />
Connection: close<br />
Transfer-Encoding: chunked<br />
Content-Type: text/html<br />
<span style="background:#ccc;">X-Robots-Tag: noindex</span><br />
</code>
</p>
<h2>A &#8216;noindex&#8217; for Images</h2>
<p>As an example, consider a webmaster who wants to prevent images from being indexed and appearing in search results. One approach might be to add the entire &#8216;images&#8217; directory to the robots.txt file (i.e. <code>Disallow: /images</code>). The problem with this is that the robots.txt file provides crawling directives rather than indexing directives &#8211; that is, the search engine is instructed not to visit the &#8216;images&#8217; directory when crawling your site, but it could still discover one of your images if it is embedded in someone else&#8217;s site. Furthermore, if we only want to keep certain images out of the index, the robots.txt file would quickly become large and unmanageable. A noindex header sent with each image avoids both problems, although the crawler must be allowed to fetch the image in order to see the header.</p>
<h2>X-Robots-Tag in Apache (htaccess)</h2>
<p>A combination of &#8216;Header set&#8217; (provided by mod_headers) and the FilesMatch directive allows us to add the robots header in Apache. Here are a couple of examples, which could be placed in Apache&#8217;s httpd.conf or in an .htaccess file.</p>
<p>Add &#8216;noindex&#8217; header to all image files:</p>

<div class="wp_syntax"><div class="code"><pre class="apache" style="font-family:monospace;">&lt;<span style="color: #000000; font-weight:bold;">FilesMatch</span> <span style="color: #7f007f;">&quot;<span style="color: #000099; font-weight: bold;">\.</span>(gif|jpe?g|png)$&quot;</span>&gt;
<span style="color: #00007f;">Header</span> set X-Robots-Tag <span style="color: #7f007f;">&quot;noindex&quot;</span>
&lt;/<span style="color: #000000; font-weight:bold;">FilesMatch</span>&gt;</pre></div></div>

<p>Add noindex to image files matching a particular pattern &#8211; in this case those with &#8216;thumbnail&#8217; in the filename (i.e. /images/product41-thumbnail.jpg would be served with a noindex header while /images/product41-large.jpg would not). FilesMatch tests only the file name, not the directory path, so to limit the rule to the images directory, place it in that directory&#8217;s .htaccess file (or inside a &lt;Directory> block in httpd.conf):</p>

<div class="wp_syntax"><div class="code"><pre class="apache" style="font-family:monospace;">&lt;<span style="color: #000000; font-weight:bold;">FilesMatch</span> <span style="color: #7f007f;">&quot;.+-thumbnail<span style="color: #000099; font-weight: bold;">\.</span>jpg$&quot;</span>&gt;
<span style="color: #00007f;">Header</span> set X-Robots-Tag <span style="color: #7f007f;">&quot;noindex&quot;</span>
&lt;/<span style="color: #000000; font-weight:bold;">FilesMatch</span>&gt;</pre></div></div>
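
<p>The same technique extends to the non-HTML documents mentioned at the start of this article. The following is only a sketch: it assumes mod_headers is available, and the list of extensions (PDF and Word documents) is purely illustrative, so adjust it to the file types you actually serve.</p>

<div class="wp_syntax"><div class="code"><pre class="apache" style="font-family:monospace;">&lt;IfModule mod_headers.c&gt;
    # Matches file names ending in .pdf, .doc or .docx
    &lt;FilesMatch &quot;\.(pdf|docx?)$&quot;&gt;
        Header set X-Robots-Tag &quot;noindex, noarchive&quot;
    &lt;/FilesMatch&gt;
&lt;/IfModule&gt;</pre></div></div>

<p>Wrapping the rule in IfModule means that a server without mod_headers ignores the block instead of failing with a configuration error.</p>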

<h2>X-Robots-Tag in PHP</h2>
<p>PHP&#8217;s header function allows us to send any HTTP header, as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #990000;">header</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'X-Robots-Tag: noindex,nofollow'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>This supports more complex scenarios than using Apache alone. Rather than applying the same rule to all images, we could use some custom logic such as a database lookup to determine whether to add the X-Robots-Tag header. The first step would be to route requests for images to a php script using Apache&#8217;s .htaccess file (e.g. requests for jpeg files within the images directory will be handled by &#8216;image-handler.php&#8217;).</p>

<div class="wp_syntax"><div class="code"><pre class="apache" style="font-family:monospace;"><span style="color: #00007f;">RewriteEngine</span> <span style="color: #0000ff;">On</span>
<span style="color: #00007f;">RewriteRule</span> ^images/.*\.jpg$ image-handler.php</pre></div></div>
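
<p>The rule can be tightened a little. The following variant is only a sketch, assuming mod_rewrite is enabled and that image-handler.php sits alongside the .htaccess file: it matches both .jpg and .jpeg regardless of case, and it hands a request to the script only when the requested path maps to a file that actually exists, so requests for missing images still receive Apache&#8217;s normal 404 response.</p>

<div class="wp_syntax"><div class="code"><pre class="apache" style="font-family:monospace;">RewriteEngine On
# Only rewrite when the requested path is an existing regular file
RewriteCond %{REQUEST_FILENAME} -f
# [NC] ignores case; [L] stops processing further rewrite rules
RewriteRule ^images/.+\.jpe?g$ image-handler.php [NC,L]</pre></div></div>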

<p>Then in image-handler.php, we can perform our custom logic (defined in the allowImageIndexing() function) and set the appropriate header:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$path</span> <span style="color: #339933;">=</span> <span style="color: #990000;">parse_url</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$_SERVER</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'REQUEST_URI'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span> PHP_URL_PATH<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// drop any query string</span>
<span style="color: #000088;">$filename</span> <span style="color: #339933;">=</span> <span style="color: #990000;">basename</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>                       <span style="color: #666666; font-style: italic;">// extract image filename</span>
<span style="color: #990000;">header</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Content-Type: image/jpeg'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>            <span style="color: #666666; font-style: italic;">// set content type (otherwise it </span>
                                               <span style="color: #666666; font-style: italic;">// will be the default text/html)</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span>allowImageIndexing<span style="color: #009900;">&#40;</span><span style="color: #000088;">$filename</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>          <span style="color: #666666; font-style: italic;">// perform lookup</span>
    <span style="color: #990000;">header</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'X-Robots-Tag: noindex'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>           <span style="color: #666666; font-style: italic;">// set the x-robots-tag accordingly</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #990000;">readfile</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'images/'</span> <span style="color: #339933;">.</span> <span style="color: #000088;">$filename</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>               <span style="color: #666666; font-style: italic;">// stream image file in response</span></pre></div></div>

<h2>X-Robots-Tag Search Engine Support</h2>
<p>Fortunately, the three major search engines all now support the X-Robots-Tag header:</p>
<ul>
<li>Google &#8211; supports <a href="http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html">any value which can be used in the robots &lt;meta> tag</a></li>
<li>Yahoo &#8211; supports <a href="http://www.ysearchblog.com/2007/12/05/yahoo-search-support-for-x-robots-tag-directive-to-simplify-webmasters-control-and-weather-update/">noindex, noarchive, nofollow, nosnippet</a></li>
<li>Bing &#8211; supports <a href="http://www.bing.com/community/blogs/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx">the same values as Yahoo, plus noodp</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.intelligentpositioning.com/blog/2009/08/x-robots-tag-control-google-indexing-via-http-headers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

