Robots.txt Allow sub folder but not the parent
Can anybody please explain the correct robots.txt directives for the following scenario?

I would like to allow access to:

/directory/subdirectory/..

But I would also like to restrict access to /directory/, notwithstanding the above exception.

robots.txt
edited Nov 20 at 1:36 by Paolo
asked Sep 30 '11 at 10:24 by QFDev
3 Answers
Answer (accepted, score 21)
answered Sep 30 '11 at 10:38 by user967058
Be aware that there is no real official standard, and any web crawler may happily ignore your robots.txt.

According to a Google Groups post, the following works at least with Googlebot:

User-agent: Googlebot
Disallow: /directory/
Allow: /directory/subdirectory/
I wanted to dynamically allow sub-directories but not that first level; changing the Allow statement to Allow: /directory/*/ works.
– Duncanmoo, Apr 12 '13 at 9:00

According to the robots.txt Wikipedia entry, the Allow directive should be placed before the Disallow for maximum compatibility (though neither Google nor Bing will mind).
– pelms, Jun 20 '13 at 14:17
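You can sanity-check rules like these locally before deploying them. A minimal sketch using Python's standard urllib.robotparser (the example.com URLs are placeholders): note that this parser, like the original first-match convention and unlike Google's longest-match rule, applies the first rule that matches in file order, so the Allow line goes before the Disallow line, as the Wikipedia advice in the comment above suggests.

```python
from urllib.robotparser import RobotFileParser

# Allow is placed before Disallow: urllib.robotparser applies the
# first matching rule in file order, so order matters here.
rules = """\
User-agent: *
Allow: /directory/subdirectory/
Disallow: /directory/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Subdirectory is fetchable, the rest of the parent is not.
print(rp.can_fetch("*", "https://example.com/directory/subdirectory/page.html"))  # True
print(rp.can_fetch("*", "https://example.com/directory/page.html"))               # False
```

This only tells you how a strict first-match parser reads the file; individual crawlers may still interpret the rules differently.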
Answer (score 2)
answered Aug 12 '13 at 16:25 by Javid Jamae
If these are truly directories, then the accepted answer is probably your best choice. But if you're writing an application and the "directories" are dynamically generated paths (a.k.a. contexts, routes, etc.), then you might want to use robots meta tags instead of defining the rules in robots.txt. This has the advantage that you don't have to worry about how different crawlers may interpret or prioritize access to the subdirectory path.

You might try something like this in the code:

if is_parent_directory_path
  <meta name="robots" content="noindex, nofollow">
end
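As a concrete sketch of that pseudocode in Python (the helper name and the path literals are hypothetical, chosen to match the question's scenario, not taken from the answer): a request handler could emit the tag only for paths under the restricted parent that are not in the allowed subdirectory.

```python
def robots_meta(path: str) -> str:
    """Hypothetical helper: return a noindex meta tag for pages under
    /directory/ except those under /directory/subdirectory/."""
    restricted = "/directory/"
    allowed = "/directory/subdirectory/"
    if path.startswith(restricted) and not path.startswith(allowed):
        return '<meta name="robots" content="noindex, nofollow">'
    return ""  # indexable page: emit no tag

print(robots_meta("/directory/page.html"))            # the noindex tag
print(robots_meta("/directory/subdirectory/a.html"))  # empty string
```

The template layer would then interpolate the returned string into the page's head element.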
Answer (score 1)
answered Feb 25 '16 at 14:22 by Moojjoo
I would recommend using Google's robots.txt tester in Google Webmaster Tools: https://support.google.com/webmasters/answer/6062598?hl=en

You can edit and test URLs right in the tool, and you get a wealth of other tools as well.
Good point! Not sure if that was available back in 2011 when I posted this, but it's a very useful addition to WMT.
– QFDev, Feb 25 '16 at 17:43

QFDev, I had to use the robots.txt tester today because I am working hard to get our company's site ranking higher in Google's search results. The only things I see under "HTML Improvements" are duplicate titles and meta tags, because Google is reading the same pages twice (query strings). For some reason the robot is also crawling directories that do not exist. I found your post, which helped, then noticed the tester in Google Webmaster Tools and saw that it would validate the changes. I thought it could help other developers to post this to your thread.
– Moojjoo, Mar 4 '16 at 21:38