Why, when a first-time visitor (without cookies) comes to the site, are the links the long URLs that contain the SID?
Then, after they go to another page, the links switch to the SEO URLs?
How can we make the first page a first-time visitor lands on contain the SEO URLs instead of the long ones?
It's annoying and I hate it with the passion of a mighty hurricane, but it's the way CCP works and I've been told it's normal (although I've never seen another site that needed to do that).
This question comes up often enough I should really create a Wiki entry for it.
The first time a person visits your site, CCP attempts to set a cookie containing the SID. When setting a cookie, an application cannot determine whether setting the cookie actually worked until the next time something is sent back to the server (submitting a form or navigating to another page). The second time CCP sees a request, it knows whether the cookie setting worked and adjusts the URLs accordingly.
Other sites that don't exhibit this usually carry the SID in the URL, which isn't very SEO friendly.
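To make the two-request cookie test concrete, here's a minimal Python sketch of the decision logic (illustrative only, not CCP's actual code; the function name, cookie name, and parameters are made up):

```python
def build_url(path, request_cookies, session_id):
    """Decide how to render a link for this response.

    On a visitor's first request no cookie has round-tripped yet, so the
    app cannot know whether cookies work; it plays safe and embeds the
    session ID (SID) in every URL. Once the browser sends the cookie
    back on a later request, clean SEO URLs are known to be safe.
    """
    if "sid" in request_cookies:       # cookie round-tripped: cookies work
        return path                    # emit the clean SEO URL
    return f"{path}?sid={session_id}"  # first visit: fall back to SID URLs

# First request: no cookies yet, so links carry the SID.
print(build_url("/widgets.html", {}, "abc123"))
# Second request: the cookie came back, so links are clean.
print(build_url("/widgets.html", {"sid": "abc123"}, "abc123"))
```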
I managed to resolve this with CCP5 and forgot what I did. It had to do with cookies.
Can we fool the program into thinking the cookie test passed and circumvent this?
Anyway, my CCP6 store is in a subdirectory (e.g., www.mystore.com/buy/index.php). When a person puts something in their cart, then goes to the home page in the root directory (www.mystore.com/, one directory up from the CCP6 store directory), their cart empties.
Do you have any suggestions for how to get around the cart emptying?
Last edited by Blitzen (11-14-2008 19:49:20)
I'm a little confused by this.
What links will a search engine crawler see?
Will it see the long ones or the SEO ones?
The SE will index both. The days when long URLs were a problem are over.
You can go to G and enter site:www.mysite.com to see what G will index. That can answer one question.
The concern is that if you have two identical pages with different URLs, SEs see this as "duplicate pages" and will demote one. It's uncertain whether the other duplicate page will also suffer a demotion - no one has good research data on that.
Last edited by Blitzen (02-02-2009 16:11:51)
If Google indexes both the long and the SEO URLs produced by CCP, that means the destination page is the same for both URLs and will be seen as duplicate content. Therefore every page in CCP is a duplicate if SEO is turned on - someone please tell me this is not correct, otherwise this is a complete disaster.
According to the , duplicate content links are NOT the problem that everyone seems to think they are.
Why not place these in your robots.txt file
Disallow: /index.php?app=
Disallow: /ccp0-emailfriend/
and sit back and relax knowing only the SEO links will be indexed?
Last edited by theblade24 (02-08-2009 19:07:41)
Blade, wouldn't your suggestion in the robots.txt file effectively ban search engines completely from a website?
These long URLs are what is presented on a first visit, and it's not until one of them is followed that the SEO links become visible. If the command is to disallow the long URLs, the search engines have nowhere to go.
Don't forget that what you see as a real user with a real browser is very different from what is presented to bots that are crawling your site.
Exactly! And if you're submitting sitemaps to Google, Yahoo, and MSN with the short URLs, then all is good.
Hi All,
In a conversation with Brett Yount at MSN Live Search about the site not being spidered, he pointed out this:
I think we may have hit on something. It is possible your robots.txt is blocking due to your site being accessible using /index.php?
You have that blocked in your REP to disallow /index.php?=App
Problem is, ? is a wildcard in the REP.
As borrowed from janeandrobot.com:
Selectively allow access to a URL that matches a blocked pattern - Use the Allow directive in conjunction with pattern matching for more complex implementations.
# Block access to URLs that contain ?
# Allow access to URLs that end in ?
User-agent: *
Disallow: /*?
Allow: /*?$
That directive blocks all URLs that contain ? except those that end in ?.
In this example, the default version of the page will be indexable:
* http://www.example.com/productlisting.aspx?
Variations of the page will be blocked:
* http://www.example.com/productlisting.aspx?nav=price
* http://www.example.com/productlisting.aspx?sort=alpha
Maybe adding the allow statement will help in your case...
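For anyone who wants to sanity-check such patterns offline: Python's standard urllib.robotparser does not understand the * and $ extensions, but the matching behavior described above can be approximated in a few lines (a sketch of the documented longest-match resolution, not any engine's actual implementation):

```python
import re

def rule_to_regex(rule):
    # Translate a robots.txt path rule: '*' matches any run of
    # characters, and a trailing '$' anchors the end of the path.
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile("^" + pattern)

def is_allowed(path, disallow, allow):
    # Longest matching rule wins; Allow wins over Disallow on a tie.
    best_dis = max((len(r) for r in disallow
                    if rule_to_regex(r).match(path)), default=-1)
    best_all = max((len(r) for r in allow
                    if rule_to_regex(r).match(path)), default=-1)
    return best_all >= best_dis

disallow, allow = ["/*?"], ["/*?$"]
print(is_allowed("/productlisting.aspx?", disallow, allow))           # True
print(is_allowed("/productlisting.aspx?nav=price", disallow, allow))  # False
```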
--------------
So I added this to my robots.txt, but so far it hasn't helped at all:
User-agent: *
Disallow: /*?
Allow: /*?$
------------------
What do you think guys?
Cheers,
Bruce.
west4 wrote:
What do you think guys?
Speaking only for myself I think people spend far too much time, effort and $$ on something that is really pretty straightforward. Have a sitemap and feed it to the search engines. End of story.
Yes, that's simplistic and easy to do but that is really all that should be needed. You tell the bots exactly what you want them to look at and tell them exactly what URL it is you want them to associate with what they find.
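For concreteness, a sitemap is just an XML file listing the clean SEO URLs you want the engines to associate with your content (the domain, path, and dates below are placeholders, not values from this thread):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/widgets.html</loc>
    <lastmod>2009-02-08</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```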
Standing back waiting for the net rocks
I'm confused on what you are asking. I see your categories and products spidered with nice SEO urls in Google.
What problem are you having? Not showing up in Yahoo or MSN?
Both of those are waaaaayyyyyy slower to add sites to their index than google is. Is that the issue?
As far as index.php? I can't think of any url that I would want it in and allow it to be spidered. I don't want or need that appearing in any urls picked up by search engines. Am I missing something?
west4 wrote:
What do you think guys?
Cheers,
Bruce.
I avoid duplicate pages with my own SEO URL mod and denying absolutely everything in cgi-bin in robots.txt.
There are no links in cgi-bin that I care to have indexed.
Before denying cgi-bin, I saw both URLs to the same page being indexed by SE's.
Bear in mind that not every SE obeys robots.txt.
I'm disappointed to see G Webmaster Tools (Sitemaps) listing the URLs in the cgi-bin as restricted by robots.txt. For some reason, G can and does look at those pages, even though robots.txt tells it to ignore them. What a waste of resources for G.
In my experience, the sitemap doesn't trump the links in the website itself. Not all SE's read the sitemap.
I would think [opinion] that the links in the website itself are weighted by SE's more than external pages.
What cgi-bin directory are you referring to in CCP6?
Google Webmaster Tools is doing exactly what it states: it's letting you know which URLs are restricted from being indexed by robots.txt. Sure, it's going to read them all, since it has no idea what not to read unless something tells it. It still at least reads restricted URLs because it may find a link to an unrestricted page on a restricted one.
Hi,
Well, I think Brett (administrator on the MSN forum) was saying that having ? blocked was hindering my attempt to get MSN to spider my site. In fact, it fell from only the sitemap.xml being indexed to nothing being indexed in the three weeks since adding the allow statement. So do I want spiders to pick up index.php? or not? Do I want any links ending in ? spidered? And has anyone else seen an issue with MSN not spidering and cured it with a robots.txt entry?
Cheers,
Bruce.
How long has your site been live?
I'm seeing MSN completely ignore my robots.txt file anyway. I see both good and bad urls spidered.
Hi theblade24,
The site has been live since Sept 2008, so 5 months. Loads of pages are in Google and some pages in Yahoo, but zero in MSN. The company web site has been going for over 5 years, but this is the new design and URL.
Google is really good; it changed the pages within weeks of me adding the Meta Title hack to the site and re-spidered with all the new names. Cool.
Yahoo seems to have given up after doing about 30 random pages and won't do any more.
MSN just won't show any pages.
Cheers,
Bruce.
I have seen the same behavior. I wouldn't worry too much about it. I think time has a lot to do with it regarding Yahoo and MSN. I'll bet if you look at server logs from when you launched, you'll find MSNbot not even coming around until long after the others.
MSN is a very, very, very small fraction of traffic. I wouldn't lose sleep over them.
The SiteMap XMOD for the US version of CCP includes an option to have your site map automatically submitted to ask.com, Yahoo, MSN and Google. Makes it a "no brainer" to get the major engines to index your site.