Search Engine Optimization
 
Dev Shed Forums > Web Design > Search Engine Optimization

  #1  
February 5th, 2008, 11:14 PM
time4fishing
Join Date: Jan 2008
Posts: 24
"included" html

If you include an HTML page from another site (for example, a list of something) in an HTML page on your site, and the method (PHP cURL) leaves the included page's head, title, etc. tags in the middle of the HTML page on your site, does that hurt anything as far as the search engines go?

  #2  
February 6th, 2008, 01:46 AM
Catacaustic
Join Date: Mar 2005
Location: A Land Down Under
Posts: 442
It's not the best thing to do. Apart from breaking HTML standards by having two <head> and <body> sections, you also run the risk of the search engines seeing this in your code and realising that you're scraping it from somewhere else.

If you're using cURL in PHP, it's pretty easy to write a small wrapper function that strips off everything up to the end of the opening <body> tag and everything from the start of the </body> tag onward. That way you'll be sure to have no issues with it.

  #3  
February 6th, 2008, 08:43 AM
time4fishing
Join Date: Jan 2008
Posts: 24

Can you give me a hint on where to start? My code is:
PHP Code:
<td>
<?php
// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.time4fishing.com/pagefooter/links-charter-boats.htm");
curl_setopt($ch, CURLOPT_HEADER, false);

// grab URL and pass it to the browser
curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);
?>
</td>

  #4  
February 8th, 2008, 01:47 AM
Catacaustic
Join Date: Mar 2005
Location: A Land Down Under
Posts: 442
It's easy. Once you've got the returned value from your cURL call, use PHP's string position functions (strpos() and friends) to find the ">" that closes the opening <body> tag, and remove everything up to and including it. Then find the position of the closing "</body" tag and strip everything from there on. It's 5-6 lines of code, and not too hard. Should be a good exercise for you.
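
For anyone who wants to check their attempt, here is a minimal sketch of the steps described above. The function name extract_body_contents is made up for illustration. It assumes the fetched page has a single <body> element and no ">" character inside the body tag's attributes, and it falls back to returning the input unchanged if no <body> tag is found.

```php
<?php
/**
 * Return only the markup between the opening <body ...> tag and </body>.
 * Hypothetical helper name; returns the input unchanged if no <body> tag exists.
 */
function extract_body_contents(string $html): string
{
    // Locate the opening "<body" (case-insensitive, so <BODY> also matches).
    $open = stripos($html, '<body');
    if ($open === false) {
        return $html;
    }

    // Locate the ">" that finishes the opening body tag.
    $start = strpos($html, '>', $open);
    if ($start === false) {
        return $html;
    }
    $start++; // first character after the opening tag

    // Locate the closing "</body", searching from the start of the content.
    $end = stripos($html, '</body', $start);
    if ($end === false) {
        // No closing tag: keep everything after the opening tag.
        return substr($html, $start);
    }

    return substr($html, $start, $end - $start);
}
```

With the page held in a string such as $page_contents, the call would be echo extract_body_contents($page_contents); in place of printing the raw response.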

Also, you'll want to set this option so that curl_exec() returns the page to the script instead of printing it:
Code:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$page_contents = curl_exec($ch); 
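
Put together with the original request, those two lines might sit in a small function like the sketch below. The name fetch_page and the 10-second timeout are assumptions for illustration, not part of the thread. It returns null when the request fails, so the calling page can skip the include rather than print an error.

```php
<?php
/**
 * Fetch a URL and return the response body as a string, or null on failure.
 * Hypothetical helper; the timeout value is an arbitrary assumption.
 */
function fetch_page(string $url): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);        // leave HTTP headers out of the result
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of echoing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed limit, in seconds

    $page_contents = curl_exec($ch);
    curl_close($ch);

    // With RETURNTRANSFER set, curl_exec() returns false when the request fails.
    return $page_contents === false ? null : $page_contents;
}
```

The returned string can then be trimmed down to the body contents before it is echoed into the table cell.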

© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 1 hosted by Hostway