PHP Development
 
  #1  
December 18th, 2012, 09:16 AM
artsir
Optimizing an HTML page by removing white space and carriage returns

I want to open and read an HTML document and output a copy of it, adding a D to the end of it.

// this I can probably figure out

But I'm curious how I would remove all the white space and carriage returns. I wouldn't want to remove them inside paragraph tags, though.

My idea would be to find ">" and, if the next non-whitespace char is "<", remove all characters between them.

But how would I absorb an entire page into a string?

Thanks

  #2
December 18th, 2012, 09:26 AM
gw1500se

  #3
December 18th, 2012, 12:14 PM
msteudel
This StackOverflow post might help you with stripping whitespace and new lines:

http://stackoverflow.com/questions/...s-and-new-lines
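
For what it's worth, the idea from the first post (find ">" and drop whatever whitespace sits before the next "<"), together with reading a whole page into a string, could be sketched roughly as below. This is only a sketch under assumptions: both function names are made up for illustration, and "adding a D" is read here as appending D to the file's base name (page.html becomes pageD.html), which the original post does not spell out.

```php
<?php
// Hypothetical helper: drop runs of whitespace that sit directly between
// a closing ">" and the next "<".
function collapseBetweenTags(string $html): string
{
    // \s matches spaces, tabs, carriage returns and newlines.
    // Fall back to the input if preg_replace() reports an error (null).
    return preg_replace('/>\s+</', '><', $html) ?? $html;
}

// Hypothetical helper: read a page, collapse it, and write a copy whose
// base name ends in "D" (page.html -> pageD.html). Returns the new path.
function writeCollapsedCopy(string $source): string
{
    $html = file_get_contents($source); // whole file into one string
    if ($html === false) {
        throw new RuntimeException("Cannot read $source");
    }

    $info = pathinfo($source);
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';
    $target = $info['dirname'] . '/' . $info['filename'] . 'D' . $ext;

    if (file_put_contents($target, collapseBetweenTags($html)) === false) {
        throw new RuntimeException("Cannot write $target");
    }
    return $target;
}
```

Calling writeCollapsedCopy('sale.html') (a hypothetical file name) would write saleD.html next to it. Note that this regex does not protect paragraph content: a space between two inline tags, as in `<b>a</b> <i>b</i>`, is removed as well, and it knows nothing about pre or script blocks.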

  #4
December 18th, 2012, 01:07 PM
E-Oreo
If you are doing this for performance reasons, it is a complete waste of time. The performance gain you'll see from this will be negligible. You're more likely to see a decrease in performance due to the extra CPU time required to remove the spaces.

  #5
December 18th, 2012, 02:29 PM
artsir
Thanks guys, I'm going to look into this and let you know how it goes.

As for the reasons I'm doing this:

1. Education. I just want to get better with my PHP. I took a class a year ago. I can do forms and SQL calls. I just want to explore more with writing files.

2. As for performance, the PHP won't be removing the white space in real time. I have this brain-dead web job where I update HTML sale pages. Their server doesn't support PHP; I will try to convince them later. The updates are very tedious, so I have written a few PHP scripts to streamline this process. The PHP outputs HTML files that I upload to the server.

  #6
December 18th, 2012, 09:48 PM
BarryG
Quote:
Originally Posted by artsir
I have written a few PHP scripts to streamline this process. The PHP outputs HTML files that I upload to the server.


You are re-inventing the wheel.

HTML-Tidy already does this.

Chami HTML-Kit has HTML-Tidy integrated.
http://www.htmlkit.com/

Last edited by BarryG : December 18th, 2012 at 09:56 PM.

  #7
December 19th, 2012, 03:56 AM
Jacques1
Hi,

I understand you're doing this partly for learning, but manually fumbling with HTML and regexes is almost always a bad idea. Instead, use an HTML parser to fetch the elements and then output them in the way you like. That's a much more intelligent and clean approach. You'll also gain useful knowledge from this and won't just be playing around with strings.
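
A minimal sketch of the parser route, assuming PHP's bundled DOM extension (DOMDocument and DOMXPath): whitespace-only text nodes are removed unless they sit inside an element where whitespace matters. The function name and the list of protected elements (p, pre, textarea) are assumptions made for illustration, not something prescribed in this thread.

```php
<?php
// Hypothetical helper: remove whitespace-only text nodes, except inside
// elements where whitespace is meaningful to the reader.
function stripInterTagWhitespace(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them;
    // real-world pages are rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Text nodes made up only of whitespace that are not inside
    // <p>, <pre> or <textarea>.
    $query = '//text()[normalize-space(.) = ""]'
           . '[not(ancestor::p or ancestor::pre or ancestor::textarea)]';

    // Copy the result to an array before removing nodes from the tree.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $node->parentNode->removeChild($node);
    }

    return $doc->saveHTML();
}
```

Unlike a plain regex, this leaves the space in `<p>a <b>b</b> <i>c</i></p>` alone because the text node has a p ancestor. Be aware that saveHTML() serializes the whole document, so a fragment may come back wrapped in a doctype, html and body element.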

Regarding the performance: Don't try to make your own home-made optimizations (unless you really know what you're doing). The effect will be minimal compared to the gigantic effort. Use a proven solution, in this case (gzip) compression.
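
To get a feel for the difference, the sizes can be compared with gzencode() from PHP's zlib extension. This is only a rough sketch: the sample markup is made up and very repetitive, which flatters gzip, so real pages will give different numbers. On a live site the compression would normally be switched on in the web server rather than done by hand like this.

```php
<?php
// Made-up sample page: a table with 200 identical, indented rows.
$row  = "<tr>\n        <td>Item</td>\n        <td>9.99</td>\n    </tr>\n";
$html = "<table>\n" . str_repeat("    " . $row, 200) . "</table>\n";

// The hand-rolled optimization: drop whitespace between tags.
$stripped = preg_replace('/>\s+</', '><', $html);

// Byte counts for each variant; level 9 is the strongest gzip setting.
$sizes = [
    'original'        => strlen($html),
    'stripped'        => strlen($stripped),
    'original + gzip' => strlen(gzencode($html, 9)),
    'stripped + gzip' => strlen(gzencode($stripped, 9)),
];

foreach ($sizes as $label => $bytes) {
    printf("%-16s %6d bytes\n", $label, $bytes);
}
```

On this sample the gzipped original comes out far smaller than the stripped but uncompressed version, and once gzip is applied the extra gain from stripping is small in absolute terms.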

  #8
December 19th, 2012, 12:25 PM
msteudel
Quote:
Originally Posted by Jacques1
You'll also gain useful knowledge from this and won't just be playing around with strings.


I disagree with this. Understanding how to manipulate strings with REGEX is a VERY useful skill.

Quote:
Originally Posted by Jacques1
Instead, use an HTML parser to fetch the elements and then output them in the way you like. That's a much more intelligent and clean approach.


This is a great idea.

Trying to gain performance by stripping white space from HTML is probably at the bottom of my list; actually, it's probably not on my list at all.

Check out this page for a good list of optimizations to try:
http://developer.yahoo.com/yslow/

Also check out this project, it has a great process for building web pages, with minification as part of the process:
http://html5boilerplate.com/

But lastly, good on you for taking a boring, brain-dead job and doing something to keep it interesting and keep yourself learning new skills, even if the lesson is "OK, stripping whitespace from HTML is not a good idea".

Last edited by msteudel : December 19th, 2012 at 12:28 PM.

  #9
December 19th, 2012, 12:42 PM
Jacques1
Quote:
Originally Posted by msteudel
Understanding how to manipulate strings with REGEX is a VERY useful skill.


Sure, I do not doubt that. But what you also have to learn is to choose the right tool for the right job. Regexes are far overused in my opinion. People tend to think they could do any string manipulation if only the regex is complicated enough. So instead of looking for an appropriate parser, they fumble with regex hacks forever.

That's why I suggested using a different approach.

  #10
December 19th, 2012, 12:48 PM
requinix
Quote:
Originally Posted by Jacques1
But what you also have to learn is to choose the right tool for the right job. Regexes are far overused in my opinion.

"If the question is HTML then regex is not the answer." With very few exceptions.

  #11
December 19th, 2012, 12:59 PM
msteudel
Quote:
Originally Posted by Jacques1
Sure, I do not doubt that. But what you also have to learn is to choose the right tool for the right job. Regexes are far overused in my opinion. People tend to think they could do any string manipulation if only the regex is complicated enough. So instead of looking for an appropriate parser, they fumble with regex hacks forever.

That's why I suggested using a different approach.


It was a great suggestion. And yeah, regex in this case is not a good way to go, but implying that string manipulation is not useful to know seems misleading, especially since the OP is obviously just trying to learn stuff, and especially since your opinion carries a lot of weight on this board. Anyway, I'm probably making mountains out of molehills ....

  #12
December 19th, 2012, 01:00 PM
artsir
Thanks for everyone's responses. I didn't know about the HTML parsers.

Today is very busy; I have to modify these Photoshop images for the website. I'm multi-talented.

But yeah, I can't wait to look into them. My goal is to set it up so that I can do these updates as quickly as possible, which would leave me free time for my private studies. Getting paid to learn!

I want to get better at PHP and maybe make apps for the iPad. I played around with Xcode and just finished my second C++ class. I know apps are made natively with Objective-C. I'm unsure where I'm going exactly. Can you make apps with C++ on the iPad? Anyway, that's for another forum.

Thanks, everyone. And I'm glad no one was rude and called me an idiot. You know how the internet can be.
