Optimizing an HTML Page by Removing White Space and Carriage Returns
December 18th, 2012, 09:16 AM
Registered User
Join Date: Dec 2012
Posts: 6
I want to open and read an HTML document and output a copy of it with a D added to the end of it.
// This part I can probably figure out.
But I'm curious how I would remove all the white space and carriage returns. I wouldn't want to remove them inside paragraph tags, though.
My idea would be to find ">", then, if the next non-whitespace character is "<", remove all the characters in between.
But how would I read an entire page into a string?
Thanks
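A minimal sketch of the approach described in the question, assuming PHP 7 or later. `file_get_contents()` reads the whole page into one string, and the hypothetical helper `collapseBetweenTags()` implements the scan without regex: after each `>`, if everything up to the next `<` is whitespace, that gap is dropped. Text inside `<p>` is left alone because it is not whitespace-only, but note that the space between inline tags such as `<b>Sale</b> <i>today</i>` would also be removed, and `<pre>` contents would be altered. The file name `sale.html` is a placeholder, and putting the "D" on the file name is an assumption about what the question means.

```php
<?php
// collapseBetweenTags() is a hypothetical helper: after every '>', if the
// text up to the next '<' is only whitespace (spaces, tabs, CR, LF), drop it.
function collapseBetweenTags(string $html): string
{
    $out = '';
    $len = strlen($html);
    $i = 0;
    while ($i < $len) {
        $ch = $html[$i];
        $out .= $ch;
        $i++;
        if ($ch !== '>') {
            continue;
        }
        $next = strpos($html, '<', $i); // position of the next tag, if any
        if ($next === false) {
            continue;
        }
        $between = substr($html, $i, $next - $i);
        if ($between !== '' && trim($between) === '') {
            $i = $next; // skip the whitespace-only gap
        }
    }
    return $out;
}

// Placeholder file name; assumes the "D" belongs on the file name.
$in = 'sale.html';
if (is_file($in)) {
    $html = file_get_contents($in); // the whole page as one string
    if ($html !== false) {
        $outFile = preg_replace('/\.html$/', 'D.html', $in); // sale.html -> saleD.html
        file_put_contents($outFile, collapseBetweenTags($html));
    }
}
```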

December 18th, 2012, 09:26 AM
__________________
There are 10 kinds of people in the world. Those that understand binary and those that don't.

December 18th, 2012, 12:14 PM
Contributing User
Join Date: Sep 2002
Location: Seattle, U.S.A.
Posts: 712

December 18th, 2012, 01:07 PM
Lost in code
If you are doing this for performance reasons, it is a complete waste of time. The performance gain will be negligible, and you're more likely to see a performance decrease from the extra CPU time needed to remove the spaces.

December 18th, 2012, 02:29 PM
Registered User
Join Date: Dec 2012
Posts: 6
Thanks, guys. I'm going to look into this and let you know how it goes.
As for why I'm doing this:
1. Education. I just want to get better at PHP; I took a class a year ago. I can do forms and SQL calls, and now I want to explore writing files.
2. As for performance, the PHP won't be removing the white space in real time. I have this brain-dead web job where I update HTML sale pages. Their server doesn't support PHP (I will try to convince them later). The updates are very tedious, so I have written a few PHP scripts to streamline this process: the PHP outputs HTML files that I upload to the server.
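Because the pages are generated offline and then uploaded, the processing step can be a small batch script. A minimal sketch, assuming the source pages sit in one folder and the results go to another; `buildPages()` and the folder names `pages` and `upload` are hypothetical, and the transform is whatever function you want to apply to each page (the usage line passes PHP's built-in `trim` only to keep it short).

```php
<?php
// Sketch of a batch step: transform every .html file in one folder and
// write the results to another folder for uploading.
// buildPages() and the folder names are hypothetical.
function buildPages(string $srcDir, string $outDir, callable $transform): int
{
    if (!is_dir($outDir)) {
        mkdir($outDir, 0777, true); // create the output folder if needed
    }
    $count = 0;
    foreach (glob($srcDir . '/*.html') ?: [] as $path) {
        $html = file_get_contents($path);
        if ($html === false) {
            continue; // unreadable file: skip it
        }
        // Same file name, new folder.
        file_put_contents($outDir . '/' . basename($path), $transform($html));
        $count++;
    }
    return $count; // number of pages written
}

// Usage (hypothetical folders):
// buildPages('pages', 'upload', 'trim');
```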

December 18th, 2012, 09:48 PM
Contributing User
Join Date: Aug 2011
Location: Sydney Australia
Quote: Originally Posted by artsir: "I have written a few PHP scripts to streamline this process. So I have the PHP output HTML files that I upload to the server."
You are re-inventing the wheel.
HTML-Tidy already does this.
Chami HTML-Kit has HTML-Tidy integrated.
http://www.htmlkit.com/

December 19th, 2012, 03:56 AM
pollyanna
Join Date: Jul 2012
Location: Germany
Hi,
I understand you're doing this partly for learning, but manually fumbling with HTML and regexes is almost always a bad idea. Instead, use an HTML parser to fetch the elements and then output them the way you like. That's a much more intelligent and clean approach. You'll also gain useful knowledge from this and won't just be playing around with strings.
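A minimal sketch of the parser-based approach, assuming PHP's bundled DOM extension (`DOMDocument`, `DOMXPath`). It removes text nodes that contain only whitespace, except inside `<p>`, `<pre>` and `<textarea>` (that list of protected elements is an assumption based on the original question), and lets the parser serialize the result. `stripBlankTextNodes()` is a hypothetical helper name. Two caveats: `loadHTML()` treats input as ISO-8859-1 unless the page declares a charset, and without extra flags it adds a default doctype (plus `<html>`/`<body>` wrappers if the page lacks them).

```php
<?php
// stripBlankTextNodes() is a hypothetical helper. Protected elements
// (p, pre, textarea) are an assumption based on the original question.
function stripBlankTextNodes(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Text nodes containing only whitespace, outside p, pre and textarea.
    $query = '//text()[normalize-space(.) = ""]'
           . '[not(ancestor::p or ancestor::pre or ancestor::textarea)]';
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $node->parentNode->removeChild($node);
    }
    return $doc->saveHTML();
}

// Usage (hypothetical file names):
// file_put_contents('saleD.html', stripBlankTextNodes(file_get_contents('sale.html')));
```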
Regarding performance: don't try to make your own home-made optimizations (unless you really know what you're doing). The effect will be minimal compared to the effort. Use a proven solution, which in this case is gzip compression.
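To get a feel for why compression matters more than hand-stripping whitespace, a rough sketch can compare sizes on some sample markup using `gzencode()` (assuming the zlib extension is enabled, as it usually is). The repeated table markup is invented purely for illustration, so the printed numbers are not a benchmark; on a real site gzip is normally switched on in the web server rather than in PHP.

```php
<?php
// Rough size comparison: raw vs. whitespace-stripped vs. gzip-compressed.
// The sample markup is invented purely for illustration.
$row = "    <tr>\n      <td>Item</td>\n      <td>9.99</td>\n    </tr>\n";
$html = "<table>\n" . str_repeat($row, 200) . "</table>\n";
$stripped = preg_replace('/>\s+</', '><', $html); // naive whitespace removal

$sizes = [
    'raw'           => strlen($html),
    'stripped'      => strlen($stripped),
    'gzip raw'      => strlen(gzencode($html, 9)),
    'gzip stripped' => strlen(gzencode($stripped, 9)),
];
foreach ($sizes as $label => $bytes) {
    printf("%-14s %6d bytes\n", $label, $bytes);
}
```

On this repetitive sample, the gzip sizes come out far below even the stripped, uncompressed size, so stripping adds little once compression is on.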

December 19th, 2012, 12:25 PM
Contributing User
Join Date: Sep 2002
Location: Seattle, U.S.A.
Posts: 712
Quote: Originally Posted by Jacques1: "You'll also gain useful knowledge from this and won't just be playing around with strings."
I disagree with this. Understanding how to manipulate strings with REGEX is a VERY useful skill.
Quote: Originally Posted by Jacques1: "Instead, use an HTML parser to fetch the elements and then output them the way you like. That's a much more intelligent and clean approach."
This is a great idea.
Stripping white space from HTML for optimization would be at the bottom of my list; actually, it probably wouldn't be on my list at all.
Check out this page for a good list of optimizations to try:
http://developer.yahoo.com/yslow/
Also check out this project, it has a great process for building web pages, with minification as part of the process:
http://html5boilerplate.com/
But lastly, good on you for taking a boring, brain-dead job and doing something to keep it interesting and keep yourself learning new skills, even if the lesson is "OK, stripping whitespace from HTML is not a good idea."

December 19th, 2012, 12:42 PM
pollyanna
Join Date: Jul 2012
Location: Germany
Quote: Originally Posted by msteudel: "Understanding how to manipulate strings with REGEX is a VERY useful skill."
Sure, I do not doubt that. But what you also have to learn is to choose the right tool for the right job. Regexes are far overused in my opinion. People tend to think they could do any string manipulation if only the regex is complicated enough. So instead of looking for an appropriate parser, they fumble with regex hacks forever.
That's why I suggested using a different approach.
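A small illustration of the point about choosing the right tool: the one-line regex `preg_replace('/>\s+</', '><', $html)` has no idea which element it is inside, so it also destroys the line breaks in a `<pre>` block, where whitespace is significant. Fixing that means adding more and more special cases to the pattern, which is exactly the "regex hacks forever" problem.

```php
<?php
// A naive regex minifier has no notion of which element it is inside.
$html = "<pre>\n<b>x</b>\n<b>y</b>\n</pre>";
$naive = preg_replace('/>\s+</', '><', $html);
echo $naive, "\n"; // the line breaks inside <pre> are gone
```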

December 19th, 2012, 12:48 PM
Still alive
Join Date: Mar 2007
Location: Washington, USA
Quote: Originally Posted by Jacques1: "But what you also have to learn is to choose the right tool for the right job. Regexes are far overused in my opinion."
"If the question is HTML then regex is not the answer." With very few exceptions.

December 19th, 2012, 12:59 PM
Contributing User
Join Date: Sep 2002
Location: Seattle, U.S.A.
Posts: 712
Quote: Originally Posted by Jacques1: "Sure, I do not doubt that. But what you also have to learn is to choose the right tool for the right job. Regexes are far overused in my opinion. People tend to think they could do any string manipulation if only the regex is complicated enough. So instead of looking for an appropriate parser, they fumble with regex hacks forever. That's why I suggested using a different approach."
It was a great suggestion. And yes, regex is not a good way to go in this case, but implying that string manipulation isn't useful to know seems misleading, especially since the OP is obviously just trying to learn things, and especially since your opinion carries a lot of weight on this board. Anyway, I'm probably making mountains out of molehills...

December 19th, 2012, 01:00 PM
Registered User
Join Date: Dec 2012
Posts: 6
Thanks for everyone's responses; I didn't know about HTML parsers.
Today is very busy: I have to modify these Photoshop images for the website. I'm multi-talented.
But yeah, I can't wait to look into them. My goal is to set it up so that I can do these updates as quickly as possible, which would leave me free time for my private studies. Getting paid to learn!
I want to get better at PHP and maybe make apps for the iPad; I've played around with Xcode, and I just finished my second C++ class. I know apps are made natively with Objective-C. I'm unsure exactly where I'm going. Can you make iPad apps with C++? Anyway, that's for another forum.
Thanks, everyone. And I'm glad no one was rude and called me an idiot. You know how the internet can be.