#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    6
    Rep Power
    0

    Optimizing an HTML page by removing white space and carriage returns


    I want to open and read an HTML document and output a copy of it, adding a D to the end of it.

    // This I can probably figure out.

    But I'm curious how I would remove all the white space and carriage returns. I wouldn't want to remove them inside paragraph tags, though.

    My idea would be to find ">" and, if the next non-whitespace character is "<", remove all the characters between them.

    But how would I absorb an entire page into a string?

    Thanks
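
    For reference, the idea described above could be sketched roughly as follows. This is a minimal sketch, not a recommended solution: the function names and the file name `page.html` are hypothetical, and inserting the "D" before the file extension is only one assumed reading of "adding a D to the end of it". `file_get_contents()` reads the whole page into a single string.

    ```php
    <?php
    // Illustrative sketch only; function names and file names are hypothetical.

    // Remove runs of whitespace (spaces, tabs, CR, LF) that sit between ">" and "<".
    function stripInterTagWhitespace(string $html): string
    {
        $result = preg_replace('/>\s+</', '><', $html);
        // preg_replace() returns null on a PCRE error; fall back to the input then.
        return $result ?? $html;
    }

    // Build the output name by inserting "D" before the extension,
    // e.g. "page.html" becomes "pageD.html" (an assumed interpretation).
    function copyName(string $path): string
    {
        $info = pathinfo($path);
        $dir  = $info['dirname'] === '.' ? '' : $info['dirname'] . '/';
        $ext  = isset($info['extension']) ? '.' . $info['extension'] : '';
        return $dir . $info['filename'] . 'D' . $ext;
    }

    $source = 'page.html'; // hypothetical input file
    if (is_readable($source)) {
        $html = file_get_contents($source); // the whole page as one string
        file_put_contents(copyName($source), stripInterTagWhitespace($html));
    }
    ```

    Text inside a paragraph is left alone because it is not a whitespace-only run between two tags. The pattern is not aware of context, though: a space between two inline elements, as in `</b> <i>`, would also be removed, and so would whitespace-only runs between tags inside `<pre>`.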
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,337
    Rep Power
    594
    There are 10 kinds of people in the world. Those that understand binary and those that don't.
  4. #3
  5. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Sep 2002
    Location
    Seattle, U.S.A.
    Posts
    712
    Rep Power
    12
    This StackOverflow post might help you with stripping whitespace and new lines:

    http://stackoverflow.com/questions/9...-and-new-lines
  6. #4
  7. No Profile Picture
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,317
    Rep Power
    7170
    If you are doing this for performance reasons, it is a complete waste of time. The performance increase you'll see from this will be negligible. You're more likely to see a decrease in performance due to the extra CPU time required to remove the spaces.
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    6
    Rep Power
    0
    Thanks guys, I'm going to look into this and let you know how it goes.

    As for the reasons I'm doing this:

    1. Education. I just want to get better with my PHP. I took a class a year ago, and I can do forms and SQL calls. I just want to explore more with writing files.

    2. As for performance, the PHP won't be removing the white space in real time. I have this brain-dead web job where I update HTML sale pages. Their server doesn't support PHP; I will try to convince them later. The updates are very tedious. I have written some PHP code to streamline this process, so I have the PHP output HTML files that I upload to the server.
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2011
    Location
    Sydney Australia
    Posts
    182
    Rep Power
    83
    Originally Posted by artsir
    I have written some PHP code to streamline this process, so I have the PHP output HTML files that I upload to the server.
    You are re-inventing the wheel.

    HTML-Tidy already does this.

    Chami HTML-Kit has HTML-Tidy integrated.
    http://www.htmlkit.com/
    Last edited by BarryG; December 18th, 2012 at 09:56 PM.
  12. #7
  13. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,957
    Rep Power
    1046
    Hi,

    I understand you're doing this partly for learning, but manually fumbling with HTML and regexes is almost always a bad idea. Instead, use an HTML parser to fetch the elements and then output them in the way you like. That's a much more intelligent and clean approach. You'll also gain useful knowledge from this and won't just be playing around with strings.
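
    As an illustration of the parser route, here is a minimal sketch using PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`). The function name is hypothetical, and the list of elements whose whitespace is preserved (`p`, `pre`, `textarea`) is an assumption chosen to match the original question, not a complete rule.

    ```php
    <?php
    // Illustrative sketch only; removeWhitespaceNodes() is a hypothetical name.
    function removeWhitespaceNodes(string $html): string
    {
        $doc = new DOMDocument();

        // Collect libxml warnings internally instead of printing them
        // (sloppy or HTML5 markup tends to trigger them).
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        // Select whitespace-only text nodes that are NOT inside <p>, <pre> or <textarea>.
        $xpath = new DOMXPath($doc);
        $nodes = $xpath->query(
            '//text()[normalize-space(.) = ""]'
            . '[not(ancestor::p or ancestor::pre or ancestor::textarea)]'
        );

        // Copy the result to an array first so removal cannot disturb the iteration.
        foreach (iterator_to_array($nodes) as $node) {
            $node->parentNode->removeChild($node);
        }

        return $doc->saveHTML();
    }

    echo removeWhitespaceNodes("<div>\n  <p><b>a</b> <i>b</i></p>\n</div>");
    ```

    Unlike a context-blind regex, this keeps the space between `</b>` and `<i>` because that text node sits inside a paragraph. Note that `saveHTML()` serializes a full document, so a fragment comes back wrapped in a doctype plus `<html>` and `<body>` elements, and that `loadHTML()` may misread non-ASCII text unless the page declares its encoding.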

    Regarding the performance: Don't try to make your own home-made optimizations (unless you really know what you're doing). The effect will be minimal compared to the gigantic effort. Use a proven solution, in this case (gzip) compression.
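
    To get a feel for the difference, the following sketch compares byte counts with and without whitespace stripping and with `gzencode()` from PHP's zlib extension. The sample markup is synthetic and highly repetitive (an assumption made purely for illustration), so it compresses far better than a real page would; treat the numbers as a rough comparison, not a benchmark.

    ```php
    <?php
    // Synthetic sample: an indented list repeated many times (hypothetical data).
    $indented = "<ul>\n" . str_repeat("    <li>Sale item</li>\n", 200) . "</ul>\n";

    // Same markup with the whitespace between tags removed.
    $stripped = preg_replace('/>\s+</', '><', $indented);

    $sizes = [
        'indented'        => strlen($indented),
        'stripped'        => strlen($stripped),
        'indented + gzip' => strlen(gzencode($indented)),
        'stripped + gzip' => strlen(gzencode($stripped)),
    ];

    foreach ($sizes as $label => $bytes) {
        printf("%-16s %6d bytes\n", $label, $bytes);
    }
    ```

    For static HTML files, compression of this kind is normally switched on in the web server's configuration rather than in PHP; whether the server in question allows that is not stated in this thread.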
  14. #8
  15. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Sep 2002
    Location
    Seattle, U.S.A.
    Posts
    712
    Rep Power
    12
    Originally Posted by Jacques1
    You'll also gain useful knowledge from this and won't just be playing around with strings.
    I disagree with this. Understanding how to manipulate strings with REGEX is a VERY useful skill.

    Originally Posted by Jacques1
    Instead, use an HTML parser to fetch the elements and then output them in the way you like. That's a much more intelligent and clean approach.
    This is a great idea.

    Trying to gain performance by stripping white space from HTML is probably at the bottom of my list; actually, it's probably not on my list at all.

    Check out this page for a good list of optimizations to try:
    http://developer.yahoo.com/yslow/

    Also check out this project, it has a great process for building web pages, with minification as part of the process:
    http://html5boilerplate.com/

    But lastly, good on you for taking a boring, brain-dead job and doing something to keep it interesting and keep yourself learning new skills, even if the lesson turns out to be "OK, stripping whitespace from HTML is not a good idea".
    Last edited by msteudel; December 19th, 2012 at 12:28 PM.
  16. #9
  17. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,957
    Rep Power
    1046
    Originally Posted by msteudel
    Understanding how to manipulate strings with REGEX is a VERY useful skill.
    Sure, I do not doubt that. But what you also have to learn is to choose the right tool for the right job. Regexes are far overused in my opinion. People tend to think they could do any string manipulation if only the regex is complicated enough. So instead of looking for an appropriate parser, they fumble with regex hacks forever.

    That's why I suggested using a different approach.
  18. #10
  19. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,965
    Rep Power
    9397
    Originally Posted by Jacques1
    But what you also have to learn is to choose the right tool for the right job. Regexes are far overused in my opinion.
    "If the question is HTML then regex is not the answer." With very few exceptions.
  20. #11
  21. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Sep 2002
    Location
    Seattle, U.S.A.
    Posts
    712
    Rep Power
    12
    Originally Posted by Jacques1
    Sure, I do not doubt that. But what you also have to learn is to choose the right tool for the right job. Regexes are far overused in my opinion. People tend to think they could do any string manipulation if only the regex is complicated enough. So instead of looking for an appropriate parser, they fumble with regex hacks forever.

    That's why I suggested using a different approach.
    It was a great suggestion. And yeah, regex in this case is not a good way to go, but implying that string manipulation is not useful to know seems misleading, especially since the OP is obviously just trying to learn stuff, and especially since your opinion carries a lot of weight on this board. Anyway, I'm probably making mountains out of molehills ....
  22. #12
  23. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    6
    Rep Power
    0
    Thanks for everyone's responses. I didn't know about HTML parsers.

    Today is very busy; I have to modify these Photoshop images for the website. I'm multi-talented.

    But yeah, I can't wait to look into them. My goal is to set it up so that I can do these updates as quickly as possible, which would leave me free time to do my private studies. Getting paid to learn!

    I want to get better at PHP and maybe make apps for the iPad. I have played around with Xcode and just finished my second C++ class. I know apps are made natively with Objective-C. I'm unsure where I'm going exactly. Can you make apps with C++ on the iPad? Anyway, that's for another forum.

    Thanks everyone. And I'm glad no one was rude and called me an idiot. You know how the internet can be.
