    #1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    9
    Rep Power
    0

    JavaScript to get some information from inside the HTML at an arbitrary URL?


    So I have a file `test.html` on my desktop.

    Code:
    <!DOCTYPE html>
    <html>
    <head>
    	<script type="text/javascript">
    
    function do_something(){
    	document.getElementById("the_p").innerHTML = "X";
    }
    	
    	</script>
    </head>
    
    <body>
    	
    	<input type="button" onclick="do_something()" />
    	<p id="the_p"></p>
    
    </body>
    </html>
    So, when I load test.html in my browser, and press the button, `X` appears on the page (obviously).

    Rather than putting `X` on the page, what I want it to do is get (for a random test example) the article count from the HTML source of ` http://en.wikipedia.org/wiki/Main_Page `, and put that instead.

    (The article count in the source of the page is in a div that looks like this:

    Code:
    <div id="articlecount" style="font-size:85%;"><a href="/wiki/Special:Statistics" title="Special:Statistics">4,140,344</a> articles in <a href="/wiki/English_language" title="English language">English</a></div>
    )

    So when I press the button, `4,140,344` (or whatever it was up to by that point) would show up instead.

    How do I do this, and/or what terminology do I need to use to find documentation on the subject?
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    165
    Rep Power
    18
    Well, I'm guessing you're learning JavaScript, so I'm going to give you what I believe to be the simplest solution without giving you the source code. There are more correct ways, but I believe this to be the simplest.

    I would include the wiki page in your HTML using <iframe id="WikiPage" style="display:none" src="WIKIPAGE"></iframe>. This includes the page inside your page's DOM structure, allowing you to read and write it with JavaScript without having to load the page into a variable using script.

    Now look at where the count is located. It is inside an anchor without an id, which is inside a div with an id. So using getElementById we select our iframe, then further select the div using getElementById, then finally select the anchor using getElementsByTagName. Your number will then be in the innerHTML property. Now you can play around with that and see if you can get anywhere.

  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    9
    Rep Power
    0
    So are you saying that this should work? :

    Code:
    <!DOCTYPE html>
    <html>
    <head>
    	<script type="text/javascript">
    
    function do_something(){
    	var x = document.getElementById("wiki_page").getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;
    	document.getElementById("the_p").innerHTML = articlecount;
    }
    
    	</script>
    </head>
    
    <body>
    
    	<iframe id="wiki_page" href="http://en.wikipedia.org/wiki/Main_Page"></iframe>
    	
    	<input type="button" onclick="do_something()" />
    	<p id="the_p"></p>
    
    	</body>
    </html>
    It doesn't...

    (Although if I copy the

    Code:
    <div id="articlecount" style="font-size:85%;"><a href="/wiki/Special:Statistics" title="Special:Statistics">4,140,344</a> articles in <a href="/wiki/English_language" title="English language">English</a></div>
    bit to the body of test.html, then

    Code:
    var x = document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;
    gets the string "4,140,344" correctly.)
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    165
    Rep Power
    18
    Remember, the <div> is not directly a child of the <iframe>; it is a child of the document within the iframe.

    so use getElementById["wiki_page"].contentWindow.document.getElementById........
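
    A minimal sketch of that chain, with every call written out in full. `readArticleCount` is a hypothetical helper name, and the sketch assumes the framed page is served from the same origin as the page running the script. For a framed page on another origin the browser refuses access to its document, in which case the helper returns null.

```javascript
// Hypothetical helper: returns the text of the first link inside the
// "articlecount" div of a framed page, or null if it cannot be read.
function readArticleCount(iframe) {
  try {
    // contentWindow.document is the document loaded *inside* the iframe,
    // not the iframe element itself.
    var doc = iframe.contentWindow.document;
    var div = doc.getElementById("articlecount");
    if (!div) {
      return null;
    }
    var links = div.getElementsByTagName("a");
    return links.length > 0 ? links[0].innerHTML : null;
  } catch (e) {
    // Browsers throw here when the framed page is on a different origin.
    return null;
  }
}
```

    In a page it would be called as readArticleCount(document.getElementById("wiki_page")), once the iframe has finished loading.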

  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    9
    Rep Power
    0
    So:

    Code:
    document.getElementById["wiki_page"].contentWindow.document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML
    should get "4,140,344" (or whatever it's up to)?

    (I'm confused by the sudden use of square brackets for the `getElementById()` method. Was that just a typo? I tried it both ways...)

    This doesn't work either:

    Code:
    <!DOCTYPE html>
    <html>
    <head>
    	<script type="text/javascript">
    
    function do_something(){
    	var x = document.getElementById("wiki_page").contentWindow.document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;
    	document.getElementById("the_p").innerHTML = x;
    }
    	</script>
    
    </head>
    
    <body>
    
    	<iframe id="wiki_page" style="display:none" href="http://en.wikipedia.org/wiki/Main_Page"></iframe>
    	
    	<input type="button" onclick="do_something()" />
    	<p id="the_p"></p>
    	
    	</body>
    </html>

    And if I get rid of `style="display:none"`, the frame that appears is empty.

    Should it be?

    Is there something wrong with:

    Code:
    <iframe id="wiki_page"  href="http://en.wikipedia.org/wiki/Main_Page"></iframe>
    ?
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    165
    Rep Power
    18
    Oops, the square brackets are a typo, and your iframe should use the src attribute, not href.
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    9
    Rep Power
    0
    Yeah, I was pretty much certain the square brackets were a typo. ^^

    Actually, changing the `href` to a `src` was one of the tweaks I already tried.

    This doesn't work either:

    Code:
    <!DOCTYPE html>
    <html>
    <head>
    	<script type="text/javascript">
    
    function do_something(){
    	var x = document.getElementById("wiki_page").contentWindow.document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;
    	document.getElementById("the_p").innerHTML = x;
    }
    	</script>
    
    </head>
    
    <body>
    
    	<iframe id="wiki_page" src="http://en.wikipedia.org/wiki/Main_Page"></iframe>
    	
    	<input type="button" onclick="do_something()" />
    	<p id="the_p"></p>
    	
    	</body>
    </html>

    Can you yourself get it to work using that method?

    I mean, if you've been using the "more correct ways" you mentioned for a sufficiently long time, it's possible that this "simpler way" no longer works and/or you forgot some vital component yourself, maybe... ?
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    165
    Rep Power
    18
    I'm not sure, because the script works if it is pointed at another file, say test2.html, that contains the <div> and <a>.

  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    165
    Rep Power
    18
    And the script works when executed directly on Wikipedia's site by entering javascript: alert(document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML); into the address bar.


    Comments on this post

    • Jacques1 disagrees : The script *only* works by manually executing it on the Wikipedia page. It doesn't work like you and the OP expect it to work. See "same origin policy".
  18. #10
  19. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    9
    Rep Power
    0
    Yeah, I just tried running
    Code:
    javascript: alert(document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML);
    on en.wikipedia.org/wiki/Main_Page in the Scratchpad (using Firefox 17, so address bar trick doesn't work), and it certainly gets something like `4,140,409`...

    But other Scratchpad tests, on test.html :

    Code:
    var x = document.getElementById("wiki_page").contentWindow.document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;
    alert(x);
    /*
    Exception: Permission denied to access property 'document'
    @Scratchpad:11
    */
    Code:
    var list = "";
    for (var i in document.getElementById("wiki_page").contentWindow){
        list += ", " + i;
    }
    alert(list);
    
    /*
    Exception: Not allowed to enumerate cross origin objects
    @Scratchpad:15
    */

    So... you don't know how to do it either, or do you have a different method?
  20. #11
  21. No Profile Picture
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,317
    Rep Power
    7170
    You can't retrieve the HTML for an arbitrary URL using JavaScript; you can only retrieve HTML for a URL that resides on the same domain as the JavaScript code being run. This also applies to iframes. The iframe can point to a remote domain, but you won't be able to access its DOM using JavaScript.

    The only way you can do this is using a server-side script, hosted on the same domain as the JavaScript, to proxy the request to the remote server.
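
    A rough sketch of such a proxy, written for Node.js only so that it stays in JavaScript (any server-side language would do). Everything named here is an assumption for illustration: the `/articlecount` path, the port, the `User-Agent` string, the helper names, and the expectation that Wikipedia's markup still matches the snippet quoted in the first post.

```javascript
const http = require("http");
const https = require("https");

// Pull the first link's text out of the "articlecount" div.
// A regular expression is enough for this one known snippet; a real
// scraper would use an HTML parser instead.
function extractArticleCount(html) {
  const match = /<div id="articlecount"[^>]*>\s*<a [^>]*>([^<]+)<\/a>/.exec(html);
  return match ? match[1] : null;
}

// Hypothetical same-origin endpoint: the page's own JavaScript requests
// /articlecount from this server, and the server asks Wikipedia.
function handleRequest(req, res) {
  if (req.url !== "/articlecount") {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("not found");
    return;
  }
  const options = { headers: { "User-Agent": "articlecount-demo/0.1 (example)" } };
  https.get("https://en.wikipedia.org/wiki/Main_Page", options, function (remote) {
    let body = "";
    remote.setEncoding("utf8");
    remote.on("data", function (chunk) { body += chunk; });
    remote.on("end", function () {
      const count = extractArticleCount(body);
      res.writeHead(count ? 200 : 502, { "Content-Type": "text/plain" });
      res.end(count || "article count not found");
    });
  }).on("error", function () {
    res.writeHead(502, { "Content-Type": "text/plain" });
    res.end("upstream request failed");
  });
}

// Starts the proxy; nothing listens until this is called.
function startProxy(port) {
  return http.createServer(handleRequest).listen(port);
}
```

    After calling startProxy(8080) and serving test.html from that same server, the button handler could request /articlecount with an ordinary XMLHttpRequest and write the response text into the_p, because that request never leaves the page's own origin.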
  22. #12
  23. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    9
    Rep Power
    0
    Originally Posted by E-Oreo
    The only way you can do this is using a server-side script, hosted on the same domain as the JavaScript, to proxy the request to the remote server.
    Thank you, but... I don't know how to use that information.

    What I'm trying to do with this exercise is just grab information from the HTML of an arbitrary website (like the article count from the Wikipedia main page), and have it show up in my test page when I press the button.

    I haven't been able to google any information that looks like it explains how to do that (and why it must be done that way), using just the key words you used there...
  24. #13
  25. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,959
    Rep Power
    1014
    You cannot access Wikipedia from your own page using JavaScript. You just cannot. Anybody who tells you differently has no idea what he/she is talking about.

    This restriction is called same origin policy, and it's the reason why we can visit websites without having to worry that they make transactions with our PayPal account, buy things with our Amazon account and whatnot.

    Just think about this for a second: The counter on the Wikipedia page might as well be account information on your online banking page, so there's a good reason not to allow JavaScript to access other websites and then fetch data.
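
    To make "same origin" concrete, here is a small sketch of the comparison: two URLs share an origin only if scheme, host and port all match. `sameOrigin` is a hypothetical helper, not a browser API, and treating file: URLs as never matching is a simplifying assumption (browsers differ in how they handle local files).

```javascript
// Hypothetical helper illustrating the rule: same scheme + host + port.
function sameOrigin(urlA, urlB) {
  const a = new URL(urlA).origin;
  const b = new URL(urlB).origin;
  // file: URLs and similar have an opaque origin, serialised as "null".
  // This sketch treats an opaque origin as matching nothing.
  if (a === "null" || b === "null") {
    return false;
  }
  return a === b;
}

// Two pages on Wikipedia share an origin:
//   sameOrigin("http://en.wikipedia.org/wiki/Main_Page",
//              "http://en.wikipedia.org/wiki/Special:Statistics")
// A test.html opened from the desktop and Wikipedia do not:
//   sameOrigin("file:///home/me/Desktop/test.html",
//              "http://en.wikipedia.org/wiki/Main_Page")
```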

    What you can do is what E-Oreo already said: You can have your own server make a request to Wikipedia and then give back the result. This isn't all that trivial, however. You need to use a script on your server (written in PHP, for example) and then call it via AJAX. If you want concrete help, we need to know your server setup (do you have PHP/Ruby/Python/Perl/... installed?) and your current knowledge regarding this topic.
  26. #14
  27. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    165
    Rep Power
    18
    Sorry about that. I had no idea of such restrictions, as I haven't attempted such a thing before. I just knew that within my own domain I have been able to use JavaScript to access other pages within an iframe.
  28. #15
  29. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    9
    Rep Power
    0
    Originally Posted by portcitysoftwar
    Sorry about that. I had no idea of such restrictions, as I haven't attempted such a thing before. I just knew that within my own domain I have been able to use JavaScript to access other pages within an iframe.
    That's okay. I guess the moral is that we should always test our own advice before we give it. xD Give a concrete example of code that you *know* works. Concrete is easier to understand anyway...

    E-Oreo and Jacques1, thank you. It sounds like this is just a much more advanced subject than it first appears.

    This is pretty much the extent of my current knowledge (what I can easily *do*, that is), so I guess I just need to study more before I can tackle this?

    But I could eventually achieve the effect with, say, Python? Meaning "Django" or...? (And if I used Ruby, that would mean Rails?) What about jQuery? (I don't know anything about Perl other than fundamental regexardry.)

    All that being said...

    Originally Posted by Jacques1
    Just think about this for a second: The counter on the Wikipedia page might as well be account information on your online banking page, so there's a good reason not to allow JavaScript to access other websites and then fetch data.
    No, thinking about that with the information I'm limited to, there is no way to logically derive that conclusion.

    I can browse the web to any public page and look at the source.

    Why shouldn't my javascript be able to?

    If I try to go to my online banking page, I must know my identification information and password for an account in order to view it.

    Why wouldn't any javascript have the same limitation?

    That's the obvious conclusion (albeit meta-obviously wrong) from the information I have.

    So...

    That's a "fake explanation" you gave me, isn't it? ;P

    But do you think you *could* communicate the information necessary to logically derive that "there would be a security issue" conclusion to someone at my current level?
    Last edited by Owen_R; January 14th, 2013 at 04:10 PM. Reason: typo