JavaScript to get some information from inside the HTML at an arbitrary URL?
So, when I load test.html in my browser, and press the button, `X` appears on the page (obviously).
Rather than putting `X` on the page, what I want it to do is get (for a random test example) the article count from the HTML source of ` http://en.wikipedia.org/wiki/Main_Page `, and put that instead.
(The article count in the source of the page is in a div that looks like this:
Posts: 163
Time spent in forums: 1 Day 13 h 18 m 54 sec
Reputation Power: 17
Well, I'm guessing you're learning JavaScript, so I'm going to give you what I believe to be the simplest solution without giving you the source code. There are more correct ways, but I believe this to be the simplest.
I would include the wiki page in your HTML using <iframe id="WikiPage" style="display:none" src="WIKIPAGE"></iframe>. This includes the page inside your page's DOM structure, allowing you to read and write it with JavaScript without having to load the page into a variable first.
Now look at where the count is located. It is inside an anchor without an id, which is inside a div with an id. So using getElementById we select our iframe, then further select the div using getElementById, then finally select the anchor using getElementsByTagName. Your number will then be in the innerHTML property. Now you can play around with that and see if you can get anywhere.
Posts: 9
Time spent in forums: 2 h 4 m 56 sec
Reputation Power: 0
Yeah, I was pretty much certain the square brackets were a typo. ^^
Actually, changing the `href` to a `src` was one of the tweaks I already tried.
This doesn't work either:
Code:
<!DOCTYPE html>
<html>
<head>
<script type="text/javascript">
function do_something(){
var x = document.getElementById("wiki_page").contentWindow.document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;
document.getElementById("the_p").innerHTML = x;
}
</script>
</head>
<body>
<iframe id="wiki_page" src="http://en.wikipedia.org/wiki/Main_Page"></iframe>
<input type="button" onclick="do_something()" />
<p id="the_p"></p>
</body>
</html>
Can you yourself get it to work using that method?
I mean, if you've been using the "more correct ways" you mentioned for a sufficiently long time, it's possible that this "simpler way" no longer works and/or you forgot some vital component yourself, maybe... ?
Posts: 163
Time spent in forums: 1 Day 13 h 18 m 54 sec
Reputation Power: 17
I'm not sure why, because the script works if it is pointed at another file, say test2.html, that contains the <div> and <a>.
Posts: 163
Time spent in forums: 1 Day 13 h 18 m 54 sec
Reputation Power: 17
And the script works when executed directly on Wikipedia's site by entering javascript: alert(document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML); into the address bar.
Quote:
Originally Posted by portcitysoftwar
I'm not sure why, because the script works if it is pointed at another file, say test2.html, that contains the <div> and <a>.
on en.wikipedia.org/wiki/Main_Page in the Scratchpad (using Firefox 17, so address bar trick doesn't work), and it certainly gets something like `4,140,409`...
But other Scratchpad tests, on test.html:
Code:
var x = document.getElementById("wiki_page").contentWindow.document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;
alert(x);
/*
Exception: Permission denied to access property 'document'
@Scratchpad:11
*/
Code:
var list = "";
for (var i in document.getElementById("wiki_page").contentWindow){
list += ", " + i;
}
alert(list);
/*
Exception: Not allowed to enumerate cross origin objects
@Scratchpad:15
*/
So... you don't know how to do it either, or do you have a different method?
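For what it's worth, the "Permission denied" exception above is the browser refusing to let a page read the document of a frame that was loaded from a different site. Here is a minimal sketch, assuming the same element ids as in test.html, that catches the failure and returns null instead of dying on it. `readArticleCount` is a hypothetical helper name, not something from this thread.

```javascript
// Hypothetical helper: try to read the article count out of an iframe.
// Returns the anchor's innerHTML, or null if the frame cannot be read.
function readArticleCount(iframe) {
  try {
    // For a frame from another site, touching .document throws
    // (Firefox reports "Permission denied to access property 'document'").
    var doc = iframe.contentWindow.document;
    var div = doc.getElementById("articlecount");
    if (!div) {
      return null; // the frame is readable, but has no such div
    }
    var link = div.getElementsByTagName("a")[0];
    return link ? link.innerHTML : null;
  } catch (err) {
    // Access was refused (or something else went wrong): report "no value".
    return null;
  }
}
```

In test.html this would be called as readArticleCount(document.getElementById("wiki_page")). With the Wikipedia URL in the iframe it should return null; with a page from the same site as test.html it should return the anchor text.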
Posts: 7,947
Time spent in forums: 2 Months 10 h 37 m 24 sec
Reputation Power: 7053
You can't retrieve the HTML for an arbitrary URL using JavaScript; you can only retrieve HTML for a URL that resides on the same domain as the JavaScript code being run. This also applies to iframes. The iframe can point to a remote domain, but you won't be able to access its DOM using JavaScript.
The only way you can do this is using a server side script, hosted on the same domain as the JavaScript, to proxy the request to the remote server.
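To make "same domain" a bit more concrete: roughly speaking, the browser compares the scheme, host, and port of the two URLs, and all three have to match. The sketch below is only an approximation of that check (real browsers have extra rules for special cases), `sameOrigin` is a hypothetical name, and example.com is a placeholder for wherever test.html is hosted.

```javascript
// Rough approximation of the browser's same-origin comparison:
// two URLs share an origin when scheme, host, and port all match.
function sameOrigin(urlA, urlB) {
  const a = new URL(urlA);
  const b = new URL(urlB);
  // URL normalizes a default port (80 for http, 443 for https) to "".
  return a.protocol === b.protocol &&
         a.hostname === b.hostname &&
         a.port === b.port;
}

// Placeholder URLs: example.com stands in for your own site.
const page = "http://example.com/test.html";
console.log(sameOrigin(page, "http://example.com/test2.html"));
console.log(sameOrigin(page, "http://en.wikipedia.org/wiki/Main_Page"));
```

The first comparison is the test2.html case mentioned earlier in the thread, the second is the Wikipedia iframe.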
Posts: 9
Time spent in forums: 2 h 4 m 56 sec
Reputation Power: 0
Quote:
Originally Posted by E-Oreo
The only way you can do this is using a server side script, hosted on the same domain as the JavaScript, to proxy the request to the remote server.
Thank you, but... I don't know how to use that information.
What I'm trying to do with this exercise is just grab information from the HTML of an arbitrary website (like the article count from the Wikipedia main page), and have it show up in my test page when I press the button.
I haven't been able to google any information that looks like it explains how to do that (and why it must be done that way), using just the key words you used there...
Posts: 1,881
Time spent in forums: 1 Month 2 Weeks 2 Days 9 h 28 m 59 sec
Reputation Power: 813
You cannot access Wikipedia from your own page using JavaScript. You just cannot. Anybody who tells you differently has no idea what he/she is talking about.
This restriction is called the same-origin policy, and it's the reason why we can visit websites without having to worry that they make transactions with our PayPal account, buy things with our Amazon account and whatnot.
Just think about this for a second: The counter on the Wikipedia page might as well be account information on your online banking page, so there's a good reason not to allow JavaScript to access other websites and then fetch data.
What you can do is what E-Oreo already said: You can have your own server make a request to Wikipedia and then give back the result. This isn't all that trivial, however. You need to use a script on your server (written in PHP, for example) and then call it via AJAX. If you want concrete help, we need to know your server setup (do you have PHP/Ruby/Python/Perl/... installed?) and your current knowledge regarding this topic.
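As a rough illustration of the "own server makes the request" idea, here is a minimal sketch written for Node.js, chosen only so the example stays in JavaScript; PHP, Python, and the others would do the same job. Several things are assumptions: the `/article-count` path, the function names, Node 18+ for the built-in `fetch`, and that the count sits in the first anchor inside the div with id "articlecount", as the code earlier in the thread expects.

```javascript
const http = require("http");

// Pull the text of the first <a> inside <div id="articlecount"> out of raw HTML.
// A regex is enough for a sketch; a real HTML parser would be more robust.
function extractArticleCount(html) {
  const pattern = /<div[^>]*id="articlecount"[^>]*>[\s\S]*?<a\b[^>]*>([^<]*)<\/a>/;
  const match = pattern.exec(html);
  return match ? match[1].trim() : null;
}

// Download a page on the server side, where the same-origin policy does not apply.
// Assumes Node 18 or newer, which provides a global fetch().
async function fetchPage(url) {
  const response = await fetch(url);
  return response.text();
}

// Build a tiny server that answers GET /article-count with the extracted number.
// The page fetcher is passed in so that it can be swapped out.
function createProxyServer(getPage) {
  return http.createServer(async (req, res) => {
    const headers = { "Content-Type": "text/plain; charset=utf-8" };
    if (req.url !== "/article-count") {
      res.writeHead(404, headers);
      res.end("not found");
      return;
    }
    try {
      const html = await getPage("http://en.wikipedia.org/wiki/Main_Page");
      const count = extractArticleCount(html);
      res.writeHead(count ? 200 : 502, headers);
      res.end(count || "article count not found");
    } catch (err) {
      res.writeHead(502, headers);
      res.end("upstream request failed");
    }
  });
}

// To run it (port 8080 is an arbitrary choice):
// createProxyServer(fetchPage).listen(8080);
```

If test.html is served from that same server, do_something() can request /article-count via AJAX and put the response text into the_p, because that request is same-origin.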
Posts: 163
Time spent in forums: 1 Day 13 h 18 m 54 sec
Reputation Power: 17
Sorry about that. I had no idea of such restrictions, as I haven't attempted such a thing before. I just knew that within my own domain I have been able to use JavaScript to access other pages within an iframe.
Posts: 9
Time spent in forums: 2 h 4 m 56 sec
Reputation Power: 0
Quote:
Originally Posted by portcitysoftwar
Sorry about that. I had no idea of such restrictions, as I haven't attempted such a thing before. I just knew that within my own domain I have been able to use JavaScript to access other pages within an iframe.
That's okay. I guess the moral is that we should always test our own advice before we give it. xD Give a concrete example of code that you *know* works. Concrete is easier to understand anyway...
E-Oreo and Jacques1, thank you. It sounds like this is just a much more advanced subject than it first appears.
But I could eventually achieve the effect with, say, Python? Meaning "Django" or...? (And if I used Ruby, that would mean Rails?) What about jQuery? (I don't know anything about Perl other than fundamental regexardry.)
All that being said...
Quote:
Originally Posted by Jacques1
Just think about this for a second: The counter on the Wikipedia page might as well be account information on your online banking page, so there's a good reason not to allow JavaScript to access other websites and then fetch data.
No, thinking about that with the information I'm limited to, there is no way to logically derive that conclusion.
I can browse the web to any public page and look at the source.
Why shouldn't my javascript be able to?
If I try to go to my online banking page, I must know my identification information and password for an account in order to view it.
Why wouldn't any javascript have the same limitation?
That's the obvious conclusion (albeit meta-obviously wrong) from the information I have.
So...
That's a "fake explanation" you gave me, isn't it? ;P
But do you think you *could* communicate the information necessary to logically derive that "there would be a security issue" conclusion to someone at my current level?
Last edited by Owen_R : January 14th, 2013 at 04:10 PM.
Reason: typo