Beginner Programming
 
JavaScript to get some information from inside the HTML at an arbitrary URL?

#1  Owen_R, January 9th, 2013, 09:58 PM
So I have a file `test.html` on my desktop.

Code:
<!DOCTYPE html>
<html>
<head>
	<script type="text/javascript">

function do_something(){
	document.getElementById("the_p").innerHTML = "X";
}
	
	</script>
</head>

<body>
	
	<input type="button" onclick="do_something()" />
	<p id="the_p"></p>

</body>
</html>


So, when I load test.html in my browser, and press the button, `X` appears on the page (obviously).

Rather than putting `X` on the page, what I want it to do is get (for a random test example) the article count from the HTML source of ` http://en.wikipedia.org/wiki/Main_Page `, and put that instead.

(The article count in the source of the page is in a div that looks like this:

Code:
<div id="articlecount" style="font-size:85%;"><a href="/wiki/Special:Statistics" title="Special:Statistics">4,140,344</a> articles in <a href="/wiki/English_language" title="English language">English</a></div>

)

So when I press the button, `4,140,344` (or whatever it was up to by that point) would show up instead.

How do I do this? And/or what terminology should I search for to find documentation on the subject?

#2  portcitysoftwar, January 9th, 2013, 10:11 PM
Well, I'm guessing you're learning JavaScript, so I'm going to give you what I believe to be the simplest solution without giving you the source code. There are more correct ways, but I believe this to be the simplest.

I would include the wiki page in your HTML using <iframe id="WikiPage" style="display:none" src="WIKIPAGE"></iframe>. This will include the page inside your page's DOM structure, allowing you to read and write it with JavaScript without having to load the page into a variable with a script.

Now look at where the count is located. It is inside an anchor without an id, which is inside a div with an id. So, using getElementById, we select our iframe, then further select the div using getElementById, then finally select the anchor using getElementsByTagName. Your number will then be in the innerHTML property. Now you can play around with that and see if you can get anywhere.

#3  Owen_R, January 9th, 2013, 10:49 PM
So are you saying that this should work? :

Code:
<!DOCTYPE html>
<html>
<head>
	<script type="text/javascript">

function do_something(){
	var x = document.getElementById("wiki_page").getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;
	document.getElementById("the_p").innerHTML = articlecount;
}

	</script>
</head>

<body>

	<iframe id="wiki_page" href="http://en.wikipedia.org/wiki/Main_Page"></iframe>
	
	<input type="button" onclick="do_something()" />
	<p id="the_p"></p>

	</body>
</html>


It doesn't...

(Although if I copy the

Code:
<div id="articlecount" style="font-size:85%;"><a href="/wiki/Special:Statistics" title="Special:Statistics">4,140,344</a> articles in <a href="/wiki/English_language" title="English language">English</a></div>


bit to the body of test.html, then

Code:
var x = document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;


gets the string "4,140,344" correctly.)

#4  portcitysoftwar, January 9th, 2013, 10:52 PM
Remember, the <div> is not directly a child of the <iframe>; it is a child of the document within the iframe.

So use getElementById["wiki_page"].contentWindow.document.getElementById........

#5  Owen_R, January 9th, 2013, 11:06 PM
So:

Code:
document.getElementById["wiki_page"].contentWindow.document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML


should get "4,140,344" (or whatever it's up to)?

(I'm confused by the sudden use of square brackets for the `getElementById()` method. Was that just a typo? I tried it both ways...)

This doesn't work either:

Code:
<!DOCTYPE html>
<html>
<head>
	<script type="text/javascript">

function do_something(){
	var x = document.getElementById("wiki_page").contentWindow.document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;
	document.getElementById("the_p").innerHTML = x;
}
	</script>

</head>

<body>

	<iframe id="wiki_page" style="display:none" href="http://en.wikipedia.org/wiki/Main_Page"></iframe>
	
	<input type="button" onclick="do_something()" />
	<p id="the_p"></p>
	
	</body>
</html>



And if I get rid of `style="display:none"`, the frame that appears is empty.

Should it be?

Is there something wrong with:

Code:
<iframe id="wiki_page"  href="http://en.wikipedia.org/wiki/Main_Page"></iframe>


?

#6  portcitysoftwar, January 9th, 2013, 11:09 PM
Oops, the square brackets are a typo, and your iframe should use the src attribute, not href.

#7  Owen_R, January 9th, 2013, 11:15 PM
Yeah, I was pretty much certain the square brackets were a typo. ^^

Actually, changing the `href` to a `src` was one of the tweaks I already tried.

This doesn't work either:

Code:
<!DOCTYPE html>
<html>
<head>
	<script type="text/javascript">

function do_something(){
	var x = document.getElementById("wiki_page").contentWindow.document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;
	document.getElementById("the_p").innerHTML = x;
}
	</script>

</head>

<body>

	<iframe id="wiki_page" src="http://en.wikipedia.org/wiki/Main_Page"></iframe>
	
	<input type="button" onclick="do_something()" />
	<p id="the_p"></p>
	
	</body>
</html>



Can you yourself get it to work using that method?

I mean, if you've been using the "more correct ways" you mentioned for a sufficiently long time, it's possible that this "simpler way" no longer works and/or you forgot some vital component yourself, maybe... ?

#8  portcitysoftwar, January 9th, 2013, 11:44 PM
I'm not sure, because the script works if the iframe is pointed at another file, say test2.html, that contains the <div> and <a>.

#9  portcitysoftwar, January 9th, 2013, 11:46 PM
And the script works when executed directly on Wikipedia's site by entering javascript: alert(document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML); into the address bar.

Comments on this post:
Jacques1 disagrees: The script *only* works by manually executing it on the Wikipedia page. It doesn't work like you and the OP expect it to work. See "same origin policy".

#10  Owen_R, January 10th, 2013, 12:41 AM
Yeah, I just tried running
Code:
javascript: alert(document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML);

on en.wikipedia.org/wiki/Main_Page in the Scratchpad (using Firefox 17, so address bar trick doesn't work), and it certainly gets something like `4,140,409`...

But other Scratchpad tests, on test.html :

Code:
var x = document.getElementById("wiki_page").contentWindow.document.getElementById("articlecount").getElementsByTagName("a")[0].innerHTML;
alert(x);
/*
Exception: Permission denied to access property 'document'
@Scratchpad:11
*/


Code:
var list = "";
for (var i in document.getElementById("wiki_page").contentWindow){
    list += ", " + i;
}
alert(list);

/*
Exception: Not allowed to enumerate cross origin objects
@Scratchpad:15
*/
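
Both exceptions above are the browser refusing script access to a document loaded from a different origin than test.html. As a hedged sketch (the helper name `readFromFrame` is hypothetical and not from the thread), the access can be wrapped so that the refusal is reported as a value instead of an uncaught exception:

```javascript
// Hypothetical helper: try to read the text of an element inside an iframe.
// Returns the text, or null when the browser blocks access because the
// framed document comes from another origin (or the element is missing).
function readFromFrame(iframe, elementId) {
  let doc;
  try {
    // For a cross-origin frame this property access itself throws,
    // e.g. "Permission denied to access property 'document'".
    doc = iframe.contentWindow.document;
  } catch (err) {
    return null;
  }
  const el = doc ? doc.getElementById(elementId) : null;
  return el ? el.textContent : null;
}

// In the browser, with the iframe from test.html:
//   var count = readFromFrame(document.getElementById("wiki_page"), "articlecount");
// For a frame pointing at Wikipedia this yields null; for a frame pointing at
// a page served from the same origin it yields the element's text.
```

This does not get around the restriction; it only makes the failure easier to see and handle.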



So... you don't know how to do it either, or do you have a different method?

#11  E-Oreo, January 13th, 2013, 09:19 AM
You can't retrieve the HTML for an arbitrary URL using JavaScript; you can only retrieve HTML for a URL that resides on the same domain as the JavaScript code being run. This also applies to iframes. The iframe can point to a remote domain, but you won't be able to access its DOM using JavaScript.

The only way you can do this is using a server-side script, hosted on the same domain as the JavaScript, to proxy the request to the remote server.
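
To make "same domain" a little more concrete: browsers compare the full origin, meaning scheme, host, and port together. The following is a minimal sketch of that comparison, not the browser's actual implementation. The function name `sameOrigin` and the example file path are assumptions for illustration, and treating every non-HTTP URL as "never matching" is a simplification.

```javascript
// Hypothetical helper: compare two URLs by scheme + host + port,
// which the standard URL class exposes together as `origin`.
function sameOrigin(a, b) {
  const urlA = new URL(a);
  const urlB = new URL(b);
  // Simplifying assumption: only http/https pages have a comparable origin.
  // Pages opened from disk (file:) are treated as matching nothing.
  const network = ["http:", "https:"];
  if (!network.includes(urlA.protocol) || !network.includes(urlB.protocol)) {
    return false;
  }
  return urlA.origin === urlB.origin;
}

// Two pages on Wikipedia share an origin: true
console.log(sameOrigin("http://en.wikipedia.org/wiki/Main_Page",
                       "http://en.wikipedia.org/wiki/Special:Statistics"));

// A test.html on the desktop (path is made up) and Wikipedia do not: false
console.log(sameOrigin("file:///home/owen/Desktop/test.html",
                       "http://en.wikipedia.org/wiki/Main_Page"));
```

Note that a different scheme or port alone is enough to make two URLs on the same host count as different origins.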
#12  Owen_R, January 13th, 2013, 01:19 PM
Quote:
Originally Posted by E-Oreo
The only way you can do this is using a server-side script, hosted on the same domain as the JavaScript, to proxy the request to the remote server.


Thank you, but... I don't know how to use that information.

What I'm trying to do with this exercise is just grab information from the HTML of an arbitrary website (like the article count from the Wikipedia main page), and have it show up in my test page when I press the button.

I haven't been able to google any information that looks like it explains how to do that (and why it must be done that way), using just the key words you used there...

#13  Jacques1, January 13th, 2013, 02:05 PM
You cannot access Wikipedia from your own page using JavaScript. You just cannot. Anybody who tells you differently has no idea what he/she is talking about.

This restriction is called same origin policy, and it's the reason why we can visit websites without having to worry that they make transactions with our PayPal account, buy things with our Amazon account and whatnot.

Just think about this for a second: The counter on the Wikipedia page might as well be account information on your online banking page, so there's a good reason not to allow JavaScript to access other websites and then fetch data.

What you can do is what E-Oreo already said: You can have your own server make a request to Wikipedia and then give back the result. This isn't all that trivial, however. You need to use a script on your server (written in PHP, for example) and then call it via AJAX. If you want concrete help, we need to know your server setup (do you have PHP/Ruby/Python/Perl/... installed?) and your current knowledge regarding this topic.
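
As a rough sketch of the proxy idea, here is what the server-side half could look like in JavaScript on Node.js. That choice is an assumption: the thread suggests PHP, Ruby, Python, or Perl, and any of them would do. The names `extractArticleCount` and `createProxyServer`, the `/articlecount` path, and port 8000 are made up for illustration. The regular expression only works while Wikipedia's markup matches the snippet quoted earlier in the thread, and the code assumes a Node version that provides a global `fetch` (18 or later).

```javascript
// Pull the number out of markup like:
//   <div id="articlecount" ...><a ...>4,140,344</a> articles in ...</div>
// Returns the text of the first link inside the div, or null if not found.
function extractArticleCount(html) {
  const match = /<div id="articlecount"[^>]*>\s*<a [^>]*>([^<]*)<\/a>/.exec(html);
  return match ? match[1] : null;
}

// Build (but do not start) an HTTP server that fetches the Wikipedia main
// page on the server side, where the same origin policy does not apply,
// and answers with just the count as plain text.
function createProxyServer() {
  const http = require("http"); // loaded here so the helper above stays dependency-free
  return http.createServer(async (req, res) => {
    if (req.url !== "/articlecount") {
      res.writeHead(404, { "Content-Type": "text/plain" });
      res.end("not found");
      return;
    }
    try {
      const upstream = await fetch("https://en.wikipedia.org/wiki/Main_Page");
      const count = extractArticleCount(await upstream.text());
      res.writeHead(count ? 200 : 502, { "Content-Type": "text/plain" });
      res.end(count || "article count not found");
    } catch (err) {
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("upstream request failed");
    }
  });
}

// To run it: createProxyServer().listen(8000);
```

If test.html is served by that same server rather than opened from the desktop, the button handler can request `/articlecount` with an AJAX call and write the response text into `the_p`, because page and script endpoint then share an origin.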

#14  portcitysoftwar, January 14th, 2013, 02:45 PM
Sorry about that. I had no idea such restrictions existed, as I haven't attempted such a thing before. I just knew that within my own domain I have been able to use JavaScript to access other pages within an iframe.

#15  Owen_R, January 14th, 2013, 04:09 PM

That's okay. I guess the moral is that we should always test our own advice before we give it. xD Give a concrete example of code that you *know* works. Concrete is easier to understand anyway...

E-Oreo and Jacques1, thank you. It sounds like this is just a much more advanced subject than it first appears.

This is pretty much the extent of my current knowledge (what I can easily *do*, that is), so I guess I just need to study more before I can tackle this?

But I could eventually achieve the effect with, say, Python? Meaning "Django" or...? (And if I used Ruby, that would mean Rails?) What about jQuery? (I don't know anything about Perl other than fundamental regexardry.)

All that being said...

Quote:
Originally Posted by Jacques1
Just think about this for a second: The counter on the Wikipedia page might as well be account information on your online banking page, so there's a good reason not to allow JavaScript to access other websites and then fetch data.


No, thinking about that with the information I'm limited to, there is no way to logically derive that conclusion.

I can browse the web to any public page and look at the source.

Why shouldn't my javascript be able to?

If I try to go to my online banking page, I must know my identification information and password for an account in order to view it.

Why wouldn't any javascript have the same limitation?

That's the obvious conclusion (albeit meta-obviously wrong) from the information I have.

So...

That's a "fake explanation" you gave me, isn't it? ;P

But do you think you *could* communicate the information necessary to logically derive that "there would be a security issue" conclusion to someone at my current level?

Last edited by Owen_R : January 14th, 2013 at 04:10 PM. Reason: typo
