Perl Programming
 
#1 sushi23, September 11th, 2006, 03:24 PM
Search PDF's and return Match w/ PDF file link

I've been looking around and I'm not sure how to do this.
I have hundreds of PDF files with inconsistent formatting. I need to create a search engine that thoroughly searches the full text of every PDF file and returns the matches along with a link to each matching PDF.

Are there any particular modules I will need?
Actually, if possible, I'd rather do this with text files. I can convert all my PDFs to text files if that makes things easier. I also have some Word docs.
Installing new modules is a lot of trouble.

Thanks.

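So far I have this, which only reads one particular file and prints it:

use strict;
use warnings;

# Read textfile.txt line by line and print each line
open(my $fh, '<', 'textfile.txt') or die "Cannot open textfile.txt: $!";
while (my $line = <$fh>) {
    chomp $line;
    print "$line\n";
}
close($fh);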
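A minimal sketch of how the loop above could grow into a per-file search, assuming plain-text input and a case-insensitive substring match. `grep_file` is a hypothetical helper name, not part of any module; it uses only core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: return [line_number, line_text] pairs for every
# line of $path that contains $term, ignoring case. quotemeta stops
# characters such as "." or "(" in the term from acting as regex syntax.
sub grep_file {
    my ($path, $term) = @_;
    my $pattern = quotemeta $term;
    my @hits;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        # $. holds the current line number of the last-read filehandle
        push @hits, [ $., $line ] if $line =~ /$pattern/i;
    }
    close($fh);
    return @hits;
}

# Only runs when given a file name and a search term on the command line
if (@ARGV == 2) {
    for my $hit (grep_file(@ARGV)) {
        print "$hit->[0]: $hit->[1]\n";
    }
}
```

Usage: `perl grep_file.pl textfile.txt invoice` (script name and search term are placeholders).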

Last edited by sushi23 : September 12th, 2006 at 06:40 PM.

#2 sushi23, September 11th, 2006, 07:07 PM
help

#3 Axweildr, September 11th, 2006, 07:38 PM
http://www.perlfect.com <--answer

#4 sushi23, September 11th, 2006, 08:09 PM
I would pay them, but I don't have much time... oh well, I'll keep searching.

#5 Axweildr, September 11th, 2006, 08:18 PM
http://www.perlfect.com/freescripts/. Last I heard, it still indexed PDFs, even in the free version...

#6 sushi23, September 12th, 2006, 05:19 PM
I already have an interface I built myself that searches against a database. I need it to read a directory of PDFs and match results. I'm very much a beginner at this. Any sample code or direction would help.
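A minimal sketch of the directory part, assuming each `report.pdf` has already been converted to a `report.txt` in the same directory, and that the web server serves the PDFs under a hypothetical `/pdfs/` URL prefix. `search_dir`, `html_escape` and `pdf_link` are illustrative names, not an existing API; only core modules are used, in line with the wish to avoid installing anything.

```perl
use strict;
use warnings;
use File::Spec;

# Assumed layout: every report.pdf has a report.txt next to it.
# Search the .txt files and return the names of the matching PDFs.
sub search_dir {
    my ($dir, $term) = @_;
    my $pattern = quotemeta $term;
    my @matches;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @txt_files = sort grep { /\.txt\z/i } readdir($dh);
    closedir($dh);

    for my $txt (@txt_files) {
        my $path = File::Spec->catfile($dir, $txt);
        open(my $fh, '<', $path) or next;    # skip unreadable files
        local $/;                            # slurp the whole file at once
        my $content = <$fh>;
        close($fh);
        next unless defined $content && $content =~ /$pattern/i;
        (my $pdf = $txt) =~ s/\.txt\z/.pdf/i;
        push @matches, $pdf;
    }
    return @matches;
}

# Minimal HTML escaping so file names cannot inject markup.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# $base_url is an assumption: wherever the web server exposes the PDFs.
sub pdf_link {
    my ($pdf, $base_url) = @_;
    my $name = html_escape($pdf);
    return qq{<a href="$base_url$name">$name</a>};
}

# Only runs when given a directory and a search term on the command line
if (@ARGV == 2) {
    my ($dir, $term) = @ARGV;
    print pdf_link($_, '/pdfs/'), "<br>\n" for search_dir($dir, $term);
}
```

Every query rescans every file, which is simple but gets slower as the collection grows. File names containing spaces or other URL-special characters would also need URL-encoding in the `href` (for example with `URI::Escape`, which is not a core module).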

#7 Axweildr, September 12th, 2006, 06:55 PM
Have a look at the indexer.pl script and search for the PDF functionality.

#8 sushi23, September 12th, 2006, 09:07 PM
Wow, that is extremely complicated.

#9 Axweildr, September 12th, 2006, 09:43 PM
Turns out they use pdftotext anyhow.

It also shells out to antiword for Word documents.
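Since the indexer relies on external converters, here is a minimal sketch of doing the same conversion step directly. It assumes `pdftotext` (from Xpdf/Poppler) and `antiword` are installed and on the PATH, and a Unix-like system (the list form of a piped `open` is not available on every platform). `convert_dir` and `text_name_for` are hypothetical names. antiword handles the older binary `.doc` format, not `.docx`.

```perl
use strict;
use warnings;
use File::Spec;

# Map a source document name to the .txt name written next to it.
sub text_name_for {
    my ($file) = @_;
    (my $txt = $file) =~ s/\.(?:pdf|doc)\z/.txt/i;
    return $txt;
}

# Convert every .pdf/.doc in $dir to plain text: pdftotext for PDFs,
# antiword for Word documents (assumes both tools are installed).
sub convert_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @docs = grep { /\.(?:pdf|doc)\z/i } readdir($dh);
    closedir($dh);

    for my $doc (@docs) {
        my $src = File::Spec->catfile($dir, $doc);
        my $dst = File::Spec->catfile($dir, text_name_for($doc));
        if ($doc =~ /\.pdf\z/i) {
            # pdftotext writes to the output file named as 2nd argument;
            # list-form system() avoids shell quoting problems.
            system('pdftotext', $src, $dst) == 0
                or warn "pdftotext failed on $src\n";
        }
        else {
            # antiword prints the document text to standard output.
            open(my $in, '-|', 'antiword', $src)
                or do { warn "Cannot run antiword on $src: $!\n"; next };
            my $text = do { local $/; <$in> };
            close($in) or do { warn "antiword failed on $src\n"; next };
            open(my $out, '>', $dst) or die "Cannot write $dst: $!";
            print {$out} $text // '';
            close($out);
        }
    }
}

# Only converts when a directory is given on the command line
convert_dir($ARGV[0]) if @ARGV;
```

Converting once up front means the search itself only ever deals with `.txt` files.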

Last edited by Axweildr : September 12th, 2006 at 09:46 PM.

#10 sushi23, September 13th, 2006, 05:53 PM
Good news: I got the script to read a directory of files and print all their contents. Small steps at a time.

Now I need to create a search.

#11 Axweildr, September 13th, 2006, 08:23 PM
Ah, the easy part.

What ideas have you got?

#12 sushi23, September 14th, 2006, 02:05 PM
Hmm... now I'm thinking about reading the text file into an array and splitting it on spaces or tabs. I have tried both, but they give the same output: when I print element [0] or [1], I get an entire sentence. I have to resolve this.
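One likely explanation (a guess, since the splitting code isn't shown): if the file is read line by line into an array, each element is a whole line, so `[0]` prints a sentence rather than a word. A minimal sketch that extracts words instead; `words_from_text` and `words_from_file` are hypothetical helper names.

```perl
use strict;
use warnings;

# Split a string into words. /(\w+)/g keeps runs of letters, digits
# and underscores and drops spaces, tabs and punctuation, so "world,"
# becomes "world".
sub words_from_text {
    my ($text) = @_;
    return $text =~ /(\w+)/g;
}

# Reading a file line by line gives one element per LINE;
# splitting each line is what gives one element per WORD.
sub words_from_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @words;
    while (my $line = <$fh>) {
        push @words, words_from_text($line);
    }
    close($fh);
    return @words;
}
```

`split ' ', $line` (a single-space string) also splits on any run of spaces or tabs, but it leaves punctuation attached, so `world,` would stay `world,`.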

Anyway, even if I get that working, the search will probably take a very long time. BUT, I have another idea. I want to add up the ASCII codes of each word's letters and have the script match on those totals first, THEN take only the words whose totals matched, convert them back to letters and compare them word by word... this way it would speed things up, I hope.

How can I do this with Perl?

#13 Ctb, September 14th, 2006, 02:25 PM
That's not going to work. If you convert each character to its ASCII code and sum the codes, there's no guarantee that two different words won't have the same total. On the contrary, it's almost certain that a large number of different words will all sum to the same value, which will make the search massively inaccurate.
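A quick way to see the collision problem: any two words made of the same letters have the same total, and unrelated words can collide as well. A minimal sketch; `ascii_sum` is a hypothetical helper.

```perl
use strict;
use warnings;

# The "ASCII total" idea: sum the character codes of a word.
sub ascii_sum {
    my ($word) = @_;
    my $total = 0;
    $total += ord $_ for split //, $word;
    return $total;
}

# Anagrams always collide (dog and god both total 314), and unrelated
# words can too (ad and bc both total 197).
for my $pair (['dog', 'god'], ['listen', 'silent'], ['ad', 'bc']) {
    my ($x, $y) = @$pair;
    printf "%-7s %4d   %-7s %4d\n", $x, ascii_sum($x), $y, ascii_sum($y);
}
```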

#14 sushi23, September 14th, 2006, 02:32 PM
Quote:
Originally Posted by Ctb
That's not going to work. If you convert each character to its ASCII code and sum the codes, there's no guarantee that two different words won't have the same total. On the contrary, it's almost certain that a large number of different words will all sum to the same value, which will make the search massively inaccurate.


But wouldn't it cut down the next search a lot?
I meant: first search by ASCII total, then run a second, normal search over just those results. OK, if not this... any ideas?
I'm still having trouble splitting the file contents into words.
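The technique usually used for this is an inverted index, named here as a swapped-in approach rather than anything proposed in the thread: read every document once, record which files each word appears in, and answer a query with hash lookups instead of rescanning the files. A minimal sketch, assuming the text has already been extracted and is passed in as a hash of file name to content; `build_index` and `lookup` are hypothetical names.

```perl
use strict;
use warnings;

# Build an inverted index: lowercased word => { file name => count }.
# %docs maps a file name to its full text (for example, the .txt
# files produced from the PDFs).
sub build_index {
    my (%docs) = @_;
    my %index;
    for my $file (keys %docs) {
        for my $word ($docs{$file} =~ /(\w+)/g) {
            $index{lc $word}{$file}++;
        }
    }
    return \%index;
}

# Return the sorted file names containing every word of the query.
sub lookup {
    my ($index, $query) = @_;
    my @terms = map { lc $_ } ($query =~ /(\w+)/g);
    return () unless @terms;
    my $first = shift @terms;
    my %found = %{ $index->{$first} || {} };
    for my $term (@terms) {
        my $files = $index->{$term} || {};
        # Keep only files that also contain this term (AND search).
        delete $found{$_} for grep { !exists $files->{$_} } keys %found;
    }
    return sort keys %found;
}
```

Building the index is the slow part and only needs to happen when files change; the core `Storable` module (`store` / `retrieve`) is one way to save the hash to disk between runs.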

Last edited by sushi23 : September 14th, 2006 at 02:39 PM.

#15 Ctb, September 14th, 2006, 02:37 PM

© 2003-2012 by Developer Shed. All rights reserved.