#1
Search PDF's and return Match w/ PDF file link
I've been looking around and I'm not too sure how to do this.
I have hundreds of PDF files that are inconsistent in format. I need to create a search engine that searches through every PDF file thoroughly and returns the matches along with a link to each matching PDF file. Are there any particular modules I will need? Actually, if possible, I would rather do this with text files; I can convert all my PDFs to text files if that makes things easier. I know I have Word docs as well. Installing new modules is a lot of trouble. Thanks.

So far I have this, which only reads one particular file and prints it:

    # Open the file for reading; stop with the system error if that fails
    open(MYFILE, '<', 'textfile.txt') or die "Cannot open textfile.txt: $!";
    while (<MYFILE>) {
        chomp;          # strip the trailing newline
        print "$_\n";   # echo the line
    }
    close(MYFILE);

Last edited by sushi23 : September 12th, 2006 at 06:40 PM.
#2
help
#3
http://www.perlfect.com <--answer
#4
I would pay them, but I don't have much time... oh well... I'll keep searching.
#5
http://www.perlfect.com/freescripts/ Last I heard it still indexed PDFs, even the free version ...
#6
I already have an interface that I built myself, which searches against a database. I need it to read a directory of PDFs and return the matching results. I'm very much a beginner at this. Any sample code or direction would help.
#7
have a look at the indexer.pl script, and search for the PDF functionality
#8
wow that is extremely complicated.
#9
Last edited by Axweildr : September 12th, 2006 at 09:46 PM.
#10
Good news. I got the script to read a directory of files and print all of their contents. Small steps at a time.
Now I need to create a search.
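
For the search step, here is a minimal sketch using only core Perl, so no modules to install. It assumes (none of this is confirmed in the thread) that each PDF has been converted to a same-named .txt file sitting in one directory, and that a plain case-insensitive substring match is good enough. `search_dir` is a made-up name.

```perl
use strict;
use warnings;

# Return one "name.pdf (line N): text" string for every line that contains
# $term in any .txt file directly inside $dir.
sub search_dir {
    my ($dir, $term) = @_;
    my @hits;

    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @files = sort grep { /\.txt\z/i } readdir($dh);
    closedir($dh);

    for my $file (@files) {
        # Assumed naming scheme: report.txt was converted from report.pdf
        (my $pdf = $file) =~ s/\.txt\z/.pdf/i;

        open(my $fh, '<', "$dir/$file") or die "Cannot open $dir/$file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            # index() returns -1 when the substring is absent
            next if index(lc $line, lc $term) < 0;
            # $. holds the current line number of the file being read
            push @hits, "$pdf (line $.): $line";
        }
        close($fh);
    }
    return @hits;
}

# Command-line use; skipped unless exactly two arguments are given
if (@ARGV == 2) {
    my ($dir, $term) = @ARGV;
    print "$_\n" for search_dir($dir, $term);
}
```

Saved under a hypothetical name like search.pl, it would be run as `perl search.pl ./textfiles invoice`. In a CGI page the PDF name could be wrapped in a link instead of printed bare.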
#11
Ah, the easy part. What ideas have you got?
#12
Hmm... now I'm thinking about reading the text file into an array and splitting it on spaces or tabs. I have tried both, but they give the same output: when I print [0] or [1], I get an entire sentence instead of a single word. I have to resolve this.
Anyway, even if I get that working, the search will probably take a very long time. But I have another idea. I want to add up the ASCII codes of the letters in each word, have the script match on those totals first, and then compare only the words whose totals matched, letter by letter. That way it would speed things up... I hope. How can I do this with Perl?
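
On the splitting problem: a guess, since the script isn't posted, is that the file was read with something like `@lines = <MYFILE>`, which puts one whole line in each array element, so [0] and [1] come out as sentences. A minimal sketch of splitting each line into words instead, again core Perl only; `words_in` is a made-up helper name.

```perl
use strict;
use warnings;

# Break one line of text into lowercase words.
sub words_in {
    my ($line) = @_;
    # split with a single-space string splits on any run of whitespace
    # (spaces, tabs, newlines) and ignores leading whitespace
    my @words = split ' ', lc $line;
    # Trim punctuation stuck to either end of a word, e.g. "fox," -> "fox"
    s/^[^a-z0-9]+|[^a-z0-9]+$//g for @words;
    # Drop anything that was punctuation only
    return grep { length } @words;
}

my @words = words_in("  The quick\tbrown fox, again.\n");
print "$words[0] | $words[1] | $words[4]\n";   # the | quick | again
```

Calling this on each line inside the `while` loop gives one array of words per line, which can then be compared against the search term.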
#13
That's not going to work. If you convert each character to its ASCII code and sum the codes, there's no guarantee that two different words won't have the same total. On the contrary, it's almost certain that a large number of different words will all sum to the same value, which will make the search massively inaccurate.
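
A quick way to see the collisions, as a small sketch (`ascii_sum` is just an illustrative name for the proposed total):

```perl
use strict;
use warnings;

# Sum of the character codes of a word: the proposed "ASCII total".
sub ascii_sum {
    my ($word) = @_;
    my $sum = 0;
    $sum += ord($_) for split //, $word;
    return $sum;
}

# Anagrams always collide, because addition ignores letter order ...
print ascii_sum("listen"), " ", ascii_sum("silent"), "\n";   # 655 655
# ... and so do words that share no letters at all
print ascii_sum("it"), " ", ascii_sum("no"), "\n";           # 221 221
```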
#14
But wouldn't it cut down the next search a lot? I meant: first search by ASCII total, then run a second, normal search over only the words that matched on the total. OK, if not this... any ideas? I'm still having trouble splitting the contents of the file into words.
Last edited by sushi23 : September 14th, 2006 at 02:39 PM.
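
One idea, offered as a sketch and not as what anyone in this thread actually used: a Perl hash already does the "cheap number first, exact compare second" trick internally, so the words can go straight in as hash keys. The code below builds a small word-to-files index (often called an inverted index). The file names and text are made-up sample data, and `build_index` and `lookup` are hypothetical names.

```perl
use strict;
use warnings;

# Build word -> { file => count } from a list of file name => text pairs.
# In real use the text would come from reading each converted .txt file.
sub build_index {
    my (%text_of) = @_;
    my %index;
    for my $file (keys %text_of) {
        for my $word (split ' ', lc $text_of{$file}) {
            $word =~ s/^[^a-z0-9]+|[^a-z0-9]+$//g;   # trim punctuation
            next unless length $word;
            $index{$word}{$file}++;
        }
    }
    return \%index;
}

# Look a word up: one hash access, no rescan of the files.
# Returns the sorted file names, or an empty list if the word is unknown.
sub lookup {
    my ($index, $word) = @_;
    my $files = $index->{ lc $word } or return;
    return sort keys %$files;
}

# Hypothetical sample data standing in for the converted PDFs
my $index = build_index(
    'manual.pdf'  => "Install the pump. Check the valve.",
    'invoice.pdf' => "Pump, qty 2",
);
print join(", ", lookup($index, 'pump')),  "\n";   # invoice.pdf, manual.pdf
print join(", ", lookup($index, 'valve')), "\n";   # manual.pdf
```

The index is built once per run here. For hundreds of files it may be worth saving it to disk between searches, but that is outside this sketch.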