ColdFusion Development
 
#1
December 18th, 2012, 03:11 PM
dilbert12
Solr Search

Great site, this is my first post.

I am running CF9.
I have a folder of .doc and .docx files, and I want to search the file contents. I don't need the metadata, just the viewable content.
I have these same files listed in a MS SQL 2005 table.

What I'd like is to be able to search the contents of the .doc and .docx files and return the primary key and extension that are listed in the table.

Am I making this more complicated than it needs to be? I thought this would be easier, but it's making me crazy.

Thanks for the help

#2
December 18th, 2012, 05:37 PM
kiteless (Moderator)
If you think about what you're asking, this isn't easy at all. You have two separate sets of data (a directory of files, and a database table), and you want to search the content of the files but map the search results to a database row.

That said, I believe Example #5 (toward the bottom) of the cfindex page in the docs shows one way to blend database results and file path indexing.

Last edited by kiteless : December 18th, 2012 at 05:39 PM.

#3
December 18th, 2012, 07:46 PM
dilbert12
Thank you very much for your reply.

I had seen that reference, but I didn't see where to tell it where the files are.

OK, maybe I can make it easier. When I upload the files, I name them with the primary key from the same database table I want to later reference. So in the folder it might look like:
1.doc
2.docx
7.docx
9.doc
...


Is there a way to index the folder and use the filename as the key? Then I can use what the search returns to query the database based on the key which will match the primary key from the table.

Thanks again

#4
December 18th, 2012, 08:21 PM
kiteless (Moderator)
I've actually only used Solr to index database data OR documents, but not both of them at once. What you're talking about is probably possible, but it's an unusual case and is not the normal way indexing is done. Other than going over the docs and doing some trial and error, I'm afraid I can't offer much on this specific scenario.

#5
December 18th, 2012, 08:24 PM
dilbert12
OK, thanks, but in the second proposal I think I was only asking how to index the docs, all of the docs in a single folder.
I'll do some more searching.

#6
December 19th, 2012, 08:51 AM
kiteless (Moderator)
Ah, Example #2 on the cfindex docs page shows indexing a file path.

#7
December 19th, 2012, 02:38 PM
dilbert12
It worked!!!

In case anyone else tries something similar, and I understand this is pretty basic, but here is what I did.

Here is the index I'll run nightly to keep the search updated.

<cfindex
collection="docs"
action="refresh"
type="path"
key="my path"
extensions=".doc, .docx"
URLpath="my URL">
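
For the nightly run, one option is a scheduled task that requests the page holding the cfindex call above. This is only a sketch: the task name, the URL, and the 2 AM start time are made-up placeholders, not anything from my setup, and the same task could just as well be created in the ColdFusion Administrator.

```cfml
<!--- Sketch only: the task name and URL are hypothetical placeholders.
      The URL should point at the .cfm page that contains the cfindex call. --->
<cfschedule
    action="update"
    task="RefreshDocsIndex"
    operation="HTTPRequest"
    url="http://localhost/admin/refreshDocsIndex.cfm"
    startDate="#DateFormat(Now(), 'mm/dd/yyyy')#"
    startTime="02:00 AM"
    interval="daily">
```

Using action="update" creates the task if it does not exist yet and overwrites it otherwise, so the page that registers it can be run more than once.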

Then from the Key I can get the file name:
<cfset FileName=GetFileFromPath(Key)>

I'll use that variable in a query to get a list of documents from the database.

The one hiccup I had is getting the number from the key. I used something similar in another search, but all of the documents were .pdf so I could simply strip off the 4 characters on the right. I don't have anything like that this time. The file might be 1.doc or 1342.docx.
So I used this to get the number portion of the Key:
<cfset SearchFile = ReReplaceNoCase(FileName,"[^0-9,]","","ALL")>
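
Another way to get the number, assuming every file really is named as primary key plus extension, is to treat the file name as a dot-delimited list and take the first element. A rough sketch follows; it assumes Key is already in scope (for example inside a loop over the search results), and the digits-only check is an extra guard I am adding for illustration.

```cfml
<!--- Alternative sketch: take everything before the first dot,
      e.g. "1342.docx" gives "1342" and "1.doc" gives "1". --->
<cfset FileName = GetFileFromPath(Key)>
<cfset SearchFile = ListFirst(FileName, ".")>

<!--- Guard against any file that was not named after a primary key --->
<cfif NOT ReFind("^[0-9]+$", SearchFile)>
    <cfset SearchFile = "">
</cfif>
```

Unlike the regex above, this keeps nothing but the leading name portion, so a stray comma or a digit inside an odd extension cannot end up in the value passed to the query.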

Then I'll query the database to find all of those stripped Keys that will match the primary key.
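
Putting the pieces together, the search and the lookup might look roughly like the sketch below. The form field, the datasource name "myDSN", and the documents table with doc_id and extension columns are all hypothetical names used for illustration; only the "docs" collection comes from the index above. The regex here is a variant of the one above without the comma, so that only digits survive.

```cfml
<!--- Sketch only: form.criteria, "myDSN", and the documents table with
      doc_id/extension columns are hypothetical names. --->
<cfsearch collection="docs" criteria="#form.criteria#" name="hits">

<!--- Build a list of primary keys from the file names in the results --->
<cfset idList = "">
<cfloop query="hits">
    <cfset fileName = GetFileFromPath(hits.key)>
    <cfset docId = ReReplace(fileName, "[^0-9]", "", "ALL")>
    <cfif Len(docId)>
        <cfset idList = ListAppend(idList, docId)>
    </cfif>
</cfloop>

<!--- Skip the query when nothing matched; an empty IN () is invalid SQL --->
<cfif ListLen(idList)>
    <cfquery name="matches" datasource="myDSN">
        SELECT doc_id, extension
        FROM documents
        WHERE doc_id IN (
            <cfqueryparam value="#idList#" cfsqltype="cf_sql_integer" list="true">
        )
    </cfquery>
</cfif>
```

Passing the IDs through cfqueryparam with list="true" keeps the values bound as integers instead of concatenating them into the SQL string.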

Thank you for your patience and guidance.

Cliff
