December 18th, 2012, 03:11 PM
Great site, this is my first post.
I am running CF9.
I have a folder of .doc and .docx files whose contents I want to search. I don't need the metadata, just the viewable content.
I have these same files listed in a MS SQL 2005 table.
What I'd like is to be able to search the contents of the .doc and .docx and return the primary key and extension that are listed in the table.
Am I making this more complicated than it needs to be? I thought this would be easier, but it's making me crazy.
Thanks for the help
December 18th, 2012, 05:37 PM
If you think about what you're asking, this isn't easy at all. You have two separate sets of data (a directory of files, and a database table), and you want to search the content of the files but map the search results to a database row.
That said, I believe Example #5 (toward the bottom) of the cfindex page in the docs shows one way to blend database results and file path indexing.
Last edited by kiteless; December 18th, 2012 at 05:39 PM.
December 18th, 2012, 07:46 PM
Thank you very much for your reply.
I had seen that reference, but I didn't see where it specifies the location of the files.
OK, maybe I can make it easier. When I upload the files, I name them with the primary key from the same database table I want to later reference. So in the folder it might look like:
Is there a way to index the folder and use the filename as the key? Then I can use what the search returns to query the database based on the key which will match the primary key from the table.
December 18th, 2012, 08:21 PM
I've actually only used Solr to index database data OR documents, but not both of them at once. What you're talking about is probably possible, but it's an unusual case and is not the normal way indexing is done. Other than going over the docs and doing some trial and error, I'm afraid I can't offer much on this specific scenario.
December 18th, 2012, 08:24 PM
OK, thanks, but in the second proposal I think I was only asking how to index the docs, all of the docs in a single folder.
I'll do some more searching.
December 19th, 2012, 08:51 AM
Ah, the Example #2 on the cfindex docs page shows indexing a file path.
December 19th, 2012, 02:38 PM
In case anyone else tries something similar (and I understand this is pretty basic), here is what I did.
Here is the index I'll run nightly to keep the search updated.
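A minimal sketch of what a nightly path index like this could look like. The collection name and folder path are placeholders, and it assumes the collection has already been created:

```cfm
<!--- Assumptions: "docSearch" and the folder path below are hypothetical;
      the collection must already exist before this runs. --->
<cfindex collection="docSearch"
         action="refresh"
         type="path"
         key="D:\uploads\docs"
         extensions=".doc,.docx"
         recurse="no">
```

Using action="refresh" clears the collection and re-indexes the folder, so files deleted since the last run should drop out of the search results. The extensions attribute matters here because .doc and .docx are not in the default list.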
Then from the Key I can get the file name:
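Roughly, the search step might look like the sketch below. The collection name and the form field are assumptions, and it relies on the key of a path index being the full path of each file:

```cfm
<!--- Assumptions: "docSearch" and form.searchTerm are placeholder names. --->
<cfparam name="form.searchTerm" default="">
<cfset FileName = "">
<cfif Len(Trim(form.searchTerm))>
    <cfsearch name="docResults" collection="docSearch" criteria="#form.searchTerm#">
    <cfloop query="docResults">
        <!--- Strip the directory part of the key, leaving e.g. 1342.docx --->
        <cfset FileName = ListAppend(FileName, GetFileFromPath(docResults.key))>
    </cfloop>
</cfif>
```

With this sketch, FileName ends up as a comma-separated list of the matching file names, or an empty string when nothing was searched or found.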
I'll use that variable in a query to get a list of documents from the database.
The one hiccup I had is getting the number from the key. I used something similar in another search, but all of the documents were .pdf so I could simply strip off the 4 characters on the right. I don't have anything like that this time. The file might be 1.doc or 1342.docx.
So I used this to get the number portion of the Key:
<cfset SearchFile = ReReplaceNoCase(FileName,"[^0-9,]","","ALL")>
Then I'll query the database to find all of those stripped Keys that will match the primary key.
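As a sketch of that last step, assuming FileName holds a comma-separated list of matching file names: the datasource, table, and column names below are made up for illustration, so substitute your own.

```cfm
<!--- Assumptions: "myDSN", "documents", "id" and "extension" are hypothetical
      names; FileName is a sample value standing in for real search results. --->
<cfset FileName = "1.doc,1342.docx">
<!--- Keep only digits and commas, which turns the list into 1,1342 --->
<cfset SearchFile = ReReplaceNoCase(FileName, "[^0-9,]", "", "ALL")>
<cfif ListLen(SearchFile)>
    <cfquery name="matchingDocs" datasource="myDSN">
        SELECT id, extension
        FROM documents
        WHERE id IN (<cfqueryparam value="#SearchFile#" cfsqltype="cf_sql_integer" list="yes">)
    </cfquery>
</cfif>
```

The ListLen check skips the query when the search returned nothing, and cfqueryparam with list="yes" binds each ID as an integer instead of pasting the raw string into the SQL.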
Thank you for your patience and guidance.