August 12th, 2013, 02:09 PM
Making Text Files Look Neater
So, in addition to coding, I'm also something of a poet.
Over the years I've changed how I write the headers in my poetry files, as illustrated below:
Torn To Bits
Originally published June 21, 2011
Where's My Moment?
Published To Facebook November 13, 2010
Written August 4, 2011
Bathed In Blood And Glory
Inspired by Psalms 32
Written July 2, 2011 [This is the only date I actually want]
Originally published to facebook July 3, 2011
So, I had an idea about writing a program that would take these headers and turn them into the following uniform format:
[By My Name]
Written on [Date]
[Any additional notes that would originally have been mixed in the wrong space, like the 'Inspired by Psalms 32' found above]
(Series names are based on the folder each poem is in. The examples above don't happen to include them, although some headers do.)
I also want to create a text file in each subcategory, laid out like this:
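Once the date is known, producing the new header is the easy part. A minimal sketch, assuming the title stays on the first line and that the hypothetical `format_header` receives an already-parsed `datetime.date` plus any leftover note lines:

```python
from datetime import date


def format_header(title, author, written, notes=()):
    """Render a header in the uniform layout: title (assumed to stay on
    the first line), "By <author>", "Written on <date>", then any notes."""
    # f"{written:%B}" gives the full month name (English in the default
    # C locale); using .day avoids the zero padding that %d would add.
    when = f"{written:%B} {written.day}, {written.year}"
    return "\n".join([title, f"By {author}", f"Written on {when}", *notes])


print(format_header("Bathed In Blood And Glory", "Your Name",
                    date(2011, 7, 2), ["Inspired by Psalms 32"]))
```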
Name Date Written Series Name
Sorted not by name (which the folder listing already does by default) but by date written.
So, here are the questions I have:
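Sorting by date is straightforward once each poem has a parsed date. A rough sketch, assuming each poem is a file whose name (minus extension) is the poem's title and whose parent folder is its series; `build_index` and the sample paths are hypothetical:

```python
from datetime import date
from pathlib import Path


def build_index(entries):
    """Build (name, date_written, series) rows sorted by date written.

    `entries` is an iterable of (path, date_or_None) pairs. Assumes the
    file name without its extension is the poem name and the folder it
    sits in is the series. Undated poems are listed last.
    """
    rows = []
    for path, written in entries:
        p = Path(path)
        rows.append((p.stem, written, p.parent.name))
    # Key is (is undated, date, name): dated poems first, oldest to
    # newest, with the name breaking ties.
    rows.sort(key=lambda r: (r[1] is None, r[1] or date.min, r[0]))
    return rows


rows = build_index([
    ("Poems/Dark/Torn To Bits.txt", date(2011, 6, 21)),
    ("Poems/Dark/Where's My Moment.txt", date(2011, 8, 4)),
    ("Poems/Faith/Bathed In Blood And Glory.txt", date(2011, 7, 2)),
])
```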
How do I search those inconsistent headers, find the first line that contains a date, and pull out that date regardless of its format? I don't think I have purely numeric dates, but I'm not certain. I know there are one or two cases where the date includes only a year. If I could catch the cases that don't follow the usual format and review them separately to make judgement calls, that would also work.
How do I make a list with uniform column spacing that fits poem names of any length? My longest poem name is 20+ characters and my shortest is about 8, so I don't know how to tab them to a uniform width.
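One way to attack the first question with Python's `re` module is sketched below. It assumes dates are written as month name, day, year (as in the samples), prefers lines mentioning "Written" since that is the date you actually want, and hands year-only lines back for a manual decision. `find_written_date` is a hypothetical name:

```python
import re
from datetime import date

# Month lookup: full names plus three-letter abbreviations (and "sept").
# Abbreviations are an assumption about files not shown in the post.
_NAMES = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
MONTHS = {}
for number, name in enumerate(_NAMES, start=1):
    MONTHS[name] = number
    MONTHS[name[:3]] = number
MONTHS["sept"] = 9

# Matches "July 2, 2011", "November 13, 2010", "Aug. 4 2011".
FULL_DATE = re.compile(r"\b([A-Za-z]{3,9})\.?\s+(\d{1,2}),?\s+(\d{4})\b")
# Matches a bare year such as "2009" (1900-2099).
YEAR_ONLY = re.compile(r"\b(?:19|20)\d{2}\b")


def find_written_date(header_lines):
    """Return (date, line) for the best full date in a header.

    Lines mentioning "written" are checked first; otherwise the first
    line with a date wins. Returns (None, line) when a line holds only a
    year, so it can be reviewed by hand, and (None, None) if nothing
    date-like was found.
    """
    # sorted() is stable, so "written" lines move to the front in order.
    ordered = sorted(header_lines, key=lambda s: "written" not in s.lower())
    for line in ordered:
        for match in FULL_DATE.finditer(line):
            month = MONTHS.get(match.group(1).lower())
            if month is None:
                continue  # e.g. "Psalms 32, 2011": not a month name
            try:
                return date(int(match.group(3)), month, int(match.group(2))), line
            except ValueError:
                continue  # impossible day such as "February 30"
        if YEAR_ONLY.search(line):
            return None, line  # year only: flag for a judgement call
    return None, None
```

With the "Where's My Moment?" header, this picks August 4, 2011 rather than the earlier Facebook date, because the "Written" line is checked first.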
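For the second question, tabs are hard to line up because tab stops depend on the viewer. Padding with spaces to the widest entry in each column avoids that. A minimal sketch using `str.ljust`; `format_index` and its `gap` parameter are made up for illustration:

```python
def format_index(rows, gap=2):
    """Left-align every column to its widest entry plus `gap` spaces."""
    if not rows:
        return ""
    # Width of each column = length of its longest cell.
    widths = [max(len(str(row[i])) for row in rows) for i in range(len(rows[0]))]
    lines = []
    for row in rows:
        cells = [str(cell).ljust(width + gap) for cell, width in zip(row, widths)]
        lines.append("".join(cells).rstrip())  # drop trailing padding
    return "\n".join(lines)


print(format_index([
    ("Name", "Date Written", "Series Name"),
    ("Torn To Bits", "June 21, 2011", ""),
    ("Bathed In Blood And Glory", "July 2, 2011", ""),
]))
```

The same padding can also be written with a format string such as `f"{name:<{width}}"` if you prefer.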
Last edited by Mr909; August 12th, 2013 at 02:11 PM.
August 13th, 2013, 11:26 PM
I usually use flex and bison to parse grammars.
You might interpret this as "Use a finite state machine".
That is, if regular expressions don't work; a regular expression is itself an FSM. (Apologies for the fragments; I shouldn't post while drinking wine.) The dates look like the hard part, as I think you said (I haven't read your post today). In your case it may be easiest to find the parts that do not look like a date. Then there's the problem of converting the dates to a standard form, which is really a translation problem. Google's translation works with Bayes' law (Bayesian networks), with an enormous amount of data providing the probabilities.
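Taking the finite-state-machine suggestion literally, but in Python: before hunting for dates, a tiny state machine can separate each file's title and header block from the poem itself. This is a sketch that assumes the header block ends at the first blank line after the title; `split_poem` is a hypothetical helper:

```python
def split_poem(text):
    """Split a poem file into (title, header_lines, body_lines) using a
    three-state machine: TITLE -> HEADER -> BODY.

    Assumes the title is the first non-blank line and the header block
    ends at the first blank line after it.
    """
    state = "TITLE"
    title, header, body = None, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if state == "TITLE":
            if line:  # skip leading blank lines
                title = line
                state = "HEADER"
        elif state == "HEADER":
            if line:
                header.append(line)
            else:  # first blank line closes the header
                state = "BODY"
        else:  # BODY: keep poem lines untouched, including stanza breaks
            body.append(raw)
    return title, header, body
```

The header lines it returns are what a date-finding regex would then scan.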
August 30th, 2013, 04:23 PM
I want to do this... within Python.
It's not that I have problems with other frameworks, it's just that I kind of want to stick with this one.
Any way to approach this sort of thing with that caveat?
August 31st, 2013, 01:13 AM
You need to get very familiar with the "re" module, and regular expressions. Brace yourself.
Humans are very good at looking at a wide variety of information and inferring relationships of various kinds from subtleties. Computers need very tight rules, so if you want a computer to translate this stuff you need to tell it the rules.
In my experience with similar tasks (e.g., turning free-form spreadsheet data into clean, uniform RDBMS databases), you basically have to survey the data, develop a set of rules that matches the widest possible set of records, then check the output and fix the edge cases.
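Surveying the data can itself be automated a little. The sketch below collapses every header line to a rough shape (month names become MONTH, digit runs become N) and counts the shapes, so the rare formats stand out for manual review. The helper names are invented, and the month matcher will also catch words like "may" in ordinary prose, so treat the counts as a rough guide:

```python
import re
from collections import Counter

# Whole-word month names and common abbreviations, case-insensitive.
MONTH_WORD = re.compile(
    r"\b(?:january|february|march|april|may|june|july|august|september|"
    r"october|november|december|jan|feb|mar|apr|jun|jul|aug|sept?|oct|"
    r"nov|dec)\b\.?",
    re.IGNORECASE,
)
NUMBER = re.compile(r"\d+")


def line_shape(line):
    """Reduce a header line to a rough 'shape':
    'Written July 2, 2011' -> 'Written MONTH N, N'."""
    return NUMBER.sub("N", MONTH_WORD.sub("MONTH", line))


def survey(header_lines):
    """Count how often each shape occurs, most common first."""
    return Counter(line_shape(line) for line in header_lines).most_common()
```

Shapes that show up only once or twice are the edge cases worth checking by hand.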
There may be a "magic module" somewhere in a github repo that translates date strings to datetime objects, but I'm not specifically aware of one. Most likely you're just going to have to develop a regex.
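If a regex has already isolated the date text, the standard library's `datetime.strptime` can do the conversion by trying a list of known layouts in turn. (The third-party python-dateutil package also has `dateutil.parser.parse`, with a `fuzzy=True` option that skips surrounding words, if an extra dependency is acceptable.) A sketch; the format list is an assumption to be extended as new layouts turn up, and `parse_date_string` is a hypothetical name:

```python
from datetime import datetime

# Candidate layouts, tried in order; extend as new ones turn up.
DATE_FORMATS = [
    "%B %d, %Y",  # July 2, 2011
    "%b %d, %Y",  # Jul 2, 2011
    "%B %d %Y",   # July 2 2011
    "%m/%d/%Y",   # 7/2/2011 (assumes US month-first order)
    "%Y",         # 2011 (year only; month and day default to 1)
]


def parse_date_string(text):
    """Try each known format; return (date, format) or (None, None)."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date(), fmt
        except ValueError:
            continue  # not this layout; try the next one
    return None, None
```

Returning the matching format lets the caller flag `%Y` matches as year-only dates for a judgement call.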