January 16th, 2014, 05:17 AM
How do I match everything from <ul to </ul>?
<li id="FN_Footnote-MAT-1"><a href="#Footnote-MAT-1"></a> 1.26a Tamar Bore her twin sons out of wedlock (Gen 38.630).</li>
<li id="FN_Footnote-MAT-2"><a href="#Footnote-MAT-2"></a> 1.26a Rahab A PROSTITUTE in Jericho (Josh 2.121; 6.1725; Jas 2.25).</li>
<li id="FN_Footnote-MAT-3"><a href="#Footnote-MAT-3"></a> 1.26a Ruth A MOABITE (Ruth 1.4). Only outstanding women were normally included in Jewish genealogical lists.</li>
<li id="FN_Footnote-MAT-4"><a href="#Footnote-MAT-4"></a> 1.6b11 Solomon In Luke's genealogy (Lk 3.31) David's son Nathan (2 Sam 5.14) appears as Jesus' ancestor.</li>
<li id="FN_Footnote-MAT-5"><a href="#Footnote-MAT-5"></a> 1.6b11 his mother Bathsheba (2 Sam 12.24).</li>
<li id="FN_Footnote-MAT-6"><a href="#Footnote-MAT-6"></a> 1.6b11 exile in Babylon In 597 BC King Nebuchadnezzar of Babylonia conquered JERUSALEM and took many of its inhabitants as prisoners to his country (2 Kgs 24.1016; 2 Chr 36.910; Jer 27.20).</li>
<li id="FN_Footnote-MAT-8"><a href="#Footnote-MAT-8"></a> 1.1216 after the exile in Babylon In 538 BC Emperor CYRUS of Persia, who the year before had conquered Babylon, allowed the Jews to return to their homeland.</li>
<li id="FN_Footnote-MAT-9"><a href="#Footnote-MAT-9"></a> 1.1216 Zerubbabel Leader of the Jewish people after they returned from exile (Ezra 3.2; Hag 1.1; 2.2; Zech 4.610).</li>
<li id="FN_Footnote-MAT-10"><a href="#Footnote-MAT-10"></a> 1.1216 MESSIAH</li>
<li id="FN_Footnote-MAT-11"><a href="#Footnote-MAT-11"></a> 1.17 fourteen generations The number may be related to the numerical value of the name David in Hebrew: d (4) +v (6) +d (4) = 14.</li>
January 16th, 2014, 02:02 PM
January 16th, 2014, 02:26 PM
Even better than regular expressions: use an HTML parser to find the UL and grab its HTML contents.
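To make that concrete, here is a minimal sketch in Python using only the standard library's html.parser. The thread doesn't say which language is being used, so Python is an assumption here, and the names UlExtractor and extract_ul_contents are made up for illustration. It collects the inner HTML of each top-level UL, including any nested lists. Comments inside the list are dropped.

```python
from html.parser import HTMLParser


class UlExtractor(HTMLParser):
    """Collect the inner HTML of every top-level <ul> element (hypothetical helper)."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.depth = 0    # how many <ul> elements we are currently inside
        self.parts = []   # markup pieces of the <ul> being collected
        self.lists = []   # inner HTML of each finished top-level <ul>

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Re-emit the start tag exactly as it appeared in the input.
            self.parts.append(self.get_starttag_text())
        if tag == "ul":
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "ul" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # The outermost <ul> just closed: store what was collected.
                self.lists.append("".join(self.parts))
                self.parts = []
                return
        if self.depth > 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")


def extract_ul_contents(html):
    """Return a list with the inner HTML of each top-level <ul> in html."""
    parser = UlExtractor()
    parser.feed(html)
    parser.close()
    return parser.lists
```

Calling extract_ul_contents(page_source) returns one string per top-level list, which can then be written wherever it is needed.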
January 17th, 2014, 03:06 AM
Will that enable me to extract everything from <ul to </ul> and move it somewhere else? I looked it up on their website, but it doesn't explain it very well (I'm quite new to HTML, if you haven't guessed already).
Originally Posted by requinix
January 17th, 2014, 03:38 AM
"Their website"? Who is "they"?
The thing is that regexes are dumb. They don't understand the input; all they see is a sequence of characters. Sometimes you can process a simple HTML snippet by only looking at the characters. But in general, this is the wrong way. It's extremely cumbersome, inflexible and difficult to read. And it only works in very simple cases.
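A small Python illustration of how this goes wrong (the markup is invented for the example). A lazy pattern stops at the first </ul> it sees, which cuts a nested list in half, while a greedy pattern runs to the last </ul> and swallows everything between two separate lists.

```python
import re

nested = "<ul><li>a<ul><li>nested</li></ul></li><li>b</li></ul>"
two_lists = "<ul><li>1</li></ul><p>between</p><ul><li>2</li></ul>"

# Lazy match: ends at the FIRST </ul>, which belongs to the inner list.
lazy = re.search(r"<ul.*?</ul>", nested, re.S).group(0)

# Greedy match: ends at the LAST </ul>, so it spans both lists and the <p>.
greedy = re.search(r"<ul.*</ul>", two_lists, re.S).group(0)

print(lazy)    # the outer list is truncated after the nested list
print(greedy)  # the paragraph between the lists is included
```

Neither pattern knows which </ul> closes which <ul, because it never builds the element structure.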
The correct solution, as pointed out by requinix, is to use an HTML parser which actually understands the stuff you're giving it. Once the raw markup has been turned into a structure of elements, you can do anything you want: you can move the elements around, add new ones, change the attributes etc.
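As a rough sketch of what working with a structure of elements looks like, here is a Python example. Note the assumptions: it uses xml.etree.ElementTree from the standard library, which is an XML parser and only accepts well-formed markup, so it stands in for a real HTML parser here. The document and the ids "body" and "footnotes" are invented for the example.

```python
import xml.etree.ElementTree as ET

# A small, well-formed document (invented for this example).
doc = ET.fromstring(
    "<div>"
    '<div id="body"><p>Text</p><ul><li id="FN_1">first note</li></ul></div>'
    '<div id="footnotes"></div>'
    "</div>"
)

body = doc.find("./div[@id='body']")
footnotes = doc.find("./div[@id='footnotes']")

# Move an element: detach the list from the body, attach it to the footnotes.
ul = body.find("ul")
body.remove(ul)
footnotes.append(ul)

# Add a new element to the list.
new_item = ET.SubElement(ul, "li")
new_item.text = "second note"

# Change an attribute.
ul.set("class", "footnotes")

# Serialize the modified tree back to markup.
print(ET.tostring(doc, encoding="unicode"))
```

For real-world pages, which are rarely well-formed XML, the same steps would be done with a proper HTML parser in whatever language you are using.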
Regexes are overrated. They work well for simple, flat patterns like timestamps. But they're unsuitable for nested structures. Whenever you have to deal with a complicated structure like an HTML document, you need a real parser.