March 19th, 2013, 06:53 AM
Help with a regex that reads the data between two strings with irregular no. of \n
I'm fairly new to regex and I'm struggling to get a string that can scrape data from a web page.
I need all the data between two fixed points; let's define them as "string". However, the data in between could span anything from 1 to 6 lines. So for example it might look like this (I've bolded string for visibility; it would not be bolded in the file):
af> sf \n
In this case I would be looking to select
af> sf \n
I tried to set up a string using an if then like this
But that doesn't seem to work, I'm guessing because regex doesn't exit if the evaluation is true but just keeps going.
Does anybody have any suggestions on how I could solve this problem please?
Best regards Steve
March 20th, 2013, 06:08 PM
Because your problem involves a little bit more than pure regexes, it would be important to know which language you are working in. The solution might be very different depending on the language used.
For example, in Perl I could just do something as simple as this:
my $input = "string\n12312\nasd\nstring fsdsg\nasgsdgfd sdfsd\n<saef 12n\af> sf \n123\nstring\n123\nstring gasfsfdsg\n";
# Cut the input at every occurrence of the marker "string".
my @foo = split /string/, $input;
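Now the @foo array contains five elements: the first one is empty because string comes right at the beginning (easy to solve if this is not desired), the last one holds whatever follows the final string, and the three in between are the chunks you are looking for. For example, elements 2 and 4 look like this when dumped (\cJ is a newline, and \cG is the bell character that "\a" produces in a double-quoted Perl string):
2  " fsdsg\cJasgsdgfd sdfsd\cJ<saef 12n\cGf> sf \cJ123\cJ"
4  ' gasfsfdsg
'
But this very easy solution depends on the language being used (though I am sure other languages have similar facilities).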
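Here is a rough sketch of two ways to keep only the chunks that sit between two occurrences of string, assuming Perl and the same sample input as above. The variable names @fields, @between_split and @between_regex are made up for illustration. The first variant throws away the outer fields of the split result; the second uses only a regex, which may be closer to what the original question was after.

```perl
use strict;
use warnings;

# Same sample input as in the post above ("\a" is a bell character here).
my $input = "string\n12312\nasd\nstring fsdsg\nasgsdgfd sdfsd\n<saef 12n\af> sf \n123\nstring\n123\nstring gasfsfdsg\n";

# Variant 1: split, then drop the first field (text before the first
# "string") and the last field (text after the final "string").
my @fields        = split /string/, $input;
my @between_split = @fields[1 .. $#fields - 1];

# Variant 2: regex only. The /s flag lets "." match newlines, ".*?" is
# lazy so it stops at the nearest following "string", and the lookahead
# (?=string) leaves that marker unconsumed so it can open the next chunk.
my @between_regex = $input =~ /string(.*?)(?=string)/sg;

printf "%d chunks from split, %d from the regex\n",
    scalar @between_split, scalar @between_regex;
```

On this input both variants yield the same three chunks. The lazy quantifier matters: with a greedy ".*" the first match would run all the way to the last string, which may be the "keeps going" behaviour described in the question.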