|
|
|||||||||
|
|||||||||
| |||||||||
|
|
|
| |||||||||
![]() |
|
|
«
Previous Thread
|
Next Thread
»
|
Thread Tools | Search this Thread | Rate Thread | Display Modes |
|
|
|
Get inside! Sample the range of functionality easily built with JMSL Library for Time Series Data Analysis, Heat Maps, Portfolio Optimization, Monte Carlo Simulation, Stock Price Charting and more. Download Now! |
|
#1
|
|||
|
|||
|
Tab delimited parse
Hi...encountering a difficult file format.
Please see attached file: tsv_parse.txt (open with excel to see format) tsv_result.txt is what I would like the data to look like. Any direction or help would be appreciated. So far I have this code: Code:
while (<$file>) {
$lines++;
$len = length($_);
my @fields = split(/\t/);
if ($len > 2) {
print @fields;
...
|
|
#2
|
|||
|
|||
|
my other approach was to use excel vba however, the data in the one cell is separated by ALT+ENTER. I also tried importing to Access and what I saw was a question mark sign closed by a square symbol to represent the delimiter.
|
|
#3
|
||||
|
||||
|
I can't look at your files from here, but from experience I think Excel does a terrible job of creating tab delimited files. It tends to add unwanted quotes around the place anyway.
I prefer to export as CSV when working with Excel, and parse it with Text::CSV or Text::CSV_XS. It's slightly more work, but better results. |
|
#4
|
|||
|
|||
|
Substitute the | with ALT+Enter symbol
Code:
For Instance if I have: id name zip 1 bob 98789 2 tom 23434 3 abe 345 | 356 | 467 (these 3 zip's are in ONE cell) I want that to look like this: id name zip 1 bob 98789 2 tom 23434 3 abe 345 3 abe 356 3 abe 467 I need each entry to go down to a new row that also copies the rest of the row to the new row. Here is the exact problem I am having but I didn't understand it either and rather do this with perl: http://www.mrexcel.com/archive/Data/951.html |
|
#5
|
|||
|
|||
|
I took a look at your files. Take the following as a reason to use CSV...
I think this will solve the problem of getting the data into memory. The files use have have CRLFs ending the lines but just LFs in the multi-line field. Code:
# Add this before the loop binmode($file); $/ = "\cM\cJ"; There's some stray quotes in around the multi-line fields so you probably want to clean that up too. Code:
s/^"(.*)"$/$1/s for @fields For the second part you can just split the record on "\cJ" and print out the entire line substituting in each piece of the split record with something like this: Code:
print @fields[0,1], $_, @fields[3,4,5,6,7,8] for split /\cJ/, $fields[2]; But the problem gets more interesting if you don't know which field is the multi-line field or if you can have multiple multi-line fields. I think the easiest way of solving it would be looping over the columns creating an array of arrays as you go: you flatten any multi-lines in column 1 building a new array, then you flatten column two resulting in a new array and so on. The real answer is that (A) Excel should not be used as a database and (B) you need to normalize that database. ; )
__________________
sub{*{$::{$_}}{CODE}==$_[0]&& print for(%:: )}->(\&Meh); Last edited by OmegaZero : April 16th, 2008 at 06:34 PM. |
|
#6
|
|||
|
|||
|
OmegaZero, that worked very nicely =)
Now how would I dump that in a database? I know I should be normalizing... I will down the road but for the time being I need this working. not sure if this is the best way for the sql statement: Code:
my $insert = "INSERT INTO table A (1, 2, 3, 4, 5, 6, 7, 8) VALUES (@field)"; my $sth = $dbh->prepare($insert); $isth->execute; $isth->finish(); $dbh->disconnect(); Quote:
|
|
#7
|
|||
|
|||
|
In general ...
Code:
#!\user\bin\perl -w
while (my @a_Field=GetTSV(@a_Field)) {
Print('',@a_Field);
};
sub Print{
my $string=shift;
my @field=@_;
if (@field == 0) { # all done
substr($string,-1,1)="\n";
print "$string";
} else { # more to come
for (split(/\n/,shift(@field))) {
Print("$string$_\t",@field);
};
};
};
sub GetTSV{
my($s_Line,$s_Text);
while ($s_Line=<DATA>) {
#print "'$s_Line'\n";
my(@a_Field);
$s_Text.=$s_Line;
while ($s_Text =~ m{([^"\t]*)([\t\n])|"((?:[^"]|"")*)"([\t\n])}gs) {
unless (substr($`,-1,1) eq '"') {
if (defined $1) {
push(@a_Field,$1);
if ($2 eq "\n") {
return @a_Field;
};
} else {
my($s_Field);
($s_Field=$3)=~s/""/"/g;
push(@a_Field,$s_Field);
if ($4 eq "\n") {
return @a_Field;
};
};
};
};
};
return ();
};
__DATA__
data1 data2 "data3a
data3b
data3c
data3d" data4 "data5a
data5b" data6 "data7a
data7b" data8 data9 data10
which gives Code:
data1 data2 data3a data4 data5a data6 data7a data8 data9 data10 data1 data2 data3a data4 data5a data6 data7b data8 data9 data10 data1 data2 data3a data4 data5b data6 data7a data8 data9 data10 data1 data2 data3a data4 data5b data6 data7b data8 data9 data10 data1 data2 data3b data4 data5a data6 data7a data8 data9 data10 data1 data2 data3b data4 data5a data6 data7b data8 data9 data10 data1 data2 data3b data4 data5b data6 data7a data8 data9 data10 data1 data2 data3b data4 data5b data6 data7b data8 data9 data10 data1 data2 data3c data4 data5a data6 data7a data8 data9 data10 data1 data2 data3c data4 data5a data6 data7b data8 data9 data10 data1 data2 data3c data4 data5b data6 data7a data8 data9 data10 data1 data2 data3c data4 data5b data6 data7b data8 data9 data10 data1 data2 data3d data4 data5a data6 data7a data8 data9 data10 data1 data2 data3d data4 data5a data6 data7b data8 data9 data10 data1 data2 data3d data4 data5b data6 data7a data8 data9 data10 data1 data2 data3d data4 data5b data6 data7b data8 data9 data10
__________________
"Plagiarize the code of others, I say!" That's about the worst you can do. It's the fast way to make a very bad coder out of you. It's important to understand what you are doing. Just asking what you should do, and blindly copying that if you would have gotten an unambigious answer is very, very bad. -- Abigail Last edited by Clueless Newbie : April 17th, 2008 at 02:03 PM. |
|
#8
|
|||
|
|||
|
amazing...any books out there to teach me all this stuff?
![]() |
|
#9
|
|||
|
|||
|
I am still having some difficulty insert this into a database.
Any hints appreciated. Code:
while (<$file>) {
$lines++;
$len = length($_);
my @fields = split(/\t/);
binmode($file);
$/ = "\cM\cJ";
s/^"(.*)"$/$1/s for @fields;
if ($len > 2) {
print @fields[0,1], $_, @fields[3,4,5,6,7,8,9,10,11,12,13] for split /\cJ/, $fields[2];
my $insert = "INSERT INTO table () values ()";
my $sth = $dbh->prepare($insert);
$sth->execute;
$sth->finish();
$dbh->disconnect();
}# closes if $len
}# closes while
Last edited by electron_89 : April 21st, 2008 at 06:04 PM. |
|
#10
|
|||
|
|||
|
Code:
#!\user\bin\perl -w
use strict;
use warnings;
use DBI;
my $Temp_dbh_o=DBI->connect...
#$Temp_dbh_o->do("create table example (p0,p1,p2,p3,p4,p5,p6,p7,p8,p9)");
my $Temp_sth_o=$Temp_dbh_o->prepare("insert into example values(?,?,?,?,?,?,?,?,?,?)");
my @a_Field;
while (@a_Field=GetTSV(@a_Field)) {
Insert([],@a_Field);
};
$Temp_dbh_o->commit;
$Temp_dbh_o->disconnect();
exit;
sub Insert{
my $value_ra=shift;
my @field=@_;
if (@field == 0) { # all done
$Temp_sth_o->execute(@$value_ra);
} else { # more to come
for (split(/\n/,shift(@field))) {
Insert([@$value_ra,$_],@field);
};
};
};
sub GetTSV{
my($s_Line,$s_Text);
while ($s_Line=<DATA>) {
#print "'$s_Line'\n";
my(@a_Field);
$s_Text.=$s_Line;
while ($s_Text =~ m{([^"\t]*)([\t\n])|"((?:[^"]|"")*)"([\t\n])}gs) {
unless (substr($`,-1,1) eq '"') {
if (defined $1) {
push(@a_Field,$1);
if ($2 eq "\n") {
return @a_Field;
};
} else {
my($s_Field);
($s_Field=$3)=~s/""/"/g;
push(@a_Field,$s_Field);
if ($4 eq "\n") {
return @a_Field;
};
};
};
};
};
return ();
};
__DATA__
data1 data2 "data3a
data3b
data3c
data3d" data4 "data5a
data5b" data6 "data7a
data7b" data8 data9 data10
HTH, clueless |
|
#11
|
|||
|
|||
|
Thanks but I can't get that working on a form.
What happens is user browses for the tsv file, clicks submit. It works with OmegaZero's example. Thanks Quote:
|
|
#12
|
|||
|
|||
|
Post the code.
|
|
#13
|
|||
|
|||
|
This is the partial working version I have. Your code is very nice but hopefully it can be incorporated into this and will import into my database. As you probably can tell, my insert statement won't work.
thanks Code:
print "Content-type: text/html\n\n";
print <<HTML;
<head><title>parser</title>
</head>
<body>
<table>
<tr>
<td>
<form action="parse4.cgi" enctype="multipart/form-data" method="post">
<table>
<tr>
<td><input type="file" name="filename" size="40"></td>
</tr>
<tr>
<td><input type="submit" value="Import"></td>
</tr>
</table>
</form>
</td>
</tr>
</table>
</body>
HTML
if (my $file = param('filename')) {
$db="";
$host="";
$port="";
$userid="";
$passwd="";
$connectionInfo="DBI:mysql:database=$db;$host:$port";
my $dbh = DBI->connect($connectionInfo,$userid,$passwd)
or die("Cannot connect: $DBI::errstr\n");
while (<$file>) {
$lines++;
$len = length($_);
my @fields = split(/\t/);
binmode($file);
$/ = "\cM\cJ";
s/^"(.*)"$/$1/s for @fields;
if ($len > 2) {
print @fields[0,1], $_, @fields[3,4,5,6,7,8,9,10,11,12,13] for split /\cJ/, $fields[2];
my $insert = "INSERT INTO table (column1, column2, column3, column4, column5, column6, column7, column8, column9, column10,
column11, column12, column13, column14)
values (?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
my $sth = $dbh->prepare($insert);
$sth->execute;
$sth->finish();
$dbh->disconnect();
}
}
}
Quote:
|
|
#14
|
||||
|