regex - perl split 8gb csv with "," as pattern



I recognise this might be a duplicate but the size of the file I have to split requires a method with doesn't load the csv into memory before processing it. ie I'm looking for a line by line method to read and split and output my file. I I only need my output to be the last 3 field without the quotes and without the thousand delimiting comma.

I have a file of arcGIS coordinates which contain quotes and commas internal to the fields. Data example below.


I have been trying to do this using variations on split( '","' , $line);. Here'e my code.

use strict;
use warnings;

open (FH, '<', "DEM_Export.csv") or die "Can't open file DEM_Export.csv";

open (FH2, '>', "DEM_ExportProcessed.csv") or die "Can't open file DEM_ExportProcessed.csv"; 
my $count = 0;
foreach my $line (<FH>) {
    # if ($count == 0){next;}

    print $line, "\n";
    my @list = split( '","' , $line);
    print "1st print $list[5],$list[6],$list[4]\n";
    $list[4] =~ s/,//g;
    $list[5] =~ s/,//g;
    $list[6] =~ s/,//g;
    $list[4] =~ s/"//g;
    $list[5] =~ s/"//g;
    $list[6] =~ s/"//g;
    print "2nd print $list[5],$list[6],$list[4]\n";
    if ($count == 10) { 
    my $string = sprintf("%.3f,%.3f,%.3f\n", $list[5],$list[6],$list[4]); 
    print FH2 $string;

close FH;
close FH2;

I'm getting close my my wits end with this and really need a solution. Any help will be gratefully received. Cheers

2 Answers: 

This is really very straightforward using the Text::CSV to handle the nastiness of CSV data

Here's an example, which works fine with the sample data you have shown. As long as your input file is plain ASCII and the rows are about the size you have shown it should work fine

It prints its output to STDOUT, so you'll want to use a command-line redirect to put it into the file you want

use strict;
use warnings 'all';

use Text::CSV;

my $csv_file = 'DEM_Export.csv';

open my $in_fh, '<', $csv_file or die qq{Unable to open "$csv_file" for input: $!};

my $csv = Text::CSV->new({ eol => "\n" });


while ( my $row = $csv->getline($in_fh) ) {

   $csv->print(\*STDOUT, [ map tr/,//dr, @$row[-2,-1,-3] ] );



I guess I should have been braver and had a crack with Text::CSV to start with rather than asking a question. Many thanks to Сухой27 and choroba for pointing me in the right direction.

Here is the code I ended up with. Probably not the tidiest.

use strict;
use warnings;
use Text::CSV;

my $file  = "DEM_Export.csv";
my $file2 = "DEM_ExportProcessed.csv";

open (FH2, '>', $file2) or die "Can't open file $file2: $!";
print "Starting file processing...\n";
my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
open my $io, "<", $file or die "$file: $!";
while (my $row = $csv->getline ($io)) {
    my @fields = @$row;
    s/,//g for @fields[3..5];     
    my $string = sprintf("%.3f,%.3f,%.3f\n", $fields[4],$fields[5],$fields[3]); 
    print FH2 $string;
print "Finished!";
close FH2;  

Worked a treat! Thank you.