BATW Perl example/lesson: Replace variable in entries in a csv file
by Bonnie Dalzell
Here is a problem presented on a list:
I need to know how to use search and replace functions in perl.
I want to loop through file A line by line and substitute a string
globally in file B based upon the second element of the array in file
A.
File A is a .csv file of this sort:
1,company 123
2,company 456
File B identifies the company by its company ID number in this format:
ID,1,NAME,PHONE
I want to replace the numeric company ID in file B with the company name, writing a new file C with the changes:
ID,company 123,NAME,PHONE
ID,company 456,NAME,PHONE
ID,company 123,NAME,PHONE
Here is a way of doing this:
Comments about the code start with #.
Cut and paste the program below into a text editor and save it as
"find_replace_in_file.pl" or anything else you want to call it.
#!/usr/bin/perl
# by Bonnie Dalzell Jan 14, 2009
# copyright under GPL
# may be reused, modified and redistributed as needed under GPL
## the motto of Perl is "there's more than one way to do it"
## in addition to the output to the file I have left a report to the shell
use strict;
our $a = "a.csv";
our $b = "b.csv";
our $combined = "combinedfiles.csv";
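open (INDUMP, "<$a") || die "cannot open $a: $!";
# check the parentheses and quotes: the < goes inside the quotes with the file name, not separated from it
# $! holds the system's reason for the failure (for example "No such file or directory")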
our %companyid=();
our @lines2 = <INDUMP>; # read file into list
chomp(@lines2);
foreach my $item (@lines2) { # loop thru list; the lines were already chomped above
my @data=split(/,/,$item);
print "key: $data[0] - value: $data[1]\n";
# this prints to the shell so you can monitor output - you can delete it or comment it out.
$companyid{$data[0]} = $data[1];
# this sets up a key value pair where company id number is associated with company name
} #end foreach
close INDUMP;
open (INGROUP, "<$b") || die "cannot open $b: $!";
open (DUMP, ">$combined") || die "cannot open $combined: $!";
## if you use >> you append to a preexisting file
## if you use > you make a new version of the file.
while (<INGROUP>){
my $in_line = $_; # using my in the loop assures a clean variable - no leftovers from previous passes
chomp($in_line);
my @inline = split(/,/,$in_line);
$inline[1] = $companyid{$inline[1]};
# here we substitute the value from the companyid hash for the key taken from the new file
my $line = &join_array(@inline);
print "$line\n"; #this is to the shell so you can monitor output - you can delete this or comment it out
print DUMP "$line\n";
} #endwhile
close(INGROUP);
close DUMP;
##### subroutine below this ######
sub join_array {
## use my to keep variables local to subroutine
## there is a variant of this below that loops through the array with foreach;
## this version starts $joinline with the first entry so the line does not begin with a comma
my @array = @_;
my $joinline=$array[0];
for( my $i = 1; $i <= $#array; $i++){
$joinline ="$joinline".','."$array[$i]";
# $i is incremented by the for statement itself; a second $i++ here would skip every other entry
}
return $joinline;
} #endsub
#####End Program ###################
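The substitution line $inline[1] = $companyid{$inline[1]}; assumes every ID in file B also appears in file A. If one does not, the hash lookup returns undef and the ID disappears from the output. Below is a minimal sketch of one way to guard against that. It uses in-memory sample data instead of the files, and replace_id is a hypothetical helper name that is not part of the program above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lookup table, as it would be built from file A.
my %companyid = (
    1 => "company 123",
    2 => "company 456",
);

# replace_id is a hypothetical helper for illustration.
# It swaps field 1 (the company ID) for the company name,
# and leaves the ID unchanged when it is not in the hash.
sub replace_id {
    my ($line, $lookup) = @_;
    my @fields = split(/,/, $line, -1);   # -1 keeps trailing empty fields
    if (@fields > 1 && exists $lookup->{ $fields[1] }) {
        $fields[1] = $lookup->{ $fields[1] };
    }
    return join(",", @fields);
}

print replace_id("ID,1,NAME,PHONE", \%companyid), "\n";   # known ID is replaced
print replace_id("ID,9,NAME,PHONE", \%companyid), "\n";   # unknown ID is kept
```

Whether to keep the unknown ID, skip the line, or stop with an error depends on the data; keeping it is just the choice made in this sketch.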
Alternative method for joining the array back into a single line for output.
This version loops through the array with foreach. As first written it left a comma at the beginning of the output line, $joinline. The reason is that $joinline starts out empty, so the first call to join puts a comma between that empty string and the first entry. Starting $joinline with the first entry and looping over the rest avoids this.
sub join_array {
## use my to keep variables local to subroutine
my @array = @_;
my $joinline = shift(@array); # take the first entry off the list so the line does not start with a comma
foreach my $item (@array) {
$joinline = join(",", $joinline, $item);
}
return $joinline;
} #endsub
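Perl's built-in join already accepts a whole list, so the loop is not strictly needed. The short sketch below uses sample data only. It shows the one-line form next to a loop that starts from an empty string, which is the pattern that yields a leading comma.

```perl
use strict;
use warnings;

my @array = ("ID", "company 123", "NAME", "PHONE");

# join takes a separator and a list, and puts commas only between entries.
my $joinline = join(",", @array);
print "$joinline\n";    # ID,company 123,NAME,PHONE

# Starting from an empty string and joining one entry at a time
# produces the leading comma: the first join is between "" and "ID".
my $looped = "";
foreach my $item (@array) {
    $looped = join(",", $looped, $item);
}
print "$looped\n";      # ,ID,company 123,NAME,PHONE
```

With the one-line form, the call in the main program could be written as my $line = join(",", @inline); with no subroutine at all.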