BATW Perl example/lesson: Replace variable in entries in a csv file

by Bonnie Dalzell


Here is a problem presented on a list:

I need to know how to use search and replace functions in Perl. I want to loop through file A line by line and substitute a string globally in file B based upon the second element of the array in file A.

File A is a .csv file of this sort:

1,company 123
2,company 456


File B identifies the company by its company ID number in this format:

ID,1,NAME,PHONE

I want to replace the numeric company ID in file B with the company name, writing a new file C with the changes:

ID,company 123,NAME,PHONE
ID,company 456,NAME,PHONE
ID,company 123,NAME,PHONE
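The question asks for a global search and replace, so it is worth seeing why a plain s///g on the id is risky. The sketch below uses the sample data from the question; the subroutine names naive_replace and field_replace are made up for illustration. Replacing each id everywhere in the line also hits digits inside names that were already substituted (the "2" in "company 123"), while anchoring the match to the second field only ever touches the id.

```perl
use strict;
use warnings;

# Lookup table as built from file A (values taken from the question).
my %companyid = (1 => 'company 123', 2 => 'company 456');

# Naive approach: substitute every id globally, wherever it occurs.
sub naive_replace {
    my ($line) = @_;
    for my $id (sort keys %companyid) {
        $line =~ s/\Q$id\E/$companyid{$id}/g;
    }
    return $line;
}

# Anchored approach: capture the first two fields and replace only the
# second one. /e evaluates the replacement as Perl code; an id that is
# not in the hash is put back unchanged.
sub field_replace {
    my ($line) = @_;
    $line =~ s/^([^,]*),([^,]*)/"$1," . (exists $companyid{$2} ? $companyid{$2} : $2)/e;
    return $line;
}

print naive_replace('ID,1,NAME,PHONE'), "\n";  # ID,company 1company 4563,NAME,PHONE
print field_replace('ID,1,NAME,PHONE'), "\n";  # ID,company 123,NAME,PHONE
```

Looking the id up in a hash after splitting the line, as the program below does, avoids the problem for the same reason the anchored version does: only the second field is ever compared with the ids.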


Here is a way of doing this:
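Comments about the code follow the # marks.
Cut and paste the program below into a text editor, save it as "find_replace_in_file.pl" (or any other name you like), and run it with perl find_replace_in_file.pl from the directory that holds a.csv and b.csv. The result is written to combinedfiles.csv.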
#!/usr/bin/perl
# by Bonnie Dalzell Jan 14, 2009
# copyright under GPL
# may be reused, modified and redistributed as needed under GPL
## the motto of Perl is "there is more than one way to do it"
## besides writing the output file, the program prints a report to the shell
use strict;
use warnings;
my $file_a   = "a.csv";
my $file_b   = "b.csv";
my $combined = "combinedfiles.csv";
## $a and $b are special variables used by sort(), so they are avoided as names here
open(my $in_a, '<', $file_a) or die "cannot open $file_a: $!";
## the three-argument open keeps the mode '<' apart from the file name, and
## "or" binds more loosely than "||", so no extra parentheses are needed

my %companyid = ();
my @lines2 = <$in_a>;   # read the whole file into a list
chomp(@lines2);         # remove the newline from every line at once
foreach my $item (@lines2) { # loop through the list
  next if $item eq '';                 # skip blank lines
  my @data = split(/,/, $item, 2);     # limit 2: a comma inside the name stays in the name
  print "key: $data[0] - value: $data[1]\n";
    # this goes to the shell so you can monitor output - delete it or comment it out
  $companyid{$data[0]} = $data[1];
    # this sets up a key/value pair: the company id number maps to the company name
} #end foreach
close($in_a);

open(my $in_b, '<', $file_b)   or die "cannot open $file_b: $!";
open(my $out,  '>', $combined) or die "cannot open $combined: $!";
                     ## if you use >> you append to a preexisting file
                     ## if you use > you make a new version of the file
while (my $in_line = <$in_b>) { # my in the loop assures a clean variable - no leftovers from previous passes
       chomp($in_line);
       next if $in_line eq '';                  # skip blank lines
       my @inline = split(/,/, $in_line, -1);   # -1 keeps trailing empty fields
       # here we substitute the company name from the companyid hash for the id in
       # the second field; an id that is missing from file A is left unchanged
       $inline[1] = $companyid{$inline[1]}
           if defined $inline[1] && exists $companyid{$inline[1]};
       my $line = join_array(@inline);
       print "$line\n"; # this goes to the shell so you can monitor output - delete it or comment it out
       print $out "$line\n";
} #endwhile
close($in_b);
close($out) or die "cannot write $combined: $!";
##### subroutine below this ######
sub join_array {
  ## use my to keep variables local to the subroutine
  ## there is what appears to be a simpler way to do this, looping through the
  ## whole array (see below), but done that way every line comes out with a comma
  ## before the first entry; starting from the first element avoids that
  my @array = @_;
  my $joinline = $array[0];
  for (my $i = 1; $i <= $#array; $i++) {       # the loop header alone advances $i
      $joinline = $joinline . ',' . $array[$i];  # add a comma, then the next field
  }
  return $joinline;
} #endsub
#####End Program ###################
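Following the motto that there is more than one way to do it, here is a more compact sketch of the same idea. It is a variation, not the program above: the subroutine names read_companies and replace_ids are hypothetical, Perl's built-in join takes the place of join_array, and the demo reads in-memory strings instead of a.csv and b.csv so it can run on its own. The sample file B lines are inferred from the desired output shown in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the id => name lookup from an open handle on file A.
sub read_companies {
    my ($fh) = @_;
    my %companyid;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';                     # skip blank lines
        my ($id, $name) = split /,/, $line, 2;   # limit 2 keeps commas in the name
        $companyid{$id} = $name;
    }
    return \%companyid;
}

# Copy file B's lines to $out, swapping the id in field 2 for its name.
sub replace_ids {
    my ($companyid, $in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        next if $line eq '';
        my @fields = split /,/, $line, -1;       # -1 keeps trailing empty fields
        $fields[1] = $companyid->{$fields[1]}
            if defined $fields[1] && exists $companyid->{$fields[1]};
        print {$out} join(',', @fields), "\n";   # built-in join: no leading comma
    }
}

# Demo: in-memory "files" holding the sample data from the question.
my $file_a = "1,company 123\n2,company 456\n";
my $file_b = "ID,1,NAME,PHONE\nID,2,NAME,PHONE\nID,1,NAME,PHONE\n";
my $result = '';

open(my $in_a, '<', \$file_a) or die "cannot open file A: $!";
open(my $in_b, '<', \$file_b) or die "cannot open file B: $!";
open(my $out,  '>', \$result) or die "cannot open output: $!";

my $companyid = read_companies($in_a);
replace_ids($companyid, $in_b, $out);
close($_) for $in_a, $in_b, $out;

print $result;   # prints the three lines shown in the question's desired output
```

To use it on real files, open a.csv and b.csv for reading and combinedfiles.csv for writing with the three-argument open, and pass those handles to read_companies and replace_ids.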
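Alternative method for joining the array back into a single line for output.
However, this leaves a comma at the beginning of the output line, $joinline. The reason is that $joinline is still empty on the first pass through the loop, and join() puts a comma between that empty string and the first item, so every line starts with a comma.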


sub join_array {
  ## use my to keep variables local to the subroutine
  ## this looks simpler than the loop above, but it produces the leading comma:
  ## the first pass computes join(",", '', 'ID'), which is already ",ID"
  my @array = @_;
  my $joinline = '';
  foreach my $item (@array) {
      $joinline = join(",", $joinline, $item);
  }
  return $joinline;
} #endsub
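The leading comma goes away if join is given the whole array in a single call, because join puts the separator only between elements, never before the first one. A minimal sketch; join_array_loop is a hypothetical name for the loop version, kept here only for comparison:

```perl
use strict;
use warnings;

# One call to join: the separator goes only between elements.
sub join_array {
    my @array = @_;
    return join(',', @array);
}

# The loop version: $joinline starts empty, so the first pass
# already yields ",ID" and the comma stays at the front.
sub join_array_loop {
    my $joinline = '';
    foreach my $item (@_) {
        $joinline = join(",", $joinline, $item);
    }
    return $joinline;
}

print join_array('ID', 'company 123', 'NAME', 'PHONE'), "\n";       # ID,company 123,NAME,PHONE
print join_array_loop('ID', 'company 123', 'NAME', 'PHONE'), "\n";  # ,ID,company 123,NAME,PHONE
```

With this version of join_array, the main program could just as well call join(',', @inline) directly and drop the subroutine.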