tools/cvs2cl/cvs2cl.pl

   1 #!/bin/sh
   2 exec perl -w -x $0 ${1+"$@"} # -*- mode: perl; perl-indent-level: 2; -*-
   3 #!perl -w
   4
   5
   6 ##############################################################
   7 ###                                                        ###
   8 ### cvs2cl.pl: produce ChangeLog(s) from `cvs log` output. ###
   9 ###                                                        ###
  10 ##############################################################
  11
  12 ## $Revision: 1.3 $
  13 ## $Date: 2003-02-20 21:17:07 $
  14 ## $Author: btb $
  15 ##
  16 ##   (C) 2001,2002,2003 Martyn J. Pearce <fluffy@cpan.org>, under the GNU GPL.
  17 ##   (C) 1999 Karl Fogel <kfogel@red-bean.com>, under the GNU GPL.
  18 ##
  19 ##   (Extensively hacked on by Melissa O'Neill <oneill@cs.sfu.ca>.)
  20 ##
  21 ## cvs2cl.pl is free software; you can redistribute it and/or modify
  22 ## it under the terms of the GNU General Public License as published by
  23 ## the Free Software Foundation; either version 2, or (at your option)
  24 ## any later version.
  25 ##
  26 ## cvs2cl.pl is distributed in the hope that it will be useful,
  27 ## but WITHOUT ANY WARRANTY; without even the implied warranty of
  28 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  29 ## GNU General Public License for more details.
  30 ##
  31 ## You may have received a copy of the GNU General Public License
  32 ## along with cvs2cl.pl; see the file COPYING.  If not, write to the
  33 ## Free Software Foundation, Inc., 59 Temple Place - Suite 330,
  34 ## Boston, MA 02111-1307, USA.
  35
  36 \f
  37 use strict;
  38 use Text::Wrap;
  39 use Time::Local;
  40 use File::Basename;
  41
  42 \f
  43 # The Plan:
  44 #
  45 # Read in the logs for multiple files, spit out a nice ChangeLog that
  46 # mirrors the information entered during `cvs commit'.
  47 #
  48 # The problem presents some challenges. In an ideal world, we could
  49 # detect files with the same author, log message, and checkin time --
  50 # each <filelist, author, time, logmessage> would be a changelog entry.
  51 # We'd sort them; and spit them out.  Unfortunately, CVS is *not atomic*
  52 # so checkins can span a range of times.  Also, the directory structure
  53 # could be hierarchical.
  54 #
  55 # Another question is whether we really want to have the ChangeLog
  56 # exactly reflect commits. An author could issue two related commits,
  57 # with different log entries, reflecting a single logical change to the
  58 # source. GNU style ChangeLogs group these under a single author/date.
  59 # We try to do the same.
  60 #
  61 # So, we parse the output of `cvs log', storing log messages in a
  62 # multilevel hash that stores the mapping:
  63 #   directory => author => time => message => filelist
  64 # As we go, we notice "nearby" commit times and store them together
  65 # (i.e., under the same timestamp), so they appear in the same log
  66 # entry.
  67 #
  68 # When we've read all the logs, we twist this mapping into
  69 # a time => author => message => filelist mapping for each directory.
  70 #
  71 # If we're not using the `--distributed' flag, the directory is always
  72 # considered to be `./', even as descend into subdirectories.
  73
  74 \f
  75 ############### Globals ################
  76
  77 # What we run to generate it:
  78 my $Log_Source_Command = "cvs log";
  79
  80 # In case we have to print it out:
  81 my $VERSION = '$Revision: 1.3 $';
  82 $VERSION =~ s/\S+\s+(\S+)\s+\S+/$1/;
  83
  84 ## Vars set by options:
  85
  86 # Print debugging messages?
  87 my $Debug = 0;
  88
  89 # Just show version and exit?
  90 my $Print_Version = 0;
  91
  92 # Just print usage message and exit?
  93 my $Print_Usage = 0;
  94
  95 # Single top-level ChangeLog, or one per subdirectory?
  96 my $Distributed = 0;
  97
  98 # What file should we generate (defaults to "ChangeLog")?
  99 my $Log_File_Name = "ChangeLog";
 100
 101 # Grab most recent entry date from existing ChangeLog file, just add
 102 # to that ChangeLog.
 103 my $Cumulative = 0;
 104
 105 # Expand usernames to email addresses based on a map file?
 106 my $User_Map_File = "";
 107
 108 # Output to a file or to stdout?
 109 my $Output_To_Stdout = 0;
 110
 111 # Eliminate empty log messages?
 112 my $Prune_Empty_Msgs = 0;
 113
 114 # Tags of which not to output
 115 my @ignore_tags;
 116
 117 # Don't call Text::Wrap on the body of the message
 118 my $No_Wrap = 0;
 119
 120 # Separates header from log message.  Code assumes it is either " " or
 121 # "\n\n", so if there's ever an option to set it to something else,
 122 # make sure to go through all conditionals that use this var.
 123 my $After_Header = " ";
 124
 125 # XML Encoding
 126 my $XML_Encoding = '';
 127
 128 # Format more for programs than for humans.
 129 my $XML_Output = 0;
 130
 131 # Do some special tweaks for log data that was written in FSF
 132 # ChangeLog style.
 133 my $FSF_Style = 0;
 134
 135 # Show times in UTC instead of local time
 136 my $UTC_Times = 0;
 137
 138 # Show times in output?
 139 my $Show_Times = 1;
 140
 141 # Show day of week in output?
 142 my $Show_Day_Of_Week = 0;
 143
 144 # Show revision numbers in output?
 145 my $Show_Revisions = 0;
 146
 147 # Show tags (symbolic names) in output?
 148 my $Show_Tags = 0;
 149
 150 # Show tags separately in output?
 151 my $Show_Tag_Dates = 0;
 152
 153 # Show branches by symbolic name in output?
 154 my $Show_Branches = 0;
 155
 156 # Show only revisions on these branches or their ancestors.
 157 my @Follow_Branches;
 158
 159 # Don't bother with files matching this regexp.
 160 my @Ignore_Files;
 161
 162 # How exactly we match entries.  We definitely want "o",
 163 # and user might add "i" by using --case-insensitive option.
 164 my $Case_Insensitive = 0;
 165
 166 # Maybe only show log messages matching a certain regular expression.
 167 my $Regexp_Gate = "";
 168
 169 # Pass this global option string along to cvs, to the left of `log':
 170 my $Global_Opts = "";
 171
 172 # Pass this option string along to the cvs log subcommand:
 173 my $Command_Opts = "";
 174
 175 # Read log output from stdin instead of invoking cvs log?
 176 my $Input_From_Stdin = 0;
 177
 178 # Don't show filenames in output.
 179 my $Hide_Filenames = 0;
 180
 181 # Max checkin duration. CVS checkin is not atomic, so we may have checkin
 182 # times that span a range of time. We assume that checkins will last no
 183 # longer than $Max_Checkin_Duration seconds, and that similarly, no
 184 # checkins will happen from the same users with the same message less
 185 # than $Max_Checkin_Duration seconds apart.
 186 my $Max_Checkin_Duration = 180;
 187
 188 # What to put at the front of [each] ChangeLog.
 189 my $ChangeLog_Header = "";
 190
 191 # Whether to enable 'delta' mode, and for what start/end tags.
 192 my $Delta_Mode = 0;
 193 my $Delta_From = "";
 194 my $Delta_To = "";
 195
 196 ## end vars set by options.
 197
 198 # latest observed times for the start/end tags in delta mode
 199 my $Delta_StartTime = 0;
 200 my $Delta_EndTime = 0;
 201
 202 # In 'cvs log' output, one long unbroken line of equal signs separates
 203 # files:
 204 my $file_separator = "======================================="
 205                    . "======================================";
 206
 207 # In 'cvs log' output, a shorter line of dashes separates log messages
 208 # within a file:
 209 my $logmsg_separator = "----------------------------";
 210
 211 ############### End globals ############
 212
 213 \f
 214 &parse_options ();
 215 &derive_change_log ();
 216
 217 \f
 218 ### Everything below is subroutine definitions. ###
 219
 220 # If accumulating, grab the boundary date from pre-existing ChangeLog.
 221 sub maybe_grab_accumulation_date ()
 222 {
 223   if (! $Cumulative) {
 224     return "";
 225   }
 226
 227   # else
 228
 229   open (LOG, "$Log_File_Name")
 230       or die ("trouble opening $Log_File_Name for reading ($!)");
 231
 232   my $boundary_date;
 233   while (<LOG>)
 234   {
 235     if (/^(\d\d\d\d-\d\d-\d\d\s+\d\d:\d\d)/)
 236     {
 237       $boundary_date = "$1";
 238       last;
 239     }
 240   }
 241
 242   close (LOG);
 243   return $boundary_date;
 244 }
 245
 246 # Fills up a ChangeLog structure in the current directory.
 247 sub derive_change_log ()
 248 {
 249   # See "The Plan" above for a full explanation.
 250
 251   my %grand_poobah;
 252
 253   my $file_full_path;
 254   my $time;
 255   my $revision;
 256   my $author;
 257   my $msg_txt;
 258   my $detected_file_separator;
 259
 260   my %tag_date_printed;
 261
 262   # Might be adding to an existing ChangeLog
 263   my $accumulation_date = &maybe_grab_accumulation_date ();
 264   if ($accumulation_date) {
 265     # Insert -d immediately after 'cvs log'
 266     my $Log_Date_Command = "-d\'>${accumulation_date}\'";
 267     $Log_Source_Command =~ s/(^.*log\S*)/$1 $Log_Date_Command/;
 268     &debug ("(adding log msg starting from $accumulation_date)\n");
 269   }
 270
 271   # We might be expanding usernames
 272   my %usermap;
 273
 274   # In general, it's probably not very maintainable to use state
 275   # variables like this to tell the loop what it's doing at any given
 276   # moment, but this is only the first one, and if we never have more
 277   # than a few of these, it's okay.
 278   my $collecting_symbolic_names = 0;
 279   my %symbolic_names;    # Where tag names get stored.
 280   my %branch_names;      # We'll grab branch names while we're at it.
 281   my %branch_numbers;    # Save some revisions for @Follow_Branches
 282   my @branch_roots;      # For showing which files are branch ancestors.
 283
 284   # Bleargh.  Compensate for a deficiency of custom wrapping.
 285   if (($After_Header ne " ") and $FSF_Style)
 286   {
 287     $After_Header .= "\t";
 288   }
 289
 290   if (! $Input_From_Stdin) {
 291     &debug ("(run \"${Log_Source_Command}\")\n");
 292     open (LOG_SOURCE, "$Log_Source_Command |")
 293         or die "unable to run \"${Log_Source_Command}\"";
 294   }
 295   else {
 296     open (LOG_SOURCE, "-") or die "unable to open stdin for reading";
 297   }
 298
 299   binmode LOG_SOURCE;
 300
 301   %usermap = &maybe_read_user_map_file ();
 302
 303   while (<LOG_SOURCE>)
 304   {
 305     # Canonicalize line endings
 306     s/\r$//;
 307     # If on a new file and don't see filename, skip until we find it, and
 308     # when we find it, grab it.
 309     if ((! (defined $file_full_path)) and /^Working file: (.*)/)
 310     {
 311       $file_full_path = $1;
 312       if (@Ignore_Files)
 313       {
 314         my $base;
 315         ($base, undef, undef) = fileparse ($file_full_path);
 316         # Ouch, I wish trailing operators in regexps could be
 317         # evaluated on the fly!
 318         if ($Case_Insensitive) {
 319           if (grep ($file_full_path =~ m|$_|i, @Ignore_Files)) {
 320             undef $file_full_path;
 321           }
 322         }
 323         elsif (grep ($file_full_path =~ m|$_|, @Ignore_Files)) {
 324           undef $file_full_path;
 325         }
 326       }
 327       next;
 328     }
 329
 330     # Just spin wheels if no file defined yet.
 331     next if (! $file_full_path);
 332
 333     # Collect tag names in case we're asked to print them in the output.
 334     if (/^symbolic names:$/) {
 335       $collecting_symbolic_names = 1;
 336       next;  # There's no more info on this line, so skip to next
 337     }
 338     if ($collecting_symbolic_names)
 339     {
 340       # All tag names are listed with whitespace in front in cvs log
 341       # output; so if see non-whitespace, then we're done collecting.
 342       if (/^\S/) {
 343         $collecting_symbolic_names = 0;
 344       }
 345       else    # we're looking at a tag name, so parse & store it
 346       {
 347         # According to the Cederqvist manual, in node "Tags", tag
 348         # names must start with an uppercase or lowercase letter and
 349         # can contain uppercase and lowercase letters, digits, `-',
 350         # and `_'.  However, it's not our place to enforce that, so
 351         # we'll allow anything CVS hands us to be a tag:
 352         /^\s+([^:]+): ([\d.]+)$/;
 353         my $tag_name = $1;
 354         my $tag_rev  = $2;
 355
 356         # A branch number either has an odd number of digit sections
 357         # (and hence an even number of dots), or has ".0." as the
 358         # second-to-last digit section.  Test for these conditions.
 359         my $real_branch_rev = "";
 360         if (($tag_rev =~ /^(\d+\.\d+\.)+\d+$/)   # Even number of dots...
 361             and (! ($tag_rev =~ /^(1\.)+1$/)))   # ...but not "1.[1.]1"
 362         {
 363           $real_branch_rev = $tag_rev;
 364         }
 365         elsif ($tag_rev =~ /(\d+\.(\d+\.)+)0.(\d+)/)  # Has ".0."
 366         {
 367           $real_branch_rev = $1 . $3;
 368         }
 369         # If we got a branch, record its number.
 370         if ($real_branch_rev)
 371         {
 372           $branch_names{$real_branch_rev} = $tag_name;
 373           if (@Follow_Branches) {
 374             if (grep ($_ eq $tag_name, @Follow_Branches)) {
 375               $branch_numbers{$tag_name} = $real_branch_rev;
 376             }
 377           }
 378         }
 379         else {
 380           # Else it's just a regular (non-branch) tag.
 381           push (@{$symbolic_names{$tag_rev}}, $tag_name);
 382         }
 383       }
 384     }
 385     # End of code for collecting tag names.
 386
 387     # If have file name, but not revision, and see revision, then grab
 388     # it.  (We collect unconditionally, even though we may or may not
 389     # ever use it.)
 390     if ((! (defined $revision)) and (/^revision (\d+\.[\d.]+)/))
 391     {
 392       $revision = $1;
 393
 394       if (@Follow_Branches)
 395       {
 396         foreach my $branch (@Follow_Branches)
 397         {
 398           # Special case for following trunk revisions
 399           if (($branch =~ /^trunk$/i) and ($revision =~ /^[0-9]+\.[0-9]+$/))
 400           {
 401             goto dengo;
 402           }
 403
 404           my $branch_number = $branch_numbers{$branch};
 405           if ($branch_number)
 406           {
 407             # Are we on one of the follow branches or an ancestor of
 408             # same?
 409             #
 410             # If this revision is a prefix of the branch number, or
 411             # possibly is less in the minormost number, OR if this
 412             # branch number is a prefix of the revision, then yes.
 413             # Otherwise, no.
 414             #
 415             # So below, we determine if any of those conditions are
 416             # met.
 417
 418             # Trivial case: is this revision on the branch?
 419             # (Compare this way to avoid regexps that screw up Emacs
 420             # indentation, argh.)
 421             if ((substr ($revision, 0, ((length ($branch_number)) + 1)))
 422                 eq ($branch_number . "."))
 423             {
 424               goto dengo;
 425             }
 426             # Non-trivial case: check if rev is ancestral to branch
 427             elsif ((length ($branch_number)) > (length ($revision)))
 428             {
 429               $revision =~ /^((?:\d+\.)+)(\d+)$/;
 430               my $r_left = $1;          # still has the trailing "."
 431               my $r_end = $2;
 432
 433               $branch_number =~ /^((?:\d+\.)+)(\d+)\.\d+$/;
 434               my $b_left = $1;  # still has trailing "."
 435               my $b_mid  = $2;   # has no trailing "."
 436
 437               if (($r_left eq $b_left)
 438                   && ($r_end <= $b_mid))
 439               {
 440                 goto dengo;
 441               }
 442             }
 443           }
 444         }
 445       }
 446       else    # (! @Follow_Branches)
 447       {
 448         next;
 449       }
 450
 451       # Else we are following branches, but this revision isn't on the
 452       # path.  So skip it.
 453       undef $revision;
 454     dengo:
 455       next;
 456     }
 457
 458     # If we don't have a revision right now, we couldn't possibly
 459     # be looking at anything useful.
 460     if (! (defined ($revision))) {
 461       $detected_file_separator = /^$file_separator$/o;
 462       if ($detected_file_separator) {
 463         # No revisions for this file; can happen, e.g. "cvs log -d DATE"
 464         goto CLEAR;
 465       }
 466       else {
 467         next;
 468       }
 469     }
 470
 471     # If have file name but not date and author, and see date or
 472     # author, then grab them:
 473     unless (defined $time)
 474     {
 475       if (/^date: .*/)
 476       {
 477         ($time, $author) = &parse_date_and_author ($_);
 478         if (defined ($usermap{$author}) and $usermap{$author}) {
 479           $author = $usermap{$author};
 480         }
 481       }
 482       else {
 483         $detected_file_separator = /^$file_separator$/o;
 484         if ($detected_file_separator) {
 485           # No revisions for this file; can happen, e.g. "cvs log -d DATE"
 486           goto CLEAR;
 487         }
 488       }
 489       # If the date/time/author hasn't been found yet, we couldn't
 490       # possibly care about anything we see.  So skip:
 491       next;
 492     }
 493
 494     # A "branches: ..." line here indicates that one or more branches
 495     # are rooted at this revision.  If we're showing branches, then we
 496     # want to show that fact as well, so we collect all the branches
 497     # that this is the latest ancestor of and store them in
 498     # @branch_roots.  Just for reference, the format of the line we're
 499     # seeing at this point is:
 500     #
 501     #    branches:  1.5.2;  1.5.4;  ...;
 502     #
 503     # Okay, here goes:
 504
 505     if (/^branches:\s+(.*);$/)
 506     {
 507       if ($Show_Branches)
 508       {
 509         my $lst = $1;
 510         $lst =~ s/(1\.)+1;|(1\.)+1$//;  # ignore the trivial branch 1.1.1
 511         if ($lst) {
 512           @branch_roots = split (/;\s+/, $lst);
 513         }
 514         else {
 515           undef @branch_roots;
 516         }
 517         next;
 518       }
 519       else
 520       {
 521         # Ugh.  This really bothers me.  Suppose we see a log entry
 522         # like this:
 523         #
 524         #    ----------------------------
 525         #    revision 1.1
 526         #    date: 1999/10/17 03:07:38;  author: jrandom;  state: Exp;
 527         #    branches:  1.1.2;
 528         #    Intended first line of log message begins here.
 529         #    ----------------------------
 530         #
 531         # The question is, how we can tell the difference between that
 532         # log message and a *two*-line log message whose first line is
 533         #
 534         #    "branches:  1.1.2;"
 535         #
 536         # See the problem?  The output of "cvs log" is inherently
 537         # ambiguous.
 538         #
 539         # For now, we punt: we liberally assume that people don't
 540         # write log messages like that, and just toss a "branches:"
 541         # line if we see it but are not showing branches.  I hope no
 542         # one ever loses real log data because of this.
 543         next;
 544       }
 545     }
 546
 547     # If have file name, time, and author, then we're just grabbing
 548     # log message texts:
 549     $detected_file_separator = /^$file_separator$/o;
 550     if ($detected_file_separator && ! (defined $revision)) {
 551       # No revisions for this file; can happen, e.g. "cvs log -d DATE"
 552       goto CLEAR;
 553     }
 554     unless ($detected_file_separator || /^$logmsg_separator$/o)
 555     {
 556       $msg_txt .= $_;   # Normally, just accumulate the message...
 557       next;
 558     }
 559     # ... until a msg separator is encountered:
 560     # Ensure the message contains something:
 561     if ((! $msg_txt)
 562         || ($msg_txt =~ /^\s*\.\s*$|^\s*$/)
 563         || ($msg_txt =~ /\*\*\* empty log message \*\*\*/))
 564     {
 565       if ($Prune_Empty_Msgs) {
 566         goto CLEAR;
 567       }
 568       # else
 569       $msg_txt = "[no log message]\n";
 570     }
 571
 572     ### Store it all in the Grand Poobah:
 573     {
 574       my $dir_key;        # key into %grand_poobah
 575       my %qunk;           # complicated little jobbie, see below
 576
 577       # Each revision of a file has a little data structure (a `qunk')
 578       # associated with it.  That data structure holds not only the
 579       # file's name, but any additional information about the file
 580       # that might be needed in the output, such as the revision
 581       # number, tags, branches, etc.  The reason to have these things
 582       # arranged in a data structure, instead of just appending them
 583       # textually to the file's name, is that we may want to do a
 584       # little rearranging later as we write the output.  For example,
 585       # all the files on a given tag/branch will go together, followed
 586       # by the tag in parentheses (so trunk or otherwise non-tagged
 587       # files would go at the end of the file list for a given log
 588       # message).  This rearrangement is a lot easier to do if we
 589       # don't have to reparse the text.
 590       #
 591       # A qunk looks like this:
 592       #
 593       #   {
 594       #     filename    =>    "hello.c",
 595       #     revision    =>    "1.4.3.2",
 596       #     time        =>    a timegm() return value (moment of commit)
 597       #     tags        =>    [ "tag1", "tag2", ... ],
 598       #     branch      =>    "branchname" # There should be only one, right?
 599       #     branchroots =>    [ "branchtag1", "branchtag2", ... ]
 600       #   }
 601
 602       if ($Distributed) {
 603         # Just the basename, don't include the path.
 604         ($qunk{'filename'}, $dir_key, undef) = fileparse ($file_full_path);
 605       }
 606       else {
 607         $dir_key = "./";
 608         $qunk{'filename'} = $file_full_path;
 609       }
 610
 611       # This may someday be used in a more sophisticated calculation
 612       # of what other files are involved in this commit.  For now, we
 613       # don't use it much except for delta mode, because the
 614       # common-commit-detection algorithm is hypothesized to be
 615       # "good enough" as it stands.
 616       $qunk{'time'} = $time;
 617
 618       # We might be including revision numbers and/or tags and/or
 619       # branch names in the output.  Most of the code from here to
 620       # loop-end deals with organizing these in qunk.
 621
 622       $qunk{'revision'} = $revision;
 623
 624       # Grab the branch, even though we may or may not need it:
 625       $qunk{'revision'} =~ /((?:\d+\.)+)\d+/;
 626       my $branch_prefix = $1;
 627       $branch_prefix =~ s/\.$//;  # strip off final dot
 628       if ($branch_names{$branch_prefix}) {
 629         $qunk{'branch'} = $branch_names{$branch_prefix};
 630       }
 631
 632       # If there's anything in the @branch_roots array, then this
 633       # revision is the root of at least one branch.  We'll display
 634       # them as branch names instead of revision numbers, the
 635       # substitution for which is done directly in the array:
 636       if (@branch_roots) {
 637         my @roots = map { $branch_names{$_} } @branch_roots;
 638         $qunk{'branchroots'} = \@roots;
 639       }
 640
 641       # Save tags too.
 642       if (defined ($symbolic_names{$revision})) {
 643         $qunk{'tags'} = $symbolic_names{$revision};
 644         delete $symbolic_names{$revision};
 645
 646         # If we're in 'delta' mode, update the latest observed
 647         # times for the beginning and ending tags, and
 648         # when we get around to printing output, we will simply restrict
 649         # ourselves to that timeframe...
 650
 651         if ($Delta_Mode) {
 652           if (($time > $Delta_StartTime) &&
 653               (grep { $_ eq $Delta_From } @{$qunk{'tags'}}))
 654           {
 655             $Delta_StartTime = $time;
 656           }
 657
 658           if (($time > $Delta_EndTime) &&
 659               (grep { $_ eq $Delta_To } @{$qunk{'tags'}}))
 660           {
 661             $Delta_EndTime = $time;
 662           }
 663         }
 664       }
 665
 666       # Add this file to the list
 667       # (We use many spoonfuls of autovivication magic. Hashes and arrays
 668       # will spring into existence if they aren't there already.)
 669
 670       &debug ("(pushing log msg for ${dir_key}$qunk{'filename'})\n");
 671
 672       # Store with the files in this commit.  Later we'll loop through
 673       # again, making sure that revisions with the same log message
 674       # and nearby commit times are grouped together as one commit.
 675       push (@{$grand_poobah{$dir_key}{$author}{$time}{$msg_txt}}, \%qunk);
 676     }
 677
 678   CLEAR:
 679     # Make way for the next message
 680     undef $msg_txt;
 681     undef $time;
 682     undef $revision;
 683     undef $author;
 684     undef @branch_roots;
 685
 686     # Maybe even make way for the next file:
 687     if ($detected_file_separator) {
 688       undef $file_full_path;
 689       undef %branch_names;
 690       undef %branch_numbers;
 691       undef %symbolic_names;
 692     }
 693   }
 694
 695   close (LOG_SOURCE);
 696
 697   ### Process each ChangeLog
 698
 699   while (my ($dir,$authorhash) = each %grand_poobah)
 700   {
 701     &debug ("DOING DIR: $dir\n");
 702
 703     # Here we twist our hash around, from being
 704     #   author => time => message => filelist
 705     # in %$authorhash to
 706     #   time => author => message => filelist
 707     # in %changelog.
 708     #
 709     # This is also where we merge entries.  The algorithm proceeds
 710     # through the timeline of the changelog with a sliding window of
 711     # $Max_Checkin_Duration seconds; within that window, entries that
 712     # have the same log message are merged.
 713     #
 714     # (To save space, we zap %$authorhash after we've copied
 715     # everything out of it.)
 716
 717     my %changelog;
 718     while (my ($author,$timehash) = each %$authorhash)
 719     {
 720       my $lasttime;
 721       my %stamptime;
 722       foreach my $time (sort {$main::a <=> $main::b} (keys %$timehash))
 723       {
 724         my $msghash = $timehash->{$time};
 725         while (my ($msg,$qunklist) = each %$msghash)
 726         {
 727           my $stamptime = $stamptime{$msg};
 728           if ((defined $stamptime)
 729               and (($time - $stamptime) < $Max_Checkin_Duration)
 730               and (defined $changelog{$stamptime}{$author}{$msg}))
 731           {
 732             push(@{$changelog{$stamptime}{$author}{$msg}}, @$qunklist);
 733           }
 734           else {
 735             $changelog{$time}{$author}{$msg} = $qunklist;
 736             $stamptime{$msg} = $time;
 737           }
 738         }
 739       }
 740     }
 741     undef (%$authorhash);
 742
 743     ### Now we can write out the ChangeLog!
 744
 745     my ($logfile_here, $logfile_bak, $tmpfile);
 746
 747     if (! $Output_To_Stdout) {
 748       $logfile_here =  $dir . $Log_File_Name;
 749       $logfile_here =~ s/^\.\/\//\//;   # fix any leading ".//" problem
 750       $tmpfile      = "${logfile_here}.cvs2cl$$.tmp";
 751       $logfile_bak  = "${logfile_here}.bak";
 752
 753       open (LOG_OUT, ">$tmpfile") or die "Unable to open \"$tmpfile\"";
 754     }
 755     else {
 756       open (LOG_OUT, ">-") or die "Unable to open stdout for writing";
 757     }
 758
 759     print LOG_OUT $ChangeLog_Header;
 760
 761     if ($XML_Output) {
 762       my $encoding    =
 763         length $XML_Encoding ? qq'encoding="$XML_Encoding"' : '';
 764       my $version     = 'version="1.0"';
 765       my $declaration =
 766         sprintf '<?xml %s?>', join ' ', grep length, $version, $encoding;
 767       my $root        =
 768         '<changelog xmlns="http://www.red-bean.com/xmlns/cvs2cl/">';
 769       print LOG_OUT "$declaration\n\n$root\n\n";
 770     }
 771
 772     foreach my $time (sort {$main::b <=> $main::a} (keys %changelog))
 773     {
 774       next if ($Delta_Mode &&
 775                (($time <= $Delta_StartTime) ||
 776                 ($time > $Delta_EndTime && $Delta_EndTime)));
 777
 778       # Set up the date/author line.
 779       # kff todo: do some more XML munging here, on the header
 780       # part of the entry:
 781       my ($ignore,$min,$hour,$mday,$mon,$year,$wday)
 782           = $UTC_Times ? gmtime($time) : localtime($time);
 783
 784       # XML output includes everything else, we might as well make
 785       # it always include Day Of Week too, for consistency.
 786       if ($Show_Day_Of_Week or $XML_Output) {
 787         $wday = ("Sunday", "Monday", "Tuesday", "Wednesday",
 788                  "Thursday", "Friday", "Saturday")[$wday];
 789         $wday = ($XML_Output) ? "<weekday>${wday}</weekday>\n" : " $wday";
 790       }
 791       else {
 792         $wday = "";
 793       }
 794
 795       my $authorhash = $changelog{$time};
 796       if ($Show_Tag_Dates) {
 797         my %tags;
 798         while (my ($author,$mesghash) = each %$authorhash) {
 799           while (my ($msg,$qunk) = each %$mesghash) {
 800             foreach my $qunkref2 (@$qunk) {
 801               if (defined ($$qunkref2{'tags'})) {
 802                 foreach my $tag (@{$$qunkref2{'tags'}}) {
 803                   $tags{$tag} = 1;
 804                 }
 805               }
 806             }
 807           }
 808         }
 809         foreach my $tag (keys %tags) {
 810           if (!defined $tag_date_printed{$tag}) {
 811             $tag_date_printed{$tag} = $time;
 812             if ($XML_Output) {
 813               # NOT YET DONE
 814             }
 815             else {
 816              if ($Show_Times) {
 817               printf LOG_OUT ("%4u-%02u-%02u${wday} %02u:%02u  tag %s\n\n",
 818                               $year+1900, $mon+1, $mday, $hour, $min, $tag);
 819              } else {
 820                printf LOG_OUT ("%4u-%02u-%02u${wday}  tag %s\n\n",
 821                                $year+1900, $mon+1, $mday, $tag);
 822              }
 823             }
 824           }
 825         }
 826       }
 827       while (my ($author,$mesghash) = each %$authorhash)
 828       {
 829         # If XML, escape in outer loop to avoid compound quoting:
 830         if ($XML_Output) {
 831           $author = &xml_escape ($author);
 832         }
 833
 834       FOOBIE:
 835         while (my ($msg,$qunklist) = each %$mesghash)
 836         {
 837           ## MJP: 19.xii.01 : Exclude @ignore_tags
 838           for my $ignore_tag (@ignore_tags) {
 839             next FOOBIE
 840               if grep $_ eq $ignore_tag, map(@{$_->{tags}},
 841                                              grep(defined $_->{tags},
 842                                                   @$qunklist));
 843           }
 844           ## MJP: 19.xii.01 : End exclude @ignore_tags
 845
 846           my $files               = &pretty_file_list ($qunklist);
 847           my $header_line;          # date and author
 848           my $body;                 # see below
 849           my $wholething;           # $header_line + $body
 850
 851           if ($XML_Output) {
 852             $header_line =
 853                 sprintf ("<date>%4u-%02u-%02u</date>\n"
 854                          . "${wday}"
 855                          . "<time>%02u:%02u</time>\n"
 856                          . "<author>%s</author>\n",
 857                          $year+1900, $mon+1, $mday, $hour, $min, $author);
 858           }
 859           else {
 860            if ($Show_Times) {
 861             $header_line =
 862                 sprintf ("%4u-%02u-%02u${wday} %02u:%02u  %s\n\n",
 863                          $year+1900, $mon+1, $mday, $hour, $min, $author);
 864            } else {
 865              $header_line =
 866                 sprintf ("%4u-%02u-%02u${wday}  %s\n\n",
 867                          $year+1900, $mon+1, $mday, $author);
 868            }
 869           }
 870
 871           $Text::Wrap::huge = 'overflow'
 872             if $Text::Wrap::VERSION >= 2001.0130;
 873           # Reshape the body according to user preferences.
 874           if ($XML_Output)
 875           {
 876             $msg = &preprocess_msg_text ($msg);
 877             $body = $files . $msg;
 878           }
 879           elsif ($No_Wrap)
 880           {
 881             $msg = &preprocess_msg_text ($msg);
 882             $files = wrap ("\t", "      ", "$files");
 883             $msg =~ s/\n(.*)/\n\t$1/g;
 884             unless ($After_Header eq " ") {
 885               $msg =~ s/^(.*)/\t$1/g;
 886             }
 887             $body = $files . $After_Header . $msg;
 888           }
 889           else  # do wrapping, either FSF-style or regular
 890           {
 891             if ($FSF_Style)
 892             {
 893               $files = wrap ("\t", "        ", "$files");
 894
 895               my $files_last_line_len = 0;
 896               if ($After_Header eq " ")
 897               {
 898                 $files_last_line_len = &last_line_len ($files);
 899                 $files_last_line_len += 1;  # for $After_Header
 900               }
 901
 902               $msg = &wrap_log_entry
 903                   ($msg, "\t", 69 - $files_last_line_len, 69);
 904               $body = $files . $After_Header . $msg;
 905             }
 906             else  # not FSF-style
 907             {
 908               $msg = &preprocess_msg_text ($msg);
 909               $body = $files . $After_Header . $msg;
 910               $body = wrap ("\t", "        ", "$body");
 911             }
 912           }
 913
 914           $wholething = $header_line . $body;
 915
 916           if ($XML_Output) {
 917             $wholething = "<entry>\n${wholething}</entry>\n";
 918           }
 919
 920           # One last check: make sure it passes the regexp test, if the
 921           # user asked for that.  We have to do it here, so that the
 922           # test can match against information in the header as well
 923           # as in the text of the log message.
 924
 925           # How annoying to duplicate so much code just because I
 926           # can't figure out a way to evaluate scalars on the trailing
 927           # operator portion of a regular expression.  Grrr.
 928           if ($Case_Insensitive) {
 929             unless ($Regexp_Gate && ($wholething !~ /$Regexp_Gate/oi)) {
 930               print LOG_OUT "${wholething}\n";
 931             }
 932           }
 933           else {
 934             unless ($Regexp_Gate && ($wholething !~ /$Regexp_Gate/o)) {
 935               print LOG_OUT "${wholething}\n";
 936             }
 937           }
 938         }
 939       }
 940     }
 941
 942     if ($XML_Output) {
 943       print LOG_OUT "</changelog>\n";
 944     }
 945
 946     close (LOG_OUT);
 947
 948     if (! $Output_To_Stdout)
 949     {
 950       # If accumulating, append old data to new before renaming.  But
 951       # don't append the most recent entry, since it's already in the
 952       # new log due to CVS's idiosyncratic interpretation of "log -d".
 953       if ($Cumulative && -f $logfile_here)
 954       {
 955         open (NEW_LOG, ">>$tmpfile")
 956             or die "trouble appending to $tmpfile ($!)";
 957
 958         open (OLD_LOG, "<$logfile_here")
 959             or die "trouble reading from $logfile_here ($!)";
 960
 961         my $started_first_entry = 0;
 962         my $passed_first_entry = 0;
 963         while (<OLD_LOG>)
 964         {
 965           if (! $passed_first_entry)
 966           {
 967             if ((! $started_first_entry)
 968                 && /^(\d\d\d\d-\d\d-\d\d\s+\d\d:\d\d)/) {
 969               $started_first_entry = 1;
 970             }
 971             elsif (/^(\d\d\d\d-\d\d-\d\d\s+\d\d:\d\d)/) {
 972               $passed_first_entry = 1;
 973               print NEW_LOG $_;
 974             }
 975           }
 976           else {
 977             print NEW_LOG $_;
 978           }
 979         }
 980
 981         close (NEW_LOG);
 982         close (OLD_LOG);
 983       }
 984
 985       if (-f $logfile_here) {
 986         rename ($logfile_here, $logfile_bak);
 987       }
 988       rename ($tmpfile, $logfile_here);
 989     }
 990   }
 991 }
 992
 993 sub parse_date_and_author ()
 994 {
 995   # Parses the date/time and author out of a line like:
 996   #
 997   # date: 1999/02/19 23:29:05;  author: apharris;  state: Exp;
 998
 999   my $line = shift;
1000
1001   my ($year, $mon, $mday, $hours, $min, $secs, $author) = $line =~
1002       m#(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+);\s+author:\s+([^;]+);#
1003           or  die "Couldn't parse date ``$line''";
1004   die "Bad date or Y2K issues" unless ($year > 1969 and $year < 2258);
1005   # Kinda arbitrary, but useful as a sanity check
1006   my $time = timegm($secs,$min,$hours,$mday,$mon-1,$year-1900);
1007
1008   return ($time, $author);
1009 }
1010
1011 # Here we take a bunch of qunks and convert them into printed
1012 # summary that will include all the information the user asked for.
1013 sub pretty_file_list ()
1014 {
1015   if ($Hide_Filenames and (! $XML_Output)) {
1016     return "";
1017   }
1018
1019   my $qunksref = shift;
1020   my @qunkrefs = @$qunksref;
1021   my @filenames;
1022   my $beauty = "";          # The accumulating header string for this entry.
1023   my %non_unanimous_tags;   # Tags found in a proper subset of qunks
1024   my %unanimous_tags;       # Tags found in all qunks
1025   my %all_branches;         # Branches found in any qunk
1026   my $common_dir = undef;   # Dir prefix common to all files ("" if none)
1027   my $fbegun = 0;           # Did we begin printing filenames yet?
1028
1029   # First, loop over the qunks gathering all the tag/branch names.
1030   # We'll put them all in non_unanimous_tags, and take out the
1031   # unanimous ones later.
1032  QUNKREF:
1033   foreach my $qunkref (@qunkrefs)
1034   {
1035     ## MJP: 19.xii.01 : Exclude @ignore_tags
1036     for my $ignore_tag (@ignore_tags) {
1037       next QUNKREF
1038         if grep $_ eq $ignore_tag, @{$$qunkref{'tags'}};
1039     }
1040     ## MJP: 19.xii.01 : End exclude @ignore_tags
1041
1042     # Keep track of whether all the files in this commit were in the
1043     # same directory, and memorize it if so.  We can make the output a
1044     # little more compact by mentioning the directory only once.
1045     if ((scalar (@qunkrefs)) > 1)
1046     {
1047       if (! (defined ($common_dir)))
1048       {
1049         my ($base, $dir);
1050         ($base, $dir, undef) = fileparse ($$qunkref{'filename'});
1051
1052         if ((! (defined ($dir)))  # this first case is sheer paranoia
1053             or ($dir eq "")
1054             or ($dir eq "./")
1055             or ($dir eq ".\\"))
1056         {
1057           $common_dir = "";
1058         }
1059         else
1060         {
1061           $common_dir = $dir;
1062         }
1063       }
1064       elsif ($common_dir ne "")
1065       {
1066         # Already have a common dir prefix, so how much of it can we preserve?
1067         $common_dir = &common_path_prefix ($$qunkref{'filename'}, $common_dir);
1068       }
1069     }
1070     else  # only one file in this entry anyway, so common dir not an issue
1071     {
1072       $common_dir = "";
1073     }
1074
1075     if (defined ($$qunkref{'branch'})) {
1076       $all_branches{$$qunkref{'branch'}} = 1;
1077     }
1078     if (defined ($$qunkref{'tags'})) {
1079       foreach my $tag (@{$$qunkref{'tags'}}) {
1080         $non_unanimous_tags{$tag} = 1;
1081       }
1082     }
1083   }
1084
1085   # Any tag held by all qunks will be printed specially... but only if
1086   # there are multiple qunks in the first place!
1087   if ((scalar (@qunkrefs)) > 1) {
1088     foreach my $tag (keys (%non_unanimous_tags)) {
1089       my $everyone_has_this_tag = 1;
1090       foreach my $qunkref (@qunkrefs) {
1091         if ((! (defined ($$qunkref{'tags'})))
1092             or (! (grep ($_ eq $tag, @{$$qunkref{'tags'}})))) {
1093           $everyone_has_this_tag = 0;
1094         }
1095       }
1096       if ($everyone_has_this_tag) {
1097         $unanimous_tags{$tag} = 1;
1098         delete $non_unanimous_tags{$tag};
1099       }
1100     }
1101   }
1102
1103   if ($XML_Output)
1104   {
1105     # If outputting XML, then our task is pretty simple, because we
1106     # don't have to detect common dir, common tags, branch prefixing,
1107     # etc.  We just output exactly what we have, and don't worry about
1108     # redundancy or readability.
1109
1110     foreach my $qunkref (@qunkrefs)
1111     {
1112       my $filename    = $$qunkref{'filename'};
1113       my $revision    = $$qunkref{'revision'};
1114       my $tags        = $$qunkref{'tags'};
1115       my $branch      = $$qunkref{'branch'};
1116       my $branchroots = $$qunkref{'branchroots'};
1117
1118       $filename = &xml_escape ($filename);   # probably paranoia
1119       $revision = &xml_escape ($revision);   # definitely paranoia
1120
1121       $beauty .= "<file>\n";
1122       $beauty .= "<name>${filename}</name>\n";
1123       $beauty .= "<revision>${revision}</revision>\n";
1124       if ($branch) {
1125         $branch   = &xml_escape ($branch);     # more paranoia
1126         $beauty .= "<branch>${branch}</branch>\n";
1127       }
1128       foreach my $tag (@$tags) {
1129         $tag = &xml_escape ($tag);  # by now you're used to the paranoia
1130         $beauty .= "<tag>${tag}</tag>\n";
1131       }
1132       foreach my $root (@$branchroots) {
1133         $root = &xml_escape ($root);  # which is good, because it will continue
1134         $beauty .= "<branchroot>${root}</branchroot>\n";
1135       }
1136       $beauty .= "</file>\n";
1137     }
1138
1139     # Theoretically, we could go home now.  But as long as we're here,
1140     # let's print out the common_dir and utags, as a convenience to
1141     # the receiver (after all, earlier code calculated that stuff
1142     # anyway, so we might as well take advantage of it).
1143
1144     if ((scalar (keys (%unanimous_tags))) > 1) {
1145       foreach my $utag ((keys (%unanimous_tags))) {
1146         $utag = &xml_escape ($utag);   # the usual paranoia
1147         $beauty .= "<utag>${utag}</utag>\n";
1148       }
1149     }
1150     if ($common_dir) {
1151       $common_dir = &xml_escape ($common_dir);
1152       $beauty .= "<commondir>${common_dir}</commondir>\n";
1153     }
1154
1155     # That's enough for XML, time to go home:
1156     return $beauty;
1157   }
1158
1159   # Else not XML output, so complexly compactify for chordate
1160   # consumption.  At this point we have enough global information
1161   # about all the qunks to organize them non-redundantly for output.
1162
1163   if ($common_dir) {
1164     # Note that $common_dir still has its trailing slash
1165     $beauty .= "$common_dir: ";
1166   }
1167
1168   if ($Show_Branches)
1169   {
1170     # For trailing revision numbers.
1171     my @brevisions;
1172
1173     foreach my $branch (keys (%all_branches))
1174     {
1175       foreach my $qunkref (@qunkrefs)
1176       {
1177         if ((defined ($$qunkref{'branch'}))
1178             and ($$qunkref{'branch'} eq $branch))
1179         {
1180           if ($fbegun) {
1181             # kff todo: comma-delimited in XML too?  Sure.
1182             $beauty .= ", ";
1183           }
1184           else {
1185             $fbegun = 1;
1186           }
1187           my $fname = substr ($$qunkref{'filename'}, length ($common_dir));
1188           $beauty .= $fname;
1189           $$qunkref{'printed'} = 1;  # Just setting a mark bit, basically
1190
1191           if ($Show_Tags && (defined @{$$qunkref{'tags'}})) {
1192             my @tags = grep ($non_unanimous_tags{$_}, @{$$qunkref{'tags'}});
1193
1194             if (@tags) {
1195               $beauty .= " (tags: ";
1196               $beauty .= join (', ', @tags);
1197               $beauty .= ")";
1198             }
1199           }
1200
1201           if ($Show_Revisions) {
1202             # Collect the revision numbers' last components, but don't
1203             # print them -- they'll get printed with the branch name
1204             # later.
1205             $$qunkref{'revision'} =~ /.+\.([\d]+)$/;
1206             push (@brevisions, $1);
1207
1208             # todo: we're still collecting branch roots, but we're not
1209             # showing them anywhere.  If we do show them, it would be
1210             # nifty to just call them revision "0" on a the branch.
1211             # Yeah, that's the ticket.
1212           }
1213         }
1214       }
1215       $beauty .= " ($branch";
1216       if (@brevisions) {
1217         if ((scalar (@brevisions)) > 1) {
1218           $beauty .= ".[";
1219           $beauty .= (join (',', @brevisions));
1220           $beauty .= "]";
1221         }
1222         else {
1223           # Square brackets are spurious here, since there's no range to
1224           # encapsulate
1225           $beauty .= ".$brevisions[0]";
1226         }
1227       }
1228       $beauty .= ")";
1229     }
1230   }
1231
1232   # Okay; any qunks that were done according to branch are taken care
1233   # of, and marked as printed.  Now print everyone else.
1234
1235   foreach my $qunkref (@qunkrefs)
1236   {
1237     next if (defined ($$qunkref{'printed'}));   # skip if already printed
1238
1239     if ($fbegun) {
1240       $beauty .= ", ";
1241     }
1242     else {
1243       $fbegun = 1;
1244     }
1245     $beauty .= substr ($$qunkref{'filename'}, length ($common_dir));
1246     # todo: Shlomo's change was this:
1247     # $beauty .= substr ($$qunkref{'filename'},
1248     #              (($common_dir eq "./") ? "" : length ($common_dir)));
1249     $$qunkref{'printed'} = 1;  # Set a mark bit.
1250
1251     if ($Show_Revisions || $Show_Tags)
1252     {
1253       my $started_addendum = 0;
1254
1255       if ($Show_Revisions) {
1256         $started_addendum = 1;
1257         $beauty .= " (";
1258         $beauty .= "$$qunkref{'revision'}";
1259       }
1260       if ($Show_Tags && (defined $$qunkref{'tags'})) {
1261         my @tags = grep ($non_unanimous_tags{$_}, @{$$qunkref{'tags'}});
1262         if ((scalar (@tags)) > 0) {
1263           if ($started_addendum) {
1264             $beauty .= ", ";
1265           }
1266           else {
1267             $beauty .= " (tags: ";
1268           }
1269           $beauty .= join (', ', @tags);
1270           $started_addendum = 1;
1271         }
1272       }
1273       if ($started_addendum) {
1274         $beauty .= ")";
1275       }
1276     }
1277   }
1278
1279   # Unanimous tags always come last.
1280   if ($Show_Tags && %unanimous_tags)
1281   {
1282     $beauty .= " (utags: ";
1283     $beauty .= join (', ', sort keys (%unanimous_tags));
1284     $beauty .= ")";
1285   }
1286
1287   # todo: still have to take care of branch_roots?
1288
1289   $beauty = "* $beauty:";
1290
1291   return $beauty;
1292 }
1293
1294 sub common_path_prefix ()
1295 {
1296   my $path1 = shift;
1297   my $path2 = shift;
1298
1299   my ($dir1, $dir2);
1300   (undef, $dir1, undef) = fileparse ($path1);
1301   (undef, $dir2, undef) = fileparse ($path2);
1302
1303   # Transmogrify Windows filenames to look like Unix.
1304   # (It is far more likely that someone is running cvs2cl.pl under
1305   # Windows than that they would genuinely have backslashes in their
1306   # filenames.)
1307   $dir1 =~ tr#\\#/#;
1308   $dir2 =~ tr#\\#/#;
1309
1310   my $accum1 = "";
1311   my $accum2 = "";
1312   my $last_common_prefix = "";
1313
1314   while ($accum1 eq $accum2)
1315   {
1316     $last_common_prefix = $accum1;
1317     last if ($accum1 eq $dir1);
1318     my ($tmp1) = split (/\//, (substr ($dir1, length ($accum1))));
1319     my ($tmp2) = split (/\//, (substr ($dir2, length ($accum2))));
1320     $accum1 .= "$tmp1/" if (defined $tmp1 and $tmp1 ne '');
1321     $accum2 .= "$tmp2/" if (defined $tmp2 and $tmp2 ne '');
1322   }
1323
1324   return $last_common_prefix;
1325 }
1326
1327 sub preprocess_msg_text ()
1328 {
1329   my $text = shift;
1330
1331   # Strip out carriage returns (as they probably result from DOSsy editors).
1332   $text =~ s/\r\n/\n/g;
1333
1334   # If it *looks* like two newlines, make it *be* two newlines:
1335   $text =~ s/\n\s*\n/\n\n/g;
1336
1337   if ($XML_Output)
1338   {
1339     $text = &xml_escape ($text);
1340     $text = "<msg>${text}</msg>\n";
1341   }
1342   elsif (! $No_Wrap)
1343   {
1344     # Strip off lone newlines, but only for lines that don't begin with
1345     # whitespace or a mail-quoting character, since we want to preserve
1346     # that kind of formatting.  Also don't strip newlines that follow a
1347     # period; we handle those specially next.  And don't strip
1348     # newlines that precede an open paren.
1349     1 while ($text =~ s/(^|\n)([^>\s].*[^.\n])\n([^>\n])/$1$2 $3/g);
1350
1351     # If a newline follows a period, make sure that when we bring up the
1352     # bottom sentence, it begins with two spaces.
1353     1 while ($text =~ s/(^|\n)([^>\s].*)\n([^>\n])/$1$2  $3/g);
1354   }
1355
1356   return $text;
1357 }
1358
1359 sub last_line_len ()
1360 {
1361   my $files_list = shift;
1362   my @lines = split (/\n/, $files_list);
1363   my $last_line = pop (@lines);
1364   return length ($last_line);
1365 }
1366
1367 # A custom wrap function, sensitive to some common constructs used in
1368 # log entries.
1369 sub wrap_log_entry ()
1370 {
1371   my $text = shift;                  # The text to wrap.
1372   my $left_pad_str = shift;          # String to pad with on the left.
1373
1374   # These do NOT take left_pad_str into account:
1375   my $length_remaining = shift;      # Amount left on current line.
1376   my $max_line_length  = shift;      # Amount left for a blank line.
1377
1378   my $wrapped_text = "";             # The accumulating wrapped entry.
1379   my $user_indent = "";              # Inherited user_indent from prev line.
1380
1381   my $first_time = 1;                # First iteration of the loop?
1382   my $suppress_line_start_match = 0; # Set to disable line start checks.
1383
1384   my @lines = split (/\n/, $text);
1385   while (@lines)   # Don't use `foreach' here, it won't work.
1386   {
1387     my $this_line = shift (@lines);
1388     chomp $this_line;
1389
1390     if ($this_line =~ /^(\s+)/) {
1391       $user_indent = $1;
1392     }
1393     else {
1394       $user_indent = "";
1395     }
1396
1397     # If it matches any of the line-start regexps, print a newline now...
1398     if ($suppress_line_start_match)
1399     {
1400       $suppress_line_start_match = 0;
1401     }
1402     elsif (($this_line =~ /^(\s*)\*\s+[a-zA-Z0-9]/)
1403            || ($this_line =~ /^(\s*)\* [a-zA-Z0-9_\.\/\+-]+/)
1404            || ($this_line =~ /^(\s*)\([a-zA-Z0-9_\.\/\+-]+(\)|,\s*)/)
1405            || ($this_line =~ /^(\s+)(\S+)/)
1406            || ($this_line =~ /^(\s*)- +/)
1407            || ($this_line =~ /^()\s*$/)
1408            || ($this_line =~ /^(\s*)\*\) +/)
1409            || ($this_line =~ /^(\s*)[a-zA-Z0-9](\)|\.|\:) +/))
1410     {
1411       # Make a line break immediately, unless header separator is set
1412       # and this line is the first line in the entry, in which case
1413       # we're getting the blank line for free already and shouldn't
1414       # add an extra one.
1415       unless (($After_Header ne " ") and ($first_time))
1416       {
1417         if ($this_line =~ /^()\s*$/) {
1418           $suppress_line_start_match = 1;
1419           $wrapped_text .= "\n${left_pad_str}";
1420         }
1421
1422         $wrapped_text .= "\n${left_pad_str}";
1423       }
1424
1425       $length_remaining = $max_line_length - (length ($user_indent));
1426     }
1427
1428     # Now that any user_indent has been preserved, strip off leading
1429     # whitespace, so up-folding has no ugly side-effects.
1430     $this_line =~ s/^\s*//;
1431
1432     # Accumulate the line, and adjust parameters for next line.
1433     my $this_len = length ($this_line);
1434     if ($this_len == 0)
1435     {
1436       # Blank lines should cancel any user_indent level.
1437       $user_indent = "";
1438       $length_remaining = $max_line_length;
1439     }
1440     elsif ($this_len >= $length_remaining) # Line too long, try breaking it.
1441     {
1442       # Walk backwards from the end.  At first acceptable spot, break
1443       # a new line.
1444       my $idx = $length_remaining - 1;
1445       if ($idx < 0) { $idx = 0 };
1446       while ($idx > 0)
1447       {
1448         if (substr ($this_line, $idx, 1) =~ /\s/)
1449         {
1450           my $line_now = substr ($this_line, 0, $idx);
1451           my $next_line = substr ($this_line, $idx);
1452           $this_line = $line_now;
1453
1454           # Clean whitespace off the end.
1455           chomp $this_line;
1456
1457           # The current line is ready to be printed.
1458           $this_line .= "\n${left_pad_str}";
1459
1460           # Make sure the next line is allowed full room.
1461           $length_remaining = $max_line_length - (length ($user_indent));
1462
1463           # Strip next_line, but then preserve any user_indent.
1464           $next_line =~ s/^\s*//;
1465
1466           # Sneak a peek at the user_indent of the upcoming line, so
1467           # $next_line (which will now precede it) can inherit that
1468           # indent level.  Otherwise, use whatever user_indent level
1469           # we currently have, which might be none.
1470           my $next_next_line = shift (@lines);
1471           if ((defined ($next_next_line)) && ($next_next_line =~ /^(\s+)/)) {
1472             $next_line = $1 . $next_line if (defined ($1));
1473             # $length_remaining = $max_line_length - (length ($1));
1474             $next_next_line =~ s/^\s*//;
1475           }
1476           else {
1477             $next_line = $user_indent . $next_line;
1478           }
1479           if (defined ($next_next_line)) {
1480             unshift (@lines, $next_next_line);
1481           }
1482           unshift (@lines, $next_line);
1483
1484           # Our new next line might, coincidentally, begin with one of
1485           # the line-start regexps, so we temporarily turn off
1486           # sensitivity to that until we're past the line.
1487           $suppress_line_start_match = 1;
1488
1489           last;
1490         }
1491         else
1492         {
1493           $idx--;
1494         }
1495       }
1496
1497       if ($idx == 0)
1498       {
1499         # We bottomed out because the line is longer than the
1500         # available space.  But that could be because the space is
1501         # small, or because the line is longer than even the maximum
1502         # possible space.  Handle both cases below.
1503
1504         if ($length_remaining == ($max_line_length - (length ($user_indent))))
1505         {
1506           # The line is simply too long -- there is no hope of ever
1507           # breaking it nicely, so just insert it verbatim, with
1508           # appropriate padding.
1509           $this_line = "\n${left_pad_str}${this_line}";
1510         }
1511         else
1512         {
1513           # Can't break it here, but may be able to on the next round...
1514           unshift (@lines, $this_line);
1515           $length_remaining = $max_line_length - (length ($user_indent));
1516           $this_line = "\n${left_pad_str}";
1517         }
1518       }
1519     }
1520     else  # $this_len < $length_remaining, so tack on what we can.
1521     {
1522       # Leave a note for the next iteration.
1523       $length_remaining = $length_remaining - $this_len;
1524
1525       if ($this_line =~ /\.$/)
1526       {
1527         $this_line .= "  ";
1528         $length_remaining -= 2;
1529       }
1530       else  # not a sentence end
1531       {
1532         $this_line .= " ";
1533         $length_remaining -= 1;
1534       }
1535     }
1536
1537     # Unconditionally indicate that loop has run at least once.
1538     $first_time = 0;
1539
1540     $wrapped_text .= "${user_indent}${this_line}";
1541   }
1542
1543   # One last bit of padding.
1544   $wrapped_text .= "\n";
1545
1546   return $wrapped_text;
1547 }
1548
1549 sub xml_escape ()
1550 {
1551   my $txt = shift;
1552   $txt =~ s/&/&amp;/g;
1553   $txt =~ s/</&lt;/g;
1554   $txt =~ s/>/&gt;/g;
1555   return $txt;
1556 }
1557
1558 sub maybe_read_user_map_file ()
1559 {
1560   my %expansions;
1561
1562   if ($User_Map_File)
1563   {
1564     open (MAPFILE, "<$User_Map_File")
1565         or die ("Unable to open $User_Map_File ($!)");
1566
1567     while (<MAPFILE>)
1568     {
1569       next if /^\s*#/;  # Skip comment lines.
1570       next if not /:/;  # Skip lines without colons.
1571
1572       # It is now safe to split on ':'.
1573       my ($username, $expansion) = split ':';
1574       chomp $expansion;
1575       $expansion =~ s/^'(.*)'$/$1/;
1576       $expansion =~ s/^"(.*)"$/$1/;
1577
1578       # If it looks like the expansion has a real name already, then
1579       # we toss the username we got from CVS log.  Otherwise, keep
1580       # it to use in combination with the email address.
1581
1582       if ($expansion =~ /^\s*<{0,1}\S+@.*/) {
1583         # Also, add angle brackets if none present
1584         if (! ($expansion =~ /<\S+@\S+>/)) {
1585           $expansions{$username} = "$username <$expansion>";
1586         }
1587         else {
1588           $expansions{$username} = "$username $expansion";
1589         }
1590       }
1591       else {
1592         $expansions{$username} = $expansion;
1593       }
1594     }
1595
1596     close (MAPFILE);
1597   }
1598
1599   return %expansions;
1600 }
1601
1602 sub parse_options ()
1603 {
1604   # Check this internally before setting the global variable.
1605   my $output_file;
1606
1607   # If this gets set, we encountered unknown options and will exit at
1608   # the end of this subroutine.
1609   my $exit_with_admonishment = 0;
1610
1611   while (my $arg = shift (@ARGV))
1612   {
1613     if ($arg =~ /^-h$|^-help$|^--help$|^--usage$|^-?$/) {
1614       $Print_Usage = 1;
1615     }
1616     elsif ($arg =~ /^--delta$/) {
1617       my $narg = shift(@ARGV) || die "$arg needs argument.\n";
1618       if ($narg =~ /^([A-Za-z][A-Za-z0-9_\-]*):([A-Za-z][A-Za-z0-9_\-]*)$/) {
1619         $Delta_From = $1;
1620         $Delta_To = $2;
1621         $Delta_Mode = 1;
1622       } else {
1623         die "--delta FROM_TAG:TO_TAG is what you meant to say.\n";
1624       }
1625     }
1626     elsif ($arg =~ /^--debug$/) {        # unadvertised option, heh
1627       $Debug = 1;
1628     }
1629     elsif ($arg =~ /^--version$/) {
1630       $Print_Version = 1;
1631     }
1632     elsif ($arg =~ /^-g$|^--global-opts$/) {
1633       my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1634       # Don't assume CVS is called "cvs" on the user's system:
1635       $Log_Source_Command =~ s/(^\S*)/$1 $narg/;
1636     }
1637     elsif ($arg =~ /^-l$|^--log-opts$/) {
1638       my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1639       $Log_Source_Command .= " $narg";
1640     }
1641     elsif ($arg =~ /^-f$|^--file$/) {
1642       my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1643       $output_file = $narg;
1644     }
1645     elsif ($arg =~ /^--accum$/) {
1646       $Cumulative = 1;
1647     }
1648     elsif ($arg =~ /^--fsf$/) {
1649       $FSF_Style = 1;
1650     }
1651     elsif ($arg =~ /^-U$|^--usermap$/) {
1652       my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1653       $User_Map_File = $narg;
1654     }
1655     elsif ($arg =~ /^-W$|^--window$/) {
1656       defined(my $narg = shift (@ARGV)) || die "$arg needs argument.\n";
1657       $Max_Checkin_Duration = $narg;
1658     }
1659     elsif ($arg =~ /^-I$|^--ignore$/) {
1660       my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1661       push (@Ignore_Files, $narg);
1662     }
1663     elsif ($arg =~ /^-C$|^--case-insensitive$/) {
1664       $Case_Insensitive = 1;
1665     }
1666     elsif ($arg =~ /^-R$|^--regexp$/) {
1667       my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1668       $Regexp_Gate = $narg;
1669     }
1670     elsif ($arg =~ /^--stdout$/) {
1671       $Output_To_Stdout = 1;
1672     }
1673     elsif ($arg =~ /^--version$/) {
1674       $Print_Version = 1;
1675     }
1676     elsif ($arg =~ /^-d$|^--distributed$/) {
1677       $Distributed = 1;
1678     }
1679     elsif ($arg =~ /^-P$|^--prune$/) {
1680       $Prune_Empty_Msgs = 1;
1681     }
1682     elsif ($arg =~ /^-S$|^--separate-header$/) {
1683       $After_Header = "\n\n";
1684     }
1685     elsif ($arg =~ /^--no-wrap$/) {
1686       $No_Wrap = 1;
1687     }
1688     elsif ($arg =~ /^--gmt$|^--utc$/) {
1689       $UTC_Times = 1;
1690     }
1691     elsif ($arg =~ /^-w$|^--day-of-week$/) {
1692       $Show_Day_Of_Week = 1;
1693     }
1694     elsif ($arg =~ /^--no-times$/) {
1695       $Show_Times = 0;
1696     }
1697     elsif ($arg =~ /^-r$|^--revisions$/) {
1698       $Show_Revisions = 1;
1699     }
1700     elsif ($arg =~ /^-t$|^--tags$/) {
1701       $Show_Tags = 1;
1702     }
1703     elsif ($arg =~ /^-T$|^--tagdates$/) {
1704       $Show_Tag_Dates = 1;
1705     }
1706     elsif ($arg =~ /^-b$|^--branches$/) {
1707       $Show_Branches = 1;
1708     }
1709     elsif ($arg =~ /^-F$|^--follow$/) {
1710       my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1711       push (@Follow_Branches, $narg);
1712     }
1713     elsif ($arg =~ /^--stdin$/) {
1714       $Input_From_Stdin = 1;
1715     }
1716     elsif ($arg =~ /^--header$/) {
1717       my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1718       $ChangeLog_Header = &slurp_file ($narg);
1719       if (! defined ($ChangeLog_Header)) {
1720         $ChangeLog_Header = "";
1721       }
1722     }
1723     elsif ($arg =~ /^--xml-encoding$/) {
1724       my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1725       $XML_Encoding = $narg ;
1726     }
1727     elsif ($arg =~ /^--xml$/) {
1728       $XML_Output = 1;
1729     }
1730     elsif ($arg =~ /^--hide-filenames$/) {
1731       $Hide_Filenames = 1;
1732       $After_Header = "";
1733     }
1734     elsif ($arg =~ /^--ignore-tag$/ ) {
1735       die "$arg needs argument.\n"
1736         unless @ARGV;
1737       push @ignore_tags, shift @ARGV;
1738     }
1739     else {
1740       # Just add a filename as argument to the log command
1741       $Log_Source_Command .= " '$arg'";
1742     }
1743   }
1744
1745   ## Check for contradictions...
1746
1747   if ($Output_To_Stdout && $Distributed) {
1748     print STDERR "cannot pass both --stdout and --distributed\n";
1749     $exit_with_admonishment = 1;
1750   }
1751
1752   if ($Output_To_Stdout && $output_file) {
1753     print STDERR "cannot pass both --stdout and --file\n";
1754     $exit_with_admonishment = 1;
1755   }
1756
1757   if ($XML_Output && $Cumulative) {
1758     print STDERR "cannot pass both --xml and --accum\n";
1759     $exit_with_admonishment = 1;
1760   }
1761
1762   # Or if any other error message has already been printed out, we
1763   # just leave now:
1764   if ($exit_with_admonishment) {
1765     &usage ();
1766     exit (1);
1767   }
1768   elsif ($Print_Usage) {
1769     &usage ();
1770     exit (0);
1771   }
1772   elsif ($Print_Version) {
1773     &version ();
1774     exit (0);
1775   }
1776
1777   ## Else no problems, so proceed.
1778
1779   if ($output_file) {
1780     $Log_File_Name = $output_file;
1781   }
1782 }
1783
1784 sub slurp_file ()
1785 {
1786   my $filename = shift || die ("no filename passed to slurp_file()");
1787   my $retstr;
1788
1789   open (SLURPEE, "<${filename}") or die ("unable to open $filename ($!)");
1790   my $saved_sep = $/;
1791   undef $/;
1792   $retstr = <SLURPEE>;
1793   $/ = $saved_sep;
1794   close (SLURPEE);
1795   return $retstr;
1796 }
1797
1798 sub debug ()
1799 {
1800   if ($Debug) {
1801     my $msg = shift;
1802     print STDERR $msg;
1803   }
1804 }
1805
1806 sub version ()
1807 {
1808   print "cvs2cl.pl version ${VERSION}; distributed under the GNU GPL.\n";
1809 }
1810
1811 sub usage ()
1812 {
1813   &version ();
1814   print <<'END_OF_INFO';
1815 Generate GNU-style ChangeLogs in CVS working copies.
1816
1817 Notes about the output format(s):
1818
1819    The default output of cvs2cl.pl is designed to be compact, formally
1820    unambiguous, but still easy for humans to read.  It is largely
1821    self-explanatory, I hope; the one abbreviation that might not be
1822    obvious is "utags".  That stands for "universal tags" -- a
1823    universal tag is one held by all the files in a given change entry.
1824
1825    If you need output that's easy for a program to parse, use the
1826    --xml option.  Note that with XML output, just about all available
1827    information is included with each change entry, whether you asked
1828    for it or not, on the theory that your parser can ignore anything
1829    it's not looking for.
1830
1831 Notes about the options and arguments (the actual options are listed
1832 last in this usage message):
1833
1834   * The -I and -F options may appear multiple times.
1835
1836   * To follow trunk revisions, use "-F trunk" ("-F TRUNK" also works).
1837     This is okay because no would ever, ever be crazy enough to name a
1838     branch "trunk", right?  Right.
1839
1840   * For the -U option, the UFILE should be formatted like
1841     CVSROOT/users. That is, each line of UFILE looks like this
1842        jrandom:jrandom@red-bean.com
1843     or maybe even like this
1844        jrandom:'Jesse Q. Random <jrandom@red-bean.com>'
1845     Don't forget to quote the portion after the colon if necessary.
1846
1847   * Many people want to filter by date.  To do so, invoke cvs2cl.pl
1848     like this:
1849        cvs2cl.pl -l "-d'DATESPEC'"
1850     where DATESPEC is any date specification valid for "cvs log -d".
1851     (Note that CVS 1.10.7 and below requires there be no space between
1852     -d and its argument).
1853
1854 Options/Arguments:
1855
1856   -h, -help, --help, or -?     Show this usage and exit
1857   --version                    Show version and exit
1858   -r, --revisions              Show revision numbers in output
1859   -b, --branches               Show branch names in revisions when possible
1860   -t, --tags                   Show tags (symbolic names) in output
1861   -T, --tagdates               Show tags in output on their first occurance
1862   --stdin                      Read from stdin, don't run cvs log
1863   --stdout                     Output to stdout not to ChangeLog
1864   -d, --distributed            Put ChangeLogs in subdirs
1865   -f FILE, --file FILE         Write to FILE instead of "ChangeLog"
1866   --fsf                        Use this if log data is in FSF ChangeLog style
1867   -W SECS, --window SECS       Window of time within which log entries unify
1868   -U UFILE, --usermap UFILE    Expand usernames to email addresses from UFILE
1869   -R REGEXP, --regexp REGEXP   Include only entries that match REGEXP
1870   -I REGEXP, --ignore REGEXP   Ignore files whose names match REGEXP
1871   -C, --case-insensitive       Any regexp matching is done case-insensitively
1872   -F BRANCH, --follow BRANCH   Show only revisions on or ancestral to BRANCH
1873   -S, --separate-header        Blank line between each header and log message
1874   --no-wrap                    Don't auto-wrap log message (recommend -S also)
1875   --gmt, --utc                 Show times in GMT/UTC instead of local time
1876   --accum                      Add to an existing ChangeLog (incompat w/ --xml)
1877   -w, --day-of-week            Show day of week
1878   --no-times                   Don't show times in output
1879   --header FILE                Get ChangeLog header from FILE ("-" means stdin)
1880   --xml                        Output XML instead of ChangeLog format
1881   --xml-encoding ENCODING      Insert encoding clause in XML header
1882   --hide-filenames             Don't show filenames (ignored for XML output)
1883   -P, --prune                  Don't show empty log messages
1884   -g OPTS, --global-opts OPTS  Invoke like this "cvs OPTS log ..."
1885   -l OPTS, --log-opts OPTS     Invoke like this "cvs ... log OPTS"
1886   FILE1 [FILE2 ...]            Show only log information for the named FILE(s)
1887
1888 See http://www.red-bean.com/cvs2cl for maintenance and bug info.
1889 END_OF_INFO
1890 }
1891
1892 __END__
1893
1894 =head1 NAME
1895
1896 cvs2cl.pl - produces GNU-style ChangeLogs in CVS working copies, by
1897     running "cvs log" and parsing the output.  Shared log entries are
1898     unified in an intuitive way.
1899
1900 =head1 DESCRIPTION
1901
1902 This script generates GNU-style ChangeLog files from CVS log
1903 information.  Basic usage: just run it inside a working copy and a
1904 ChangeLog will appear.  It requires repository access (i.e., 'cvs log'
1905 must work).  Run "cvs2cl.pl --help" to see more advanced options.
1906
1907 See http://www.red-bean.com/cvs2cl for updates, and for instructions
1908 on getting anonymous CVS access to this script.
1909
1910 Maintainer: Karl Fogel <kfogel@red-bean.com>
1911 Please report bugs to <bug-cvs2cl@red-bean.com>.
1912
1913 =head1 README
1914
1915 This script generates GNU-style ChangeLog files from CVS log
1916 information.  Basic usage: just run it inside a working copy and a
1917 ChangeLog will appear.  It requires repository access (i.e., 'cvs log'
1918 must work).  Run "cvs2cl.pl --help" to see more advanced options.
1919
1920 See http://www.red-bean.com/cvs2cl for updates, and for instructions
1921 on getting anonymous CVS access to this script.
1922
1923 Maintainer: Karl Fogel <kfogel@red-bean.com>
1924 Please report bugs to <bug-cvs2cl@red-bean.com>.
1925
1926 =head1 PREREQUISITES
1927
1928 This script requires C<Text::Wrap>, C<Time::Local>, and
1929 C<File::Basename>.
1930 It also seems to require C<Perl 5.004_04> or higher.
1931
1932 =pod OSNAMES
1933
1934 any
1935
1936 =pod SCRIPT CATEGORIES
1937
1938 Version_Control/CVS
1939
1940 =cut
1941
1942 -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*-
1943
1944 Note about a bug-slash-opportunity:
1945 -----------------------------------
1946
1947 There's a bug in Text::Wrap, which affects cvs2cl.  This script
1948 reveals it:
1949
1950   #!/usr/bin/perl -w
1951
1952   use Text::Wrap;
1953
1954   my $test_text =
1955   "This script demonstrates a bug in Text::Wrap.  The very long line
1956   following this paragraph will be relocated relative to the surrounding
1957   text:
1958
1959   ====================================================================
1960
1961   See?  When the bug happens, we'll get the line of equal signs below
1962   this paragraph, even though it should be above.";
1963
1964   # Print out the test text with no wrapping:
1965   print "$test_text";
1966   print "\n";
1967   print "\n";
1968
1969   # Now print it out wrapped, and see the bug:
1970   print wrap ("\t", "        ", "$test_text");
1971   print "\n";
1972   print "\n";
1973
1974 If the line of equal signs were one shorter, then the bug doesn't
1975 happen.  Interesting.
1976
1977 Anyway, rather than fix this in Text::Wrap, we might as well write a
1978 new wrap() which has the following much-needed features:
1979
1980 * initial indentation, like current Text::Wrap()
1981 * subsequent line indentation, like current Text::Wrap()
1982 * user chooses among: force-break long words, leave them alone, or die()?
1983 * preserve existing indentation: chopped chunks from an indented line
1984   are indented by same (like this line, not counting the asterisk!)
1985 * optional list of things to preserve on line starts, default ">"
1986
1987 Note that the last two are essentially the same concept, so unify in
1988 implementation and give a good interface to controlling them.
1989
1990 And how about:
1991
1992 Optionally, when encounter a line pre-indented by same as previous
1993 line, then strip the newline and refill, but indent by the same.
1994 Yeah...
1995