Pod::Literate

Literate Perl is a preprocessor for perl programs that let you write in a literate style.

In the literate style, documentation is forefront and embedded within it is the actual code to run. To create a runnable code block within the documentation, which follows standard POD conventions, use =begin code and =end code directives:

=begin code
my $foo = 'bar';

$selfpopulate(
    'a'  'b',
    
);


=end code

Alternatively you can use the 'bird tracks' style, where each line in a block of code is started with an angle bracket (>):

> my $foo = 'bar';

To create a code example, not indended to be run, use the normal POD verbatim style.

One of the signatures of true Literate Programming is to be able to rearrange code so that it makes sense to the human reader first, and can be manipulated into the proper order for the compiler later. A common convention for deferring code definitions is to embed a descriptive anchor surrounded by doubled angle brackets on its own line.

our $code_anchor      = qr/ << \s* (.+?) \s* >> /x;
our $code_anchor_line = qr/^(\s*)$code_anchor\s*$/;

for example:

sub its_complicated {
    my ( $self, %params ) = @_;

«the messy bit»

    
  }

The descriptive anchor is then used later as the text on a POD head directive and any code under that heading is stitched back into the code at the anchor point. When the preprocessor runs it generates two files:

Options

Using the preprocessor on all your code files you can set a consistent policy of pragma use across your whole project. calling the use_strict or use_warnings method with a boolean value lets you tell the preprocessor whether to inject those pragmas at the head of every processed file.

sub new {

    my ( $class, %config ) = @_;
    my $self = bless Pod::Parsernew(), $class;

    $self{_css} = $config{css} if defined $config{css};
    $self{_use_strict}   = $config{pragmas}{strict}   || 0;
    $self{_use_warnings} = $config{pragmas}{warnings} || 0;
    $self{_paths}        = $config{paths}             || 0;

    return $self;
}

stylesheet

the use_css method can be used to tell the preprocesser what stylesheet to include in the header of the html document generated.

method use_css($css) {

    $self{_css} = $css;
  }

The Parser

The preprocessor is a subclass of Pod::Parser and as such does most of its parsing by overriding the handlers suggested by the documentation.

At the beginning of the document we simply reset our tracking variables and initialize our html code highlighter for any code we may encounter.

method begin_input {

    $self{_toc}            = [];
    $self{_code_sections}  = [];
    $self{_last_paragraph} = '';
    delete $self{$_}
      foreach (qw( _current_code _code _current_code_name _doc ));

    $self{_highlighter} ||= PPI::HTMLnew();
}

at the end of the parsing cycle we write out the code and document files. The method that creates the code file behaves a bit differently based on whether we are working on a script or on a module, so we pass along the input file's extension, less the l at the front which indicated it was a literate perl file in the first place.

method end_input {

    my $extension = '';
    my ( $path, $filename ) =
      ( splitpath( abs2rel( $selfinput_file, $self{_paths}{source} ) ) )
      [ 1, 2 ];
    ( $filename, $extension ) = ( $filename =~ m/ (.+?) \. (.+) $ /x );
    $extension =~ s/^l//;

    my $html_path =
      catfile( grep { defined $_ } ( $self{_paths}{html}, $path ) );
    my $code_path =
      catfile( grep { defined $_ } ( $self{_paths}{$extension}, $path ) );

    mkpath( $code_path, $html_path );

    write_file(
        catfile( $code_path, "$filename.$extension" ),
        tidy( $selfdump_code($extension) )
    );
    write_file( catfile( $html_path, "$filename.html" ), $selfdump_docs() );
}

To process the files we override methods from our base class Pod::Parser to handle each type of POD feature.

POD Command Paragraphs

The command paragraphs we are particularly interested in are begin code and end code, which indicate where our runnable code lives. Since Pod::Parser is a streaming parser we need to keep track of when we enter and exit these code sections. We accumulate the code until we reach the end code command. At this point we can add it to the documentation output, but we defer generating the output for the code file as we need to re-integrate code snippets in place of the code anchors, all of which we won't have collected until the end of the document.

method command( $command, $paragraph, $line_num ) {

    $paragraph =~ s/\s+$//;

      $selfclose_verbatim();

      if ( $command eq 'begin' and $paragraph eq 'code' ) {
        $self{_in_code_block} = 1;
        delete $self{_current_code};
    }
    elsif ( $command eq 'end' and $paragraph eq 'code' ) {
        $self{_in_code_block} = 0;

        $selfadd_doc_code( $self{_current_code} );
    }
    else {

          «normal pod commands»
              
    }

    $self{_last_paragraph} = 'command';

  }

  method set_title($text) {

    $self{_doctitle} ||= $text;
  }

  method is_source_code($paragraph) {

    if ( $self{_in_code_block} ) {
        if ( $paragraph =~ $code_anchor ) {
            $self{_code_names}{ lc $1 } = 1;
        }
        $selfadd_code($paragraph);
        return 1;
    }
    elsif ( $paragraph =~ /^>/ ) {
        $paragraph =~ s/^>//gm;
        local $self{_in_code_block} = 1;
        $selfis_source_code($paragraph);
        $selfadd_doc_code($paragraph);
        return 1;
    }
    return 0;
  }

  method add_doc_code($paragraph) {

    return if all { $_ =~ $code_anchor_line } split "\n", $paragraph;

    my $code = $selfhtmlize_code($paragraph);

      $code =~ s| (<span \s+ class \s* = \s* "comment" \s* > \s*) << (.+?) >>
              | $1 &laquo;<a href="#$2">$2</a>&raquo;
              |xmg;

      $selfadd_doc( '<div class="code"><pre>'  $code  '</pre></div>' );

  }

  method add_code($code) {

    $self{_code}{ $self{_current_code_name} } ⋅= $code;
      $self{_current_code}                      ⋅= $code;
  }

  method add_doc($text) {

    $self{_doc} ⋅= $text;
  }

  sub nested_push( $toc, $header ) {

    if ( @$toc && $toc[-1][1] < $header[1] ) {
        nested_push( $toc[-1][2], $header );
    }
    else {
        push @$toc, $header;
    }
}

Normal POD Commands

        $paragraph = $selfinterpolate( $paragraph, $line_num );

        if ( $command =~ /^head(\d)/ ) {
            $self{_current_code_name} = lc $paragraph;
            push @{ $self{_code_sections} }, lc $paragraph;
            nested_push( $parser{_toc}, [ $paragraph, $1, [] ] );

            if ( $command =~ /head1/ ) {
                $selfset_title($paragraph);
            }
            $selfadd_doc( '<a name="'  lc $paragraph  '"></a>' );
        }

        $selfadd_doc( htmlize_doc( $command, $paragraph ) );

POD Verbatim Paragraphs

method verbatim( $paragraph, $line_num ) {

    return if $paragraph =~ /^\s*$/;

    unless ( $selfis_source_code($paragraph) ) {
        $selfadd_doc('<pre class="sample">')
          unless $self{_last_paragraph} eq 'verbatim';

        $paragraph =~ s/^\s//gmx;

        my $code = $selfhtmlize_code($paragraph);
        $code =~ s/<</&laquo;/g;
        $code =~ s/>>/&raquo;/g;
        $selfadd_doc($code);
    }
    $self{_last_paragraph} = 'verbatim';
  }

  method close_verbatim {

    $selfadd_doc('</pre>') if $self{_last_paragraph} eq 'verbatim';
}

POD Ordinary Paragraphs

method textblock( $paragraph, $line_num ) {

    $selfclose_verbatim();

      unless ( $selfis_source_code($paragraph) ) {
        $paragraph = $selfinterpolate( $paragraph, $line_num );
        $selfadd_doc( '<p>'  $paragraph  '</p>' );
    }
    $self{_last_paragraph} = 'text';
  }

POD Formatting Codes

method interior_sequence( $seq_command, $seq_argument ) {
    for ($seq_command)
    {

        /C/ and return "<samp>$seq_argument</samp>";
        /B/ and return "<b>$seq_argument</b>";
        /I/ and return "<i>$seq_argument</i>";
        /L/ and return "<a href=\"\">$seq_argument</i></a>";
        /F/ and return "$seq_argument";
        /S/ and do {
            $seq_argument =~ s/ \s / &nbsp; /xg;
            return $seq_argument;
        };
    }
  }

The code file

In order to generate a coherent code file we first need to replace all code anchors with the code snippets to which they refer, and then assemble all the found code chunks in the order in which they were encountered (less the snippets that have been integrated already). Finally we pass the code through perltidy (via tidy) to produce a legible source file.

method dump_code($extension) {

    my $source;

        «replace code anchors with code snippets»
              

      $source ⋅= join(
        '',

        map( { $self{_code}{$_} }
            grep { defined $self{_code}{$_} } @{ $self{_code_sections} } )
      );

      return $source;

  }

Replace Code Anchors with Code Snippets

    while ( my ( $name, $block ) = each %{ $self{_code} } ) {
        foreach my $match ( $block =~ / $code_anchor /xg ) {

            next unless defined $self{_code}{$match};

            $block =~ s/ << \s* \Q$match\E \s* >> / $self->{_code}{$match} /gx;

            delete $self{_code}{$match};
        }
        $self{_code}{$name} = $block;
    }

Tidying the Code

We've broken out the call to perltidy to simplify calling it in multiple locations in the code.

sub tidy( $code ) {

    my $output;

    #perltidy needs to see an empty ARGV
    local @ARGV = ();

    perltidy( source  \$code, destination  \$output );

    return $output;
}

The documentation file

method dump_docs {

    my $doc;

    $doc = '<html><head><title>'  $self{_doctitle}  '</title>';
    $doc ⋅=
      '<link rel="stylesheet" type="text/css" href="'  $parser{_css}  '" />'
      if $parser{_css};
    $doc ⋅= '</head><body>';
    $doc ⋅= $parser{_doc}  "</body></html>";

    my $toc = '<ul>'  build_toc( $parser{_toc} )  '</ul>';

    $doc =~ s| (</h1>) | $1 $toc |xm;
    return $doc;
}

sub build_toc( $toc ) {

    my $output = '';

    foreach my $header (@$toc) {
        my ( $title, $level, $subtitles ) = @$header;
        $output ⋅=
            '<li class="head' 
           $level
           '"><a href="#'
           ( lc $title )  '">'
           $title  '</a>';
        if (@$subtitles) {
            $output ⋅= '<ul>'  build_toc($subtitles)  "</ul>\n";
        }
        $output ⋅= "</li>\n";
    }
    return $output;
}

our %pod_commands = (

    'head1'  'h1',
    'head2'  'h2',
    'head3'  'h3',
    'head4'  'h4',
    'head5'  'h5',
    'head6'  'h6',
    'item'   'li',

    'over'  '<ul>',
    'back'  '</ul>',
);

sub htmlize_doc( $command, $paragraph ) {

    if ( my $tag = $pod_commands{$command} ) {
        return $tag if $tag =~ /</;
        return "<$tag>$paragraph</$tag>";

    }
    return '';
}

syntax highlighting

We use PPI::HTML to turn the code blocks into syntax-highlighted html. We want to leave the code anchors from the literate style intact, as the goal here is legibility, and so we have to jump through some regex hoops to keep them intact through the process.

Finally we eliminate the br tags that the highlighter inserts because we're going to wrap the output in a pre block.

  «the entity list»
              

method htmlize_code($code) {

    return $code if all { /^>?\s*$/ } split "\n", $code;

    escape_anchors( \$code );

      # "" to get the html to stringify
      $code = ""  $self{_highlighter}html( \( tidy($code) ) );

        «entity conversion»
              

      $code =~ s/ <br>  //gx;

      unescape_anchors( \$code );

      return $code;
  }

Entity Conversion

In the name of legibility we also take the time to change some of the perl operators into html entities.

    while ( my ( $repl, $match ) = each %conversions ) {

        $code =~
s/ <span\s+class\s*=\s*"operator"\s*> $match /<span class="operator">$repl/xg;
    }

The Entity List

Here is the list of entities. the keys and values are reversed from their sensible order so that we can use regexes to find and replace the operators:

our %conversions = (
    '&rarr;'    qr/ -&gt; /x,        # ->
    '&rArr;'    qr/ =&gt; /x,        # =>
    '&ge;'      qr/ &gt;= /x,        # >=
    '&le;'      qr/ &lt;= /x,        # <=
    '&ne;'      qr/ !=    /x,        # !=
    '&hellip;'  qr/ \.\.\. /x,       # ...
    '&sdot;'    qr/ \. (?!\.) /x,    # .
    '&times;'   qr/ \* /x,           # *
);

Dependencies

use base 'Pod::Parser';

use PPI::HTML;
use Perl::Tidy;

use File::Slurp qw( write_file );
use File::Spec::Functions qw( abs2rel splitpath catfile );
use File::Path;

use List::MoreUtils qw( all );

References

LazyWeb Suggestions

ToDo