# $Id$

Introduction
============

Text_Highlighter is a class for syntax highlighting. The main idea is to
simplify the creation of subclasses that implement syntax highlighting for a
particular language. Subclasses do not implement any new functionality; they
just provide syntax highlighting rules. The rule sources are in XML format.
To create a highlighter for a language, there is no need to code a new class
manually. Simply describe the rules in an XML file and use
Text_Highlighter_Generator to create a new class.


This document does not contain a formal description of the API. The API is
very simple, and I believe that a few code examples are sufficient.


Highlighter XML source
======================

Basics
------

Creating a new syntax highlighter begins with describing the highlighting
rules. There are two basic elements: block and region. A block is just a
portion of text that matches a regular expression and is highlighted with a
single color. A keyword is an example of a block. A region is defined by two
regular expressions: one for the start of the region, and another for the
end. The main difference from a block is that a region can contain blocks and
regions (including same-named regions). An example of a region is a group of
statements enclosed in curly brackets (this is used in many languages, for
example PHP and C). Also, the characters matching the start and end of a
region may be highlighted with their own color, and the region contents with
another.

Blocks and regions may be declared as contained. Contained blocks and regions
can only appear inside regions. If a region or a block is not declared as
contained, it can appear both at the top level and inside regions. A block or
region declared as not-contained can only appear at the top level.

For any region, a list of blocks and regions that can appear inside this
region can be specified.

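As a rough illustration of the two element types, the fragment below sketches
a region for curly-bracket groups that may contain anything (including
itself), and a single-line block for numbers. The element and attribute names
are the ones described under "Elements" below; the rule names (braces, number)
and the color group names (code, brackets, number) are made up for this
sketch.

```xml
<!-- Region: delimited by two regular expressions; can nest. -->
<region name="braces" innerGroup="code" delimGroup="brackets"
        start="\{" end="\}">
    <contains all="yes" />
</region>

<!-- Block: one regular expression, one color group. -->
<block name="number" match="\d+" innerGroup="number" />
```
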
In this document, the term "color group" is used. Chunks of text assigned to
the same color group will be highlighted with the same color. Note that in
versions prior to 0.5.0, color groups were referred to as CSS classes. Since
0.5.0, output formats other than HTML are supported, so "color group" is the
more appropriate term.

Elements
--------

The top-level element is <highlight>. The lang attribute is required and
denotes the name of the language. Its value is used as a part of the generated
class name, and must only contain letters, digits and underscores. The
optional attribute case, when given the value yes, makes the language case
sensitive (the default is case insensitive). Allowed subelements are:

    * <authors>: Information about the authors of the file.
          - <author>: Information about a single author of the file. May be
            used multiple times, one per author.
                - name="...": Author's name. Required.
                - email="...": Author's email address. Optional.

    * <default>: Default color group.
          - innerGroup="...": Color group name. Required.

    * <region>: Region definition.
          - name="...": Region name. Required.
          - innerGroup="...": Default color group of region contents. Required.
          - delimGroup="...": Color group of the start and end of the region.
            Optional, defaults to the value of the innerGroup attribute.
          - start="...", end="...": Regular expressions matching the start and
            end of the region. Required. Regular expression delimiters are
            optional, but if you need to specify a delimiter, use /. The
            delimiters are only needed when specifying regular expression
            modifiers, such as m or U. Examples: \/\* or /$/m.
          - contained="yes": Marks the region as contained.
          - never-contained="yes": Marks the region as not-contained.
          - <contains>: Elements allowed inside this region. May be used
            multiple times.
                - all="yes": The region can contain any other region or block
                  (except not-contained ones).
                      - <but>: Do not allow certain regions or blocks.
                            - region="...": Name of a region not allowed
                              within the current region.
                            - block="...": Name of a block not allowed within
                              the current region.
                - region="...": Name of a region allowed within the current
                  region.
                - block="...": Name of a block allowed within the current
                  region.
          - <onlyin>: Only allow this region within certain regions. May be
            used multiple times.
                - region="...": Name of the parent region.

    * <block>: Block definition.
          - name="...": Block name. Required.
          - innerGroup="...": Color group of block contents. Optional. If not
            specified, the color group of the parent region or the default
            color group will be used. One would only want to omit this
            attribute if there are keyword groups (see below) inherited from
            this block, and no special highlighting should apply when the
            block does not match any keyword.
          - match="...": Regular expression matching the block. Required.
            Regular expression delimiters are optional, but if you need to
            specify a delimiter, use /. The delimiters are only needed when
            specifying regular expression modifiers, such as m or U.
            Examples: #|\/\/ or /$/m.
          - contained="yes": Marks the block as contained.
          - never-contained="yes": Marks the block as not-contained.
          - <onlyin>: Only allow this block within certain regions. May be
            used multiple times.
                - region="...": Name of the parent region.
          - multiline="yes": Marks the block as multi-line. By default, whole
            blocks are assumed to reside on a single line. This makes things
            faster. If you need to declare a multi-line block, use this
            attribute.
          - <partGroup>: Assigns another color group to a part of the block
            that matched a subpattern.
                - index="n": Subpattern index. Required.
                - innerGroup="...": Color group name. Required.

              This is an example from the CSS highlighter: the measure is
              matched as a whole, but the measurement units are highlighted
              with a different color.

                <block name="measure" match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"
                        innerGroup="number" contained="yes">
                    <onlyin region="property"/>
                    <partGroup index="1" innerGroup="string" />
                </block>

    * <keywords>: Keyword group definition. Keyword groups are useful when you
      want to highlight some words that match a condition for a block with a
      different color. Keywords are defined with a literal match, not regular
      expressions. For example, you have a block named identifier matching a
      general identifier, and want to highlight reserved words (which match
      this block as well) with a different color. You inherit a keyword group
      "reserved" from the "identifier" block.
          - name="...": Keyword group name. Required.
          - ifdef="...", ifndef="...": Conditional declaration. See
            "Conditions" below.
          - inherits="...": Inherited block name. Required.
          - innerGroup="...": Color group of the keyword group. Required.
          - case="yes|no": Overrides case-sensitivity of the language.
            Optional, defaults to the global value.
          - <keyword>: Single keyword definition.
                - match="...": The keyword. Note: this is not a regular
                  expression, but a literal match (possibly case
                  insensitive).

Note that for backward compatibility (BC) reasons, the element partClass is an
alias for partGroup, and the attributes innerClass and delimClass are aliases
of innerGroup and delimGroup, respectively.

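Putting the elements together, a complete rules file might look like the
sketch below. It is not taken from the package: the language name "toy", the
author, every rule name and every color group name are invented for
illustration, and only elements and attributes described above are used.

```xml
<?xml version="1.0"?>
<highlight lang="toy" case="yes">
    <authors>
        <author name="Jane Doe" email="jane@example.org" />
    </authors>

    <default innerGroup="code" />

    <!-- Double-quoted string; only the escape block may appear inside. -->
    <region name="string" innerGroup="string" delimGroup="quotes"
            start="&quot;" end="&quot;">
        <contains block="escape" />
    </region>

    <!-- Backslash escapes are only meaningful inside strings. -->
    <block name="escape" match="\\." innerGroup="special" contained="yes">
        <onlyin region="string" />
    </block>

    <!-- No innerGroup: plain identifiers keep the surrounding color. -->
    <block name="identifier" match="[a-z_]\w*" />

    <!-- Reserved words are identifiers with their own color group. -->
    <keywords name="reserved" inherits="identifier" innerGroup="reserved">
        <keyword match="if" />
        <keyword match="else" />
        <keyword match="while" />
    </keywords>
</highlight>
```
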
Conditions
----------

Conditional declarations allow enabling or disabling certain highlighting
rules at runtime. For example, the Java highlighter has a very big list of
keywords matching Java standard classes. Finding a match in this list can
take a long time. For that reason, the corresponding keyword group is declared
with the "ifdef" attribute:

  <keywords name="builtin" inherits="identifier" innerClass="builtin"
            case="yes" ifdef="java.builtins">
    <keyword match="AbstractAction" />
    <keyword match="AbstractBorder" />
    <keyword match="AbstractButton" />
    ...
    ...
    <keyword match="_Remote_Stub" />
    <keyword match="_ServantActivatorStub" />
    <keyword match="_ServantLocatorStub" />
  </keywords>

This keyword group will only be enabled when "java.builtins" is passed as an
element of the "defines" option:

    $options = array(
        'defines' => array(
            'java.builtins',
        ),
        'numbers' => HL_NUMBERS_TABLE,
    );
    $highlighter = Text_Highlighter::factory('java', $options);

The "ifndef" attribute has the reverse meaning.

Currently, the "ifdef" and "ifndef" attributes are only supported for the
<keywords> tag.


Class generation
================

Creating the XML description of highlighting rules is the most complicated
part of the process. To generate the class, you need just a few lines of code:

    <?php
    require_once 'Text/Highlighter/Generator.php';
    $generator = new Text_Highlighter_Generator('php.xml');
    $generator->generate();
    $generator->saveCode('PHP.php');
    ?>


Command-line class generation tool
==================================

The example from the previous section looks pretty simple, but it does not
handle any errors that may occur while parsing the XML source. The package
provides a command-line script that makes class generation even simpler and
takes care of possible errors. It is called generate (on Unix/Linux) or
generate.bat (on Windows). This script can process multiple files in one run,
and can also read XML from standard input and write the generated code to
standard output.

    Usage:
    generate options

    Options:
      -x filename, --xml=filename
            source XML file. Multiple input files can be specified, in which
            case each -x option must be followed by -p unless -d is specified.
            Defaults to stdin
      -p filename, --php=filename
            destination PHP file. Defaults to stdout. If specified multiple
            times, each -p must follow -x
      -d dirname, --dir=dirname
            Default destination directory. File names will be taken from XML
            input ("lang" attribute of <highlight> tag)
      -h, --help
            This help

Examples

    Read from php.xml, write to PHP.php

        generate -x php.xml -p PHP.php

    Read from php.xml, write to standard output

        generate -x php.xml

    Read from php.xml, write to PHP.php, read from xml.xml, write to XML.php

        generate -x php.xml -p PHP.php -x xml.xml -p XML.php

    Read from php.xml, write to /some/dir/PHP.php, read from xml.xml, write to
    /some/dir/XML.php (assuming that xml.xml contains <highlight lang="xml">, and
    php.xml contains <highlight lang="php">)

        generate -x php.xml -x xml.xml -d /some/dir/


Renderers
=========

Introduction
------------

Text_Highlighter supports renderers. Using renderers, you can get output in
different formats. Two renderers are included in the package:

    - HTML renderer. Generates HTML output. A style sheet should be linked to
      the document to display colored text.

    - Console renderer. Can be used to output highlighted text to
      color-capable terminals, either directly or through less -r.


Renderers API
-------------

Renderers are subclasses of Text_Highlighter_Renderer. A renderer should
override at least two methods: acceptToken and getOutput. Overriding other
methods is optional, depending on the nature of the renderer's output and the
details of its implementation.

    string reset()
        resets the renderer state. This method is called every time before a
        new source file is highlighted.

    string preprocess(string $code)
        preprocesses the code. Can be used, for example, to normalize
        whitespace before highlighting. Returns the preprocessed string.

    void acceptToken(string $group, string $content)
        the core method of the renderer. The highlighter passes chunks of
        text to this method in $content, and the color group in $group.

    void finalize()
        signals the renderer that no more tokens are available.

    mixed getOutput()
        returns the generated output.

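To make the call sequence concrete, here is a minimal standalone sketch of a
renderer that emits plain text with each chunk tagged by its color group. It
deliberately does not extend Text_Highlighter_Renderer, so that it runs
without the package installed; a real renderer would extend that class and be
handed to the highlighter. The class name Sketch_Tag_Renderer, the output
format, and the order in which the methods are driven at the bottom are
assumptions made for this illustration.

```php
<?php
// Hypothetical renderer sketch: implements the five methods described above.
class Sketch_Tag_Renderer
{
    private $chunks = array();
    private $output = '';

    // Forget everything collected for the previous source.
    public function reset()
    {
        $this->chunks = array();
        $this->output = '';
    }

    // Normalize line endings before highlighting.
    public function preprocess($code)
    {
        return str_replace(array("\r\n", "\r"), "\n", $code);
    }

    // Collect one chunk of text together with its color group.
    public function acceptToken($group, $content)
    {
        $this->chunks[] = '[' . $group . ':' . $content . ']';
    }

    // No more tokens: build the final string.
    public function finalize()
    {
        $this->output = implode('', $this->chunks);
    }

    // Return whatever finalize() produced.
    public function getOutput()
    {
        return $this->output;
    }
}

// Drive the renderer by hand, the way a highlighter presumably would.
$renderer = new Sketch_Tag_Renderer();
$renderer->reset();
$code = $renderer->preprocess("echo \"hi\";\r\n");
$renderer->acceptToken('reserved', 'echo');
$renderer->acceptToken('string', '"hi"');
$renderer->finalize();
echo $renderer->getOutput(), "\n";
```

Collecting chunks in acceptToken and assembling them in finalize keeps
getOutput free of side effects, so it can be called more than once.
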
Setting renderer options
------------------------

Renderers accept an optional argument to their constructor: an options array.
The elements of this array are renderer-specific.

HTML renderer
-------------

The HTML renderer produces HTML output with optional line numbering. The
renderer itself does not provide information about the actual colors of the
highlighted text. Instead, <span class="hl-XXX"> is used, where XXX is
replaced with the color group name (hl-var, hl-string, etc.). It is up to you
to create a CSS stylesheet. If the 'use_language' option is passed with a
value evaluating to true, class names will be formatted as "LANG-hl-XXX",
where LANG is the language name as defined in the highlighter XML source (the
"lang" attribute of the <highlight> tag) in lower case.

There are 3 special CSS classes:

    hl-main - applies to the whole output or to the right table column,
              depending on the 'numbers' option
    hl-gutter - applies to the left column of the table
    hl-table - applies to the whole table

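A stylesheet for this output could start out like the sketch below. The three
special classes and hl-var / hl-string are the names mentioned above; the
colors and fonts are arbitrary choices, and any further hl-XXX rules depend on
the color groups your language definition actually uses.

```css
/* Whole output, or the code column when table numbering is used. */
.hl-main   { font-family: monospace; }
/* Line-number column and the surrounding table (HL_NUMBERS_TABLE). */
.hl-gutter { color: #999999; text-align: right; padding-right: 0.5em; }
.hl-table  { border-collapse: collapse; }
/* One rule per color group: hl- followed by the group name. */
.hl-var    { color: #0000cc; }
.hl-string { color: #8b0000; }
```
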
The HTML renderer accepts the following options (all of them optional):

    * numbers - line numbering style.
        0 - no numbering (default)
        HL_NUMBERS_LI - use <ol></ol> for line numbering
        HL_NUMBERS_TABLE - create a 2-column table, with line numbers in the
                           left column and highlighted text in the right
                           column

    * tabsize - tabulation size. Defaults to 4

    Example:

        require_once 'Text/Highlighter/Renderer/Html.php';
        $options = array(
            'numbers' => HL_NUMBERS_LI,
            'tabsize' => 8,
        );
        $renderer = new Text_Highlighter_Renderer_HTML($options);

Console renderer
----------------

The console renderer produces output for display on a color-capable terminal,
either directly or through less -r, using ANSI escape sequences. By default,
this renderer only highlights the most common color groups. Additional colors
can be specified using the 'colors' option. This renderer also accepts the
'numbers' option (a boolean value) and the 'tabsize' option.

    Example:

        require_once 'Text/Highlighter/Renderer/Console.php';
        $colors = array(
            'prepro' => "\033[35m",
            'types' => "\033[32m",
        );
        $options = array(
            'numbers' => true,
            'tabsize' => 8,
            'colors' => $colors,
        );
        $renderer = new Text_Highlighter_Renderer_Console($options);

ANSI color escape sequences have the following format:

    ESC[#;#;....;#m

where ESC is the character with ASCII code 27 (033 octal, 0x1B hexadecimal),
and each # is one of the following:

        0 for normal display
        1 for bold on
        4 underline (mono only)
        5 blink on
        7 reverse video on
        8 nondisplayed (invisible)
        30 black foreground
        31 red foreground
        32 green foreground
        33 yellow foreground
        34 blue foreground
        35 magenta foreground
        36 cyan foreground
        37 white foreground
        40 black background
        41 red background
        42 green background
        43 yellow background
        44 blue background
        45 magenta background
        46 cyan background
        47 white background

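As a small illustration of this format, the sketch below builds escape
sequences from lists of the codes above and uses them to fill in a 'colors'
array like the one in the console renderer example. The helper name
ansi_sequence() is made up for this example and is not part of the package.

```php
<?php
// Hypothetical helper: join display codes into one ANSI escape sequence.
function ansi_sequence(array $codes)
{
    return "\033[" . implode(';', $codes) . 'm';
}

$colors = array(
    'prepro' => ansi_sequence(array(35)),      // magenta foreground
    'types'  => ansi_sequence(array(1, 32)),   // bold + green foreground
);

// Show a sample, then switch back to normal display with code 0.
echo $colors['types'], 'int', ansi_sequence(array(0)), "\n";
```
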
How to use Text_Highlighter class
=================================

Creating a highlighter object
-----------------------------

To create a highlighter for a certain language, use the
Text_Highlighter::factory() static method:

    require_once 'Text/Highlighter.php';
    $hl = Text_Highlighter::factory('php');


Setting a renderer
------------------

The actual output is produced by a renderer.

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);

Note that for backward compatibility reasons, it is possible to use the
highlighter without setting a renderer. If no renderer is set, the HTML
renderer will be used by default. In this case, you should pass the options as
the second parameter to the factory method. The following example works
exactly like the previous one:

    require_once 'Text/Highlighter.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $hl = Text_Highlighter::factory('php', $options);


Getting output
--------------

Finally, do the highlighting and get the output:

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);
    $html = $hl->highlight(file_get_contents('example.php'));

# vim: set autoindent tabstop=4 shiftwidth=4 softtabstop=4 tw=78: */
