# $Id$

Introduction
============

Text_Highlighter is a class for syntax highlighting. The main idea is to
simplify the creation of subclasses that implement syntax highlighting for a
particular language. Subclasses do not implement any new functionality; they
just provide syntax highlighting rules. The rule sources are in XML format.
To create a highlighter for a language, there is no need to code a new class
manually. Simply describe the rules in an XML file and use
Text_Highlighter_Generator to create a new class.

This document does not contain a formal description of the API. It is very
simple, and I believe providing some code examples is sufficient.

Highlighter XML source
======================

Basics
------

Creating a new syntax highlighter begins with describing the highlighting
rules. There are two basic elements: block and region. A block is just a
portion of text matching a regular expression and highlighted with a single
color. A keyword is an example of a block. A region is defined by two regular
expressions: one for the start of the region and another for the end. The main
difference from a block is that a region can contain blocks and regions
(including same-named regions). An example of a region is a group of
statements enclosed in curly brackets (this is used in many languages, for
example PHP and C). Also, the characters matching the start and end of a
region may be highlighted with their own color, and the region contents with
another.

Blocks and regions may be declared as contained. Contained blocks and regions
can only appear inside regions. If a region or a block is not declared as
contained, it can appear both at the top level and inside regions. A block or
region declared as not-contained can only appear at the top level.

For any region, you can specify a list of the blocks and regions that can
appear inside it.

In this document, the term "color group" is used. Chunks of text assigned to
the same color group are highlighted with the same color. Note that in
versions prior to 0.5.0 color groups were referred to as CSS classes, but
since 0.5.0 output is no longer limited to HTML, so "color group" is the more
appropriate term.

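The concepts above can be sketched as a tiny rules file. The example below
holds the XML in a PHP string and only checks that it is well-formed. The
element and attribute names are those listed under "Elements"; the language
name "mini" and the group names are assumptions made for illustration, and
the rules have not been run through Text_Highlighter_Generator.

```php
<?php
// Hypothetical rules for a made-up language "mini"; all names are illustrative.
$rules = <<<'XML'
<?xml version="1.0"?>
<highlight lang="mini" case="yes">
    <default innerGroup="code"/>

    <!-- A region: curly brackets, which may contain blocks and regions. -->
    <region name="braces" innerGroup="code" delimGroup="brackets"
            start="\{" end="\}">
        <contains all="yes"/>
    </region>

    <!-- A block: one regular expression, one color group. -->
    <block name="identifier" match="[A-Za-z_]\w*" innerGroup="identifier"/>

    <!-- A contained block: may only appear inside regions. -->
    <block name="number" match="\d+" innerGroup="number" contained="yes"/>
</highlight>
XML;

// Well-formedness check only; this does not validate the rules themselves.
$doc = simplexml_load_string($rules);
if ($doc === false) {
    die("rules are not well-formed XML\n");
}
echo 'language: ', $doc['lang'], "\n";
echo 'regions:  ', count($doc->region), "\n";
echo 'blocks:   ', count($doc->block), "\n";
```

Saved to a file, such a description would be the input for the class
generation step described later in this document.
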
Elements
--------

The top-level element is <highlight>. The lang attribute is required and
denotes the name of the language. Its value is used as part of the generated
class name and must contain only letters, digits and underscores. The optional
attribute case, when given the value yes, makes the language case sensitive
(the default is case insensitive). Allowed subelements are:

    * <authors>: Information about the authors of the file.
        <author>: Information about a single author of the file. (May be used
        multiple times, one per author.)
            - name="...": Author's name. Required.
            - email="...": Author's email address. Optional.

    * <default>: Default color group.
        - innerGroup="...": Color group name. Required.

    * <region>: Region definition.
        - name="...": Region name. Required.
        - innerGroup="...": Default color group of region contents. Required.
        - delimGroup="...": Color group of the start and end of the region.
          Optional, defaults to the value of the innerGroup attribute.
        - start="...", end="...": Regular expressions matching the start and
          end of the region. Required. Regular expression delimiters are
          optional, but if you need to specify a delimiter, use /. Delimiters
          are only needed to specify regular expression modifiers, such as m
          or U. Examples: \/\* or /$/m.
        - contained="yes": Marks the region as contained.
        - never-contained="yes": Marks the region as not-contained.
        - <contains>: Elements allowed inside this region.
            - all="yes": Region can contain any other region or block
              (except not-contained). May be used multiple times.
                - <but>: Do not allow certain regions or blocks.
                    - region="...": Name of a region not allowed within the
                      current region.
                    - block="...": Name of a block not allowed within the
                      current region.
            - region="...": Name of a region allowed within the current region.
            - block="...": Name of a block allowed within the current region.
        - <onlyin>: Only allow this region within certain regions. May be
          used multiple times.
            - region="...": Name of the parent region.

    * <block>: Block definition.
        - name="...": Block name. Required.
        - innerGroup="...": Color group of block contents. Optional. If not
          specified, the color group of the parent region or the default color
          group is used. One would only want to omit this attribute if there
          are keyword groups (see below) inherited from this block, and no
          special highlighting should apply when the block does not match a
          keyword.
        - match="...": Regular expression matching the block. Required.
          Regular expression delimiters are optional, but if you need to
          specify a delimiter, use /. Delimiters are only needed to specify
          regular expression modifiers, such as m or U.
          Examples: #|\/\/ or /$/m.
        - contained="yes": Marks the block as contained.
        - never-contained="yes": Marks the block as not-contained.
        - <onlyin>: Only allow this block within certain regions. May be used
          multiple times.
            - region="...": Name of the parent region.
        - multiline="yes": Marks the block as multi-line. By default, whole
          blocks are assumed to reside on a single line, which makes matching
          faster. If you need to declare a multi-line block, use this
          attribute.
        - <partGroup>: Assigns another color group to the part of the block
          that matched a subpattern.
            - index="n": Subpattern index. Required.
            - innerGroup="...": Color group name. Required.

          This is an example from the CSS highlighter: the measure is matched
          as a whole, but the measurement units are highlighted with a
          different color.

            <block name="measure" match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"
                innerGroup="number" contained="yes">
                <onlyin region="property"/>
                <partGroup index="1" innerGroup="string" />
            </block>

    * <keywords>: Keyword group definition. Keyword groups are useful when you
      want to highlight some of the words matched by a block with a different
      color. Keywords are matched literally, not as regular expressions. For
      example, suppose you have a block named identifier matching a general
      identifier and want to highlight reserved words (which match this block
      as well) with a different color. You would declare a keyword group
      "reserved" that inherits from the "identifier" block.
        - name="...": Keyword group name. Required.
        - ifdef="...", ifndef="...": Conditional declaration. See
          "Conditions" below.
        - inherits="...": Inherited block name. Required.
        - innerGroup="...": Color group of the keyword group. Required.
        - case="yes|no": Overrides the case sensitivity of the language.
          Optional, defaults to the global value.
        - <keyword>: Single keyword definition.
            - match="...": The keyword. Note: this is not a regular
              expression, but a literal match (possibly case insensitive).

Note that for backward compatibility (BC) the element partClass is an alias
for partGroup, and the attributes innerClass and delimClass are aliases for
innerGroup and delimGroup, respectively.

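To illustrate what the subpattern index of a part group refers to, the sketch
below applies the "measure" pattern from the CSS example using PHP's
preg_match(). It is only an illustration of subpattern matching, not the way
the highlighter is implemented; the helper name split_measure() is
hypothetical.

```php
<?php
// The "measure" pattern from the CSS example, wrapped in / delimiters.
$pattern = '/\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)/';

// Returns array(array(group, text), ...) for the first match in $text, or an
// empty array when nothing matches. Group names follow the example: "number"
// for the block as a whole, "string" for the part matched by subpattern 1.
function split_measure($pattern, $text)
{
    if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        return array();
    }
    list($whole, $start) = $m[0];      // full match and its byte offset
    list($unit, $unitStart) = $m[1];   // subpattern 1 and its byte offset
    $before = substr($whole, 0, $unitStart - $start);
    return array(
        array('number', $before),
        array('string', $unit),
    );
}

print_r(split_measure($pattern, 'width: 1.5em;'));
```

For the input above, the value 1.5 ends up in the "number" group and the unit
em in the "string" group.
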
Conditions
----------

Conditional declarations allow enabling or disabling certain highlighting
rules at runtime. For example, the Java highlighter has a very large list of
keywords matching the Java standard classes. Finding a match in this list can
take a long time. For that reason, the corresponding keyword group is declared
with the "ifdef" attribute:

    <keywords name="builtin" inherits="identifier" innerClass="builtin"
              case="yes" ifdef="java.builtins">
        <keyword match="AbstractAction" />
        <keyword match="AbstractBorder" />
        <keyword match="AbstractButton" />
        ...
        ...
        <keyword match="_Remote_Stub" />
        <keyword match="_ServantActivatorStub" />
        <keyword match="_ServantLocatorStub" />
    </keywords>

This keyword group is only enabled when "java.builtins" is passed as an
element of the "defines" option:

    $options = array(
        'defines' => array(
            'java.builtins',
        ),
        'numbers' => HL_NUMBERS_TABLE,
    );
    $highlighter = Text_Highlighter::factory('java', $options);

The "ifndef" attribute has the reverse meaning: the group is enabled only when
the given name is not among the "defines".

Currently, the "ifdef" and "ifndef" attributes are only supported for the
<keywords> tag.

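The rule for "ifdef" and "ifndef" can be summarized in a few lines of PHP.
This is a sketch of the semantics described above, not the package's own
code; the function name keyword_group_enabled() and the attribute array are
assumptions made for illustration.

```php
<?php
// Decides whether a <keywords> group is active for a given 'defines' list.
// $attrs holds the attributes of the group; this helper is hypothetical.
function keyword_group_enabled($attrs, $defines)
{
    if (isset($attrs['ifdef']) && !in_array($attrs['ifdef'], $defines, true)) {
        return false;   // ifdef: the name must be among the defines
    }
    if (isset($attrs['ifndef']) && in_array($attrs['ifndef'], $defines, true)) {
        return false;   // ifndef: the name must not be among the defines
    }
    return true;        // groups without a condition are always enabled
}

$builtin = array('name' => 'builtin', 'ifdef' => 'java.builtins');
var_dump(keyword_group_enabled($builtin, array('java.builtins')));
var_dump(keyword_group_enabled($builtin, array()));
```

The first call reports the group as enabled, the second as disabled.
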
Class generation
================

Creating the XML description of the highlighting rules is the most
complicated part of the process. To generate the class, you need just a few
lines of code:

    <?php
    require_once 'Text/Highlighter/Generator.php';
    $generator = new Text_Highlighter_Generator('php.xml');
    $generator->generate();
    $generator->saveCode('PHP.php');
    ?>

Command-line class generation tool
==================================

The example from the previous section looks pretty simple, but it does not
handle any errors that may occur while parsing the XML source. The package
provides a command-line script that makes generating classes even simpler and
takes care of possible errors. It is called generate (on Unix/Linux) or
generate.bat (on Windows). This script can process multiple files in one run,
and can also read XML from standard input and write the generated code to
standard output.

    Usage:
    generate options

    Options:
        -x filename, --xml=filename
            Source XML file. Multiple input files can be specified, in which
            case each -x option must be followed by -p unless -d is specified.
            Defaults to stdin.
        -p filename, --php=filename
            Destination PHP file. Defaults to stdout. If specified multiple
            times, each -p must follow -x.
        -d dirname, --dir=dirname
            Default destination directory. File names will be taken from the
            XML input ("lang" attribute of the <highlight> tag).
        -h, --help
            This help.

Examples

    Read from php.xml, write to PHP.php:

        generate -x php.xml -p PHP.php

    Read from php.xml, write to standard output:

        generate -x php.xml

    Read from php.xml, write to PHP.php, read from xml.xml, write to XML.php:

        generate -x php.xml -p PHP.php -x xml.xml -p XML.php

    Read from php.xml, write to /some/dir/PHP.php, read from xml.xml, write to
    /some/dir/XML.php (assuming that xml.xml contains <highlight lang="xml">,
    and php.xml contains <highlight lang="php">):

        generate -x php.xml -x xml.xml -d /some/dir/

Renderers
=========

Introduction
------------

Text_Highlighter supports renderers. Using renderers, you can get output in
different formats. Two renderers are included in the package:

    - HTML renderer. Generates HTML output. A style sheet should be linked to
      the document to display colored text.

    - Console renderer. Can be used to output highlighted text to
      color-capable terminals, either directly or through less -r.

Renderers API
-------------

Renderers are subclasses of Text_Highlighter_Renderer. A renderer should
override at least two methods: acceptToken and getOutput. Overriding the other
methods is optional, depending on the nature of the renderer's output and the
details of its implementation.

    string reset()
        Resets the renderer state. This method is called every time before a
        new source file is highlighted.

    string preprocess(string $code)
        Preprocesses the code. Can be used, for example, to normalize
        whitespace before highlighting. Returns the preprocessed string.

    void acceptToken(string $group, string $content)
        The core method of the renderer. The highlighter passes chunks of text
        to this method in $content, and the color group in $group.

    void finalize()
        Signals the renderer that no more tokens are available.

    mixed getOutput()
        Returns the generated output.

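A minimal custom renderer, based only on the method list above, might look
like the following sketch. The class name Bracket_Renderer and its output
format are made up for illustration. The stand-in base class exists only so
that the sketch runs without the package installed; with the package
available, the real Text_Highlighter_Renderer would be loaded instead.

```php
<?php
// Stand-in base class (an assumption, used only when the package is absent).
if (!class_exists('Text_Highlighter_Renderer')) {
    class Text_Highlighter_Renderer
    {
        function reset() {}
        function preprocess($code) { return $code; }
        function acceptToken($group, $content) {}
        function finalize() {}
        function getOutput() {}
    }
}

// Hypothetical renderer: emits each token as [group|content].
class Bracket_Renderer extends Text_Highlighter_Renderer
{
    protected $buffer = '';

    // Clears collected output before a new source is highlighted.
    function reset()
    {
        $this->buffer = '';
    }

    // Receives one chunk of text together with its color group.
    function acceptToken($group, $content)
    {
        $this->buffer .= '[' . $group . '|' . $content . ']';
    }

    // Returns everything collected since the last reset().
    function getOutput()
    {
        return $this->buffer;
    }
}

// Drive the renderer by hand with a few sample tokens.
$renderer = new Bracket_Renderer();
$renderer->reset();
$renderer->acceptToken('reserved', 'echo');
$renderer->acceptToken('code', ' ');
$renderer->acceptToken('string', '"hi"');
$renderer->finalize();
echo $renderer->getOutput(), "\n";
```

In real use the highlighter, not the calling code, would invoke these methods;
the renderer object is passed to it with setRenderer(), as shown in "Setting a
renderer" below.
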
Setting renderer options
------------------------

Renderers accept an optional argument to their constructor: an options array.
The elements of this array are renderer-specific.

HTML renderer
-------------

The HTML renderer produces HTML output with optional line numbering. The
renderer itself does not provide information about the actual colors of the
highlighted text. Instead, <span class="hl-XXX"> is used, where XXX is
replaced with the color group name (hl-var, hl-string, etc.). It is up to you
to create a CSS stylesheet. If the 'use_language' option is passed with a
value evaluating to true, class names are formatted as "LANG-hl-XXX", where
LANG is the language name as defined in the highlighter XML source ("lang"
attribute of the <highlight> tag) in lower case.

There are 3 special CSS classes:

    hl-main   - applies to the whole output or to the right table column,
                depending on the 'numbers' option
    hl-gutter - applies to the left column of the table
    hl-table  - applies to the whole table

The HTML renderer accepts the following options (all of them optional):

    * numbers - line numbering style.
        0                - no numbering (default)
        HL_NUMBERS_LI    - use <ol></ol> for line numbering
        HL_NUMBERS_TABLE - create a 2-column table, with line numbers in the
                           left column and highlighted text in the right
                           column

    * tabsize - tabulation size. Defaults to 4.

Example:

    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);

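Since the stylesheet is left to you, one way to produce it is to generate the
rules from a map of color groups to colors. The helper below is a hedged
sketch: hl_stylesheet() is not part of the package, the group names and colors
are examples rather than package defaults, and the optional language prefix
simply mimics the "LANG-hl-XXX" form described above.

```php
<?php
// Builds CSS rules for the span classes emitted by the HTML renderer.
// $colors maps color group names to CSS colors. When $lang is given, class
// names take the "LANG-hl-XXX" form, with LANG in lower case.
function hl_stylesheet($colors, $lang = null)
{
    $prefix = ($lang === null) ? '' : strtolower($lang) . '-';
    $css = '';
    foreach ($colors as $group => $color) {
        $css .= '.' . $prefix . 'hl-' . $group . ' { color: ' . $color . "; }\n";
    }
    return $css;
}

// Example groups and colors only; adjust to the groups your language uses.
echo hl_stylesheet(array('string' => '#008000', 'var' => '#000080'));
echo hl_stylesheet(array('string' => '#008000'), 'PHP');
```

The first call yields rules for .hl-string and .hl-var; the second yields a
rule for .php-hl-string.
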
Console renderer
----------------

The Console renderer produces output for display on a color-capable terminal,
either directly or through less -r, using ANSI escape sequences. By default,
this renderer only highlights the most common color groups. Additional colors
can be specified using the 'colors' option. This renderer also accepts the
'numbers' option (a boolean value) and the 'tabsize' option.

Example:

    require_once 'Text/Highlighter/Renderer/Console.php';
    $colors = array(
        'prepro' => "\033[35m",
        'types'  => "\033[32m",
    );
    $options = array(
        'numbers' => true,
        'tabsize' => 8,
        'colors'  => $colors,
    );
    $renderer = new Text_Highlighter_Renderer_Console($options);

ANSI color escape sequences have the following format:

    ESC[#;#;....;#m

where ESC is the character with ASCII code 27 (033 octal, 0x1B hexadecimal),
and each # is one of the following:

     0 for normal display
     1 for bold on
     4 underline (mono only)
     5 blink on
     7 reverse video on
     8 nondisplayed (invisible)
    30 black foreground
    31 red foreground
    32 green foreground
    33 yellow foreground
    34 blue foreground
    35 magenta foreground
    36 cyan foreground
    37 white foreground
    40 black background
    41 red background
    42 green background
    43 yellow background
    44 blue background
    45 magenta background
    46 cyan background
    47 white background

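Writing the sequences by hand is error-prone, so a small helper can assemble
them from the numeric codes listed above. This is an illustrative sketch; the
function ansi_sequence() is hypothetical and not provided by the package.

```php
<?php
// Joins numeric codes from the table above into one ANSI escape sequence.
function ansi_sequence($codes)
{
    return "\033[" . implode(';', $codes) . 'm';
}

$colors = array(
    'prepro' => ansi_sequence(array(35)),      // magenta foreground
    'types'  => ansi_sequence(array(1, 32)),   // bold on, green foreground
);

// Print a sample; code 0 switches back to normal display.
echo $colors['types'], 'int', ansi_sequence(array(0)), "\n";
```

An array built this way has the same shape as the $colors array passed in the
'colors' option in the Console renderer example.
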
How to use Text_Highlighter class
=================================

Creating a highlighter object
-----------------------------

To create a highlighter for a certain language, use the static method
Text_Highlighter::factory():

    require_once 'Text/Highlighter.php';
    $hl = Text_Highlighter::factory('php');

Setting a renderer
------------------

The actual output is produced by a renderer.

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);

Note that for backward compatibility, it is possible to use the highlighter
without setting a renderer. If no renderer is set, the HTML renderer is used
by default. In this case, you should pass the options as the second parameter
to the factory method. The following example works exactly like the previous
one:

    require_once 'Text/Highlighter.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $hl = Text_Highlighter::factory('php', $options);

Getting output
--------------

And finally, do the highlighting and get the output:

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);
    $html = $hl->highlight(file_get_contents('example.php'));

# vim: set autoindent tabstop=4 shiftwidth=4 softtabstop=4 tw=78: */