@ -0,0 +1,504 @@ | |||
GNU LESSER GENERAL PUBLIC LICENSE | |||
Version 2.1, February 1999 | |||
Copyright (C) 1991, 1999 Free Software Foundation, Inc. | |||
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA | |||
Everyone is permitted to copy and distribute verbatim copies | |||
of this license document, but changing it is not allowed. | |||
[This is the first released version of the Lesser GPL. It also counts | |||
as the successor of the GNU Library Public License, version 2, hence | |||
the version number 2.1.] | |||
Preamble | |||
The licenses for most software are designed to take away your | |||
freedom to share and change it. By contrast, the GNU General Public | |||
Licenses are intended to guarantee your freedom to share and change | |||
free software--to make sure the software is free for all its users. | |||
This license, the Lesser General Public License, applies to some | |||
specially designated software packages--typically libraries--of the | |||
Free Software Foundation and other authors who decide to use it. You | |||
can use it too, but we suggest you first think carefully about whether | |||
this license or the ordinary General Public License is the better | |||
strategy to use in any particular case, based on the explanations below. | |||
When we speak of free software, we are referring to freedom of use, | |||
not price. Our General Public Licenses are designed to make sure that | |||
you have the freedom to distribute copies of free software (and charge | |||
for this service if you wish); that you receive source code or can get | |||
it if you want it; that you can change the software and use pieces of | |||
it in new free programs; and that you are informed that you can do | |||
these things. | |||
To protect your rights, we need to make restrictions that forbid | |||
distributors to deny you these rights or to ask you to surrender these | |||
rights. These restrictions translate to certain responsibilities for | |||
you if you distribute copies of the library or if you modify it. | |||
For example, if you distribute copies of the library, whether gratis | |||
or for a fee, you must give the recipients all the rights that we gave | |||
you. You must make sure that they, too, receive or can get the source | |||
code. If you link other code with the library, you must provide | |||
complete object files to the recipients, so that they can relink them | |||
with the library after making changes to the library and recompiling | |||
it. And you must show them these terms so they know their rights. | |||
We protect your rights with a two-step method: (1) we copyright the | |||
library, and (2) we offer you this license, which gives you legal | |||
permission to copy, distribute and/or modify the library. | |||
To protect each distributor, we want to make it very clear that | |||
there is no warranty for the free library. Also, if the library is | |||
modified by someone else and passed on, the recipients should know | |||
that what they have is not the original version, so that the original | |||
author's reputation will not be affected by problems that might be | |||
introduced by others. | |||
Finally, software patents pose a constant threat to the existence of | |||
any free program. We wish to make sure that a company cannot | |||
effectively restrict the users of a free program by obtaining a | |||
restrictive license from a patent holder. Therefore, we insist that | |||
any patent license obtained for a version of the library must be | |||
consistent with the full freedom of use specified in this license. | |||
Most GNU software, including some libraries, is covered by the | |||
ordinary GNU General Public License. This license, the GNU Lesser | |||
General Public License, applies to certain designated libraries, and | |||
is quite different from the ordinary General Public License. We use | |||
this license for certain libraries in order to permit linking those | |||
libraries into non-free programs. | |||
When a program is linked with a library, whether statically or using | |||
a shared library, the combination of the two is legally speaking a | |||
combined work, a derivative of the original library. The ordinary | |||
General Public License therefore permits such linking only if the | |||
entire combination fits its criteria of freedom. The Lesser General | |||
Public License permits more lax criteria for linking other code with | |||
the library. | |||
We call this license the "Lesser" General Public License because it | |||
does Less to protect the user's freedom than the ordinary General | |||
Public License. It also provides other free software developers Less | |||
of an advantage over competing non-free programs. These disadvantages | |||
are the reason we use the ordinary General Public License for many | |||
libraries. However, the Lesser license provides advantages in certain | |||
special circumstances. | |||
For example, on rare occasions, there may be a special need to | |||
encourage the widest possible use of a certain library, so that it becomes | |||
a de-facto standard. To achieve this, non-free programs must be | |||
allowed to use the library. A more frequent case is that a free | |||
library does the same job as widely used non-free libraries. In this | |||
case, there is little to gain by limiting the free library to free | |||
software only, so we use the Lesser General Public License. | |||
In other cases, permission to use a particular library in non-free | |||
programs enables a greater number of people to use a large body of | |||
free software. For example, permission to use the GNU C Library in | |||
non-free programs enables many more people to use the whole GNU | |||
operating system, as well as its variant, the GNU/Linux operating | |||
system. | |||
Although the Lesser General Public License is Less protective of the | |||
users' freedom, it does ensure that the user of a program that is | |||
linked with the Library has the freedom and the wherewithal to run | |||
that program using a modified version of the Library. | |||
The precise terms and conditions for copying, distribution and | |||
modification follow. Pay close attention to the difference between a | |||
"work based on the library" and a "work that uses the library". The | |||
former contains code derived from the library, whereas the latter must | |||
be combined with the library in order to run. | |||
GNU LESSER GENERAL PUBLIC LICENSE | |||
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION | |||
0. This License Agreement applies to any software library or other | |||
program which contains a notice placed by the copyright holder or | |||
other authorized party saying it may be distributed under the terms of | |||
this Lesser General Public License (also called "this License"). | |||
Each licensee is addressed as "you". | |||
A "library" means a collection of software functions and/or data | |||
prepared so as to be conveniently linked with application programs | |||
(which use some of those functions and data) to form executables. | |||
The "Library", below, refers to any such software library or work | |||
which has been distributed under these terms. A "work based on the | |||
Library" means either the Library or any derivative work under | |||
copyright law: that is to say, a work containing the Library or a | |||
portion of it, either verbatim or with modifications and/or translated | |||
straightforwardly into another language. (Hereinafter, translation is | |||
included without limitation in the term "modification".) | |||
"Source code" for a work means the preferred form of the work for | |||
making modifications to it. For a library, complete source code means | |||
all the source code for all modules it contains, plus any associated | |||
interface definition files, plus the scripts used to control compilation | |||
and installation of the library. | |||
Activities other than copying, distribution and modification are not | |||
covered by this License; they are outside its scope. The act of | |||
running a program using the Library is not restricted, and output from | |||
such a program is covered only if its contents constitute a work based | |||
on the Library (independent of the use of the Library in a tool for | |||
writing it). Whether that is true depends on what the Library does | |||
and what the program that uses the Library does. | |||
1. You may copy and distribute verbatim copies of the Library's | |||
complete source code as you receive it, in any medium, provided that | |||
you conspicuously and appropriately publish on each copy an | |||
appropriate copyright notice and disclaimer of warranty; keep intact | |||
all the notices that refer to this License and to the absence of any | |||
warranty; and distribute a copy of this License along with the | |||
Library. | |||
You may charge a fee for the physical act of transferring a copy, | |||
and you may at your option offer warranty protection in exchange for a | |||
fee. | |||
2. You may modify your copy or copies of the Library or any portion | |||
of it, thus forming a work based on the Library, and copy and | |||
distribute such modifications or work under the terms of Section 1 | |||
above, provided that you also meet all of these conditions: | |||
a) The modified work must itself be a software library. | |||
b) You must cause the files modified to carry prominent notices | |||
stating that you changed the files and the date of any change. | |||
c) You must cause the whole of the work to be licensed at no | |||
charge to all third parties under the terms of this License. | |||
d) If a facility in the modified Library refers to a function or a | |||
table of data to be supplied by an application program that uses | |||
the facility, other than as an argument passed when the facility | |||
is invoked, then you must make a good faith effort to ensure that, | |||
in the event an application does not supply such function or | |||
table, the facility still operates, and performs whatever part of | |||
its purpose remains meaningful. | |||
(For example, a function in a library to compute square roots has | |||
a purpose that is entirely well-defined independent of the | |||
application. Therefore, Subsection 2d requires that any | |||
application-supplied function or table used by this function must | |||
be optional: if the application does not supply it, the square | |||
root function must still compute square roots.) | |||
These requirements apply to the modified work as a whole. If | |||
identifiable sections of that work are not derived from the Library, | |||
and can be reasonably considered independent and separate works in | |||
themselves, then this License, and its terms, do not apply to those | |||
sections when you distribute them as separate works. But when you | |||
distribute the same sections as part of a whole which is a work based | |||
on the Library, the distribution of the whole must be on the terms of | |||
this License, whose permissions for other licensees extend to the | |||
entire whole, and thus to each and every part regardless of who wrote | |||
it. | |||
Thus, it is not the intent of this section to claim rights or contest | |||
your rights to work written entirely by you; rather, the intent is to | |||
exercise the right to control the distribution of derivative or | |||
collective works based on the Library. | |||
In addition, mere aggregation of another work not based on the Library | |||
with the Library (or with a work based on the Library) on a volume of | |||
a storage or distribution medium does not bring the other work under | |||
the scope of this License. | |||
3. You may opt to apply the terms of the ordinary GNU General Public | |||
License instead of this License to a given copy of the Library. To do | |||
this, you must alter all the notices that refer to this License, so | |||
that they refer to the ordinary GNU General Public License, version 2, | |||
instead of to this License. (If a newer version than version 2 of the | |||
ordinary GNU General Public License has appeared, then you can specify | |||
that version instead if you wish.) Do not make any other change in | |||
these notices. | |||
Once this change is made in a given copy, it is irreversible for | |||
that copy, so the ordinary GNU General Public License applies to all | |||
subsequent copies and derivative works made from that copy. | |||
This option is useful when you wish to copy part of the code of | |||
the Library into a program that is not a library. | |||
4. You may copy and distribute the Library (or a portion or | |||
derivative of it, under Section 2) in object code or executable form | |||
under the terms of Sections 1 and 2 above provided that you accompany | |||
it with the complete corresponding machine-readable source code, which | |||
must be distributed under the terms of Sections 1 and 2 above on a | |||
medium customarily used for software interchange. | |||
If distribution of object code is made by offering access to copy | |||
from a designated place, then offering equivalent access to copy the | |||
source code from the same place satisfies the requirement to | |||
distribute the source code, even though third parties are not | |||
compelled to copy the source along with the object code. | |||
5. A program that contains no derivative of any portion of the | |||
Library, but is designed to work with the Library by being compiled or | |||
linked with it, is called a "work that uses the Library". Such a | |||
work, in isolation, is not a derivative work of the Library, and | |||
therefore falls outside the scope of this License. | |||
However, linking a "work that uses the Library" with the Library | |||
creates an executable that is a derivative of the Library (because it | |||
contains portions of the Library), rather than a "work that uses the | |||
library". The executable is therefore covered by this License. | |||
Section 6 states terms for distribution of such executables. | |||
When a "work that uses the Library" uses material from a header file | |||
that is part of the Library, the object code for the work may be a | |||
derivative work of the Library even though the source code is not. | |||
Whether this is true is especially significant if the work can be | |||
linked without the Library, or if the work is itself a library. The | |||
threshold for this to be true is not precisely defined by law. | |||
If such an object file uses only numerical parameters, data | |||
structure layouts and accessors, and small macros and small inline | |||
functions (ten lines or less in length), then the use of the object | |||
file is unrestricted, regardless of whether it is legally a derivative | |||
work. (Executables containing this object code plus portions of the | |||
Library will still fall under Section 6.) | |||
Otherwise, if the work is a derivative of the Library, you may | |||
distribute the object code for the work under the terms of Section 6. | |||
Any executables containing that work also fall under Section 6, | |||
whether or not they are linked directly with the Library itself. | |||
6. As an exception to the Sections above, you may also combine or | |||
link a "work that uses the Library" with the Library to produce a | |||
work containing portions of the Library, and distribute that work | |||
under terms of your choice, provided that the terms permit | |||
modification of the work for the customer's own use and reverse | |||
engineering for debugging such modifications. | |||
You must give prominent notice with each copy of the work that the | |||
Library is used in it and that the Library and its use are covered by | |||
this License. You must supply a copy of this License. If the work | |||
during execution displays copyright notices, you must include the | |||
copyright notice for the Library among them, as well as a reference | |||
directing the user to the copy of this License. Also, you must do one | |||
of these things: | |||
a) Accompany the work with the complete corresponding | |||
machine-readable source code for the Library including whatever | |||
changes were used in the work (which must be distributed under | |||
Sections 1 and 2 above); and, if the work is an executable linked | |||
with the Library, with the complete machine-readable "work that | |||
uses the Library", as object code and/or source code, so that the | |||
user can modify the Library and then relink to produce a modified | |||
executable containing the modified Library. (It is understood | |||
that the user who changes the contents of definitions files in the | |||
Library will not necessarily be able to recompile the application | |||
to use the modified definitions.) | |||
b) Use a suitable shared library mechanism for linking with the | |||
Library. A suitable mechanism is one that (1) uses at run time a | |||
copy of the library already present on the user's computer system, | |||
rather than copying library functions into the executable, and (2) | |||
will operate properly with a modified version of the library, if | |||
the user installs one, as long as the modified version is | |||
interface-compatible with the version that the work was made with. | |||
c) Accompany the work with a written offer, valid for at | |||
least three years, to give the same user the materials | |||
specified in Subsection 6a, above, for a charge no more | |||
than the cost of performing this distribution. | |||
d) If distribution of the work is made by offering access to copy | |||
from a designated place, offer equivalent access to copy the above | |||
specified materials from the same place. | |||
e) Verify that the user has already received a copy of these | |||
materials or that you have already sent this user a copy. | |||
For an executable, the required form of the "work that uses the | |||
Library" must include any data and utility programs needed for | |||
reproducing the executable from it. However, as a special exception, | |||
the materials to be distributed need not include anything that is | |||
normally distributed (in either source or binary form) with the major | |||
components (compiler, kernel, and so on) of the operating system on | |||
which the executable runs, unless that component itself accompanies | |||
the executable. | |||
It may happen that this requirement contradicts the license | |||
restrictions of other proprietary libraries that do not normally | |||
accompany the operating system. Such a contradiction means you cannot | |||
use both them and the Library together in an executable that you | |||
distribute. | |||
7. You may place library facilities that are a work based on the | |||
Library side-by-side in a single library together with other library | |||
facilities not covered by this License, and distribute such a combined | |||
library, provided that the separate distribution of the work based on | |||
the Library and of the other library facilities is otherwise | |||
permitted, and provided that you do these two things: | |||
a) Accompany the combined library with a copy of the same work | |||
based on the Library, uncombined with any other library | |||
facilities. This must be distributed under the terms of the | |||
Sections above. | |||
b) Give prominent notice with the combined library of the fact | |||
that part of it is a work based on the Library, and explaining | |||
where to find the accompanying uncombined form of the same work. | |||
8. You may not copy, modify, sublicense, link with, or distribute | |||
the Library except as expressly provided under this License. Any | |||
attempt otherwise to copy, modify, sublicense, link with, or | |||
distribute the Library is void, and will automatically terminate your | |||
rights under this License. However, parties who have received copies, | |||
or rights, from you under this License will not have their licenses | |||
terminated so long as such parties remain in full compliance. | |||
9. You are not required to accept this License, since you have not | |||
signed it. However, nothing else grants you permission to modify or | |||
distribute the Library or its derivative works. These actions are | |||
prohibited by law if you do not accept this License. Therefore, by | |||
modifying or distributing the Library (or any work based on the | |||
Library), you indicate your acceptance of this License to do so, and | |||
all its terms and conditions for copying, distributing or modifying | |||
the Library or works based on it. | |||
10. Each time you redistribute the Library (or any work based on the | |||
Library), the recipient automatically receives a license from the | |||
original licensor to copy, distribute, link with or modify the Library | |||
subject to these terms and conditions. You may not impose any further | |||
restrictions on the recipients' exercise of the rights granted herein. | |||
You are not responsible for enforcing compliance by third parties with | |||
this License. | |||
11. If, as a consequence of a court judgment or allegation of patent | |||
infringement or for any other reason (not limited to patent issues), | |||
conditions are imposed on you (whether by court order, agreement or | |||
otherwise) that contradict the conditions of this License, they do not | |||
excuse you from the conditions of this License. If you cannot | |||
distribute so as to satisfy simultaneously your obligations under this | |||
License and any other pertinent obligations, then as a consequence you | |||
may not distribute the Library at all. For example, if a patent | |||
license would not permit royalty-free redistribution of the Library by | |||
all those who receive copies directly or indirectly through you, then | |||
the only way you could satisfy both it and this License would be to | |||
refrain entirely from distribution of the Library. | |||
If any portion of this section is held invalid or unenforceable under any | |||
particular circumstance, the balance of the section is intended to apply, | |||
and the section as a whole is intended to apply in other circumstances. | |||
It is not the purpose of this section to induce you to infringe any | |||
patents or other property right claims or to contest validity of any | |||
such claims; this section has the sole purpose of protecting the | |||
integrity of the free software distribution system which is | |||
implemented by public license practices. Many people have made | |||
generous contributions to the wide range of software distributed | |||
through that system in reliance on consistent application of that | |||
system; it is up to the author/donor to decide if he or she is willing | |||
to distribute software through any other system and a licensee cannot | |||
impose that choice. | |||
This section is intended to make thoroughly clear what is believed to | |||
be a consequence of the rest of this License. | |||
12. If the distribution and/or use of the Library is restricted in | |||
certain countries either by patents or by copyrighted interfaces, the | |||
original copyright holder who places the Library under this License may add | |||
an explicit geographical distribution limitation excluding those countries, | |||
so that distribution is permitted only in or among countries not thus | |||
excluded. In such case, this License incorporates the limitation as if | |||
written in the body of this License. | |||
13. The Free Software Foundation may publish revised and/or new | |||
versions of the Lesser General Public License from time to time. | |||
Such new versions will be similar in spirit to the present version, | |||
but may differ in detail to address new problems or concerns. | |||
Each version is given a distinguishing version number. If the Library | |||
specifies a version number of this License which applies to it and | |||
"any later version", you have the option of following the terms and | |||
conditions either of that version or of any later version published by | |||
the Free Software Foundation. If the Library does not specify a | |||
license version number, you may choose any version ever published by | |||
the Free Software Foundation. | |||
14. If you wish to incorporate parts of the Library into other free | |||
programs whose distribution conditions are incompatible with these, | |||
write to the author to ask for permission. For software which is | |||
copyrighted by the Free Software Foundation, write to the Free | |||
Software Foundation; we sometimes make exceptions for this. Our | |||
decision will be guided by the two goals of preserving the free status | |||
of all derivatives of our free software and of promoting the sharing | |||
and reuse of software generally. | |||
NO WARRANTY | |||
15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO | |||
WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. | |||
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR | |||
OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY | |||
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE | |||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR | |||
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE | |||
LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME | |||
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. | |||
16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN | |||
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY | |||
AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU | |||
FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR | |||
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE | |||
LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING | |||
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A | |||
FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF | |||
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH | |||
DAMAGES. | |||
END OF TERMS AND CONDITIONS | |||
How to Apply These Terms to Your New Libraries | |||
If you develop a new library, and you want it to be of the greatest | |||
possible use to the public, we recommend making it free software that | |||
everyone can redistribute and change. You can do so by permitting | |||
redistribution under these terms (or, alternatively, under the terms of the | |||
ordinary General Public License). | |||
To apply these terms, attach the following notices to the library. It is | |||
safest to attach them to the start of each source file to most effectively | |||
convey the exclusion of warranty; and each file should have at least the | |||
"copyright" line and a pointer to where the full notice is found. | |||
<one line to give the library's name and a brief idea of what it does.> | |||
Copyright (C) <year> <name of author> | |||
This library is free software; you can redistribute it and/or | |||
modify it under the terms of the GNU Lesser General Public | |||
License as published by the Free Software Foundation; either | |||
version 2.1 of the License, or (at your option) any later version. | |||
This library is distributed in the hope that it will be useful, | |||
but WITHOUT ANY WARRANTY; without even the implied warranty of | |||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | |||
Lesser General Public License for more details. | |||
You should have received a copy of the GNU Lesser General Public | |||
License along with this library; if not, write to the Free Software | |||
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA | |||
Also add information on how to contact you by electronic and paper mail. | |||
You should also get your employer (if you work as a programmer) or your | |||
school, if any, to sign a "copyright disclaimer" for the library, if | |||
necessary. Here is a sample; alter the names: | |||
Yoyodyne, Inc., hereby disclaims all copyright interest in the | |||
library `Frob' (a library for tweaking knobs) written by James Random Hacker. | |||
<signature of Ty Coon>, 1 April 1990 | |||
Ty Coon, President of Vice | |||
That's all there is to it! | |||
@ -0,0 +1,29 @@ | |||
Markdownify
===========

* handle non-markdownifiable lists (e.g. `<ul><li id="foobar">asdf</li></ul>`)
* organize methods better (e.g. keep flushlinebreaks & setlinebreaks close to each other)
* take a look at function names etc.
* is the new (in rev. 93) lastclosedtag property needed?
* word wrapping (some work is done but it's still very buggy)

Markdownify Extra
=================

* handle table alignment with KEEP_HTML=false
* handle tables without headings when KEEP_HTML=false is set
* handle Markdown inside non-markdownable tags

Implementation Thoughts
=======================

* non-markdownifiable lists and Markdown inside non-markdownable tags, as well as the current
  table implementation, could be rewritten using a rollback mechanism.
  example:

      <ul><li>asdf</li><li id="foobar">asdf</li></ul>

  When we reach `<ul>`, we know the conversion might fail, so we take a snapshot of the current
  parser state. We keep parsing, and when we reach `<li id="foobar">` we roll back to the
  snapshot and keep this list in HTML format.
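
The rollback idea above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: `RollbackSketch`, `convertList()` and the pre-parsed `$items` array are hypothetical names invented for this note and are not part of Markdownify, and the "parser state" is reduced to the output buffer.

```php
<?php
// Hypothetical sketch: none of these names exist in Markdownify.
class RollbackSketch {
	/** @var string Markdown produced so far */
	public $output = '';
	/** @var array saved copies of $output, innermost last */
	private $snapshots = array();

	/** remember the current output before a risky construct such as <ul> */
	public function snapshot() {
		$this->snapshots[] = $this->output;
	}
	/** the risky construct converted cleanly; forget the saved copy */
	public function commit() {
		array_pop($this->snapshots);
	}
	/** discard everything emitted since the last snapshot */
	public function rollback() {
		$this->output = array_pop($this->snapshots);
	}
}

/**
 * Convert a flat list; each item is array('attrs' => string, 'text' => string).
 * Falls back to the raw HTML if any item carries attributes.
 */
function convertList(RollbackSketch $state, array $items, $rawHtml) {
	$state->snapshot();
	foreach ($items as $item) {
		if ($item['attrs'] !== '') {
			// Markdown cannot express attributes: undo the partial list
			$state->rollback();
			$state->output .= $rawHtml."\n";
			return false;
		}
		$state->output .= '* '.$item['text']."\n";
	}
	$state->commit();
	return true;
}

$state = new RollbackSketch();
$state->output = "intro\n\n";
$items = array(
	array('attrs' => '', 'text' => 'asdf'),
	array('attrs' => 'id="foobar"', 'text' => 'asdf'),
);
$html = '<ul><li>asdf</li><li id="foobar">asdf</li></ul>';
convertList($state, $items, $html);
echo $state->output;
```

The first `<li>` is emitted as a Markdown bullet, then dropped again when the second item forces the rollback, so only the original HTML list ends up after the intro text. A real implementation would have to snapshot more than the output buffer (for example the tag stack and link references).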
@ -0,0 +1,51 @@ | |||
<?php | |||
error_reporting(E_ALL);
if (!empty($_POST['input'])) {
	include 'markdownify_extra.php';
	if (!isset($_POST['leap'])) {
		$leap = MDFY_LINKS_EACH_PARAGRAPH;
	} else {
		$leap = $_POST['leap'];
	}

	if (!isset($_POST['keepHTML'])) {
		$keephtml = MDFY_KEEPHTML;
	} else {
		$keephtml = $_POST['keepHTML'];
	}
	if (!empty($_POST['extra'])) {
		$md = new Markdownify_Extra($leap, MDFY_BODYWIDTH, $keephtml);
	} else {
		$md = new Markdownify($leap, MDFY_BODYWIDTH, $keephtml);
	}
	if (ini_get('magic_quotes_gpc')) {
		$_POST['input'] = stripslashes($_POST['input']);
	}
	$output = $md->parseString($_POST['input']);
} else {
	$_POST['input'] = '';
}
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> | |||
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"> | |||
<head> | |||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> | |||
<title>HTML to Markdown Converter</title> | |||
</head> | |||
<body> | |||
<?php if (empty($_POST['input'])): ?> | |||
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES, 'UTF-8'); ?>" method="post">
<fieldset> | |||
<legend>HTML Input</legend> | |||
<textarea style="width:100%;" cols="85" rows="40" name="input"><?php echo htmlspecialchars($_POST['input'], ENT_NOQUOTES, 'UTF-8'); ?></textarea> | |||
</fieldset> | |||
<label for="extra">Markdownify Extra: <input name="extra" checked="checked" id="extra" type="checkbox" value="1" /></label> | |||
<label for="leap">Links after each block elem: <input name="leap" id="leap" type="checkbox" value="1" /></label> | |||
<label for="keepHTML">keep HTML: <input name="keepHTML" id="keepHTML" type="checkbox" value="1" checked="checked" /></label> | |||
<input type="submit" name="submit" value="submit" /> | |||
</form> | |||
<?php else: ?> | |||
<h1 style="text-align:right;"><a href="<?php echo htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES, 'UTF-8'); ?>">BACK</a></h1>
<pre><?php echo htmlspecialchars($output, ENT_NOQUOTES, 'UTF-8'); ?></pre> | |||
<?php endif; ?> | |||
</body> | |||
</html> |
@ -0,0 +1,33 @@ | |||
#!/usr/bin/php | |||
<?php | |||
require dirname(__FILE__) .'/markdownify_extra.php'; | |||
function param($name, $default = false) {
	// array_search() replaces the former each() loop; each() was removed in PHP 8.0
	$pos = array_search('--'.$name, $_SERVER['argv']);
	if ($pos === false)
		return $default;
	// the option's value is the argument that follows it, if any
	$value = isset($_SERVER['argv'][$pos + 1]) ? $_SERVER['argv'][$pos + 1] : false;
	if ($value === false || substr($value, 0, 2) == '--')
		return true;
	else
		return $value;
}
$input = stream_get_contents(STDIN); | |||
$linksAfterEachParagraph = param('links'); | |||
$bodyWidth = param('width'); | |||
$keepHTML = param('html', true); | |||
if (param('no_extra')) { | |||
$parser = new Markdownify($linksAfterEachParagraph, $bodyWidth, $keepHTML); | |||
} else { | |||
$parser = new Markdownify_Extra($linksAfterEachParagraph, $bodyWidth, $keepHTML); | |||
} | |||
echo $parser->parseString($input) ."\n"; |
@ -0,0 +1,489 @@ | |||
<?php | |||
/** | |||
* Class to convert HTML to Markdown with PHP Markdown Extra syntax support. | |||
* | |||
* @version 1.0.0 alpha | |||
* @author Milian Wolff (<mail@milianw.de>, <http://milianw.de>) | |||
* @license LGPL, see LICENSE_LGPL.txt and the summary below | |||
* @copyright (C) 2007 Milian Wolff | |||
* | |||
* This library is free software; you can redistribute it and/or | |||
* modify it under the terms of the GNU Lesser General Public | |||
* License as published by the Free Software Foundation; either | |||
* version 2.1 of the License, or (at your option) any later version. | |||
* | |||
* This library is distributed in the hope that it will be useful, | |||
* but WITHOUT ANY WARRANTY; without even the implied warranty of | |||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | |||
* Lesser General Public License for more details. | |||
* | |||
* You should have received a copy of the GNU Lesser General Public | |||
* License along with this library; if not, write to the Free Software | |||
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA | |||
*/ | |||
/** | |||
* standard Markdownify class | |||
*/ | |||
require_once dirname(__FILE__).'/markdownify.php'; | |||
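/*
 * Usage sketch, assuming the default constructor arguments are acceptable and
 * that the parent Markdownify class renders headers in the usual way:
 *
 *   $md = new Markdownify_Extra();
 *   echo $md->parseString('<h1 id="intro">Title</h1>');
 *
 * Judging from handleHeader() below, a header that carries an id is expected
 * to gain a trailing " {#intro}". The rest of the output depends on the
 * parent class and is not shown here.
 */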
class Markdownify_Extra extends Markdownify { | |||
/** | |||
* table data, including rows with content and the maximum width of each col | |||
* | |||
* @var array | |||
*/ | |||
var $table = array(); | |||
/** | |||
* current col | |||
* | |||
* @var int | |||
*/ | |||
var $col = -1; | |||
/** | |||
* current row | |||
* | |||
* @var int | |||
*/ | |||
var $row = 0; | |||
/** | |||
* constructor, see Markdownify::Markdownify() for more information | |||
*/ | |||
function Markdownify_Extra($linksAfterEachParagraph = MDFY_LINKS_EACH_PARAGRAPH, $bodyWidth = MDFY_BODYWIDTH, $keepHTML = MDFY_KEEPHTML) { | |||
parent::Markdownify($linksAfterEachParagraph, $bodyWidth, $keepHTML); | |||
### new markdownable tags & attributes | |||
# header ids: # foo {#bar}
$this->isMarkdownable['h1']['id'] = 'optional'; | |||
$this->isMarkdownable['h2']['id'] = 'optional'; | |||
$this->isMarkdownable['h3']['id'] = 'optional'; | |||
$this->isMarkdownable['h4']['id'] = 'optional'; | |||
$this->isMarkdownable['h5']['id'] = 'optional'; | |||
$this->isMarkdownable['h6']['id'] = 'optional'; | |||
# tables | |||
$this->isMarkdownable['table'] = array(); | |||
$this->isMarkdownable['th'] = array( | |||
'align' => 'optional', | |||
); | |||
$this->isMarkdownable['td'] = array( | |||
'align' => 'optional', | |||
); | |||
$this->isMarkdownable['tr'] = array(); | |||
array_push($this->ignore, 'thead'); | |||
array_push($this->ignore, 'tbody'); | |||
array_push($this->ignore, 'tfoot'); | |||
# definition lists | |||
$this->isMarkdownable['dl'] = array(); | |||
$this->isMarkdownable['dd'] = array(); | |||
$this->isMarkdownable['dt'] = array(); | |||
# footnotes | |||
$this->isMarkdownable['fnref'] = array( | |||
'target' => 'required', | |||
); | |||
$this->isMarkdownable['footnotes'] = array(); | |||
$this->isMarkdownable['fn'] = array( | |||
'name' => 'required', | |||
); | |||
$this->parser->blockElements['fnref'] = false; | |||
$this->parser->blockElements['fn'] = true; | |||
$this->parser->blockElements['footnotes'] = true; | |||
# abbr | |||
$this->isMarkdownable['abbr'] = array( | |||
'title' => 'required', | |||
); | |||
# build RegEx lookahead to decide whether the table can be parsed or not
$inlineTags = array_keys($this->parser->blockElements, false); | |||
$colContents = '(?:[^<]|<(?:'.implode('|', $inlineTags).'|[^a-z]))+'; | |||
$this->tableLookaheadHeader = '{ | |||
^\s*(?:<thead\s*>)?\s* # open optional thead | |||
<tr\s*>\s*(?: # start required row with headers | |||
<th(?:\s+align=("|\')(?:left|center|right)\1)?\s*> # header with optional align | |||
\s*'.$colContents.'\s* # contents | |||
</th>\s* # close header | |||
)+</tr> # close row with headers | |||
\s*(?:</thead>)? # close optional thead | |||
}sxi'; | |||
$this->tdSubstitute = '\s*'.$colContents.'\s* # contents | |||
</td>\s*'; | |||
$this->tableLookaheadBody = '{ | |||
\s*(?:<tbody\s*>)?\s* # open optional tbody | |||
(?:<tr\s*>\s* # start row | |||
%s # cols to be substituted | |||
</tr>)+ # close row | |||
\s*(?:</tbody>)? # close optional tbody | |||
\s*</table> # close table | |||
}sxi'; | |||
} | |||
/** | |||
* handle header tags (<h1> - <h6>) | |||
* | |||
* @param int $level 1-6 | |||
* @return void | |||
*/ | |||
function handleHeader($level) { | |||
static $id = null; | |||
if ($this->parser->isStartTag) { | |||
if (isset($this->parser->tagAttributes['id'])) { | |||
$id = $this->parser->tagAttributes['id']; | |||
} | |||
} else { | |||
if (!is_null($id)) { | |||
$this->out(' {#'.$id.'}'); | |||
$id = null; | |||
} | |||
} | |||
parent::handleHeader($level); | |||
} | |||
/** | |||
* handle <abbr> tags | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function handleTag_abbr() { | |||
if ($this->parser->isStartTag) { | |||
$this->stack(); | |||
$this->buffer(); | |||
} else { | |||
$tag = $this->unstack(); | |||
$tag['text'] = $this->unbuffer(); | |||
$add = true; | |||
foreach ($this->stack['abbr'] as $stacked) { | |||
if ($stacked['text'] == $tag['text']) { | |||
/** TODO: differing abbr definitions, i.e. different titles for same text **/ | |||
$add = false; | |||
break; | |||
} | |||
} | |||
$this->out($tag['text']); | |||
if ($add) { | |||
array_push($this->stack['abbr'], $tag); | |||
} | |||
} | |||
} | |||
/** | |||
* flush stacked abbr tags | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function flushStacked_abbr() { | |||
$out = array(); | |||
foreach ($this->stack['abbr'] as $k => $tag) { | |||
if (!isset($tag['unstacked'])) { | |||
array_push($out, ' *['.$tag['text'].']: '.$tag['title']); | |||
$tag['unstacked'] = true; | |||
$this->stack['abbr'][$k] = $tag; | |||
} | |||
} | |||
if (!empty($out)) { | |||
$this->out("\n\n".implode("\n", $out)); | |||
} | |||
} | |||
/** | |||
* handle <table> tags | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function handleTag_table() { | |||
if ($this->parser->isStartTag) { | |||
# check if upcoming table can be converted | |||
if ($this->keepHTML) { | |||
if (preg_match($this->tableLookaheadHeader, $this->parser->html, $matches)) { | |||
# header seems good, now check body | |||
# get align & number of cols | |||
preg_match_all('#<th(?:\s+align=("|\')(left|right|center)\1)?\s*>#si', $matches[0], $cols); | |||
$regEx = ''; | |||
$i = 1; | |||
$aligns = array(); | |||
foreach ($cols[2] as $align) { | |||
$align = strtolower($align); | |||
array_push($aligns, $align); | |||
if (empty($align)) { | |||
$align = 'left'; # default value | |||
} | |||
$td = '\s+align=("|\')'.$align.'\\'.$i; | |||
$i++; | |||
if ($align == 'left') { | |||
# look for empty align or left | |||
$td = '(?:'.$td.')?'; | |||
} | |||
$td = '<td'.$td.'\s*>'; | |||
$regEx .= $td.$this->tdSubstitute; | |||
} | |||
$regEx = sprintf($this->tableLookaheadBody, $regEx); | |||
if (preg_match($regEx, $this->parser->html, $matches, 0, strlen($matches[0]))) {
# this is a markdownable table tag! | |||
$this->table = array( | |||
'rows' => array(), | |||
'col_widths' => array(), | |||
'aligns' => $aligns, | |||
); | |||
$this->row = 0; | |||
} else { | |||
# non markdownable table | |||
$this->handleTagToText(); | |||
} | |||
} else { | |||
# non markdownable table | |||
$this->handleTagToText(); | |||
} | |||
} else { | |||
$this->table = array( | |||
'rows' => array(), | |||
'col_widths' => array(), | |||
'aligns' => array(), | |||
); | |||
$this->row = 0; | |||
} | |||
} else { | |||
# finally build the table in Markdown Extra syntax | |||
$separator = array(); | |||
# separator with the correct alignment indicators
foreach($this->table['aligns'] as $col => $align) { | |||
if (!$this->keepHTML && !isset($this->table['col_widths'][$col])) { | |||
break; | |||
} | |||
$left = ' '; | |||
$right = ' '; | |||
switch ($align) { | |||
case 'left': | |||
$left = ':'; | |||
break; | |||
case 'center':
$right = ':';
$left = ':';
break;
case 'right': | |||
$right = ':'; | |||
break; | |||
} | |||
array_push($separator, $left.str_repeat('-', $this->table['col_widths'][$col]).$right); | |||
} | |||
$separator = '|'.implode('|', $separator).'|'; | |||
$rows = array(); | |||
# add padding | |||
array_walk_recursive($this->table['rows'], array(&$this, 'alignTdContent')); | |||
$header = array_shift($this->table['rows']); | |||
array_push($rows, '| '.implode(' | ', $header).' |'); | |||
array_push($rows, $separator); | |||
foreach ($this->table['rows'] as $row) { | |||
array_push($rows, '| '.implode(' | ', $row).' |'); | |||
} | |||
$this->out(implode("\n".$this->indent, $rows)); | |||
$this->table = array(); | |||
$this->setLineBreaks(2); | |||
} | |||
} | |||
/** | |||
* properly pad content so it is aligned as wished
* should be used with array_walk_recursive on $this->table['rows'] | |||
* | |||
* @param string &$content | |||
* @param int $col | |||
* @return void | |||
*/ | |||
function alignTdContent(&$content, $col) { | |||
switch ($this->table['aligns'][$col]) { | |||
default: | |||
case 'left': | |||
$content .= str_repeat(' ', $this->table['col_widths'][$col] - $this->strlen($content)); | |||
break; | |||
case 'right': | |||
$content = str_repeat(' ', $this->table['col_widths'][$col] - $this->strlen($content)).$content; | |||
break; | |||
case 'center': | |||
$paddingNeeded = $this->table['col_widths'][$col] - $this->strlen($content); | |||
$left = (int) floor($paddingNeeded / 2);
$right = $paddingNeeded - $left; | |||
$content = str_repeat(' ', $left).$content.str_repeat(' ', $right); | |||
break; | |||
} | |||
} | |||
/** | |||
* handle <tr> tags | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function handleTag_tr() { | |||
if ($this->parser->isStartTag) { | |||
$this->col = -1; | |||
} else { | |||
$this->row++; | |||
} | |||
} | |||
/** | |||
* handle <td> tags | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function handleTag_td() { | |||
if ($this->parser->isStartTag) { | |||
$this->col++; | |||
if (!isset($this->table['col_widths'][$this->col])) { | |||
$this->table['col_widths'][$this->col] = 0; | |||
} | |||
$this->buffer(); | |||
} else { | |||
$buffer = trim($this->unbuffer()); | |||
$this->table['col_widths'][$this->col] = max($this->table['col_widths'][$this->col], $this->strlen($buffer)); | |||
$this->table['rows'][$this->row][$this->col] = $buffer; | |||
} | |||
} | |||
/** | |||
* handle <th> tags | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function handleTag_th() { | |||
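# record the column alignment on the opening <th> only: on the closing tag
# $this->col has already been advanced and tagAttributes is empty, which
# would store an empty alignment for the following column
if ($this->parser->isStartTag && !$this->keepHTML && !isset($this->table['rows'][1]) && !isset($this->table['aligns'][$this->col+1])) {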
if (isset($this->parser->tagAttributes['align'])) { | |||
$this->table['aligns'][$this->col+1] = $this->parser->tagAttributes['align']; | |||
} else { | |||
$this->table['aligns'][$this->col+1] = ''; | |||
} | |||
} | |||
$this->handleTag_td(); | |||
} | |||
/** | |||
* handle <dl> tags | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function handleTag_dl() { | |||
if (!$this->parser->isStartTag) { | |||
$this->setLineBreaks(2); | |||
} | |||
} | |||
/** | |||
* handle <dt> tags | |||
* | |||
* @param void | |||
* @return void | |||
*/
function handleTag_dt() { | |||
if (!$this->parser->isStartTag) { | |||
$this->setLineBreaks(1); | |||
} | |||
} | |||
/** | |||
* handle <dd> tags | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function handleTag_dd() { | |||
if ($this->parser->isStartTag) { | |||
if (substr(ltrim($this->parser->html), 0, 3) == '<p>') { | |||
# next comes a paragraph, so we'll need an extra line | |||
$this->out("\n".$this->indent); | |||
} elseif (substr($this->output, -2) == "\n\n") { | |||
$this->output = substr($this->output, 0, -1); | |||
} | |||
$this->out(': '); | |||
$this->indent(' ', false); | |||
} else { | |||
# lookahead for next dt | |||
if (substr(ltrim($this->parser->html), 0, 4) == '<dt>') { | |||
$this->setLineBreaks(2); | |||
} else { | |||
$this->setLineBreaks(1); | |||
} | |||
$this->indent(' '); | |||
} | |||
} | |||
/** | |||
* handle <fnref /> tags (custom footnote references, see markdownify_extra::parseString()) | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function handleTag_fnref() { | |||
$this->out('[^'.$this->parser->tagAttributes['target'].']'); | |||
} | |||
/** | |||
* handle <fn> tags (custom footnotes, see markdownify_extra::parseString() | |||
* and markdownify_extra::_makeFootnotes()) | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function handleTag_fn() { | |||
if ($this->parser->isStartTag) { | |||
$this->out('[^'.$this->parser->tagAttributes['name'].']:'); | |||
$this->setLineBreaks(1); | |||
} else { | |||
$this->setLineBreaks(2); | |||
} | |||
$this->indent(' '); | |||
} | |||
/** | |||
* handle <footnotes> tag (custom footnotes, see markdownify_extra::parseString() | |||
* and markdownify_extra::_makeFootnotes()) | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function handleTag_footnotes() { | |||
if (!$this->parser->isStartTag) { | |||
$this->setLineBreaks(2); | |||
} | |||
} | |||
/** | |||
* parse an HTML string, normalizing footnote markup first
*
* @param string $html input
* @return string Markdown formatted output | |||
*/ | |||
function parseString($html) { | |||
/** TODO: custom markdown-extra options, e.g. titles & classes **/ | |||
# <sup id="fnref:..."><a href="#fn:..." rel="footnote">...</a></sup>
# => <fnref target="..." /> | |||
$html = preg_replace('@<sup id="fnref:([^"]+)">\s*<a href="#fn:\1" rel="footnote">\s*\d+\s*</a>\s*</sup>@Us', '<fnref target="$1" />', $html); | |||
# <div class="footnotes"> | |||
# <hr /> | |||
# <ol> | |||
# | |||
# <li id="fn:...">...</li> | |||
# ... | |||
# | |||
# </ol> | |||
# </div> | |||
# => | |||
# <footnotes> | |||
# <fn name="...">...</fn> | |||
# ... | |||
# </footnotes> | |||
$html = preg_replace_callback('#<div class="footnotes">\s*<hr />\s*<ol>\s*(.+)\s*</ol>\s*</div>#Us', array(&$this, '_makeFootnotes'), $html); | |||
return parent::parseString($html); | |||
} | |||
/** | |||
* replace HTML representation of footnotes with something more easily parsable | |||
* | |||
* @note this is a callback to be used in parseString() | |||
* | |||
* @param array $matches | |||
* @return string | |||
*/ | |||
function _makeFootnotes($matches) { | |||
# <li id="fn:1"> | |||
# ... | |||
# <a href="#fnref:block" rev="footnote">↩</a></p> | |||
# </li> | |||
# => <fn name="1">...</fn> | |||
# remove footnote link | |||
$fns = preg_replace('@\s*(?:&#160;\s*)?<a href="#fnref:[^"]+" rev="footnote"[^>]*>(?:&#8617;|↩)</a>\s*@s', '', $matches[1]);
# remove empty paragraph | |||
$fns = preg_replace('@<p>\s*</p>@s', '', $fns); | |||
# <li id="fn:1">...</li> -> <fn name="1">...</fn>
$fns = str_replace('<li id="fn:', '<fn name="', $fns); | |||
$fns = '<footnotes>'.$fns.'</footnotes>'; | |||
return preg_replace('#</li>\s*(?=(?:<fn|</footnotes>))#s', '</fn>', $fns);
} | |||
} |
@ -0,0 +1,618 @@ | |||
<?php | |||
/** | |||
* parseHTML is an HTML parser which works with PHP 4 and above.
* It tries to handle invalid HTML to some degree. | |||
* | |||
* @version 1.0 beta | |||
* @author Milian Wolff (mail@milianw.de, http://milianw.de) | |||
* @license LGPL, see LICENSE_LGPL.txt and the summary below | |||
* @copyright (C) 2007 Milian Wolff | |||
* | |||
* This library is free software; you can redistribute it and/or | |||
* modify it under the terms of the GNU Lesser General Public | |||
* License as published by the Free Software Foundation; either | |||
* version 2.1 of the License, or (at your option) any later version. | |||
* | |||
* This library is distributed in the hope that it will be useful, | |||
* but WITHOUT ANY WARRANTY; without even the implied warranty of | |||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | |||
* Lesser General Public License for more details. | |||
* | |||
* You should have received a copy of the GNU Lesser General Public | |||
* License along with this library; if not, write to the Free Software | |||
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA | |||
*/ | |||
class parseHTML { | |||
/** | |||
* tags which are always empty (<br /> etc.) | |||
* | |||
* @var array<string> | |||
*/ | |||
var $emptyTags = array( | |||
'br', | |||
'hr', | |||
'input', | |||
'img', | |||
'area', | |||
'link', | |||
'meta', | |||
'param', | |||
); | |||
/** | |||
* tags with preformatted text | |||
* whitespace inside them is left untouched
* | |||
* @var array<string> | |||
*/ | |||
var $preformattedTags = array( | |||
'script', | |||
'style', | |||
'pre', | |||
'code', | |||
); | |||
/** | |||
* suppress HTML tags inside preformatted tags (see above)
* | |||
* @var bool | |||
*/ | |||
var $noTagsInCode = false; | |||
/** | |||
* html to be parsed | |||
* | |||
* @var string | |||
*/ | |||
var $html = ''; | |||
/** | |||
* node type: | |||
* | |||
* - tag (see isStartTag) | |||
* - text (includes cdata) | |||
* - comment | |||
* - doctype | |||
* - pi (processing instruction) | |||
* | |||
* @var string | |||
*/ | |||
var $nodeType = ''; | |||
/** | |||
* current node content, i.e. either a | |||
* simple string (text node), or something like | |||
* <tag attrib="value"...> | |||
* | |||
* @var string | |||
*/ | |||
var $node = ''; | |||
/** | |||
* whether current node is an opening tag (<a>) or not (</a>)
* set to NULL if current node is not a tag | |||
* NOTE: empty tags (<br />) set this to true as well! | |||
* | |||
* @var bool | null | |||
*/ | |||
var $isStartTag = null; | |||
/** | |||
* whether current node is an empty tag (<br />) or not (<a></a>)
* | |||
* @var bool | null | |||
*/ | |||
var $isEmptyTag = null; | |||
/** | |||
* tag name | |||
* | |||
* @var string | null | |||
*/ | |||
var $tagName = ''; | |||
/** | |||
* attributes of current tag | |||
* | |||
* @var array (attribName=>value) | null | |||
*/ | |||
var $tagAttributes = null; | |||
/** | |||
* whether the current tag is a block element
* | |||
* @var bool | null | |||
*/ | |||
var $isBlockElement = null; | |||
/** | |||
* keep whitespace | |||
* | |||
* @var int | |||
*/ | |||
var $keepWhitespace = 0; | |||
/** | |||
* list of open tags | |||
* count this to get current depth | |||
* | |||
* @var array | |||
*/ | |||
var $openTags = array(); | |||
/** | |||
* list of block elements | |||
* | |||
* @var array | |||
* TODO: what shall we do with <del> and <ins> ?! | |||
*/ | |||
var $blockElements = array ( | |||
# tag name => <bool> is block | |||
# block elements | |||
'address' => true, | |||
'blockquote' => true, | |||
'center' => true, | |||
'del' => true, # NOTE: overridden by the inline 'del' entry below (the later duplicate key wins)
'dir' => true, | |||
'div' => true, | |||
'dl' => true, | |||
'fieldset' => true, | |||
'form' => true, | |||
'h1' => true, | |||
'h2' => true, | |||
'h3' => true, | |||
'h4' => true, | |||
'h5' => true, | |||
'h6' => true, | |||
'hr' => true, | |||
'ins' => true, # NOTE: overridden by the inline 'ins' entry below (the later duplicate key wins)
'isindex' => true, | |||
'menu' => true, | |||
'noframes' => true, | |||
'noscript' => true, | |||
'ol' => true, | |||
'p' => true, | |||
'pre' => true, | |||
'table' => true, | |||
'ul' => true, | |||
# set table elements and list items to block as well | |||
'thead' => true, | |||
'tbody' => true, | |||
'tfoot' => true, | |||
'td' => true, | |||
'tr' => true, | |||
'th' => true, | |||
'li' => true, | |||
'dd' => true, | |||
'dt' => true, | |||
# header items and html / body as well | |||
'html' => true, | |||
'body' => true, | |||
'head' => true, | |||
'meta' => true, | |||
'link' => true, | |||
'style' => true, | |||
'title' => true, | |||
# unfancy media tags, when indented should be rendered as block | |||
'map' => true, | |||
'object' => true, | |||
'param' => true, | |||
'embed' => true, | |||
'area' => true, | |||
# inline elements | |||
'a' => false, | |||
'abbr' => false, | |||
'acronym' => false, | |||
'applet' => false, | |||
'b' => false, | |||
'basefont' => false, | |||
'bdo' => false, | |||
'big' => false, | |||
'br' => false, | |||
'button' => false, | |||
'cite' => false, | |||
'code' => false, | |||
'del' => false, | |||
'dfn' => false, | |||
'em' => false, | |||
'font' => false, | |||
'i' => false, | |||
'img' => false, | |||
'ins' => false, | |||
'input' => false, | |||
'iframe' => false, | |||
'kbd' => false, | |||
'label' => false, | |||
'q' => false, | |||
'samp' => false, | |||
'script' => false, | |||
'select' => false, | |||
'small' => false, | |||
'span' => false, | |||
'strong' => false, | |||
'sub' => false, | |||
'sup' => false, | |||
'textarea' => false, | |||
'tt' => false, | |||
'var' => false, | |||
); | |||
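# Usage sketch. This driver loop is an assumption about how callers use the
# class, pieced together from the public properties above and from nextNode()
# below. It is not taken from the source.
#
#   $parser = new parseHTML;
#   $parser->html = '<p>Hello <em>world</em></p>';
#   while ($parser->nextNode()) {
#       echo $parser->nodeType, ': ', $parser->node, "\n";
#   }
#
# Each call consumes one node from the front of $parser->html, so the loop ends
# once the string is empty.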
/** | |||
* get next node; $this->html must be set beforehand
* | |||
* @param void | |||
* @return bool | |||
*/ | |||
function nextNode() { | |||
if (empty($this->html)) { | |||
# we are done with parsing the html string | |||
return false; | |||
} | |||
static $skipWhitespace = true; | |||
if ($this->isStartTag && !$this->isEmptyTag) { | |||
array_push($this->openTags, $this->tagName); | |||
if (in_array($this->tagName, $this->preformattedTags)) { | |||
# don't collapse whitespace inside preformatted tags (<pre>, <code>, ...)
$this->keepWhitespace++; | |||
} | |||
} | |||
if ($this->html[0] == '<') { | |||
$token = substr($this->html, 0, 9); | |||
if (substr($token, 0, 2) == '<?') { | |||
# xml prolog or other pi's | |||
/** TODO **/ | |||
#trigger_error('this might need some work', E_USER_NOTICE); | |||
$pos = strpos($this->html, '>'); | |||
$this->setNode('pi', $pos + 1); | |||
return true; | |||
} | |||
if (substr($token, 0, 4) == '<!--') { | |||
# comment | |||
$pos = strpos($this->html, '-->'); | |||
if ($pos === false) { | |||
# could not find a closing -->, use the next > instead
# this is Firefox's behaviour
$pos = strpos($this->html, '>') + 1; | |||
} else { | |||
$pos += 3; | |||
} | |||
$this->setNode('comment', $pos); | |||
$skipWhitespace = true; | |||
return true; | |||
} | |||
if ($token == '<!DOCTYPE') { | |||
# doctype | |||
$this->setNode('doctype', strpos($this->html, '>')+1); | |||
$skipWhitespace = true; | |||
return true; | |||
} | |||
if ($token == '<![CDATA[') { | |||
# cdata, use text node | |||
# remove leading <![CDATA[ | |||
$this->html = substr($this->html, 9); | |||
$this->setNode('text', strpos($this->html, ']]>')+3); | |||
# remove trailing ]]> and trim | |||
$this->node = substr($this->node, 0, -3); | |||
$this->handleWhitespaces(); | |||
$skipWhitespace = true; | |||
return true; | |||
} | |||
if ($this->parseTag()) { | |||
# seems to be a tag | |||
# handle whitespaces | |||
if ($this->isBlockElement) { | |||
$skipWhitespace = true; | |||
} else { | |||
$skipWhitespace = false; | |||
} | |||
return true; | |||
} | |||
} | |||
if ($this->keepWhitespace) { | |||
$skipWhitespace = false; | |||
} | |||
# when we get here it seems to be a text node | |||
$pos = strpos($this->html, '<'); | |||
if ($pos === false) { | |||
$pos = strlen($this->html); | |||
} | |||
$this->setNode('text', $pos); | |||
$this->handleWhitespaces(); | |||
if ($skipWhitespace && $this->node == ' ') { | |||
return $this->nextNode(); | |||
} | |||
$skipWhitespace = false; | |||
return true; | |||
} | |||
/** | |||
* parse tag, set tag name and attributes, see if it's a closing tag and so forth... | |||
* | |||
* @param void | |||
* @return bool | |||
*/ | |||
function parseTag() { | |||
static $a_ord, $z_ord, $special_ords; | |||
if (!isset($a_ord)) { | |||
$a_ord = ord('a'); | |||
$z_ord = ord('z'); | |||
$special_ords = array( | |||
ord(':'), // for xml:lang | |||
ord('-'), // for http-equiv | |||
); | |||
} | |||
$tagName = ''; | |||
$pos = 1; | |||
$isStartTag = $this->html[$pos] != '/'; | |||
if (!$isStartTag) { | |||
$pos++; | |||
} | |||
# get tagName | |||
while (isset($this->html[$pos])) { | |||
$pos_ord = ord(strtolower($this->html[$pos])); | |||
if (($pos_ord >= $a_ord && $pos_ord <= $z_ord) || (!empty($tagName) && is_numeric($this->html[$pos]))) { | |||
$tagName .= $this->html[$pos]; | |||
$pos++; | |||
} else { | |||
$pos--; | |||
break; | |||
} | |||
} | |||
$tagName = strtolower($tagName); | |||
if (empty($tagName) || !isset($this->blockElements[$tagName])) { | |||
# something went wrong => invalid tag | |||
$this->invalidTag(); | |||
return false; | |||
} | |||
if ($this->noTagsInCode && end($this->openTags) == 'code' && !($tagName == 'code' && !$isStartTag)) { | |||
# we suppress all HTML tags inside code tags
$this->invalidTag(); | |||
return false; | |||
} | |||
# get tag attributes | |||
/** TODO: in html 4 attributes do not need to be quoted **/ | |||
$isEmptyTag = false; | |||
$attributes = array(); | |||
$currAttrib = ''; | |||
while (isset($this->html[$pos+1])) { | |||
$pos++; | |||
# close tag | |||
if ($this->html[$pos] == '>' || $this->html[$pos].$this->html[$pos+1] == '/>') { | |||
if ($this->html[$pos] == '/') { | |||
$isEmptyTag = true; | |||
$pos++; | |||
} | |||
break; | |||
} | |||
$pos_ord = ord(strtolower($this->html[$pos])); | |||
if ( ($pos_ord >= $a_ord && $pos_ord <= $z_ord) || in_array($pos_ord, $special_ords)) { | |||
# attribute name | |||
$currAttrib .= $this->html[$pos]; | |||
} elseif (in_array($this->html[$pos], array(' ', "\t", "\n"))) { | |||
# drop whitespace | |||
} elseif (in_array($this->html[$pos].$this->html[$pos+1], array('="', "='"))) { | |||
# get attribute value | |||
$pos++; | |||
$await = $this->html[$pos]; # single or double quote | |||
$pos++; | |||
$value = ''; | |||
while (isset($this->html[$pos]) && $this->html[$pos] != $await) { | |||
$value .= $this->html[$pos]; | |||
$pos++; | |||
} | |||
$attributes[$currAttrib] = $value; | |||
$currAttrib = ''; | |||
} else { | |||
$this->invalidTag(); | |||
return false; | |||
} | |||
} | |||
if ($this->html[$pos] != '>') { | |||
$this->invalidTag(); | |||
return false; | |||
} | |||
if (!empty($currAttrib)) { | |||
# html 4 allows something like <option selected> instead of <option selected="selected"> | |||
$attributes[$currAttrib] = $currAttrib; | |||
} | |||
if (!$isStartTag) { | |||
if (!empty($attributes) || $tagName != end($this->openTags)) { | |||
# end tags must not contain any attributes,
# and the tag being closed must be the one opened last
$this->invalidTag(); | |||
return false; | |||
} | |||
array_pop($this->openTags); | |||
if (in_array($tagName, $this->preformattedTags)) { | |||
$this->keepWhitespace--; | |||
} | |||
} | |||
$pos++; | |||
$this->node = substr($this->html, 0, $pos); | |||
$this->html = substr($this->html, $pos); | |||
$this->tagName = $tagName; | |||
$this->tagAttributes = $attributes; | |||
$this->isStartTag = $isStartTag; | |||
$this->isEmptyTag = $isEmptyTag || in_array($tagName, $this->emptyTags); | |||
if ($this->isEmptyTag) { | |||
# might be not well formed | |||
$this->node = preg_replace('# */? *>$#', ' />', $this->node); | |||
} | |||
$this->nodeType = 'tag'; | |||
$this->isBlockElement = $this->blockElements[$tagName]; | |||
return true; | |||
} | |||
/** | |||
* handle invalid tags | |||
* | |||
* @param void | |||
* @return void | |||
*/ | |||
function invalidTag() { | |||
# escape the leading "<" so the invalid tag is treated as plain text
$this->html = substr_replace($this->html, '&lt;', 0, 1);
} | |||
/** | |||
* update all vars and make $this->html shorter | |||
* | |||
* @param string $type see description for $this->nodeType | |||
* @param int $pos to which position shall we cut? | |||
* @return void | |||
*/ | |||
function setNode($type, $pos) { | |||
if ($this->nodeType == 'tag') { | |||
# set tag specific vars to null | |||
# $type == tag should not be called here | |||
# see this::parseTag() for more | |||
$this->tagName = null; | |||