1
0
mirror of synced 2026-02-27 09:28:48 +00:00

add cl-benchmarks

benchmarks probably belong under internal/benchmarks
This commit is contained in:
Larry Masinter
2020-09-16 23:17:10 -07:00
parent b47b2290c7
commit 02ed8d4bf4
80 changed files with 28618 additions and 0 deletions

33
cl-bench/Makefile Normal file
View File

@@ -0,0 +1,33 @@
# To modify optimization settings, you can change the $(OPTIMIZE)
# variable below. This is prepended to the lisp source files before
# they are compiled.
#
# Requires GNU Make or equivalent.
SHELL = /bin/sh
OPTIMIZE = "(declaim (optimize (speed 3) (space 1) (safety 0) (debug 0) (compilation-speed 0)))"
FILES := $(patsubst %.lisp,%.olisp, $(wildcard files/*.lisp))
vpath %.lisp files
%.olisp: %.lisp
echo $(OPTIMIZE) > $@
cat $< >> $@
optimize-files: $(FILES)
clean-results:
-rm -f /var/tmp/CL-bench*
clean:
find . \( -name '*.abcl' -o -name '*.cls' -o -name '*.sparcf' -o -name "*.ppcf" -o -name '*.x86f' -o -name '*.lbytef' -o -name "*.err" -o -name '*.fas' -o -name '*.fasl' -o -name "*.faslmt" -o -name '*.lib' -o -name '*.o' -o -name '*.so' -o -name "*.pfsl" -o -name "*.ufsl" -o -name "*.dfsl" -o -name "*.olisp" -o -name "*.dfsl" -o -name "*.fsl" -o -name "*.nfasl" \) -print | xargs rm -f
distclean: clean clean-results
.PHONY: clean clean-results distclean optimize-files
# EOF

42
cl-bench/NEWS Normal file
View File

@@ -0,0 +1,42 @@
=== New in 2004-08-01 ===
Fixes to WALK-LIST/SEQ and WALK-LIST/MESS that were previously
non-deterministic, due to the use of RANDOM in creating the data
structures that they work on.
=== New in 2004-01-03 ===
New tests: SUM-PERMUTATIONS, WALK-LIST/SEQ, WALK-LIST/MESS, COMPILER,
LOAD-FASL. SUM-PERMUTATIONS is inspired by Scheme benchmarks by
Will Clinger. The WALK-LIST/ tests are inspired by Richard Fateman.
A setup file for MCL, contributed by James Anderson.
New ascii-graphical report generator contributed by Johannes
Grödem (file graph-report.lisp).
New PDF report generator using the cl-pdf library (file
pdf-report.lisp)
Improvements to the CMUCL performance-counter code: better
reporting of data. Look for a new version of cpc.lisp that supports
UltraSPARC-III processors.
=== New in 2003-12-05 ===
Split the FPRINT test (speed of printer when writing to a file)
into FPRINT/PRETTY and FPRINT/UGLY. The former does pretty-printing
using PPRINT (and binds other pretty-printer-related variables to
T), whereas the latter disables the pretty-printer. Suggested by
Christopher Rhodes.
=== New in 2003-10-06 ===
Support for Allegro CL, thanks to Kevin Layer. See the file
run-acl.sh and sysdep/setup-acl.lisp.
Fiddle with CPU Performance Counters library on UltraSPARC processors
with CMUCL; see play/cmucl-performance-counters.lisp

353
cl-bench/README Normal file
View File

@@ -0,0 +1,353 @@
Common Lisp benchmarking suite
==============================
This package contains Lisp code intended for performance benchmarking
of different Common Lisp implementations. The tests it runs include
- the well-known Gabriel benchmarks, with a number of bugfixes from
Henry Baker and limited rewriting to bring them a little closer to
modern Common Lisp coding style
- hashtable exercising
- READ-LINE exercising
- mathematical functions: factorial, fibonnaci, ackermann's number
- some bignum-intensive code from Bruno Haible
- various aspects of CLOS: time taken to compile and execute
DEFCLASS forms which create a class hierarchy, instance creation
time, time taken to compile and execute DEFMETHOD forms, execution
of method combinations, both with and without :after methods.
- various operations on arrays, bitvectors and strings
- performance-intensive kernels such as CRC generation and an
implementation of the DEFLATE algorithm
Except for the CLOS COMPILER tests, timings do not include compilation
time. The garbage collector is run before each test to try to make the
timings more repeatable. For certain targets, we assume that the times
reported by GET-INTERNAL-RUN-TIME and GET-INTERNAL-REAL-TIME are
accurate. Timings for a given Common Lisp environment may be quite
sensitive to the optimization settings; these are set at the beginning
of the Makefile.
Common Lisp is a very large language, so it is difficult to evaluate
the performance of all aspects of an implementation. Remember that the
only real benchmark is your application: this code is only
representative of real-life programs to a limited extent.
Running the suite
=================
Each implementation has a run-<impl>.sh file, and a setup-<impl>.lisp
file in the sysdep directory. To execute the test for CMUCL, for
exampe, type
bash run-cmucl.sh
This will create files with the optimization setting requested in the
Makefile, then invoke CMUCL to compile the files (lots of warnings and
compilation notes are to be expected), then run the various tests.
Execution takes quite a while: around 30 minutes on a 1GHz PIII. A
large part of this time is due to the bignum benchmarks, which you can
comment out if they don't interest you. You can also reduce the number
of executions of each particular test by editing the file "tests.lisp"
-- modify the :runs parameter of the DEFBENCH forms in the file
"tests.lisp". For the results to be meaningful, the time for each test
should be more than a second. However, when comparing implementations
it is not the absolute time which counts, but the time relative to the
other implementations. Also note that you may wish to reduce the size
of the problems, particularly in the array sections, because with some
setups (CMUCL on Linux for example), your system may become unusable
if you exceed available RAM. You can do this by editing the file
"tests.lisp" and modifying the arguments to the :code parameters.
Repeat this operation for other implementations you have on your
machine. I have tried it with CMUCL, SBCL, CLISP, OpenMCL, Poplog
CL and LispWorks on various platforms. GCL and ECL are able to run
some of the tests.
If you're not running on a Unix platform, you may not be able to
script the different steps of a run. In this case, follow the
following steps:
1. Say "make clean optimize-lisp" to create source files that contain
optimization declarations. The optimize settings can be changed
in the Makefile.
2. Load the file "generate.lisp" into your implementation. This
should create two files "do-compilation-script.lisp" and
"do-execute-script.lisp", that contain a sequence of compilation
and load steps.
3. Load the file "sysdep/setup-<yourimpl>.lisp", which you may need
to write yourself. This requires a definition of a BENCH-GC
function that triggers a full garbage collection.
4. Load the file "do-compilation-script.lisp" into your
implementation, which should result in all the source files
being compiled.
5. Load the file "do-execute-script.lisp", which should cause all
the tests to be executed.
For each tested implementation, you should have a file in /var/tmp (or
in the current directory on Windows) named "CL-benchmark-<date>".
These files will have the following format:
,---- /var/tmp/CL-benchmark-20010821T2208 ---
| ;; -*- lisp -*- CMU Common Lisp CVS sources, level-1 built 2001-08-22 on maftia1
| ;;
| ;; Implementation *features*:
| ;; (:PCL-STRUCTURES :PORTABLE-COMMONLOOPS :PYTHON :PCL :COMPLEX-FP-VOPS :PPRO
| ;; :PENTIUM :MP :HASH-NEW :RANDOM-MT19937 :PROPAGATE-FUN-TYPE
| ;; :PROPAGATE-FLOAT-TYPE :CONSTRAIN-FLOAT-TYPE :CMU18C :CMU18 :GENCGC :X86
| ;; :LINUX :GLIBC2 :UNIX :COMMON :COMMON-LISP :ANSI-CL :CMU
| ;; :IEEE-FLOATING-POINT)
| ;;
| ;; Function real user sys consed
| ;; ----------------------------------------------------------------
| ;; Boyer 1.50 1.22 0.28 54349520
| ;; Browse 0.97 0.79 0.18 36219256
| ;; DDerviv 0.88 0.50 0.39 67197656
| ;; Deriv 1.64 0.87 0.77 127195824
| ;; Destructive 0.30 0.24 0.05 12819928
| ;; div2-test-1 0.52 0.32 0.20 38398176
| ;; div2-test2 0.66 0.42 0.24 47999936
| ;; FFT 0.40 0.40 0.00 0
| ;; frpoly/fixnum 0.65 0.54 0.10 19172440
| ;; frpoly/bignum 1.54 1.25 0.28 55628704
| ;; frpoly/float 7.59 6.50 1.09 213052408
| ;; Puzzle 0.82 0.82 0.00 0
| ;; CTak 0.81 0.81 0.00 0
| ;; Tak 0.53 0.54 0.00 0
| ;; RTak 0.35 0.36 0.00 0
| ;; takl 1.60 1.60 0.00 0
| ;; stak 1.15 1.14 0.00 0
| ;; fprint 0.25 0.25 0.01 1948416
| ;; fread 0.82 0.68 0.13 28487280
| ;; traverse 5.28 5.24 0.03 4493288
| ;; triangle 2.29 2.28 0.00 499712
| ;; factorial 0.37 0.18 0.20 26120296
| ;; fib 2.39 2.39 0.00 0
| ;; hashtable 0.72 0.69 0.04 9888912
| ;; CLOS/defclass 2.82 2.31 0.12 32757328
| ;; CLOS/defmethod 10.94 10.09 0.55 120612624
| ;; CLOS/instantiate 7.13 6.27 0.86 229048352
| ;; CLOS/methodcalls 6.56 5.22 1.34 301057608
| ;; CLOS/method+after 12.02 11.09 0.93 197058816
| ;; CLOS/complex-methods 0.38 0.38 0.00 286600
| ;; 1D-arrays 2.46 2.17 0.30 60002400
| ;; 2D-arrays 20.57 19.50 1.07 240000240
| ;; bitvectors 18.75 18.51 0.23 50003200
| ;; fill-strings 21.12 15.23 5.88 1000016000
| ;; fill-strings/adjustable 57.10 56.25 0.85 259729520
|
| ("CMU Common Lisp CVS sources, level-1 built 2001-08-22 on maftia1"
| ("fill-strings/adjustable" 57.1 56.25 0.85 259729520)
| ("fill-strings" 21.12 15.23 5.88 1000016000)
| ("bitvectors" 18.75 18.51 0.23 50003200)
| ("2D-arrays" 20.57 19.5 1.07 240000240) ("1D-arrays" 2.46 2.17 0.3 60002400)
| ("CLOS/complex-methods" 0.38 0.38 0.0 286600)
| ("CLOS/method+after" 12.02 11.09 0.93 197058816)
| ("CLOS/methodcalls" 6.56 5.22 1.34 301057608)
| ("CLOS/instantiate" 7.13 6.27 0.86 229048352)
| ("CLOS/defmethod" 10.94 10.09 0.55 120612624)
| ("CLOS/defclass" 2.82 2.31 0.12 32757328) ("hashtable" 0.72 0.69 0.04 9888912)
| ("fib" 2.39 2.39 0.0 0) ("factorial" 0.37 0.18 0.2 26120296)
| ("triangle" 2.29 2.28 0.0 499712) ("traverse" 5.28 5.24 0.03 4493288)
| ("fread" 0.82 0.68 0.13 28487280) ("fprint" 0.25 0.25 0.01 1948416)
| ("stak" 1.15 1.14 0.0 0) ("takl" 1.6 1.6 0.0 0) ("RTak" 0.35 0.36 0.0 0)
| ("Tak" 0.53 0.54 0.0 0) ("CTak" 0.81 0.81 0.0 0) ("Puzzle" 0.82 0.82 0.0 0)
| ("frpoly/float" 7.59 6.5 1.09 213052408)
| ("frpoly/bignum" 1.54 1.25 0.28 55628704)
| ("frpoly/fixnum" 0.65 0.54 0.1 19172440) ("FFT" 0.4 0.4 0.0 0)
| ("div2-test2" 0.66 0.42 0.24 47999936) ("div2-test-1" 0.52 0.32 0.2 38398176)
| ("Destructive" 0.3 0.24 0.05 12819928) ("Deriv" 1.64 0.87 0.77 127195824)
| ("DDerviv" 0.88 0.5 0.39 67197656) ("Browse" 0.97 0.79 0.18 36219256)
| ("Boyer" 1.5 1.22 0.28 54349520))
`----
The first section of the file is intended to be human readable, and
the second section to be READ by a Common Lisp implementation. For
each test, you should see the elapsed user time, and possibly (if this
has been coded for your implementation) elapsed system time and the
number of bytes consed during the test execution.
The data in the different /var/tmp/CL-benchmark-* files is analysed by the
file "report.lisp", to generate a report comparing the performance of
the different implementations. This file needs to be run in a Common
Lisp implementation; the one you use will be considered the
"reference" implementation. In the report which is generated, for each
test the timing for the reference implementation will be shown, as
well as the _relative times_ for each of the other tested
implementations. A relative time means that a score under 1 is better,
and a score of 2 means it is two times slower -- for that test -- than
the reference implementation. If a given test doesn't work in a
particular implementation (for example CLISP doesn't do non-standard
method combination), its entry will be -1.0.
Here is an example of the type of results you can obtain, for x86 and
SPARC:
,---- PentiumIII at 1GHz, 256MB RAM, Linux 2.4.2 ---
|
| Benchmark Reference CLISP CMU C SBCL
| ----------------------------------------------------------------
| BOYER 2.36 4.54 0.67 0.94
| BROWSE 1.04 2.15 0.65 1.04
| DDerviv 1.19 1.96 0.48 1.06
| Deriv 2.27 1.93 0.42 1.04
| DESTRUCTIVE 1.52 2.79 0.89 1.06
| DIV2-TEST-1 1.73 1.84 0.51 1.09
| DIV2-TEST-2 0.85 1.87 0.46 1.12
| FFT 0.22 36.09 1.14 1.14
| FRPOLY/FIXNUM 0.79 5.81 0.81 0.96
| FRPOLY/BIGNUM 1.99 2.03 0.68 0.96
| FRPOLY/FLOAT 0.78 3.79 0.72 0.99
| PUZZLE 0.79 23.09 1.15 9.95
| CTAK 0.86 6.28 1.10 1.06
| TAK 0.91 8.86 1.18 1.34
| RTAK 0.91 8.86 1.13 1.34
| TAKL 1.67 7.33 1.16 1.22
| STAK 1.15 6.66 1.15 1.10
| FPRINT 1.42 1.12 1.05 2.37
| TRAVERSE 4.35 6.75 1.19 1.64
| TRIANGLE 2.01 17.22 1.14 1.27
| CASCOR 4.06 80.47 1.23 0.92
| RICHARDS 0.58 24.78 1.22 1.07
| FACTORIAL 0.50 3.56 0.68 1.38
| FIB 0.39 5.67 1.13 1.08
| BIGNUM/ELEM-100-1000 1.33 0.11 1.02 0.98
| BIGNUM/ELEM-1000-100 6.03 0.07 1.11 0.93
| BIGNUM/ELEM-10000-1 6.14 0.05 1.00 0.93
| BIGNUM/PARI-100-10 1.51 0.06 0.96 0.91
| BIGNUM/PARI-200-5 17.32 0.02 0.99 0.88
| HASH-STRINGS 1.65 3.00 0.82 0.99
| HASH-INTEGERS 0.73 2.30 0.68 1.05
| BOEHM-GC 10.18 3.94 0.39 1.19
| CLOS/defclass 2.64 0.35 0.83 2.33
| CLOS/defmethod 13.93 0.02 0.70 1.94
| CLOS/instantiate 7.23 1.02 0.63 1.39
| CLOS/methodcalls 8.43 1.33 0.50 1.37
| CLOS/method+after 13.90 0.45 0.79 2.64
| CLOS/complex-methods 0.35 -1.00 1.06 5.40
| 1D-ARRAYS 3.74 4.57 0.69 1.26
| 2D-ARRAYS 15.19 3.25 0.92 4.00
| BITVECTORS 2.92 0.61 0.70 0.95
| FILL-STRINGS 10.62 1.15 0.76 3.03
| fill-strings/adjustable 18.13 1.08 0.92 1.65
| BENCH-STRING-CONCAT 9.99 -1.00 0.81 2.34
|
| Reference implementation: CMU Common Lisp 18c
| Impl CLISP: CLISP 2.27 (released 2001-07-17) (built 3204291355) (memory 3205854943)
| Impl CMU C: CMU Common Lisp 18d-pre, level-1 built 2001-12-14 on maftia1
| Impl SBCL : SBCL 0.7.0
| Linux maftia1 2.4.2-2 #1 Sun Apr 8 20:41:30 EDT 2001 i686 unknown
`----
,---- UltraSPARCIIe at 500MHz, 640MB RAM, SunOS 5.8 ---
|
| Benchmark Reference CMU C CLISP
| -----------------------------------------------------
| BOYER 3.98 0.91 8.03
| BROWSE 1.72 0.91 2.85
| DDerviv 2.02 0.75 3.21
| Deriv 3.63 0.81 3.13
| DESTRUCTIVE 3.11 1.01 4.18
| DIV2-TEST-1 2.19 0.83 3.92
| DIV2-TEST-2 1.12 0.82 3.85
| FFT 0.74 1.03 28.86
| FRPOLY/FIXNUM 1.87 1.01 7.89
| FRPOLY/BIGNUM 4.59 1.29 3.07
| FRPOLY/FLOAT 1.65 0.96 5.68
| PUZZLE 2.07 0.95 30.62
| CTAK 2.74 1.01 9.04
| TAK 1.84 1.00 14.08
| RTAK 1.84 1.01 13.95
| TAKL 3.37 1.01 11.63
| STAK 2.32 1.01 8.87
| FPRINT 4.17 1.02 1.12
| TRAVERSE 5.84 0.99 13.74
| TRIANGLE 5.53 0.86 15.57
| CASCOR 10.53 0.73 52.81
| RICHARDS 2.35 0.94 22.46
| FACTORIAL 1.46 1.48 2.88
| FIB 0.94 0.99 6.71
| BIGNUM/ELEM-100-1000 2.80 1.24 0.28
| BIGNUM/ELEM-1000-100 10.14 1.19 0.44
| BIGNUM/ELEM-10000-1 11.38 1.35 0.41
| BIGNUM/PARI-100-10 2.76 1.15 0.09
| BIGNUM/PARI-200-5 27.19 1.06 0.05
| READ-LINE 3.39 1.06 1.19
| HASH-STRINGS 5.42 1.20 2.19
| HASH-INTEGERS 1.61 0.76 2.00
| BOEHM-GC 19.97 0.76 4.14
| CLOS/defclass 4.78 1.01 0.81
| CLOS/defmethod 27.61 0.89 0.03
| CLOS/instantiate 20.93 0.85 1.28
| CLOS/methodcalls 23.62 1.08 1.94
| CLOS/method+after 33.70 1.07 0.78
| CLOS/complex-methods 1.41 0.92 -1.00
| 1D-ARRAYS 10.77 0.92 3.51
| 2D-ARRAYS 56.66 1.40 2.61
| BITVECTORS 5.35 0.86 0.42
| FILL-STRINGS 18.88 1.07 0.97
| fill-strings/adjustable 45.09 1.46 1.41
| BENCH-STRING-CONCAT 48.10 0.90 -1.00
|
| Reference implementation: CMU Common Lisp 18c, Built 2000-11-27
| Impl CMU C: CMU Common Lisp 18d-pre, level-1 built 2001-12-12 on liszt
| Impl CLISP: CLISP 2.27.2 (released 2001-10-05) (built on moustacho)
| SunOS eagles 5.8 Generic_108528-10 sun4u sparc SUNW,Sun-Blade-100
`----
Note that the test suite doesn't take compilation time into account
(except for the CLOS-related tests, where the compiler may be used at
runtime to create effective methods). You can use the time taken to
load do-compilation-script.lisp as a rough proxy for compilation time.
"Life is short and it was not meant to be spent making people feel guilty
about instruction pipelines being only partly full or caches being missed."
-- Kent Pitman in <sfw7ksm3b7k.fsf@shell01.TheWorld.com>
Thanks
======
Raymond Toy, Christophe Rhodes, Peter Van Eynde, Sven Van
Caekenberghe, Christophe Rhodes, Kevin Layers and possibly others that
I have forgotten to note.
Related work
============
- @misc{ gabriel86performance,
author = "R. Gabriel",
title = "Performance and Evaluation of Lisp Systems",
text = "R. P. Gabriel. Performance and Evaluation of Lisp Systems. MIT Press, Cambridge,
Massachusetts, 1986.",
year = "1986" }
- Scheme benchmarks by Will Clinger (Larceny and TwoBit compilers)
<URL:http://www.ccs.neu.edu/home/will/GC/sourcecode.html>
- Bagley's Programming Language Shootout,
<URL:http://www.bagley.org/~doug/shootout/>
Eric Marsden <emarsden@laas.fr>

2
cl-bench/TODO Normal file
View File

@@ -0,0 +1,2 @@
- convert all physical pathnames to constructed pathnames

119
cl-bench/defpackage.lisp Normal file
View File

@@ -0,0 +1,119 @@
;;; defpackage.lisp -- DEFPACKAGE forms for the cl-bench modules
;;
;; Time-stamp: <2004-01-01 emarsden>
(defpackage :cl-bench
(:use :common-lisp
#+cmu :ext
#+clisp :ext
#+allegro :excl))
(defpackage :cl-bench.gabriel
(:use :common-lisp)
(:export #:boyer
#:browse
#:dderiv-run
#:deriv-run
#:run-destructive
#:run-div2-test1
#:run-div2-test2
#:div2-l
#:run-fft
#:run-frpoly/fixnum
#:run-frpoly/bignum
#:run-frpoly/float
#:run-puzzle
#:run-tak
#:run-ctak
#:run-trtak
#:run-takl
#:run-stak
#:fprint/pretty
#:fprint/ugly
#:run-traverse
#:run-triangle))
(defpackage :cl-bench.math
(:use :common-lisp)
(:export #:run-factorial
#:run-fib
#:run-fib-ratio
#:run-ackermann
#:run-mandelbrot/complex
#:run-mandelbrot/dfloat
#:run-mrg32k3a))
(defpackage :cl-bench.crc
(:use :common-lisp)
(:export #:run-crc40))
(defpackage :cl-bench.bignum
(:use :common-lisp)
(:export #:run-elem-100-1000
#:run-elem-1000-100
#:run-elem-10000-1
#:run-pari-100-10
#:run-pari-200-5
#:run-pari-1000-1
#:run-pi-decimal/small
#:run-pi-decimal/big
#:run-pi-atan))
(defpackage :cl-bench.ratios
(:use :common-lisp)
(:export #:run-pi-ratios))
(defpackage :cl-bench.hash
(:use :common-lisp)
(:export #:run-slurp-lines
#:hash-strings
#:hash-integers))
(defpackage :cl-bench.boehm-gc
(:use :common-lisp)
(:export #:gc-benchmark))
(defpackage :cl-bench.deflate
(:use :common-lisp)
(:export #:run-deflate-file))
(defpackage :cl-bench.arrays
(:use :common-lisp)
(:export #:bench-1d-arrays
#:bench-2d-arrays
#:bench-3d-arrays
#:bench-bitvectors
#:bench-strings
#:bench-strings/adjustable
#:bench-string-concat
#:bench-search-sequence))
(defpackage :cl-bench.richards
(:use :common-lisp)
(:export #:richards))
(defpackage :cl-bench.clos
(:use :common-lisp)
(:export #:run-defclass
#:run-defmethod
#:make-instances
#:make-instances/simple
#:methodcalls/simple
#:methodcalls/simple+after
#:methodcalls/complex
#:run-eql-fib))
(defpackage :cl-bench.misc
(:use :common-lisp)
(:export #:run-compiler
#:run-fasload
#:run-permutations
#:walk-list/seq
#:walk-list/mess))
(defpackage :cl-ppcre-test
(:use :common-lisp)
(:export #:test))
;; EOF

View File

@@ -0,0 +1,15 @@
;;; auto-generated from file #p"generate.lisp"
(IN-PACKAGE :CL-USER)
(LOAD #p"defpackage.lisp")
(COMPILE-FILE #p"files/arrays.olisp")
(COMPILE-FILE #p"files/bignum.olisp")
(COMPILE-FILE #p"files/boehm-gc.olisp")
(COMPILE-FILE #p"files/clos.olisp")
(COMPILE-FILE #p"files/crc40.olisp")
(COMPILE-FILE #p"files/deflate.olisp")
(COMPILE-FILE #p"files/gabriel.olisp")
(COMPILE-FILE #p"files/hash.olisp")
(COMPILE-FILE #p"files/math.olisp")
(COMPILE-FILE #p"files/ratios.olisp")
(COMPILE-FILE #p"files/richards.olisp")
(COMPILE-FILE #p"files/misc.olisp")

View File

@@ -0,0 +1,385 @@
;;; auto-generated from file #p"generate.lisp"
(IN-PACKAGE :CL-USER)
(LOAD #p"defpackage.lisp")
(LOAD (COMPILE-FILE-PATHNAME #p"files/arrays.olisp"))
(LOAD (COMPILE-FILE-PATHNAME #p"files/bignum.olisp"))
(LOAD (COMPILE-FILE-PATHNAME #p"files/boehm-gc.olisp"))
(LOAD (COMPILE-FILE-PATHNAME #p"files/clos.olisp"))
(LOAD (COMPILE-FILE-PATHNAME #p"files/crc40.olisp"))
(LOAD (COMPILE-FILE-PATHNAME #p"files/deflate.olisp"))
(LOAD (COMPILE-FILE-PATHNAME #p"files/gabriel.olisp"))
(LOAD (COMPILE-FILE-PATHNAME #p"files/hash.olisp"))
(LOAD (COMPILE-FILE-PATHNAME #p"files/math.olisp"))
(LOAD (COMPILE-FILE-PATHNAME #p"files/ratios.olisp"))
(LOAD (COMPILE-FILE-PATHNAME #p"files/richards.olisp"))
(LOAD (COMPILE-FILE-PATHNAME #p"files/misc.olisp"))
(COMPILE-FILE #p"support.lisp")
(LOAD (COMPILE-FILE-PATHNAME #p"support.lisp"))
(IN-PACKAGE :CL-BENCH)
(defun run-benchmarks ()
(with-open-file (f (benchmark-report-file)
:direction :output
:if-exists :supersede)
(let ((*benchmark-output* f)
(*print-length* nil)
(*load-verbose* nil)
(*compile-verbose* nil)
(*compile-print* nil))
(bench-report-header)
#-(or gcl armedbear)
(progn
(format t "=== running #<benchmark COMPILER for 3 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.misc:run-compiler "COMPILER" 3))
#-(or gcl armedbear ecl)
(progn
(format t "=== running #<benchmark LOAD-FASL for 20 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.misc:run-fasload "LOAD-FASL" 20))
#-(or lispworks-personal-edition ecl)
(progn
(format t "=== running #<benchmark SUM-PERMUTATIONS for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.misc:run-permutations "SUM-PERMUTATIONS" 2))
#-(or lispworks-personal-edition armedbear)
(progn
(format t "=== running #<benchmark WALK-LIST/SEQ for 2 runs>~%")
(force-output)
(bench-gc)
(funcall 'cl-bench.misc::setup-walk-list/seq)
(bench-report 'cl-bench.misc:walk-list/seq "WALK-LIST/SEQ" 2))
#-(or lispworks-personal-edition armedbear poplog)
(progn
(format t "=== running #<benchmark WALK-LIST/MESS for 1 runs>~%")
(force-output)
(bench-gc)
(funcall 'cl-bench.misc::setup-walk-list/mess)
(bench-report 'cl-bench.misc:walk-list/mess "WALK-LIST/MESS" 1))
(progn
(format t "=== running #<benchmark BOYER for 30 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:boyer "BOYER" 30))
(progn
(format t "=== running #<benchmark BROWSE for 10 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:browse "BROWSE" 10))
(progn
(format t "=== running #<benchmark DDERIV for 50 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:dderiv-run "DDERIV" 50))
(progn
(format t "=== running #<benchmark DERIV for 60 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:deriv-run "DERIV" 60))
(progn
(format t "=== running #<benchmark DESTRUCTIVE for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-destructive "DESTRUCTIVE" 100))
(progn
(format t "=== running #<benchmark DIV2-TEST-1 for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-div2-test1 "DIV2-TEST-1" 200))
(progn
(format t "=== running #<benchmark DIV2-TEST-2 for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-div2-test2 "DIV2-TEST-2" 200))
(progn
(format t "=== running #<benchmark FFT for 30 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-fft "FFT" 30))
(progn
(format t "=== running #<benchmark FRPOLY/FIXNUM for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-frpoly/fixnum "FRPOLY/FIXNUM" 100))
(progn
(format t "=== running #<benchmark FRPOLY/BIGNUM for 30 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-frpoly/bignum "FRPOLY/BIGNUM" 30))
(progn
(format t "=== running #<benchmark FRPOLY/FLOAT for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-frpoly/float "FRPOLY/FLOAT" 100))
(progn
(format t "=== running #<benchmark PUZZLE for 1500 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-puzzle "PUZZLE" 1500))
(progn
(format t "=== running #<benchmark TAK for 500 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-tak "TAK" 500))
(progn
(format t "=== running #<benchmark CTAK for 900 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-ctak "CTAK" 900))
(progn
(format t "=== running #<benchmark TRTAK for 500 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-trtak "TRTAK" 500))
(progn
(format t "=== running #<benchmark TAKL for 150 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-takl "TAKL" 150))
(progn
(format t "=== running #<benchmark STAK for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-stak "STAK" 200))
(progn
(format t "=== running #<benchmark FPRINT/UGLY for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:fprint/ugly "FPRINT/UGLY" 200))
(progn
(format t "=== running #<benchmark FPRINT/PRETTY for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:fprint/pretty "FPRINT/PRETTY" 100))
(progn
(format t "=== running #<benchmark TRAVERSE for 15 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-traverse "TRAVERSE" 15))
(progn
(format t "=== running #<benchmark TRIANGLE for 5 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-triangle "TRIANGLE" 5))
(progn
(format t "=== running #<benchmark RICHARDS for 5 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.richards:richards "RICHARDS" 5))
(progn
(format t "=== running #<benchmark FACTORIAL for 1000 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-factorial "FACTORIAL" 1000))
(progn
(format t "=== running #<benchmark FIB for 50 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-fib "FIB" 50))
(progn
(format t "=== running #<benchmark FIB-RATIO for 500 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-fib-ratio "FIB-RATIO" 500))
(progn
(format t "=== running #<benchmark ACKERMANN for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-ackermann "ACKERMANN" 1))
(progn
(format t "=== running #<benchmark MANDELBROT/COMPLEX for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-mandelbrot/complex "MANDELBROT/COMPLEX" 100))
(progn
(format t "=== running #<benchmark MANDELBROT/DFLOAT for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-mandelbrot/dfloat "MANDELBROT/DFLOAT" 100))
(progn
(format t "=== running #<benchmark MRG32K3A for 20 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-mrg32k3a "MRG32K3A" 20))
(progn
(format t "=== running #<benchmark CRC40 for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.crc:run-crc40 "CRC40" 2))
(progn
(format t "=== running #<benchmark BIGNUM/ELEM-100-1000 for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-elem-100-1000 "BIGNUM/ELEM-100-1000" 1))
(progn
(format t "=== running #<benchmark BIGNUM/ELEM-1000-100 for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-elem-1000-100 "BIGNUM/ELEM-1000-100" 1))
(progn
(format t "=== running #<benchmark BIGNUM/ELEM-10000-1 for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-elem-10000-1 "BIGNUM/ELEM-10000-1" 1))
(progn
(format t "=== running #<benchmark BIGNUM/PARI-100-10 for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-pari-100-10 "BIGNUM/PARI-100-10" 1))
(progn
(format t "=== running #<benchmark BIGNUM/PARI-200-5 for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-pari-200-5 "BIGNUM/PARI-200-5" 1))
(progn
(format t "=== running #<benchmark PI-DECIMAL/SMALL for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-pi-decimal/small "PI-DECIMAL/SMALL" 100))
(progn
(format t "=== running #<benchmark PI-DECIMAL/BIG for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-pi-decimal/big "PI-DECIMAL/BIG" 2))
(progn
(format t "=== running #<benchmark PI-ATAN for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-pi-atan "PI-ATAN" 200))
(progn
(format t "=== running #<benchmark PI-RATIOS for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.ratios:run-pi-ratios "PI-RATIOS" 2))
(progn
(format t "=== running #<benchmark HASH-STRINGS for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.hash:hash-strings "HASH-STRINGS" 2))
(progn
(format t "=== running #<benchmark HASH-INTEGERS for 10 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.hash:hash-integers "HASH-INTEGERS" 10))
(progn
(format t "=== running #<benchmark SLURP-LINES for 30 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.hash:run-slurp-lines "SLURP-LINES" 30))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark BOEHM-GC for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.boehm-gc:gc-benchmark "BOEHM-GC" 1))
(progn
(format t "=== running #<benchmark DEFLATE-FILE for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.deflate:run-deflate-file "DEFLATE-FILE" 100))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark 1D-ARRAYS for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-1d-arrays "1D-ARRAYS" 1))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark 2D-ARRAYS for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-2d-arrays "2D-ARRAYS" 1))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark 3D-ARRAYS for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-3d-arrays "3D-ARRAYS" 1))
(progn
(format t "=== running #<benchmark BITVECTORS for 3 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-bitvectors "BITVECTORS" 3))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark BENCH-STRINGS for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-strings "BENCH-STRINGS" 1))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark fill-strings/adjustable for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-strings/adjustable "fill-strings/adjustable" 1))
#-(or allegro lispworks-personal-edition poplog)
(progn
(format t "=== running #<benchmark STRING-CONCAT for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-string-concat "STRING-CONCAT" 1))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark SEARCH-SEQUENCE for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-search-sequence "SEARCH-SEQUENCE" 1))
(progn
(format t "=== running #<benchmark CLOS/defclass for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:run-defclass "CLOS/defclass" 1))
(progn
(format t "=== running #<benchmark CLOS/defmethod for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:run-defmethod "CLOS/defmethod" 1))
(progn
(format t "=== running #<benchmark CLOS/instantiate for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:make-instances "CLOS/instantiate" 2))
(progn
(format t "=== running #<benchmark CLOS/simple-instantiate for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:make-instances/simple "CLOS/simple-instantiate" 200))
(progn
(format t "=== running #<benchmark CLOS/methodcalls for 5 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:methodcalls/simple "CLOS/methodcalls" 5))
(progn
(format t "=== running #<benchmark CLOS/method+after for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:methodcalls/simple+after "CLOS/method+after" 2))
#-(or clisp poplog)
(progn
(format t "=== running #<benchmark CLOS/complex-methods for 5 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:methodcalls/complex "CLOS/complex-methods" 5))
(progn
(format t "=== running #<benchmark EQL-SPECIALIZED-FIB for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:run-eql-fib "EQL-SPECIALIZED-FIB" 2))
(bench-report-footer))))
(run-benchmarks)

View File

@@ -0,0 +1,384 @@
;;; auto-generated from file #p"generate.lisp"
(IN-PACKAGE :CL-USER)
(LOAD #p"defpackage.lisp")
(LOAD #p"files/arrays.olisp")
(LOAD #p"files/bignum.olisp")
(LOAD #p"files/boehm-gc.olisp")
(LOAD #p"files/clos.olisp")
(LOAD #p"files/crc40.olisp")
(LOAD #p"files/deflate.olisp")
(LOAD #p"files/gabriel.olisp")
(LOAD #p"files/hash.olisp")
(LOAD #p"files/math.olisp")
(LOAD #p"files/ratios.olisp")
(LOAD #p"files/richards.olisp")
(LOAD #p"files/misc.olisp")
(LOAD #p"support.lisp")
(IN-PACKAGE :CL-BENCH)
(defun run-benchmarks ()
(with-open-file (f (benchmark-report-file)
:direction :output
:if-exists :supersede)
(let ((*benchmark-output* f)
(*print-length* nil)
(*load-verbose* nil)
(*compile-verbose* nil)
(*compile-print* nil))
(bench-report-header)
#-(or gcl armedbear)
(progn
(format t "=== running #<benchmark COMPILER for 3 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.misc:run-compiler "COMPILER" 3))
#-(or gcl armedbear ecl)
(progn
(format t "=== running #<benchmark LOAD-FASL for 20 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.misc:run-fasload "LOAD-FASL" 20))
#-(or lispworks-personal-edition ecl)
(progn
(format t "=== running #<benchmark SUM-PERMUTATIONS for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.misc:run-permutations "SUM-PERMUTATIONS" 2))
#-(or lispworks-personal-edition armedbear)
(progn
(format t "=== running #<benchmark WALK-LIST/SEQ for 2 runs>~%")
(force-output)
(bench-gc)
(funcall 'cl-bench.misc::setup-walk-list/seq)
(bench-report 'cl-bench.misc:walk-list/seq "WALK-LIST/SEQ" 2))
#-(or lispworks-personal-edition armedbear poplog)
(progn
(format t "=== running #<benchmark WALK-LIST/MESS for 1 runs>~%")
(force-output)
(bench-gc)
(funcall 'cl-bench.misc::setup-walk-list/mess)
(bench-report 'cl-bench.misc:walk-list/mess "WALK-LIST/MESS" 1))
(progn
(format t "=== running #<benchmark BOYER for 30 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:boyer "BOYER" 30))
(progn
(format t "=== running #<benchmark BROWSE for 10 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:browse "BROWSE" 10))
(progn
(format t "=== running #<benchmark DDERIV for 50 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:dderiv-run "DDERIV" 50))
(progn
(format t "=== running #<benchmark DERIV for 60 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:deriv-run "DERIV" 60))
(progn
(format t "=== running #<benchmark DESTRUCTIVE for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-destructive "DESTRUCTIVE" 100))
(progn
(format t "=== running #<benchmark DIV2-TEST-1 for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-div2-test1 "DIV2-TEST-1" 200))
(progn
(format t "=== running #<benchmark DIV2-TEST-2 for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-div2-test2 "DIV2-TEST-2" 200))
(progn
(format t "=== running #<benchmark FFT for 30 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-fft "FFT" 30))
(progn
(format t "=== running #<benchmark FRPOLY/FIXNUM for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-frpoly/fixnum "FRPOLY/FIXNUM" 100))
(progn
(format t "=== running #<benchmark FRPOLY/BIGNUM for 30 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-frpoly/bignum "FRPOLY/BIGNUM" 30))
(progn
(format t "=== running #<benchmark FRPOLY/FLOAT for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-frpoly/float "FRPOLY/FLOAT" 100))
(progn
(format t "=== running #<benchmark PUZZLE for 1500 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-puzzle "PUZZLE" 1500))
(progn
(format t "=== running #<benchmark TAK for 500 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-tak "TAK" 500))
(progn
(format t "=== running #<benchmark CTAK for 900 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-ctak "CTAK" 900))
(progn
(format t "=== running #<benchmark TRTAK for 500 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-trtak "TRTAK" 500))
(progn
(format t "=== running #<benchmark TAKL for 150 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-takl "TAKL" 150))
(progn
(format t "=== running #<benchmark STAK for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-stak "STAK" 200))
(progn
(format t "=== running #<benchmark FPRINT/UGLY for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:fprint/ugly "FPRINT/UGLY" 200))
(progn
(format t "=== running #<benchmark FPRINT/PRETTY for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:fprint/pretty "FPRINT/PRETTY" 100))
(progn
(format t "=== running #<benchmark TRAVERSE for 15 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-traverse "TRAVERSE" 15))
(progn
(format t "=== running #<benchmark TRIANGLE for 5 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.gabriel:run-triangle "TRIANGLE" 5))
(progn
(format t "=== running #<benchmark RICHARDS for 5 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.richards:richards "RICHARDS" 5))
(progn
(format t "=== running #<benchmark FACTORIAL for 1000 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-factorial "FACTORIAL" 1000))
(progn
(format t "=== running #<benchmark FIB for 50 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-fib "FIB" 50))
(progn
(format t "=== running #<benchmark FIB-RATIO for 500 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-fib-ratio "FIB-RATIO" 500))
(progn
(format t "=== running #<benchmark ACKERMANN for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-ackermann "ACKERMANN" 1))
(progn
(format t "=== running #<benchmark MANDELBROT/COMPLEX for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-mandelbrot/complex "MANDELBROT/COMPLEX" 100))
(progn
(format t "=== running #<benchmark MANDELBROT/DFLOAT for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-mandelbrot/dfloat "MANDELBROT/DFLOAT" 100))
(progn
(format t "=== running #<benchmark MRG32K3A for 20 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.math:run-mrg32k3a "MRG32K3A" 20))
(progn
(format t "=== running #<benchmark CRC40 for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.crc:run-crc40 "CRC40" 2))
(progn
(format t "=== running #<benchmark BIGNUM/ELEM-100-1000 for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-elem-100-1000 "BIGNUM/ELEM-100-1000" 1))
(progn
(format t "=== running #<benchmark BIGNUM/ELEM-1000-100 for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-elem-1000-100 "BIGNUM/ELEM-1000-100" 1))
(progn
(format t "=== running #<benchmark BIGNUM/ELEM-10000-1 for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-elem-10000-1 "BIGNUM/ELEM-10000-1" 1))
(progn
(format t "=== running #<benchmark BIGNUM/PARI-100-10 for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-pari-100-10 "BIGNUM/PARI-100-10" 1))
(progn
(format t "=== running #<benchmark BIGNUM/PARI-200-5 for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-pari-200-5 "BIGNUM/PARI-200-5" 1))
(progn
(format t "=== running #<benchmark PI-DECIMAL/SMALL for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-pi-decimal/small "PI-DECIMAL/SMALL" 100))
(progn
(format t "=== running #<benchmark PI-DECIMAL/BIG for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-pi-decimal/big "PI-DECIMAL/BIG" 2))
(progn
(format t "=== running #<benchmark PI-ATAN for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.bignum:run-pi-atan "PI-ATAN" 200))
(progn
(format t "=== running #<benchmark PI-RATIOS for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.ratios:run-pi-ratios "PI-RATIOS" 2))
(progn
(format t "=== running #<benchmark HASH-STRINGS for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.hash:hash-strings "HASH-STRINGS" 2))
(progn
(format t "=== running #<benchmark HASH-INTEGERS for 10 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.hash:hash-integers "HASH-INTEGERS" 10))
(progn
(format t "=== running #<benchmark SLURP-LINES for 30 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.hash:run-slurp-lines "SLURP-LINES" 30))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark BOEHM-GC for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.boehm-gc:gc-benchmark "BOEHM-GC" 1))
(progn
(format t "=== running #<benchmark DEFLATE-FILE for 100 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.deflate:run-deflate-file "DEFLATE-FILE" 100))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark 1D-ARRAYS for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-1d-arrays "1D-ARRAYS" 1))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark 2D-ARRAYS for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-2d-arrays "2D-ARRAYS" 1))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark 3D-ARRAYS for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-3d-arrays "3D-ARRAYS" 1))
(progn
(format t "=== running #<benchmark BITVECTORS for 3 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-bitvectors "BITVECTORS" 3))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark BENCH-STRINGS for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-strings "BENCH-STRINGS" 1))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark fill-strings/adjustable for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-strings/adjustable "fill-strings/adjustable" 1))
#-(or allegro lispworks-personal-edition poplog)
(progn
(format t "=== running #<benchmark STRING-CONCAT for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-string-concat "STRING-CONCAT" 1))
#-(or lispworks-personal-edition)
(progn
(format t "=== running #<benchmark SEARCH-SEQUENCE for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.arrays:bench-search-sequence "SEARCH-SEQUENCE" 1))
(progn
(format t "=== running #<benchmark CLOS/defclass for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:run-defclass "CLOS/defclass" 1))
(progn
(format t "=== running #<benchmark CLOS/defmethod for 1 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:run-defmethod "CLOS/defmethod" 1))
(progn
(format t "=== running #<benchmark CLOS/instantiate for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:make-instances "CLOS/instantiate" 2))
(progn
(format t "=== running #<benchmark CLOS/simple-instantiate for 200 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:make-instances/simple "CLOS/simple-instantiate" 200))
(progn
(format t "=== running #<benchmark CLOS/methodcalls for 5 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:methodcalls/simple "CLOS/methodcalls" 5))
(progn
(format t "=== running #<benchmark CLOS/method+after for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:methodcalls/simple+after "CLOS/method+after" 2))
#-(or clisp poplog)
(progn
(format t "=== running #<benchmark CLOS/complex-methods for 5 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:methodcalls/complex "CLOS/complex-methods" 5))
(progn
(format t "=== running #<benchmark EQL-SPECIALIZED-FIB for 2 runs>~%")
(force-output)
(bench-gc)
(bench-report 'cl-bench.clos:run-eql-fib "EQL-SPECIALIZED-FIB" 2))
(bench-report-footer))))
(run-benchmarks)

113
cl-bench/files/arrays.lisp Normal file
View File

@@ -0,0 +1,113 @@
;; benchmarks speed of array and sequence operations
;;
;; Author: Eric Marsden <emarsden@laas.fr>
;; Time-stamp: <2003-12-29 emarsden>
;;
;;
;; Timing tests for creation, initialization, access and garbage
;; collection for arrays, vectors, bitvectors and strings.
;;
;; NOTE: be careful running these in CMUCL on Linux with sizes larger
;; than your RAM; you will most likely crash your machine.
(in-package :cl-bench.arrays)
(defun bench-1d-arrays (&optional (size 100000) (runs 10))
(declare (fixnum size))
(let ((ones (make-array size :element-type '(integer 0 1000) :initial-element 1))
(twos (make-array size :element-type '(integer 0 1000) :initial-element 2))
(threes (make-array size :element-type '(integer 0 2000))))
(dotimes (runs runs)
(dotimes (pos size)
(setf (aref threes pos) (+ (aref ones pos) (aref twos pos))))
(assert (null (search (list 4 5 6) threes)))))
(values))
(defun bench-2d-arrays (&optional (size 2000) (runs 10))
(declare (fixnum size))
(let ((ones (make-array (list size size) :element-type '(integer 0 1000) :initial-element 1))
(twos (make-array (list size size) :element-type '(integer 0 1000) :initial-element 2))
(threes (make-array (list size size) :element-type '(integer 0 2000))))
(dotimes (runs runs)
(dotimes (i size)
(dotimes (j size)
(setf (aref threes i j)
(+ (aref ones i j) (aref twos i j)))))
(assert (eql 3 (aref threes 3 3)))))
(values))
(defun bench-3d-arrays (&optional (size 200) (runs 10))
(declare (fixnum size))
(let ((ones (make-array (list size size size) :element-type '(integer 0 1000) :initial-element 1))
(twos (make-array (list size size size) :element-type '(integer 0 1000) :initial-element 2))
(threes (make-array (list size size size) :element-type '(integer 0 2000))))
(dotimes (runs runs)
(dotimes (i size)
(dotimes (j size)
(dotimes (k size)
(setf (aref threes i j k)
(+ (aref ones i j k) (aref twos i j k))))))
(assert (eql 3 (aref threes 3 3 3)))))
(values))
(defun bench-bitvectors (&optional (size 1000000) (runs 700))
(declare (fixnum size))
(let ((zeros (make-array size :element-type 'bit :initial-element 0))
(ones (make-array size :element-type 'bit :initial-element 1))
(xors (make-array size :element-type 'bit)))
(dotimes (runs runs)
(bit-xor zeros ones xors)
(bit-nand zeros ones xors)
(bit-and zeros xors)))
(values))
(defun bench-strings (&optional (size 1000000) (runs 50))
(declare (fixnum size))
(let ((zzz (make-string size :initial-element #\z))
(xxx (make-string size)))
(dotimes (runs runs)
(and (fill xxx #\x)
(replace xxx zzz)
(search "xxxd" xxx)
(nstring-upcase xxx))))
(values))
(defun bench-strings/adjustable (&optional (size 1000000) (runs 100))
(declare (fixnum size))
(dotimes (runs runs)
(let ((sink (make-array 10 :element-type 'character :adjustable t :fill-pointer 0)))
(dotimes (i size)
(vector-push-extend (code-char (mod i 128)) sink))))
(values))
;; certain implementations such as OpenMCL have an array (and thus
;; string) length limit of (expt 2 24), so don't try this on humungous
;; sizes
(defun bench-string-concat (&optional (size 1000000) (runs 100))
(declare (fixnum size))
(dotimes (runs runs)
(let ((len (length
(with-output-to-string (string)
(dotimes (i size)
(write-sequence "hi there!" string))))))
(assert (eql len (* size (length "hi there!")))))
(values)))
(defun bench-search-sequence (&optional (size 1000000) (runs 10))
(declare (fixnum size))
(let ((haystack (make-array size :element-type '(integer 0 1000))))
(dotimes (runs runs)
(dotimes (i size)
(setf (aref haystack i) (mod i 1000)))
(assert (null (find -1 haystack :test #'=)))
(assert (null (find-if #'minusp haystack)))
(assert (null (position -1 haystack :test #'= :from-end t)))
(loop :for i :from 20 :to 900 :by 20
:do (assert (eql i (position i haystack :test #'=))))
(assert (eql 0 (search #(0 1 2 3 4) haystack :end2 1000 :from-end t)))))
(values))
;; EOF

140
cl-bench/files/bignum.lisp Normal file
View File

@@ -0,0 +1,140 @@
;;; bignum.lisp -- bignum operations from Bruno Haible
;;
;; Time-stamp: <2004-01-09 emarsden>
;;
;; code from Bruno Haible <haible@ilog.fr>
;;
;; A. Elementary integer computations:
;; The tests are run with N = 100, 1000, 10000, 100000 decimal digits.
;; Precompute *x1* = floor((sqrt(5)+1)/2 * 10^(2N))
;; *x2* = floor(sqrt(3) * 10^N)
;; *x3* = 10^N+1
;; Then time the following operations:
;; 1. Multiplication *x1* * *x2*,
;; 2. Division (with remainder) *x1* / *x2*,
;; 3. integer_sqrt (*x3*),
;; 4. gcd (*x1*, *x2*),
;;
;; B. (from Pari)
;; u=1;v=1;p=1;q=1;for(k=1..1000){w=u+v;u=v;v=w;p=p*w;q=lcm(q,w);}
(in-package :cl-bench.bignum)
(defvar *x1*)
(defvar *x2*)
(defvar *x3*)
(defvar *y*)
(defvar *z*)
;; this can be 1e-6 on most compilers, but for COMPUTE-PI-DECIMAL on
;; OpenMCL we lose lotsa precision
(defun fuzzy-eql (a b)
(< (abs (/ (- a b) b)) 1e-4))
(defun elementary-benchmark (N repeat)
(setq *x1* (floor (+ (isqrt (* 5 (expt 10 (* 4 N)))) (expt 10 (* 2 N))) 2))
(setq *x2* (isqrt (* 3 (expt 10 (* 2 N)))))
(setq *x3* (+ (expt 10 N) 1))
;; (format t "~&~%N = ~D, Multiplication *x1* * *x2*, divide times by ~D~%" N repeat)
(dotimes (count 3)
(dotimes (_ repeat)
(setq *y* (* *x1* *x2*))))
;; (format t "~&~%N = ~D, Division (with remainder) *x1* / *x2*, divide times by ~D~%" N repeat)
(dotimes (count 3)
(dotimes (_ repeat)
(multiple-value-setq (*y* *z*) (floor *x1* *x2*))))
;; (format t "~&~%N = ~D, integer_sqrt(*x3*), divide times by ~D~%" N repeat)
(dotimes (count 3)
(dotimes (_ repeat)
(setq *y* (isqrt *x3*))))
;; (format t "~&~%N = ~D, gcd(*x1*,*x2*), divide times by ~D~%" N repeat)
(dotimes (count 3)
(dotimes (_ repeat)
(setq *y* (gcd *x1* *x2*)))))
(defun run-elem-100-1000 ()
(elementary-benchmark 100 1000))
(defun run-elem-1000-100 ()
(elementary-benchmark 1000 100))
(defun run-elem-10000-1 ()
(elementary-benchmark 10000 1))
(defun pari-benchmark (N repeat)
(dotimes (count 3)
(dotimes (_ repeat)
(let ((u 1) (v 1) (p 1) (q 1))
(do ((k 1 (1+ k)))
((> k N) (setq *y* p *z* q))
(let ((w (+ u v)))
(shiftf u v w)
(setq p (* p w))
(setq q (lcm q w))))))))
(defun run-pari-100-10 ()
(pari-benchmark 100 10))
(defun run-pari-200-5 ()
(pari-benchmark 200 5))
(defun run-pari-1000-1 ()
(pari-benchmark 1000 1))
;; calculating pi using ratios
(defun compute-pi-decimal (n)
(let ((p 0)
(r nil)
(dpi 0))
(dotimes (i n)
(incf p (/ (- (/ 4 (+ 1 (* 8 i)))
(/ 2 (+ 4 (* 8 i)))
(/ 1 (+ 5 (* 8 i)))
(/ 1 (+ 6 (* 8 i))))
(expt 16 i))))
(dotimes (i n)
(multiple-value-setq (r p) (truncate p 10))
(setf dpi (+ (* 10 dpi) r))
(setf p (* p 10)))
dpi))
(defun run-pi-decimal/small ()
(assert (fuzzy-eql pi (/ (compute-pi-decimal 200) (expt 10 198)))))
(defun run-pi-decimal/big ()
(assert (fuzzy-eql pi (/ (compute-pi-decimal 1000) (expt 10 998)))))
(defun pi-atan (k n)
(do* ((a 0) (w (* n k)) (k2 (* k k)) (i -1))
((= w 0) a)
(setq w (truncate w k2))
(incf i 2)
(incf a (truncate w i))
(setq w (truncate w k2))
(incf i 2)
(decf a (truncate w i))))
(defun calc-pi-atan (digits)
(let* ((n digits)
(m (+ n 3))
(tenpower (expt 10 m)))
(values (truncate (- (+ (pi-atan 18 (* tenpower 48))
(pi-atan 57 (* tenpower 32)))
(pi-atan 239 (* tenpower 20)))
1000))))
(defun run-pi-atan ()
(let ((api (calc-pi-atan 1000)))
(assert (fuzzy-eql pi (/ api (expt 10 1000))))))
;; EOF

View File

@@ -0,0 +1,123 @@
;;; boehm-gc.lisp -- benchmark testing GC performance
;;
;; Time-stamp: <2002-11-22 emarsden>
;;
;; see <URL:http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_bench.html>
;; for original code in C, C++, Java and Scheme. This is adapted from
;; the Scheme version.
;;
;;
;; This is adapted from a benchmark written by John Ellis and Pete Kovac
;; of Post Communications.
;; It was modified by Hans Boehm of Silicon Graphics.
;; It was translated into Scheme by William D Clinger of Northeastern Univ;
;; the Scheme version uses (RUN-BENCHMARK <string> <thunk>)
;; Last modified 30 May 1997.
;;
;; This is no substitute for real applications. No actual application
;; is likely to behave in exactly this way. However, this benchmark was
;; designed to be more representative of real applications than other
;; Java GC benchmarks of which we are aware.
;; It attempts to model those properties of allocation requests that
;; are important to current GC techniques.
;; It is designed to be used either to obtain a single overall performance
;; number, or to give a more detailed estimate of how collector
;; performance varies with object lifetimes. It prints the time
;; required to allocate and collect balanced binary trees of various
;; sizes. Smaller trees result in shorter object lifetimes. Each cycle
;; allocates roughly the same amount of memory.
;; Two data structures are kept around during the entire process, so
;; that the measured performance is representative of applications
;; that maintain some live in-memory data. One of these is a tree
;; containing many pointers. The other is a large array containing
;; double precision floating point numbers. Both should be of comparable
;; size.
;;
;; The results are only really meaningful together with a specification
;; of how much memory was used. It is possible to trade memory for
;; better time performance. This benchmark should be run in a 32MB
;; heap, though we don't currently know how to enforce that uniformly.
(in-package :cl-bench.boehm-gc)
(defstruct node left right dummy1 dummy2)
;; build tree top down, assigning to older objects
(defun populate (depth thisNode)
(when (> depth 0)
(setf (node-left thisNode) (make-node))
(setf (node-right thisNode) (make-node))
(populate (1- depth) (node-left thisNode))
(populate (1- depth) (node-right thisNode))))
;; build tree bottom-up
(defun make-tree (depth)
(if (<= depth 0) (make-node)
(make-node :left (make-tree (- depth 1))
:right (make-tree (- depth 1)))))
;; nodes used by a tree of a given size
(defmacro tree-size (i) `(- (expt 2 (1+ ,i)) 1))
;; number of iterations to use for a given tree depth
(defmacro iteration-count (i)
`(floor (* 2 (tree-size stretch-tree-depth))
(tree-size ,i)))
(defun gcbench (stretch-tree-depth)
(declare (fixnum stretch-tree-depth))
;; Parameters are determined by stretch-tree-depth.
;; In Boehm's version the parameters were fixed as follows:
;; public static final int stretch-tree-depth = 18; // about 16Mb
;; public static final int kLongLivedTreeDepth = 16; // about 4Mb
;; public static final int kArraySize = 500000; // about 4Mb
;; public static final int kMinTreeDepth = 4;
;; public static final int kMaxTreeDepth = 16;
;; In Larceny the storage numbers above would be 12 Mby, 3 Mby, 6 Mby.
(let* ((kLongLivedTreeDepth (- stretch-tree-depth 2))
(kArraySize (* 4 (tree-size kLongLivedTreeDepth)))
(kMinTreeDepth 4)
(kMaxTreeDepth kLongLivedTreeDepth))
;; (format t "Stretching memory with a binary tree of depth ~d~%" stretch-tree-depth)
;; stretch the memory space quickly
(make-tree stretch-tree-depth)
;; Create a long lived object
;; (format t "Creating a long-lived binary tree of depth ~d~%" kLongLivedTreeDepth)
(let ((longLivedTree (make-node)))
(populate kLongLivedTreeDepth longLivedTree)
;; create long-lived array, filling half of it
;; (format t "Creating a long-lived array of ~d inexact reals~%" kArraySize)
(let ((array (make-array kArraySize :element-type 'single-float)))
(loop :for i :below (floor kArraySize 2)
:do (setf (aref array i) (/ 1.0 (1+ i))))
(do ((d kMinTreeDepth (+ d 2)))
((> d kMaxTreeDepth))
(let ((iteration-count (iteration-count d)))
;; (format t "~&Creating ~d trees of depth ~d~%" iteration-count d)
;; (format t "GCBench: Top down construction~%")
(dotimes (i iteration-count) (populate d (make-node)))
;; (format t "GCBench: Bottom up construction~%")
(dotimes (i iteration-count) (make-tree d))))
;; these are fake references to LongLivedTree and array to
;; keep them from being optimized away
(assert (not (null longLivedTree)))
(assert (let ((n (min 1000 (1- (floor (length array) 2)))))
(= (round (aref array n)) (round (/ 1.0 (1+ n))))))))))
(defun gc-benchmark (&optional (k 18))
;; (format t "The garbage collector should touch about ~d megabytes of heap storage.~%"
;; (expt 2 (- k 13)))
;; (format t "The use of more or less memory will skew the results.~%")
(gcbench k))
;; EOF

View File

@@ -0,0 +1,853 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/api.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; The external API for creating and using scanners.
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-PPCRE")
(defgeneric create-scanner (regex &key case-insensitive-mode
multi-line-mode
single-line-mode
extended-mode)
(:documentation "Accepts a regular expression - either as a
parse-tree or as a string - and returns a scan closure which will scan
strings for this regular expression. The \"mode\" keyboard arguments
are equivalent to the imsx modifiers in Perl."))
(defmethod create-scanner ((regex-string string) &key case-insensitive-mode
multi-line-mode
single-line-mode
extended-mode)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
;; parse the string into a parse-tree and then call CREATE-SCANNER
;; again
(let (prefix
(*error-msg-offset* 0))
;; build a prefix like "(?imsx)" if the corresponding modifiers
;; are present
(if extended-mode
(push #\x prefix))
(if single-line-mode
(push #\s prefix))
(if multi-line-mode
(push #\m prefix))
(if case-insensitive-mode
(push #\i prefix))
(when prefix
;; adjust *ERROR-MSG-OFFSET* according to PREFIX so error
;; messages reflect positions in the string provided by the user
(setq *error-msg-offset* (+ 3 (length prefix))
regex-string (concatenate 'string "(?" prefix ")" regex-string)))
;; wrap the result with :SEQUENCE to avoid infinite loops for
;; constant strings
(create-scanner (cons :sequence (list (parse-string regex-string))))))
(defmethod create-scanner ((scanner function) &key case-insensitive-mode
multi-line-mode
single-line-mode
extended-mode)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(when (or case-insensitive-mode multi-line-mode single-line-mode extended-mode)
(error "You can't use the keyword arguments to modify an existing scanner."))
scanner)
(defmethod create-scanner ((parse-tree t) &key case-insensitive-mode
multi-line-mode
single-line-mode
extended-mode)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(when extended-mode
(error "Extended mode doesn't make sense in parse trees."))
;; convert parse-tree into internal representation REGEX and at the
;; same time compute the number of registers and the constant string
;; (or anchor) the regex starts with (if any)
(let (flags)
(if single-line-mode
(push :single-line-mode-p flags))
(if multi-line-mode
(push :multi-line-mode-p flags))
(if case-insensitive-mode
(push :case-insensitive-p flags))
(when flags
(setq parse-tree (append (append :flags flags) parse-tree))))
(multiple-value-bind (regex reg-num starts-with)
(convert parse-tree)
;; simplify REGEX by flattening nested SEQ and ALTERNATION
;; constructs and gathering STR objects
(let ((regex (gather-strings (flatten regex))))
;; set the MIN-REST slots of the REPETITION objects
(compute-min-rest regex 0)
;; set the OFFSET slots of the STR objects
(compute-offsets regex 0)
(let* (end-string-offset
end-anchored-p
;; compute the constant string the regex ends with (if
;; any) and at the same time set the special variables
;; END-STRING-OFFSET and END-ANCHORED-P
(end-string (end-string regex))
;; if we found a non-zero-length end-string we create an
;; efficient search function for it
(end-string-test (and end-string
(plusp (len end-string))
(if (= 1 (len end-string))
(create-char-searcher
(schar (str end-string) 0)
(case-insensitive-p end-string))
(create-bmh-matcher
(str end-string)
(case-insensitive-p end-string)))))
;; initialize the counters for CREATE-MATCHER-AUX
(*rep-num* 0)
(*zero-length-num* 0)
;; create the actual matcher function (which does all the
;; work of matching the regular expression) corresponding
;; to REGEX and at the same time set the special
;; variables *REP-NUM* and *ZERO-LENGTH-NUM*
(match-fn (create-matcher-aux regex #'identity))
;; if the regex starts with a string we create an
;; efficient search function for it
(start-string-test (and (typep starts-with 'str)
(plusp (len starts-with))
(if (= 1 (len starts-with))
(create-char-searcher
(schar (str starts-with) 0)
(case-insensitive-p starts-with))
(create-bmh-matcher
(str starts-with)
(case-insensitive-p starts-with))))))
(declare (special end-string-offset end-anchored-p end-string))
;; now create the scanner and return it
(create-scanner-aux match-fn
(regex-min-length regex)
(or (start-anchored-p regex)
;; a dot in single-line-mode also
;; implicitely anchors the regex at
;; the start, i.e. if we can't match
;; from the first position we won't
;; match at all
(and (typep starts-with 'everything)
(single-line-p starts-with)))
starts-with
start-string-test
;; only mark regex as end-anchored if we
;; found a non-zero-length string before
;; the anchor
(and end-string-test end-anchored-p)
end-string-test
(if end-string-test
(len end-string)
nil)
end-string-offset
*rep-num*
*zero-length-num*
reg-num)))))
(defgeneric scan (regex target-string &key start end)
(:documentation "Searches TARGET-STRING from START to END and tries
to match REGEX. On success returns four values - the start of the
match, the end of the match, and two arrays denoting the beginnings
and ends of register matches. On failure returns NIL. REGEX can be a
string which will be parsed according to Perl syntax, a parse tree, or
a pre-compiled scanner created by CREATE-SCANNER. TARGET-STRING will
be coerced to a simple string if it isn't one already."))
(defmethod scan ((regex-string string) target-string
&key (start 0)
(end (length target-string)))
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
;; note that the scanners are optimized for simple strings so we
;; have to coerce TARGET-STRING into one if it isn't already
(funcall (create-scanner regex-string)
(if (simple-string-p target-string)
target-string
(coerce target-string 'simple-string))
start end))
(defmethod scan ((scanner function) target-string
&key (start 0)
(end (length target-string)))
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(funcall scanner
(if (simple-string-p target-string)
target-string
(coerce target-string 'simple-string))
start end))
(defmethod scan ((parse-tree t) target-string
&key (start 0)
(end (length target-string)))
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(funcall (create-scanner parse-tree)
(if (simple-string-p target-string)
target-string
(coerce target-string 'simple-string))
start end))
(defun scan-to-strings (regex target-string &key (start 0)
(end (length target-string)))
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Like SCAN but returns substrings of TARGET-STRING instead of
positions, i.e. this function returns two values on success: the whole
match as a string plus an array of substrings (or NILs) corresponding
to the matched registers."
(multiple-value-bind (match-start match-end reg-starts reg-ends)
(scan regex target-string :start start :end end)
(unless match-start
(return-from scan-to-strings nil))
(values (subseq target-string match-start match-end)
(map 'vector
(lambda (reg-start reg-end)
(if reg-start
(subseq target-string reg-start reg-end)
nil))
reg-starts
reg-ends))))
(defmacro do-scans ((match-start match-end reg-starts reg-ends regex
target-string
&optional result-form
&key start end)
&body body)
"Iterates over TARGET-STRING and tries to match REGEX as often as
possible evaluating BODY with MATCH-START, MATCH-END, REG-STARTS, and
REG-ENDS bound to the four return values of each match in turn. After
the last match, returns RESULT-FORM if provided or NIL otherwise. An
implicit block named NIL surrounds DO-SCANS; RETURN may be used to
terminate the loop immediately. If REGEX matches an empty string the
scan is continued one position behind this match. BODY may start with
declarations."
(let ((=target-string= (gensym))
(=start= (gensym))
(=end= (gensym))
(=regex= (gensym))
(=scanner= (gensym))
(=loop-tag= (gensym))
(=block-name= (gensym)))
;; the NIL BLOCK to enable exits via (RETURN ...)
`(block nil
(let* ((,=target-string= ,target-string)
(,=start= (or ,start 0))
(,=end= (or ,end (length ,=target-string=)))
(,=regex= ,regex)
;; create a scanner unless the regex is already a
;; function (otherwise SCAN will do this on each
;; iteration)
(,=scanner= (typecase ,=regex=
(function ,=regex=)
(otherwise (create-scanner ,=regex=)))))
(unless (typep ,=target-string= 'simple-string)
;; coerce TARGET-STRING to a simple string unless it is one
;; already (otherwise SCAN will do this on each iteration)
(setq ,=target-string= (coerce ,=target-string= 'simple-string)))
;; a named BLOCK so we can exit the TAGBODY
(block ,=block-name=
(tagbody
,=loop-tag=
;; invoke SCAN and bind the returned values to the
;; provided variables
(multiple-value-bind
(,match-start ,match-end ,reg-starts ,reg-ends)
(scan ,=scanner= ,=target-string= :start ,=start= :end ,=end=)
;; declare the variables to be IGNORABLE to prevent the
;; compiler from issuing warnings
(declare
(ignorable ,match-start ,match-end ,reg-starts ,reg-ends))
(unless ,match-start
;; stop iteration on first failure
(return-from ,=block-name= ,result-form))
;; execute BODY (wrapped in LOCALLY so it can start with
;; declarations)
(locally
,@body)
;; advance by one position if we had a zero-length match
(setq ,=start= (if (= ,=start= ,match-end)
(1+ ,match-end)
,match-end)))
(go ,=loop-tag=)))))))
(defmacro do-matches ((match-start match-end regex
target-string
&optional result-form
&key start end)
&body body)
"Iterates over TARGET-STRING and tries to match REGEX as often as
possible evaluating BODY with MATCH-START and MATCH-END bound to the
start/end positions of each match in turn. After the last match,
returns RESULT-FORM if provided or NIL otherwise. An implicit block
named NIL surrounds DO-MATCHES; RETURN may be used to terminate the
loop immediately. If REGEX matches an empty string the scan is
continued one position behind this match. BODY may start with
declarations."
;; this is a simplified form of DO-SCANS - we just provide to dummy
;; vars and ignore them
(let ((=reg-starts= (gensym))
(=reg-ends= (gensym)))
`(do-scans (,match-start ,match-end
,=reg-starts= ,=reg-ends=
,regex ,target-string
,result-form
:start ,start :end ,end)
,@body)))
(defmacro do-matches-as-strings ((match-var regex
target-string
&optional result-form
&key start end)
&body body)
"Iterates over TARGET-STRING and tries to match REGEX as often as
possible evaluating BODY with MATCH-VAR bound to the substring of
TARGET-STRING corresponding to each match in turn. After the last
match, returns RESULT-FORM if provided or NIL otherwise. An implicit
block named NIL surrounds DO-MATCHES-AS-STRINGS; RETURN may be used to
terminate the loop immediately. If REGEX matches an empty string the
scan is continued one position behind this match. BODY may start with
declarations."
(let ((=match-start= (gensym))
(=match-end= (gensym))
(=target-string= (gensym)))
`(let ((,=target-string= ,target-string))
;; simple use DO-MATCHES to extract the substrings
(do-matches (,=match-start= ,=match-end= ,regex ,=target-string=
,result-form :start ,start :end ,end)
(let ((,match-var
(subseq ,=target-string= ,=match-start= ,=match-end=)))
,@body)))))
(defun all-matches (regex target-string
&key (start 0)
(end (length target-string)))
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Returns a list containing the start and end positions of all
matches of REGEX against TARGET-STRING, i.e. if there are N matches
the list contains (* 2 N) elements. If REGEX matches an empty string
the scan is continued one position behind this match."
(let (result-list)
(do-matches (match-start match-end
regex target-string
(nreverse result-list)
:start start :end end)
(push match-start result-list)
(push match-end result-list))))
(defun all-matches-as-strings (regex target-string
&key (start 0)
(end (length target-string)))
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Returns a list containing all substrings of TARGET-STRING which
match REGEX. If REGEX matches an empty string the scan is continued
one position behind this match."
(let (result-list)
(do-matches-as-strings (match regex target-string (nreverse result-list)
:start start :end end)
(push match result-list))))
(defun split (regex target-string
&key (start 0)
(end (length target-string))
with-registers-p)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Matches REGEX against TARGET-STRING as often as possible and
returns a list of the substrings between the matches. If
WITH-REGISTERS-P is true, substrings corresponding to matched
registers (if any) are inserted into the list as well. If REGEX
matches an empty string the scan is continued one position behind this
match. Empty matches at the start or end of the target string are
always left out."
(let (end-match-p
(last-end-pos start)
;; initialize list of positions to extract substrings with
;; START so that the start of the next match will mark the end
;; of the first substring
(pos-list (list start)))
(do-scans (match-start match-end
reg-starts reg-ends
regex target-string nil
:start start :end end)
(when (= match-end end)
;; remember that this match is at the end of the target string
(setq end-match-p t)
(when (= match-start match-end)
;; and if this also was a zero-length match just stop here
(return)))
(unless (eql last-end-pos match-start)
;; push start of match on list unless this match didn't move
;; past the last one
(push match-start pos-list)
(when with-registers-p
;; optionally insert matched registers
(loop for reg-start across reg-starts
for reg-end across reg-ends
if reg-start
;; but only if they've matched
do (push reg-start pos-list)
(push reg-end pos-list))))
(unless (or (eql last-end-pos match-start)
end-match-p)
;; push end of match on list unless this match didn't move
;; past the last one or we're at the end of the target string
(push match-end pos-list))
;; remember this position for the next iteration
(setq last-end-pos match-end))
(unless end-match-p
;; END is last element unless the last match extended until the
;; end of the target string
(push end pos-list))
;; now collect substrings
(loop for (this-start this-end) on (nreverse pos-list) by #'cddr
collect (subseq target-string this-start this-end))))
(defun string-case-modifier (str from to start end)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(declare (type fixnum from to start end))
"Checks whether all words in STR between FROM and TO are upcased,
downcased or capitalized and returns a function which applies a
corresponding case modification to strings. Returns #'IDENTITY
otherwise, especially if words in the target area extend beyond FROM
or TO. STR is supposed to be bounded by START and END. It is assumed
that (<= START FROM TO END)."
(case
(if (or (<= to from)
(and (< start from)
(alphanumericp (char str (1- from)))
(alphanumericp (char str from)))
(and (< to end)
(alphanumericp (char str to))
(alphanumericp (char str (1- to)))))
;; if it's a zero-length string or if words extend beyond FROM
;; or TO we return NIL, i.e. #'IDENTITY
nil
;; otherwise we loop through STR from FROM to TO
(loop with last-char-both-case
with current-result
for index of-type fixnum from from below to
for chr = (char str index)
do (cond ((not (both-case-p chr))
;; this character doesn't have a case so we
;; consider it as a word boundary (note that
;; this differs from how \b works in Perl)
(setq last-char-both-case nil))
((upper-case-p chr)
;; an uppercase character
(setq current-result
(if last-char-both-case
;; not the first character in a
(case current-result
((:undecided) :upcase)
((:downcase :capitalize) (return nil))
((:upcase) current-result))
(case current-result
((nil) :undecided)
((:downcase) (return nil))
((:capitalize :upcase) current-result)))
last-char-both-case t))
(t
;; a lowercase character
(setq current-result
(case current-result
((nil) :downcase)
((:undecided) :capitalize)
((:downcase) current-result)
((:capitalize) (if last-char-both-case
current-result
(return nil)))
((:upcase) (return nil)))
last-char-both-case t)))
finally (return current-result)))
((nil) #'identity)
((:undecided :upcase) #'string-upcase)
((:downcase) #'string-downcase)
((:capitalize) #'string-capitalize)))
;; first create a scanner to identify the special parts of the
;; replacement string
(let ((reg-scanner (create-scanner "\\\\(?:{\\d+}|\\d+|&|`|')")))
(defun build-replacement-template (replacement-string)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Converts a replacement string for REGEX-REPLACE or
REGEX-REPLACE-ALL into a replacement template which is an
S-expression."
(let ((from 0)
;; COLLECTOR will hold the (reversed) template
(collector '()))
;; scan through all special parts of the replacement string
(do-matches (match-start match-end reg-scanner replacement-string)
(when (< from match-start)
;; strings between matches are copied verbatim
(push (subseq replacement-string from match-start) collector))
;; PARSE-START is true if the pattern matched a number which
;; refers to a register
(let* ((parse-start (position-if #'digit-char-p
replacement-string
:start match-start
:end match-end))
(token (if parse-start
(1- (parse-integer replacement-string
:start parse-start
:junk-allowed t))
;; if we didn't match a number we convert the
;; character to a symbol
(case (char replacement-string (1+ match-start))
((#\&) :match)
((#\`) :before-match)
((#\') :after-match)))))
(when (and (numberp token) (< token 0))
;; make sure we don't accept something like "\\0"
(error "Illegal substring ~S in replacement string"
(subseq replacement-string match-start match-end)))
(push token collector))
;; remember where the match ended
(setq from match-end))
(when (< from (length replacement-string))
;; push the rest of the replacement string onto the list
(push (subseq replacement-string from) collector))
(nreverse collector))))
(defun build-replacement (replacement-template
target-string
start end
match-start match-end
reg-starts reg-ends)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Accepts a replacement template and the current values from the
matching process in REGEX-REPLACE or REGEX-REPLACE-ALL and returns the
corresponding template."
;; the upper exclusive bound of the register numbers in the regular
;; expression
(let ((reg-bound (if reg-starts
(array-dimension reg-starts 0)
0)))
(with-output-to-string (s)
(loop for token in replacement-template
do (typecase token
(string
;; transfer string parts verbatim
(write-string token s))
(integer
;; replace numbers with the corresponding registers
(when (>= token reg-bound)
;; but only if the register was referenced in the
;; regular expression
(error "Reference to non-existent register ~A in replacement string"
(1+ token)))
(when (svref reg-starts token)
;; and only if it matched, i.e. no match results
;; in an empty string
(write-string target-string s
:start (svref reg-starts token)
:end (svref reg-ends token))))
(symbol
(case token
((:match)
;; the whole match
(write-string target-string s
:start match-start
:end match-end))
((:before-match)
;; the part of the target string before the match
(write-string target-string s
:start start
:end match-start))
((:after-match)
;; the part of the target string after the match
(write-string target-string s
:start match-end
:end end)))))))))
(defun replace-aux (target-string replacement pos-list reg-list start end preserve-case)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Auxiliary function used by REGEX-REPLACE and
REGEX-REPLACE-ALL. POS-LIST contains a list with the start and end
positions of all matches while REG-LIST contains a list of arrays
representing the corresponding register start and end positions."
;; build the template once before we start the loop
(let ((replacement-template (build-replacement-template replacement)))
(with-output-to-string (s)
;; loop through all matches and take the start and end of the
;; whole string into account
(loop for (from to) on (append (list start) pos-list (list end))
;; alternate between replacement and no replacement
for replace = nil then (and (not replace) to)
for reg-starts = (if replace (pop reg-list) nil)
for reg-ends = (if replace (pop reg-list) nil)
for curr-replacement = (if replace
;; build the replacement string
(build-replacement replacement-template
target-string
start end
from to
reg-starts reg-ends)
nil)
while to
if replace
do (write-string (if preserve-case
;; modify the case of the replacement
;; string if necessary
(funcall (string-case-modifier target-string
from to
start end)
curr-replacement)
curr-replacement)
s)
else
;; no replacement
do (write-string target-string s :start from :end to)))))
(defun regex-replace (regex target-string replacement
&key (start 0)
(end (length target-string))
preserve-case)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Try to match TARGET-STRING between START and END against REGEX and
replace the first match with REPLACEMENT. REPLACEMENT can contain the
special substrings \"\\&\" for the whole match, \"\\`\" for the part
of TARGET-STRING before the match, \"\\'\" for the part of
TARGET-STRING after the match, \"\\N\" or \"\\{N}\" for the Nth
register where N is a positive integer. If PRESERVE-CASE is true, the
replacement will try to preserve the case (all upper case, all lower
case, or capitalized) of the match. The result will always be a fresh
string, even if REGEX doesn't match."
(multiple-value-bind (match-start match-end reg-starts reg-ends)
(scan regex target-string :start start :end end)
(if match-start
(replace-aux target-string replacement
(list match-start match-end)
(list reg-starts reg-ends)
start end preserve-case)
(copy-seq target-string))))
(defun regex-replace-all (regex target-string replacement
&key (start 0)
(end (length target-string))
preserve-case)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Try to match TARGET-STRING between START and END against REGEX and
replace all matches with REPLACEMENT. REPLACEMENT can contain the
special substrings \"\\&\" for the whole match, \"\\`\" for the part
of TARGET-STRING before the match, \"\\'\" for the part of
TARGET-STRING after the match, \"\\N\" or \"\\{N}\" for the Nth
register where N is a positive integer. If PRESERVE-CASE is true, the
replacement will try to preserve the case (all upper case, all lower
case, or capitalized) of the match. The result will always be a fresh
string, even if REGEX doesn't match." (let ((pos-list '())
(reg-list '()))
(do-scans (match-start match-end reg-starts reg-ends regex target-string
nil
:start start :end end)
(push match-start pos-list)
(push match-end pos-list)
(push reg-starts reg-list)
(push reg-ends reg-list))
(if pos-list
(replace-aux target-string replacement
(nreverse pos-list)
(nreverse reg-list)
start end preserve-case)
(copy-seq target-string))))
(defmacro regex-apropos-aux ((regex packages case-insensitive &optional return-form)
&body body)
"Auxiliary macro used by REGEX-APROPOS and REGEX-APROPOS-LIST. Loops
through PACKAGES and executes BODY with SYMBOL bound to each symbol
which matches REGEX. Optionally evaluates and returns RETURN-FORM at
the end. If CASE-INSENSITIVE is true and REGEX isn't already a
scanner, a case-insensitive scanner is used."
(let ((=scanner= (gensym))
(=regex= (gensym))
(=packages= (gensym))
(=next= (gensym))
(=morep= (gensym)))
`(let* ((,=regex= ,regex)
(,=scanner= (create-scanner ,=regex=
:case-insensitive-mode
(and ,case-insensitive
(not (functionp ,=regex=)))))
(,=packages= (or ,packages
(list-all-packages))))
(with-package-iterator (,=next= ,=packages= :external :internal)
(loop
(multiple-value-bind (,=morep= symbol)
(,=next=)
(unless ,=morep=
(return ,return-form))
(when (scan ,=scanner= (symbol-name symbol))
,@body)))))))
(defun regex-apropos-list (regex &optional packages &key (case-insensitive t))
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Similar to the standard function APROPOS-LIST but returns a list of
all symbols which match the regular expression REGEX. If
CASE-INSENSITIVE is true and REGEX isn't already a scanner, a
case-insensitive scanner is used."
(let ((collector '()))
(regex-apropos-aux (regex packages case-insensitive collector)
(push symbol collector))))
(defun print-symbol-info (symbol)
"Auxiliary function used by REGEX-APROPOS. Tries to print some
meaningful information about a symbol."
(handler-case
(let ((output-list '()))
(cond ((special-operator-p symbol)
(push "[special operator]" output-list))
((macro-function symbol)
(push "[macro]" output-list))
((fboundp symbol)
(let* ((function (symbol-function symbol))
(compiledp (compiled-function-p function)))
(multiple-value-bind (lambda-expr closurep)
(function-lambda-expression function)
(push
(format nil
"[~:[~;compiled ~]~:[function~;closure~]]~:[~; ~A~]"
compiledp closurep lambda-expr (cadr lambda-expr))
output-list)))))
(let ((class (find-class symbol nil)))
(when class
(push (format nil "[class] ~S" class) output-list)))
(cond ((keywordp symbol)
(push "[keyword]" output-list))
((constantp symbol)
(push (format nil "[constant]~:[~; value: ~S~]"
(boundp symbol) (symbol-value symbol)) output-list))
((boundp symbol)
(push #+(or LispWorks CLISP) "[variable]"
#-(or LispWorks CLISP) (format nil "[variable] value: ~S"
(symbol-value symbol))
output-list)))
(format t "~&~S ~<~;~^~A~@{~:@_~A~}~;~:>" symbol output-list))
(condition ()
;; this seems to be necessary due to some errors I encountered
;; with LispWorks
(format t "~&~S [an error occured while trying to print more info]" symbol))))
(defun regex-apropos (regex &optional packages &key (case-insensitive t))
"Similar to the standard function APROPOS but returns a list of all
symbols which match the regular expression REGEX. If CASE-INSENSITIVE
is true and REGEX isn't already a scanner, a case-insensitive scanner
is used."
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(regex-apropos-aux (regex packages case-insensitive)
(print-symbol-info symbol))
(values))
;; Local variables:
;; eval: (put 'do-scans 'common-lisp-indent-function '((&whole 4 1) &body))
;; eval: (put 'do-matches 'common-lisp-indent-function '((&whole 4 1) &body))
;; eval: (put 'do-matches-as-strings 'common-lisp-indent-function '((&whole 4 1) &body))
;; eval: (put 'regex-apropos-aux 'common-lisp-indent-function '((&whole 4 1) &body))

View File

@@ -0,0 +1,35 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-USER; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/cl-ppcre-test.system,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-USER")
(mk:defsystem "CL-PPCRE-TEST"
:source-extension "lisp"
:initially-do (mk:compile-system "cl-ppcre")
:components ((:file "ppcre-tests")))

View File

@@ -0,0 +1,45 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-USER; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/cl-ppcre.system,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-USER")
(mk:defsystem "CL-PPCRE"
:source-extension "lisp"
:components ((:file "packages")
(:file "specials" :depends-on ("packages"))
(:file "util" :depends-on ("packages"))
(:file "lexer" :depends-on ("util" "specials"))
(:file "parser" :depends-on ("lexer"))
(:file "regex-class" :depends-on ("parser"))
(:file "convert" :depends-on ("regex-class"))
(:file "optimize" :depends-on ("convert"))
(:file "closures" :depends-on ("optimize" "specials"))
(:file "repetition-closures" :depends-on ("closures"))
(:file "scanner" :depends-on ("repetition-closures"))
(:file "api" :depends-on ("scanner"))))

View File

@@ -0,0 +1,576 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/closures.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; Here we create the closures which together build the final
;;; scanner.
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-PPCRE")
(declaim (inline special-string= special-string-equal))
(defun *string*= (string2 start1 end1 start2 end2)
"Like STRING=, i.e. compares the special string *STRING* from START1
to END1 with STRING2 from START2 to END2. Note that there's no
boundary check - this has to be implemented by the caller."
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(declare (type fixnum start1 end1 start2 end2))
(loop for string1-idx of-type fixnum from start1 below end1
for string2-idx of-type fixnum from start2 below end2
always (char= (schar *string* string1-idx)
(schar string2 string2-idx))))
(defun *string*-equal (string2 start1 end1 start2 end2)
"Like STRING-EQUAL, i.e. compares the special string *STRING* from
START1 to END1 with STRING2 from START2 to END2. Note that there's no
boundary check - this has to be implemented by the caller."
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(declare (type fixnum start1 end1 start2 end2))
(loop for string1-idx of-type fixnum from start1 below end1
for string2-idx of-type fixnum from start2 below end2
always (char-equal (schar *string* string1-idx)
(schar string2 string2-idx))))
(defgeneric create-matcher-aux (regex next-fn)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(:documentation "Creates a closure which takes one parameter,
START-POS, and tests whether REGEX can match *STRING* at START-POS
such that the call to NEXT-FN after the match would succeed."))
(defmethod create-matcher-aux ((seq seq) next-fn)
;; the closure for a SEQ is a chain of closures for the elements of
;; this sequence which call each other in turn; the last closure
;; calls NEXT-FN
(loop for element in (reverse (elements seq))
for curr-matcher = next-fn then next-matcher
for next-matcher = (create-matcher-aux element curr-matcher)
finally (return next-matcher)))
(defmethod create-matcher-aux ((alternation alternation) next-fn)
;; first create closures for all alternations of ALTERNATION
(let ((all-matchers (mapcar #'(lambda (choice)
(create-matcher-aux choice next-fn))
(choices alternation))))
;; now create a closure which checks if one of the closures
;; created above can succeed
(lambda (start-pos)
(declare (type fixnum start-pos))
(loop for matcher in all-matchers
thereis (funcall (the function matcher) start-pos)))))
(defmethod create-matcher-aux ((register register) next-fn)
;; the position of this REGISTER within the whole regex; we start to
;; count at 0
(let ((num (num register)))
(declare (type fixnum num))
;; STORE-END-OF-REG is a thin wrapper around NEXT-FN which will
;; update the corresponding values of *REGS-START* and *REGS-END*
;; after the inner matcher has succeeded
(flet ((store-end-of-reg (start-pos)
(declare (type fixnum start-pos)
(type function next-fn))
(setf (svref *reg-starts* num) (svref *regs-maybe-start* num)
(svref *reg-ends* num) start-pos)
(funcall next-fn start-pos)))
;; the inner matcher is a closure corresponding to the regex
;; wrapped by this REGISTER
(let ((inner-matcher (create-matcher-aux (regex register)
#'store-end-of-reg)))
(declare (type function inner-matcher))
;; here comes the actual closure for REGISTER
(lambda (start-pos)
(declare (type fixnum start-pos))
;; remember the old values of *REGS-START* and friends in
;; case we cannot match
(let ((old-*reg-starts* (svref *reg-starts* num))
(old-*regs-maybe-start* (svref *regs-maybe-start* num))
(old-*reg-ends* (svref *reg-ends* num)))
;; we cannot use *REGS-START* here because Perl allows
;; regular expressions like /(a|\1x)*/
(setf (svref *regs-maybe-start* num) start-pos)
(let ((next-pos (funcall inner-matcher start-pos)))
(unless next-pos
;; restore old values on failure
(setf (svref *reg-starts* num) old-*reg-starts*
(svref *regs-maybe-start* num) old-*regs-maybe-start*
(svref *reg-ends* num) old-*reg-ends*))
next-pos)))))))
(defmethod create-matcher-aux ((lookahead lookahead) next-fn)
;; create a closure which just checks for the inner regex and
;; doesn't care about NEXT-FN
(let ((test-matcher (create-matcher-aux (regex lookahead) #'identity)))
(declare (type function next-fn test-matcher))
(if (positivep lookahead)
;; positive look-ahead: check success of inner regex, then call
;; NEXT-FN
(lambda (start-pos)
(and (funcall test-matcher start-pos)
(funcall next-fn start-pos)))
;; negative look-ahead: check failure of inner regex, then call
;; NEXT-FN
(lambda (start-pos)
(and (not (funcall test-matcher start-pos))
(funcall next-fn start-pos))))))
(defmethod create-matcher-aux ((lookbehind lookbehind) next-fn)
(let ((len (len lookbehind))
;; create a closure which just checks for the inner regex and
;; doesn't care about NEXT-FN
(test-matcher (create-matcher-aux (regex lookbehind) #'identity)))
(declare (type function next-fn test-matcher)
(type fixnum len))
(if (positivep lookbehind)
;; positive look-behind: check success of inner regex (if we're
;; far enough from the start of *STRING*), then call NEXT-FN
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (>= (- start-pos *start-pos*) len)
(funcall test-matcher (- start-pos len))
(funcall next-fn start-pos)))
;; negative look-behind: check failure of inner regex (if we're
;; far enough from the start of *STRING*), then call NEXT-FN
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (or (< start-pos len)
(not (funcall test-matcher (- start-pos len))))
(funcall next-fn start-pos))))))
(defmacro insert-char-class-tester ((char-class chr-expr) &body body)
"Utility macro to replace each occurence of '(CHAR-CLASS-TEST)
within BODY with the correct test (corresponding to CHAR-CLASS)
against CHR-EXPR."
(let ((=char-class= (gensym)))
;; the actual substitution is done here: replace
;; '(CHAR-CLASS-TEST) with NEW
(flet ((substitute-char-class-tester (new)
(subst new '(char-class-test) body
:test #'equalp)))
`(let* ((,=char-class= ,char-class)
(hash (hash ,=char-class=))
(count (hash-table-count hash))
;; collect a list of "all" characters in the hash if
;; there aren't more than two
(key-list (if (<= count 2)
(loop for chr being the hash-keys of hash
collect chr)
nil))
downcasedp)
;; check if we can partition the hash into three ranges (or
;; less)
(multiple-value-bind (min1 max1 min2 max2 min3 max3)
(create-ranges-from-hash hash)
;; if that didn't work and CHAR-CLASS is case-insensitive we
;; try it again with every character downcased
(when (and (not min1)
(case-insensitive-p ,=char-class=))
(multiple-value-setq (min1 max1 min2 max2 min3 max3)
(create-ranges-from-hash hash :downcasep t))
(setq downcasedp t))
(cond ((= count 1)
;; hash contains exactly one character so we just
;; check for this single character; (note that this
;; actually can't happen because this case is
;; optimized away in CONVERT already...)
(let ((chr1 (first key-list)))
,@(substitute-char-class-tester
`(char= ,chr-expr chr1))))
((= count 2)
;; hash contains exactly two characters
(let ((chr1 (first key-list))
(chr2 (second key-list)))
,@(substitute-char-class-tester
`(let ((chr ,chr-expr))
(or (char= chr chr1)
(char= chr chr2))))))
((word-char-class-p ,=char-class=)
;; special-case: hash is \w, \W, [\w], [\W] or
;; something equivalent
,@(substitute-char-class-tester
`(word-char-p ,chr-expr)))
((= count +regex-char-code-limit+)
;; according to the ANSI standard we might have all
;; possible characters in the hash even if it
;; doesn't contain +CHAR-CODE-LIMIT+ characters but
;; this doesn't seem to be the case for current
;; implementations (also note that this optimization
;; implies that you must not have characters with
;; character codes beyond +REGEX-CHAR-CODE-LIMIT+ in
;; your regexes if you've changed this limit); we
;; expect the compiler to optimize this T "test"
;; away
,@(substitute-char-class-tester t))
((and downcasedp min1 min2 min3)
;; three different ranges, downcased
,@(substitute-char-class-tester
`(let ((down-chr (char-downcase ,chr-expr)))
(or (char<= min1 down-chr max1)
(char<= min2 down-chr max2)
(char<= min3 down-chr max3)))))
((and downcasedp min1 min2)
;; two ranges, downcased
,@(substitute-char-class-tester
`(let ((down-chr (char-downcase ,chr-expr)))
(or (char<= min1 down-chr max1)
(char<= min2 down-chr max2)))))
((and downcasedp min1)
;; one downcased range
,@(substitute-char-class-tester
`(char<= min1 (char-downcase ,chr-expr) max1)))
((and min1 min2 min3)
;; three ranges
,@(substitute-char-class-tester
`(let ((chr ,chr-expr))
(or (char<= min1 chr max1)
(char<= min2 chr max2)
(char<= min3 chr max3)))))
((and min1 min2)
;; two ranges
,@(substitute-char-class-tester
`(let ((chr ,chr-expr))
(or (char<= min1 chr max1)
(char<= min2 chr max2)))))
(min1
;; one range
,@(substitute-char-class-tester
`(char<= min1 ,chr-expr max1)))
(t
;; the general case; note that most of the above
;; "optimizations" are based on experiences and
;; benchmarks with CMUCL - if you're really
;; concerned with speed you might find out that the
;; general case is almost always the best one for
;; other implementations (because the speed of their
;; hash-table access in relation to other operations
;; might be better than in CMUCL)
,@(substitute-char-class-tester
`(gethash ,chr-expr hash)))))))))
(defmethod create-matcher-aux ((char-class char-class) next-fn)
(declare (type function next-fn))
;; insert a test against the current character within *STRING*
(insert-char-class-tester (char-class (schar *string* start-pos))
(if (invertedp char-class)
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (< start-pos *end-pos*)
(not (char-class-test))
(funcall next-fn (1+ start-pos))))
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (< start-pos *end-pos*)
(char-class-test)
(funcall next-fn (1+ start-pos)))))))
(defmethod create-matcher-aux ((str str) next-fn)
(declare (type fixnum *end-string-pos*)
(type function next-fn)
;; this special value is set by CREATE-SCANNER when the
;; closures are built
(special end-string))
(let* ((len (len str))
(case-insensitive-p (case-insensitive-p str))
(start-of-end-string-p (start-of-end-string-p str))
(skip (skip str))
(str (str str))
(chr (schar str 0))
(end-string (and end-string (str end-string)))
(end-string-len (if end-string
(length end-string)
nil)))
(declare (type fixnum len))
(cond ((and start-of-end-string-p case-insensitive-p)
;; closure for the first STR which belongs to the constant
;; string at the end of the regular expression;
;; case-insensitive version
(lambda (start-pos)
(declare (type fixnum start-pos end-string-len))
(let ((test-end-pos (+ start-pos end-string-len)))
(declare (type fixnum test-end-pos))
;; either we're at *END-STRING-POS* (which means that
;; it has already been confirmed that end-string
;; starts here) or we really have to test
(and (or (= start-pos *end-string-pos*)
(and (<= test-end-pos *end-pos*)
(*string*-equal end-string start-pos test-end-pos
0 end-string-len)))
(funcall next-fn (+ start-pos len))))))
(start-of-end-string-p
;; closure for the first STR which belongs to the constant
;; string at the end of the regular expression;
;; case-sensitive version
(lambda (start-pos)
(declare (type fixnum start-pos end-string-len))
(let ((test-end-pos (+ start-pos end-string-len)))
(declare (type fixnum test-end-pos))
;; either we're at *END-STRING-POS* (which means that
;; it has already been confirmed that end-string
;; starts here) or we really have to test
(and (or (= start-pos *end-string-pos*)
(and (<= test-end-pos *end-pos*)
(*string*= end-string start-pos test-end-pos
0 end-string-len)))
(funcall next-fn (+ start-pos len))))))
(skip
;; a STR which can be skipped because some other function
;; has already confirmed that it matches
(lambda (start-pos)
(declare (type fixnum start-pos))
(funcall next-fn (+ start-pos len))))
((and (= len 1) case-insensitive-p)
;; STR represent exactly one character; case-insensitive
;; version
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (< start-pos *end-pos*)
(char-equal (schar *string* start-pos) chr)
(funcall next-fn (1+ start-pos)))))
((= len 1)
;; STR represent exactly one character; case-sensitive
;; version
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (< start-pos *end-pos*)
(char= (schar *string* start-pos) chr)
(funcall next-fn (1+ start-pos)))))
(case-insensitive-p
;; general case, case-insensitive version
(lambda (start-pos)
(declare (type fixnum start-pos))
(let ((next-pos (+ start-pos len)))
(declare (type fixnum next-pos))
(and (<= next-pos *end-pos*)
(*string*-equal str start-pos next-pos 0 len)
(funcall next-fn next-pos)))))
(t
;; general case, case-sensitive version
(lambda (start-pos)
(declare (type fixnum start-pos))
(let ((next-pos (+ start-pos len)))
(declare (type fixnum next-pos))
(and (<= next-pos *end-pos*)
(*string*= str start-pos next-pos 0 len)
(funcall next-fn next-pos))))))))
(declaim (inline word-boundary-p))
(defun word-boundary-p (start-pos)
"Check whether START-POS is a word-boundary within *STRING*."
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(declare (type fixnum start-pos))
(let ((1-start-pos (1- start-pos)))
;; either the character before START-POS is a word-constituent and
;; the character at START-POS isn't...
(or (and (or (= start-pos *end-pos*)
(and (< start-pos *end-pos*)
(not (word-char-p (schar *string* start-pos)))))
(and (< 1-start-pos *end-pos*)
(<= *start-pos* 1-start-pos)
(word-char-p (schar *string* 1-start-pos))))
;; ...or vice versa
(and (or (= start-pos *start-pos*)
(and (< 1-start-pos *end-pos*)
(<= *start-pos* 1-start-pos)
(not (word-char-p (schar *string* 1-start-pos)))))
(and (< start-pos *end-pos*)
(word-char-p (schar *string* start-pos)))))))
(defmethod create-matcher-aux ((word-boundary word-boundary) next-fn)
(declare (type function next-fn))
(if (negatedp word-boundary)
(lambda (start-pos)
(and (not (word-boundary-p start-pos))
(funcall next-fn start-pos)))
(lambda (start-pos)
(and (word-boundary-p start-pos)
(funcall next-fn start-pos)))))
(defmethod create-matcher-aux ((everything everything) next-fn)
(declare (type function next-fn))
(if (single-line-p everything)
;; closure for single-line-mode: we really match everything, so we
;; just advance the index into *STRING* by one and carry on
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (< start-pos *end-pos*)
(funcall next-fn (1+ start-pos))))
;; not single-line-mode, so we have to make sure we don't match
;; #\Newline
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (< start-pos *end-pos*)
(char/= (schar *string* start-pos) #\Newline)
(funcall next-fn (1+ start-pos))))))
(defmethod create-matcher-aux ((anchor anchor) next-fn)
(declare (type function next-fn))
(let ((startp (startp anchor))
(multi-line-p (multi-line-p anchor)))
(cond ((no-newline-p anchor)
;; this must be and end-anchor and it must be modeless, so
;; we just have to check whether START-POS equals
;; *END-POS*
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (= start-pos *end-pos*)
(funcall next-fn start-pos))))
((and startp multi-line-p)
;; a start-anchor in multi-line-mode: check if we're at
;; *START-POS* or if the last character was #\Newline
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (or (= start-pos *start-pos*)
(and (<= start-pos *end-pos*)
(> start-pos *start-pos*)
(char= #\Newline
(schar *string* (1- start-pos)))))
(funcall next-fn start-pos))))
(startp
;; a start-anchor which is not in multi-line-mode, so just
;; check whether we're at *START-POS*
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (= start-pos *start-pos*)
(funcall next-fn start-pos))))
(multi-line-p
;; an end-anchor in multi-line-mode: check if we're at
;; *END-POS* or if the character we're looking at is
;; #\Newline
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (or (= start-pos *end-pos*)
(and (< start-pos *end-pos*)
(char= #\Newline
(schar *string* start-pos))))
(funcall next-fn start-pos))))
(t
;; an end-anchor which is not in multi-line-mode, so just
;; check if we're at *END-POS* or if we're looking at
;; #\Newline and there's nothing behind it
(lambda (start-pos)
(declare (type fixnum start-pos))
(and (or (= start-pos *end-pos*)
(and (= start-pos (1- *end-pos*))
(char= #\Newline
(schar *string* start-pos))))
(funcall next-fn start-pos)))))))
(defmethod create-matcher-aux ((back-reference back-reference) next-fn)
(declare (type function next-fn))
;; the position of the corresponding REGISTER within the whole
;; regex; we start to count at 0
(let ((num (num back-reference)))
(if (case-insensitive-p back-reference)
;; the case-insensitive version
(lambda (start-pos)
(declare (type fixnum start-pos))
(let ((reg-start (svref *reg-starts* num))
(reg-end (svref *reg-ends* num)))
;; only bother to check if the corresponding REGISTER as
;; matched successfully already
(and reg-start
(let ((next-pos (+ start-pos (- (the fixnum reg-end)
(the fixnum reg-start)))))
(declare (type fixnum next-pos))
(and
(<= next-pos *end-pos*)
(*string*-equal *string* start-pos next-pos
reg-start reg-end)
(funcall next-fn next-pos))))))
;; the case-sensitive version
(lambda (start-pos)
(declare (type fixnum start-pos))
(let ((reg-start (svref *reg-starts* num))
(reg-end (svref *reg-ends* num)))
;; only bother to check if the corresponding REGISTER as
;; matched successfully already
(and reg-start
(let ((next-pos (+ start-pos (- (the fixnum reg-end)
(the fixnum reg-start)))))
(declare (type fixnum next-pos))
(and
(<= next-pos *end-pos*)
(*string*= *string* start-pos next-pos
reg-start reg-end)
(funcall next-fn next-pos)))))))))
(defmethod create-matcher-aux ((branch branch) next-fn)
(with-slots ((test test))
branch
(let ((then-matcher (create-matcher-aux (then-regex branch) next-fn))
(else-matcher (create-matcher-aux (else-regex branch) next-fn)))
(declare (type function then-matcher else-matcher))
(cond ((numberp test)
(lambda (start-pos)
(declare (type fixnum test))
(if (and (< test (length *reg-starts*))
(svref *reg-starts* test))
(funcall then-matcher start-pos)
(funcall else-matcher start-pos))))
(t
(let ((test-matcher (create-matcher-aux test #'identity)))
(declare (type function test-matcher))
(lambda (start-pos)
(if (funcall test-matcher start-pos)
(funcall then-matcher start-pos)
(funcall else-matcher start-pos)))))))))
(defmethod create-matcher-aux ((standalone standalone) next-fn)
(let ((inner-matcher (create-matcher-aux (regex standalone) #'identity)))
(declare (type function next-fn inner-matcher))
(lambda (start-pos)
(let ((next-pos (funcall inner-matcher start-pos)))
(and next-pos
(funcall next-fn next-pos))))))
(defmethod create-matcher-aux ((void void) next-fn)
;; optimize away VOIDs: don't create a closure, just return NEXT-FN
next-fn)
;; Local variables:
;; eval: (put 'insert-char-class-tester 'common-lisp-indent-function '((&whole 4 1) &body))

View File

@@ -0,0 +1,694 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/convert.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; Here the parse tree is converted into its internal representation
;;; using REGEX objects. At the same time some optimizations are
;;; already applied.
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-PPCRE")
(defun pre-flatten (list token)
(declare (optimize speed space))
"Recursively merges nested sublists of LIST which start with TOKEN
directly into LIST. This is a destructive operation on LIST."
(cond ((null list) nil)
((and (consp (first list))
(eq token (first (first list))))
(nconc (pre-flatten (rest (first list)) token)
(pre-flatten (rest list) token)))
(t
(setf (rest list)
(pre-flatten (rest list) token))
list)))
;;; The flags that represent the "ism" modifiers are always kept
;;; together in a three-element list. We use the following macros to
;;; access individual elements.
(defmacro case-insensitive-mode-p (flags)
"Accessor macro to extract the first flag out of a three-element flag list."
`(first ,flags))
(defmacro multi-line-mode-p (flags)
"Accessor macro to extract the second flag out of a three-element flag list."
`(second ,flags))
(defmacro single-line-mode-p (flags)
"Accessor macro to extract the third flag out of a three-element flag list."
`(third ,flags))
(defun set-flag (token)
(declare (optimize speed space))
(declare (special flags))
"Reads a flag token and sets or unsets the corresponding entry in
the special FLAGS list."
(case token
((:case-insensitive-p)
(setf (case-insensitive-mode-p flags) t))
((:case-sensitive-p)
(setf (case-insensitive-mode-p flags) nil))
((:multi-line-mode-p)
(setf (multi-line-mode-p flags) t))
((:not-multi-line-mode-p)
(setf (multi-line-mode-p flags) nil))
((:single-line-mode-p)
(setf (single-line-mode-p flags) t))
((:not-single-line-mode-p)
(setf (single-line-mode-p flags) nil))
(otherwise
(error "Unknown flag token ~A" token))))
(defun add-range-to-hash (hash from to)
(declare (optimize speed space))
(declare (special flags))
"Adds all characters from character FROM to character TO (inclusive)
to the char class hash HASH. Does the right thing with respect to
case-(in)sensitivity as specified by the special variable FLAGS."
(let ((from-code (char-code from))
(to-code (char-code to)))
(when (> from-code to-code)
(error "Invalid range from ~A to ~A in char-class" from to))
(if (case-insensitive-mode-p flags)
(loop for code from from-code to to-code
for chr = (code-char code)
do (setf (gethash (char-upcase chr) hash) 1
(gethash (char-downcase chr) hash) 1))
(loop for code from from-code to to-code
do (setf (gethash (code-char code) hash) 1)))
hash))
(defun convert-char-class-to-hash (list)
(declare (optimize speed space))
"Combines all items in LIST into one char class hash and returns it.
Items can be single characters, character ranges like (:RANGE #\A #\E),
or special character classes like :DIGIT-CLASS. Does the right thing
with respect to case-(in)sensitivity as specified by the special
variable FLAGS."
(loop with hash = (make-hash-table)
for item in list
if (characterp item)
;; treat a single character C like a range (:RANGE C C)
do (add-range-to-hash hash item item)
else if (symbolp item)
;; special character classes
do (case item
((:digit-class)
(merge-hash hash +digit-hash+))
((:non-digit-class)
(merge-inverted-hash hash +digit-hash+))
((:whitespace-char-class)
(merge-hash hash +whitespace-char-hash+))
((:non-whitespace-char-class)
(merge-inverted-hash hash +whitespace-char-hash+))
((:word-char-class)
(merge-hash hash +word-char-hash+))
((:non-word-char-class)
(merge-inverted-hash hash +word-char-hash+))
(otherwise
(error "Unknown symbol ~A in character class" item)))
else if (and (consp item)
(eq (car item) :range))
;; proper ranges
do (add-range-to-hash hash
(second item)
(third item))
else do (error "Unknown item ~A in char-class list" item)
finally (return hash)))
(defun maybe-split-repetition (regex
greedyp
minimum
maximum
min-len
length
reg-seen)
(declare (optimize speed space))
"Splits a REPETITION object into a constant and a varying part if
applicable, i.e. something like
a{3,} -> a{3}a*
The arguments to this function correspond to the REPETITION slots of
the same name."
;; note the usage of COPY-REGEX here; we can't use the same REGEX
;; object in both REPETITIONS because they will have different
;; offsets
(when maximum
(when (zerop maximum)
;; trivial case: don't repeat at all
(return-from maybe-split-repetition
(make-instance 'void)))
(when (= 1 minimum maximum)
;; another trivial case: "repeat" exactly once
(return-from maybe-split-repetition
regex)))
;; first set up the constant part of the repetition
;; maybe that's all we need
(let ((constant-repetition (if (plusp minimum)
(make-instance 'repetition
:regex (copy-regex regex)
:greedyp greedyp
:minimum minimum
:maximum minimum
:min-len min-len
:len length
:contains-register-p reg-seen)
nil)))
(when (and maximum
(= maximum minimum))
(return-from maybe-split-repetition
;; no varying part needed because min = max
constant-repetition))
;; now construct the varying part
(let* ((new-maximum (if maximum (- maximum minimum) nil))
(varying-repetition
(make-instance 'repetition
:regex regex
:greedyp greedyp
:minimum 0
:maximum new-maximum
:min-len min-len
:len length
:contains-register-p reg-seen)))
(cond ((zerop minimum)
;; min = 0, no constant part needed
varying-repetition)
((= 1 minimum)
;; min = 1, constant part needs no REPETITION wrapped around
(make-instance 'seq
:elements (list (copy-regex regex)
varying-repetition)))
(t
;; general case
(make-instance 'seq
:elements (list constant-repetition
varying-repetition)))))))
;; During the conversion of the parse tree we keep track of the start
;; of the parse tree in the special variable STARTS-WITH which'll
;; either hold a STR object or an EVERYTHING object. The latter is the
;; case if the regex starts with ".*" which implicitely anchors the
;; regex at the start (perhaps modulo #\Newline).
(defmethod maybe-accumulate ((str str))
(declare (optimize speed space))
(declare (special accumulate-start-p starts-with))
"Accumulate STR into the special variable STARTS-WITH if
ACCUMULATE-START-P (also special) is true and STARTS-WITH is either
NIL or a STR object of the same case mode. Always returns NIL."
(when accumulate-start-p
(etypecase starts-with
(str
;; STARTS-WITH already holds a STR, so we check if we can
;; concatenate
(if (eq (case-insensitive-p starts-with)
(case-insensitive-p str))
;; we modify STARTS-WITH in place
(setf (len starts-with)
(+ (len starts-with) (len str))
(str starts-with)
(concatenate 'string (str starts-with) (str str))
;; STR objects that are parts of STARTS-WITH always
;; have their SKIP slot set to true because the SCAN
;; function will take care of them, i.e. the matcher
;; can ignore them
(skip str) t)
(setq accumulate-start-p nil)))
(null
;; STARTS-WITH is still empty, so we create a new STR object
(setf starts-with
(make-instance 'str
:str (str str)
:case-insensitive-p (case-insensitive-p str))
;; see remark about SKIP above
(skip str) t))
(everything
;; STARTS-WITH already holds an EVERYTHING object - we can't
;; concatenate
(setq accumulate-start-p nil))))
nil)
(defun convert-aux (parse-tree)
(declare (optimize speed space))
(declare (special flags reg-num accumulate-start-p starts-with max-back-ref))
"Converts the parse tree PARSE-TREE into a REGEX object and returns it.
Will also
- split and optimize repetitions,
- accumulate strings or EVERYTHING objects into the special variable
STARTS-WITH,
- keep track of all registers seen in the special variable REG-NUM,
- keep track of the highest backreference seen in the special
variable MAX-BACK-REF,
- maintain and adher to the currently applicable modifiers in the special
variable FLAGS, and
- maybe even wash your car..."
(cond ((consp parse-tree)
(case (first parse-tree)
;; (:SEQUENCE {<regex>}*)
((:sequence)
(make-instance 'seq
:elements (mapcar #'convert-aux
(pre-flatten (rest parse-tree)
:sequence))))
;; (:GROUP {<regex>}*)
;; this is a syntactical construct equivalent to :SEQUENCE
;; intended to keep the effect of modifiers local
((:group)
;; make a local copy of FLAGS and shadow the global
;; value while we descend into the enclosed regexes
(let ((flags (copy-list flags)))
(declare (special flags))
(make-instance 'seq
:elements (mapcar #'convert-aux
(pre-flatten (rest parse-tree)
:sequence)))))
;; (:ALTERNATION {<regex>}*)
((:alternation)
;; we must stop accumulating objects into STARTS-WITH
;; once we reach an alternation
(setq accumulate-start-p nil)
(make-instance 'alternation
:choices (mapcar #'convert-aux
(pre-flatten (rest parse-tree)
:alternation))))
;; (:BRANCH <test> <regex>)
;; <test> must be look-ahead, look-behind or number;
;; if <regex> is an alternation it must have one or two
;; choices
((:branch)
(setq accumulate-start-p nil)
(let* ((test-candidate (second parse-tree))
(test (cond ((numberp test-candidate)
(when (zerop test-candidate)
(error "Register 0 doesn't exist: ~S"
parse-tree))
(1- test-candidate))
(t (convert-aux test-candidate))))
(alternations (convert-aux (third parse-tree))))
(when (and (not (numberp test))
(not (typep test 'lookahead))
(not (typep test 'lookbehind)))
(error "Branch test must be look-ahead, look-behind or number: ~S"
parse-tree))
(typecase alternations
(alternation
(case (length (choices alternations))
((0)
(error "No choices in branch: ~S" parse-tree))
((1)
(make-instance 'branch
:test test
:then-regex (first
(choices alternations))))
((2)
(make-instance 'branch
:test test
:then-regex (first
(choices alternations))
:else-regex (second
(choices alternations))))
(otherwise
(error "Too much choices in branch: ~S"
parse-tree))))
(otherwise
(make-instance 'branch
:test test
:then-regex alternations)))))
;; (:POSITIVE-LOOKAHEAD|:NEGATIVE-LOOKAHEAD <regex>)
((:positive-lookahead :negative-lookahead)
;; keep the effect of modifiers local to the enclosed
;; regex and temporarily stop accumulating into
;; STARTS-WITH
(let ((flags (copy-list flags))
(accumulate-start-p nil))
(declare (special flags accumulate-start-p))
(make-instance 'lookahead
:regex (convert-aux (second parse-tree))
:positivep (eq (first parse-tree)
:positive-lookahead))))
;; (:POSITIVE-LOOKBEHIND|:NEGATIVE-LOOKBEHIND <regex>)
((:positive-lookbehind :negative-lookbehind)
;; keep the effect of modifiers local to the enclosed
;; regex and temporarily stop accumulating into
;; STARTS-WITH
(let* ((flags (copy-list flags))
(accumulate-start-p nil)
(regex (convert-aux (second parse-tree)))
(len (regex-length regex)))
(declare (special flags accumulate-start-p))
;; lookbehind assertions must be of fixed length
(unless len
(error "Variable length look-behind not implemented (yet): ~S"
parse-tree))
(make-instance 'lookbehind
:regex regex
:positivep (eq (first parse-tree)
:positive-lookbehind)
:len len)))
;; (:GREEDY-REPETITION|:NON-GREEDY-REPETITION <min> <max> <regex>)
((:greedy-repetition :non-greedy-repetition)
;; remember the value of ACCUMULATE-START-P upon entering
(let ((local-accumulate-start-p accumulate-start-p))
(let ((minimum (second parse-tree))
(maximum (third parse-tree)))
(unless (and maximum
(= 1 minimum maximum))
;; set ACCUMULATE-START-P to NIL for the rest of
;; the conversion because we can't continue to
;; accumulate inside as well as after a proper
;; repetition
(setq accumulate-start-p nil))
(let* (reg-seen
(regex (convert-aux (fourth parse-tree)))
(min-len (regex-min-length regex))
(greedyp (eq (first parse-tree) :greedy-repetition))
(length (regex-length regex)))
;; note that this declaration already applies to
;; the call to CONVERT-AUX above
(declare (special reg-seen))
(when (and local-accumulate-start-p
(not starts-with)
(zerop minimum)
(not maximum))
;; if this repetition is (equivalent to) ".*"
;; and if we're at the start of the regex we
;; remember it for ADVANCE-FN (see the SCAN
;; function)
(setq starts-with (everythingp regex)))
(if (or (not reg-seen)
(not greedyp)
(not length)
(zerop length)
(and maximum (= minimum maximum)))
;; the repetition doesn't enclose a register, or
;; it's not greedy, or we can't determine it's
;; (inner) length, or the length is zero, or the
;; number of repetitions is fixed; in all of
;; these cases we don't bother to optimize
(maybe-split-repetition regex
greedyp
minimum
maximum
min-len
length
reg-seen)
;; otherwise we make a transformation that looks
;; roughly like one of
;; <regex>* -> (?:<regex'>*<regex>)?
;; <regex>+ -> <regex'>*<regex>
;; where the trick is that as much as possible
;; registers from <regex> are removed in
;; <regex'>
(let* (reg-seen ; new instance for REMOVE-REGISTERS
(remove-registers-p t)
(inner-regex (remove-registers regex))
(inner-repetition
;; this is the "<regex'>" part
(maybe-split-repetition inner-regex
;; always greedy
t
;; reduce minimum by 1
;; unless it's already 0
(if (zerop minimum)
0
(1- minimum))
;; reduce maximum by 1
;; unless it's NIL
(and maximum
(1- maximum))
min-len
length
reg-seen))
(inner-seq
;; this is the "<regex'>*<regex>" part
(make-instance 'seq
:elements (list inner-repetition
regex))))
;; note that this declaration already applies
;; to the call to REMOVE-REGISTERS above
(declare (special remove-registers-p reg-seen))
;; wrap INNER-SEQ with a greedy
;; {0,1}-repetition (i.e. "?") if necessary
(if (plusp minimum)
inner-seq
(maybe-split-repetition inner-seq
t
0
1
min-len
nil
t))))))))
;; (:REGISTER <regex>)
((:register)
;; keep the effect of modifiers local to the enclosed
;; regex; also, assign the current value of REG-NUM to
;; the corresponding slot of the REGISTER object and
;; increase this counter afterwards
(let ((flags (copy-list flags))
(stored-reg-num reg-num))
(declare (special flags reg-seen))
(setq reg-seen t)
(incf reg-num)
(make-instance 'register
:regex (convert-aux (second parse-tree))
:num stored-reg-num)))
;; (:STANDALONE <regex>)
((:standalone)
;; keep the effect of modifiers local to the enclosed
;; regex
(let ((flags (copy-list flags)))
(declare (special flags))
(make-instance 'standalone
:regex (convert-aux (second parse-tree)))))
;; (:BACK-REFERENCE <number>)
((:back-reference)
(let ((backref-number (second parse-tree)))
(when (or (not (typep backref-number 'fixnum))
(<= backref-number 0))
(error "Illegal back-reference: ~S" parse-tree))
;; stop accumulating into STARTS-WITH and increase
;; MAX-BACK-REF if necessary
(setq accumulate-start-p nil
max-back-ref (max max-back-ref
backref-number))
(make-instance 'back-reference
;; we start counting from 0 internally
:num (1- backref-number)
:case-insensitive-p (case-insensitive-mode-p
flags))))
;; (:CHAR-CLASS|:INVERTED-CHAR-CLASS {<item>}*)
;; where item is one of
;; - a character
;; - a character range: (:RANGE <char1> <char2>)
;; - a special char class symbol like :DIGIT-CHAR-CLASS
((:char-class :inverted-char-class)
;; first create the hash-table and some auxiliary values
(let* ((item-list (rest parse-tree))
(hash (convert-char-class-to-hash item-list))
(invertedp (eq (first parse-tree) :inverted-char-class))
(count (hash-table-count hash))
;; collect the hash-table keys into a list if
;; COUNT is smaller than 3
(hash-keys (if (<= count 2)
(loop for chr being the hash-keys of hash
collect chr)
nil))
(word-char-class-p nil))
(when (every (lambda (item) (eq item :word-char-class))
item-list)
;; treat "[\\w]" like "\\w"
(setq word-char-class-p t))
(when (every (lambda (item) (eq item :non-word-char-class))
item-list)
;; treat "[\\W]" like "\\W"
(setq word-char-class-p t)
(setq invertedp (not invertedp)))
(cond ((and (not invertedp)
(= count 1))
;; convert one-element hash table into a STR
;; object and try to accumulate into
;; STARTS-WITH
(let ((str (make-instance 'str
:str (string
(first hash-keys))
:case-insensitive-p nil)))
(maybe-accumulate str)
str))
((and (not invertedp)
(= count 2)
(char-equal (first hash-keys) (second hash-keys)))
;; convert two-element hash table into a
;; case-insensitive STR object and try to
;; accumulate into STARTS-WITH if the two
;; characters are CHAR-EQUAL
(let ((str (make-instance 'str
:str (string
(first hash-keys))
:case-insensitive-p t)))
(maybe-accumulate str)
str))
(t
;; the general case; stop accumulating into STARTS-WITH
(setq accumulate-start-p nil)
(make-instance 'char-class
:hash hash
:case-insensitive-p
(case-insensitive-mode-p flags)
:invertedp invertedp
:word-char-class-p word-char-class-p)))))
;; (:FLAGS {<flag>}*)
;; where flag is a modifier symbol like :CASE-INSENSITIVE-P
((:flags)
;; set/unset the flags corresponding to the symbols
;; following :FLAGS
(mapcar #'set-flag (rest parse-tree))
;; we're only interested in the side effect of
;; setting/unsetting the flags and turn this syntactical
;; construct into a VOID object which'll be optimized
;; away when creating the matcher
(make-instance 'void))
(otherwise
(error "Unknown token ~A in parse-tree" (first parse-tree)))))
((or (characterp parse-tree) (stringp parse-tree))
;; turn characters or strings into STR objects and try to
;; accumulate into STARTS-WITH
(let ((str (make-instance 'str
:str (string parse-tree)
:case-insensitive-p
(case-insensitive-mode-p flags))))
(maybe-accumulate str)
str))
(t
;; and now for the tokens which are symbols
(case parse-tree
((:void)
(make-instance 'void))
((:word-boundary)
(make-instance 'word-boundary :negatedp nil))
((:non-word-boundary)
(make-instance 'word-boundary :negatedp t))
;; the special character classes
((:digit-class
:non-digit-class
:word-char-class
:non-word-char-class
:whitespace-char-class
:non-whitespace-char-class)
;; stop accumulating into STARTS-WITH
(setq accumulate-start-p nil)
(make-instance 'char-class
;; use the constants defined in util.lisp
:hash (case parse-tree
((:digit-class
:non-digit-class)
+digit-hash+)
((:word-char-class
:non-word-char-class)
+word-char-hash+)
((:whitespace-char-class
:non-whitespace-char-class)
+whitespace-char-hash+))
;; this value doesn't really matter but
;; NIL should result in slightly faster
;; matchers
:case-insensitive-p nil
:invertedp (member parse-tree
'(:non-digit-class
:non-word-char-class
:non-whitespace-char-class)
:test #'eq)
:word-char-class-p (member parse-tree
'(:word-char-class
:non-word-char-class)
:test #'eq)))
((:start-anchor ; Perl's "^"
:end-anchor ; Perl's "$"
:modeless-end-anchor-no-newline
; Perl's "\z"
:modeless-start-anchor ; Perl's "\A"
:modeless-end-anchor) ; Perl's "\Z"
(make-instance 'anchor
:startp (member parse-tree
'(:start-anchor
:modeless-start-anchor)
:test #'eq)
;; set this value according to the
;; current settings of FLAGS (unless it's
;; a modeless anchor)
:multi-line-p
(and (multi-line-mode-p flags)
(not (member parse-tree
'(:modeless-start-anchor
:modeless-end-anchor
:modeless-end-anchor-no-newline)
:test #'eq)))
:no-newline-p
(eq parse-tree
:modeless-end-anchor-no-newline)))
((:everything)
;; stop accumulating into STARTS-WITHS
(setq accumulate-start-p nil)
(make-instance 'everything
:single-line-p (single-line-mode-p flags)))
;; special tokens corresponding to Perl's "ism" modifiers
((:case-insensitive-p
:case-sensitive-p
:multi-line-mode-p
:not-multi-line-mode-p
:single-line-mode-p
:not-single-line-mode-p)
;; we're only interested in the side effect of
;; setting/unsetting the flags and turn these tokens
;; into VOID objects which'll be optimized away when
;; creating the matcher
(set-flag parse-tree)
(make-instance 'void))
(otherwise
(error "Unknown token ~A in parse-tree" parse-tree))))))
(defun convert (parse-tree)
(declare (optimize speed space))
"Converts the parse tree PARSE-TREE into an equivalent REGEX object
and returns three values: the REGEX object, the number of registers
seen and an object the regex starts with which is either a STR object
or an EVERYTHING object (if the regex starts with something like
\".*\") or NIL."
;; this function basically just initializes the special variables
;; and then calls CONVERT-AUX to do all the work
(let* ((flags (list nil nil nil))
(reg-num 0)
(accumulate-start-p t)
starts-with
(max-back-ref 0)
(converted-parse-tree (convert-aux parse-tree)))
(declare (special flags reg-num accumulate-start-p starts-with max-back-ref))
;; make sure we don't reference registers which aren't there
(if (> max-back-ref reg-num)
(error "Backreference to register ~A which has not been defined" max-back-ref))
(values converted-parse-tree reg-num starts-with)))

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,10 @@
(setq *compile-verbose* nil)
(setq *load-verbose* nil)
#+(or) (pushnew "/home/edi/parker/" mk:*central-registry*)
#+(or) (mk:load-system "regex2" :compile-during-load t)
(mk:compile-system "cl-ppcre-test")
(load "testdata.lisp")
(dribble "results")
#+(or) (cl-ppcre-test::time-all-parker)
(cl-ppcre-test:test)
(quit)

View File

@@ -0,0 +1,704 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/lexer.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; The lexer's responsibility is to convert the regex string into a
;;; sequence of tokens which are in turn consumed by the parser.
;;;
;;; The lexer is aware of Perl's 'extended mode' and it also 'knows'
;;; (with a little help from the parser) how many register groups it
;;; has opened so far. (The latter is necessary for interpreting
;;; strings like "\\10" correctly.)
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-PPCRE")
(defun fix-pos (pos)
(declare (optimize speed space))
(declare (type fixnum pos))
"Will fix positions reported by error messages so that they match the
original regex string and not the one prefixed by modifiers like (?i)."
(- pos *error-msg-offset*))
(defun map-char-to-special-char-class (chr)
(declare (optimize speed space))
"Maps escaped characters like \"\\d\" to the tokens which represent
their associated character classes."
(case chr
((#\d)
:digit-class)
((#\D)
:non-digit-class)
((#\w)
:word-char-class)
((#\W)
:non-word-char-class)
((#\s)
:whitespace-char-class)
((#\S)
:non-whitespace-char-class)))
(defclass lexer ()
((str :initarg :str
:reader str
:type string
:documentation "The regex string which is lexed by this lexer.")
(len :reader len
:type fixnum
:documentation "The length of the regex string.")
(reg :initform 0
:accessor reg
:type fixnum
:documentation "The number of register groups opened so far.")
(pos :initform 0
:accessor pos
:type fixnum
:documentation "The current position within the regex string,
i.e. the next character that will be read.")
;; looks like we actually don't need a stack here,
;; remembering one position would suffice - but hey...
(last-pos :initform nil
:accessor last-pos
:documentation "A stack which holds older positions
the lexer might wish to get back to."))
(:documentation "LEXER objects are used to hold the regex string which is
currently lexed and to keep track of the lexer's state."))
(defmethod initialize-instance :after ((lexer lexer) &rest init-args)
(declare (optimize speed space))
(declare (ignore init-args))
"Computes the length of the regex string after initializing the lexer object."
(setf (slot-value lexer 'len) (length (str lexer))))
(defmethod end-of-string-p ((lexer lexer))
(declare (optimize speed space))
"Tests whether we're at the end of the regex string."
(<= (len lexer) (pos lexer)))
(defmethod next-char-non-extended ((lexer lexer))
(declare (optimize speed space))
"Returns the next character which is to be examined and updates the
POS slot. Does not respect extended mode."
(with-accessors ((pos pos) (str str))
lexer
(if (end-of-string-p lexer)
nil
(prog1
(char str pos)
(incf pos)))))
(defmethod next-char ((lexer lexer))
(declare (optimize speed space))
(declare (special extended-mode-p))
"Returns the next character which is to be examined and updates the
POS slot. Respects extended mode, i.e. whitespace, comments, and also
nested comments are skipped if applicable."
(with-slots ((pos pos))
lexer
(let ((next-char (next-char-non-extended lexer))
last-loop-pos)
(loop
;; remember where we started
(setq last-loop-pos pos)
;; first we look for nested comments like (?#foo)
(when (and next-char
(char= next-char #\()
(looking-at-p lexer #\?))
(incf pos)
(cond ((looking-at-p lexer #\#)
;; must be a nested comment - so we have to search
;; for the closing parenthesis
(let ((error-pos (- pos 2)))
(unless
;; loop 'til ')' or end of regex string and
;; return NIL if ')' wasn't encountered
(loop for skip-char = next-char
then (next-char-non-extended lexer)
while (and skip-char
(char/= skip-char #\)))
finally (return skip-char))
(error "Comment group started at position ~A not closed"
(fix-pos error-pos))))
(setq next-char (next-char-non-extended lexer)))
(t
;; undo effect of previous INCF if we didn't see a #
(decf pos))))
(when extended-mode-p
;; now - if we're in extended mode - we skip whitespace and
;; comments; repeat the following loop while we look at
;; whitespace or #\#
(loop while (and next-char
(or (char= next-char #\#)
(whitespacep next-char)))
do (setq next-char
(if (char= next-char #\#)
;; if we saw a comment marker skip until
;; we're behind #\Newline...
(loop for skip-char = next-char
then (next-char-non-extended lexer)
while (and skip-char
(char/= skip-char #\Newline))
finally (return (next-char-non-extended lexer)))
;; ...otherwise (whitespace) skip until
;; we see the next non-whitespace
;; character
(loop for skip-char = next-char
then (next-char-non-extended lexer)
while (and skip-char
(whitespacep skip-char))
finally (return skip-char))))))
;; if the position has moved we have to repeat our tests
;; because of cases like /^a (?#xxx) (?#yyy) {3}c/x which
;; would be equivalent to /^a{3}c/ in Perl
(unless (> pos last-loop-pos)
(return next-char))))))
(defmethod looking-at-p ((lexer lexer) chr)
(declare (optimize speed space))
"Tests whether the next character the lexer would see is CHR.
Does not respect extended mode."
(and (not (end-of-string-p lexer))
(char= (char (str lexer) (pos lexer))
chr)))
(defmethod fail ((lexer lexer))
(declare (optimize speed space))
"Moves (POS LEXER) back to the last position stored in (LAST-POS LEXER)
and pops the LAST-POS stack."
(with-accessors ((pos pos) (last-pos last-pos))
lexer
(unless last-pos
(error "LAST-POS stack of LEXER ~A is empty" lexer))
(setq pos (pop last-pos))
nil))
(defmethod get-number ((lexer lexer) &key (radix 10) max-length no-whitespace-p)
(declare (optimize speed space))
"Read and consume the number the lexer is currently looking at and
return it. Returns NIL if no number could be identified.
RADIX is used as in PARSE-INTEGER. If MAX-LENGTH is not NIL we'll read
at most the next MAX-LENGTH characters. If NO-WHITESPACE-P is not NIL
we don't tolerate whitespace in front of the number."
(when (and no-whitespace-p
(not (end-of-string-p lexer))
(whitespacep (char (str lexer) (pos lexer))))
(return-from get-number nil))
(multiple-value-bind (integer new-pos)
(parse-integer (str lexer)
:start (pos lexer)
:end (if max-length
(let ((end-pos (+ (pos lexer) max-length)))
(if (< end-pos (len lexer))
end-pos
nil))
nil)
:radix radix
:junk-allowed t)
(cond ((and integer (>= integer 0))
(setf (pos lexer) new-pos)
integer)
(t nil))))
(defmethod try-number ((lexer lexer) &key (radix 10) max-length no-whitespace-p)
"Like GET-NUMBER but won't consume anything if no number is seen."
(declare (optimize speed space))
;; remember current position
(push (pos lexer) (last-pos lexer))
(let ((number (get-number lexer
:radix radix
:max-length max-length
:no-whitespace-p no-whitespace-p)))
(or number (fail lexer))))
(defun make-char-from-code (number error-pos)
(declare (optimize speed space))
"Create character from char-code NUMBER. NUMBER can be NIL
which is interpreted as 0. ERROR-POS is the position where
the corresponding number started within the regex string."
;; Only look at rightmost eight bits in compliance with Perl
(let ((code (logand #o377 (or number 0))))
(or (and (< code char-code-limit)
(code-char code))
(error "No character for hex-code ~X at position ~A~%"
number (fix-pos error-pos)))))
(defmethod unescape-char ((lexer lexer))
(declare (optimize speed space))
(declare (special extended-mode-p))
"Convert the characters(s) following a backslash into a token
which is returned. This function is to be called when the backslash
has already been consumed. Special character classes like \\W are
handled elsewhere."
(when (end-of-string-p lexer)
(error "String ends with backslash"))
(let ((chr (next-char-non-extended lexer)))
(case chr
((#\c)
;; \cx means control-x in Perl
(let ((next-char (next-char-non-extended lexer)))
(unless next-char
(error "Character missing after '\\c' at position ~A"
(fix-pos (pos lexer))))
(code-char (logxor #x40 (char-code (char-upcase next-char))))))
((#\x)
;; \x should be followed by a hexadecimal char code,
;; two digits or less
(let* ((error-pos (pos lexer))
(number (get-number lexer :radix 16 :max-length 2 :no-whitespace-p t)))
;; note that it is OK if \x is followed by zero digits
(make-char-from-code number error-pos)))
((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9)
;; \x should be followed by an octal char code,
;; three digits or less
(let* ((error-pos (decf (pos lexer)))
(number (get-number lexer :radix 8 :max-length 3)))
(make-char-from-code number error-pos)))
;; the following five character names are 'semi-standard'
;; according to the CLHS but I'm not aware of any implementation
;; that doesn't implement them
((#\t)
#\Tab)
((#\n)
#\Newline)
((#\r)
#\Return)
((#\f)
#\Page)
((#\b)
#\Backspace)
((#\a)
(code-char 7)) ; ASCII bell
((#\e)
(code-char 27)) ; ASCII escape
(otherwise
;; all other characters aren't affected by a backslash
chr))))
(defmethod collect-char-class ((lexer lexer))
(declare (optimize speed space))
"Reads and consumes characters from regex string until a right
bracket is seen. Assembles them into a list (which is returned) of
characters, character ranges, like (:RANGE #\A #\E) for a-e, and
tokens representing special character classes."
(let ((start-pos (pos lexer)) ; remember start for error message
hyphen-seen
last-char
list)
(flet ((handle-char (c)
"Do the right thing with character C depending on whether
we're inside a range or not."
(cond ((and hyphen-seen last-char)
(pop list)
(push (list :range last-char c) list)
(setq last-char nil))
(t
(push c list)
(setq last-char c)))
(setq hyphen-seen nil)))
(loop for first = t then nil
for c = (next-char-non-extended lexer)
;; leave loop if at end of string
while c
do (cond
((char= c #\\)
;; we've seen a backslash
(let ((next-char (next-char-non-extended lexer)))
(case next-char
((#\d #\D #\w #\W #\s #\S)
;; a special character class
(push (map-char-to-special-char-class next-char) list)
;; if the last character was a hyphen
;; just collect it literally
(when hyphen-seen
(push #\- list))
;; if the next character is a hyphen do the same
(when (looking-at-p lexer #\-)
(push #\- list)
(incf (pos lexer)))
(setq hyphen-seen nil))
(otherwise
;; otherwise unescape the following character(s)
(decf (pos lexer))
(handle-char (unescape-char lexer))))))
(first
;; the first character must not be a right bracket
;; and isn't treated specially if it's a hyphen
(handle-char c))
((char= c #\])
;; end of character class
;; make sure we collect a pending hyphen
(when hyphen-seen
(setq hyphen-seen nil)
(handle-char #\-))
;; reverse the list to preserve the order intended
;; by the author of the regex string
(return-from collect-char-class (nreverse list)))
((and (char= c #\-)
last-char
(not hyphen-seen))
;; if the last character was 'just a character'
;; we expect to be in the middle of a range
(setq hyphen-seen t))
((char= c #\-)
;; otherwise this is just an ordinary hyphen
(handle-char #\-))
(t
;; default case - just collect the character
(handle-char c))))
;; we can only exit the loop normally if we've reached the end
;; of the regex string without seeing a right bracket
(error "Missing right bracket to close character class started at pos ~A"
(fix-pos start-pos)))))
(defmethod maybe-parse-flags ((lexer lexer) test-only)
(declare (optimize speed space))
(declare (special extended-mode-p))
"Reads a sequence of modifiers (including #\- to reverse their
meaning) and returns a corresponding list of \"flag\" tokens. The
\"x\" modifier is treated specially in that it dynamically modifies
the behaviour of the lexer itself via the special variable
EXTENDED-MODE-P (unless TEST-ONLY is true)."
(prog1
(loop with set = t
for chr = (next-char-non-extended lexer)
unless chr
do (error "Unexpected end of string")
while (find chr "-imsx" :test #'char=)
;; the first #\- will invert the meaning of all modifiers
;; following it
if (char= chr #\-)
do (setq set nil)
else if (and (char= chr #\x)
(not test-only))
;; only modify current setting if we're not testing
do (setq extended-mode-p set)
else collect (if set
(case chr
((#\i)
:case-insensitive-p)
((#\m)
:multi-line-mode-p)
((#\s)
:single-line-mode-p))
(case chr
((#\i)
:case-sensitive-p)
((#\m)
:not-multi-line-mode-p)
((#\s)
:not-single-line-mode-p))))
(decf (pos lexer))))
(defmethod get-quantifier ((lexer lexer))
(declare (optimize speed space))
"Returns a list of two values (min max) if what the lexer is looking
at can be interpreted as a quantifier. Otherwise returns NIL and
resets the lexer to its old position."
;; remember starting position for FAIL and UNGET-TOKEN functions
(push (pos lexer) (last-pos lexer))
(let ((next-char (next-char lexer)))
(case next-char
((#\*)
;; * (Kleene star): match 0 or more times
'(0 nil))
((#\+)
;; +: match 1 or more times
'(1 nil))
((#\?)
;; ?: match 0 or 1 times
'(0 1))
((#\{)
;; one of
;; {n}: match exactly n times
;; {n,}: match at least n times
;; {n,m}: match at least n but not more than m times
;; note that anything not matching one of these patterns will
;; be interpreted literally - even whitespace isn't allowed
(let ((num1 (get-number lexer :no-whitespace-p t)))
(if num1
(let ((next-char (next-char-non-extended lexer)))
(case next-char
((#\,)
(let* ((num2 (get-number lexer :no-whitespace-p t))
(next-char (next-char-non-extended lexer)))
(case next-char
((#\})
;; this is the case {n,} (NUM2 is NIL) or {n,m}
(list num1 num2))
(otherwise
(fail lexer)))))
((#\})
;; this is the case {n}
(list num1 num1))
(otherwise
(fail lexer))))
;; no number following left curly brace, so we treat it
;; like a normal character
(fail lexer))))
;; cannot be a quantifier
(otherwise
(fail lexer)))))
(defmethod get-token ((lexer lexer) &key test-only)
(declare (optimize speed space))
"Returns and consumes the next token from the regex string (or NIL)."
;; remember starting position for UNGET-TOKEN function
(with-accessors ((pos pos) (str str) (reg reg) (last-pos last-pos))
lexer
(push pos last-pos)
(let ((next-char (next-char lexer)))
(cond (next-char
(case next-char
;; the easy cases first - the following six characters
;; always have a special meaning and get translated
;; into tokens immediately
((#\))
:close-paren)
((#\|)
:vertical-bar)
((#\?)
:question-mark)
((#\.)
:everything)
((#\^)
:start-anchor)
((#\$)
:end-anchor)
((#\+ #\*)
;; quantifiers will always be consumend by
;; GET-QUANTIFIER, they must not appear here
(error "Quantifier '~A' not allowed at position ~A"
next-char (fix-pos (1- pos))))
((#\{)
;; left brace isn't a special character in it's own
;; right but we must check if what follows might
;; look like a quantifier
(let ((this-pos pos)
(this-last-pos last-pos))
(unget-token lexer)
(when (get-quantifier lexer)
(error "Quantifier '~A' not allowed at position ~A"
(subseq str (car this-last-pos) pos)
(fix-pos (car this-last-pos))))
(setq pos this-pos
last-pos this-last-pos)
next-char))
((#\[)
;; left bracket always starts a character class
(if test-only
;; if we're only testing the contents of the
;; character class don't really matter
'(:char-class)
(cons (cond ((looking-at-p lexer #\^)
(incf pos)
:inverted-char-class)
(t
:char-class))
(collect-char-class lexer))))
((#\\)
;; backslash might mean different things so we have
;; to peek one char ahead:
(let ((next-char (next-char-non-extended lexer)))
(case next-char
((#\A)
:modeless-start-anchor)
((#\Z)
:modeless-end-anchor)
((#\z)
:modeless-end-anchor-no-newline)
((#\b)
:word-boundary)
((#\B)
:non-word-boundary)
((#\d #\D #\w #\W #\s #\S)
;; these will be treated like character classes
(map-char-to-special-char-class next-char))
((#\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9)
;; uh, a digit...
(let* ((old-pos (decf pos))
;; ...so let's get the whole number first
(backref-number (get-number lexer)))
(cond ((and (> backref-number reg)
(<= 10 backref-number))
;; \10 and higher are treated as
;; octal character codes if we
;; haven't opened that much register
;; groups yet
(setq pos old-pos)
;; re-read the number from the old
;; position and convert it to its
;; corresponding character
(make-char-from-code (get-number lexer :radix 8 :max-length 3)
old-pos))
(t
;; otherwise this must refer to a
;; backreference
(list :back-reference backref-number)))))
((#\0)
;; this always means an octal character code
;; (at most three digits)
(let ((old-pos (decf pos)))
(make-char-from-code (get-number lexer :radix 8 :max-length 3)
old-pos)))
(otherwise
;; in all other cases just unescape the
;; character
(decf pos)
(unescape-char lexer)))))
((#\()
;; an open parenthesis might mean different things
;; depending on what follows...
(cond ((looking-at-p lexer #\?)
;; this is the case '(?' (and probably more behind)
(incf pos)
;; we have to check for modifiers first
;; because a colon might follow
(let* ((flags (maybe-parse-flags lexer test-only))
(next-char (next-char-non-extended lexer)))
;; modifiers are only allowed if a colon
;; or a closing parenthesis are following
(when (and flags
(not (find next-char ":)" :test #'char=)))
(error "Sequence '~A' not recognized at position ~A"
(subseq str (car last-pos) pos)
(fix-pos (car last-pos))))
(case next-char
((nil)
;; syntax error
(error "End of string following '(?'"))
((#\))
;; an empty group except for the flags
;; (if there are any)
(or (and flags
(cons :flags flags))
:void))
((#\()
;; branch
:open-paren-paren)
((#\>)
;; standalone
:open-paren-greater)
((#\=)
;; positive look-ahead
:open-paren-equal)
((#\!)
;; negative look-ahead
:open-paren-exclamation)
((#\:)
;; non-capturing group - return flags
;; as second value
(values :open-paren-colon flags))
((#\<)
;; might be a look-behind assertion,
;; so check next character
(let ((next-char (next-char-non-extended lexer)))
(case next-char
((#\=)
;; positive look-behind
:open-paren-less-equal)
((#\!)
;; negative look-behind
:open-paren-less-exclamation)
((#\))
;; Perl allows "(?<)" and treats
;; it like a null string
:void)
((nil)
;; syntax error
(error "End of string following '(?<'"))
(t
;; also syntax error
(error "Character '~A' may not follow '(?<' at position ~A"
next-char (fix-pos (1- pos)))))))
(otherwise
(error "Character '~A' may not follow '(?' at position ~A"
next-char (fix-pos (1- pos)))))))
(t
;; if next-char was not #\? (this is within
;; the first COND), we've just seen an
;; opening parenthesis and leave it like
;; that
:open-paren)))
(otherwise
;; all other characters are their own tokens
next-char)))
;; we didn't get a character (this if the "else" branch from
;; the first IF), so we don't return a token but NIL
(t
(pop last-pos)
nil)))))
(defmethod unget-token ((lexer lexer))
(declare (optimize speed space))
"Moves the lexer back to the last position stored in the LAST-POS stack."
(with-accessors ((pos pos) (last-pos last-pos))
lexer
(if last-pos
(setq pos (pop last-pos))
(error "No token to unget"))))
(defmethod start-of-subexpr-p ((lexer lexer))
(declare (optimize speed space))
"Tests whether the next token can start a valid sub-expression, i.e.
a stand-alone regex. So, e.g., tokens like :QUESTION-MARK are not
allowed. Note that no token (i.e. we're at the end of the regex
string) is fine."
(let ((token (get-token lexer :test-only t)))
(cond (token
(unget-token lexer)
(or (member token '(:open-paren
:open-paren-colon
:open-paren-greater
:open-paren-paren
:open-paren-equal
:open-paren-exclamation
:open-paren-less-equal
:open-paren-less-exclamation
:digit-class
:non-digit-class
:whitespace-char-class
:non-whitespace-char-class
:word-char-class
:non-word-char-class
:start-anchor
:end-anchor
:modeless-start-anchor
:modeless-end-anchor
:modeless-end-anchor-no-newline
:word-boundary
:non-word-boundary
:everything
:void)
:test #'eq)
(and (consp token)
(member (car token) '(:char-class
:inverted-char-class
:back-reference
:flags)
:test #'eq))
(characterp token)))
(t nil))))

View File

@@ -0,0 +1,468 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/optimize.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; This file contains optimizations which can be applied to converted
;;; parse trees.
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-PPCRE")
(defun string-list-to-simple-string (string-list)
(declare (optimize speed space))
"Concatenates a list of strings to one simple-string."
;; note that we can't use APPLY with CONCATENATE here because of
;; CALL-ARGUMENTS-LIMIT
(reduce #'(lambda (str1 str2)
(concatenate 'simple-string str1 str2))
string-list
:initial-value (make-string 0)))
(defgeneric flatten (regex)
(declare (optimize speed space))
(:documentation "Merges adjacent sequences and alternations, i.e. it
transforms #<SEQ #<STR \"a\"> #<SEQ #<STR \"b\"> #<STR \"c\">>> to
#<SEQ #<STR \"a\"> #<STR \"b\"> #<STR \"c\">>. This is a destructive
operation on REGEX."))
(defmethod flatten ((seq seq))
(with-slots ((elements elements))
seq
(setq elements
;; loop through all elements of the sequence
(loop for element in elements
;; flatten inner elements first
for flattened-element = (flatten element)
if (typep flattened-element 'seq)
;; "splice" sequences into collected list
nconc (elements flattened-element)
else if flattened-element
;; and collect other regexes as is
nconc (list flattened-element)))
(if elements
seq
(make-instance 'void))))
(defmethod flatten ((alternation alternation))
(with-slots ((choices choices))
alternation
(setq choices
;; loop through all choices of the alternation
(loop for choice in choices
for flattened-choice = (flatten choice)
if (typep flattened-choice 'alternation)
;; "splice" alternations into collected list
nconc (choices flattened-choice)
else
;; and collect other regexes as is
nconc (list flattened-choice)))
alternation))
(defmethod flatten ((branch branch))
(with-slots ((test test)
(then-regex then-regex)
(else-regex else-regex))
branch
(setq test
(if (numberp test)
test
(flatten test))
then-regex (flatten then-regex)
else-regex (flatten else-regex))
branch))
(defmethod flatten ((regex regex))
(typecase regex
((or repetition register lookahead lookbehind standalone)
;; if REGEX contains exactly one inner REGEX object flatten it
(setf (regex regex)
(flatten (regex regex)))
regex)
(otherwise
;; otherwise (ANCHOR, BACK-REFERENCE, CHAR-CLASS, EVERYTHING,
;; LOOKAHEAD, LOOKBEHIND, STR, VOID, and WORD-BOUNDARY) do
;; nothing
regex)))
(defgeneric gather-strings (regex)
(declare (optimize speed space))
(:documentation "Collects adjacent strings or characters into one
string provided they have the same case mode. This is a destructive
operation on REGEX."))
(defmethod gather-strings ((seq seq))
;; note that GATHER-STRINGS is to be applied after FLATTEN, i.e. it
;; expects SEQ to be flattened already; in particular, SEQ cannot be
;; empty and cannot contain embedded SEQ objects
(with-slots ((elements elements))
seq
(let ((gathered-seq
(loop with collector and skip
;; loop through all elements of SEQ
for element in elements
for old-case-mode = nil then case-mode
for case-mode = (case-mode element old-case-mode)
if (and case-mode
(eq case-mode old-case-mode))
;; if ELEMENT is a STR and we have collected a STR
;; of the same case mode in the last iteration we
;; push ELEMENT onto COLLECTOR and remember the
;; value of its SKIP slot
do (push (str element) collector)
;; it suffices to remember the last SKIP slot
;; because due to the way MAYBE-ACCUMULATE works
;; adjacent STR objects have the same SKIP value
and do (setq skip (skip element))
else
if collector
;; otherwise if we have collected something
;; already we convert it into a STR
collect (make-instance 'str
:skip skip
:str (string-list-to-simple-string
(nreverse collector))
:case-insensitive-p
(eq old-case-mode
:case-insensitive))
into result
end
and if case-mode
;; if ELEMENT is a string with a different case
;; mode than the last one we have either just
;; converted COLLECTOR into a STR or COLLECTOR
;; is still empty; in both cases we can now
;; begin to fill it anew
do (setq collector (list (str element)))
;; and we remember the SKIP value as above
and do (setq skip (skip element))
else
;; otherwise this is not a STR so we apply
;; GATHER-STRINGS to it and collect it directly
;; into RESULT
collect (gather-strings element) into result
;; we also have to empty COLLECTOR here in case
;; it was still filled from the last iteration
and do (setq collector nil)
finally (return (if collector
;; if COLLECTOR isn't empty we
;; have to convert it to a STR as
;; above and append it to RESULT
;; before we return it
(nconc result
(list
(make-instance
'str
:skip skip
:str (string-list-to-simple-string
(nreverse collector))
:case-insensitive-p
(eq case-mode
:case-insensitive))))
;; otherwise just return RESULT
result)))))
(cond ((rest gathered-seq)
;; if GATHERED-SEQ has at least two elements we set the
;; ELEMENTS slot of SEQ accordingly and return SEQ
(setq elements gathered-seq)
seq)
(t
;; otherwise it suffices to return the one element of
;; GATHERED-SEQ, i.e. we drop the enclosing SEQ
(car gathered-seq))))))
(defmethod gather-strings ((alternation alternation))
;; loop ON the choices of ALTERNATION so we can modify them directly
(loop for choices-rest on (choices alternation)
while choices-rest
do (setf (car choices-rest)
(gather-strings (car choices-rest))))
alternation)
(defmethod gather-strings ((branch branch))
(with-slots ((test test)
(then-regex then-regex)
(else-regex else-regex))
branch
(setq test
(if (numberp test)
test
(gather-strings test))
then-regex (gather-strings then-regex)
else-regex (gather-strings else-regex))
branch))
(defmethod gather-strings ((regex regex))
(typecase regex
((or repetition register lookahead lookbehind standalone)
;; if REGEX contains exactly one inner REGEX object apply
;; GATHER-STRINGS to it
(setf (regex regex)
(gather-strings (regex regex)))
regex)
(otherwise
;; otherwise (ANCHOR, BACK-REFERENCE, CHAR-CLASS, EVERYTHING,
;; LOOKAHEAD, LOOKBEHIND, STR, VOID, and WORD-BOUNDARY) do
;; nothing
regex)))
;; Note that START-ANCHORED-P will be called after FLATTEN and GATHER-STRINGS.
(defgeneric start-anchored-p (regex &optional in-seq-p)
(declare (optimize speed space))
(:documentation "Returns T if REGEX starts with a \"real\" start
anchor, i.e. one that's not in multi-line mode, NIL otherwise. If
IN-SEQ-P is true the function will return :ZERO-LENGTH if REGEX is a
zero-length assertion."))
(defmethod start-anchored-p ((seq seq) &optional in-seq-p)
(declare (ignore in-seq-p))
;; note that START-ANCHORED-P is to be applied after FLATTEN and
;; GATHER-STRINGS, i.e. SEQ cannot be empty and cannot contain
;; embedded SEQ objects
(loop for element in (elements seq)
for anchored-p = (start-anchored-p element t)
;; skip zero-length elements because they won't affect the
;; "anchoredness" of the sequence
while (eq anchored-p :zero-length)
finally (return (and anchored-p (not (eq anchored-p :zero-length))))))
(defmethod start-anchored-p ((alternation alternation) &optional in-seq-p)
(declare (ignore in-seq-p))
;; clearly an alternation can only be start-anchored if all of its
;; choices are start-anchored
(loop for choice in (choices alternation)
always (start-anchored-p choice)))
(defmethod start-anchored-p ((branch branch) &optional in-seq-p)
(declare (ignore in-seq-p))
(and (start-anchored-p (then-regex branch))
(start-anchored-p (else-regex branch))))
(defmethod start-anchored-p ((repetition repetition) &optional in-seq-p)
(declare (ignore in-seq-p))
;; well, this wouldn't make much sense, but anyway...
(and (plusp (minimum repetition))
(start-anchored-p (regex repetition))))
(defmethod start-anchored-p ((register register) &optional in-seq-p)
(declare (ignore in-seq-p))
(start-anchored-p (regex register)))
(defmethod start-anchored-p ((standalone standalone) &optional in-seq-p)
(declare (ignore in-seq-p))
(start-anchored-p (regex standalone)))
(defmethod start-anchored-p ((anchor anchor) &optional in-seq-p)
(declare (ignore in-seq-p))
(and (startp anchor)
(not (multi-line-p anchor))))
(defmethod start-anchored-p ((regex regex) &optional in-seq-p)
(typecase regex
((or lookahead lookbehind word-boundary void)
;; zero-length assertions
(if in-seq-p
:zero-length
nil))
(otherwise
;; BACK-REFERENCE, CHAR-CLASS, EVERYTHING, and STR
nil)))
;; Note that END-STRING-AUX will be called after FLATTEN and GATHER-STRINGS.
(defgeneric end-string-aux (regex &optional old-case-insensitive-p)
(declare (optimize speed space))
(:documentation "Returns the constant string (if it exists) REGEX
ends with wrapped into a STR object, otherwise NIL.
OLD-CASE-INSENSITIVE-P is the CASE-INSENSITIVE-P slot of the last STR
collected or :VOID if no STR has been collected yet. (This is a helper
function called by END-STRIN.)"))
(defmethod end-string-aux ((str str)
&optional (old-case-insensitive-p :void))
(declare (special last-str))
(cond ((and (not (skip str)) ; avoid constituents of STARTS-WITH
;; only use STR if nothing has been collected yet or if
;; the collected string has the same value for
;; CASE-INSENSITIVE-P
(or (eq old-case-insensitive-p :void)
(eq (case-insensitive-p str) old-case-insensitive-p)))
(setf last-str str
;; set the SKIP property of this STR
(skip str) t)
str)
(t nil)))
(defmethod end-string-aux ((seq seq)
&optional (old-case-insensitive-p :void))
(declare (special continuep))
(let* ((collected-strings
(nreverse
;; loop through all elements in reverse order
(loop for element in (reverse (elements seq))
;; remember the case-(in)sensitivity of the last
;; relevant STR object
for loop-old-case-insensitive-p = old-case-insensitive-p
then (if skip
loop-old-case-insensitive-p
(case-insensitive-p element-end))
;; the end-string of the current element
for element-end = (end-string-aux element
loop-old-case-insensitive-p)
;; whether we encountered a zero-length element
for skip = (if element-end
(zerop (len element-end))
nil)
;; set CONTINUEP to NIL if we have to stop
;; collecting to alert END-STRING-AUX methods on
;; enclosing SEQ objects
unless element-end
do (setq continuep nil)
;; end loop if we neither got a STR nor a
;; zero-length element
while element-end
;; only collect if not zero-length
unless skip
collect element-end
;; stop collecting if END-STRING-AUX on inner SEQ
;; has said so
while continuep)))
(concatenated-string
(string-list-to-simple-string (mapcar #'str collected-strings))))
(if (zerop (length concatenated-string))
;; don't bother to return zero-length strings
nil
(make-instance 'str
:str concatenated-string
:case-insensitive-p (case-insensitive-p
(first collected-strings))))))
(defmethod end-string-aux ((register register)
&optional (old-case-insensitive-p :void))
(end-string-aux (regex register) old-case-insensitive-p))
(defmethod end-string-aux ((standalone standalone)
&optional (old-case-insensitive-p :void))
(end-string-aux (regex standalone) old-case-insensitive-p))
(defmethod end-string-aux ((regex regex)
&optional (old-case-insensitive-p :void))
(declare (special last-str end-anchored-p continuep))
(typecase regex
((or anchor lookahead lookbehind word-boundary void)
;; a zero-length REGEX object - for the sake of END-STRING-AUX
;; this is a zero-length string
(when (and (typep regex 'anchor)
(not (startp regex))
(or (no-newline-p regex)
(not (multi-line-p regex)))
(eq old-case-insensitive-p :void))
;; if this is a "real" end-anchor and we haven't collected
;; anything so far we can set END-ANCHORED-P (where 1 or 0
;; indicate whether we accept a #\Newline at the end or not)
(setq end-anchored-p (if (no-newline-p regex) 0 1)))
(make-instance 'str
:str ""
:case-insensitive-p :void))
(otherwise
;; (ALTERNATION, BACK-REFERENCE, BRANCH, CHAR-CLASS, EVERYTHING,
;; REPETITION)
nil)))
(defmethod end-string ((regex regex))
(declare (optimize speed space))
"Returns the constant string (if it exists) REGEX ends with wrapped
into a STR object, otherwise NIL."
(declare (special end-string-offset))
;; LAST-STR points to the last STR object (seen from the end) that's
;; part of END-STRING; CONTINUEP is set to T if we stop collecting
;; in the middle of a SEQ
(let ((continuep t)
last-str)
(declare (special continuep last-str))
(prog1
(end-string-aux regex)
(when last-str
;; if we've found something set the START-OF-END-STRING-P of
;; the leftmost STR collected accordingly and remember the
;; OFFSET of this STR (in a special variable provided by the
;; caller of this function)
(setf (start-of-end-string-p last-str) t
end-string-offset (offset last-str))))))
(defgeneric compute-min-rest (regex current-min-rest)
(declare (optimize speed space))
(:documentation "Returns the minimal length of REGEX plus
CURRENT-MIN-REST. This is similar to REGEX-MIN-LENGTH except that it
recurses down into REGEX and sets the MIN-REST slots of REPETITION
objects."))
(defmethod compute-min-rest ((seq seq) current-min-rest)
(loop for element in (reverse (elements seq))
for last-min-rest = current-min-rest then this-min-rest
for this-min-rest = (compute-min-rest element last-min-rest)
finally (return this-min-rest)))
(defmethod compute-min-rest ((alternation alternation) current-min-rest)
(loop for choice in (choices alternation)
minimize (compute-min-rest choice current-min-rest)))
(defmethod compute-min-rest ((branch branch) current-min-rest)
(min (compute-min-rest (then-regex branch) current-min-rest)
(compute-min-rest (else-regex branch) current-min-rest)))
(defmethod compute-min-rest ((str str) current-min-rest)
(+ current-min-rest (len str)))
(defmethod compute-min-rest ((repetition repetition) current-min-rest)
(setf (min-rest repetition) current-min-rest)
(compute-min-rest (regex repetition) current-min-rest)
(+ current-min-rest (* (minimum repetition) (min-len repetition))))
(defmethod compute-min-rest ((register register) current-min-rest)
(compute-min-rest (regex register) current-min-rest))
(defmethod compute-min-rest ((standalone standalone) current-min-rest)
(declare (ignore current-min-rest))
(compute-min-rest (regex standalone) 0))
(defmethod compute-min-rest ((lookahead lookahead) current-min-rest)
(compute-min-rest (regex lookahead) current-min-rest)
current-min-rest)
(defmethod compute-min-rest ((lookbehind lookbehind) current-min-rest)
(compute-min-rest (regex lookbehind) (+ current-min-rest (len lookbehind)))
current-min-rest)
(defmethod compute-min-rest ((regex regex) current-min-rest)
(typecase regex
((or char-class everything)
(1+ current-min-rest))
(otherwise
;; zero min-len and no embedded regexes (ANCHOR,
;; BACK-REFERENCE, VOID, and WORD-BOUNDARY)
current-min-rest)))

View File

@@ -0,0 +1,47 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-USER; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/packages.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-USER")
(defpackage "CL-PPCRE"
(:nicknames "PPCRE")
(:use "CL")
(:export "CREATE-SCANNER"
"SCAN"
"SCAN-TO-STRINGS"
"DO-SCANS"
"DO-MATCHES"
"DO-MATCHES-AS-STRINGS"
"ALL-MATCHES"
"ALL-MATCHES-AS-STRINGS"
"SPLIT"
"REGEX-REPLACE"
"REGEX-REPLACE-ALL"
"REGEX-APROPOS"
"REGEX-APROPOS-LIST"))

View File

@@ -0,0 +1,243 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/parser.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; The parser will - with the help of the lexer - parse a regex
;;; string and convert it into a "parse tree" (see docs for details
;;; about the syntax of these trees). Note that the lexer might return
;;; illegal parse trees. It is assumed that the conversion process
;;; later on will track them down.
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-PPCRE")
(defmethod group ((lexer lexer))
(declare (optimize speed space)
(special extended-mode-p))
"Parses and consumes a <group>.
The productions are: <group> -> \"(\"<regex>\")\"
\"(?:\"<regex>\")\"
\"(?<\"<regex>\")\"
\"(?<flags>:\"<regex>\")\"
\"(?=\"<regex>\")\"
\"(?!\"<regex>\")\"
\"(?<=\"<regex>\")\"
\"(?<!\"<regex>\")\"
\"(?(\"<num>\")\"<regex>\")\"
\"(?(\"<regex>\")\"<regex>\")\"
<legal-token>
where <flags> is parsed by the lexer function MAYBE-PARSE-FLAGS.
Will return <parse-tree> or (<grouping-type> <parse-tree>) where
<grouping-type> is one of six keywords - see source for details."
;; make sure modifications of extended mode are discarded at closing
;; parenthesis
(multiple-value-bind (open-token flags)
(get-token lexer)
(cond ((eq open-token :open-paren-paren)
;; special case for conditional regular expressions; note
;; that at this point we accept a couple of illegal
;; combinations which'll be sorted out later by the
;; converter
(let* ((open-paren-pos (car (last-pos lexer)))
;; check if what follows "(?(" is a number
(number (try-number lexer :no-whitespace-p t))
;; make changes to extended-mode-p local
(extended-mode-p extended-mode-p))
(declare (special extended-mode-p))
(cond (number
;; condition is a number (i.e. refers to a
;; back-reference)
(let* ((inner-close-token (get-token lexer))
(regex (regex lexer))
(close-token (get-token lexer)))
(unless (eq inner-close-token :close-paren)
(error "Opening paren at position ~A has no matching closing paren"
(fix-pos (+ open-paren-pos 2))))
(unless (eq close-token :close-paren)
(error "Opening paren at position ~A has no matching closing paren"
(fix-pos open-paren-pos)))
(list :branch number regex)))
(t
;; condition must be a full regex (actually a
;; look-behind or look-ahead); and here comes a
;; terrible kludge: instead of being cleanly
;; separated from the lexer, the parser pushes
;; back the lexer by one position, thereby
;; landing in the middle of the 'token' "(?(" -
;; yuck!!
(decf (pos lexer))
(let* ((inner-regex (group lexer))
(regex (regex lexer))
(close-token (get-token lexer)))
(unless (eq close-token :close-paren)
(error "Opening paren at position ~A has no matching closing paren"
(fix-pos open-paren-pos)))
(list :branch inner-regex regex))))))
((member open-token '(:open-paren
:open-paren-colon
:open-paren-greater
:open-paren-equal
:open-paren-exclamation
:open-paren-less-equal
:open-paren-less-exclamation)
:test #'eq)
;; make changes to extended-mode-p local
(let ((extended-mode-p extended-mode-p))
(declare (special extended-mode-p))
;; we saw one of the six token representing opening
;; parentheses
(let* ((open-paren-pos (car (last-pos lexer)))
(regex (regex lexer))
(close-token (get-token lexer)))
(when (eq open-token :open-paren)
;; if this is the "("<regex>")" production we have to
;; increment the register counter of the lexer
(incf (reg lexer)))
(unless (eq close-token :close-paren)
;; the token following <regex> must be the closing
;; parenthesis or this is a syntax error
(error "Opening paren at position ~A has no matching closing paren"
(fix-pos open-paren-pos)))
(if flags
;; if the lexer has returned a list of flags this must
;; have been the "(?:"<regex>")" production
(cons :group (nconc flags (list regex)))
(list (case open-token
((:open-paren)
:register)
((:open-paren-colon)
:group)
((:open-paren-greater)
:standalone)
((:open-paren-equal)
:positive-lookahead)
((:open-paren-exclamation)
:negative-lookahead)
((:open-paren-less-equal)
:positive-lookbehind)
((:open-paren-less-exclamation)
:negative-lookbehind))
regex)))))
(t
;; this is the <legal-token> production; <legal-token> is
;; any token which passes START-OF-SUBEXPR-P (otherwise
;; parsing had already stopped in the SEQ method)
open-token))))
(defmethod greedy-quant ((lexer lexer))
(declare (optimize speed space))
"Parses and consumes a <greedy-quant>.
The productions are: <greedy-quant> -> <group> | <group><quantifier>
where <quantifier> is parsed by the lexer function GET-QUANTIFIER.
Will return <parse-tree> or (:GREEDY-REPETITION <min> <max> <parse-tree>)."
(let* ((group (group lexer))
(token (get-quantifier lexer)))
(if token
;; if GET-QUANTIFIER returned a non-NIL value it's the
;; two-element list (<min> <max>)
(list :greedy-repetition (first token) (second token) group)
group)))
(defmethod quant ((lexer lexer))
(declare (optimize speed space))
"Parses and consumes a <quant>.
The productions are: <quant> -> <greedy-quant> | <greedy-quant>\"?\".
Will return the <parse-tree> returned by GREEDY-QUANT and optionally
change :GREEDY-REPETITION to :NON-GREEDY-REPETITION."
(let* ((greedy-quant (greedy-quant lexer))
(token (get-token lexer :test-only t)))
(when token
(if (eq token :question-mark)
(setf (car greedy-quant) :non-greedy-repetition)
(unget-token lexer)))
greedy-quant))
(defmethod seq ((lexer lexer))
(declare (optimize speed space))
"Parses and consumes a <seq>.
The productions are: <seq> -> <quant> | <quant><seq>.
Will return <parse-tree> or (:SEQUENCE <parse-tree> <parse-tree>)."
;; Note that we're calling START-OF-SUBEXPR-P before we actually try
;; to parse a <seq> or <quant> in order to catch empty regular
;; expressions
(if (start-of-subexpr-p lexer)
(let ((quant (quant lexer)))
(if (start-of-subexpr-p lexer)
(list :sequence quant (seq lexer))
quant))
:void))
(defmethod regex ((lexer lexer))
(declare (optimize speed space))
"Parses and consumes a <regex>, a complete regular expression.
The productions are: <regex> -> <seq> | <seq>\"|\"<regex>.
Will return <parse-tree> or (:ALTERNATION <parse-tree> <parse-tree>)."
(let ((token (get-token lexer :test-only t)))
(cond ((not token)
;; if we didn't get any token we return :VOID which stands
;; for "empty regular expression"
:void)
((eq token :vertical-bar)
;; now check whether the expression started with a
;; vertical bar, i.e. <seq> - the left alternation - is
;; empty
(list :alternation :void (regex lexer)))
(t
;; otherwise un-read the token we just saw and parse a
;; <seq> plus the token following it
(unget-token lexer)
(let* ((seq (seq lexer))
(token (get-token lexer :test-only t)))
(cond ((not token)
;; no further token, just a <seq>
seq)
((eq token :vertical-bar)
;; if the token was a vertical bar, this is an
;; alternation and we have the second production
(list :alternation seq (regex lexer)))
(t
;; a token which is not a vertical bar - this is
;; either a syntax error or we're inside of a
;; group and the next token is a closing
;; parenthesis; so we just un-read the token and
;; let another function take care of it
(unget-token lexer)
seq)))))))
(defun parse-string (str)
(declare (optimize speed space))
"Translate the regex string STR into a parse tree."
(let* ((lexer (make-instance 'lexer :str str))
(extended-mode-p nil)
(parse-tree (regex lexer)))
;; initialize the extended mode flag to NIL before starting the lexer
(declare (special extended-mode-p))
;; check whether we've consumed the whole regex string
(if (end-of-string-p lexer)
parse-tree
(error "Expected end of string at position ~A"
(fix-pos (pos lexer))))))

View File

@@ -0,0 +1,174 @@
#!/usr/bin/perl
# This is a heavily modified version of the file 'perltest' which
# comes with the PCRE library package, which is open source software,
# written by Philip Hazel, and copyright by the University of
# Cambridge, England.
# The PCRE library package is available from
# <ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/>
use Time::HiRes qw(time);
sub string_for_lisp {
my(@a, $t, $in_string, $switch);
my $string = shift;
$string =~ s/\\/\\\\/g;
$string =~ s/"/\\"/g;
$in_string = 1;
foreach $c (split(//, $string)) {
if (ord $c >= 32 && ord $c < 127) {
if ($in_string) {
$t .= $c;
} else {
$in_string = 1;
$t = $c;
}
} else {
if ($in_string) {
push @a, "\"$t\"";
$in_string = 0;
$switch = 1;
}
push @a, sprintf("(list (code-char %d))", ord $c);
}
}
if ($switch) {
if ($in_string) {
push @a, "\"$t\"";
}
"(cl-ppcre::string-list-to-simple-string (let (list) " . (join ' ', map {"(push $_ list)"} @a) . "(nreverse list)))";
} else {
"\"$t\"";
}
}
$min_time = shift;
print "(defparameter cl-ppcre-test::*testdata* nil)\n";
NEXT_RE: while (1) {
last
if !($_ = <>);
next
if $_ eq "";
$pattern = $_;
while ($pattern !~ /^\s*(.).*\1/s) {
last
if !($_ = <>);
$pattern .= $_;
}
chomp($pattern);
$pattern =~ s/\s+$//;
$pattern =~ s/\+(?=[a-z]*$)//;
$multi_line_mode = ($pattern =~ /m[a-z]*$/) ? 't' : 'nil';
$single_line_mode = ($pattern =~ /s[a-z]*$/) ? 't' : 'nil';
$extended_mode = ($pattern =~ /x[a-z]*$/) ? 't' : 'nil';
$case_insensitive_mode = ($pattern =~ /i[a-z]*$/) ? 't' : 'nil';
$pattern =~ s/^(.*)g([a-z]*)$/\1\2/;
$pattern_for_lisp = $pattern;
$pattern_for_lisp =~ s/[a-z]*$//;
$pattern_for_lisp =~ s/^\s*(.)(.*)\1/$2/s;
$pattern_for_lisp =~ s/\\/\\\\/g;
$pattern_for_lisp =~ s/"/\\"/g;
$pattern = "/(?#)/$2"
if ($pattern =~ /^(.)\1(.*)$/);
while (1) {
last NEXT_RE
if !($_ = <>);
chomp;
s/\s+$//;
s/^\s+//;
last
if ($_ eq "");
$info_string = string_for_lisp "\"$_\" =~ $pattern";
$x = eval "\"$_\"";
@subs = ();
eval <<"END";
if (\$x =~ ${pattern}) {
push \@subs,\$&;
push \@subs,\$1;
push \@subs,\$2;
push \@subs,\$3;
push \@subs,\$4;
push \@subs,\$5;
push \@subs,\$6;
push \@subs,\$7;
push \@subs,\$8;
push \@subs,\$9;
push \@subs,\$10;
push \@subs,\$11;
push \@subs,\$12;
push \@subs,\$13;
push \@subs,\$14;
push \@subs,\$15;
push \@subs,\$16;
}
\$test = sub {
my \$times = shift;
my \$start = time;
for (my \$i = 0; \$i < \$times; \$i++) {
\$x =~ ${pattern};
}
return time - \$start;
};
END
$times = 1;
$used = 0;
$counter++;
print STDERR "$counter\n";
if ($@) {
$error = 't';
} else {
$error = 'nil';
if ($min_time) {
$times = 10;
while (1) {
$used = &$test($times);
last
if $used > $min_time;
$times *= 10;
}
}
}
print "(push (list $counter $info_string \"$pattern_for_lisp\" $case_insensitive_mode $multi_line_mode $single_line_mode $extended_mode " . string_for_lisp($x) . " $error $times $used ";
if (!@subs) {
print 'nil nil';
} else {
print string_for_lisp($subs[0]) . ' (list';
undef $not_first;
for ($i = 1; $i <= 16; $i++) {
print ' ';
if (defined $subs[$i]) {
print string_for_lisp $subs[$i];
} else {
print 'nil';
}
}
print ')';
}
print ") cl-ppcre-test::*testdata*)\n";
}
}
print "(setq cl-ppcre-test::*testdata* (nreverse cl-ppcre-test::*testdata*))\n";

View File

@@ -0,0 +1,207 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE-TEST; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/ppcre-tests.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-USER")
(defpackage "CL-PPCRE-TEST"
(:use "CL" "CL-PPCRE")
(:export "TEST"))
(in-package "CL-PPCRE-TEST")
(defparameter *testdata* nil
"A list with test and benchmark data produced by an external Perl script.")
(defun gc ()
"Start a full garbage collection."
;; what are the corresponding values for MCL and OpenMCL?
#+allegro (excl:gc t)
#+(or cmu scl) (ext:gc :full t)
#+clisp (ext:gc)
#+lispworks (hcl:mark-and-sweep 3)
#+sbcl (sb-ext:gc :full t))
;; warning: ugly code ahead!!
;; this is just a quick hack for testing purposes
(defun time-regex (factor regex string
&key case-insensitive-mode
multi-line-mode
single-line-mode
extended-mode)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Auxiliary function used by TEST to benchmark a regex scanner
against Perl timings."
(declare (type string string))
(let* ((scanner (create-scanner regex
:case-insensitive-mode case-insensitive-mode
:multi-line-mode multi-line-mode
:single-line-mode single-line-mode
:extended-mode extended-mode))
;; make sure GC doesn't invalidate our benchmarking
(dummy (gc))
(start (get-internal-real-time)))
(declare (ignore dummy))
(dotimes (i factor)
(funcall scanner string 0 (length string)))
(float (/ (- (get-internal-real-time) start) internal-time-units-per-second))))
#+(or scl lispworks)
(defun threaded-scan (scanner target-string &key (threads 10) (repetitions 5000))
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Auxiliary function used by TEST to check whether SCANNER is thread-safe."
(gc)
(let ((collector (make-array threads))
(counter 0))
(loop for i below threads
do (let* ((j i)
(fn
(lambda ()
(let ((r (random repetitions)))
(loop for k below repetitions
if (= k r)
do (setf (aref collector j)
(let ((result
(multiple-value-list
(cl-ppcre:scan scanner target-string))))
(unless (cdr result)
(setq result '(nil nil #() #())))
result))
else
do (cl-ppcre:scan scanner target-string))
(incf counter)))))
#+scl (thread:thread-create fn)
#+lispworks (mp:process-run-function "" nil fn)))
(loop while (< counter threads)
do (sleep .1))
(destructuring-bind (first-start first-end first-reg-starts first-reg-ends)
(aref collector 0)
(loop for (start end reg-starts reg-ends) across collector
if (or (not (eql first-start start))
(not (eql first-end end))
(/= (length first-reg-starts) (length reg-starts))
(/= (length first-reg-ends) (length reg-ends))
(loop for first-reg-start across first-reg-starts
for reg-start across reg-starts
thereis (not (eql first-reg-start reg-start)))
(loop for first-reg-end across first-reg-ends
for reg-end across reg-ends
thereis (not (eql first-reg-end reg-end))))
do (return (format nil "~&Inconsistent results during multi-threading"))))))
(defun test (&key threaded)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(declare (ignorable threaded))
"Loop through all test cases in TESTDATA and print report. Only in
LispWorks and SCL: If THREADED is true, also test whether the scanners
work multi-threaded."
(loop for (counter info-string regex
case-insensitive-mode multi-line-mode
single-line-mode extended-mode
string perl-error factor
perl-time ex-result ex-subs) in *testdata*
do (let ((errors '()))
;; uncomment the next line for the ACL trial version (and
;; probably also for the LW trial version)
#+(or) (gc)
(handler-case
(let ((scanner (create-scanner regex
:case-insensitive-mode case-insensitive-mode
:multi-line-mode multi-line-mode
:single-line-mode single-line-mode
:extended-mode extended-mode)))
(multiple-value-bind (result1 result2 sub-starts sub-ends)
(scan scanner string)
(cond (perl-error
(push (format nil
"~&expected an error but got a result")
errors))
(t
(when (not (eq result1 ex-result))
(if result1
(let ((result (subseq string result1 result2)))
(unless (string= result ex-result)
(push (format nil
"~&expected ~:[NIL~;~:*'~A'~] but got ~:[NIL~;~:*'~A'~]"
ex-result result)
errors))
(setq sub-starts (coerce sub-starts 'list)
sub-ends (coerce sub-ends 'list))
(loop for i from 0
for ex-sub in ex-subs
for sub-start = (nth i sub-starts)
for sub-end = (nth i sub-ends)
for sub = (if (and sub-start sub-end)
(subseq string sub-start sub-end)
nil)
unless (string= ex-sub sub)
do (push (format nil
"~&\\~A: expected ~:[NIL~;~:*'~A'~] but got ~:[NIL~;~:*'~A'~]"
(1+ i) ex-sub sub) errors)))
(push (format nil "~&expected ~:[NIL~;~:*'~A'~] but got ~:[NIL~;~:*'~A'~]"
ex-result result1)
errors)))))
#+(or scl lispworks)
(when threaded
(let ((thread-result (threaded-scan scanner string)))
(when thread-result
(push thread-result errors))))))
(condition (msg)
(unless perl-error
(push (format nil "~&got an unexpected error: '~A'" msg)
errors))))
(setq errors (nreverse errors))
(cond (errors
(when (or (<= factor 1) (zerop perl-time))
(format t "~&~4@A (~A):~{~& ~A~}"
counter info-string errors)))
((and (> factor 1) (plusp perl-time))
(let ((result (time-regex factor regex string
:case-insensitive-mode case-insensitive-mode
:multi-line-mode multi-line-mode
:single-line-mode single-line-mode
:extended-mode extended-mode)))
(format t "~&~4@A: ~,4F (~A repetitions, Perl: ~,4F seconds, CL-PPCRE: ~,4F seconds)" counter
(float (/ result perl-time)) factor perl-time result)))
(t nil)))))

View File

@@ -0,0 +1,755 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/regex-class.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; This file defines the REGEX class and some utility methods for
;;; this class. REGEX objects are used to represent the (transformed)
;;; parse trees internally
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-PPCRE")
(defclass regex ()
()
(:documentation "The REGEX base class. All other classes inherit from this one."))
(defclass seq (regex)
((elements :initarg :elements
:accessor elements
:type cons
:documentation "A list of REGEX objects."))
(:documentation "SEQ objects represents sequences of regexes.
(Like \"ab\" is the sequence of \"a\" and \"b\".)"))
(defclass alternation (regex)
((choices :initarg :choices
:accessor choices
:type cons
:documentation "A list of REGEX objects"))
(:documentation "ALTERNATION objects represent alternations of regexes.
(Like \"a|b\" ist the alternation of \"a\" or \"b\".)"))
(defclass lookahead (regex)
((regex :initarg :regex
:accessor regex
:documentation "The REGEX object we're checking.")
(positivep :initarg :positivep
:reader positivep
:documentation "Whether this assertion is positive."))
(:documentation "LOOKAHEAD objects represent look-ahead assertions."))
(defclass lookbehind (regex)
((regex :initarg :regex
:accessor regex
:documentation "The REGEX object we're checking.")
(positivep :initarg :positivep
:reader positivep
:documentation "Whether this assertion is positive.")
(len :initarg :len
:accessor len
:type fixnum
:documentation "The (fixed) length of the enclosed regex."))
(:documentation "LOOKBEHIND objects represent look-behind assertions."))
(defclass repetition (regex)
((regex :initarg :regex
:accessor regex
:documentation "The REGEX that's repeated.")
(greedyp :initarg :greedyp
:reader greedyp
:documentation "Whether the repetition is greedy.")
(minimum :initarg :minimum
:accessor minimum
:type fixnum
:documentation "The minimal number of repetitions.")
(maximum :initarg :maximum
:accessor maximum
:documentation "The maximal number of repetitions.
Can be NIL for unbounded.")
(min-len :initarg :min-len
:reader min-len
:documentation "The minimal length of the enclosed regex.")
(len :initarg :len
:reader len
:documentation "The length of the enclosed regex. NIL if unknown.")
(min-rest :initform 0
:accessor min-rest
:type fixnum
:documentation "The minimal number of characters which must
appear after this repetition.")
(contains-register-p :initarg :contains-register-p
:reader contains-register-p
:documentation "If the regex contains a register."))
(:documentation "REPETITION objects represent repetitions of regexes."))
(defclass register (regex)
((regex :initarg :regex
:accessor regex
:documentation "The inner regex.")
(num :initarg :num
:reader num
:type fixnum
:documentation "The number of this register, starting from 0.
This is the index into *REGS-START* and *REGS-END*."))
(:documentation "REGISTER objects represent register groups."))
(defclass standalone (regex)
((regex :initarg :regex
:accessor regex
:documentation "The inner regex."))
(:documentation "A standalone regular expression."))
(defclass back-reference (regex)
((num :initarg :num
:reader num
:type fixnum
:documentation "The number of the register this reference refers to.")
(case-insensitive-p :initarg :case-insensitive-p
:reader case-insensitive-p
:documentation "Whether we check case-insensitively."))
(:documentation "BACK-REFERENCE objects represent backreferences."))
(defclass char-class (regex)
((hash :initarg :hash
:reader hash
:type hash-table
:documentation "A hash table the keys of which are the characters;
the values are always T.")
(case-insensitive-p :initarg :case-insensitive-p
:reader case-insensitive-p
:documentation "If the char class case-insensitive.")
(invertedp :initarg :invertedp
:reader invertedp
:documentation "Whether we mean the inverse of the char class.")
(word-char-class-p :initarg :word-char-class-p
:reader word-char-class-p
:documentation "Whether this CHAR CLASS
represents the special class WORD-CHAR-CLASS."))
(:documentation "CHAR-CLASS objects represent character classes."))
(defclass str (regex)
((str :initarg :str
:accessor str
:type string
:documentation "The actual string.")
(len :initform 0
:accessor len
:type fixnum
:documentation "The length of the string.")
(case-insensitive-p :initarg :case-insensitive-p
:reader case-insensitive-p
:documentation "If we match case-insensitively.")
(offset :initform nil
:accessor offset
:documentation "Offset from the left of the whole parse tree.
The first regex has offset 0.
NIL if unknown, i.e. behind a variable-length regex.")
(skip :initform nil
:initarg :skip
:accessor skip
:documentation "If we can avoid testing for this string
because the SCAN function has done this already.")
(start-of-end-string-p :initform nil
:accessor start-of-end-string-p
:documentation "If this is the unique STR which
starts END-STRING (a slot of MATCHER)."))
(:documentation "STR objects represent string."))
(defmethod initialize-instance :after ((str str) &rest init-args)
(declare (optimize speed space))
(declare (ignore init-args))
"Automatically computes the length of a STR after initialization."
(setf (len str) (length (str str))))
(defclass anchor (regex)
((startp :initarg :startp
:reader startp
:documentation "Whether this is a \"start anchor\".")
(multi-line-p :initarg :multi-line-p
:reader multi-line-p
:documentation "Whether we're in multi-line mode,
i.e. whether each #\Newline is surrounded by anchors.")
(no-newline-p :initarg :no-newline-p
:reader no-newline-p
:documentation "Whether we ignore #\Newline at the end."))
(:documentation "ANCHOR objects represent anchors like \"^\" or \"$\"."))
(defclass everything (regex)
((single-line-p :initarg :single-line-p
:reader single-line-p
:documentation "Whether we're in single-line mode,
i.e. whether we also match #\Newline."))
(:documentation "EVERYTHING objects represent regexes matching
\"everything\", i.e. dots."))
(defclass word-boundary (regex)
((negatedp :initarg :negatedp
:reader negatedp
:documentation "Whether we mean the opposite,
i.e. no word-boundary."))
(:documentation "WORD-BOUNDARY objects represent word-boundary assertions."))
(defclass branch (regex)
((test :initarg :test
:reader test
:documentation "The test of this branch, one of LOOKAHEAD,
LOOKBEHIND, or a number.")
(then-regex :initarg :then-regex
:reader then-regex
:documentation "The regex that's to be matched if the
test succeeds.")
(else-regex :initarg :else-regex
:initform (make-instance 'void)
:reader else-regex
:documentation "The regex that's to be matched if the
test fails."))
(:documentation "BRANCH objects represent Perl's conditional regular
expressions."))
(defclass void (regex)
()
(:documentation "VOID objects represent empty regular expressions."))
;;; The following four methods allow a VOID object to behave like a
;;; zero-length STR object (only readers needed)
(defmethod len ((void void))
(declare (optimize speed space))
0)
(defmethod str ((void void))
(declare (optimize speed space))
"")
(defmethod skip ((void void))
(declare (optimize speed space))
nil)
(defmethod start-of-end-string-p ((void void))
(declare (optimize speed space))
nil)
;;; The PRINT-OBJECT below are here only for debugging purposes, they
;;; aren't used by the regex engine.
(defmethod print-object ((seq seq) stream)
(print-unreadable-object (seq stream :type t)
(with-slots ((elements elements))
seq
(prin1 (first elements))
(format stream "~{ ~S~}" (rest elements)))))
(defmethod print-object ((alternation alternation) stream)
(print-unreadable-object (alternation stream :type t)
(with-slots ((choices choices))
alternation
(prin1 (first choices))
(format stream "~{~S ~}" (rest choices)))))
(defmethod print-object ((branch branch) stream)
(print-unreadable-object (branch stream :type t)
(format stream "(~S) ~S ~S" (test branch) (then-regex branch) (else-regex branch))))
(defmethod print-object ((lookahead lookahead) stream)
(print-unreadable-object (lookahead stream :type t)
(format stream "~A~S ~S"
(if (positivep lookahead) "" "(neg.) ")
(regex lookahead))))
(defmethod print-object ((lookbehind lookbehind) stream)
(print-unreadable-object (lookbehind stream :type t)
(format stream "~A~S"
(if (positivep lookbehind) "" "(neg.) ")
(regex lookbehind))))
(defmethod print-object ((repetition repetition) stream)
(print-unreadable-object (repetition stream :type t)
(format stream "(~A ~A ~A) ~S"
(if (greedyp repetition) "greedy" "non-greedy")
(minimum repetition)
(maximum repetition)
(regex repetition))))
(defmethod print-object ((register register) stream)
(print-unreadable-object (register stream :type t)
(format stream "~A ~S"
(num register)
(regex register))))
(defmethod print-object ((standalone standalone) stream)
(print-unreadable-object (standalone stream :type t)
(format stream "~S"
(regex standalone))))
(defmethod print-object ((back-reference back-reference) stream)
(print-unreadable-object (back-reference stream :type t)
(format stream "~A~A"
(if (case-insensitive-p back-reference) "(c-i) " "")
(num back-reference))))
(defmethod print-object ((char-class char-class) stream)
(print-unreadable-object (char-class stream :type t)
(if (word-char-class-p char-class)
(format stream "~Aalphanumeric"
(if (invertedp char-class) "(inv.) " ""))
(format stream "~Awith ~A element~:P"
(if (invertedp char-class) "(inv.) " "")
(hash-table-count (hash char-class))))))
(defmethod print-object ((str str) stream)
(print-unreadable-object (str stream :type t)
(format stream "~S~A"
(str str)
(if (case-insensitive-p str) " (c-i)" ""))))
(defmethod print-object ((void void) stream)
(print-unreadable-object (void stream)
(prin1 'void)))
(defgeneric case-mode (regex old-case-mode)
(declare (optimize speed space))
(:documentation "Utility function used by the optimizer (see GATHER-STRINGS).
Returns a keyword denoting the case-(in)sensitivity of a STR or its
second argument if the STR has length 0. Returns NIL for REGEX objects
which are not of type STR."))
(defmethod case-mode ((str str) old-case-mode)
(cond ((zerop (len str))
old-case-mode)
((case-insensitive-p str)
:case-insensitive)
(t
:case-sensitive)))
(defmethod case-mode ((regex regex) old-case-mode)
(declare (ignore old-case-mode))
nil)
(defgeneric copy-regex (regex)
(declare (optimize speed space))
(:documentation "Implements a deep copy of a REGEX object."))
(defmethod copy-regex ((anchor anchor))
(make-instance 'anchor
:startp (startp anchor)
:multi-line-p (multi-line-p anchor)
:no-newline-p (no-newline-p anchor)))
(defmethod copy-regex ((everything everything))
(make-instance 'everything
:single-line-p (single-line-p everything)))
(defmethod copy-regex ((word-boundary word-boundary))
(make-instance 'copy-regex
:negatedp (negatedp word-boundary)))
(defmethod copy-regex ((void void))
(make-instance 'void))
(defmethod copy-regex ((lookahead lookahead))
(make-instance 'lookahead
:regex (copy-regex (regex lookahead))
:positivep (positivep lookahead)))
(defmethod copy-regex ((seq seq))
(make-instance 'seq
:elements (mapcar #'copy-regex (elements seq))))
(defmethod copy-regex ((alternation alternation))
(make-instance 'alternation
:choices (mapcar #'copy-regex (choices alternation))))
(defmethod copy-regex ((branch branch))
(with-slots ((test test))
branch
(make-instance 'branch
:test (if (typep test 'regex)
(copy-regex test)
test)
:then-regex (copy-regex (then-regex branch))
:else-regex (copy-regex (else-regex branch)))))
(defmethod copy-regex ((lookbehind lookbehind))
(make-instance 'lookbehind
:regex (copy-regex (regex lookbehind))
:positivep (positivep lookbehind)
:len (len lookbehind)))
(defmethod copy-regex ((repetition repetition))
(make-instance 'repetition
:regex (copy-regex (regex repetition))
:greedyp (greedyp repetition)
:minimum (minimum repetition)
:maximum (maximum repetition)
:min-len (min-len repetition)
:len (len repetition)
:contains-register-p (contains-register-p repetition)))
(defmethod copy-regex ((register register))
(make-instance 'register
:regex (copy-regex (regex register))
:num (num register)))
(defmethod copy-regex ((standalone standalone))
(make-instance 'standalone
:regex (copy-regex (regex standalone))))
(defmethod copy-regex ((back-reference back-reference))
(make-instance 'back-reference
:num (num back-reference)
:case-insensitive-p (case-insensitive-p back-reference)))
(defmethod copy-regex ((char-class char-class))
(make-instance 'char-class
:hash (hash char-class)
:case-insensitive-p (case-insensitive-p char-class)
:invertedp (invertedp char-class)
:word-char-class-p (word-char-class-p char-class)))
(defmethod copy-regex ((str str))
(make-instance 'str
:str (str str)
:case-insensitive-p (case-insensitive-p str)))
;;; Note that COPY-REGEX and REMOVE-REGISTERS could have easily been
;;; wrapped into one function. Maybe in the next release...
;;; Further note that this function is used by CONVERT to factor out
;;; complicated repetitions, i.e. cases like
;;; (a)* -> (?:a*(a))?
;;; This won't work for, say,
;;; ((a)|(b))* -> (?:(?:a|b)*((a)|(b)))?
;;; and therefore we stop REGISTER removal once we see an ALTERNATION.
(defgeneric remove-registers (regex)
(declare (optimize speed space))
(:documentation "Returns a deep copy of a REGEX (see COPY-REGEX) and
optionally removes embedded REGISTER objects if possible and if the
special variable REMOVE-REGISTERS-P is true."))
(defmethod remove-registers ((register register))
(declare (special remove-registers-p reg-seen))
(cond (remove-registers-p
(remove-registers (regex register)))
(t
;; mark REG-SEEN as true so enclosing REPETITION objects
;; (see method below) know if they contain a register or not
(setq reg-seen t)
(copy-regex register))))
(defmethod remove-registers ((repetition repetition))
(let* (reg-seen
(inner-regex (remove-registers (regex repetition))))
;; REMOVE-REGISTERS will set REG-SEEN (see method above) if
;; (REGEX REPETITION) contains a REGISTER
(declare (special reg-seen))
(make-instance 'repetition
:regex inner-regex
:greedyp (greedyp repetition)
:minimum (minimum repetition)
:maximum (maximum repetition)
:min-len (min-len repetition)
:len (len repetition)
:contains-register-p reg-seen)))
(defmethod remove-registers ((standalone standalone))
(make-instance 'standalone
:regex (remove-registers (regex standalone))))
(defmethod remove-registers ((lookahead lookahead))
(make-instance 'lookahead
:regex (remove-registers (regex lookahead))
:positivep (positivep lookahead)))
(defmethod remove-registers ((lookbehind lookbehind))
(make-instance 'lookbehind
:regex (remove-registers (regex lookbehind))
:positivep (positivep lookbehind)
:len (len lookbehind)))
(defmethod remove-registers ((branch branch))
(with-slots ((test test))
branch
(make-instance 'branch
:test (if (typep test 'regex)
(remove-registers test)
test)
:then-regex (remove-registers (then-regex branch))
:else-regex (remove-registers (else-regex branch)))))
(defmethod remove-registers ((alternation alternation))
(declare (special remove-registers-p))
;; an ALTERNATION, so we can't remove REGISTER objects further down
(setq remove-registers-p nil)
(copy-regex alternation))
(defmethod remove-registers ((regex regex))
(copy-regex regex))
(defmethod remove-registers ((seq seq))
(make-instance 'seq
:elements (mapcar #'remove-registers (elements seq))))
(defgeneric everythingp (regex)
(declare (optimize speed space))
(:documentation "Returns an EVERYTHING object if REGEX is equivalent
to this object, otherwise NIL. So, \"(.){1}\" would return true
(i.e. the object corresponding to \".\", for example."))
(defmethod everythingp ((seq seq))
;; we might have degenerate cases like (:SEQUENCE :VOID ...)
;; due to the parsing process
(let ((cleaned-elements (remove-if #'(lambda (element)
(typep element 'void))
(elements seq))))
(and (= 1 (length cleaned-elements))
(everythingp (first cleaned-elements)))))
(defmethod everythingp ((alternation alternation))
(with-slots ((choices choices))
alternation
(and (= 1 (length choices))
;; this is unlikely to happen for human-generated regexes,
;; but machine-generated ones might look like this
(everythingp (first choices)))))
(defmethod everythingp ((repetition repetition))
(with-slots ((maximum maximum)
(minimum minimum)
(regex regex))
repetition
(and maximum
(= 1 minimum maximum)
;; treat "<regex>{1,1}" like "<regex>"
(everythingp regex))))
(defmethod everythingp ((register register))
(everythingp (regex register)))
(defmethod everythingp ((standalone standalone))
(everythingp (regex standalone)))
(defmethod everythingp ((everything everything))
everything)
(defmethod everythingp ((regex regex))
;; the general case for ANCHOR, BACK-REFERENCE, BRANCH, CHAR-CLASS,
;; LOOKAHEAD, LOOKBEHIND, STR, VOID, and WORD-BOUNDARY
nil)
(defgeneric regex-length (regex)
(declare (optimize speed space))
(:documentation "Return the length of REGEX if it is fixed, NIL otherwise."))
(defmethod regex-length ((seq seq))
;; simply add all inner lengths unless one of them is NIL
(loop for sub-regex in (elements seq)
for len = (regex-length sub-regex)
if (not len) do (return nil)
sum len))
(defmethod regex-length ((alternation alternation))
;; only return a true value if all inner lengths are non-NIL and
;; mutually equal
(loop for sub-regex in (choices alternation)
for old-len = nil then len
for len = (regex-length sub-regex)
if (or (not len)
(and old-len (/= len old-len))) do (return nil)
finally (return len)))
(defmethod regex-length ((branch branch))
;; only return a true value if both alternations have a length and
;; if they're equal
(let ((then-length (regex-length (then-regex branch))))
(and then-length
(eql then-length (regex-length (else-regex branch)))
then-length)))
(defmethod regex-length ((repetition repetition))
;; we can only compute the length of a REPETITION object if the
;; number of repetitions is fixed; note that we don't call
;; REGEX-LENGTH for the inner regex, we assume that the LEN slot is
;; always set correctly
(with-slots ((len len) (minimum minimum) (maximum maximum))
repetition
(if (and len
(eq minimum maximum))
(* minimum len)
nil)))
(defmethod regex-length ((register register))
(regex-length (regex register)))
(defmethod regex-length ((standalone standalone))
(regex-length (regex standalone)))
(defmethod regex-length ((back-reference back-reference))
;; with enough effort we could possibly do better here, but
;; currently we just give up and return NIL
nil)
(defmethod regex-length ((char-class char-class))
1)
(defmethod regex-length ((everything everything))
1)
(defmethod regex-length ((str str))
(len str))
(defmethod regex-length ((regex regex))
;; the general case for ANCHOR, LOOKAHEAD, LOOKBEHIND, VOID, and
;; WORD-BOUNDARY (which all have zero-length)
0)
(defgeneric regex-min-length (regex)
(declare (optimize speed space))
(:documentation "Returns the minimal length of REGEX."))
(defmethod regex-min-length ((seq seq))
;; simply add all inner minimal lengths
(loop for sub-regex in (elements seq)
for len = (regex-min-length sub-regex)
sum len))
(defmethod regex-min-length ((alternation alternation))
;; minimal length of an alternation is the minimal length of the
;; "shortest" element
(loop for sub-regex in (choices alternation)
for len = (regex-min-length sub-regex)
minimize len))
(defmethod regex-min-length ((branch branch))
;; minimal length of both alternations
(min (regex-min-length (then-regex branch))
(regex-min-length (else-regex branch))))
(defmethod regex-min-length ((repetition repetition))
;; obviously the product of the inner minimal length and the minimal
;; number of repetitions
(* (minimum repetition) (min-len repetition)))
(defmethod regex-min-length ((register register))
(regex-min-length (regex register)))
(defmethod regex-min-length ((standalone standalone))
(regex-min-length (regex standalone)))
(defmethod regex-min-length ((char-class char-class))
1)
(defmethod regex-min-length ((everything everything))
1)
(defmethod regex-min-length ((str str))
(len str))
(defmethod regex-min-length ((regex regex))
;; the general case for ANCHOR, BACK-REFERENCE, LOOKAHEAD,
;; LOOKBEHIND, VOID, and WORD-BOUNDARY
0)
(defgeneric compute-offsets (regex start-pos)
(declare (optimize speed space))
(:documentation "Returns the offset the following regex would have
relative to START-POS or NIL if we can't compute it. Sets the OFFSET
slot of REGEX to START-POS if REGEX is a STR. May also affect OFFSET
slots of STR objects further down the tree."))
;; note that we're actually only interested in the offset of
;; "top-level" STR objects (see ADVANCE-FN in the SCAN function) so we
;; can stop at variable-length alternations and don't need to descend
;; into repetitions
(defmethod compute-offsets ((seq seq) start-pos)
(loop for element in (elements seq)
;; advance offset argument for next call while looping through
;; the elements
for pos = start-pos then curr-offset
for curr-offset = (compute-offsets element pos)
while curr-offset
finally (return curr-offset)))
(defmethod compute-offsets ((alternation alternation) start-pos)
(loop for choice in (choices alternation)
for old-offset = nil then curr-offset
for curr-offset = (compute-offsets choice start-pos)
;; we stop immediately if two alternations don't result in the
;; same offset
if (or (not curr-offset)
(and old-offset (/= curr-offset old-offset)))
do (return nil)
finally (return curr-offset)))
(defmethod compute-offsets ((branch branch) start-pos)
;; only return offset if both alternations have equal value
(let ((then-offset (compute-offsets (then-regex branch) start-pos)))
(and then-offset
(eql then-offset (compute-offsets (else-regex branch) start-pos))
then-offset)))
(defmethod compute-offsets ((repetition repetition) start-pos)
;; no need to descend into the inner regex
(with-slots ((len len)
(minimum minimum)
(maximum maximum))
repetition
(if (and len
(eq minimum maximum))
;; fixed number of repetitions, so we know how to proceed
(+ start-pos (* minimum len))
;; otherwise return NIL
nil)))
(defmethod compute-offsets ((register register) start-pos)
(compute-offsets (regex register) start-pos))
(defmethod compute-offsets ((standalone standalone) start-pos)
(compute-offsets (regex standalone) start-pos))
(defmethod compute-offsets ((char-class char-class) start-pos)
(1+ start-pos))
(defmethod compute-offsets ((everything everything) start-pos)
(1+ start-pos))
(defmethod compute-offsets ((str str) start-pos)
(setf (offset str) start-pos)
(+ start-pos (len str)))
(defmethod compute-offsets ((back-reference back-reference) start-pos)
;; with enough effort we could possibly do better here, but
;; currently we just give up and return NIL
(declare (ignore start-pos))
nil)
(defmethod compute-offsets ((regex regex) start-pos)
;; the general case for ANCHOR, LOOKAHEAD, LOOKBEHIND, VOID, and
;; WORD-BOUNDARY (which all have zero-length)
start-pos)

View File

@@ -0,0 +1,868 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/repetition-closures.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; This is actually a part of closures.lisp which we put into a
;;; separate file because it is rather complex. We only deal with
;;; REPETITIONs here. Note that this part of the code contains some
;;; rather crazy micro-optimizations which were introduced to be as
;;; competitive with Perl as possible in tight loops.
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-PPCRE")
(defmacro incf-after (place &optional (delta 1) &environment env)
"Utility macro inspired by C's \"place++\", i.e. first return the
value of PLACE and afterwards increment it by DELTA."
(let ((=temp= (gensym)))
(multiple-value-bind (vars vals store-vars writer-form reader-form)
(get-setf-expansion place env)
`(let* (,@(mapcar #'list vars vals)
(,=temp= ,reader-form)
(,(car store-vars) (+ ,=temp= ,delta)))
,writer-form
,=temp=))))
;; code for greedy repetitions with minimum zero
(defmacro greedy-constant-length-closure (check-curr-pos)
"This is the template for simple greedy repetitions (where simple
means that the minimum number of repetitions is zero, that the inner
regex to be checked is of fixed length LEN, and that it doesn't
contain registers, i.e. there's no need for backtracking).
CHECK-CURR-POS is a form which checks whether the inner regex of the
repetition matches at CURR-POS."
`(if maximum
(lambda (start-pos)
(declare (type fixnum start-pos maximum))
;; because we know LEN we know in advance where to stop at the
;; latest; we also take into consideration MIN-REST, i.e. the
;; minimal length of the part behind the repetition
(let ((target-end-pos (min (1+ (- *end-pos* len min-rest))
;; don't go further than MAXIMUM
;; repetitions, of course
(+ start-pos
(the fixnum (* len maximum)))))
(curr-pos start-pos))
(declare (type fixnum target-end-pos curr-pos))
(block greedy-constant-length-matcher
;; we use an ugly TAGBODY construct because this might be a
;; tight loop and this version is a bit faster than our LOOP
;; version (at least in CMUCL)
(tagbody
forward-loop
;; first go forward as far as possible, i.e. while
;; the inner regex matches
(when (>= curr-pos target-end-pos)
(go backward-loop))
(when ,check-curr-pos
(incf curr-pos len)
(go forward-loop))
backward-loop
;; now go back LEN steps each until we're able to match
;; the rest of the regex
(when (< curr-pos start-pos)
(return-from greedy-constant-length-matcher nil))
(let ((result (funcall next-fn curr-pos)))
(when result
(return-from greedy-constant-length-matcher result)))
(decf curr-pos len)
(go backward-loop)))))
;; basically the same code; it's just a bit easier because we're
;; not bounded by MAXIMUM
(lambda (start-pos)
(declare (type fixnum start-pos))
(let ((target-end-pos (1+ (- *end-pos* len min-rest)))
(curr-pos start-pos))
(declare (type fixnum target-end-pos curr-pos))
(block greedy-constant-length-matcher
(tagbody
forward-loop
(when (>= curr-pos target-end-pos)
(go backward-loop))
(when ,check-curr-pos
(incf curr-pos len)
(go forward-loop))
backward-loop
(when (< curr-pos start-pos)
(return-from greedy-constant-length-matcher nil))
(let ((result (funcall next-fn curr-pos)))
(when result
(return-from greedy-constant-length-matcher result)))
(decf curr-pos len)
(go backward-loop)))))))
(defun create-greedy-everything-matcher (maximum min-rest next-fn)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(declare (type fixnum min-rest)
(type function next-fn))
"Creates a closure which just matches as far ahead as possible,
i.e. a closure for a dot in single-line mode."
(if maximum
(lambda (start-pos)
(declare (type fixnum start-pos maximum))
;; because we know LEN we know in advance where to stop at the
;; latest; we also take into consideration MIN-REST, i.e. the
;; minimal length of the part behind the repetition
(let ((target-end-pos (min (+ start-pos maximum)
(- *end-pos* min-rest))))
(declare (type fixnum target-end-pos))
;; start from the highest possible position and go backward
;; until we're able to match the rest of the regex
(loop for curr-pos of-type fixnum from target-end-pos downto start-pos
thereis (funcall next-fn curr-pos))))
;; basically the same code; it's just a bit easier because we're
;; not bounded by MAXIMUM
(lambda (start-pos)
(declare (type fixnum start-pos))
(let ((target-end-pos (- *end-pos* min-rest)))
(declare (type fixnum target-end-pos))
(loop for curr-pos of-type fixnum from target-end-pos downto start-pos
thereis (funcall next-fn curr-pos))))))
(defmethod create-greedy-constant-length-matcher ((repetition repetition)
next-fn)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Creates a closure which tries to match REPETITION. It is assumed
that REPETITION is greedy and the minimal number of repetitions is
zero. It is furthermore assumed that the inner regex of REPETITION is
of fixed length and doesn't contain registers."
(let ((len (len repetition))
(maximum (maximum repetition))
(regex (regex repetition))
(min-rest (min-rest repetition)))
(declare (type fixnum len min-rest)
(type function next-fn))
(cond ((zerop len)
;; inner regex has zero-length, so we can discard it
;; completely
next-fn)
(t
;; now first try to optimize for a couple of common cases
(typecase regex
(str
(let ((str (str regex)))
(if (= 1 len)
;; a single character
(let ((chr (schar str 0)))
(if (case-insensitive-p regex)
(greedy-constant-length-closure
(char-equal chr (schar *string* curr-pos)))
(greedy-constant-length-closure
(char= chr (schar *string* curr-pos)))))
;; a string
(if (case-insensitive-p regex)
(greedy-constant-length-closure
(*string*-equal str curr-pos (+ curr-pos len) 0 len))
(greedy-constant-length-closure
(*string*= str curr-pos (+ curr-pos len) 0 len))))))
(char-class
;; a character class
(insert-char-class-tester (regex (schar *string* curr-pos))
(if (invertedp regex)
(greedy-constant-length-closure
(not (char-class-test)))
(greedy-constant-length-closure
(char-class-test)))))
(everything
;; an EVERYTHING object, i.e. a dot
(if (single-line-p regex)
(create-greedy-everything-matcher maximum min-rest next-fn)
(greedy-constant-length-closure
(char/= #\Newline (schar *string* curr-pos)))))
(otherwise
;; the general case - we build an inner matcher which
;; just checks for immediate success, i.e. NEXT-FN is
;; #'IDENTITY
(let ((inner-matcher (create-matcher-aux regex #'identity)))
(declare (type function inner-matcher))
(greedy-constant-length-closure
(funcall inner-matcher curr-pos)))))))))
(defmethod create-greedy-no-zero-matcher ((repetition repetition) next-fn)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Creates a closure which tries to match REPETITION. It is assumed
that REPETITION is greedy and the minimal number of repetitions is
zero. It is furthermore assumed that the inner regex of REPETITION can
never match a zero-length string (or instead the maximal number of
repetitions is 1)."
(let ((maximum (maximum repetition))
;; REPEAT-MATCHER is part of the closure's environment but it
;; can only be defined after GREEDY-AUX is defined
repeat-matcher)
(declare (type function next-fn))
(cond
((eql maximum 1)
;; this is essentially like the next case but with a known
;; MAXIMUM of 1 we can get away without a counter; note that
;; we always arrive here if CONVERT optimizes <regex>* to
;; (?:<regex'>*<regex>)?
(setq repeat-matcher
(create-matcher-aux (regex repetition) next-fn))
(lambda (start-pos)
(declare (type function repeat-matcher))
(or (funcall repeat-matcher start-pos)
(funcall next-fn start-pos))))
(maximum
;; we make a reservation for our slot in *REPEAT-COUNTERS*
;; because we need to keep track whether we've reached MAXIMUM
;; repetitions
(let ((rep-num (incf-after *rep-num*)))
(flet ((greedy-aux (start-pos)
(declare (type fixnum start-pos maximum rep-num)
(type function repeat-matcher))
;; the actual matcher which first tries to match the
;; inner regex of REPETITION (if we haven't done so
;; too often) and on failure calls NEXT-FN
(or (and (< (aref *repeat-counters* rep-num) maximum)
(incf (aref *repeat-counters* rep-num))
;; note that REPEAT-MATCHER will call
;; GREEDY-AUX again recursively
(prog1
(funcall repeat-matcher start-pos)
(decf (aref *repeat-counters* rep-num))))
(funcall next-fn start-pos))))
;; create a closure to match the inner regex and to
;; implement backtracking via GREEDY-AUX
(setq repeat-matcher
(create-matcher-aux (regex repetition) #'greedy-aux))
;; the closure we return is just a thin wrapper around
;; GREEDY-AUX to initialize the repetition counter
(lambda (start-pos)
(declare (type fixnum start-pos))
(setf (aref *repeat-counters* rep-num) 0)
(greedy-aux start-pos)))))
(t
;; easier code because we're not bounded by MAXIMUM, but
;; basically the same
(flet ((greedy-aux (start-pos)
(declare (type fixnum start-pos)
(type function repeat-matcher))
(or (funcall repeat-matcher start-pos)
(funcall next-fn start-pos))))
(setq repeat-matcher
(create-matcher-aux (regex repetition) #'greedy-aux))
#'greedy-aux)))))
(defmethod create-greedy-matcher ((repetition repetition) next-fn)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Creates a closure which tries to match REPETITION. It is assumed
that REPETITION is greedy and the minimal number of repetitions is
zero."
(let ((maximum (maximum repetition))
;; we make a reservation for our slot in *LAST-POS-STORES* because
;; we have to watch out for endless loops as the inner regex might
;; match zero-length strings
(zero-length-num (incf-after *zero-length-num*))
;; REPEAT-MATCHER is part of the closure's environment but it
;; can only be defined after GREEDY-AUX is defined
repeat-matcher)
(declare (type fixnum zero-length-num)
(type function next-fn))
(cond
(maximum
;; we make a reservation for our slot in *REPEAT-COUNTERS*
;; because we need to keep track whether we've reached MAXIMUM
;; repetitions
(let ((rep-num (incf-after *rep-num*)))
(flet ((greedy-aux (start-pos)
;; the actual matcher which first tries to match the
;; inner regex of REPETITION (if we haven't done so
;; too often) and on failure calls NEXT-FN
(declare (type fixnum start-pos maximum rep-num)
(type function repeat-matcher))
(let ((old-last-pos
(svref *last-pos-stores* zero-length-num)))
(when (and old-last-pos
(= (the fixnum old-last-pos) start-pos))
;; stop immediately if we've been here before,
;; i.e. if the last attempt matched a zero-length
;; string
(return-from greedy-aux (funcall next-fn start-pos)))
;; otherwise remember this position for the next
;; repetition
(setf (svref *last-pos-stores* zero-length-num) start-pos)
(or (and (< (aref *repeat-counters* rep-num) maximum)
(incf (aref *repeat-counters* rep-num))
;; note that REPEAT-MATCHER will call
;; GREEDY-AUX again recursively
(prog1
(funcall repeat-matcher start-pos)
(decf (aref *repeat-counters* rep-num))
(setf (svref *last-pos-stores* zero-length-num)
old-last-pos)))
(funcall next-fn start-pos)))))
;; create a closure to match the inner regex and to
;; implement backtracking via GREEDY-AUX
(setq repeat-matcher
(create-matcher-aux (regex repetition) #'greedy-aux))
;; the closure we return is just a thin wrapper around
;; GREEDY-AUX to initialize the repetition counter and our
;; slot in *LAST-POS-STORES*
(lambda (start-pos)
(declare (type fixnum start-pos))
(setf (aref *repeat-counters* rep-num) 0
(svref *last-pos-stores* zero-length-num) nil)
(greedy-aux start-pos)))))
(t
;; easier code because we're not bounded by MAXIMUM, but
;; basically the same
(flet ((greedy-aux (start-pos)
(declare (type fixnum start-pos)
(type function repeat-matcher))
(let ((old-last-pos
(svref *last-pos-stores* zero-length-num)))
(when (and old-last-pos
(= (the fixnum old-last-pos) start-pos))
(return-from greedy-aux (funcall next-fn start-pos)))
(setf (svref *last-pos-stores* zero-length-num) start-pos)
(or (prog1
(funcall repeat-matcher start-pos)
(setf (svref *last-pos-stores* zero-length-num) old-last-pos))
(funcall next-fn start-pos)))))
(setq repeat-matcher
(create-matcher-aux (regex repetition) #'greedy-aux))
(lambda (start-pos)
(declare (type fixnum start-pos))
(setf (svref *last-pos-stores* zero-length-num) nil)
(greedy-aux start-pos)))))))
;; code for non-greedy repetitions with minimum zero
(defmacro non-greedy-constant-length-closure (check-curr-pos)
"This is the template for simple non-greedy repetitions (where
simple means that the minimum number of repetitions is zero, that the
inner regex to be checked is of fixed length LEN, and that it doesn't
contain registers, i.e. there's no need for backtracking).
CHECK-CURR-POS is a form which checks whether the inner regex of the
repetition matches at CURR-POS."
`(if maximum
(lambda (start-pos)
(declare (type fixnum start-pos maximum))
;; because we know LEN we know in advance where to stop at the
;; latest; we also take into consideration MIN-REST, i.e. the
;; minimal length of the part behind the repetition
(let ((target-end-pos (min (1+ (- *end-pos* len min-rest))
(+ start-pos
(the fixnum (* len maximum))))))
;; move forward by LEN and always try NEXT-FN first, then
;; CHECK-CUR-POS
(loop for curr-pos of-type fixnum from start-pos
below target-end-pos
by len
thereis (funcall next-fn curr-pos)
while ,check-curr-pos
finally (return (funcall next-fn curr-pos)))))
;; basically the same code; it's just a bit easier because we're
;; not bounded by MAXIMUM
(lambda (start-pos)
(declare (type fixnum start-pos))
(let ((target-end-pos (1+ (- *end-pos* len min-rest))))
(loop for curr-pos of-type fixnum from start-pos
below target-end-pos
by len
thereis (funcall next-fn curr-pos)
while ,check-curr-pos
finally (return (funcall next-fn curr-pos)))))))
(defmethod create-non-greedy-constant-length-matcher ((repetition repetition) next-fn)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Creates a closure which tries to match REPETITION. It is assumed
that REPETITION is non-greedy and the minimal number of repetitions is
zero. It is furthermore assumed that the inner regex of REPETITION is
of fixed length and doesn't contain registers."
(let ((len (len repetition))
(maximum (maximum repetition))
(regex (regex repetition))
(min-rest (min-rest repetition)))
(declare (type fixnum len min-rest)
(type function next-fn))
(cond ((zerop len)
;; inner regex has zero-length, so we can discard it
;; completely
next-fn)
(t
;; now first try to optimize for a couple of common cases
(typecase regex
(str
(let ((str (str regex)))
(if (= 1 len)
;; a single character
(let ((chr (schar str 0)))
(if (case-insensitive-p regex)
(non-greedy-constant-length-closure
(char-equal chr (schar *string* curr-pos)))
(non-greedy-constant-length-closure
(char= chr (schar *string* curr-pos)))))
;; a string
(if (case-insensitive-p regex)
(non-greedy-constant-length-closure
(*string*-equal str curr-pos (+ curr-pos len) 0 len))
(non-greedy-constant-length-closure
(*string*= str curr-pos (+ curr-pos len) 0 len))))))
(char-class
;; a character class
(insert-char-class-tester (regex (schar *string* curr-pos))
(if (invertedp regex)
(non-greedy-constant-length-closure
(not (char-class-test)))
(non-greedy-constant-length-closure
(char-class-test)))))
(everything
(if (single-line-p regex)
;; a dot which really can match everything; we rely
;; on the compiler to optimize this away
(non-greedy-constant-length-closure
t)
;; a dot which has to watch out for #\Newline
(non-greedy-constant-length-closure
(char/= #\Newline (schar *string* curr-pos)))))
(otherwise
;; the general case - we build an inner matcher which
;; just checks for immediate success, i.e. NEXT-FN is
;; #'IDENTITY
(let ((inner-matcher (create-matcher-aux regex #'identity)))
(declare (type function inner-matcher))
(non-greedy-constant-length-closure
(funcall inner-matcher curr-pos)))))))))
(defmethod create-non-greedy-no-zero-matcher ((repetition repetition) next-fn)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Creates a closure which tries to match REPETITION. It is assumed
that REPETITION is non-greedy and the minimal number of repetitions is
zero. It is furthermore assumed that the inner regex of REPETITION can
never match a zero-length string (or instead the maximal number of
repetitions is 1)."
(let ((maximum (maximum repetition))
;; REPEAT-MATCHER is part of the closure's environment but it
;; can only be defined after NON-GREEDY-AUX is defined
repeat-matcher)
(declare (type function next-fn))
(cond
((eql maximum 1)
;; this is essentially like the next case but with a known
;; MAXIMUM of 1 we can get away without a counter
(setq repeat-matcher
(create-matcher-aux (regex repetition) next-fn))
(lambda (start-pos)
(declare (type function repeat-matcher))
(or (funcall next-fn start-pos)
(funcall repeat-matcher start-pos))))
(maximum
;; we make a reservation for our slot in *REPEAT-COUNTERS*
;; because we need to keep track whether we've reached MAXIMUM
;; repetitions
(let ((rep-num (incf-after *rep-num*)))
(flet ((non-greedy-aux (start-pos)
;; the actual matcher which first calls NEXT-FN and
;; on failure tries to match the inner regex of
;; REPETITION (if we haven't done so too often)
(declare (type fixnum start-pos maximum rep-num)
(type function repeat-matcher))
(or (funcall next-fn start-pos)
(and (< (aref *repeat-counters* rep-num) maximum)
(incf (aref *repeat-counters* rep-num))
;; note that REPEAT-MATCHER will call
;; NON-GREEDY-AUX again recursively
(prog1
(funcall repeat-matcher start-pos)
(decf (aref *repeat-counters* rep-num)))))))
;; create a closure to match the inner regex and to
;; implement backtracking via NON-GREEDY-AUX
(setq repeat-matcher
(create-matcher-aux (regex repetition) #'non-greedy-aux))
;; the closure we return is just a thin wrapper around
;; NON-GREEDY-AUX to initialize the repetition counter
(lambda (start-pos)
(declare (type fixnum start-pos))
(setf (aref *repeat-counters* rep-num) 0)
(non-greedy-aux start-pos)))))
(t
;; easier code because we're not bounded by MAXIMUM, but
;; basically the same
(flet ((non-greedy-aux (start-pos)
(declare (type fixnum start-pos)
(type function repeat-matcher))
(or (funcall next-fn start-pos)
(funcall repeat-matcher start-pos))))
(setq repeat-matcher
(create-matcher-aux (regex repetition) #'non-greedy-aux))
#'non-greedy-aux)))))
(defmethod create-non-greedy-matcher ((repetition repetition) next-fn)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Creates a closure which tries to match REPETITION. It is assumed
that REPETITION is non-greedy and the minimal number of repetitions is
zero."
;; we make a reservation for our slot in *LAST-POS-STORES* because
;; we have to watch out for endless loops as the inner regex might
;; match zero-length strings
(let ((zero-length-num (incf-after *zero-length-num*))
(maximum (maximum repetition))
;; REPEAT-MATCHER is part of the closure's environment but it
;; can only be defined after NON-GREEDY-AUX is defined
repeat-matcher)
(declare (type fixnum zero-length-num)
(type function next-fn))
(cond
(maximum
;; we make a reservation for our slot in *REPEAT-COUNTERS*
;; because we need to keep track whether we've reached MAXIMUM
;; repetitions
(let ((rep-num (incf-after *rep-num*)))
(flet ((non-greedy-aux (start-pos)
;; the actual matcher which first calls NEXT-FN and
;; on failure tries to match the inner regex of
;; REPETITION (if we haven't done so too often)
(declare (type fixnum start-pos maximum rep-num)
(type function repeat-matcher))
(let ((old-last-pos
(svref *last-pos-stores* zero-length-num)))
(when (and old-last-pos
(= (the fixnum old-last-pos) start-pos))
;; stop immediately if we've been here before,
;; i.e. if the last attempt matched a zero-length
;; string
(return-from non-greedy-aux (funcall next-fn start-pos)))
;; otherwise remember this position for the next
;; repetition
(setf (svref *last-pos-stores* zero-length-num) start-pos)
(or (funcall next-fn start-pos)
(and (< (aref *repeat-counters* rep-num) maximum)
(incf (aref *repeat-counters* rep-num))
;; note that REPEAT-MATCHER will call
;; NON-GREEDY-AUX again recursively
(prog1
(funcall repeat-matcher start-pos)
(decf (aref *repeat-counters* rep-num))
(setf (svref *last-pos-stores* zero-length-num)
old-last-pos)))))))
;; create a closure to match the inner regex and to
;; implement backtracking via NON-GREEDY-AUX
(setq repeat-matcher
(create-matcher-aux (regex repetition) #'non-greedy-aux))
;; the closure we return is just a thin wrapper around
;; NON-GREEDY-AUX to initialize the repetition counter and our
;; slot in *LAST-POS-STORES*
(lambda (start-pos)
(declare (type fixnum start-pos))
(setf (aref *repeat-counters* rep-num) 0
(svref *last-pos-stores* zero-length-num) nil)
(non-greedy-aux start-pos)))))
(t
;; easier code because we're not bounded by MAXIMUM, but
;; basically the same
(flet ((non-greedy-aux (start-pos)
(declare (type fixnum start-pos)
(type function repeat-matcher))
(let ((old-last-pos
(svref *last-pos-stores* zero-length-num)))
(when (and old-last-pos
(= (the fixnum old-last-pos) start-pos))
(return-from non-greedy-aux (funcall next-fn start-pos)))
(setf (svref *last-pos-stores* zero-length-num) start-pos)
(or (funcall next-fn start-pos)
(prog1
(funcall repeat-matcher start-pos)
(setf (svref *last-pos-stores* zero-length-num)
old-last-pos))))))
(setq repeat-matcher
(create-matcher-aux (regex repetition) #'non-greedy-aux))
(lambda (start-pos)
(declare (type fixnum start-pos))
(setf (svref *last-pos-stores* zero-length-num) nil)
(non-greedy-aux start-pos)))))))
;; code for constant repetitions, i.e. those with a fixed number of repetitions
(defmacro constant-repetition-constant-length-closure (check-curr-pos)
"This is the template for simple constant repetitions (where simple
means that the inner regex to be checked is of fixed length LEN, and
that it doesn't contain registers, i.e. there's no need for
backtracking) and where constant means that MINIMUM is equal to
MAXIMUM. CHECK-CURR-POS is a form which checks whether the inner regex
of the repetition matches at CURR-POS."
`(lambda (start-pos)
(declare (type fixnum start-pos))
(let ((target-end-pos (+ start-pos
(the fixnum (* len repetitions)))))
(declare (type fixnum target-end-pos))
;; first check if we won't go beyond the end of the string
(and (>= *end-pos* target-end-pos)
;; then loop through all repetitions step by step
(loop for curr-pos of-type fixnum from start-pos
below target-end-pos
by len
always ,check-curr-pos)
;; finally call NEXT-FN if we made it that far
(funcall next-fn target-end-pos)))))
(defmethod create-constant-repetition-constant-length-matcher
((repetition repetition) next-fn)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Creates a closure which tries to match REPETITION. It is assumed
that REPETITION has a constant number of repetitions. It is
furthermore assumed that the inner regex of REPETITION is of fixed
length and doesn't contain registers."
(let ((len (len repetition))
(repetitions (minimum repetition))
(regex (regex repetition)))
(declare (type fixnum len repetitions)
(type function next-fn))
(if (zerop len)
;; if the length is zero it suffices to try once
(create-matcher-aux regex next-fn)
;; otherwise try to optimize for a couple of common cases
(typecase regex
(str
(let ((str (str regex)))
(if (= 1 len)
;; a single character
(let ((chr (schar str 0)))
(if (case-insensitive-p regex)
(constant-repetition-constant-length-closure
(and (char-equal chr (schar *string* curr-pos))
(1+ curr-pos)))
(constant-repetition-constant-length-closure
(and (char= chr (schar *string* curr-pos))
(1+ curr-pos)))))
;; a string
(if (case-insensitive-p regex)
(constant-repetition-constant-length-closure
(let ((next-pos (+ curr-pos len)))
(declare (type fixnum next-pos))
(and (*string*-equal str curr-pos next-pos 0 len)
next-pos)))
(constant-repetition-constant-length-closure
(let ((next-pos (+ curr-pos len)))
(declare (type fixnum next-pos))
(and (*string*= str curr-pos next-pos 0 len)
next-pos)))))))
(char-class
;; a character class
(insert-char-class-tester (regex (schar *string* curr-pos))
(if (invertedp regex)
(constant-repetition-constant-length-closure
(and (not (char-class-test))
(1+ curr-pos)))
(constant-repetition-constant-length-closure
(and (char-class-test)
(1+ curr-pos))))))
(everything
(if (single-line-p regex)
;; a dot which really matches everything - we just have to
;; advance the index into *STRING* accordingly and check
;; if we didn't go past the end
(lambda (start-pos)
(declare (type fixnum start-pos))
(let ((next-pos (+ start-pos repetitions)))
(declare (type fixnum next-pos))
(or (<= next-pos *end-pos*)
(funcall next-fn next-pos))))
;; a dot which is not in single-line-mode - make sure we
;; don't match #\Newline
(constant-repetition-constant-length-closure
(and (char/= #\Newline (schar *string* curr-pos))
(1+ curr-pos)))))
(otherwise
;; the general case - we build an inner matcher which just
;; checks for immediate success, i.e. NEXT-FN is #'IDENTITY
(let ((inner-matcher (create-matcher-aux regex #'identity)))
(declare (type function inner-matcher))
(constant-repetition-constant-length-closure
(funcall inner-matcher curr-pos))))))))
(defmethod create-constant-repetition-matcher ((repetition repetition) next-fn)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Creates a closure which tries to match REPETITION. It is assumed
that REPETITION has a constant number of repetitions."
(let ((repetitions (minimum repetition))
;; we make a reservation for our slot in *REPEAT-COUNTERS*
;; because we need to keep track of the number of repetitions
(rep-num (incf-after *rep-num*))
;; REPEAT-MATCHER is part of the closure's environment but it
;; can only be defined after NON-GREEDY-AUX is defined
repeat-matcher)
(declare (type fixnum repetitions rep-num)
(type function next-fn))
(if (zerop (min-len repetition))
;; we make a reservation for our slot in *LAST-POS-STORES*
;; because we have to watch out for needless loops as the inner
;; regex might match zero-length strings
(let ((zero-length-num (incf-after *zero-length-num*)))
(declare (type fixnum zero-length-num))
(flet ((constant-aux (start-pos)
;; the actual matcher which first calls NEXT-FN and
;; on failure tries to match the inner regex of
;; REPETITION (if we haven't done so too often)
(declare (type fixnum start-pos)
(type function repeat-matcher))
(let ((old-last-pos
(svref *last-pos-stores* zero-length-num)))
(when (and old-last-pos
(= (the fixnum old-last-pos) start-pos))
;; if we've been here before we matched a
;; zero-length string the last time, so we can
;; just carry on because we will definitely be
;; able to do this again often enough
(return-from constant-aux (funcall next-fn start-pos)))
;; otherwise remember this position for the next
;; repetition
(setf (svref *last-pos-stores* zero-length-num) start-pos)
(cond ((< (aref *repeat-counters* rep-num) repetitions)
;; not enough repetitions yet, try it again
(incf (aref *repeat-counters* rep-num))
;; note that REPEAT-MATCHER will call
;; CONSTANT-AUX again recursively
(prog1
(funcall repeat-matcher start-pos)
(decf (aref *repeat-counters* rep-num))
(setf (svref *last-pos-stores* zero-length-num)
old-last-pos)))
(t
;; we're done - call NEXT-FN
(funcall next-fn start-pos))))))
;; create a closure to match the inner regex and to
;; implement backtracking via CONSTANT-AUX
(setq repeat-matcher
(create-matcher-aux (regex repetition) #'constant-aux))
;; the closure we return is just a thin wrapper around
;; CONSTANT-AUX to initialize the repetition counter
(lambda (start-pos)
(declare (type fixnum start-pos))
(setf (aref *repeat-counters* rep-num) 0
(aref *last-pos-stores* zero-length-num) nil)
(constant-aux start-pos))))
;; easier code because we don't have to care about zero-length
;; matches but basically the same
(flet ((constant-aux (start-pos)
(declare (type fixnum start-pos)
(type function repeat-matcher))
(cond ((< (aref *repeat-counters* rep-num) repetitions)
(incf (aref *repeat-counters* rep-num))
(prog1
(funcall repeat-matcher start-pos)
(decf (aref *repeat-counters* rep-num))))
(t (funcall next-fn start-pos)))))
(setq repeat-matcher
(create-matcher-aux (regex repetition) #'constant-aux))
(lambda (start-pos)
(declare (type fixnum start-pos))
(setf (aref *repeat-counters* rep-num) 0)
(constant-aux start-pos))))))
;; the actual CREATE-MATCHER-AUX method for REPETITION objects which
;; utilizes all the functions and macros defined above
(defmethod create-matcher-aux ((repetition repetition) next-fn)
(with-slots ((minimum minimum)
(maximum maximum)
(len len)
(min-len min-len)
(greedyp greedyp)
(contains-register-p contains-register-p))
repetition
(cond ((and maximum
(zerop maximum))
;; this should have been optimized away by CONVERT but just
;; in case...
(error "Got REPETITION with MAXIMUM 0~%"))
((and maximum
(= minimum maximum 1))
;; this should have been optimized away by CONVERT but just
;; in case...
(error "Got REPETITION with MAXIMUM 1 and MINIMUM 1~%"))
((and (eql minimum maximum)
len
(not contains-register-p))
(create-constant-repetition-constant-length-matcher repetition next-fn))
((eql minimum maximum)
(create-constant-repetition-matcher repetition next-fn))
((and greedyp
len
(not contains-register-p))
(create-greedy-constant-length-matcher repetition next-fn))
((and greedyp
(or (plusp min-len)
(eql maximum 1)))
(create-greedy-no-zero-matcher repetition next-fn))
(greedyp
(create-greedy-matcher repetition next-fn))
((and len
(plusp len)
(not contains-register-p))
(create-non-greedy-constant-length-matcher repetition next-fn))
((or (plusp min-len)
(eql maximum 1))
(create-non-greedy-no-zero-matcher repetition next-fn))
(t
(create-non-greedy-matcher repetition next-fn)))))

View File

@@ -0,0 +1,488 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/scanner.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; Here the scanner for the actual regex as well as utility scanners
;;; for the constant start and end strings are created.
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-PPCRE")
(defmacro bmh-matcher-aux (&key case-insensitive-p)
"Auxiliary macro used by CREATE-BMH-MATCHER."
(let ((char-compare (if case-insensitive-p 'char-equal 'char=)))
`(lambda (start-pos)
(declare (type fixnum start-pos))
(if (> (the fixnum (+ start-pos m)) *end-pos*)
nil
(loop named bmh-matcher
for k of-type fixnum = (+ start-pos m -1)
then (+ k (max 1 (aref skip (char-code (schar *string* k)))))
while (< k *end-pos*)
do (loop for j of-type fixnum downfrom (1- m)
for i of-type fixnum downfrom k
while (and (>= j 0)
(,char-compare (schar *string* i)
(schar pattern j)))
finally (if (minusp j)
(return-from bmh-matcher (1+ i)))))))))
(defun create-bmh-matcher (pattern case-insensitive-p)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Returns a Boyer-Moore-Horspool matcher which searches the (special)
simple-string *STRING* for the first occurence of the substring
PATTERN. The search starts at the position START-POS within *STRING*
and stops before *END-POS* is reached. Depending on the second
argument the search is case-insensitive or not."
;; see <http://www-igm.univ-mlv.fr/~lecroq/string/node18.html> for
;; details
(let* ((m (length pattern))
(skip (make-array +regex-char-code-limit+
:element-type 'fixnum
:initial-element m)))
(declare (type fixnum m))
(loop for k of-type fixnum below m
if case-insensitive-p
do (setf (aref skip (char-code (char-upcase (schar pattern k)))) (- m k 1)
(aref skip (char-code (char-downcase (schar pattern k)))) (- m k 1))
else
do (setf (aref skip (char-code (schar pattern k))) (- m k 1)))
(if case-insensitive-p
(bmh-matcher-aux :case-insensitive-p t)
(bmh-matcher-aux))))
(defmacro char-searcher-aux (&key case-insensitive-p)
"Auxiliary macro used by CREATE-CHAR-SEARCHER."
(let ((char-compare (if case-insensitive-p 'char-equal 'char=)))
`(lambda (start-pos)
(declare (type fixnum start-pos))
(loop for i of-type fixnum from start-pos below *end-pos*
thereis (and (,char-compare (schar *string* i) chr) i)))))
(defun create-char-searcher (chr case-insensitive-p)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Returns a function which searches the (special) simple-string
*STRING* for the first occurence of the character CHR. The search
starts at the position START-POS within *STRING* and stops before
*END-POS* is reached. Depending on the second argument the search is
case-insensitive or not."
(if case-insensitive-p
(char-searcher-aux :case-insensitive-p t)
(char-searcher-aux)))
(declaim (inline newline-skipper))
(defun newline-skipper (start-pos)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(declare (type fixnum start-pos))
"Find the next occurence of a character in *STRING* which is behind
a #\Newline."
(loop for i of-type fixnum from start-pos below *end-pos*
thereis (and (char= (schar *string* i) #\Newline)
(1+ i))))
(defmacro insert-advance-fn (advance-fn)
"Creates the actual closure returned by CREATE-SCANNER-AUX by
replacing '(ADVANCE-FN-DEFINITION) with a suitable definition for
ADVANCE-FN. This is a utility macro used by CREATE-SCANNER-AUX."
(subst
advance-fn '(advance-fn-definition)
'(lambda (string start end)
(block scan
;; initialize a couple of special variables used by the
;; matchers (see file specials.lisp)
(let* ((*string* string)
(*start-pos* start)
(*end-pos* end)
;; we will search forward for END-STRING if this value
;; isn't at least as big as POS (see ADVANCE-FN), so it
;; is safe to start to the left of *START-POS*; note
;; that this value will _never_ be decremented - this
;; is crucial to the scanning process
(*end-string-pos* (1- *start-pos*))
;; the next five will shadow the variables defined by
;; DEFPARAMETER; at this point, we don't know if we'll
;; actually use them, though
(*repeat-counters* *repeat-counters*)
(*last-pos-stores* *last-pos-stores*)
(*reg-starts* *reg-starts*)
(*regs-maybe-start* *regs-maybe-start*)
(*reg-ends* *reg-ends*)
;; we might be able to optimize the scanning process by
;; (virtually) shifting *START-POS* to the right
(scan-start-pos *start-pos*)
(starts-with-str (if start-string-test
(str starts-with)
nil))
;; we don't need to try further than MAX-END-POS
(max-end-pos (- *end-pos* min-len)))
(declare (type fixnum scan-start-pos)
(type function match-fn))
;; definition of ADVANCE-FN will be inserted here by macrology
(labels ((advance-fn-definition))
(declare (inline advance-fn))
(when (plusp rep-num)
;; we have at least one REPETITION which needs to count
;; the number of repetitions
(setq *repeat-counters* (make-array rep-num
:initial-element 0
:element-type 'fixnum)))
(when (plusp zero-length-num)
;; we have at least one REPETITION which needs to watch
;; out for zero-length repetitions
(setq *last-pos-stores* (make-array zero-length-num
:initial-element nil)))
(when (plusp reg-num)
;; we have registers in our regular expression
(setq *reg-starts* (make-array reg-num :initial-element nil)
*regs-maybe-start* (make-array reg-num :initial-element nil)
*reg-ends* (make-array reg-num :initial-element nil)))
(when end-anchored-p
;; the regular expression has a constant end string which
;; is anchored at the very end of the target string
;; (perhaps modulo a #\Newline)
(let ((end-test-pos (- *end-pos* (the fixnum end-string-len))))
(declare (type fixnum end-test-pos)
(type function end-string-test))
(unless (setq *end-string-pos* (funcall end-string-test
end-test-pos))
(when (and (= 1 (the fixnum end-anchored-p))
(char= #\Newline (schar *string* (1- *end-pos*)))
(> end-test-pos *start-pos*))
;; if we didn't find an end string candidate from
;; END-TEST-POS and if a #\Newline at the end is
;; allowed we try it again from one position to the
;; left
(setq *end-string-pos* (funcall end-string-test
(1- end-test-pos))))))
(unless *end-string-pos*
;; no end string candidate found, so give up
(return-from scan nil))
(when end-string-offset
;; if the offset of the constant end string from the
;; left of the regular expression is known we can start
;; scanning further to the right; this is similar to
;; what we might do in ADVANCE-FN
(setq scan-start-pos (- (the fixnum *end-string-pos*)
(the fixnum end-string-offset)))))
(cond
(start-anchored-p
;; we're anchored at the start of the target string,
;; so no need to try again after first failure
(when (or (/= *start-pos* scan-start-pos)
(< max-end-pos *start-pos*))
;; if END-STRING-OFFSET has proven that we don't
;; need to bother to scan from *START-POS* or if the
;; minimal length of the regular expression is
;; longer than the target string we give up
(return-from scan nil))
(when starts-with-str
(locally
(declare (type fixnum starts-with-len))
(cond ((and (case-insensitive-p starts-with)
(not (*string*-equal starts-with-str
*start-pos*
(+ *start-pos*
starts-with-len)
0 starts-with-len)))
;; the regular expression has a
;; case-insensitive constant start string
;; and we didn't find it
(return-from scan nil))
((and (not (case-insensitive-p starts-with))
(not (*string*= starts-with-str
*start-pos*
(+ *start-pos* starts-with-len)
0 starts-with-len)))
;; the regular expression has a
;; case-sensitive constant start string
;; and we didn't find it
(return-from scan nil))
(t nil))))
(when (and end-string-test
(not end-anchored-p))
;; the regular expression has a constant end string
;; which isn't anchored so we didn't check for it
;; already
(unless (setq *end-string-pos*
(funcall (the function end-string-test)
*start-pos*))
;; no end string candidate found, so give up
(return-from scan nil))
(when (and end-string-offset
(/= (- (the fixnum *end-string-pos*)
(the fixnum end-string-offset))
*start-pos*))
;; end string candidate found but its offset into
;; the regular expression contradicts the start
;; anchor, so give up
(return-from scan nil)))
;; if we got here we scan exactly once
(let ((next-pos (funcall match-fn *start-pos*)))
(when next-pos
(values (if next-pos *start-pos* nil)
next-pos
*reg-starts*
*reg-ends*))))
(t
(loop for pos = (if starts-with-everything
;; don't jump to the next
;; #\Newline on the first
;; iteration
scan-start-pos
(advance-fn scan-start-pos))
then (advance-fn pos)
;; give up if the regular expression can't fit
;; into the rest of the target string
while (and pos
(<= (the fixnum pos) max-end-pos))
do (let ((next-pos (funcall match-fn pos)))
(when next-pos
(return-from scan (values pos
next-pos
*reg-starts*
*reg-ends*)))
;; not yet found, increment POS
(incf (the fixnum pos))))))))))
:test #'equalp))
(defun create-scanner-aux (match-fn
min-len
start-anchored-p
starts-with
start-string-test
end-anchored-p
end-string-test
end-string-len
end-string-offset
rep-num
zero-length-num
reg-num)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
(declare (type fixnum min-len zero-length-num rep-num reg-num))
"Auxiliary function to create and return a scanner (which is
actually a closure). Used by CREATE-SCANNER."
(let ((starts-with-len (if (typep starts-with 'str)
(len starts-with)))
(starts-with-everything (typep starts-with 'everything)))
(cond
;; this COND statement dispatches on the different versions we
;; have for ADVANCE-FN and creates different closures for each;
;; note that you see only the bodies of ADVANCE-FN below - the
;; actual scanner is defined in INSERT-ADVANCE-FN above; (we
;; could have done this with closures instead of macrology but
;; would have consed a lot more)
((and start-string-test end-string-test end-string-offset)
;; we know that the regular expression has constant start and
;; end strings and we know the end string's offset (from the
;; left)
(insert-advance-fn
(advance-fn (pos)
(declare (type fixnum end-string-offset starts-with-len)
(type function start-string-test end-string-test))
(loop
(unless (setq pos (funcall start-string-test pos))
;; give up completely if we can't find a start string
;; candidate
(return-from scan nil))
(locally
;; from here we know that POS is a FIXNUM
(declare (type fixnum pos))
(when (= pos (- (the fixnum *end-string-pos*) end-string-offset))
;; if we already found an end string candidate the
;; position of which matches the start string
;; candidate we're done
(return-from advance-fn pos))
(let ((try-pos (+ pos starts-with-len)))
;; otherwise try (again) to find an end string
;; candidate which starts behind the start string
;; candidate
(loop
(unless (setq *end-string-pos*
(funcall end-string-test try-pos))
;; no end string candidate found, so give up
(return-from scan nil))
;; NEW-POS is where we should start scanning
;; according to the end string candidate
(let ((new-pos (- (the fixnum *end-string-pos*)
end-string-offset)))
(declare (type fixnum new-pos *end-string-pos*))
(cond ((= new-pos pos)
;; if POS and NEW-POS are equal then the
;; two candidates agree so we're fine
(return-from advance-fn pos))
((> new-pos pos)
;; if NEW-POS is further to the right we
;; advance POS and try again, i.e. we go
;; back to the start of the outer LOOP
(setq pos new-pos)
;; this means "return from inner LOOP"
(return))
(t
;; otherwise NEW-POS is smaller than POS,
;; so we have to redo the inner LOOP to
;; find another end string candidate
;; further to the right
(setq try-pos (1+ *end-string-pos*))))))))))))
((and starts-with-everything end-string-test end-string-offset)
;; we know that the regular expression starts with ".*" (which
;; is not in single-line-mode, see CREATE-SCANNER-AUX) and ends
;; with a constant end string and we know the end string's
;; offset (from the left)
(insert-advance-fn
(advance-fn (pos)
(declare (type fixnum end-string-offset)
(type function end-string-test))
(loop
(unless (setq pos (newline-skipper pos))
;; if we can't find a #\Newline we give up immediately
(return-from scan nil))
(locally
;; from here we know that POS is a FIXNUM
(declare (type fixnum pos))
(when (= pos (- (the fixnum *end-string-pos*) end-string-offset))
;; if we already found an end string candidate the
;; position of which matches the place behind the
;; #\Newline we're done
(return-from advance-fn pos))
(let ((try-pos pos))
;; otherwise try (again) to find an end string
;; candidate which starts behind the #\Newline
(loop
(unless (setq *end-string-pos*
(funcall end-string-test try-pos))
;; no end string candidate found, so we give up
(return-from scan nil))
;; NEW-POS is where we should start scanning
;; according to the end string candidate
(let ((new-pos (- (the fixnum *end-string-pos*)
end-string-offset)))
(declare (type fixnum new-pos *end-string-pos*))
(cond ((= new-pos pos)
;; if POS and NEW-POS are equal then the
;; the end string candidate agrees with
;; the #\Newline so we're fine
(return-from advance-fn pos))
((> new-pos pos)
;; if NEW-POS is further to the right we
;; advance POS and try again, i.e. we go
;; back to the start of the outer LOOP
(setq pos new-pos)
;; this means "return from inner LOOP"
(return))
(t
;; otherwise NEW-POS is smaller than POS,
;; so we have to redo the inner LOOP to
;; find another end string candidate
;; further to the right
(setq try-pos (1+ *end-string-pos*))))))))))))
((and start-string-test end-string-test)
;; we know that the regular expression has constant start and
;; end strings; similar to the first case but we only need to
;; check for the end string, it doesn't provide enough
;; information to advance POS
(insert-advance-fn
(advance-fn (pos)
(declare (type function start-string-test end-string-test))
(unless (setq pos (funcall start-string-test pos))
(return-from scan nil))
(if (<= (the fixnum pos)
(the fixnum *end-string-pos*))
(return-from advance-fn pos))
(unless (setq *end-string-pos* (funcall end-string-test pos))
(return-from scan nil))
pos)))
((and starts-with-everything end-string-test)
;; we know that the regular expression starts with ".*" (which
;; is not in single-line-mode, see CREATE-SCANNER-AUX) and ends
;; with a constant end string; similar to the second case but we
;; only need to check for the end string, it doesn't provide
;; enough information to advance POS
(insert-advance-fn
(advance-fn (pos)
(declare (type function end-string-test))
(unless (setq pos (newline-skipper pos))
(return-from scan nil))
(if (<= (the fixnum pos)
(the fixnum *end-string-pos*))
(return-from advance-fn pos))
(unless (setq *end-string-pos* (funcall end-string-test pos))
(return-from scan nil))
pos)))
(start-string-test
;; just check for constant start string candidate
(insert-advance-fn
(advance-fn (pos)
(declare (type function start-string-test))
(unless (setq pos (funcall start-string-test pos))
(return-from scan nil))
pos)))
(starts-with-everything
;; just advance POS with NEWLINE-SKIPPER
(insert-advance-fn
(advance-fn (pos)
(unless (setq pos (newline-skipper pos))
(return-from scan nil))
pos)))
(end-string-test
;; just check for the next end string candidate if POS has
;; advanced beyond the last one
(insert-advance-fn
(advance-fn (pos)
(declare (type function end-string-test))
(if (<= (the fixnum pos)
(the fixnum *end-string-pos*))
(return-from advance-fn pos))
(unless (setq *end-string-pos* (funcall end-string-test pos))
(return-from scan nil))
pos)))
(t
;; not enough optimization information about the regular
;; expression to optimize so we just return POS
(insert-advance-fn
(advance-fn (pos)
pos))))))

View File

@@ -0,0 +1,29 @@
(with-open-file (r "results.sorted" :direction :output :if-exists :supersede)
(loop for (number result) in
(sort
(with-open-file (s "results" :direction :input)
(loop for line = (read-line s nil 'eof)
until (eq line 'eof)
for (number next) = (multiple-value-list (parse-integer line :junk-allowed t))
for result = (read-from-string line nil 'eof :start (1+ next))
collect (list number result)))
#'>
:key #'second)
do (format r "~@3A ~5,2F~%" number result)))
(with-open-file (s "results" :direction :input)
(loop for line = (read-line s nil 'eof)
until (eq line 'eof)
for (number next) = (multiple-value-list (parse-integer line :junk-allowed t))
for result = (read-from-string line nil 'eof :start (1+ next))
if (<= 1.0 result)
collect (list number result)))
(with-open-file (s "results" :direction :input)
(loop for line = (read-line s nil 'eof)
until (eq line 'eof)
for (number next) = (multiple-value-list (parse-integer line :junk-allowed t))
for result = (read-from-string line nil 'eof :start (1+ next))
count result into c
sum result into x
finally (return (float (/ x c)))))

View File

@@ -0,0 +1,96 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/specials.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; globally declared special variables
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-PPCRE")
;;; special variables used by the lexer/parser combo
(defvar *error-msg-offset* 0
"Offset to substract from positions in error messages.
This is necessary because CREATE-MATCHER-FROM-REGEX-STRING might prefix
the regex string with modifiers like (?i).")
(declaim (type fixnum *error-msg-offset*))
;;; special variables used by the SCAN function and the matchers
(defvar *string* ""
"The string which is currently scanned by SCAN.
Will always be coerced to a SIMPLE-STRING.")
(declaim (type simple-string *string*))
(defvar *start-pos* 0
"Where to start scanning within *STRING*.")
(declaim (type fixnum *start-pos*))
(defvar *end-pos* 0
"Where to stop scanning within *STRING*.")
(declaim (type fixnum *end-pos*))
(defvar *reg-starts* (make-array 0)
"An array which holds the start positions
of the current register candidates.")
(declaim (type simple-vector *reg-starts*))
(defvar *regs-maybe-start* (make-array 0)
"An array which holds the next start positions
of the current register candidates.")
(declaim (type simple-vector *regs-maybe-start*))
(defvar *reg-ends* (make-array 0)
"An array which holds the end positions
of the current register candidates.")
(declaim (type simple-vector *reg-ends*))
(defvar *end-string-pos* nil
"Start of the next possible end-string candidate.")
(defvar *rep-num* 0
"Counts the number of \"complicated\" repetitions while the matchers
are built.")
(declaim (type fixnum *rep-num*))
(defvar *zero-length-num* 0
"Counts the number of repetitions the inner regexes of which may
have zero-length while the matchers are built.")
(declaim (type fixnum *zero-length-num*))
(defvar *repeat-counters* (make-array 0
:initial-element 0
:element-type 'fixnum)
"An array to keep track of how often
repetitive patterns have been tested already.")
(declaim (type (array fixnum (*)) *repeat-counters*))
(defvar *last-pos-stores* (make-array 0)
"An array to keep track of the last positions
where we saw repetitive patterns.
Only used for patterns which might have zero length.")
(declaim (type simple-vector *last-pos-stores*))

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,178 @@
;;; -*- Mode: LISP; Syntax: COMMON-LISP; Package: CL-PPCRE; Base: 10 -*-
;;; $Header: /usr/local/cvsrep/cl-ppcre/util.lisp,v 1.1.1.1 2002/12/20 10:10:44 edi Exp $
;;; Utility functions and constants dealing with the hash-tables
;;; we use to encode character classes
;;; Hash-tables are treated like sets, i.e. a character C is a member of the
;;; hash-table H iff (GETHASH C H) is true.
;;; Copyright (c) 2002, Dr. Edmund Weitz. All rights reserved.
;;; Redistribution and use in source and binary forms, with or without
;;; modification, are permitted provided that the following conditions
;;; are met:
;;; * Redistributions of source code must retain the above copyright
;;; notice, this list of conditions and the following disclaimer.
;;; * Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials
;;; provided with the distribution.
;;; THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESSED
;;; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
;;; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;;; ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
;;; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
;;; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
;;; GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
;;; WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(in-package "CL-PPCRE")
(eval-when (:compile-toplevel :execute :load-toplevel)
(unless (boundp '+regex-char-code-limit+)
(defconstant +regex-char-code-limit+ char-code-limit
"The upper exclusive bound on the char-codes of characters
which can occur in character classes.
Change this value BEFORE compiling CL-PPCRE if you don't need
the full Unicode support of LW, ACL, or CLISP."))
(defun make-char-hash (test)
(declare (optimize speed space))
"Returns a hash-table of all characters satisfying test."
(loop with hash = (make-hash-table)
for c of-type fixnum from 0 below +regex-char-code-limit+
for chr = (code-char c)
if (and chr (funcall test chr))
do (setf (gethash chr hash) t)
finally (return hash)))
(declaim (inline word-char-p))
(defun word-char-p (chr)
(declare (optimize speed
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)
#+:lispworks (hcl:fixnum-safety 0)))
"Tests whether a character is a \"word\" character.
In the ASCII charset this is equivalent to a-z, A-Z, 0-9, or _,
i.e. the same as Perl's [\\w]."
(or (alphanumericp chr)
(char= chr #\_)))
(unless (boundp '+whitespace-char-string+)
(defconstant +whitespace-char-string+
(coerce
'(#\Space #\Tab #\Linefeed #\Return #\Page)
'string)
"A string of all characters which are considered to be whitespace.
Same as Perl's [\\s]."))
(defun whitespacep (chr)
(declare (optimize speed space))
"Tests whether a character is whitespace,
i.e. whether it would match [\\s] in Perl."
(find chr +whitespace-char-string+ :test #'char=)))
;; the following DEFCONSTANT statements are wrapped with
;; (UNLESS (BOUNDP ...) ...) to make SBCL happy
(unless (boundp '+digit-hash+)
(defconstant +digit-hash+
(make-char-hash (lambda (chr) (char<= #\0 chr #\9)))
"Hash-table containing the digits from 0 to 9."))
(unless (boundp '+word-char-hash+)
(defconstant +word-char-hash+
(make-char-hash #'word-char-p)
"Hash-table containing all \"word\" characters."))
(unless (boundp '+whitespace-char-hash+)
(defconstant +whitespace-char-hash+
(make-char-hash #'whitespacep)
"Hash-table containing all whitespace characters."))
(defun merge-hash (hash1 hash2)
(declare (optimize speed space))
"Returns the \"sum\" of two hashes."
(loop for chr being the hash-keys of hash2
do (setf (gethash chr hash1) t))
hash1)
(defun merge-inverted-hash (hash1 hash2)
(declare (optimize speed space))
"Returns the \"sum\" of hash1 and the \"inverse\" of hash2."
(loop for c of-type fixnum from 0 below +regex-char-code-limit+
for chr = (code-char c)
if (and chr (not (gethash chr hash2)))
do (setf (gethash chr hash1) t))
hash1)
(defun create-ranges-from-hash (hash &key downcasep)
(declare (optimize speed space))
"Tries to identify up to three intervals (with respect to CHAR<)
which together comprise HASH. Returns NIL if this is not possible.
If DOWNCASEP is true it will treat the hash-table as if it represents
both the lower-case and the upper-case variants of its members and
will only return the respective lower-case intervals."
;; discard empty hash-tables
(unless (plusp (hash-table-count hash))
(return-from create-ranges-from-hash nil))
(loop with min1 and min2 and min3
and max1 and max2 and max3
;; loop through all characters in HASH, sorted by CHAR<
for chr in (sort (loop for chr being the hash-keys of hash
collect (if downcasep
(char-downcase chr)
chr))
#'char<)
for code = (char-code chr)
;; MIN1, MAX1, etc. are _exclusive_
;; bounds of the intervals identified so far
do (cond
((not min1)
;; this will only happen once, for the first character
(setq min1 (1- code)
max1 (1+ code)))
((<= min1 code max1)
;; we're here as long as CHR fits into the first interval
(setq min1 (min min1 (1- code))
max1 (max max1 (1+ code))))
((not min2)
;; we need to open a second interval
;; this'll also happen only once
(setq min2 (1- code)
max2 (1+ code)))
((<= min2 code max2)
;; CHR fits into the second interval
(setq min2 (min min2 (1- code))
max2 (max max2 (1+ code))))
((not min3)
;; we need to open the third interval
;; happens only once
(setq min3 (1- code)
max3 (1+ code)))
((<= min3 code max3)
;; CHR fits into the third interval
(setq min3 (min min3 (1- code))
max3 (max max3 (1+ code))))
(t
;; we're out of luck, CHR doesn't fit
;; into one of the three intervals
(return nil)))
;; on success return all bounds
;; make them inclusive bounds before returning
finally (return (values (code-char (1+ min1))
(code-char (1- max1))
(and min2 (code-char (1+ min2)))
(and max2 (code-char (1- max2)))
(and min3 (code-char (1+ min3)))
(and max3 (code-char (1- max3)))))))

View File

@@ -0,0 +1,266 @@
;; clos.lisp -- CLOS benchmarking code
;;
;; Author: Eric Marsden <emarsden@laas.fr>
;; Time-stamp: <2004-03-10 emarsden>
;; 20030203 james.anderson@setf.de changes to distinguish first from
;; successive passes
;;
;;
;; This file does some benchmarking of CLOS functionality. It creates
;; a class hierarchy of the form
;;
;; class-0-0
;; / | \
;; / | \
;; / | \
;; class-0-1 class-1-1 . class-2-1
;; | / | . . / |
;; | / . | . / |
;; | / . | / |
;; class-0-2 class-1-2 class-2-2
;;
;;
;; where the shape of the hierarchy is controlled by the parameters
;; +HIERARCHY-DEPTH+ and +HIERARCHY-WIDTH+. Note that classes to the
;; left of the diagram have more superclasses than those to the right.
;; It then defines methods specializing on each class (simple methods,
;; after methods and AND-type method combination), and
;; INITIALIZE-INSTANCE methods. The code measures the speed of
;;
;; - creation of the class hierarchy (time taken to compile and
;; execute the DEFCLASS forms)
;;
;; - instance creation
;;
;; - method definition (time taken to compile and execute the
;; DEFMETHOD forms)
;;
;; - execution of "simple" method invocations, both with and
;; without :after methods
;;
;; - execution of "complex" method invocations (using AND-type
;; method combination)
;;
;;
;; This code is probably not representative of real usage of CLOS, but
;; should give an idea of the speed of a particular CLOS
;; implementation.
;;
;; Note: warnings about undefined accessors and types are normal when
;; compiling this code.
(in-package :cl-bench.clos)
(eval-when (:compile-toplevel :load-toplevel :execute)
(defconstant +hierarchy-depth+ 10)
(defconstant +hierarchy-width+ 5))
;; the level-0 hierarchy
(defclass class-0-0 () ())
(defvar *instances* (make-array +hierarchy-width+ :element-type 'class-0-0))
(when (fboundp 'simple-method) (fmakunbound 'simple-method))
(when (fboundp 'complex-method) (fmakunbound 'complex-method))
(defgeneric simple-method (a b))
(defmethod simple-method ((self class-0-0) other) other)
#-(or poplog)
(defgeneric complex-method (a b &rest rest)
(:method-combination and))
#-(or poplog)
(defmethod complex-method and ((self class-0-0) other &rest rest)
(declare (ignore rest))
other)
(defmacro make-class-name (depth width)
(with-standard-io-syntax
`(intern (format nil "CLASS-~d-~d" ,depth ,width))))
(defmacro make-attribute-name (depth width)
(with-standard-io-syntax
`(intern (format nil "ATTRIBUTE-~d-~d" ,depth ,width))))
(defmacro make-initarg-name (depth width)
(with-standard-io-syntax
`(intern (format nil "INITARG-~d-~d" ,depth ,width) :keyword)))
(defmacro make-accessor-name (depth width)
(with-standard-io-syntax
`(intern (format nil "GET-ATTRIBUTE-~d-~d" ,depth ,width))))
(defmacro class-definition (depth width)
`(defclass ,(make-class-name depth width)
,(loop :for w :from width :below +hierarchy-width+
:collect (make-class-name (1- depth) w))
(( ,(make-attribute-name depth width)
:initarg ,(make-initarg-name depth width)
:initform (* ,depth ,width)
:accessor ,(make-accessor-name depth width)))))
(defmacro init-instance-definition (depth width)
`(defmethod initialize-instance :after ((self ,(make-class-name depth width)) &rest initargs)
(declare (ignore initargs))
(incf (,(make-accessor-name depth width) self))))
(defmacro simple-method-definition (depth width)
`(defmethod simple-method ((self ,(make-class-name depth width))
(n number))
(* n (call-next-method) (,(make-accessor-name depth width) self))))
(defmacro complex-method-definition (depth width)
`(defmethod complex-method and ((self ,(make-class-name depth width))
(n number) &rest rest)
(declare (ignore rest))
(,(make-accessor-name depth width) self)))
(defmacro after-method-definition (depth width)
`(defmethod simple-method :after ((self ,(make-class-name depth width))
(n number))
(setf (,(make-accessor-name depth width) self) ,(* depth width width))))
(defun defclass-forms ()
(let (forms)
(loop :for width :to +hierarchy-width+ :do
(push `(defclass ,(make-class-name 1 width) (class-0-0) ()) forms))
(loop :for dpth :from 2 :to +hierarchy-depth+ :do
(loop :for wdth :to +hierarchy-width+ :do
(push `(class-definition ,dpth ,wdth) forms)
(push `(init-instance-definition ,dpth ,wdth) forms)))
(nreverse forms)))
(defun defmethod-forms ()
(let (forms)
(loop :for dpth :from 2 to +hierarchy-depth+ :do
(loop :for wdth :to +hierarchy-width+ :do
(push `(simple-method-definition ,dpth ,wdth) forms)
#-(or poplog)
(push `(complex-method-definition ,dpth ,wdth) forms)))
(nreverse forms)))
(defun after-method-forms ()
(let (forms)
(loop :for depth :from 2 :to +hierarchy-depth+ :do
(loop :for width :to +hierarchy-width+ :do
(push `(after-method-definition ,depth ,width) forms)))
(nreverse forms)))
(defparameter *defclass-operator* nil)
(defun run-defclass ()
(setq *defclass-operator* (compile nil `(lambda () ,@(defclass-forms))))
(funcall *defclass-operator*))
(defun run-defclass-precompiled ()
(funcall *defclass-operator*))
(defparameter *defmethod-operator* nil)
(defun run-defmethod ()
(setq *defmethod-operator* (compile nil `(lambda () ,@(defmethod-forms))))
(funcall *defmethod-operator*))
(defun run-defmethod-precompiled ()
(funcall *defmethod-operator*))
(defun add-after-methods ()
(funcall (compile nil `(lambda () ,@(after-method-forms)))))
#+i-do-not-understand
(defun make-instances ()
(dotimes (i 5000)
(dotimes (w +hierarchy-width+)
(setf (aref *instances* w)
(make-instance (make-class-name +hierarchy-depth+ w)
(make-initarg-name +hierarchy-depth+ w) 42))
`(incf (slot-value (aref *instances* w) ',(make-attribute-name +hierarchy-depth+ w))))))
(defparameter *make-instances-operator* nil)
(defun make-instances ()
(setq *make-instances-operator*
(compile nil `(lambda ()
(dotimes (i 5000)
,@(let ((forms nil))
(dotimes (w +hierarchy-width+)
(push `(progn (setf (aref *instances* ,w)
(make-instance ',(make-class-name +hierarchy-depth+ w)
,(make-initarg-name +hierarchy-depth+ w) 42))
(incf (slot-value (aref *instances* ,w)
',(make-attribute-name +hierarchy-depth+ w))))
forms))
(reverse forms))))))
(funcall *make-instances-operator*))
(defun make-instances-precompiled ()
(funcall *make-instances-operator*))
;; the code in the function MAKE-INSTANCES is very difficult to
;; optimize, because the arguments to MAKE-INSTANCE are not constant.
;; This test attempts to simulate the common case where some of the
;; parameters to MAKE-INSTANCE are constants.
(defclass a-simple-base-class ()
((attribute-one :accessor attribute-one
:initarg :attribute-one
:type string)))
(defclass a-derived-class (a-simple-base-class)
((attribute-two :accessor attribute-two
:initform 42
:type integer)))
(defun make-instances/simple ()
(dotimes (i 5000)
(make-instance 'a-derived-class
:attribute-one "The first attribute"))
(dotimes (i 5000)
(make-instance 'a-derived-class
:attribute-one "The non-defaulting attribute")))
(defun methodcall/simple (num)
(dotimes (i 5000)
(simple-method (aref *instances* num) i)))
(defun methodcalls/simple ()
(dotimes (w +hierarchy-width+)
(methodcall/simple w)))
(defun methodcalls/simple+after ()
(add-after-methods)
(dotimes (w +hierarchy-width+)
(methodcall/simple w)))
#-(or poplog)
(defun methodcall/complex (num)
(dotimes (i 5000)
(complex-method (aref *instances* num) i)))
#-(or poplog)
(defun methodcalls/complex ()
(dotimes (w +hierarchy-width+)
(methodcall/complex w)))
;;; CLOS implementation of the Fibonnaci function, with EQL specialization
(defmethod eql-fib ((x (eql 0)))
1)
(defmethod eql-fib ((x (eql 1)))
1)
; a method for all other cases
(defmethod eql-fib (x)
(+ (eql-fib (- x 1))
(eql-fib (- x 2))))
;; EOF

232
cl-bench/files/clos.lisp Normal file
View File

@@ -0,0 +1,232 @@
;; clos.lisp -- CLOS benchmarking code
;;
;; Author: Eric Marsden <emarsden@laas.fr>
;; Time-stamp: <2003-12-30 emarsden>
;;
;;
;; This file does some benchmarking of CLOS functionality. It creates
;; a class hierarchy of the form
;;
;; class-0-0
;; / | \
;; / | \
;; / | \
;; class-0-1 class-1-1 . class-2-1
;; | / | . . / |
;; | / . | . / |
;; | / . | / |
;; class-0-2 class-1-2 class-2-2
;;
;;
;; where the shape of the hierarchy is controlled by the parameters
;; +HIERARCHY-DEPTH+ and +HIERARCHY-WIDTH+. Note that classes to the
;; left of the diagram have more superclasses than those to the right.
;; It then defines methods specializing on each class (simple methods,
;; after methods and AND-type method combination), and
;; INITIALIZE-INSTANCE methods. The code measures the speed of
;;
;; - creation of the class hierarchy (time taken to compile and
;; execute the DEFCLASS forms)
;;
;; - instance creation
;;
;; - method definition (time taken to compile and execute the
;; DEFMETHOD forms)
;;
;; - execution of "simple" method invocations, both with and
;; without :after methods
;;
;; - execution of "complex" method invocations (using AND-type
;; method combination)
;;
;;
;; This code is probably not representative of real usage of CLOS, but
;; should give an idea of the speed of a particular CLOS
;; implementation.
;;
;; Note: warnings about undefined accessors and types are normal when
;; compiling this code.
(in-package :cl-bench.clos)
(eval-when (:compile-toplevel :load-toplevel :execute)
(defconstant +hierarchy-depth+ 10)
(defconstant +hierarchy-width+ 5))
;; the level-0 hierarchy
(defclass class-0-0 () ())
(defvar *instances* (make-array #.+hierarchy-width+ :element-type 'class-0-0))
(defgeneric simple-method (a b))
(defmethod simple-method ((self class-0-0) other) other)
#-(or clisp poplog)
(defgeneric complex-method (a b &rest rest)
(:method-combination and))
#-(or clisp poplog)
(defmethod complex-method and ((self class-0-0) other &rest rest)
(declare (ignore rest))
other)
(defmacro make-class-name (depth width)
(with-standard-io-syntax
`(intern (format nil "CLASS-~d-~d" ,depth ,width))))
(defmacro make-attribute-name (depth width)
(with-standard-io-syntax
`(intern (format nil "ATTRIBUTE-~d-~d" ,depth ,width))))
(defmacro make-initarg-name (depth width)
(with-standard-io-syntax
`(intern (format nil "INITARG-~d-~d" ,depth ,width) :keyword)))
(defmacro make-accessor-name (depth width)
(with-standard-io-syntax
`(intern (format nil "GET-ATTRIBUTE-~d-~d" ,depth ,width))))
(defmacro class-definition (depth width)
`(defclass ,(make-class-name depth width)
,(loop :for w :from width :below #.+hierarchy-width+
:collect (make-class-name (1- depth) w))
(( ,(make-attribute-name depth width)
:initarg ,(make-initarg-name depth width)
:initform (* ,depth ,width)
:accessor ,(make-accessor-name depth width)))))
(defmacro init-instance-definition (depth width)
`(defmethod initialize-instance :after ((self ,(make-class-name depth width)) &rest initargs)
(declare (ignore initargs))
(incf (,(make-accessor-name depth width) self))))
(defmacro simple-method-definition (depth width)
`(defmethod simple-method ((self ,(make-class-name depth width))
(n number))
(* n (call-next-method) (,(make-accessor-name depth width) self))))
(defmacro complex-method-definition (depth width)
`(defmethod complex-method and ((self ,(make-class-name depth width))
(n number) &rest rest)
(declare (ignore rest))
(,(make-accessor-name depth width) self)))
(defmacro after-method-definition (depth width)
`(defmethod simple-method :after ((self ,(make-class-name depth width))
(n number))
(setf (,(make-accessor-name depth width) self) ,(* depth width width))))
(defun defclass-forms ()
(let (forms)
(loop :for width :to #.+hierarchy-width+ :do
(push `(defclass ,(make-class-name 1 width) (class-0-0) ()) forms))
(loop :for dpth :from 2 :to +hierarchy-depth+ :do
(loop :for wdth :to #.+hierarchy-width+ :do
(push `(class-definition ,dpth ,wdth) forms)
(push `(init-instance-definition ,dpth ,wdth) forms)))
(nreverse forms)))
(defun defmethod-forms ()
(let (forms)
(loop :for dpth :from 2 to #.+hierarchy-depth+ :do
(loop :for wdth :to #.+hierarchy-width+ :do
(push `(simple-method-definition ,dpth ,wdth) forms)
#-(or clisp poplog)
(push `(complex-method-definition ,dpth ,wdth) forms)))
(nreverse forms)))
(defun after-method-forms ()
(let (forms)
(loop :for depth :from 2 :to #.+hierarchy-depth+ :do
(loop :for width :to #.+hierarchy-width+ :do
(push `(after-method-definition ,depth ,width) forms)))
(nreverse forms)))
(defun run-defclass ()
(funcall (compile nil `(lambda () ,@(defclass-forms)))))
(defun run-defmethod ()
(funcall (compile nil `(lambda () ,@(defmethod-forms)))))
(defun add-after-methods ()
(funcall (compile nil `(lambda () ,@(after-method-forms)))))
(defun make-instances ()
(dotimes (i 5000)
(dotimes (w #.+hierarchy-width+)
(setf (aref *instances* w)
(make-instance (make-class-name #.+hierarchy-depth+ w)
(make-initarg-name #.+hierarchy-depth+ w) 42))
`(incf (slot-value (aref *instances* w) ',(make-attribute-name #.+hierarchy-depth+ w))))))
;; the code in the function MAKE-INSTANCES is very difficult to
;; optimize, because the arguments to MAKE-INSTANCE are not constant.
;; This test attempts to simulate the common case where some of the
;; parameters to MAKE-INSTANCE are constants.
(defclass a-simple-base-class ()
((attribute-one :accessor attribute-one
:initarg :attribute-one
:type string)))
(defclass a-derived-class (a-simple-base-class)
((attribute-two :accessor attribute-two
:initform 42
:type integer)))
(defun make-instances/simple ()
(dotimes (i 5000)
(make-instance 'a-derived-class
:attribute-one "The first attribute"))
(dotimes (i 5000)
(make-instance 'a-derived-class
:attribute-one "The non-defaulting attribute")))
(defun methodcall/simple (num)
(dotimes (i 5000)
(simple-method (aref *instances* num) i)))
(defun methodcalls/simple ()
(dotimes (w #.+hierarchy-width+)
(methodcall/simple w)))
(defun methodcalls/simple+after ()
(add-after-methods)
(dotimes (w #.+hierarchy-width+)
(methodcall/simple w)))
#-(or clisp poplog)
(defun methodcall/complex (num)
(dotimes (i 5000)
(complex-method (aref *instances* num) i)))
#-(or clisp poplog)
(defun methodcalls/complex ()
(dotimes (w #.+hierarchy-width+)
(methodcall/complex w)))
;;; CLOS implementation of the Fibonnaci function, with EQL specialization
(defmethod eql-fib ((x (eql 0)))
1)
(defmethod eql-fib ((x (eql 1)))
1)
; a method for all other cases
(defmethod eql-fib (x)
(+ (eql-fib (- x 1))
(eql-fib (- x 2))))
(defun run-eql-fib ()
(eql-fib 30))
;; EOF

43
cl-bench/files/crc40.lisp Normal file
View File

@@ -0,0 +1,43 @@
;;; crc40.lisp -- a CR calculation that uses 40 bit integers
;;
;; from Raymond Toy
(in-package :cl-bench.crc)
(declaim (inline crc-division-step))
(defun crc-division-step (bit rmdr poly msb-mask)
(declare (type (signed-byte 56) rmdr poly msb-mask)
(type bit bit))
;; Shift in the bit into the LSB of the register (rmdr)
(let ((new-rmdr (logior bit (* rmdr 2))))
;; Divide by the polynomial, and return the new remainder
(if (zerop (logand msb-mask new-rmdr))
new-rmdr
(logxor new-rmdr poly))))
(defun compute-adjustment (poly n)
(declare (type (signed-byte 56) poly)
(fixnum n))
;; Precompute X^(n-1) mod poly
(let* ((poly-len-mask (ash 1 (1- (integer-length poly))))
(rmdr (crc-division-step 1 0 poly poly-len-mask)))
(dotimes (k (- n 1))
(setf rmdr (crc-division-step 0 rmdr poly poly-len-mask)))
rmdr))
(defun calculate-crc40 (iterations)
(declare (fixnum iterations))
(let ((crc-poly 1099587256329)
(len 3014633)
(answer 0))
(dotimes (k iterations)
(declare (fixnum k))
(setf answer (compute-adjustment crc-poly len)))
answer))
(defun run-crc40 ()
(calculate-crc40 10))
;; EOF

295
cl-bench/files/deflate.lisp Normal file
View File

@@ -0,0 +1,295 @@
;;;; -*- Mode: Lisp; Syntax: Common-Lisp; Package: png; -*-
;;;; ------------------------------------------------------------------------------------------
;;;; Title: The DEFLATE Compression (rfc1951)
;;;; Created: Thu Apr 24 22:12:58 1997
;;;; Author: Gilbert Baumann <unk6@rz.uni-karlsruhe.de>
;;;; ------------------------------------------------------------------------------------------
;;;; (c) copyright 1997,1998 by Gilbert Baumann
(in-package :cl-bench.deflate)
;; Note: This implementation is inherently sloooow. On the other hand
;; it is safe and complete and easily verify-able. See
;; <URL:http://www.gzip.org/zlib/feldspar.html> for an explanation of
;; the algorithm.
;; these DEFVAR used to be DEFCONSTANT, but with a very strict reading
;; of CLtS, they are not truly constant.
(defvar +length-encoding+
'#((0 3) (0 4) (0 5) (0 6) (0 7) (0 8)
(0 9) (0 10) (1 11) (1 13) (1 15) (1 17)
(2 19) (2 23) (2 27) (2 31) (3 35) (3 43)
(3 51) (3 59) (4 67) (4 83) (4 99) (4 115)
(5 131) (5 163) (5 195) (5 227) (0 258) ))
(defvar +dist-encoding+
'#( (0 1) (0 2) (0 3) (0 4) (1 5) (1 7)
(2 9) (2 13) (3 17) (3 25) (4 33) (4 49)
(5 65) (5 97) (6 129) (6 193) (7 257) (7 385)
(8 513) (8 769) (9 1025) (9 1537) (10 2049) (10 3073)
(11 4097) (11 6145) (12 8193) (12 12289) (13 16385) (13 24577)))
(defvar +fixed-huffman-code-lengths+
(let ((res (make-array 288)))
(loop for i from 0 to 143 do (setf (aref res i) 8))
(loop for i from 144 to 255 do (setf (aref res i) 9))
(loop for i from 256 to 279 do (setf (aref res i) 7))
(loop for i from 280 to 287 do (setf (aref res i) 8))
res))
(defstruct bit-stream
(octets nil :type (vector (unsigned-byte 8))) ;a vector of octets
(pos 0 :type fixnum)) ;bit position within octet stream
(declaim (inline bit-stream-read-bit))
(declaim (inline bit-stream-read-byte))
(defun bit-stream-read-bit (source)
(prog1
(the fixnum
(logand (the fixnum #x1)
(the fixnum
(ash (the fixnum
(aref (the (array (unsigned-byte 8) (*)) (bit-stream-octets source))
(the fixnum (ash (the fixnum (bit-stream-pos source)) -3))))
(the fixnum (- (the fixnum (logand (the fixnum (bit-stream-pos source)) (the fixnum #x7)))))))))
(incf (the fixnum (bit-stream-pos source)))))
(defun bit-stream-read-byte (source n)
"Read one unsigned byte of width 'n' from the bit stream 'source'."
(let* ((data (bit-stream-octets source))
(pos (bit-stream-pos source))
(i (ash pos -3)))
(declare (type fixnum i)
(type fixnum pos))
(prog1
(logand
(the fixnum (1- (the fixnum (ash 1 (the fixnum n)))))
(the fixnum
(logior
(the fixnum (ash (aref (the (array (unsigned-byte 8) (*)) data) i) (- (logand pos #x7))))
(the fixnum (ash (aref (the (array (unsigned-byte 8) (*)) data) (+ i 1)) (+ (- 8 (logand pos #x7)))))
(the fixnum (ash (aref (the (array (unsigned-byte 8) (*)) data) (+ i 2)) (+ (- 16 (logand pos #x7)))))
#|(the fixnum (ash (aref (the (simple-array (unsigned-byte 8) (*)) data) (+ i 3)) (+ (- 24 (logand pos #x7)))))|#
)))
(incf (the fixnum (bit-stream-pos source)) (the fixnum n)) )))
(defun bit-stream-read-reversed-byte (source n)
"Read one unsigned byte of width 'n' from the bit stream 'source'."
(let ((res 0))
(dotimes (k n res)
(setf res (logior res (ash (bit-stream-read-bit source) (1- (- n k))))) )))
(defun bit-stream-skip-to-byte-boundary (bs)
(setf (bit-stream-pos bs) (* 8 (floor (+ 7 (bit-stream-pos bs)) 8))))
(defun bit-stream-read-symbol (source tree)
"Read one symbol (code) from the bit-stream source using the huffman code provided by 'tree'."
(do ()
((atom tree) tree)
(setf tree (if (zerop (bit-stream-read-bit source)) (car tree) (cdr tree)))))
(defun build-huffman-tree (lengthen)
"Build up a huffman tree given a vector of code lengthen as described in RFC1951."
(let* ((max-bits (reduce #'max (map 'list #'identity lengthen)))
(max-symbol (1- (length lengthen)))
(bl-count (make-array (+ 1 max-bits) :initial-element 0))
(next-code (make-array (+ 1 max-bits) :initial-element 0))
(ht nil))
(dotimes (i (Length lengthen))
(let ((x (aref lengthen i)))
(unless (zerop x)
(incf (aref bl-count x)))))
(let ((code 0))
(loop
for bits from 1 to max-bits
do
(progn
(setf code (ash (+ code (aref bl-count (1- bits))) 1))
(setf (aref next-code bits) code))))
(loop
for n from 0 to max-symbol
do
(let ((len (aref lengthen n)))
(unless (zerop len)
(setf ht (huffman-insert ht len (aref next-code len) n))
(incf (aref next-code len)) )))
ht ))
(defun huffman-insert (ht len code sym)
(cond ((= 0 len)
(assert (null ht))
sym)
((logbitp (- len 1) code)
(unless (consp ht) (setq ht (cons nil nil)))
(setf (cdr ht) (huffman-insert (cdr ht) (1- len) code sym))
ht)
(t
(unless (consp ht) (setq ht (cons nil nil)))
(setf (car ht) (huffman-insert (car ht) (1- len) code sym))
ht) ))
(defun rfc1951-read-huffman-code-lengthen (source code-length-huffman-tree number)
(let ((res (make-array number :initial-element 0))
(i 0))
(do ()
((= i number))
(let ((qux (bit-stream-read-symbol source code-length-huffman-tree)))
(case qux
(16
(let ((cnt (+ 3 (bit-stream-read-byte source 2))))
(dotimes (k cnt)
(setf (aref res (+ i k)) (aref res (- i 1))))
(incf i cnt)))
(17
(let ((cnt (+ 3 (bit-stream-read-byte source 3))))
(dotimes (k cnt)
(setf (aref res (+ i k)) 0))
(incf i cnt)))
(18
(let ((cnt (+ 11 (bit-stream-read-byte source 7))))
(dotimes (k cnt)
(setf (aref res (+ i k)) 0))
(incf i cnt)))
(otherwise
(setf (aref res i) qux)
(incf i)) )))
res))
(defun rfc1951-read-length-dist (source code hdists-ht)
(values
(+ (cadr (aref +length-encoding+ (- code 257)))
(bit-stream-read-byte source (car (aref +length-encoding+ (- code 257)))))
(let ((dist-sym (if hdists-ht
(bit-stream-read-symbol source hdists-ht)
(bit-stream-read-reversed-byte source 5) )))
(+ (cadr (aref +dist-encoding+ dist-sym))
(bit-stream-read-byte source (car (aref +dist-encoding+ dist-sym)))) ) ))
(defun rfc1951-uncompress-octets (octets &key (start 0))
(let ((res (make-array 0 :element-type '(unsigned-byte 8) :fill-pointer 0 :adjustable t))
(ptr 0))
(rfc1951-uncompress-bit-stream (make-bit-stream :octets octets :pos (* 8 start))
#'(lambda (buf n)
(progn
(adjust-array res (list (+ ptr n)))
(setf (fill-pointer res) (+ ptr n))
(replace res buf
:start1 ptr :end1 (+ ptr n)
:start2 0 :end2 n)
(incf ptr n))))
res))
(defun rfc1951-uncompress-bit-stream (bs cb)
(let (final? ctype
(buffer (make-array #x10000 :element-type '(unsigned-byte 8)))
(bptr 0))
(declare (type (simple-array (unsigned-byte 8) (#x10000)) buffer)
(type fixnum bptr))
(macrolet ((put-byte (byte)
`(let ((val ,byte))
(setf (aref buffer bptr) (the (unsigned-byte 8) val))
(setf bptr (the fixnum (logand #xffff (the fixnum (+ bptr 1)))))
(when (zerop bptr)
(funcall cb buffer #x10000) ))))
(loop
(setf final? (= (bit-stream-read-bit bs) 1)
ctype (bit-stream-read-byte bs 2))
(ecase ctype
(0
;; no compression
(bit-stream-skip-to-byte-boundary bs)
(let ((len (bit-stream-read-byte bs 16))
(nlen (bit-stream-read-byte bs 16)))
(assert (= (logand #xFFFF (lognot nlen)) len))
(dotimes (k len)
(put-byte (bit-stream-read-byte bs 8)))))
(1
;; compressed with fixed Huffman code
(let ((literal-ht (build-huffman-tree +fixed-huffman-code-lengths+)))
(do ((x (bit-stream-read-symbol bs literal-ht) (bit-stream-read-symbol bs literal-ht)))
((= x 256))
(cond ((<= 0 x 255)
(put-byte x))
(t
(multiple-value-bind (length dist) (rfc1951-read-length-dist bs x nil)
(dotimes (k length)
(put-byte (aref buffer (logand (- bptr dist) #xffff)))))) )) ))
(2
;; compressed with dynamic Huffman codes
(let* ((hlit (+ 257 (bit-stream-read-byte bs 5))) ;number of literal code lengths
(hdist (+ 1 (bit-stream-read-byte bs 5))) ;number of distance code lengths
(hclen (+ 4 (bit-stream-read-byte bs 4))) ;number of code lengths for code
(hclens (make-array 19 :initial-element 0)) ; length huffman tree
literal-ht distance-ht code-len-ht)
;; slurp the code lengths code lengths
(loop
for i from 1 to hclen
for j in '(16 17 18 0 8 7 9 6 10 5 11 4 12 3 13 2 14 1 15)
do (setf (aref hclens j) (bit-stream-read-byte bs 3)))
;; slurp the huffman trees for literals and distances
(setf code-len-ht (build-huffman-tree hclens))
(setf literal-ht (build-huffman-tree (rfc1951-read-huffman-code-lengthen bs code-len-ht hlit))
distance-ht (build-huffman-tree (rfc1951-read-huffman-code-lengthen bs code-len-ht hdist)))
;; actually slurp the contents
(do ((x (bit-stream-read-symbol bs literal-ht) (bit-stream-read-symbol bs literal-ht)))
((= x 256))
(cond ((<= 0 x 255)
(put-byte x))
(t
(multiple-value-bind (length dist) (rfc1951-read-length-dist bs x distance-ht)
(dotimes (k length)
(put-byte (aref buffer (logand (- bptr dist) #xffff)))))) )) )) )
(when final?
(funcall cb buffer bptr)
(return-from rfc1951-uncompress-bit-stream 'foo)) ))))
;; deflate a gzipped file. Requires reading the header, putting the
;; data into an array, and deflating the array. The format of the
;; header is documented in RFC1952.
(defun test-deflate-file (filename)
(let ((compressed (make-array 0 :adjustable t
:fill-pointer t
:element-type '(unsigned-byte 8)))
(header-flags 0)
(xlen 0))
(with-open-file (in filename :direction :input
:element-type '(unsigned-byte 8))
(unless (and (= #x1f (read-byte in))
(= #x8b (read-byte in)))
(error "~a is not a gzipped file" filename))
(unless (= #x8 (read-byte in))
(error "~a is not using deflate compression" filename))
(setq header-flags (read-byte in))
;; skip over the modification time + XFL + OS marker
(dotimes (i 6) (read-byte in))
(unless (zerop (logand header-flags 4)) ; contains FEXTRA data
(incf xlen (read-byte in))
(incf xlen (* 256 (read-byte in)))
(dotimes (i xlen) (read-byte in)))
(unless (zerop (logand header-flags 8)) ; contains FNAME data
(loop :until (zerop (read-byte in))))
(unless (zerop (logand header-flags 16)) ; contains FCOMMENT data
(loop :until (zerop (read-byte in))))
(unless (zerop (logand header-flags 2)) ; contains FHCRC
(read-byte in)
(read-byte in))
(loop :for byte = (read-byte in nil)
:while byte :do (vector-push-extend byte compressed)))
(rfc1951-uncompress-octets compressed)
(values)))
(defun run-deflate-file ()
(test-deflate-file "files/message.gz"))
;; EOF

1801
cl-bench/files/gabriel.lisp Normal file

File diff suppressed because it is too large Load Diff

60
cl-bench/files/hash.lisp Normal file
View File

@@ -0,0 +1,60 @@
;; hashtable and READ-LINE benchmarking code
;;
;; some code by Paul Foley
;; Time-stamp: <2003-12-23 emarsden>
(in-package :cl-bench.hash)
(defun read-many-lines (file)
(with-open-file (f file :direction :input)
(loop :for l = (read-line f nil)
:while l
:count (length l))))
(defun run-slurp-lines ()
(cond ((probe-file "/usr/share/dict/words")
(read-many-lines "/usr/share/dict/words"))
((probe-file "/usr/dict/words")
(read-many-lines "/usr/dict/words"))))
(eval-when (:compile-toplevel :load-toplevel)
(defconstant +digit+ "0123456789ABCDEF")
(defconstant +digits-needed+
#((10 100 1000 10000 100000 10000000 100000000 536870911)
(16 256 4096 65536 1048576 16777216 268435456 4294967296 536870911))))
(defvar *table* nil)
(defun fixnum-to-string (n base)
(declare (fixnum n base))
(let* ((tsize (position-if (lambda (x) (> (the fixnum x) n))
(aref +digits-needed+ (ash base -4))))
(result (make-string (1+ tsize))))
(loop for i fixnum from tsize downto 0 with q fixnum = n and r fixnum = 0
do (multiple-value-setq (q r) (floor q base))
(setf (schar result i) (aref +digit+ r)))
result))
;; CMUCL-18c seems to run into a bug here: it mistakenly declares
;; counter to be a fixnum
(defun hash-strings (&optional (size 300))
(declare (fixnum size))
(setq *table* (make-hash-table :test #'equal :size size))
(dotimes (i 100000)
(setf (gethash (fixnum-to-string i 16) *table*) i))
(maphash (lambda (key value) (incf (gethash key *table*) value)) *table*))
(defun hash-integers (&optional (size 300))
(declare (fixnum size))
(setq *table* (make-hash-table :test #'eql :size size))
(dotimes (i 100000)
(setf (gethash i *table*) (1+ i)))
(maphash (lambda (key value) (incf (gethash key *table*) value)) *table*))
;; EOF

181
cl-bench/files/math.lisp Normal file
View File

@@ -0,0 +1,181 @@
;;; math.lisp -- various numerical operations
;;
;; Time-stamp: <2004-01-05 emarsden>
;;
;; some basic mathematical benchmarks
(in-package :cl-bench.math)
(defun factorial (n)
(declare (type integer n))
(if (zerop n) 1
(* n (factorial (1- n)))))
(defun run-factorial ()
(declare (inline factorial))
(factorial 500))
(defun fib (n)
(declare (type integer n))
(if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2)))))
(defun run-fib ()
(declare (inline fib))
(fib 25))
(defun fib-ratio (n)
(declare (type integer n))
(labels ((fr (n)
(if (= n 1) 1
(1+ (/ (fr (- n 1)))))))
(numerator (fr n))))
(defun run-fib-ratio ()
(declare (inline fib-ratio))
(fib-ratio 150))
;; The Ackermann function is the simplest example of a well-defined total
;; function which is computable but not primitive recursive, providing a
;; counterexample to the belief in the early 1900s that every computable
;; function was also primitive recursive (Dtzel 1991). It grows faster
;; than an exponential function, or even a multiple exponential function.
(defun ackermann (m n)
(declare (type integer m n))
(cond
((zerop m) (1+ n))
((zerop n) (ackermann (1- m) 1))
(t (ackermann (1- m) (ackermann m (1- n))))))
(defun run-ackermann ()
(ackermann 3 11))
;; calculate the "level" of a point in the Mandebrot Set, which is the
;; number of iterations taken to escape to "infinity" (points that
;; don't escape are included in the Mandelbrot Set). This version is
;; intended to test performance when programming in naïve math-style.
(defun mset-level/complex (c)
(declare (type complex c))
(loop :for z = #c(0 0) :then (+ (* z z) c)
:for iter :from 1 :to 300
:until (> (abs z) 4.0)
:finally (return iter)))
;; this version is intended to test lower-level performance-oriented
;; coding of the same function; hence the extra declarations and the
;; decoding of the operations on complex numbers.
(defun mset-level/dfloat (c1 c2)
(declare (type double-float c1 c2))
(let ((z1 0.0d0)
(z2 0.0d0)
(aux 0.0d0))
(declare (double-float z1 z2 aux))
(do ((iter 0 (1+ iter)))
((or (> (abs (+ (* z1 z1) (* z2 z2))) 4.0)
(> iter 300))
iter)
(setq aux z1
z1 (+ (* z1 z1) (- (* z2 z2)) c1)
z2 (+ (* 2.0d0 z2 aux) c2)))))
(defun run-mandelbrot/complex ()
(let ((n 100)
(sum 0))
(dotimes (i n)
(incf sum (mset-level/complex (complex 0.0001d0 (/ i n 0.25d0)))))))
(defun run-mandelbrot/dfloat ()
(let ((n 100)
(sum 0))
(dotimes (i n)
(incf sum (mset-level/dfloat 0.0001d0 (/ i n 0.25d0))))))
;; Common Lisp implementation of the multiple recursive random number
;; generator (MRG) of l'Ecuyer. Written by Raymond Toy.
(eval-when (:compile-toplevel :load-toplevel)
(defconstant +m1+ 4294967087d0)
(defconstant +m2+ 4294944443d0)
(defconstant +a12+ 1403580d0)
(defconstant +a13n+ 810728d0)
(defconstant +a21+ 527612d0)
(defconstant +a23n+ 1370589d0)
(defconstant +norm+ (/ (1+ +m1+))))
(declaim (inline mrg32k3a-comp-1 mrg32k3a-comp-2))
(defun mrg32k3a-comp-1 (state)
(declare (type (simple-array double-float (6)) state)
(optimize (speed 3) (safety 0)))
(let ((s10 (aref state 0))
(s11 (aref state 1)))
(declare (type (double-float 0d0 4294967086d0) s10 s11))
(let* ((p1 (- (* s11 +a12+) (* s10 +a13n+)))
(k (ftruncate (/ p1 +m1+)))
(p1b (- p1 (* k +m1+)))
(p1c (if (< p1b 0)
(+ p1b +m1+)
p1b)))
(shiftf (aref state 0)
(aref state 1)
(aref state 2)
p1c)
p1c)))
(defun mrg32k3a-comp-2 (state)
(declare (type (simple-array double-float (6)) state)
(optimize (speed 3) (safety 0)))
(let ((s20 (aref state 3))
(s22 (aref state 5)))
(declare (type (double-float 0d0 4294944442d0) s20 s22))
(let* ((p2 (- (* s22 +a21+) (* s20 +a23n+)))
(k (ftruncate (/ p2 +m2+)))
(p2b (- p2 (* k +m2+)))
(p2c (if (< p2b 0)
(+ p2b +m2+)
p2b)))
(shiftf (aref state 3)
(aref state 4)
(aref state 5)
p2c)
p2c)))
(declaim (inline mrg32k3a))
(defun mrg32k3a (state)
(declare (type (simple-array double-float (6)) state)
(optimize (speed 3) (safety 0)))
(let ((p1 (mrg32k3a-comp-1 state))
(p2 (mrg32k3a-comp-2 state)))
(if (<= p1 p2)
(* (+ (- p1 p2) +m1+) +norm+)
(* (- p1 p2) +norm+))))
(defun gen-mrg32 (n state)
(declare (fixnum n)
(optimize (speed 3) (safety 0)))
(let ((y 0d0))
(dotimes (k n)
(declare (fixnum k))
(setf y (mrg32k3a state)))
(* 3 y)))
(defun gen-ran (n)
(declare (fixnum n)
(optimize (speed 3) (safety 0)))
(let ((y 0d0))
(dotimes (k n)
(declare (fixnum k))
(setf y (random 1d0)))
(* 3 y)))
(defun run-mrg32k3a ()
(declare (inline gen-ran))
(gen-ran 1000000))
;; EOF

BIN
cl-bench/files/message.gz Normal file

Binary file not shown.

188
cl-bench/files/misc.lisp Normal file
View File

@@ -0,0 +1,188 @@
;;; misc.lisp
;;;
;;; Time-stamp: <2004-06-28 emarsden>
(in-package :cl-bench.misc)
(defun run-compiler ()
(compile-file (make-pathname :directory '(:relative "files")
:name "gabriel"
:type "olisp")
:print nil
#-gcl :verbose #-gcl nil))
(defun run-fasload ()
(load
(compile-file-pathname
(make-pathname :directory '(:relative "files")
:name "gabriel"
:type "olisp"))))
;; by Gene Luks (adapted from the Larceny benchmarks)
;
; Procedure P_n generates a grey code of all perms of n elements
; on top of stack ending with reversal of starting sequence
;
; F_n is flip of top n elements.
;
; procedure P_n
; if n>1 then
; begin
; repeat P_{n-1},F_n n-1 times;
; P_{n-1}
; end
(defun permutations (x)
(let* ((x x)
(perms (list x)))
(labels ((P (n)
(if (> n 1)
(do ((j (- n 1) (- j 1)))
((zerop j)
(P (- n 1)))
(P (- n 1))
(F n))))
(F (n)
(setf x (revloop x n (list-tail x n)))
(push x perms))
(revloop (x n y)
(if (zerop n) y
(revloop (cdr x)
(- n 1)
(cons (car x) y))))
(list-tail (x n)
(if (zerop n) x
(list-tail (cdr x) (- n 1)))))
(P (length x))
perms)))
(defun iota (n)
(do ((n n (- n 1))
(p '() (cons n p)))
((zerop n) p)))
(defun run-permutations ()
(let* ((perms (permutations (iota 9)))
(sums (mapcar (lambda (l) (reduce '+ l)) perms)))
(assert (eql 1 (length (remove-duplicates sums))))))
;; Destructive merge of two sorted lists.
;; From Hansen's MS thesis.
(defun merge! (a b predicate)
(labels ((merge-loop (r a b)
(cond ((funcall predicate (car b) (car a))
(setf (cdr r) b)
(if (null (cdr b))
(setf (cdr b) a)
(merge-loop b a (cdr b))))
(t ; (car a) <= (car b)
(setf (cdr r) a)
(if (null (cdr a))
(setf (cdr a) b)
(merge-loop a (cdr a) b))))))
(cond ((null a) b)
((null b) a)
((funcall predicate (car b) (car a))
(if (null (cdr b))
(setf (cdr b) a)
(merge-loop b a (cdr b)))
b)
(t ; (car a) <= (car b)
(if (null (cdr a))
(setf (cdr a) b)
(merge-loop a (cdr a) b))
a))))
;; Stable sort procedure which copies the input list and then sorts
;; the new list imperatively. On the systems we have benchmarked,
;; this generic list sort has been at least as fast and usually much
;; faster than the library's sort routine.
;; Due to Richard O'Keefe; algorithm attributed to D.H.D. Warren.
(defun sort! (seq predicate)
(labels ((astep (n)
(cond ((> n 2)
(let* ((j (truncate n 2))
(a (astep j))
(k (- n j))
(b (astep k)))
(merge! a b predicate)))
((= n 2)
(let ((x (car seq))
(y (cadr seq))
(p seq))
(setf seq (cddr seq))
(when (funcall predicate y x)
(setf (car p) y)
(setf (cadr p) x))
(setf (cddr p) nil)
p))
((= n 1)
(let ((p seq))
(setf seq (cdr seq))
(setf (cdr p) nil)
p))
(t nil))))
(astep (length seq))))
(defun integer-hash (key)
(declare (type (unsigned-byte 32) key))
(flet ((u32* (a b) (ldb (byte 32 0) (* a b)))
(u32-right-shift (integer count)
(ldb (byte 32 0) (ash integer count))))
(u32* (u32-right-shift key 3) 2654435761)))
(defun make-big-list (n)
(let ((list (list)))
(dotimes (i n)
(push (integer-hash n) list))
list))
(defparameter *big-seq-list* nil)
(defparameter *big-mess-list* nil)
;; This setup function is called before the main benchmark function,
;; without an intervening GC. The allocation time here doesn't count
;; towards the benchmark. It's important to avoid an intervening GC,
;; because the compaction resulting from the collector could skew
;; results (esp for /mess below).
(defun setup-walk-list/seq ()
(setf *big-seq-list* (make-big-list 2000000)))
;; walk the list to calculate its length
(defun walk-list/seq ()
(let (before after)
(setf before (length *big-seq-list*))
(push 42 *big-seq-list*)
(setf after (length *big-seq-list*))
(assert (eql after (1+ before)))
(setq *big-seq-list* nil)))
;; allocate a large list of fixnums, and merge-sort the list so that
;; pointers in the list are maximally spread out over memory.
(defun setup-walk-list/mess ()
(setf *big-mess-list* (make-big-list 2000000))
(sort! *big-mess-list* #'<))
(defun walk-list/mess ()
(let ((before 0)
(after 0))
(dolist (i *big-mess-list*)
(incf before))
(push 42 *big-mess-list*)
(dolist (i *big-mess-list*)
(incf after))
(assert (eql after (1+ before)))
(setq *big-mess-list* nil)))
;; EOF

157
cl-bench/files/ratios.lisp Normal file
View File

@@ -0,0 +1,157 @@
;;; ratios.lisp -- calculate digits of pi using ratios
;;
;; Time-stamp: <2003-12-29 emarsden>
;;
;;
;; This code was posted to comp.lang.lisp on 2001-12-28 by Vladimir
;; Nesterovsky <vnestr@netvision.net.il>, in message
;; <tcio2ucso5eace5dc7cnbevpc2nqukgofh@4ax.com>.
;;
;; "I played with some code to calculate more then 16 digits of the
;; number pi. The simplest series for pi/4 is 1 - 1/3 + 1/5 - 1/7 ...
;; with Euler transformation applied on its partial sums series
;; several times (as per SICP).
;;
;; Of course denominators will grow very fast if we'd just
;; perform calculations in a straightforward manner which would
;; make them very slow, so I tried to "adjust" the ratio to having
;; no more precision than I need (say 1000 digits), which greatly
;; improved the speed and made it possible to calculate much more
;; digits of pi in the same amount of time."
;;
;;
;; sample calls (timings on PIII/550/Win98):
;; make 20 euler transforms, pre-generate 41 terms
;; (2n+1 by default because each transform shortens the list by two)
;; (find-pi 20) ; 0.3 sec CLISP // 5 sec ACL
;; make 20 euler transforms, pre-generate 100 terms
;; (find-pi 20 100) ; 0.3 sec CLISP // 6 sec ACL
;; make 20 euler transforms, pre-generate 200 terms
;; (find-pi 20 200) ; 0.3 sec // 7 sec
;; 50 transforms, 101 pre-generated terms
;; (find-pi 50) ; 2 sec // 37 sec (35 compiled)
;; 1000 pre-generated (working only on 101 last) terms for 120 digits
;; (find-pi 50 1000 120) ; 2 sec // 84 sec (80 compiled)
;; 10,000 pre-generated terms for better precision for 150 digits
;; (find-pi 50 10000 150) ; 4.5 sec // ??
;; find 1040 digits of the number pi
;; (find-pi 120 150000 1039) ; 6.5 min, CLISP // ?? ACL
(in-package :cl-bench.ratios)
(defvar *report-digits* 55) ; digits to report
(defvar *interim-precision* 95) ; decimal places of interim precision
(defvar *supress-interim-printout* t)
(defvar *supress-interim-dot-printout* t)
(defun adjust-ratio (r n)
"adjust ratio's precision to n decimal places"
(let ((base (expt 10 n)))
(multiple-value-bind (quot rem)
(round r)
(declare (ignore quot))
(if (< (denominator rem) base)
r
(/ (round (* r base)) base)))))
;; a series for pi/4 =
;; = 1 - 1/3 + 1/5 - 1/7 + 1/9 - 1/11 + ....
(defun s-pi/4 ( m &optional (len-from-end m))
"generate up to m partial sums for pi series"
(do* ((i0 (- m len-from-end))
(i 0 (+ i 1))
(s 0 (adjust-ratio (+ s xi) *interim-precision*))
(j 1 (- j))
(n 1 (+ n 2))
(xi 1 (adjust-ratio (/ j n) *interim-precision*))
(sums () (if (> i i0) (cons s sums))))
((>= i m) (reverse sums))
(unless *supress-interim-dot-printout*
(if (and (zerop (rem i 1000)) (> i 0))
(format t ".")))))
;; the Euler transformation of a partial sums series of alternating series
;; A[n] = S[n+1] - (S[n+1]-S[n])^2/(S[n-1]-2S[n]+S[n+1])
;; returns (cons 'div-zero seq) on 0/0 (to stop further transformations)
(defun euler-trans (seq)
"perform a euler transformation on a sequence"
(let ((a1 (car seq))
(a2 (cadr seq)))
(mapcar
#'(lambda(a3)
(let ((den (+ a1 (* -2 a2) a3)))
(if (zerop den) ; sould I use throw/catch here?
(return-from euler-trans (cons 'div-zero seq)))
(let ((r (- a3 (/ (sqr (- a3 a2)) den))))
(psetq a1 a2
a2 a3)
(adjust-ratio r *interim-precision*))))
(cddr seq))))
;; main function to call
(defun find-pi (n &optional
(m (+ (* 2 n) 1))
(dig *report-digits*)
(prec (+ dig 40))
(sup *supress-interim-printout*))
"(find-pi n [m d p]) to find d digits of the pi value by performing n euler
transformations on m terms of pi series with p digits of interim precision."
(let ((*report-digits* dig)
(*interim-precision* prec)
(*supress-interim-printout* sup))
(when (< m (+ (* 2 n) 1))
(setq m (+ (* 2 n) 1)))
;; (format t "~&Generating ~A terms of pi/4 series " m)
(do ((s (s-pi/4 m (+ (* 2 n) 1))
(euler-trans s))
(i 1 (+ i 1)))
((or (> i n) (eq 'div-zero (car s)))
;; (format t "~& After ~A Euler transforms:~%" (- i 1))
(n-digits (* 4 (car (last s)))
*report-digits*))
(unless *supress-interim-printout*
(format t "~&~A"
(last-n-digits
(n-digits (* 4 (car (last s)))
*report-digits*)
79))))))
(defun n-digits (v n)
"return v with n digits after decimal point as integer"
(multiple-value-bind (a b)
(round (* v (expt 10 n)))
(declare (ignore b))
a))
(defun last-n-digits (v n)
"return last n digits of long integer as integer"
(multiple-value-bind (a b)
(floor v (expt 10 n))
(declare (ignore a))
b))
(defun sqr(x)
"find the square of number"
(* x x))
;; entry point
(defun run-pi-ratios ()
(assert (eql (find-pi 20 1000 78)
(find-pi 90 0 78 81))))
#|
(find-pi 20 100) ; 55 digits
31415926535897932384626433832795028841971693993751058210
(find-pi 20 1000 78) ; 78 digits, 1000 terms
3141592653589793238462643383279502884197169399375105820974944592307816406286209
(find-pi 90 0 78 81) ; 181 terms --
; like SICP's "tableau" of streams --
; all in all slower because it produces bigger interim lists to be able
; to run much more transforms (although it exits in the middle on 0/0).
3141592653589793238462643383279502884197169399375105820974944592307816406286209
|#
;; EOF

View File

@@ -0,0 +1,433 @@
;;; richards.lisp -- operating system simulation code
;;
;; Time-stamp: <2003-12-30 emarsden>
;; ======================================================================
;; Newsgroups: comp.lang.smalltalk
;; Distribution: comp
;; Subject: Smalltalk vs. C(++) performance
;;
;; As some have pointed out, it is difficult to compare the runtime
;; performance of Smalltalk programs with the performance of equivalent C
;; programs. One reason for this is that for most non-trivial programs
;; there is no equivalent program written in the other language (because
;; it would be a non-trivial effort to write it).
;;
;; The "best" benchmark I know of is the Richards benchmark, an operating
;; system simulation. It is written in an object-oriented style, uses
;; polymorphism, and is reasonably non-trivial (700 lines). It's
;; probably not the world's greatest benchmark, but better than
;; micro-benchamrks, and it is available in Smalltalk, Self, T (an
;; object-oriented version of Scheme) and C++.
;;
;; [Historical note: the Richards benchmark was originally written in
;; BCPL by Mark Richards. Many thanks to L. Peter Deutsch for the
;; Smalltalk version.]
;;
;; Disclaimer: Richards is *not* a typical application: it is relatively
;; small and contains no graphics or other user interaction. Thus it may
;; not reflect the relative performance of Your Own Real-World (TM)
;; Application, but I think it tests the efficiency of the basic language
;; mechanisms fairly well.
(in-package :cl-bench.richards)
(eval-when (:compile-toplevel :load-toplevel :execute)
(defconstant deviceA 5)
(defconstant deviceB 6)
(defconstant devicePacketKind 1)
(defconstant handlerA 3)
(defconstant handlerB 4)
(defconstant idler 1)
(defconstant noWork nil)
(defconstant noTask nil)
(defconstant worker 2)
(defconstant workPacketKind 2))
(defvar taskList noTask)
(defvar currentTask nil)
(defvar currentTaskIdentity nil)
(defvar taskTable (make-array 6 :initial-element noTask))
(declaim (simple-vector taskTable))
(defvar tracing nil)
(defvar layout 0)
(defvar queuePacketCount 0)
(defvar holdCount 0)
(declaim (fixnum layout queuePacketCount holdCount))
(declaim (inline make-taskControlBlock make-packet make-deviceTaskDataRecord
make-handlerTaskDataRecord make-idleTaskDataRecord
make-workerTaskDataRecord wait))
(defstruct (taskControlBlock (:constructor make-taskControlBLock ()))
packetPending taskWaiting taskHolding link identity
(priority 0 :type fixnum)
input state handle)
(defstruct (packet (:constructor make-packet ()))
link identity
(kind 0 :type fixnum)
(datum 0 :type fixnum)
(data '#() :type simple-vector))
(defstruct (deviceTaskDataRecord (:constructor make-deviceTaskDataRecord ()))
pending)
(defstruct (handlerTaskDataRecord (:constructor make-handlerTaskDataRecord ()))
workIn deviceIn)
(defstruct (idleTaskDataRecord (:constructor make-idleTaskDataRecord ()))
(control 0 :type fixnum)
(count 0 :type fixnum))
(defstruct (workerTaskDataRecord (:constructor make-workerTaskDataRecord ()))
(destination 0 :type fixnum)
(count 0 :type fixnum))
(defun wait ()
(setf (taskControlBlock-taskWaiting currentTask) t)
currentTask)
;; #+cmu
;; (declaim (ext:freeze-type taskControlBlock packet deviceTaskDataRecord
;; handlerTaskDataRecord idleTaskDataRecord
;; workerTaskDataRecord))
;; #+cmu
;; (declaim (ext:start-block richards))
(defun deviceTaskDataRecord-run (self work)
(let ((functionWork work))
(if (eq noWork functionWork)
(progn
(setq functionWork (deviceTaskDataRecord-pending self))
(if (eq noWork functionWork)
(wait)
(progn
(setf (deviceTaskDataRecord-pending self) noWork)
(queuePacket functionWork))))
(progn
(setf (deviceTaskDataRecord-pending self) functionWork)
(if tracing (trace-it (packet-datum functionWork)))
(holdSelf)))))
(defun handlerTaskDataRecord-run (self work)
(if (eq noWork work)
nil
(if (= workPacketKind (packet-kind work))
(workInAdd self work)
(deviceInAdd self work)))
(let ((workPacket (handlerTaskDataRecord-workIn self)))
(if (eq noWork workPacket)
(wait)
(let ((count (packet-datum workPacket)))
(if (> count 4)
(progn
(setf (handlerTaskDataRecord-workIn self)
(packet-link workPacket))
(queuePacket workPacket))
(let ((devicePacket (handlerTaskDataRecord-deviceIn self)))
(if (eq noWork devicePacket)
(wait)
(progn
(setf (handlerTaskDataRecord-deviceIn self)
(packet-link devicePacket))
(setf (packet-datum devicePacket)
(svref (packet-data workPacket) (- count 1)))
(setf (packet-datum workPacket) (+ count 1))
(queuePacket devicePacket)))))))))
(defun idleTaskDataRecord-run (self work)
(declare (ignore work))
(setf (idleTaskDataRecord-count self)
(- (idleTaskDataRecord-count self) 1))
(if (= 0 (idleTaskDataRecord-count self))
(holdSelf)
(if (= 0 (logand (idleTaskDataRecord-control self) 1))
(progn
(setf (idleTaskDataRecord-control self)
(floor (idleTaskDataRecord-control self) 2))
(release deviceA))
(progn
(setf (idleTaskDataRecord-control self)
(logxor (floor (idleTaskDataRecord-control self) 2)
53256))
(release deviceB)))))
(defun workerTaskDataRecord-run (self work)
(if (eq noWork work)
(wait)
(progn
(setf (workerTaskDataRecord-destination self)
(if (= handlerA (workerTaskDataRecord-destination self))
handlerB
handlerA))
(setf (packet-identity work) (workerTaskDataRecord-destination self))
(setf (packet-datum work) 1)
(do ((i 0 (+ i 1)))
((> i 3) nil)
(declare (fixnum i))
(setf (workerTaskDataRecord-count self)
(+ (workerTaskDataRecord-count self) 1))
(if (> (workerTaskDataRecord-count self) 256)
(setf (workerTaskDataRecord-count self) 1))
(setf (svref (packet-data work) i)
(the fixnum
(+ (char-code #\A)
(- (workerTaskDataRecord-count self) 1)))))
(queuePacket work))))
(defun appendHead (packet queueHead)
(setf (packet-link packet) noWork)
(if (eq noWork queueHead)
packet
(let ((mouse queueHead))
(let ((link (packet-link mouse)))
(do ()
((eq noWork link) nil)
(setq mouse link)
(setq link (packet-link mouse)))
(setf (packet-link mouse) packet)
queueHead))))
(defun initialize-globals ()
(setq taskList noTask)
(setq currentTask nil)
(setq currentTaskIdentity nil)
(setq taskTable (make-array 6 :initial-element noTask))
(setq tracing nil)
(setq layout 0)
(setq queuePacketCount 0)
(setq holdCount 0))
(defun richards (&optional (iterations 1000000))
(initialize-globals)
(createIdler idler 0 noWork (running (make-taskControlBlock)))
(let ((workQ))
(setq workQ (createPacket noWork worker workPacketKind))
(setq workQ (createPacket workQ worker workPacketKind))
(createWorker worker 1000 workQ (waitingWithPacket))
(setq workQ (createPacket noWork deviceA devicePacketKind))
(setq workQ (createPacket workQ deviceA devicePacketKind))
(setq workQ (createPacket workQ deviceA devicePacketKind))
(createHandler handlerA 2000 workQ (waitingWithPacket))
(setq workQ (createPacket noWork deviceB devicePacketKind))
(setq workQ (createPacket workQ deviceB devicePacketKind))
(setq workQ (createPacket workQ deviceB devicePacketKind))
(createHandler handlerB 3000 workQ (waitingWithPacket))
(createDevice deviceA 4000 noWork (waiting))
(createDevice deviceB 5000 noWork (waiting)))
(dotimes (i iterations) (schedule))
(values))
(defun schedule ()
(setq currentTask taskList)
(do ()
((eq noTask currentTask) nil)
(if (isTaskHoldingOrWaiting currentTask)
(setq currentTask (taskControlBlock-link currentTask))
(progn
(setq currentTaskIdentity (taskControlBlock-identity currentTask))
(if tracing (trace-it currentTaskIdentity))
(setq currentTask (runTask currentTask))))))
(defun findTask (identity)
(declare (fixnum identity))
(let ((tk (svref taskTable (- identity 1))))
(if (eq noTask tk) (error "findTask failed"))
tk))
(defun holdSelf ()
(setq holdCount (+ holdCount 1))
(setf (taskControlBlock-taskHolding currentTask) t)
(taskControlBlock-link currentTask))
(defun queuePacket (packet)
(let ((tk (findTask (packet-identity packet))))
(if (eq noTask tk)
noTask
(progn
(setq queuePacketCount (+ queuePacketCount 1))
(setf (packet-link packet) noWork)
(setf (packet-identity packet) currentTaskIdentity)
(addInput tk packet currentTask)))))
(defun release (identity)
(let ((tk (findTask identity)))
(if (eq noTask tk)
noTask
(progn
(setf (taskControlBlock-taskHolding tk) nil)
(if (> (taskControlBlock-priority tk)
(taskControlBlock-priority currentTask))
tk
currentTask)))))
(defun trace-it (id)
(setq layout (- layout 1))
(if (>= 0 layout)
(progn
(format t "~%")
(setq layout 30)))
(format t "~a " id))
(defun createDevice (identity priority work state)
(let ((data (create-deviceTaskDataRecord)))
(createTask identity priority work state data)))
(defun createHandler (identity priority work state)
(let ((data (create-handlerTaskDataRecord)))
(createTask identity priority work state data)))
(defun createIdler (identity priority work state)
(let ((data (create-idleTaskDataRecord)))
(createTask identity priority work state data)))
(defun createWorker (identity priority work state)
(let ((data (create-workerTaskDataRecord)))
(createTask identity priority work state data)))
(defun createTask (identity priority work state data)
(let ((tk (create-taskControlBlock
taskList identity priority work state data)))
(setq taskList tk)
(setf (svref taskTable (- identity 1)) tk)))
(defun createPacket (link identity kind)
(create-packet link identity kind))
(defun running (tcb)
(setf (taskControlBlock-packetPending tcb) nil)
(setf (taskControlBlock-taskWaiting tcb) nil)
(setf (taskControlBlock-taskHolding tcb) nil)
tcb)
(defun waiting ()
(let ((tcb (make-taskControlBlock)))
(setf (taskControlBlock-packetPending tcb) nil)
(setf (taskControlBlock-taskWaiting tcb) t)
(setf (taskControlBlock-taskHolding tcb) nil)
tcb))
(defun waitingWithPacket ()
(let ((tcb (make-taskControlBlock)))
(setf (taskControlBlock-packetPending tcb) t)
(setf (taskControlBlock-taskWaiting tcb) t)
(setf (taskControlBlock-taskHolding tcb) nil)
tcb))
(defun isTaskHoldingOrWaiting (tcb)
(or (taskControlBlock-taskHolding tcb)
(and (not (taskControlBlock-packetPending tcb))
(taskControlBlock-taskWaiting tcb))))
(defun isWaitingWithPacket (tcb)
(and (taskControlBlock-packetPending tcb)
(and (taskControlBlock-taskWaiting tcb)
(not (taskControlBlock-taskHolding tcb)))))
(defun packetNowPending (tcb)
(setf (taskControlBlock-packetPending tcb) t)
(setf (taskControlBlock-taskWaiting tcb) nil)
(setf (taskControlBlock-taskHolding tcb) nil)
tcb)
(defun create-taskControlBlock
(link identity priority initialWorkQueue initialState privateData)
(let ((r (make-taskControlBlock)))
(setf (taskControlBlock-link r) link)
(setf (taskControlBlock-identity r) identity)
(setf (taskControlBlock-priority r) priority)
(setf (taskControlBlock-input r) initialWorkQueue)
(setf (taskControlBlock-packetPending r)
(taskControlBlock-packetPending initialState))
(setf (taskControlBlock-taskWaiting r)
(taskControlBlock-taskWaiting initialState))
(setf (taskControlBlock-taskHolding r)
(taskControlBlock-taskHolding initialState))
(setf (taskControlBlock-handle r) privateData)
(setf (taskControlBlock-state r) nil)
r))
(defun addInput (tcb packet oldTask)
(if (eq noWork (taskControlBlock-input tcb))
(progn
(setf (taskControlBlock-input tcb) packet)
(setf (taskControlBlock-packetPending tcb) t)
(if (> (taskControlBlock-priority tcb)
(taskControlBlock-priority oldTask))
tcb
oldTask))
(progn
(setf (taskControlBlock-input tcb)
(appendHead packet (taskControlBlock-input tcb)))
oldTask)))
(defun runTask (tcb)
(let ((message nil))
(if (isWaitingWithPacket tcb)
(progn
(setq message (taskControlBlock-input tcb))
(setf (taskControlBlock-input tcb) (packet-link message))
(if (eq noWork (taskControlBlock-input tcb))
(running tcb)
(packetNowPending tcb)))
(setq message noWork))
(run (taskControlBlock-handle tcb) message)))
(defun run (self work)
(typecase self
(deviceTaskDataRecord (deviceTaskDataRecord-run self work))
(handlerTaskDataRecord (handlerTaskDataRecord-run self work))
(idleTaskDataRecord (idleTaskDataRecord-run self work))
(workerTaskDataRecord (workerTaskDataRecord-run self work))))
(defun create-packet (link identity kind)
(let ((p (make-packet)))
(setf (packet-link p) link)
(setf (packet-identity p) identity)
(setf (packet-kind p) kind)
(setf (packet-datum p) 1)
(let ((v (make-array 4 :initial-element 0)))
(setf (packet-data p) v))
p))
(defun create-deviceTaskDataRecord ()
(let ((tk (make-deviceTaskDataRecord)))
(setf (deviceTaskDataRecord-pending tk) noWork)
tk))
(defun create-handlerTaskDataRecord ()
(let ((tk (make-handlerTaskDataRecord)))
(setf (handlerTaskDataRecord-workIn tk) noWork)
(setf (handlerTaskDataRecord-deviceIn tk) noWork)
tk))
(defun deviceInAdd (tk packet)
(setf (handlerTaskDataRecord-deviceIn tk)
(appendHead packet (handlerTaskDataRecord-deviceIn tk)))
tk)
(defun workInAdd (tk packet)
(setf (handlerTaskDataRecord-workIn tk)
(appendHead packet (handlerTaskDataRecord-workIn tk)))
tk)
(defun create-idleTaskDataRecord ()
(let ((tk (make-idleTaskDataRecord)))
(setf (idleTaskDataRecord-control tk) 1)
(setf (idleTaskDataRecord-count tk) 10000)
tk))
(defun create-workerTaskDataRecord ()
(let ((tk (make-workerTaskDataRecord)))
(setf (workerTaskDataRecord-destination tk) handlerA)
(setf (workerTaskDataRecord-count tk) 0)
tk))
;; EOF

File diff suppressed because one or more lines are too long

1
cl-bench/gabriel/README Normal file
View File

@@ -0,0 +1 @@
How to run the Gabriel benchmarks in this directory

File diff suppressed because it is too large Load Diff

90
cl-bench/generate.lisp Normal file
View File

@@ -0,0 +1,90 @@
;;; generate.lisp
;;;
;;; Time-stamp: <2004-01-01 emarsden>
;;
;;
;; Load into a CL implementation that has a working pathname
;; implementation, in order to generate a load-script and a
;; compilation-script, for implementations whose pathname support is
;; suboptimal.
(load #p"defpackage.lisp")
(load #p"support.lisp")
(load #p"tests.lisp")
(in-package :cl-bench)
(with-open-file (run "do-execute-script.lisp"
:direction :output
:if-exists :supersede
:if-does-not-exist :create)
(with-open-file (compile "do-compilation-script.lisp"
:direction :output
:if-exists :supersede
:if-does-not-exist :create)
(format run ";;; auto-generated from file ~S~%" *load-pathname*)
(format compile ";;; auto-generated from file ~S~%" *load-pathname*)
(format run "(IN-PACKAGE :CL-USER)~%")
(format compile "(IN-PACKAGE :CL-USER)~%")
(format run "(LOAD ~S)~%" #p"defpackage.lisp")
(format compile "(LOAD ~S)~%" #p"defpackage.lisp")
(dolist (f '("arrays" "bignum" "boehm-gc"
"clos" "crc40" "deflate" "gabriel"
"hash" "math" "ratios" "richards" "misc")
#+nil (directory (make-pathname :directory '(:relative "files")
:name :wild
:version :wild
:type "olisp")))
(let ((p (make-pathname :directory '(:relative "files")
:name f
:type "olisp")))
(format compile "(COMPILE-FILE ~S)~%" p)
(format run "(LOAD (COMPILE-FILE-PATHNAME ~S))~%" p)))
;; for CL-PPCRE
#+nil
(dolist (name '("packages" "specials" "util" "lexer"
"parser" "regex-class" "convert" "optimize"
"closures" "repetition-closures" "scanner" "api"
"ppcre-tests"))
(format compile "(COMPILE-FILE ~S)~%"
(make-pathname :directory '(:relative "files" "cl-ppcre")
:name name
:type "lisp"))
(format run "(LOAD (COMPILE-FILE-PATHNAME ~S))~%"
(make-pathname :directory '(:relative "files" "cl-ppcre")
:name name)))
(format run "(COMPILE-FILE ~S)~%" #p"support.lisp")
(format run "(LOAD (COMPILE-FILE-PATHNAME ~S))~%" #p"support.lisp")
(format run "(IN-PACKAGE :CL-BENCH)~%")
(format run "(defun run-benchmarks ()
(with-open-file (f (benchmark-report-file)
:direction :output
:if-exists :supersede)
(let ((*benchmark-output* f)
(*print-length* nil)
(*load-verbose* nil)
(*compile-verbose* nil)
(*compile-print* nil))
(bench-report-header)~%")
(dolist (b (reverse *benchmarks*))
(with-slots (setup disabled-for function short runs) b
(when disabled-for
(format run "~%#-~S~%" `(or ,@(benchmark-disabled-for b))))
(format run "(progn~%")
(format run " (format t \"=== running ~A~~%\")~%" b)
(format run " (force-output)~%")
(format run " (bench-gc)~%")
(when setup
(format run " (funcall '~S)~%" setup))
(format run " (bench-report '~S ~S ~S))~%" function short runs)))
(format run "(bench-report-footer))))~%")
(format run "(run-benchmarks)~%")))
#+cmu (ext:quit)
;; EOF

View File

@@ -0,0 +1,85 @@
;;; graph-report.lisp
;;
;; Author: Johannes Grødem <johs@copyleft.no>
;; Time-stamp: <2004-02-29 emarsden>
;;
;;
;; When loaded into CMUCL, this should generate a report comparing the
;; performance of the different CL implementations which have been
;; tested. Reads the /var/tmp/CL-benchmark* files to obtain data from
;; previous runs.
(defparameter *screen-width* 80)
(defun print-bar (scale)
(if (< scale 0)
(princ "n/a")
(let ((chars (* (- *screen-width* 18)
scale)))
(if (< chars 1)
(princ #\%)
(dotimes (i (round chars))
(princ #\#))))))
(defun bench-analysis ()
(let (data implementations benchmarks impl-scores)
(dolist (f (directory "/var/tmp/CL-benchmark*.*"))
(with-open-file (f f :direction :input)
(let ((*read-eval* nil))
(push (read f) data))))
(setf implementations (mapcar #'car data))
(setf impl-scores (make-list (length implementations)
:initial-element 0))
(setf benchmarks (reverse (mapcar #'first (cdr (first data)))))
(dolist (b benchmarks)
(format t "=== ~a~%" b)
(let* ((results
(loop :for i in implementations
:collect (let* ((id (cdr (assoc i data :test #'string=)))
(ir (third (assoc b id :test #'string=))))
(if (numberp ir)
ir
-1))))
(ref (apply #'max results))
(min (apply #'min (remove -1 results))))
(loop :for res in results
:for cnt from 0
:do (let ((i (elt implementations cnt)))
(format t "~&~a~5a (~6,2,0,'XF): "
(if (= res min)
(progn
(incf (elt impl-scores cnt))
#\>)
#\Space)
(subseq i 0 (min (length i) 5))
res)
(print-bar (/ res ref)))))
(terpri)
(terpri))
(format t "~&--- Total wins: ---------------------------------------------------------------~%")
(loop :for impl in implementations
:for score in impl-scores
:do (progn
(format t "~a~5a (~2d): "
(if (= score (apply #'max impl-scores))
#\>
#\Space)
(subseq impl 0 (min (length impl) 5))
score)
(dotimes (i score)
(princ #\#))
(terpri)))
(format t "~&~%===============================================================================~%")
(dolist (impl implementations)
(format t "~&Impl ~a: ~a~%"
(subseq impl 0 (min (length impl) 5))
impl))
(force-output)))
(bench-analysis)
(quit)
;; EOF

221
cl-bench/pdf-report.lisp Normal file
View File

@@ -0,0 +1,221 @@
;;; pdf-report.lisp
;;
;; Author: Eric Marsden <emarsden@laas.fr>
;; Time-stamp: <2004-03-09 emarsden>
;;
;;
;; When loaded into CMUCL, this should generate a report comparing the
;; performance of the different CL implementations which have been
;; tested. Reads the /var/tmp/CL-benchmark* files to obtain data from
;; previous runs. Requires the cl-pdf library.
(in-package :cl-user)
(eval-when (:load-toplevel :execute)
(require :asdf)
(asdf:oos 'asdf:load-op :uffi)
(ext:load-foreign "/usr/lib/libz.so")
(asdf:oos 'asdf:load-op :cl-pdf)
(asdf:oos 'asdf:load-op :cl-typesetting))
(load #p"defpackage.lisp")
(load #p"support.lisp")
(load #p"tests.lisp")
(in-package :cl-bench)
(defun histogram-maker (title label-names results)
(lambda (box x y)
(pdf:in-text-mode
(pdf:move-text x y)
(pdf:set-font (pdf:get-font "Helvetica") 11)
(pdf:show-text title))
(pdf:draw-object
(make-instance 'pdf:histogram
:x x :y (- y 95)
:width 90 :height 90
:label-names label-names
:labels&colors '(("ignore" (0.0 0.0 1.0)))
:series (list results)
:y-axis-options '(:min-value -0.5 :title "seconds")
:background-color '(0.9 0.9 0.9)))))
;; FIXME annotate each benchmark with estimated allocation volume & peak storage requirement
(defun bench-analysis (&optional (filename #p"/tmp/cl-bench.pdf"))
(let (content data groups implementations benchmarks impl-scores impl-labels)
(dolist (f (directory "/var/tmp/CL-benchmark*.*"))
(ignore-errors
(with-open-file (f f :direction :input)
(let ((*read-eval* nil))
(push (read f) data)))))
(setf data (sort data #'string< :key #'car))
(setf implementations (mapcar #'car data))
(loop :for b :in *benchmarks*
:do (pushnew (benchmark-group b) groups))
(setf impl-scores (make-list (length implementations)
:initial-element 0))
(setf impl-labels (loop :for i :from 0 :below (length implementations)
:collect (string (code-char (+ i (char-code #\A))))))
(setf benchmarks (reverse (mapcar #'first (cdr (first data)))))
(let ((*break-on-signals* 'condition)
(header (typeset::compile-text ()
(typeset::paragraph (:h-align :centered
:font "Times-Italic" :font-size 10)
"cl-bench performance benchmarks")
(typeset:hrule :dy 1/2)))
(footer (lambda (pdf:*page*)
(typeset::compile-text (:font "Helvetica" :font-size 9)
(typeset:hrule :dy 1/2)
(typeset::hbox (:align :center :adjustable-p t)
(typeset::put-string "2004-03-09")
:hfill
(typeset::put-string
(format nil "page ~d"
(1+ (position pdf:*page* (typeset::pages pdf:*document*))))))))))
(typeset::with-document (:author "Éric Marsden"
:title "cl-bench performance results")
(dolist (group groups)
(setq content
(typeset::compile-text (:first-line-indent 0)
(typeset:paragraph (:font "Times-Italic" :font-size 16)
(typeset:put-string (concatenate 'string (string group) " group")))
(typeset::vspace 10)
(typeset:hrule :dy 1)
(typeset::vspace 40)
(typeset::paragraph (:first-line-indent 0)
(dolist (bm (remove-if-not (lambda (b) (eql (benchmark-group b) group)) *benchmarks*))
(let* ((bn (benchmark-name bm))
(results (loop :for i :in implementations
:collect (let* ((id (cdr (assoc i data :test #'string=)))
(ir (third (assoc bn id :test #'string=))))
(if (numberp ir) (float ir) -0.02)))))
;; (typeset::hspace 10)
(typeset::user-drawn-box
:inline t :dx 130 :dy 150
:stroke-fn (histogram-maker bn impl-labels results))
(typeset::hspace 20)))
:eop)))
(typeset::draw-pages content :margins '(72 72 72 72) :header nil :footer footer))
(setq content
;; index of implementation names
(typeset::compile-text (:first-line-indent 0)
(typeset::paragraph (:font-size 16) "Implementations")
(typeset::vspace 10)
(typeset::hrule :dy 1)
(typeset::vspace 10)
(dotimes (i (length implementations))
(typeset::paragraph (:font "Times-Roman" :font-size 12)
(typeset::put-string (format nil "~A: ~A~%~%" (nth i impl-labels) (nth i implementations))))
:eol)
:eop :eop))
(typeset::draw-pages content :margins '(72 72 72 72) :header nil :footer footer)
(pdf:write-document filename)))))
;; (defun bench-analysis (&optional (filename #p"/tmp/cl-bench.pdf"))
;; (let (data groups implementations benchmarks impl-scores impl-labels)
;; (dolist (f (directory "/var/tmp/CL-benchmark*.*"))
;; (ignore-errors
;; (with-open-file (f f :direction :input)
;; (let ((*read-eval* nil))
;; (push (read f) data)))))
;; (setf data (sort data #'string< :key #'car))
;; (setf implementations (mapcar #'car data))
;; (loop :for b :in *benchmarks*
;; :do (pushnew (benchmark-group b) groups))
;; (setf impl-scores (make-list (length implementations)
;; :initial-element 0))
;; (setf impl-labels (loop :for i :from 0 :below (length implementations)
;; :collect (string (code-char (+ i (char-code #\A))))))
;; (setf benchmarks (reverse (mapcar #'first (cdr (first data)))))
;;
;; ;; FIXME possibly group graphs one group per page
;; ;;
;; ;; add numbers on bars
;; ;;
;; ;; annotate each benchmark with estimated allocation volume & peak storage requirement
;; (pdf:with-document (:author "Éric Marsden"
;; :title "cl-bench performance results")
;; (let ((helvetica (pdf:get-font "Helvetica"))
;; (helvetica-bold (pdf:get-font "Helvetica-Bold"))
;; (helvetica-oblique (pdf:get-font "Helvetica-Oblique"))
;; (times (pdf:get-font "Times-Roman"))
;; (page 0)
;; (per-page 0)
;; (page-name ""))
;; (dolist (group groups)
;; (incf page)
;; (setf page-name (format nil "page ~d" page))
;; (pdf:with-page ()
;; (pdf:register-page-reference page-name)
;; (pdf:with-outline-level (group page-name)
;; ;; group name
;; (pdf:in-text-mode
;; (pdf:set-font helvetica-bold 16.0)
;; (pdf:move-text 100 700)
;; (pdf:draw-text (string group)))
;; ;; version number
;; (pdf:in-text-mode
;; (pdf:set-font helvetica-oblique 8)
;; (pdf:move-text 10 10)
;; (pdf:draw-text (format nil "cl-bench version ~A" *version*)))
;; (setf per-page 1)
;; (dolist (bm (remove-if-not (lambda (b) (eql (benchmark-group b) group)) *benchmarks*))
;; (format *debug-io* "Group ~A, benchmark ~A~%" group bm)
;; (let* ((bn (benchmark-name bm))
;; (results (loop :for i :in implementations
;; :collect (let* ((id (cdr (assoc i data :test #'string=)))
;; (ir (third (assoc bn id :test #'string=))))
;; (if (numberp ir) (float ir) -0.02))))
;; (ypos (- 640 (* 150 per-page))))
;; ;; with-column-layout
;; ;; test title
;; (incf per-page)
;; `(pdf:in-text-mode
;; (pdf:set-font helvetica-bold 16.0)
;; (pdf:move-text 100 ,ypos)
;; (pdf:draw-text bn))
;; ;; optional extra description
;; ;; (when (and b (benchmark-long b))
;; ;; (pdf:in-text-mode
;; ;; (pdf:set-font helvetica 12)
;; ;; (pdf:move-text 100 520)
;; ;; (pdf:draw-text (benchmark-long b))))
;; ;; y-axis title
;; ;; (pdf:with-saved-state
;; ;; (pdf:translate 65 350)
;; ;; (pdf:rotate 90)
;; ;; (pdf:in-text-mode
;; ;; (pdf:set-font helvetica 10)
;; ;; (pdf:draw-text "seconds")))
;; (pdf:draw-object
;; (make-instance 'pdf:histogram
;; :x 100 :y ypos
;; :width 100 :height 100
;; :label-names impl-labels
;; :labels&colors '(("ignore" (0.0 0.0 1.0)))
;; :series (list results)
;; :y-axis-options '(:min-value -0.5 :title "seconds")
;; :background-color '(0.9 0.9 0.9)))
;; #+nil
;; (pdf:in-text-mode
;; (pdf:move-text 100 250)
;; (pdf:set-font times 12.0)
;; (dotimes (i (length implementations))
;; (pdf:move-text 0 -14)
;; (pdf:draw-text (format nil "~A: ~A"
;; (nth i impl-labels)
;; (nth i implementations)))))))))))
;; (pdf:write-document filename))))
(bench-analysis)
(quit)
;; EOF

View File

@@ -0,0 +1,40 @@
;;; playing with the CPU Performance Counters on CMUCL for Solaris
;;;
;;; Time-stamp: <2004-01-05 emarsden>
;;
;;
;; This will run the performance benchmarks with instrumentation from
;; the CPC library (see <http://www.chez.com/emarsden/downloads/ for
;; cpc.lisp, an FFI interface to Solaris libcpc). It will produce a
;; report in /tmp/cmucl-cpc.txt regarding various microarchitectural
;; measurements: average CPI for each test, proportion of icache
;; misses, number of cycles stalled due to icache misses or due to a
;; load-use dependency.
;;
;; This runs the each test a number of times, one for each observation
;; made, so it takes longer to run than a normal cl-bench run. Each
;; test is executed for a maximum of 3 seconds, using a SIGALRM-driven
;; interrupt. Some of the tests, in particular the CLOS tests, will
;; not work in this case, because their premature abortion means that
;; the following tests don't have all necessary classes defined. I
;; normally interrupt the tests manually at the SEARCH-SEQUENCE test.
#-sparc-v9
(error "Performance counters are only present on an UltraSPARC")
(unless
(ignore-errors (require :cpc))
(error "Can't load CPC subsystem"))
(setq ext:*gc-verbose* nil)
(push :performance-counters *features*)
(load "defpackage")
(compile-file "sysdep/setup-cmucl" :load t)
(load "do-compilation-script")
(load "do-execute-script")
;; EOF

View File

@@ -0,0 +1,181 @@
=== Dual UltraSPARC III @ 700 MHz, 8MB ecache ===
;; COMPILER 2.09 [i: 17.2 34.8 ] [e: 1.1] 6.4
;; LOAD-FASL 1.51 [i: 8.6 20.3 ] [e: 0.8] 11.0
;; PERMUTATIONS 1.44 [i: 7.5 1.3 ] [e: 1.0] 6.8
;; WALK-LIST/SEQ 2.08 [i: 0.4 0.1 ] [e: 9.9] 4.3
;; WALK-LIST/MESS 6.69 [i: 2.2 0.0 ] [e: 27.3] 1.8
;; BOYER 1.44 [i: 10.2 1.1 ] [e: 0.3] 8.5
;; BROWSE 1.48 [i: 6.6 3.5 ] [e: 0.6] 7.5
;; DDERIV 2.42 [i: 9.0 3.2 ] [e: 2.2] 6.0
;; DERIV 2.43 [i: 9.8 3.1 ] [e: 2.2] 6.2
;; DESTRUCTIVE 1.47 [i: 12.1 1.3 ] [e: 0.6] 5.5
;; DIV2-TEST-1 3.40 [i: 2.0 2.5 ] [e: 2.3] 5.2
;; DIV2-TEST-2 2.41 [i: 7.9 2.0 ] [e: 1.2] 5.2
;; FFT 1.26 [i: 2.2 2.0 ] [e: 0.0] 6.2
;; FRPOLY/FIXNUM 1.74 [i: 14.5 1.7 ] [e: 0.4] 9.1
;; FRPOLY/BIGNUM 1.35 [i: 6.9 3.6 ] [e: 0.4] 10.4
;; FRPOLY/FLOAT 1.59 [i: 10.0 1.6 ] [e: 0.3] 13.1
;; PUZZLE 0.94 [i: 1.7 1.7 ] [e: 0.0] 20.5
;; TAK 1.00 [i: 10.7 0.5 ] [e: 0.0] 9.8
;; CTAK 1.29 [i: 0.1 0.5 ] [e: 0.0] 8.1
;; TRTAK 1.04 [i: 0.1 12.8 ] [e: 0.0] 3.6
;; TAKL 1.12 [i: 4.3 0.1 ] [e: 0.0] 3.0
;; STAK 1.25 [i: 5.6 0.2 ] [e: 0.0] 19.9
;; FPRINT/UGLY 1.04 [i: 4.7 3.6 ] [e: 0.0] 15.2
;; FPRINT/PRETTY 1.47 [i: 15.1 29.6 ] [e: 0.1] 10.3
;; TRAVERSE 1.44 [i: 5.6 0.2 ] [e: 0.2] 7.3
;; TRIANGLE 0.96 [i: 4.5 0.8 ] [e: 0.0] 18.2
;; RICHARDS 1.43 [i: 15.3 0.9 ] [e: 0.0] 14.1
;; FACTORIAL 1.91 [i: 6.2 16.0 ] [e: 1.3] 12.2
;; FIB 1.57 [i: 22.7 0.6 ] [e: 0.0] 2.7
;; FIB-RATIO 0.95 [i: 3.6 1.4 ] [e: 0.1] 9.0
;; ACKERMANN 1.73 [i: 9.5 0.3 ] [e: 0.1] 4.4
;; MANDELBROT 1.41 [i: 6.5 13.8 ] [e: 0.6] 7.4
;; MRG32K3A 0.92 [i: 0.0 0.0 ] [e: 0.0] 13.1
;; CRC40 1.08 [i: 6.1 1.6 ] [e: 0.7] 9.0
;; BIGNUM/ELEM-100-1000 0.96 [i: 3.4 1.6 ] [e: 0.1] 9.8
;; BIGNUM/ELEM-1000-100 0.89 [i: 0.8 0.3 ] [e: 0.0] 11.4
;; BIGNUM/ELEM-10000-1 0.95 [i: 0.1 0.1 ] [e: 0.0] 14.1
;; BIGNUM/PARI-100-10 0.84 [i: 2.4 0.6 ] [e: 0.0] 10.4
;; BIGNUM/PARI-200-5 0.81 [i: 1.0 0.3 ] [e: 0.0] 11.4
;; PI-DECIMAL/SMALL 0.87 [i: 1.9 1.1 ] [e: 0.1] 10.7
;; PI-DECIMAL/BIG 0.85 [i: 1.2 0.6 ] [e: 0.1] 11.5
;; PI-ATAN 1.55 [i: 3.6 6.0 ] [e: 1.2] 8.7
;; PI-RATIOS 0.93 [i: 3.4 2.8 ] [e: 0.1] 9.4
;; SLURP-LINES 1.54 [i: 8.6 23.4 ] [e: 0.7] 10.5
;; HASH-STRINGS 1.01 [i: 5.0 0.7 ] [e: 0.2] 12.6
;; HASH-INTEGERS 1.14 [i: 4.5 1.1 ] [e: 0.5] 10.6
;; BOEHM-GC 1.88 [i: 10.9 2.4 ] [e: 2.8] 4.6
;; DEFLATE-FILE 1.00 [i: 2.9 2.4 ] [e: 0.1] 14.7
;; 1D-ARRAYS 0.95 [i: 5.5 0.9 ] [e: 0.1] 18.3
;; 2D-ARRAYS 1.07 [i: 2.5 0.1 ] [e: 0.4] 22.2
;; 3D-ARRAYS 0.89 [i: 3.5 0.2 ] [e: 0.2] 22.7
;; BITVECTORS 1.27 [i: 0.1 0.5 ] [e: 1.5] 8.2
;; BENCH-STRINGS 1.26 [i: 13.4 0.3 ] [e: 0.1] 10.9
;; fill-strings/adjustable 0.90 [i: 6.3 0.3 ] [e: 0.0] 16.8
;; STRING-CONCAT 1.05 [i: 6.0 0.6 ] [e: 0.2] 10.7
;; SEARCH-SEQUENCE 1.18 [i: 8.2 1.5 ] [e: 0.3] 13.1
=== Dual UltraSPARC IIIi @ 1GHz, 1MB ecache ===
;; COMPILER 2.14 [i: 14.5 38.8 ] [e: 1.7] 5.8
;; LOAD-FASL 1.55 [i: 7.6 18.4 ] [e: 0.8] 11.8
;; PERMUTATIONS 1.30 [i: 7.2 1.2 ] [e: 1.0] 6.9
;; WALK-LIST/SEQ 2.24 [i: 0.0 0.0 ] [e: 12.5] 3.8
;; WALK-LIST/MESS 11.77 [i: 2.1 0.0 ] [e: 67.4] 0.8
;; BOYER 1.43 [i: 9.5 0.9 ] [e: 0.3] 8.5
;; BROWSE 1.53 [i: 4.5 2.8 ] [e: 0.6] 8.0
;; DDERIV 2.62 [i: 7.4 2.3 ] [e: 2.2] 5.4
;; DERIV 2.68 [i: 8.5 2.5 ] [e: 2.2] 5.7
;; DESTRUCTIVE 1.51 [i: 12.9 0.9 ] [e: 0.6] 5.1
;; DIV2-TEST-1 3.60 [i: 1.9 2.0 ] [e: 2.3] 4.8
;; DIV2-TEST-2 2.47 [i: 7.4 1.6 ] [e: 1.2] 5.0
;; FFT 1.22 [i: 1.2 0.7 ] [e: 0.0] 6.4
;; FRPOLY/FIXNUM 1.72 [i: 13.0 1.5 ] [e: 0.4] 8.7
;; FRPOLY/BIGNUM 1.38 [i: 6.7 3.3 ] [e: 0.4] 10.1
;; FRPOLY/FLOAT 1.59 [i: 9.2 1.3 ] [e: 0.3] 13.1
;; PUZZLE 1.01 [i: 2.2 1.6 ] [e: 0.0] 20.4
;; TAK 0.98 [i: 6.0 0.4 ] [e: 0.0] 3.5
;; CTAK 1.24 [i: 7.6 0.5 ] [e: 0.0] 8.1
;; TRTAK 0.99 [i: 7.8 2.6 ] [e: 0.0] 5.6
;; TAKL 1.07 [i: 3.8 0.1 ] [e: 0.0] 4.3
;; STAK 1.37 [i: 4.8 0.1 ] [e: 0.0] 18.7
;; FPRINT/UGLY 1.03 [i: 5.2 2.7 ] [e: 0.0] 15.4
;; FPRINT/PRETTY 1.37 [i: 11.5 26.1 ] [e: 0.1] 10.8
;; TRAVERSE 1.38 [i: 5.3 0.1 ] [e: 0.0] 7.2
;; TRIANGLE 0.98 [i: 5.2 0.0 ] [e: 0.0] 17.9
;; RICHARDS 1.51 [i: 13.0 0.7 ] [e: 0.0] 14.9
;; FACTORIAL 1.85 [i: 8.3 12.2 ] [e: 1.3] 13.2
;; FIB 1.59 [i: 18.6 0.5 ] [e: 0.0] 2.5
;; FIB-RATIO 0.97 [i: 5.6 1.2 ] [e: 0.2] 9.1
;; ACKERMANN 1.73 [i: 17.8 0.9 ] [e: 0.0] 1.4
;; MANDELBROT 1.37 [i: 9.3 11.4 ] [e: 0.6] 6.7
;; MRG32K3A 0.90 [i: 0.0 0.0 ] [e: 0.0] 13.3
;; CRC40 1.08 [i: 5.7 1.3 ] [e: 0.7] 9.1
;; BIGNUM/ELEM-100-1000 0.97 [i: 3.5 1.4 ] [e: 0.2] 9.7
;; BIGNUM/ELEM-1000-100 0.88 [i: 0.8 0.3 ] [e: 0.0] 11.4
;; BIGNUM/ELEM-10000-1 0.93 [i: 0.1 0.0 ] [e: 0.0] 13.6
;; BIGNUM/PARI-100-10 0.84 [i: 2.6 0.5 ] [e: 0.0] 10.3
;; BIGNUM/PARI-200-5 0.81 [i: 1.1 0.2 ] [e: 0.0] 11.3
;; PI-DECIMAL/SMALL 0.87 [i: 2.4 1.0 ] [e: 0.1] 10.6
;; PI-DECIMAL/BIG 0.85 [i: 1.2 0.4 ] [e: 0.0] 11.7
;; PI-ATAN 1.51 [i: 3.8 5.1 ] [e: 1.1] 8.8
;; PI-RATIOS 0.94 [i: 4.4 2.4 ] [e: 0.1] 9.1
;; SLURP-LINES 1.37 [i: 7.7 14.4 ] [e: 0.2] 11.8
;; HASH-STRINGS 1.00 [i: 5.1 0.8 ] [e: 0.2] 12.1
;; HASH-INTEGERS 1.19 [i: 1.8 0.9 ] [e: 0.8] 9.8
;; BOEHM-GC 1.89 [i: 9.5 1.8 ] [e: 2.8] 4.7
;; DEFLATE-FILE 1.02 [i: 5.1 1.6 ] [e: 0.1] 14.7
;; 1D-ARRAYS 0.96 [i: 5.7 0.7 ] [e: 0.0] 19.0
;; 2D-ARRAYS 1.10 [i: 2.6 0.0 ] [e: 0.5] 20.9
;; 3D-ARRAYS 1.02 [i: 3.1 0.0 ] [e: 0.3] 19.5
;; BITVECTORS 1.46 [i: 0.1 0.5 ] [e: 1.6] 7.1
;; BENCH-STRINGS 1.29 [i: 11.9 0.0 ] [e: 0.2] 10.8
;; fill-strings/adjustable 0.90 [i: 6.0 0.6 ] [e: 0.0] 16.9
;; STRING-CONCAT 0.93 [i: 6.2 0.4 ] [e: 0.2] 11.5
;; SEARCH-SEQUENCE 1.17 [i: 5.9 1.2 ] [e: 0.3] 13.7
=== UltraSPARC-IIe @ 500 MHz (Blade 100) ===
;; COMPILER 2.79 [i: 15.6 37.4 ] [e: 59.2] 21.1
;; LOAD-FASL 1.89 [i: 9.3 24.0 ] [e: 46.2] 23.8
;; PERMUTATIONS 1.19 [i: 0.7 1.8 ] [e: 19.0] 25.2
;; WALK-LIST/SEQ 1.65 [i: 0.0 0.1 ] [e: 57.2] 69.6
;; WALK-LIST/MESS 6.18 [i: 0.0 2.5 ] [e: 91.3] 84.7
;; BOYER 1.49 [i: 3.1 6.6 ] [e: 36.1] 22.0
;; BROWSE 1.34 [i: 2.2 5.7 ] [e: 30.5] 25.6
;; DDERIV 1.76 [i: 1.5 4.0 ] [e: 31.0] 30.8
;; DERIV 1.79 [i: 1.5 3.8 ] [e: 25.9] 24.2
;; DESTRUCTIVE 1.19 [i: 0.3 1.1 ] [e: 17.2] 11.9
;; DIV2-TEST-1 1.98 [i: 1.6 4.1 ] [e: 40.8] 55.9
;; DIV2-TEST-2 1.60 [i: 0.9 2.9 ] [e: 28.1] 43.1
;; FFT 1.12 [i: 0.1 0.3 ] [e: 9.0] 14.1
;; FRPOLY/FIXNUM 1.56 [i: 0.9 2.4 ] [e: 22.0] 18.1
;; FRPOLY/BIGNUM 1.55 [i: 6.6 12.4 ] [e: 43.2] 15.8
;; FRPOLY/FLOAT 1.40 [i: 1.8 4.0 ] [e: 30.8] 23.2
;; PUZZLE 0.96 [i: 0.4 1.8 ] [e: 22.5] 23.2
;; TAK 1.17 [i: 0.1 0.5 ] [e: 1.4] 20.5
;; CTAK 1.19 [i: 0.2 0.7 ] [e: 1.1] 18.7
;; TRTAK 1.17 [i: 0.1 0.4 ] [e: 1.6] 18.4
;; TAKL 0.79 [i: 0.0 0.1 ] [e: 0.9] 2.7
;; STAK 1.10 [i: 0.1 0.2 ] [e: 0.2] 20.9
;; FPRINT/UGLY 1.54 [i: 10.2 18.4 ] [e: 59.8] 15.8
;; FPRINT/PRETTY 1.97 [i: 20.4 30.9 ] [e: 65.3] 13.2
;; TRAVERSE 1.69 [i: 0.0 0.2 ] [e: 19.1] 52.4
;; TRIANGLE 0.84 [i: 0.1 0.3 ] [e: 0.9] 16.9
;; RICHARDS 1.55 [i: 0.0 0.0 ] [e: 1.0] 19.3
;; FACTORIAL 2.07 [i: 10.4 15.4 ] [e: 54.4] 9.6
;; FIB 1.32 [i: 0.0 0.1 ] [e: 0.3] 0.9
;; FIB-RATIO 1.02 [i: 1.0 3.3 ] [e: 12.4] 11.7
;; ACKERMANN 1.38 [i: 0.0 0.0 ] [e: 24.5] 15.2
;; MANDELBROT 1.66 [i: 12.1 24.6 ] [e: 54.3] 12.5
;; MRG32K3A 0.87 [i: 0.0 0.0 ] [e: 0.1] 23.3
;; CRC40 1.10 [i: 2.3 5.6 ] [e: 25.0] 13.6
;; BIGNUM/ELEM-100-1000 0.95 [i: 0.6 2.2 ] [e: 9.3] 16.7
;; BIGNUM/ELEM-1000-100 0.87 [i: 0.1 0.3 ] [e: 2.8] 25.8
;; BIGNUM/ELEM-10000-1 1.02 [i: 0.0 0.1 ] [e: 10.3] 27.0
;; BIGNUM/PARI-100-10 0.82 [i: 0.1 0.5 ] [e: 2.0] 20.1
;; BIGNUM/PARI-200-5 0.76 [i: 0.0 0.2 ] [e: 1.1] 26.2
;; PI-DECIMAL/SMALL 0.84 [i: 0.2 1.1 ] [e: 3.9] 20.6
;; PI-DECIMAL/BIG 0.81 [i: 0.1 0.6 ] [e: 2.5] 24.4
;; PI-ATAN 1.52 [i: 3.2 6.9 ] [e: 27.1] 15.2
;; PI-RATIOS 0.95 [i: 0.8 3.1 ] [e: 12.1] 16.1
;; SLURP-LINES 1.47 [i: 10.5 20.7 ] [e: 54.5] 12.0
;; HASH-STRINGS 1.07 [i: 1.2 3.2 ] [e: 29.3] 17.1
;; HASH-INTEGERS 1.27 [i: 1.0 1.6 ] [e: 23.3] 26.3
;; BOEHM-GC 1.48 [i: 0.8 2.9 ] [e: 24.6] 17.2
;; DEFLATE-FILE 0.97 [i: 0.2 1.9 ] [e: 10.3] 15.9
;; 1D-ARRAYS 0.97 [i: 0.0 0.1 ] [e: 2.5] 18.6
;; 2D-ARRAYS 1.12 [i: 0.0 0.1 ] [e: 6.4] 28.5
;; 3D-ARRAYS 0.82 [i: 0.0 0.1 ] [e: 1.1] 20.3
;; BITVECTORS 3.35 [i: 0.2 0.5 ] [e: 42.9] 84.1
;; BENCH-STRINGS 1.17 [i: 0.0 0.0 ] [e: 5.5] 11.3
;; fill-strings/adjustable 0.92 [i: 0.0 0.1 ] [e: 2.8] 17.0
;; STRING-CONCAT 1.19 [i: 4.1 8.1 ] [e: 40.1] 17.1
;; SEARCH-SEQUENCE 1.07 [i: 0.5 1.2 ] [e: 15.5] 19.7

203
cl-bench/play/ddot.txt Normal file
View File

@@ -0,0 +1,203 @@
Path: news.laas.fr!news.cict.fr!cines.fr!ciril.fr!deine.net!newsfeed00.sul.t-online.de!newsfeed01.sul.t-online.de!t-online.de!news.belwue.de!news.uni-stuttgart.de!news.urz.uni-heidelberg.de!not-for-mail
From: Nicolas Neuss <Nicolas.Neuss@iwr.uni-heidelberg.de>
Newsgroups: comp.lang.lisp
Subject: Floating-point performance of Lisp compared to C
Date: 05 Jul 2002 12:22:35 +0200
Organization: IWR
Lines: 187
Message-ID: <87hejefp90.fsf@ortler.iwr.uni-heidelberg.de>
NNTP-Posting-Host: ortler.iwr.uni-heidelberg.de
X-Trace: news.urz.uni-heidelberg.de 1025864555 4971 129.206.120.136 (5 Jul 2002 10:22:35 GMT)
X-Complaints-To: usenet@news.urz.uni-heidelberg.de
NNTP-Posting-Date: 5 Jul 2002 10:22:35 GMT
X-Newsreader: Gnus v5.7/Emacs 20.7
Xref: news.laas.fr comp.lang.lisp:80424
Hello, Lispers.
In spite of Erik's nice signature I have chosen for this message, too,
I'm still interested in low-level performance of my programs. In my
case (I'm doing numerical analysis for partial differential
equations), it is especially the floating point performance which
matters. I'm using CMUCL and it doesn't perform badly in comparison
with C, at least on my computer (some of you will remember that they
helped me with my first steps in CL exactly at this problem).
Now, what I would like to have is some more data, about how Lisp
implementations run this program. Especially, I would be interested
with CMUCL on SUN workstations, ACL, Lispworks, ... on X86 and other
architectures. If someone would like to test it, please go ahead.
I'm very interested in the results. Please always report the results
for the C program
Nicolas.
P.S.: The demo versions for commercial Lisps will probably not
allocate the memory needed by the program. Also: don't be too
disappointed if your Lisp does not perform very well. Floating-point
performance ist not of highest importance for most of applications.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;; mflop.lisp
;;;; (C) Nicolas Neuss (Nicolas.Neuss@iwr.uni-heidelberg.de)
;;;; mflop.lisp is in the public domain.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defconstant +N-long+ #x100000) ; does not fit in secondary cache
(defconstant +N-short+ #x100) ; fits in primary cache
(defparameter *mflop-delta* 5.0
"Time interval in seconds over which we measure performance.")
(defun make-double-float-array (size &optional (initial 0.0d0))
(make-array size :element-type 'double-float :initial-element initial))
(defun ddot (x y n)
(declare (type fixnum n)
(type (simple-array double-float (*)) x y))
(declare (optimize (safety 0) (space 0) (debug 0) (speed 3)))
(loop for i fixnum from 0 below n
summing (* (aref x i) (aref y i)) double-float))
(defun daxpy (x y n)
(declare (type fixnum n)
(type (simple-array double-float (*)) x y))
(declare (optimize (safety 0) (space 0) (debug 0) (speed 3)))
(loop with s double-float = 0.3d0
for i from 0 below n do
(setf (aref x i) (+ (* s (aref y i))))))
(defun test (fn size)
(let ((x (make-double-float-array +N-long+))
(y (make-double-float-array +N-long+)))
(format
t "~A-~A: ~$ MFLOPS~%"
fn
(if (= size +N-long+) "long" "short")
(loop with after = 0
for before = (get-internal-run-time) then after
and count = 1 then (* count 2)
do
(loop repeat count do (funcall fn x y size))
(setq after (get-internal-run-time))
(when (> (/ (- after before) internal-time-units-per-second)
*mflop-delta*)
(return (/ (* 2 size count internal-time-units-per-second)
(* 1e6 (- after before)))))))))
(defun mflop-test ()
"Returns several numbers characteristic for floating point efficiency of
your CL implementation. Please compare these numbers to those obtained by
the C version in mflop.c."
(test 'ddot +N-long+)
(test 'ddot +N-short+)
(test 'daxpy +N-long+)
(test 'daxpy +N-short+))
#+ignore (mflop-test)
/**********************************************************************
mflop.c -- performance testing
(C) Nicolas Neuss (Nicolas.Neuss@iwr.uni-heidelberg.de)
mflop.c is public domain.
**********************************************************************/
/* Reasonable compilation lines are
Linux: gcc -O3 mflop.c
IRIS Octane: cc -Ofast mflop.c
Sparc Ultra II: cc -fast mflop.c
IBM RS6000/590: xlc -O3 -qarch=pwrx -qtune=pwrx mflop.c */
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#define MFLOP_DELTA 5.0 /* time interval over which we measure performance */
#define Nlong 1000000 /* does not fit in secondary cache */
#define Nshort 256 /* fits in primary cache */
#define CURRENT_TIME (((double)clock()) / ((double)CLOCKS_PER_SEC))
double ddot (double *x, double *y, int n) {
int j;
double s = 0.0;
for (j=0; j<n; j++)
s += x[j]*y[j];
return s;
}
double daxpy (double *x, double *y, int n) {
int j;
double s = 0.1;
for (j=0; j<n; j++)
y[j] += s*x[j];
return 0.0;
}
typedef double testfun (double *, double *, int n);
void test (testfun f, char *name, int n) {
int i, nr;
double start_time, end_time;
double s = 0.0;
double *x = (double *) malloc(sizeof(double)*Nlong);
double *y = (double *) malloc(sizeof(double)*Nlong);
for (i=0; i<Nlong; i++)
x[i] = 0.0; y[i] = 0.9;
nr = 1;
do {
nr = 2*nr;
start_time = CURRENT_TIME;
for (i=0; i<nr; i++)
s += f(x, y, n);
end_time = CURRENT_TIME;
} while (end_time-start_time<MFLOP_DELTA);
printf ("%s%s %4.2f MFLOPS\n", name, ((n==Nlong) ? "-long":"-short"),
1.0e-6*2*n*(s+nr/(end_time-start_time)));
}
int main (void) {
test(ddot, "ddot", Nlong);
test(ddot, "ddot", Nshort);
test(daxpy, "daxpy", Nlong);
test(daxpy, "daxpy", Nshort);
return 0;
}
Sample results for my Toshiba TECRA 8000 Laptop:
CMUCL:
* ;;; Evaluate mflop-test
DDOT-long: 42.01 MFLOPS
DDOT-short: 108.90 MFLOPS
DAXPY-long: 23.46 MFLOPS
DAXPY-short: 136.26 MFLOPS
NIL
gcc -O3 mflop-neu.c; a.out
ddot-long 62.75 MFLOPS
ddot-short 178.36 MFLOPS
daxpy-long 22.82 MFLOPS
daxpy-short 119.70 MFLOPS
Speed disadvantage of CMUCL:
ddot-long: 1.7
ddot-short: 0.61
daxpy-long: 1.0
daxpy-short: 0.9
--
Performance is the last refuge of the miserable programmer.
-- Erik Naggum

61
cl-bench/report.lisp Normal file
View File

@@ -0,0 +1,61 @@
;;; report.lisp
;;
;; Author: Eric Marsden <emarsden@laas.fr>
;; Time-stamp: <2004-02-29 emarsden>
;;
;;
;; When loaded into CMUCL, this should generate a report comparing the
;; performance of the different CL implementations which have been
;; tested. Reads the /var/tmp/CL-benchmark* files to obtain data from
;; previous runs.
;;
;; FIXME could create graphical version using ploticus
(in-package :cl-user)
(defconstant +implementation+
(concatenate 'string
(lisp-implementation-type) " "
(lisp-implementation-version)))
(defun bench-analysis ()
(let (data implementations benchmarks)
(dolist (f (directory "/var/tmp/CL-benchmark*.*"))
(ignore-errors
(with-open-file (f f :direction :input)
(let ((*read-eval* nil))
(push (read f) data)))))
(setq implementations (delete +implementation+ (mapcar #'car data) :test #'string=))
(setq benchmarks (reverse (mapcar #'first (cdr (first data)))))
(format t "~25a~10@a" "Benchmark" "Reference")
(dolist (impl implementations)
(format t "~7@a" (subseq impl 0 5)))
(format t "~%-------------------------------------------------------------------------------------~%")
(dolist (b benchmarks)
(format t "~&~25a" b)
(let* ((reference-data (cdr (assoc +implementation+ data :test #'string=)))
(reference-user (third (assoc b reference-data :test #'string=))))
;; user time spent by reference implementation, in seconds
(format t "[~10,2f]" reference-user)
(dolist (i implementations)
(let* ((id (cdr (assoc i data :test #'string=)))
(ir (third (assoc b id :test #'string=))))
(format t "~7,2f" (handler-case (/ ir reference-user) (error () -1)))))))
(terpri)
(format t "Reference time in first column is in seconds; other columns are relative~%")
(format t "Reference implementation: ~a~%" +implementation+)
(dolist (impl implementations)
(format t "~&Impl ~a: ~a~%" (subseq impl 0 5) impl))
(format t "=== Test machine ===~%")
(format t " Machine-type: ~A~%" (machine-type))
(format t " Machine-version: ~A~%" (machine-version))
#+cmu
(run-program "uname" '("-a") :output t)
(terpri)
(force-output)))
(bench-analysis)
(quit)
;; EOF

10
cl-bench/run-acl.sh Normal file
View File

@@ -0,0 +1,10 @@
#!/bin/bash
#
# contributed by Kevin Layer
ACL=${ACL:-"alisp -qq"}
make clean optimize-files
$ACL -L sysdep/setup-acl.lisp -L do-compilation-script.lisp -kill
$ACL -L sysdep/setup-acl.lisp -L do-execute-script.lisp -kill

18
cl-bench/run-armedbear.sh Normal file
View File

@@ -0,0 +1,18 @@
#!/bin/bash
ABLISP=${ABLISP:-ablisp}
HERE=`pwd`
make clean optimize-files
$ABLISP <<EOF
:cd $HERE
(load "sysdep/setup-ablisp.lisp")
(load "do-compilation-script.lisp")
:exit
EOF
$ABLISP <<EOF
:cd $HERE
(load "sysdep/setup-ablisp.lisp")
(load "do-execute-script")
:exit
EOF

9
cl-bench/run-clisp.sh Normal file
View File

@@ -0,0 +1,9 @@
#!/bin/bash
CLISP_OPT="-q -norc -ansi -m 200MB -E iso-8859-1"
CLISP=${CLISP:-"clisp"}
make clean optimize-files
ulimit -s 8192
${CLISP} ${CLISP_OPT} -i sysdep/setup-clisp.lisp -i do-compilation-script.lisp -x '(quit)'
${CLISP} ${CLISP_OPT} -i sysdep/setup-clisp.lisp -i do-execute-script.lisp -x '(quit)'

7
cl-bench/run-cmucl.sh Normal file
View File

@@ -0,0 +1,7 @@
#!/bin/bash
CMUCL=${CMUCL:-"cmucl-latest"}
make clean optimize-files
${CMUCL} -noinit -load sysdep/setup-cmucl -load do-compilation-script -eval '(ext:quit)'
${CMUCL} -noinit -load sysdep/setup-cmucl -load do-execute-script -eval '(ext:quit)'

15
cl-bench/run-ecl.sh Normal file
View File

@@ -0,0 +1,15 @@
#!/bin/sh
ECL=ecl
${ECL} <<EOF
(load "sysdep/setup-ecl.lisp")
(load "do-compilation-script.lisp")
(quit)
EOF
${ECL} <<EOF
(load "sysdep/setup-ecl.lisp")
(load "do-execute-script.lisp")
(quit)
EOF

12
cl-bench/run-ecli.sh Normal file
View File

@@ -0,0 +1,12 @@
#!/bin/sh
#
# ECL in interpreted mode
ECL=ecl
make clean optimize-files
${ECL} <<EOF
(load "sysdep/setup-ecl.lisp")
(load "do-interpret-script.lisp")
(quit)
EOF

9
cl-bench/run-gcl.sh Normal file
View File

@@ -0,0 +1,9 @@
#!/bin/sh
GCL_ANSI=1
export GCL_ANSI
GCL=${GCL:-gclcvs}
make clean optimize-files
${GCL} -load sysdep/setup-gcl.lisp -load do-compilation-script.lisp -eval '(quit)'
${GCL} -load sysdep/setup-gcl.lisp -load do-execute-script.lisp -eval '(quit)'

10
cl-bench/run-gcli.sh Normal file
View File

@@ -0,0 +1,10 @@
#!/bin/sh
# GCL in interpreted mode
GCL_ANSI=1
export GCL_ANSI
GCL=${GCL:-gclcvs}
make clean optimize-files
${GCL} -load sysdep/setup-gcl.lisp -load do-interpret-script.lisp -eval '(quit)'

16
cl-bench/run-lisp500.sh Normal file
View File

@@ -0,0 +1,16 @@
#!/bin/sh
L500=lisp500
make clean optimize-files
${L500} <<EOF
(load "sysdep/setup-lisp500.lisp")
(load "do-compilation-script.lisp")
(quit)
EOF
${L500} <<EOF
(load "sysdep/setup-lisp500.lisp")
(load "do-execute-script.lisp")
(quit)
EOF

8
cl-bench/run-openmcl.sh Normal file
View File

@@ -0,0 +1,8 @@
#!/bin/sh
OPENMCL=${OPENMCL:-openmcl}
OPENMCL_OPT="--batch --no-init"
make clean optimize-files
${OPENMCL} ${OPENMCL_OPT} --load sysdep/setup-openmcl.lisp --load do-compilation-script.lisp --eval '(ccl:quit)'
${OPENMCL} ${OPENMCL_OPT} --load sysdep/setup-openmcl.lisp --load do-execute-script.lisp --eval '(ccl:quit)'

14
cl-bench/run-poplog.sh Normal file
View File

@@ -0,0 +1,14 @@
#!/bin/bash
#
#
# See <URL:http://www.poplog.org/>. Poplog compiles while loading, so
# there is no need to load the compilation-script.
POPLOG=${POPLOG:-"poplog"}
ulimit -s 8192
make clean optimize-files
$POPLOG <<EOF
(load "sysdep/setup-poplog.lisp")
(load "do-execute-script.lisp")
EOF

8
cl-bench/run-sbcl.sh Normal file
View File

@@ -0,0 +1,8 @@
#!/bin/bash
SBCL=${SBCL:-sbcl}
SBCL_OPT="--disable-debugger --userinit /dev/null"
make clean optimize-files
$SBCL ${SBCL_OPT} --load sysdep/setup-sbcl.lisp --load do-compilation-script.lisp --eval '(quit)'
$SBCL ${SBCL_OPT} --load sysdep/setup-sbcl.lisp --load do-execute-script.lisp --eval '(quit)'

185
cl-bench/support.lisp Normal file
View File

@@ -0,0 +1,185 @@
;;; support.lisp --- performance benchmarks for Common Lisp implementations
;;
;; Author: Eric Marsden <emarsden@laas.fr>
;; Time-stamp: <2004-08-01 emarsden>
;;
;;
;; The benchmarks consist of
;;
;; - the Gabriel benchmarks
;; - some mathematical operations (factorial, fibonnaci, CRC)
;; - some bignum-intensive operations
;; - hashtable and READ-LINE tests
;; - CLOS tests
;; - array, string and bitvector exercises
;;
(in-package :cl-bench)
(defvar *version* "20040801")
(defvar *benchmarks* '())
(defvar *benchmark-results* '())
(defvar +implementation+
(concatenate 'string
(lisp-implementation-type) " "
(lisp-implementation-version)))
(defclass benchmark ()
((name :accessor benchmark-name
:initarg :name)
(short :accessor benchmark-short
:initarg :short
:type string)
(long :accessor benchmark-long
:initarg :long
:initform nil
:type string)
(group :accessor benchmark-group
:initarg :group)
(runs :accessor benchmark-runs
:initarg :runs
:initform 1
:type integer)
(disabled-for :accessor benchmark-disabled-for
:initarg :disabled-for
:initform nil)
(setup :initarg :setup
:initform nil)
(function :initarg :function
:accessor benchmark-function)))
(defmethod print-object ((self benchmark) stream)
(print-unreadable-object (self stream :type nil)
(format stream "benchmark ~a for ~d runs"
(benchmark-short self)
(benchmark-runs self))))
(defmethod initialize-instance :after ((self benchmark)
&rest initargs
&key &allow-other-keys)
(declare (ignore initargs))
(unless (slot-boundp self 'short)
(setf (benchmark-short self) (string (benchmark-name self))))
self)
;; (setf (benchmark-function self)
;; (compile nil `(lambda ()
;; (dotimes (i ,(benchmark-runs self))
;; `(funcall ',(benchmark-function ,self))))))
(defmacro defbench (fun &rest args)
`(push (make-instance 'benchmark :name ',fun ,@args)
*benchmarks*))
(defvar *benchmark-output*)
(defvar *current-test*)
(defmacro with-bench-output (&body body)
`(with-open-file (f (benchmark-report-file)
:direction :output
:if-exists :supersede)
(let ((*benchmark-output* f)
(*load-verbose* nil)
(*print-length* nil)
(*compile-verbose* nil)
(*compile-print* nil))
(bench-report-header)
(progn ,@body)
(bench-report-footer))))
(defun bench-run ()
(with-open-file (f (benchmark-report-file)
:direction :output
:if-exists :supersede)
(let ((*benchmark-output* f)
(*print-length*)
(*load-verbose* nil)
(*compile-verbose* nil)
(*compile-print* nil))
(bench-report-header)
(dolist (b (reverse *benchmarks*))
(bench-gc)
(with-spawned-thread
(with-slots (setup function short runs) b
(when setup (funcall setup))
(format t "~&=== running ~a~%" b)
(bench-report function short runs))))
(bench-report-footer))))
(defun benchmark-report-file ()
(multiple-value-bind (second minute hour date month year)
(get-decoded-time)
(declare (ignore second))
(format nil "~aCL-benchmark-~d~2,'0d~2,'0dT~2,'0d~2,'0d"
#+win32 "" #-win32 "/var/tmp/"
year month date hour minute)))
;; grr, CLISP doesn't implement ~<..~:>
;; CormanLisp bug:
;;; An error occurred in function FORMAT:
;;; Error: Invalid format directive : character #\< in control string ";; -*- lisp -*- ~a~%;;~%;; Implementation *features*:~%~@<;; ~@;~s~:>~%;;~%"
;;; Entering Corman Lisp debug loop.
(defun bench-report-header ()
(format *benchmark-output*
#-(or clisp ecl gcl cormanlisp) ";; -*- lisp -*- ~a~%;;~%;; Implementation *features*:~%~@<;; ~@;~s~:>~%;;~%"
#+(or clisp ecl gcl cormanlisp) ";; -*- lisp -*- ~a~%;; Implementation *features*: ~s~%;;~%"
+implementation+ *features*)
(format *benchmark-output*
";; Function real user sys consed~%")
(format *benchmark-output*
";; ----------------------------------------------------------------~%"))
(defun bench-report-footer ()
(format *benchmark-output* "~%~s~%"
(cons +implementation+ *benchmark-results*)))
;; generate a report to *benchmark-output* on the calling of FUNCTION
(defun bench-report (function name times)
(multiple-value-bind (real user sys consed)
(bench-time function times name)
(format *benchmark-output*
#-armedbear ";; ~25a ~8,2f ~8,2f ~8,2f ~12d"
#+armedbear ";; ~a ~f ~f ~f ~d"
name real user sys consed)
(terpri *benchmark-output*)
(force-output *benchmark-output*)
(push (cons name (list real user sys consed))
*benchmark-results*)))
;; a generic timing function, that depends on GET-INTERNAL-RUN-TIME
;; and GET-INTERNAL-REAL-TIME returning sensible results. If a version
;; was defined in sysdep/setup-<impl>, we use that instead
(defun generic-bench-time (fun times name)
(declare (ignore name))
(let (before-real after-real before-user after-user)
(setq before-user (get-internal-run-time))
(setq before-real (get-internal-real-time))
(dotimes (i times)
(funcall fun))
(setq after-user (get-internal-run-time))
(setq after-real (get-internal-real-time))
;; return real user sys consed
(values (/ (- after-real before-real) internal-time-units-per-second)
(/ (- after-user before-user) internal-time-units-per-second)
0 0)))
(eval-when (:load-toplevel :execute)
(unless (fboundp 'bench-time)
;; GCL as of 20040628 does not implement (setf fdefinition)
#-gcl (setf (fdefinition 'bench-time) #'generic-bench-time)
#+gcl (defun bench-time (fun times name) (generic-bench-time fun times name))))
;; EOF

View File

@@ -0,0 +1,30 @@
;;; setup file for running cl-bench in ArmedBear Lisp
(load "defpackage")
(in-package :cl-bench)
(defun bench-gc () (ext:gc))
(defmacro with-spawned-thread (&body body)
`(progn ,@body))
;; as of 2003-12-05, this is not quite ready yet. In support.lisp we compile
;; various ABL internal packages and the cl-bench packages using
;; JVM::JVM-COMPILE-PACKAGE
#+nil (setq jvm:*auto-compile* t)
(eval-when (:load-toplevel :execute)
(format *debug-io* "Loading JVM compiler ...~%")
(load "/opt/src/cvs-armedbear/j/src/org/armedbear/lisp/jvm.lisp")
(dolist (p '("CL" "SYS" "EXT" "PRECOMPILER"))
(jvm::jvm-compile-package p))
(format *debug-io* "Compiling all cl-bench packages ...~%")
(dolist (p (list-all-packages))
(when (eql 0 (search "CL-BENCH" (package-name p)))
(jvm::jvm-compile-package p))))
;; EOF

View File

@@ -0,0 +1,43 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; setup file for cl-bench running in ACL 6.2
;;;
;;; contributed by Kevin Layer
(eval-when (compile eval load) (load "defpackage"))
;; disable for the same tests as Lispworks Personal edition (limited heap size)
(when (search "Trial Edition" (lisp-implementation-type))
(push :lispworks-personal-edition *features*))
(in-package :cl-bench)
(defmacro with-spawned-thread (&body body)
;; run BODY inside a new thread
;;
;; KL: what's the point of running them in a separate process? It sure
;; makes debugging problems harder...
#+ignore
`(mp:process-run-function "cl-bench"
(lambda () ,@body))
#-ignore
`(progn ,@body))
(defun bench-gc () (gc t))
(setq excl:*record-source-file-info* nil)
(setq excl:*load-source-file-info* nil)
(setq excl:*record-xref-info* nil)
(setq excl:*load-xref-info* nil)
(setq excl:*global-gc-behavior* nil)
;; disabled by emarsden: I don't consider this fair
;; (setq excl::*default-rehash-size* 2.0)
;; (setq *print-pretty* nil)
;; for debugging:
#+ignore
(progn
(setf (sys:gsgc-switch :print) t)
(setf (sys:gsgc-switch :verbose) t))
;; EOF

View File

@@ -0,0 +1,55 @@
;;; setup file for running cl-bench in CLISP
(load "defpackage")
(in-package :cl-bench)
(setq custom:*warn-on-floating-point-contagion* nil)
(defun bench-gc () (gc))
(defmacro with-spawned-thread (&body body)
`(progn ,@body))
(defun bench-time (fun times name)
(declare (ignore name))
(labels ((merge-2-values (val1 val2)
(if (< internal-time-units-per-second 1000000)
(dpb val1 (byte 16 16) val2) ; TIME_1: AMIGA, DOS, OS/2, UNIX_TIMES
(+ (* val1 internal-time-units-per-second) val2))) ; TIME_2: UNIX sonst, WIN32
(secs (v1 v2 v3 v4)
(/ (- (merge-2-values v1 v2)
(merge-2-values v3 v4))
internal-time-units-per-second)))
(multiple-value-bind (new-real1
new-real2
new-run1
new-run2
new-gc1
new-gc2
new-space1
new-space2
new-gccount)
(sys::%%time)
(dotimes (i times) (funcall fun))
(multiple-value-bind (old-real1
old-real2
old-run1
old-run2
old-gc1
old-gc2
old-space1
old-space2
old-gccount)
(sys::%%time)
;; returns real user sys consed
(values (secs old-real1 old-real2 new-real1 new-real2)
(secs old-run1 old-run2 new-run1 new-run2)
0.0
(- old-gccount new-gccount))))))
;; EOF

View File

@@ -0,0 +1,164 @@
;;; setup file for cl-bench running in CMUCL
(load "defpackage")
(in-package :cl-bench)
(setq ext:*bytes-consed-between-gcs* 25000000)
;; to avoid problems when running the bignum code (the default of
;; 40000 is too low for some of the tests)
(setq ext:*intexp-maximum-exponent* 100000)
#+(and mp experimental)
;; run BODY inside a new thread
(defmacro with-spawned-thread (&body body)
`(mp:make-process (lambda () ,@body) :name "cl-bench"))
;; #-mp
(defmacro with-spawned-thread (&body body)
`(progn ,@body))
(defun bench-gc ()
(ext:gc #+gencgc :full #+gencgc t))
;; internals stuff that is mostly for stress-testing CMUCL
;; disable the byte compiler
#+nil (setq ext:*byte-compile-default* nil)
#+nil (setq ext:*block-compile-default* t)
;; testing the dynamic-extent support
#+nil
(setq ext:*trust-dynamic-extent-declarations* t
*suppress-dynamic-extent-safe-closures* t)
#+gencgc
(progn
(alien:def-alien-variable ("gencgc_verbose" gencgc-verbose) c-call::int)
(alien:def-alien-variable ("verify_gens" gencgc-verify-gens) c-call::int)
(alien:def-alien-variable ("pre_verify_gen_0" gencgc-pre-verify-gen0) c-call::int)
(alien:def-alien-variable ("verify_after_free_heap" gencgc-verify-after-free-heap) c-call::int)
(alien:def-alien-variable ("verify_dynamic_code_check" gencgc-verify-dynamic-code-check) c-call::int)
(alien:def-alien-variable ("check_code_fixups" gencgc-check-code-fixups) c-call::int)
(alien:def-alien-variable ("gencgc_zero_check" gencgc-zero-check) c-call::int)
(alien:def-alien-variable ("gencgc_enable_verify_zero_fill" gencgc-enable-verify-zero-fill) c-call::int)
(alien:def-alien-variable ("gencgc_zero_check_during_free_heap"
gencgc-zero-check-during-free-heap) c-call::int)
(defun gencgc-enable-checking ()
(setf gencgc-verify-gens 0)
(dolist (check '(gencgc-pre-verify-gen0
gencgc-verify-after-free-heap
gencgc-verify-dynamic-code-check
gencgc-check-code-fixups
gencgc-zero-check
gencgc-enable-verify-zero-fill
gencgc-zero-check-during-free-heap))
(setf check 1))))
;; enable as much internal checking as possible
(defun ecm-paranoid ()
#+gencgc (gencgc-enable-checking)
(setf c::*check-consistency* t)
(setf c::*always-clear-stack* t)
(setf (gethash "Target for ~S isn't complementary write-p." c::*ignored-errors*) t)
(setf pcl::*check-cache-p* t))
#+nil (ecm-paranoid)
;; adapted from code pulled from src/code/time.lisp
#-(and sparc-v9 performance-counters)
(defun bench-time (fun times name)
(declare (ignore name))
(let (old-run-utime
new-run-utime
old-run-stime
new-run-stime
old-real-time
new-real-time
old-page-faults
new-page-faults
real-time-overhead
run-utime-overhead
run-stime-overhead
old-bytes-consed
new-bytes-consed
cons-overhead)
;; Calculate the overhead...
(multiple-value-setq
(old-run-utime old-run-stime old-page-faults old-bytes-consed)
(lisp::time-get-sys-info))
;; Do it a second time to make sure everything is faulted in.
(multiple-value-setq
(old-run-utime old-run-stime old-page-faults old-bytes-consed)
(lisp::time-get-sys-info))
(multiple-value-setq
(new-run-utime new-run-stime new-page-faults new-bytes-consed)
(lisp::time-get-sys-info))
(setq run-utime-overhead (- new-run-utime old-run-utime))
(setq run-stime-overhead (- new-run-stime old-run-stime))
(setq old-real-time (get-internal-real-time))
(setq old-real-time (get-internal-real-time))
(setq new-real-time (get-internal-real-time))
(setq real-time-overhead (- new-real-time old-real-time))
(setq cons-overhead (- new-bytes-consed old-bytes-consed))
;; Now get the initial times.
(multiple-value-setq
(old-run-utime old-run-stime old-page-faults old-bytes-consed)
(lisp::time-get-sys-info))
(setq old-real-time (get-internal-real-time))
(dotimes (i times)
(funcall fun))
(multiple-value-setq
(new-run-utime new-run-stime new-page-faults new-bytes-consed)
(lisp::time-get-sys-info))
(setq new-real-time (- (get-internal-real-time) real-time-overhead))
;; returns real user sys consed
(values
(max (/ (- new-real-time old-real-time)
(float internal-time-units-per-second))
0.0)
(max (/ (- new-run-utime old-run-utime) 1000000.0) 0.0)
(max (/ (- new-run-stime old-run-stime) 1000000.0) 0.0)
(max (- new-bytes-consed old-bytes-consed) 0))))
#+(and sparc-v9 performance-counters)
(eval-when (:compile-toplevel :load-toplevel :execute)
(require :cpc))
#+(and sparc-v9 performance-counters)
(defun bench-time (function times name)
(declare (ignore times))
(with-open-file (dis "/tmp/cmucl-disassembly.txt"
:direction :output
:if-exists :append
:if-does-not-exist :create)
(format dis "~%~% === Disassembly for ~A~%" function)
(disassemble function :stream dis))
(let* ((cpi (cpc::calculate-cpi function))
(icache-miss (cpc::calculate-icache-miss function))
(ecache-miss (cpc::calculate-ecache-miss function))
(stall-icache (cpc::calculate-instruction-stall function))
;; (stall-mispredict (cpc::calculate-mispredict-stall function))
(stall-load (cpc::calculate-load-stall function)))
(with-open-file (out "/tmp/cmucl-cpc.txt"
:direction :output
:if-exists :append
:if-does-not-exist :create)
(format out
";; ~25a ~5,2f [i: ~4,1f ~4,1f ] [e: ~4,1f] ~4,1f~%"
name cpi
(* 100 icache-miss) (* 100 stall-icache)
(* 100 ecache-miss) (* 100 stall-load)))))
;; EOF

View File

@@ -0,0 +1,16 @@
;;; setup file for running cl-bench in Corman CL
(defun corman-compile-all-files ()
(dolist (f (directory "files/*.lisp"))
(compile-file f :print nil)))
(defun corman-load-and-run ()
(dolist (f (directory "files/*.fasl"))
(load f))
(bench-run))
;; EOF

View File

@@ -0,0 +1,22 @@
;;; setup file for cl-bench running in ECL
;;
;; tested with ECL 0.9
;;
;; see <URL:http://ecls.sf.net/>
(load "defpackage")
(in-package :cl-bench)
(defun bench-gc ()
(si:gc t))
(defmacro with-spawned-thread (&body body)
`(progn ,@body))
;; to autoload the compiler
(compile 'bench-gc)
(setq c::*cc-flags* (concatenate 'string "-I. " c::*cc-flags*))
;; EOF

View File

@@ -0,0 +1,19 @@
;;; setup file for running cl-bench in GCL
(load "defpackage.lisp")
(in-package :cl-bench)
(defun bench-gc () (lisp:gbc t))
(defmacro with-spawned-thread (&body body)
`(progn ,@body))
#+older-gcl
(defmacro with-standard-io-syntax (&body body)
`(progn ,@body))
;; EOF

View File

@@ -0,0 +1,13 @@
;;; setup file for running cl-bench in lisp500
(load "defpackage")
(in-package :cl-bench)
(defun bench-gc () (cl:gc))
(defmacro with-spawned-thread (&body body)
`(progn ,@body))
;; EOF

View File

@@ -0,0 +1,15 @@
;;; setup file for running cl-bench in Lispworks
(load "defpackage")
(in-package :cl-bench)
(hcl:toggle-source-debugging nil)
(defun bench-gc () (hcl:gc-if-needed))
(defmacro with-spawned-thread (&body body)
`(progn ,@body))
;; EOF

View File

@@ -0,0 +1 @@
;;; setup file for cl-bench running in MCL

View File

@@ -0,0 +1,50 @@
;;; setup file for cl-bench running in OpenMCL
(load "defpackage")
(in-package :cl-bench)
;; 20 MB heap
(ccl:set-lisp-heap-gc-threshold (ash 2 20))
(defun bench-gc () (ccl:gc))
(setq ccl:*request-terminal-input-via-break* t)
(defvar *thread-pool-lock*
(ccl:make-lock "cl-bench thread pool lock"))
(defvar *thread-pool-semaphore* (ccl:make-semaphore))
(defvar *thread-pool* (list))
(ccl:with-lock-grabbed (*thread-pool-lock*)
(dotimes (i 3)
(push (ccl:make-process "cl-bench") *thread-pool*)
(ccl:signal-semaphore *thread-pool-semaphore*)))
;; run BODY inside a new thread
#+nil
(defmacro with-spawned-thread (&body body)
`(let ((thread nil))
(ccl:wait-on-semaphore *thread-pool-semaphore*)
(ccl:with-lock-grabbed (*thread-pool-lock*)
(setq thread (pop *thread-pool*)))
(format *debug-io* "Acquired process ~A~%" thread)
(assert (ccl::processp thread))
(ccl:process-preset thread
(lambda ()
,@body
(ccl:process-reset ccl:*current-process*)
(ccl:with-lock-grabbed (*thread-pool-lock*)
(push ccl:*current-process* *thread-pool*)
(ccl:signal-semaphore *thread-pool-semaphore*))))
(ccl:process-enable thread)))
(defmacro with-spawned-thread (&body body)
`(progn ,@body))
;; EOF

View File

@@ -0,0 +1,41 @@
;;; setup file for cl-bench running in Poplog CL
(in-package :common-lisp-user)
;; see <URL:http://www.cs.bham.ac.uk/research/poplog/doc/lisphelp/storeutils>
;; this is to allow the heap to grow as needed
(cl:require :storeutils)
(setq *max-store-size* nil)
(setf *default-pathname-defaults* #p"temp.lisp")
;; increase the maximum recursion level
(poplog:pop11)
section $-lisp;
uses pop_callstack_lim;
pop_callstack_lim * 10 -> pop_callstack_lim;
endsection;
lisp
(in-package :cl)
(defun compile-file-pathname (pathname)
pathname)
(export 'compile-file-pathname (find-package "CL"))
(load "defpackage")
(in-package :cl-bench)
(defmacro with-spawned-thread (&body body)
`(progn ,@body))
(defun bench-gc () (poplog:gc))
;; EOF

View File

@@ -0,0 +1,32 @@
;;; setup file for running cl-bench in SBCL
(load "defpackage")
(in-package :cl-bench)
(setf (sb-ext:bytes-consed-between-gcs) 25000000)
(setq sb-ext:*intexp-maximum-exponent* 100000)
(defun bench-gc ()
(sb-ext:gc #+gencgc :full #+gencgc t))
;; (condition-wait queue lock)
;; (condition-notify queue)
;; run BODY inside a new thread
;; #+sb-thread
;; (defmacro with-spawned-thread (&body body)
;; `(sb-thread:make-thread (lambda () ,@body)))
;;
;; #-sb-thread
(defmacro with-spawned-thread (&body body)
`(progn ,@body))
;; EOF

429
cl-bench/tests.lisp Normal file
View File

@@ -0,0 +1,429 @@
;;; all the performance benchmarks
;;;
;;; Time-stamp: <2004-06-28 emarsden>
(in-package :cl-bench)
(defbench compiler
:group :misc
:function 'cl-bench.misc:run-compiler
:long "Compilation of the Gabriel benchmarks"
:runs 3
:disabled-for '(gcl armedbear))
(defbench load-fasl
:group :misc
:function 'cl-bench.misc:run-fasload
:runs 20
:disabled-for '(gcl armedbear ecl))
(defbench sum-permutations
:group :misc
:long "traversal of a large, linked, self-sharing structure"
:function 'cl-bench.misc:run-permutations
:runs 2
:disabled-for '(lispworks-personal-edition ecl))
(defbench walk-list/seq
:group :misc
:long "Walk a list of 2M fixnums that were sequentially allocated"
:setup 'cl-bench.misc::setup-walk-list/seq
:function 'cl-bench.misc:walk-list/seq
:runs 2
:disabled-for '(lispworks-personal-edition armedbear))
(defbench walk-list/mess
:group :misc
:long "Walk a list of 2M fixnums that were mergesorted to spread pointers"
:setup 'cl-bench.misc::setup-walk-list/mess
:function 'cl-bench.misc:walk-list/mess
:runs 1
:disabled-for '(lispworks-personal-edition armedbear poplog))
(defbench boyer
:group :gabriel
:function 'cl-bench.gabriel:boyer
:long "CONS-intensive logic-programming code"
:runs 30)
(defbench browse
:group :gabriel
:function 'cl-bench.gabriel:browse
:runs 10)
(defbench dderiv
:group :gabriel
:function 'cl-bench.gabriel:dderiv-run
:runs 50)
(defbench deriv
:group :gabriel
:function 'cl-bench.gabriel:deriv-run
:runs 60)
(defbench destructive
:group :gabriel
:function 'cl-bench.gabriel:run-destructive
:runs 100)
(defbench div2-test-1
:group :gabriel
:function 'cl-bench.gabriel:run-div2-test1
:runs 200)
(defbench div2-test-2
:group :gabriel
:function 'cl-bench.gabriel:run-div2-test2
:runs 200)
(defbench fft
:group :gabriel
:function 'cl-bench.gabriel:run-fft
:runs 30)
(defbench frpoly/fixnum
:group :gabriel
:function 'cl-bench.gabriel:run-frpoly/fixnum
:runs 100)
(defbench frpoly/bignum
:group :gabriel
:function 'cl-bench.gabriel:run-frpoly/bignum
:runs 30)
(defbench frpoly/float
:group :gabriel
:function 'cl-bench.gabriel:run-frpoly/float
:runs 100)
(defbench puzzle
:group :gabriel
:long "Forest Baskett's Puzzle, exercising simple-vectors"
:function 'cl-bench.gabriel:run-puzzle
:runs 1500)
(defbench tak
:group :gabriel
:function 'cl-bench.gabriel:run-tak
:runs 500)
(defbench ctak
:group :gabriel
:long "TAKeuchi function using the catch/throw facility"
:function 'cl-bench.gabriel:run-ctak
:runs 900)
(defbench trtak
:group :gabriel
:long "TAKeuchi function without tail recursion"
:function 'cl-bench.gabriel:run-trtak
:runs 500)
(defbench takl
:group :gabriel
:long "TAKeuchi function with lists as counters"
:function 'cl-bench.gabriel:run-takl
:runs 150)
(defbench stak
:group :gabriel
:long "TAKeuchi function with special variables instead of parameter passing"
:function 'cl-bench.gabriel:run-stak
:runs 200)
(defbench fprint/ugly
:group :gabriel
:long "Pretty-printer and write operations to file, no *PRINT-PRETTY*"
:function 'cl-bench.gabriel:fprint/ugly
:runs 200)
(defbench fprint/pretty
:group :gabriel
:long "Pretty-printer and write operations to file, with *PRINT-PRETTY*"
:function 'cl-bench.gabriel:fprint/pretty
:runs 100)
(defbench traverse
:group :gabriel
:long "Creates and traverses a tree structure"
:function 'cl-bench.gabriel:run-traverse
:runs 15)
(defbench triangle
:group :gabriel
:long "Puzzle solving (board game) using combinatorial search"
:function 'cl-bench.gabriel:run-triangle
:runs 5)
;; end of Gabriel benchmarks
(defbench richards
:group :misc
:long "Operating system simulation"
:function 'cl-bench.richards:richards
:runs 5)
(defbench factorial
:group :math
:function 'cl-bench.math:run-factorial
:runs 1000)
(defbench fib
:group :math
:function 'cl-bench.math:run-fib
:runs 50)
(defbench fib-ratio
:group :math
:function 'cl-bench.math:run-fib-ratio
:runs 500)
(defbench ackermann
:group :math
:long "Calculating Ackermann's number (heavy recursion)"
:function 'cl-bench.math:run-ackermann
:runs 1)
(defbench mandelbrot/complex
:group :math
:long "Mandelbrot Set computation using complex numbers"
:function 'cl-bench.math:run-mandelbrot/complex
:runs 100)
(defbench mandelbrot/dfloat
:group :math
:long "Mandelbrot Set computation using double-floats"
:function 'cl-bench.math:run-mandelbrot/dfloat
:runs 100)
(defbench mrg32k3a
:group :math
:long "multiple recursive random number generator of l'Ecuyer"
:function 'cl-bench.math:run-mrg32k3a
:runs 20)
(defbench crc40
:group :math
:long "Cyclic redundancy check calculation using 40-bit integers"
:function 'cl-bench.crc:run-crc40
:runs 2)
(defbench bignum/elem-100-1000
:group :bignum
:function 'cl-bench.bignum:run-elem-100-1000
:runs 1)
(defbench bignum/elem-1000-100
:group :bignum
:function 'cl-bench.bignum:run-elem-1000-100
:runs 1)
(defbench bignum/elem-10000-1
:group :bignum
:function 'cl-bench.bignum:run-elem-10000-1
:runs 1)
(defbench bignum/pari-100-10
:group :bignum
:function 'cl-bench.bignum:run-pari-100-10
:runs 1)
(defbench bignum/pari-200-5
:group :bignum
:function 'cl-bench.bignum:run-pari-200-5
:runs 1)
;; this one takes ages to run
#+slow-tests
(defbench bignum/pari-1000-1
:group :bignum
:short "bignum/pari-1000-1"
:function 'cl-bench.bignum:run-pari-1000-1
:runs 1)
(defbench pi-decimal/small
:group :bignum
:function 'cl-bench.bignum:run-pi-decimal/small
:runs 100)
(defbench pi-decimal/big
:group :bignum
:function 'cl-bench.bignum:run-pi-decimal/big
:runs 2)
(defbench pi-atan
:group :bignum
:function 'cl-bench.bignum:run-pi-atan
:runs 200)
(defbench pi-ratios
:group :bignum
:function 'cl-bench.ratios:run-pi-ratios
:runs 2)
(defbench hash-strings
:group :hash
:function 'cl-bench.hash:hash-strings
:runs 2)
(defbench hash-integers
:group :hash
:function 'cl-bench.hash:hash-integers
:runs 10)
(defbench slurp-lines
:group :gc
:long "Line-by-line read of a large file (mostly testing allocation speed)"
:function 'cl-bench.hash:run-slurp-lines
:runs 30)
(defbench boehm-gc
:group :gc
:function 'cl-bench.boehm-gc:gc-benchmark
:runs 1
:disabled-for '(lispworks-personal-edition))
(defbench deflate-file
:group :misc
:function 'cl-bench.deflate:run-deflate-file
:runs 100)
;; these tests exceed the limited stack size in the trial version of LW
(defbench 1d-arrays
:group :sequence
:long "Adding together two vectors"
:function 'cl-bench.arrays:bench-1d-arrays
:runs 1
:disabled-for '(lispworks-personal-edition))
(defbench 2d-arrays
:group :sequence
:long "Adding together two 2-dimensional arrays"
:function 'cl-bench.arrays:bench-2d-arrays
:runs 1
:disabled-for '(lispworks-personal-edition))
(defbench 3d-arrays
:group :sequence
:long "Adding together two 3-dimensional arrays"
:function 'cl-bench.arrays:bench-3d-arrays
:runs 1
:disabled-for '(lispworks-personal-edition))
;; Poplog seems to have a buggy implementation of bitvectors
(defbench bitvectors
:group :sequence
:long "BIT-XOR, BIT-AND on big bitvectors"
:function 'cl-bench.arrays:bench-bitvectors
:runs 3
:disabled-form '(lispworks-personal-edition poplog))
(defbench bench-strings
:group :sequence
:long "Allocate and fill large strings"
:function 'cl-bench.arrays:bench-strings
:runs 1
:disabled-for '(lispworks-personal-edition))
(defbench fill-strings/adjust
:group :sequence
:short "fill-strings/adjustable"
:long "Fill an adjustable array with characters"
:function 'cl-bench.arrays:bench-strings/adjustable
:runs 1
:disabled-for '(lispworks-personal-edition))
;; as of 2002-01-20 this crashes CLISP, both release and CVS versions.
;; It exceeds maximum array size for both Allegro CL and LispWorks.
;; It takes AGES and consumes around 120MB RSS with Poplog
(defbench string-concat
:group :sequence
:long "WITH-OUTPUT-TO-STRING and much output"
:function 'cl-bench.arrays:bench-string-concat
:runs 1
:disabled-for '(allegro lispworks-personal-edition poplog))
(defbench search-sequence
:group :sequence
:long "FIND, FIND-IF, POSITION on a simple-vector"
:function 'cl-bench.arrays:bench-search-sequence
:runs 1
:disabled-for '(lispworks-personal-edition))
(defbench clos-defclass
:group :clos
:short "CLOS/defclass"
:long "Defines a class hierarchy"
:function 'cl-bench.clos:run-defclass
:runs 1)
(defbench clos-defmethod
:group :clos
:short "CLOS/defmethod"
:long "Defines methods on the class hierarchy"
:function 'cl-bench.clos:run-defmethod
:runs 1)
(defbench clos-instantiate
:group :clos
:short "CLOS/instantiate"
:long "Instantiates a complicated class hierarchy"
:function 'cl-bench.clos:make-instances
:runs 2)
(defbench clos-instantiate
:group :clos
:short "CLOS/simple-instantiate"
:long "Instantiates a simple class hierarchy"
:function 'cl-bench.clos:make-instances/simple
:runs 200)
(defbench methodcalls
:group :clos
:short "CLOS/methodcalls"
:long "Make method calls against the created instances."
:function 'cl-bench.clos:methodcalls/simple
:runs 5)
(defbench methodcalls+after
:group :clos
:short "CLOS/method+after"
:long "Define after methods on our instances, then run some method calls"
:function 'cl-bench.clos:methodcalls/simple+after
:runs 2)
(defbench methodcalls/complex
:group :clos
:short "CLOS/complex-methods"
:long "Run methodcalls with and method combination."
:function 'cl-bench.clos:methodcalls/complex
:runs 5
:disabled-for '(clisp poplog))
(defbench eql-specialized-fib
:group :clos
:long "Fibonnaci function implemented with EQL specialization"
:function 'cl-bench.clos:run-eql-fib
:runs 2)
;; this is really a test of the speed of loading a source file full of data
#+nil
(defbench ppcre-load/source
:long "CL-PPCRE, Perl-compatible regular expressions: loading data file"
:function '(load (make-pathname :directory '(:relative "files" "cl-ppcre")
:name "testdata"
:type "lisp"))
:runs 1
:disabled-for '(armedbear))
#+nil
(defbench ppcre-match
:long "CL-PPCRE, perl-compatible regular expressions: matching speed"
:function '(cl-ppcre-test:test)
:runs 20
:disabled-for '(armedbear))
;; EOF