author Linus Torvalds <torvalds@home.transmeta.com> 2003-03-21 10:58:28 -0700
committer Linus Torvalds <torvalds@ppc970.osdl.org> 2005-04-07 20:59:34 -0700
commit a8971acb6cbfa8cfa6f5105340fcdee13c649590 (patch)
tree 3011554bbfa156146b6b86c91754a771c5e0c9ae /README
parent Show for/while/do statements in debugging. Make symbol type (diff)
Add copyright statements and file comments. Add a FAQ, README, and
placeholder LICENSE file.
Diffstat (limited to 'README')
-rw-r--r-- README 72
1 files changed, 72 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..82b90e1
--- /dev/null
+++ b/README
@@ -0,0 +1,72 @@
+
+ sparse (spärs), adj., spars-er, spars-est.
+ 1. thinly scattered or distributed: "a sparse population"
+ 2. thin; not thick or dense: "sparse hair"
+ 3. scanty; meager.
+ 4. semantic parse
+ [ from Latin: spars(us) scattered, past participle of
+ spargere 'to sparge' ]
+
+ Antonym: abundant
+
+Sparse is a semantic parser of source files: it's neither a compiler
+(although it could be used as a front-end for one) nor a
+preprocessor (although preprocessing is one of the phases it
+runs).
+
+It is meant to be a small - and simple - library. Scanty and meager,
+and partly because of that easy to use. It has one mission in life:
+create a semantic parse tree for some arbitrary user for further
+analysis. It's not a tokenizer, nor is it some generic context-free
+parser. In fact, context (semantics) is what it's all about - figuring
+out not just what the grouping of tokens is, but what the _types_ are
+that the grouping implies.
+
+And no, it doesn't use lex and yacc (or flex and bison). In my personal
+opinion, using lex/yacc tends to end with you just having to fight the
+assumptions the tools make.
+
+The parsing is done in three phases:
+
+ - full-file tokenization
+ - preprocessing (which can trigger a further tokenization phase for
+ another file)
+ - semantic parsing.
+
+Note the "full file" part. Partly for efficiency, but mostly for ease of
+use, there are no "partial results". The library completely parses one
+whole source file, and builds up the _complete_ parse tree in memory.
+
+This means that a user of the library will literally just need to do
+
+ struct token *token;
+ int fd = open(filename, O_RDONLY);
+ struct symbol_list *list = NULL;
+
+ if (fd < 0)
+     exit_with_complaint();
+
+ // Initialize parse symbols
+ init_symbols();
+
+ // Tokenize the input stream
+ token = tokenize(filename, fd, NULL);
+
+ // Pre-process the stream
+ token = preprocess(token);
+
+ // Parse the resulting C code
+ translation_unit(token, &list);
+
+and he is now done - having a full C parse of the file he opened. The
+library doesn't need any more setup, and once done does not impose any
+more requirements. The user is free to do whatever he wants with the
+parse tree that got built up, and need not worry about the library ever
+again. There is no extra state, there are no parser callbacks, there is
+only the parse tree that is described by the header files.
+
+The library also contains (as example users) a few clients that do the
+preprocessing and the parsing and just print out the results. These
+clients were done to verify and debug the library, and also as trivial
+examples of what you can do with the parse tree once it is formed, so
+that users can see how the tree is organized.
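
The "full file" idea in the README above (finish tokenizing the whole
input before any later phase runs, with no partial results and no
callbacks) can be sketched in a few lines of C. This is only a toy
under stated assumptions: toy_token, toy_tokenize, toy_count and
toy_free are hypothetical names made up for this sketch, not part of
sparse's API, and the token rules (runs of alphanumerics or '_', plus
single-character punctuators) are far simpler than real C.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token type for this sketch; NOT sparse's struct token. */
struct toy_token {
	char *text;              /* NUL-terminated copy of the token text */
	struct toy_token *next;  /* next token in the input, or NULL */
};

/* Tokenize the entire buffer up front and return the complete list. */
struct toy_token *toy_tokenize(const char *buf)
{
	struct toy_token *head = NULL, **tail = &head;

	while (*buf) {
		const char *start;
		size_t len;
		struct toy_token *tok;

		if (isspace((unsigned char)*buf)) {
			buf++;
			continue;
		}
		start = buf;
		if (isalnum((unsigned char)*buf) || *buf == '_') {
			/* identifier or number: consume the whole run */
			while (isalnum((unsigned char)*buf) || *buf == '_')
				buf++;
		} else {
			/* anything else is a one-character punctuator */
			buf++;
		}
		len = (size_t)(buf - start);

		tok = malloc(sizeof(*tok));
		if (!tok)
			abort();
		tok->text = malloc(len + 1);
		if (!tok->text)
			abort();
		memcpy(tok->text, start, len);
		tok->text[len] = '\0';
		tok->next = NULL;

		/* append at the end so the list keeps source order */
		*tail = tok;
		tail = &tok->next;
	}
	return head;
}

/* A later "phase": it only ever sees the finished list. */
size_t toy_count(const struct toy_token *tok)
{
	size_t n = 0;

	for (; tok; tok = tok->next)
		n++;
	return n;
}

/* Release every token in the list. */
void toy_free(struct toy_token *tok)
{
	while (tok) {
		struct toy_token *next = tok->next;

		free(tok->text);
		free(tok);
		tok = next;
	}
}
```

A caller would run toy_tokenize() once over the whole input and then
hand the returned list to whatever comes next (here just toy_count()).
The tokenizer keeps no state of its own after it returns, which is the
property the README describes for the real library.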